date:20111208

Re: [Qemu-devel] [RFC] Device sandboxing

2011-12-08 Thread Stefan Hajnoczi

On Wed, Dec 7, 2011 at 8:54 PM, Eric Paris  wrote:
> On Wed, 2011-12-07 at 13:43 -0600, Anthony Liguori wrote:
>> On 12/07/2011 01:32 PM, Corey Bryant wrote:
>
>> > That would seem like the logical approach. I think there may be new mode 2
>> > patches coming soon so we can see how they go over.
>>
>> I'd like to see what the whitelist would need to be for something like QEMU 
>> in
>> mode 2.  My biggest concern is that the whitelist would need to be so large 
>> that
>> the practical security what's all that much improved.
>
> When I prototyped my version of seccomp v2 for qemu a while back I did
> it by only looking at syscalls after inital setup was completed (aka the
> very last thing before main_loop() was to lock it down).  My list was
> much sorter than even these:
>
> +        __NR_brk,
> +        __NR_close,
> +        __NR_exit_group,
> +        __NR_futex,
> +        __NR_ioctl,
> +        __NR_madvise,
> +        __NR_mmap,
> +        __NR_munmap,
> +        __NR_read,
> +        __NR_recvfrom,
> +        __NR_recvmsg,
> +        __NR_rt_sigaction,
> +        __NR_select,
> +        __NR_sendto,
> +        __NR_tgkill,
> +        __NR_timer_delete,
> +        __NR_timer_gettime,
> +        __NR_timer_settime,
> +        __NR_write,
> +        __NR_writev,
>
> There is simple obvious negligible overhead value here, but every
> proposal I know of, including mine, has been shot down by Ingo.  Ingo's
> proposal is much more work, more overhead, but clearly more flexible.
> His suggestions (and code based on those suggestions from others) has
> been shot down by PeterZ.
>
> So I feel like seccomp v2 is between a rock and a hard place.  We have
> something that works really well, and could be a huge win for all sorts
> of programs, but we don't seem to be able to get anything past the
> damned if you do, damned if you don't nak's.
>
> (There's also a cgroup version of seccomp proposed, but I'm guessing it
> will go just about as far as all the other versions)

Still, these sorts of situations are overcome all the time.  Sometimes
it takes a while and several LWN.net articles about the drama but at
the end things can be worked out.

If we want to discuss the specifics of mode 2 and especially what Ingo
or Peter think then I think we should do it in a forum where they
participate.  Maybe their positions have changed.

Stefan

Re: [Qemu-devel] [PATCH 1/1 V6] qemu-kvm: fix improper nmi emulation

2011-12-08 Thread Jan Kiszka

On 2011-12-07 11:29, Avi Kivity wrote:
> On 10/17/2011 06:00 PM, Lai Jiangshan wrote:
>> From: Lai Jiangshan 
>>
>> Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
>> button event happens. This doesn't properly emulate real hardware on
>> which NMI button event triggers LINT1. Because of this, NMI is sent to
>> the processor even when LINT1 is maskied in LVT. For example, this
>> causes the problem that kdump initiated by NMI sometimes doesn't work
>> on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.
>>
>> With this patch, inject-nmi request is handled as follows.
>>
>> - When in-kernel irqchip is disabled, deliver LINT1 instead of NMI
>>   interrupt.
>> - When in-kernel irqchip is enabled, get the in-kernel LAPIC states
>>   and test the APIC_LVT_MASKED, if LINT1 is unmasked, and then
>>   delivering the NMI directly. (Suggested by Jan Kiszka)
>>
>> Changed from old version:
>>   re-implement it by the Jan's suggestion.
>>   fix the race found by Jan.
> 
> This patch fell through the cracks, sorry.  Now applied.

Lai, what is the state of a corresponding QEMU upstream patch? I'd like
to build on top of it for my upstream irqchip series.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [RFC] Device sandboxing

2011-12-08 Thread Stefan Hajnoczi

On Wed, Dec 7, 2011 at 7:32 PM, Corey Bryant  wrote:
>
>
> On 12/07/2011 01:48 PM, Anthony Liguori wrote:
>>
>> On 12/07/2011 12:25 PM, Corey Bryant wrote:
>>> * The trusted helper thread would run beside the untrusted thread,
>>> enabling the untrusted thread to make syscalls beyond read(),
>>> write(), exit(), and sigreturn().
>>
>>
>> I assume you mean process, not thread BTW?
>>
>
> I do mean thread.  When making calls on behalf of the seccomp'd thread, I
> think there will be syscalls that must be called from the same address
> space.  That's where the the trusted helper thread would come into play.

It's worth pointing out that "isolation within the same process"
schemes work by running the trusted thread in a very special execution
environment.  It cannot trust memory and cannot use the stack for
control flow.  Everything must be done in registers.

This can be made to work but it's highly unportable across host
architectures and hard to make changes to the trusted helper because
you have to be so careful.

Stefan

Re: [Qemu-devel] an error while loading checkpoint

2011-12-08 Thread Stefan Hajnoczi

On Wed, Dec 7, 2011 at 11:41 PM, sparsh mittal  wrote:
> I get this warning while using a checkpoint. I issued the command -m 4G
> -icount 0 and other parameters.
> Unknown savevm section type 32
> qemu-system-x86_64: Error -22 while loading VM state
>
> My computer has 48GB RAM, so I don't know what is the problem?

Looks like the stream has gotten out of sync.

Please post the full QEMU command-line as well as any monitor commands
you issued so it is possible to reproduce this.

Stefan

Re: [Qemu-devel] Insane virtio-serial semantics

2011-12-08 Thread Markus Armbruster

Anthony Liguori  writes:

> On 12/07/2011 01:44 PM, Michael Roth wrote:
>> On 12/07/2011 07:49 AM, Anthony Liguori wrote:
>>> On 12/07/2011 02:21 AM, Markus Armbruster wrote:
 Anthony Liguori writes:
[...]
> They have the same purpose (which are both vague TBH). The only
> reason I'm sympathetic to this device is that virtio-serial has such
> insane semantics.

 Could you summarize what's wrong? Is it fixable?
>>>
>>> I don't think so as it's part of the userspace ABI now.
>>>
>>> Mike, please help me make sure I get this all right. A normal
>>> file/socket has the following guest semantics:
>>>
>>> 1) When a disconnect occurs, you will receive a return of '0' or -EPIPE
>>> depending on the platform. The fd is now unusable and you must
>>> close/reopen.
>>>
>>> 2) You can setup SIGIO/SIGPIPE to fire off whenever a file descriptor
>>> becomes readable/writable.
>>>
>>> virtio serial has the following semantics:
>>>
>>> 1) When a disconnect occurs, if you read() you will receive an -EPIPE.
>>>
>>> 2) However, if a reconnect occurs before you issue your read(), the read
>>> will complete with no indication that a disconnect occurred.
>>>
>>> 3) This makes it impossible to determine whether a disconnect has
>>> occurred which makes it very hard to reset your protocol stream. To deal
>>> with this, virtio-serial can issue a SIGIO signal upon disconnect.
>>>
>>> 4) Signals are asynchronous, so a reconnect may have occurred by the
>>> time you get the SIGIO signal. It's unclear that you can do anything
>>> useful with this.
>>
>> That about sums it up. There was a thread about this a while back where there
>> was some tentative agreement on a way to fix this by introducing QEMU flags 
>> that
>> invoke similar semantics to unix sockets:
>>
>> http://thread.gmane.org/gmane.comp.emulators.qemu/94721/focus=95496
>>
>> But at this point we'd need to re-visit, since there's a fair number of
>> virtio-serial users now. It'd probably need to be something you could switch 
>> on
>> from the guest via an fcntl() or something.
>>
>>>
>>> So besides overloading the meaning of SIGIO, there's really no way to
>>> figure out in the guest when a reconnect has occurred. To deal with this
>>> in qemu-ga, we actually only allow 7-bit data transfers and use the 8th
>>> bit as an in-band message to tell the guest that a reset has occurred.
>>
>> Yup, it's not perfect though, due to a delayed/spurious response from an 
>> agent
>> that sent data before it read/handled the reset sequence. We don't get that
>> problem with unix sockets since they'd get an -EPIPE and would be blocked 
>> from
>> sending to a newly opened session.
>>
>> We try to account for this on the host by following up a reset sequences will
>> the guest-sync RPC, which contains a unique ID that the guest echos back to 
>> us.
>> That way we can throw away stale data on the host until we get the intended
>> response. In our case, it's not quite perfect since if the agent sent a "{"
>> before getting reset, subsequent transmission of the guest-sync response can 
>> be
>> lost. We'd need to precede responses to guest-sync with a 0xFF as well, so 
>> that
>> the host flushes it's rcv buffer/parser state...
>>
>> And, somewhat off-topic, but none of addresses the case where an agent hangs 
>> on
>> an RPC. This would require some additional handling by the agent side where 
>> we
>> might have tie some additional action to the 0xFF sequence.
>>
>> Previously this scenario was handled by a hard-coded timeout mechanism in the
>> agent, with a seperate thread handling the RPCs, but we've since dropped the
>> thread due to potential for memory leaks (with plans to re-introduce using a
>> child process).
>>
>> client-induced resets would be much nicer though, and a reserved byte is the
>> best solution we've been able to come up with given the current virtio-serial
>> semantics.
>
> Yeah, we really need a "sane reset semantics" flag for virtio-serial
> that provides a guest and host initiated channel close mechanism.
>
> I think you need to do this by using a single ring and using a simple
> session id with an explicit open/close message.  That way there is
> never ambiguity.

So it is fixable.

> And yes, I can't help but think of Dave Millers comments long ago that
> any PV transport is eventually going to reinvent TCP, poorly.

No doubt then, no doubt now.  But if I remember correctly, we didn't
create virtio-serial because we thought we could do better than TCP/IP.
We thought we need a zero-config communication channel that doesn't
interfere in any way with the guest's networking.  Since the network
folks were unwilling to give us one ("use TCP already"), we looked for
another bare metal thing to imitate, and chose serial lines.

Now, comparing serial lines to TCP/IP makes no sense.  Different layers.

Layering a real network protocol on top of serial line is possible; SLIP
exists.  But as long as we insist on "don't interfere in any way with
the

Re: [Qemu-devel] [PATCH] use pci macro in virtio

2011-12-08 Thread Stefan Hajnoczi

On Thu, Dec 8, 2011 at 5:49 AM, hkran  wrote:
>
> Signed-off-by: hkran 

Anthony asked that you put your real name in the Signed-off-by.  You
don't need to resend the patch, please just reply to this mail with:
Signed-off-by: $YOUR_NAME 

Thanks,
Stefan

Re: [Qemu-devel] [PATCH 1/1 V6] qemu-kvm: fix improper nmi emulation

2011-12-08 Thread Jan Kiszka

On 2011-12-08 10:42, Jan Kiszka wrote:
> On 2011-12-07 11:29, Avi Kivity wrote:
>> On 10/17/2011 06:00 PM, Lai Jiangshan wrote:
>>> From: Lai Jiangshan 
>>>
>>> Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
>>> button event happens. This doesn't properly emulate real hardware on
>>> which NMI button event triggers LINT1. Because of this, NMI is sent to
>>> the processor even when LINT1 is maskied in LVT. For example, this
>>> causes the problem that kdump initiated by NMI sometimes doesn't work
>>> on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.
>>>
>>> With this patch, inject-nmi request is handled as follows.
>>>
>>> - When in-kernel irqchip is disabled, deliver LINT1 instead of NMI
>>>   interrupt.
>>> - When in-kernel irqchip is enabled, get the in-kernel LAPIC states
>>>   and test the APIC_LVT_MASKED, if LINT1 is unmasked, and then
>>>   delivering the NMI directly. (Suggested by Jan Kiszka)
>>>
>>> Changed from old version:
>>>   re-implement it by the Jan's suggestion.
>>>   fix the race found by Jan.
>>
>> This patch fell through the cracks, sorry.  Now applied.
> 
> Lai, what is the state of a corresponding QEMU upstream patch? I'd like
> to build on top of it for my upstream irqchip series.

Never mind, I'll include a patch in my series as it requires some
tweaking to the APIC backend concept.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

[Qemu-devel] qemu Website down?

2011-12-08 Thread Birk Bremer

Hi,

sorry for being off topic but I have no I idea how to reach somebody
otherwise...

Since yesterday the Website http://www.qemu.org seems to be down
(rejecting connections).
I tried it from different IP addresses from different computers, but no
luck.
Would somebody please forward this to the Server administrator?

I browsed an old Version of the Website through the Internet Archive and
did not find any other means of contact than this, so sorry again for
being off topic.

Best Regards

Birk

[Qemu-devel] [PATCH v2] Documentation: Add qemu-img -t parameter in man page

2011-12-08 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
---
 qemu-img-cmds.hx |6 +++---
 qemu-img.texi|   10 +++---
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 4be00a5..49dce7c 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -24,13 +24,13 @@ ETEXI
 DEF("commit", img_commit,
 "commit [-f fmt] [-t cache] filename")
 STEXI
-@item commit [-f @var{fmt}] @var{filename}
+@item commit [-f @var{fmt}] [-t @var{cache}] @var{filename}
 ETEXI
 
 DEF("convert", img_convert,
 "convert [-c] [-p] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-s 
snapshot_name] [-S sparse_size] filename [filename2 [...]] output_filename")
 STEXI
-@item convert [-c] [-p] [-f @var{fmt}] [-O @var{output_fmt}] [-o 
@var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] @var{filename} 
[@var{filename2} [...]] @var{output_filename}
+@item convert [-c] [-p] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] 
[-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] 
@var{filename} [@var{filename2} [...]] @var{output_filename}
 ETEXI
 
 DEF("info", img_info,
@@ -48,7 +48,7 @@ ETEXI
 DEF("rebase", img_rebase,
 "rebase [-f fmt] [-t cache] [-p] [-u] -b backing_file [-F backing_fmt] 
filename")
 STEXI
-@item rebase [-f @var{fmt}] [-p] [-u] -b @var{backing_file} [-F 
@var{backing_fmt}] @var{filename}
+@item rebase [-f @var{fmt}] [-t @var{cache}] [-p] [-u] -b @var{backing_file} 
[-F @var{backing_fmt}] @var{filename}
 ETEXI
 
 DEF("resize", img_resize,
diff --git a/qemu-img.texi b/qemu-img.texi
index 70fa321..b2ca3a5 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -45,6 +45,10 @@ indicates the consecutive number of bytes that must contain 
only zeros
 for qemu-img to create a sparse image during conversion. This value is rounded
 down to the nearest 512 bytes. You may use the common size suffixes like
 @code{k} for kilobytes.
+@item -t @var{cache}
+specifies the cache mode that should be used with the (destination) file. See
+the documentation of the emulator's @code{-drive cache=...} option for allowed
+values.
 @end table
 
 Parameters to snapshot subcommand:
@@ -87,11 +91,11 @@ this case. @var{backing_file} will never be modified unless 
you use the
 The size can also be specified using the @var{size} option with @code{-o},
 it doesn't need to be specified separately in this case.
 
-@item commit [-f @var{fmt}] @var{filename}
+@item commit [-f @var{fmt}] [-t @var{cache}] @var{filename}
 
 Commit the changes recorded in @var{filename} in its base image.
 
-@item convert [-c] [-p] [-f @var{fmt}] [-O @var{output_fmt}] [-o 
@var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] @var{filename} 
[@var{filename2} [...]] @var{output_filename}
+@item convert [-c] [-p] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] 
[-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] 
@var{filename} [@var{filename2} [...]] @var{output_filename}
 
 Convert the disk image @var{filename} or a snapshot @var{snapshot_name} to 
disk image @var{output_filename}
 using format @var{output_fmt}. It can be optionally compressed (@code{-c}
@@ -121,7 +125,7 @@ they are displayed too.
 
 List, apply, create or delete snapshots in image @var{filename}.
 
-@item rebase [-f @var{fmt}] [-p] [-u] -b @var{backing_file} [-F 
@var{backing_fmt}] @var{filename}
+@item rebase [-f @var{fmt}] [-t @var{cache}] [-p] [-u] -b @var{backing_file} 
[-F @var{backing_fmt}] @var{filename}
 
 Changes the backing file of an image. Only the formats @code{qcow2} and
 @code{qed} support changing the backing file.
-- 
1.7.6.4

Re: [Qemu-devel] [PATCH 00/14] ARM: Samsung S5PC210-based boards support.

2011-12-08 Thread Peter Maydell

On 7 December 2011 10:45, Dmitry Solodkiy  wrote:
> Dear Peter,
>
>  Orion, s5pc210 and Exynos4210 are aliases for the same hardware chip.
>  We decided that s5pc210 is more informative than exynos4 (there are
> many hardware models started with Exynos4...) or Orion(obsolete one).

Hmm. You know the SoC better than I do, but I'm a bit dubious
about using a different naming scheme than the kernel does.
u-boot is currently working on a renaming too (eg patches being
submitted: http://patchwork.ozlabs.org/patch/129922/)

-- PMM

Re: [Qemu-devel] [PATCH v2] Documentation: Add qemu-img -t parameter in man page

2011-12-08 Thread Stefan Hajnoczi

On Thu, Dec 8, 2011 at 10:57 AM, Kevin Wolf  wrote:
> Signed-off-by: Kevin Wolf 
> ---
>  qemu-img-cmds.hx |    6 +++---
>  qemu-img.texi    |   10 +++---
>  2 files changed, 10 insertions(+), 6 deletions(-)

Reviewed-by: Stefan Hajnoczi

Re: [Qemu-devel] [PATCH] block/cow : return real error code in cow.c

2011-12-08 Thread Kevin Wolf

Am 08.12.2011 06:40, schrieb Li Zhi Hui:
> Signed-off-by: Li Zhi Hui 
> ---
>  block/cow.c |   22 --
>  1 files changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/block/cow.c b/block/cow.c
> index 3c52735..383482b 100644
> --- a/block/cow.c
> +++ b/block/cow.c
> @@ -64,10 +64,11 @@ static int cow_open(BlockDriverState *bs, int flags)
>  struct cow_header_v2 cow_header;
>  int bitmap_size;
>  int64_t size;
> +int ret;
>  
>  /* see if it is a cow image */
> -if (bdrv_pread(bs->file, 0, &cow_header, sizeof(cow_header)) !=
> -sizeof(cow_header)) {
> +ret = bdrv_pread(bs->file, 0, &cow_header, sizeof(cow_header));
> +if (ret < 0) {
>  goto fail;
>  }
>  

More context:

if (be32_to_cpu(cow_header.magic) != COW_MAGIC ||
be32_to_cpu(cow_header.version) != COW_VERSION) {
goto fail;
}

This jumps to fail: with uninitialised ret. Needs to be fixed.

> @@ -88,7 +89,7 @@ static int cow_open(BlockDriverState *bs, int flags)
>  qemu_co_mutex_init(&s->lock);
>  return 0;
>   fail:
> -return -1;
> +return ret;
>  }
>  
>  /*
> @@ -182,14 +183,14 @@ static int coroutine_fn cow_read(BlockDriverState *bs, 
> int64_t sector_num,
>  ret = bdrv_pread(bs->file,
>  s->cow_sectors_offset + sector_num * 512,
>  buf, n * 512);
> -if (ret != n * 512)
> -return -1;
> +if (ret < 0)
> +return ret;

Please add braces here while touching the code.

>  } else {
>  if (bs->backing_hd) {
>  /* read from the base image */
>  ret = bdrv_read(bs->backing_hd, sector_num, buf, n);
>  if (ret < 0)
> -return -1;
> +return ret;

And here.

Otherwise the patch looks good to me.

Kevin

Re: [Qemu-devel] qemu Website down?

2011-12-08 Thread Alex Bradbury

On 8 December 2011 10:29, Birk Bremer  wrote:
> Hi,
>
> sorry for being off topic but I have no I idea how to reach somebody
> otherwise...
>
> Since yesterday the Website http://www.qemu.org seems to be down
> (rejecting connections).
> I tried it from different IP addresses from different computers, but no
> luck.
> Would somebody please forward this to the Server administrator?

It's a known issue. See:
https://plus.google.com/u/0/101344524535025574253/posts/cfVrcVDHSzV

Alex

Re: [Qemu-devel] qemu Website down?

2011-12-08 Thread Birk Bremer

Ok,

this information did not turn up in a google search when searching for
"qemu website down" 
Thanks & best regards

Birk



Am 08.12.11 12:28, schrieb Alex Bradbury:
> On 8 December 2011 10:29, Birk Bremer  wrote:
>> Hi,
>>
>> sorry for being off topic but I have no I idea how to reach somebody
>> otherwise...
>>
>> Since yesterday the Website http://www.qemu.org seems to be down
>> (rejecting connections).
>> I tried it from different IP addresses from different computers, but no
>> luck.
>> Would somebody please forward this to the Server administrator?
> It's a known issue. See:
> https://plus.google.com/u/0/101344524535025574253/posts/cfVrcVDHSzV
>
> Alex
>

[Qemu-devel] [PATCH v4 15/15] kvm: Arm in-kernel irqchip support

2011-12-08 Thread Jan Kiszka

Make the basic in-kernel irqchip support selectable via
-machine ...,kernel_irqchip=on. Leave it off by default until it can
fully replace user space models.

Signed-off-by: Jan Kiszka 
---
 qemu-config.c   |4 
 qemu-options.hx |5 -
 2 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/qemu-config.c b/qemu-config.c
index 597d7e1..a761bea 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -490,6 +490,10 @@ static QemuOptsList qemu_machine_opts = {
 .name = "accel",
 .type = QEMU_OPT_STRING,
 .help = "accelerator list",
+}, {
+.name = "kernel_irqchip",
+.type = QEMU_OPT_BOOL,
+.help = "use KVM in-kernel irqchip",
 },
 { /* End of list */ }
 },
diff --git a/qemu-options.hx b/qemu-options.hx
index 681eaf1..60b7dc0 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -31,7 +31,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
 "-machine [type=]name[,prop[=value][,...]]\n"
 "selects emulated machine (-machine ? for list)\n"
 "property accel=accel1[:accel2[:...]] selects 
accelerator\n"
-"supported accelerators are kvm, xen, tcg (default: 
tcg)\n",
+"supported accelerators are kvm, xen, tcg (default: tcg)\n"
+"kernel_irqchip=on|off controls accelerated irqchip 
support\n",
 QEMU_ARCH_ALL)
 STEXI
 @item -machine [type=]@var{name}[,prop=@var{value}[,...]]
@@ -44,6 +45,8 @@ This is used to enable an accelerator. Depending on the 
target architecture,
 kvm, xen, or tcg can be available. By default, tcg is used. If there is more
 than one accelerator specified, the next one is used if the previous one fails
 to initialize.
+@item kernel_irqchip=on|off
+Enables in-kernel irqchip support for the chosen accelerator when available.
 @end table
 ETEXI
 
-- 
1.7.3.4

[Qemu-devel] [PATCH v4 08/15] ioapic: Introduce backend/frontend infrastructure for KVM reuse

2011-12-08 Thread Jan Kiszka

Split up the IOAPIC analogously to APIC and i8259. KVM will share the
device description, reset logic and certain init parts with the user
space model.

Signed-off-by: Jan Kiszka 
---
 Makefile.target  |2 +-
 hw/ioapic.c  |  130 ---
 hw/ioapic_common.c   |  137 ++
 hw/ioapic_internal.h |  105 ++
 hw/pc_piix.c |1 +
 5 files changed, 254 insertions(+), 121 deletions(-)
 create mode 100644 hw/ioapic_common.c
 create mode 100644 hw/ioapic_internal.h

diff --git a/Makefile.target b/Makefile.target
index c46f062..b549988 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -231,7 +231,7 @@ obj-$(CONFIG_IVSHMEM) += ivshmem.o
 # Hardware support
 obj-i386-y += vga.o
 obj-i386-y += mc146818rtc.o pc.o
-obj-i386-y += cirrus_vga.o sga.o apic_common.o apic.o ioapic.o piix_pci.o
+obj-i386-y += cirrus_vga.o sga.o apic_common.o apic.o ioapic_common.o ioapic.o 
piix_pci.o
 obj-i386-y += vmport.o
 obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
diff --git a/hw/ioapic.c b/hw/ioapic.c
index 27b07c6..2db72e0 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -24,9 +24,7 @@
 #include "pc.h"
 #include "apic.h"
 #include "ioapic.h"
-#include "qemu-timer.h"
-#include "host-utils.h"
-#include "sysbus.h"
+#include "ioapic_internal.h"
 
 //#define DEBUG_IOAPIC
 
@@ -37,62 +35,6 @@
 #define DPRINTF(fmt, ...)
 #endif
 
-#define MAX_IOAPICS 1
-
-#define IOAPIC_VERSION  0x11
-
-#define IOAPIC_LVT_DEST_SHIFT   56
-#define IOAPIC_LVT_MASKED_SHIFT 16
-#define IOAPIC_LVT_TRIGGER_MODE_SHIFT   15
-#define IOAPIC_LVT_REMOTE_IRR_SHIFT 14
-#define IOAPIC_LVT_POLARITY_SHIFT   13
-#define IOAPIC_LVT_DELIV_STATUS_SHIFT   12
-#define IOAPIC_LVT_DEST_MODE_SHIFT  11
-#define IOAPIC_LVT_DELIV_MODE_SHIFT 8
-
-#define IOAPIC_LVT_MASKED   (1 << IOAPIC_LVT_MASKED_SHIFT)
-#define IOAPIC_LVT_REMOTE_IRR   (1 << IOAPIC_LVT_REMOTE_IRR_SHIFT)
-
-#define IOAPIC_TRIGGER_EDGE 0
-#define IOAPIC_TRIGGER_LEVEL1
-
-/*io{apic,sapic} delivery mode*/
-#define IOAPIC_DM_FIXED 0x0
-#define IOAPIC_DM_LOWEST_PRIORITY   0x1
-#define IOAPIC_DM_PMI   0x2
-#define IOAPIC_DM_NMI   0x4
-#define IOAPIC_DM_INIT  0x5
-#define IOAPIC_DM_SIPI  0x6
-#define IOAPIC_DM_EXTINT0x7
-#define IOAPIC_DM_MASK  0x7
-
-#define IOAPIC_VECTOR_MASK  0xff
-
-#define IOAPIC_IOREGSEL 0x00
-#define IOAPIC_IOWIN0x10
-
-#define IOAPIC_REG_ID   0x00
-#define IOAPIC_REG_VER  0x01
-#define IOAPIC_REG_ARB  0x02
-#define IOAPIC_REG_REDTBL_BASE  0x10
-#define IOAPIC_ID   0x00
-
-#define IOAPIC_ID_SHIFT 24
-#define IOAPIC_ID_MASK  0xf
-
-#define IOAPIC_VER_ENTRIES_SHIFT16
-
-typedef struct IOAPICState IOAPICState;
-
-struct IOAPICState {
-SysBusDevice busdev;
-MemoryRegion io_memory;
-uint8_t id;
-uint8_t ioregsel;
-uint32_t irr;
-uint64_t ioredtbl[IOAPIC_NUM_PINS];
-};
-
 static IOAPICState *ioapics[MAX_IOAPICS];
 
 static void ioapic_service(IOAPICState *s)
@@ -278,83 +220,31 @@ ioapic_mem_write(void *opaque, target_phys_addr_t addr, 
uint64_t val,
 }
 }
 
-static int ioapic_post_load(void *opaque, int version_id)
-{
-IOAPICState *s = opaque;
-
-if (version_id == 1) {
-/* set sane value */
-s->irr = 0;
-}
-return 0;
-}
-
-static const VMStateDescription vmstate_ioapic = {
-.name = "ioapic",
-.version_id = 3,
-.post_load = ioapic_post_load,
-.minimum_version_id = 1,
-.minimum_version_id_old = 1,
-.fields = (VMStateField[]) {
-VMSTATE_UINT8(id, IOAPICState),
-VMSTATE_UINT8(ioregsel, IOAPICState),
-VMSTATE_UNUSED_V(2, 8), /* to account for qemu-kvm's v2 format */
-VMSTATE_UINT32_V(irr, IOAPICState, 2),
-VMSTATE_UINT64_ARRAY(ioredtbl, IOAPICState, IOAPIC_NUM_PINS),
-VMSTATE_END_OF_LIST()
-}
-};
-
-static void ioapic_reset(DeviceState *d)
-{
-IOAPICState *s = DO_UPCAST(IOAPICState, busdev.qdev, d);
-int i;
-
-s->id = 0;
-s->ioregsel = 0;
-s->irr = 0;
-for (i = 0; i < IOAPIC_NUM_PINS; i++) {
-s->ioredtbl[i] = 1 << IOAPIC_LVT_MASKED_SHIFT;
-}
-}
-
 static const MemoryRegionOps ioapic_io_ops = {
 .read = ioapic_mem_read,
 .write = ioapic_mem_write,
 .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
-static int ioapic_init1(SysBusDevice *dev)
+static void ioapic_backend_init(IOAPICState *s, int index)
 {
-IOAPICState *s = FROM_SYSBUS(IOAPICState, dev);
-static int ioapic_no;
-
-if (ioapic_no >= MAX_IOAPICS) {
-return -1;
-}
-
 memory

[Qemu-devel] [PATCH v4 14/15] kvm: x86: Add user space part for in-kernel IOAPIC

2011-12-08 Thread Jan Kiszka

This introduces the KVM-accelerated IOAPIC backend and extends the IRQ
routing setup by the 0->2 redirection when needed.

The IOAPIC gains a KVM-specific property that allows to define the GSI
base for injecting interrupts into the kernel model. This will allow to
disentangle PIC and IOAPIC pins for chipsets that support more
sophisticated IRQ routes than the PIIX3. So far the base is kept at 0,
i.e. PIC and IOAPIC share pins 0..15.

Signed-off-by: Jan Kiszka 
---
 Makefile.target  |2 +-
 hw/ioapic_common.c   |1 +
 hw/ioapic_internal.h |1 +
 hw/kvm/ioapic.c  |  101 ++
 hw/pc_piix.c |   15 +++-
 5 files changed, 118 insertions(+), 2 deletions(-)
 create mode 100644 hw/kvm/ioapic.c

diff --git a/Makefile.target b/Makefile.target
index fb10143..b48bb57 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -236,7 +236,7 @@ obj-i386-y += vmport.o
 obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
 obj-i386-y += pc_piix.o
-obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o kvm/i8259.o
+obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o kvm/i8259.o kvm/ioapic.o
 obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 
 # shared objects
diff --git a/hw/ioapic_common.c b/hw/ioapic_common.c
index 094551c..efc1d44 100644
--- a/hw/ioapic_common.c
+++ b/hw/ioapic_common.c
@@ -122,6 +122,7 @@ static SysBusDeviceInfo ioapic_info = {
 .qdev.no_user = 1,
 .qdev.props = (Property[]) {
 DEFINE_PROP_STRING("backend", IOAPICState, backend_name),
+DEFINE_PROP_UINT32("kvm_gsi_base", IOAPICState, kvm_gsi_base, 0),
 DEFINE_PROP_END_OF_LIST(),
 },
 };
diff --git a/hw/ioapic_internal.h b/hw/ioapic_internal.h
index c5fab8b..bf63115 100644
--- a/hw/ioapic_internal.h
+++ b/hw/ioapic_internal.h
@@ -95,6 +95,7 @@ struct IOAPICState {
 
 char *backend_name;
 IOAPICBackend *backend;
+uint32_t kvm_gsi_base;
 };
 
 void ioapic_register_device(void);
diff --git a/hw/kvm/ioapic.c b/hw/kvm/ioapic.c
new file mode 100644
index 000..0e66240
--- /dev/null
+++ b/hw/kvm/ioapic.c
@@ -0,0 +1,101 @@
+/*
+ * KVM in-kernel IOPIC support
+ *
+ * Copyright (c) 2011 Siemens AG
+ *
+ * Authors:
+ *  Jan Kiszka  
+ *
+ * This work is licensed under the terms of the GNU GPL version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "hw/pc.h"
+#include "hw/ioapic_internal.h"
+#include "hw/apic_internal.h"
+#include "kvm.h"
+
+static void kvm_ioapic_get(IOAPICState *s)
+{
+struct kvm_irqchip chip;
+struct kvm_ioapic_state *kioapic;
+int ret, i;
+
+chip.chip_id = KVM_IRQCHIP_IOAPIC;
+ret = kvm_vm_ioctl(kvm_state, KVM_GET_IRQCHIP, &chip);
+if (ret < 0) {
+fprintf(stderr, "KVM_GET_IRQCHIP failed: %s\n", strerror(ret));
+abort();
+}
+
+kioapic = &chip.chip.ioapic;
+
+s->id = kioapic->id;
+s->ioregsel = kioapic->ioregsel;
+s->irr = kioapic->irr;
+for (i = 0; i < IOAPIC_NUM_PINS; i++) {
+s->ioredtbl[i] = kioapic->redirtbl[i].bits;
+}
+}
+
+static void kvm_ioapic_put(IOAPICState *s)
+{
+struct kvm_irqchip chip;
+struct kvm_ioapic_state *kioapic;
+int ret, i;
+
+chip.chip_id = KVM_IRQCHIP_IOAPIC;
+kioapic = &chip.chip.ioapic;
+
+kioapic->id = s->id;
+kioapic->ioregsel = s->ioregsel;
+kioapic->base_address = s->busdev.mmio[0].addr;
+kioapic->irr = s->irr;
+for (i = 0; i < IOAPIC_NUM_PINS; i++) {
+kioapic->redirtbl[i].bits = s->ioredtbl[i];
+}
+
+ret = kvm_vm_ioctl(kvm_state, KVM_SET_IRQCHIP, &chip);
+if (ret < 0) {
+fprintf(stderr, "KVM_GET_IRQCHIP failed: %s\n", strerror(ret));
+abort();
+}
+}
+
+static void kvm_ioapic_reset(IOAPICState *s)
+{
+ioapic_reset_internal(s);
+
+kvm_ioapic_put(s);
+}
+
+static void kvm_ioapic_set_irq(void *opaque, int irq, int level)
+{
+IOAPICState *s = opaque;
+int delivered;
+
+delivered = kvm_irqchip_set_irq(kvm_state, s->kvm_gsi_base + irq, level);
+apic_set_irq_delivered(delivered);
+}
+
+static void kvm_ioapic_backend_init(IOAPICState *s, int index)
+{
+memory_region_init_reservation(&s->io_memory, "kvm-ioapic", 0x1000);
+
+qdev_init_gpio_in(&s->busdev.qdev, kvm_ioapic_set_irq, IOAPIC_NUM_PINS);
+}
+
+static IOAPICBackend kvm_ioapic_backend = {
+.name = "KVM",
+.init = kvm_ioapic_backend_init,
+.reset = kvm_ioapic_reset,
+.pre_save = kvm_ioapic_get,
+.post_load = kvm_ioapic_put,
+};
+
+static void kvm_ioapic_register(void)
+{
+ioapic_register_backend(&kvm_ioapic_backend);
+}
+
+device_init(kvm_ioapic_register)
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 8650319..93d0eba 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -68,6 +68,15 @@ static void kvm_piix3_setup_irq_routing(bool pci_enabled)
 for (i = 8; i < 16; ++i) {
 kvm_irqchip_add_route(s, i, KVM_IRQCHIP_PIC_SLAVE, i - 8);
 }
+

[Qemu-devel] [PATCH v4 05/15] apic: Introduce backend/frontend infrastructure for KVM reuse

2011-12-08 Thread Jan Kiszka

The KVM in-kernel APIC model will reuse parts of the user space model
while providing the same frontend view to guest and most management
interfaces. Introduce an APIC backend concept to encapsulate those
parts that will tell user space and KVM model apart. The backend offers
callback hooks for init, base/tpr setting, and the external NMI delivery
that will be implemented accordingly.

Signed-off-by: Jan Kiszka 
---
 Makefile.target|2 +-
 hw/apic.c  |  282 +++-
 hw/apic_common.c   |  265 
 hw/apic_internal.h |  119 ++
 hw/pc.c|1 +
 trace-events   |2 +-
 6 files changed, 403 insertions(+), 268 deletions(-)
 create mode 100644 hw/apic_common.c
 create mode 100644 hw/apic_internal.h

diff --git a/Makefile.target b/Makefile.target
index 1d24a30..c46f062 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -231,7 +231,7 @@ obj-$(CONFIG_IVSHMEM) += ivshmem.o
 # Hardware support
 obj-i386-y += vga.o
 obj-i386-y += mc146818rtc.o pc.o
-obj-i386-y += cirrus_vga.o sga.o apic.o ioapic.o piix_pci.o
+obj-i386-y += cirrus_vga.o sga.o apic_common.o apic.o ioapic.o piix_pci.o
 obj-i386-y += vmport.o
 obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
diff --git a/hw/apic.c b/hw/apic.c
index b9d733c..f25be80 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -16,53 +16,13 @@
  * You should have received a copy of the GNU Lesser General Public
  * License along with this library; if not, see 
  */
-#include "hw.h"
+#include "apic_internal.h"
 #include "apic.h"
 #include "ioapic.h"
-#include "qemu-timer.h"
 #include "host-utils.h"
-#include "sysbus.h"
 #include "trace.h"
 #include "pc.h"
 
-/* APIC Local Vector Table */
-#define APIC_LVT_TIMER   0
-#define APIC_LVT_THERMAL 1
-#define APIC_LVT_PERFORM 2
-#define APIC_LVT_LINT0   3
-#define APIC_LVT_LINT1   4
-#define APIC_LVT_ERROR   5
-#define APIC_LVT_NB  6
-
-/* APIC delivery modes */
-#define APIC_DM_FIXED  0
-#define APIC_DM_LOWPRI 1
-#define APIC_DM_SMI2
-#define APIC_DM_NMI4
-#define APIC_DM_INIT   5
-#define APIC_DM_SIPI   6
-#define APIC_DM_EXTINT 7
-
-/* APIC destination mode */
-#define APIC_DESTMODE_FLAT 0xf
-#define APIC_DESTMODE_CLUSTER  1
-
-#define APIC_TRIGGER_EDGE  0
-#define APIC_TRIGGER_LEVEL 1
-
-#defineAPIC_LVT_TIMER_PERIODIC (1<<17)
-#defineAPIC_LVT_MASKED (1<<16)
-#defineAPIC_LVT_LEVEL_TRIGGER  (1<<15)
-#defineAPIC_LVT_REMOTE_IRR (1<<14)
-#defineAPIC_INPUT_POLARITY (1<<13)
-#defineAPIC_SEND_PENDING   (1<<12)
-
-#define ESR_ILLEGAL_ADDRESS (1 << 7)
-
-#define APIC_SV_DIRECTED_IO (1<<12)
-#define APIC_SV_ENABLE  (1<<8)
-
-#define MAX_APICS 255
 #define MAX_APIC_WORDS 8
 
 /* Intel APIC constants: from include/asm/msidef.h */
@@ -75,40 +35,7 @@
 #define MSI_ADDR_DEST_ID_SHIFT 12
 #defineMSI_ADDR_DEST_ID_MASK   0x000
 
-#define MSI_ADDR_SIZE   0x10
-
-typedef struct APICState APICState;
-
-struct APICState {
-SysBusDevice busdev;
-MemoryRegion io_memory;
-void *cpu_env;
-uint32_t apicbase;
-uint8_t id;
-uint8_t arb_id;
-uint8_t tpr;
-uint32_t spurious_vec;
-uint8_t log_dest;
-uint8_t dest_mode;
-uint32_t isr[8];  /* in service register */
-uint32_t tmr[8];  /* trigger mode register */
-uint32_t irr[8]; /* interrupt request register */
-uint32_t lvt[APIC_LVT_NB];
-uint32_t esr; /* error register */
-uint32_t icr[2];
-
-uint32_t divide_conf;
-int count_shift;
-uint32_t initial_count;
-int64_t initial_count_load_time, next_time;
-uint32_t idx;
-QEMUTimer *timer;
-int sipi_vector;
-int wait_for_sipi;
-};
-
 static APICState *local_apics[MAX_APICS + 1];
-static int apic_irq_delivered;
 
 static void apic_set_irq(APICState *s, int vector_num, int trigger_mode);
 static void apic_update_irq(APICState *s);
@@ -205,10 +132,8 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
 }
 }
 
-void apic_deliver_nmi(DeviceState *d)
+static void apic_external_nmi(APICState *s)
 {
-APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
-
 apic_local_deliver(s, APIC_LVT_LINT1);
 }
 
@@ -300,14 +225,8 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, 
uint8_t delivery_mode,
 apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode);
 }
 
-void cpu_set_apic_base(DeviceState *d, uint64_t val)
+static void apic_set_base(APICState *s, uint64_t val)
 {
-APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
-
-trace_cpu_set_apic_base(val);
-
-if (!s)
-return;
 s->apicbase = (val & 0xf000) |
 (s->apicbase & (MSR_IA32_APICBASE_BSP | MSR_IA32_APICBASE_ENABLE));
 /* if disabled, cann

[Qemu-devel] [PATCH v4 06/15] apic: Open-code timer save/restore

2011-12-08 Thread Jan Kiszka

To enable migration between accelerated and non-accelerated APIC models,
we will need to handle the timer saving and restoring specially and can
no longer rely on the automatics of VMSTATE_TIMER. Specifically,
accelerated model will not start any QEMUTimer.

This patch therefore factors out the generic bits into apic_next_timer
and introduces a post-load callback that can be implemented differently
by both models.

Signed-off-by: Jan Kiszka 
---
 hw/apic.c  |   30 --
 hw/apic_common.c   |   51 +--
 hw/apic_internal.h |3 +++
 3 files changed, 64 insertions(+), 20 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index f25be80..ed6411d 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -521,25 +521,9 @@ static uint32_t apic_get_current_count(APICState *s)
 
 static void apic_timer_update(APICState *s, int64_t current_time)
 {
-int64_t next_time, d;
-
-if (!(s->lvt[APIC_LVT_TIMER] & APIC_LVT_MASKED)) {
-d = (current_time - s->initial_count_load_time) >>
-s->count_shift;
-if (s->lvt[APIC_LVT_TIMER] & APIC_LVT_TIMER_PERIODIC) {
-if (!s->initial_count)
-goto no_timer;
-d = ((d / ((uint64_t)s->initial_count + 1)) + 1) * 
((uint64_t)s->initial_count + 1);
-} else {
-if (d >= s->initial_count)
-goto no_timer;
-d = (uint64_t)s->initial_count + 1;
-}
-next_time = s->initial_count_load_time + (d << s->count_shift);
-qemu_mod_timer(s->timer, next_time);
-s->next_time = next_time;
+if (apic_next_timer(s, current_time)) {
+qemu_mod_timer(s->timer, s->next_time);
 } else {
-no_timer:
 qemu_del_timer(s->timer);
 }
 }
@@ -770,12 +754,22 @@ static void apic_backend_init(APICState *s)
 local_apics[s->idx] = s;
 }
 
+static void apic_post_load(APICState *s)
+{
+if (s->timer_expiry != -1) {
+qemu_mod_timer(s->timer, s->timer_expiry);
+} else {
+qemu_del_timer(s->timer);
+}
+}
+
 static APICBackend apic_backend = {
 .name = "QEMU",
 .init = apic_backend_init,
 .set_base = apic_set_base,
 .set_tpr = apic_set_tpr,
 .external_nmi = apic_external_nmi,
+.post_load = apic_post_load,
 };
 
 static void apic_register_devices(void)
diff --git a/hw/apic_common.c b/hw/apic_common.c
index 73241e4..f38ffc1 100644
--- a/hw/apic_common.c
+++ b/hw/apic_common.c
@@ -89,6 +89,39 @@ void apic_deliver_nmi(DeviceState *d)
 s->backend->external_nmi(s);
 }
 
+bool apic_next_timer(APICState *s, int64_t current_time)
+{
+int64_t d;
+
+/* We need to store the timer state separately to support APIC
+ * implementations that maintain a non-QEMU timer, e.g. inside the
+ * host kernel. This open-coded state allows us to migrate between
+ * both models. */
+s->timer_expiry = -1;
+
+if (s->lvt[APIC_LVT_TIMER] & APIC_LVT_MASKED) {
+return false;
+}
+
+d = (current_time - s->initial_count_load_time) >> s->count_shift;
+
+if (s->lvt[APIC_LVT_TIMER] & APIC_LVT_TIMER_PERIODIC) {
+if (!s->initial_count) {
+return false;
+}
+d = ((d / ((uint64_t)s->initial_count + 1)) + 1) *
+((uint64_t)s->initial_count + 1);
+} else {
+if (d >= s->initial_count) {
+return false;
+}
+d = (uint64_t)s->initial_count + 1;
+}
+s->next_time = s->initial_count_load_time + (d << s->count_shift);
+s->timer_expiry = s->next_time;
+return true;
+}
+
 void apic_init_reset(DeviceState *d)
 {
 APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
@@ -116,7 +149,10 @@ void apic_init_reset(DeviceState *d)
 s->next_time = 0;
 s->wait_for_sipi = 1;
 
-qemu_del_timer(s->timer);
+if (s->timer) {
+qemu_del_timer(s->timer);
+}
+s->timer_expiry = -1;
 }
 
 static void apic_reset(DeviceState *d)
@@ -181,12 +217,23 @@ static int apic_load_old(QEMUFile *f, void *opaque, int 
version_id)
 return 0;
 }
 
+static int apic_dispatch_post_load(void *opaque, int version_id)
+{
+APICState *s = opaque;
+
+if (s->backend->post_load) {
+s->backend->post_load(s);
+}
+return 0;
+}
+
 static const VMStateDescription vmstate_apic = {
 .name = "apic",
 .version_id = 3,
 .minimum_version_id = 3,
 .minimum_version_id_old = 1,
 .load_state_old = apic_load_old,
+.post_load = apic_dispatch_post_load,
 .fields  = (VMStateField[]) {
 VMSTATE_UINT32(apicbase, APICState),
 VMSTATE_UINT8(id, APICState),
@@ -206,7 +253,7 @@ static const VMStateDescription vmstate_apic = {
 VMSTATE_UINT32(initial_count, APICState),
 VMSTATE_INT64(initial_count_load_time, APICState),
 VMSTATE_INT64(next_time, APICState),
-VMSTATE_TIMER(timer, APICState),
+VMSTATE_INT64(timer_expiry, APICState), /* open-coded timer state */
 VMSTA

Re: [Qemu-devel] [PATCH v4 2/3] Extract code to nbd_setup function to be used for many purposes

2011-12-08 Thread Stefan Hajnoczi

On Wed, Dec 7, 2011 at 4:23 AM, Chunyan Liu  wrote:

Overall looks good, some suggestions:

> @@ -53,7 +53,7 @@ static void usage(const char *name)
>  "  -o, --offset=OFFSET  offset into the image\n"
>  "  -b, --bind=IFACE interface to bind to (default `0.0.0.0')\n"
>  "  -k, --socket=PATH    path to the unix socket\n"
> -"   (default '"SOCKET_PATH"')\n"
> +"   (default /var/lock/qemu-nbd-%s)\n"
>  "  -r, --read-only  export read-only\n"
>  "  -P, --partition=NUM  only expose partition NUM\n"
>  "  -s, --snapshot   use snapshot file\n"
> @@ -67,7 +67,7 @@ static void usage(const char *name)
>  "  -V, --version    output version information and exit\n"
>  "\n"
>  "Report bugs to \n"
> -    , name, NBD_DEFAULT_PORT, "DEVICE");
> +    , name, NBD_DEFAULT_PORT, "PID");
>  }

Since we're not using the SOCKET_PATH format string anymore it's nicer
to drop this format string argument and just change to "(default
/var/lock/qemu-nbd-PID" above.

> +    fd = open(device, O_RDWR);
> +    if (fd == -1) {
> +    fprintf(stderr, "Failed to open %s", device);

Missing \n in string.

> +    goto out;
> +    }
> +
> +    ret = nbd_init(fd, sock, nbdflags, size, blocksize);
> +    if (ret == -1) {
> +    goto out;
> +    }
> +    } else {
> +    int i = 0;
> +    int max_nbd = 16;
> +    for (i = 0; i < max_nbd; i++) {
> +    if (!asprintf(&device, "/dev/nbd%d", i)) {

asprintf() is GNU and BSD only (not C or POSIX).  I suggest using
g_strdup_printf() instead which is guaranteed to be available because
QEMU builds against glib:

http://developer.gnome.org/glib/2.28/glib-String-Utility-Functions.html#g-strdup-printf

> +    continue;
> +    }
>
> +
> +    fd = open(device, O_RDWR);
> +    if (fd == -1 || nbd_init(fd, sock, nbdflags, size, blocksize)
> == -1) {

nbd_init() does not close(fd) on failure.  Please separate the open()
and nbd_init() error cases and close the fd.

Stefan

[Qemu-devel] [PATCH v4 02/15] kvm: Move kvmclock into hw/kvm folder

2011-12-08 Thread Jan Kiszka

More KVM-specific devices will come, so let's start with moving the
kvmclock into a dedicated folder.

Signed-off-by: Jan Kiszka 
---
 Makefile.target|4 ++--
 configure  |1 +
 hw/{kvmclock.c => kvm/clock.c} |4 ++--
 hw/{kvmclock.h => kvm/clock.h} |0
 hw/pc_piix.c   |2 +-
 5 files changed, 6 insertions(+), 5 deletions(-)
 rename hw/{kvmclock.c => kvm/clock.c} (98%)
 rename hw/{kvmclock.h => kvm/clock.h} (100%)

diff --git a/Makefile.target b/Makefile.target
index a111521..1d24a30 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -236,7 +236,7 @@ obj-i386-y += vmport.o
 obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
 obj-i386-y += pc_piix.o
-obj-i386-$(CONFIG_KVM) += kvmclock.o
+obj-i386-$(CONFIG_KVM) += kvm/clock.o
 obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 
 # shared objects
@@ -428,7 +428,7 @@ qmp-commands-old.h: $(SRC_PATH)/qmp-commands.hx
 
 clean:
rm -f *.o *.a *~ $(PROGS) nwfpe/*.o fpu/*.o
-   rm -f *.d */*.d tcg/*.o ide/*.o 9pfs/*.o
+   rm -f *.d */*.d tcg/*.o ide/*.o 9pfs/*.o kvm/*.o
rm -f hmp-commands.h qmp-commands-old.h gdbstub-xml.c
 ifdef CONFIG_TRACE_SYSTEMTAP
rm -f *.stp
diff --git a/configure b/configure
index ac4840d..12cd9d1 100755
--- a/configure
+++ b/configure
@@ -3338,6 +3338,7 @@ mkdir -p $target_dir/fpu
 mkdir -p $target_dir/tcg
 mkdir -p $target_dir/ide
 mkdir -p $target_dir/9pfs
+mkdir -p $target_dir/kvm
 if test "$target" = "arm-linux-user" -o "$target" = "armeb-linux-user" -o 
"$target" = "arm-bsd-user" -o "$target" = "armeb-bsd-user" ; then
   mkdir -p $target_dir/nwfpe
 fi
diff --git a/hw/kvmclock.c b/hw/kvm/clock.c
similarity index 98%
rename from hw/kvmclock.c
rename to hw/kvm/clock.c
index 5388bc4..5983271 100644
--- a/hw/kvmclock.c
+++ b/hw/kvm/clock.c
@@ -13,9 +13,9 @@
 
 #include "qemu-common.h"
 #include "sysemu.h"
-#include "sysbus.h"
 #include "kvm.h"
-#include "kvmclock.h"
+#include "hw/sysbus.h"
+#include "hw/kvm/clock.h"
 
 #include 
 #include 
diff --git a/hw/kvmclock.h b/hw/kvm/clock.h
similarity index 100%
rename from hw/kvmclock.h
rename to hw/kvm/clock.h
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 970f43c..530fe9c 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -34,7 +34,7 @@
 #include "boards.h"
 #include "ide.h"
 #include "kvm.h"
-#include "kvmclock.h"
+#include "kvm/clock.h"
 #include "sysemu.h"
 #include "sysbus.h"
 #include "arch_init.h"
-- 
1.7.3.4

[Qemu-devel] (no subject)

2011-12-08 Thread wong

...I suppose you won�t find the place more interesting than this site!
 http://www.spassvoegel-woellstein.de/page.december.php?uvpage=16t5

[Qemu-devel] [PATCH v4 03/15] apic: Stop timer on reset

2011-12-08 Thread Jan Kiszka

All LVTs are masked on reset, so the timer becomes ineffective. Letting
it tick nevertheless is harmless, but will at least create a spurious
trace event.

Signed-off-by: Jan Kiszka 
---
 hw/apic.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 9d0f460..4b97b17 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -528,6 +528,8 @@ void apic_init_reset(DeviceState *d)
 s->initial_count_load_time = 0;
 s->next_time = 0;
 s->wait_for_sipi = 1;
+
+qemu_del_timer(s->timer);
 }
 
 static void apic_startup(APICState *s, int vector_num)
-- 
1.7.3.4

Re: [Qemu-devel] Error while booting

2011-12-08 Thread Kevin Wolf

Am 07.12.2011 17:26, schrieb sparsh mittal:
> I use version 0.15 of qemu with marss cycle-accurate simulator.
> 
> After making checkpoints in an image, I get this error, while trying to
> boot it (i.e. not using loadvm to run checkpoint, but just starting the
> image).
> 
> This kernel requires an x86-64 CPU, but only detected an i686 CPU.
> Unable to boot - please use a kernel appropriate for your CPU

Try using 'qemu-system-x86_64' instead of 'qemu'.

Kevin

Re: [Qemu-devel] [PATCH 1/2] net: expand tabs in net/socket.c

2011-12-08 Thread Zhi Yong Wu

On Wed, Dec 7, 2011 at 11:01 PM, Stefan Hajnoczi
 wrote:
> In order to make later patches sane, expand the tab characters and
> conform to QEMU coding style now.
>
> Signed-off-by: Stefan Hajnoczi 
> ---
>  net/socket.c |   79 
> ++
>  1 files changed, 41 insertions(+), 38 deletions(-)
>
> diff --git a/net/socket.c b/net/socket.c
> index e9ef128..613a7ef 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -161,10 +161,11 @@ static int net_socket_mcast_create(struct sockaddr_in 
> *mcastaddr, struct in_addr
>  #endif
>
>     if (!IN_MULTICAST(ntohl(mcastaddr->sin_addr.s_addr))) {
> -       fprintf(stderr, "qemu: error: specified mcastaddr \"%s\" (0x%08x) 
> does not contain a multicast address\n",
> -               inet_ntoa(mcastaddr->sin_addr),
> +        fprintf(stderr, "qemu: error: specified mcastaddr \"%s\" (0x%08x) "
> +                "does not contain a multicast address\n",
> +                inet_ntoa(mcastaddr->sin_addr),
>                 (int)ntohl(mcastaddr->sin_addr.s_addr));
> -       return -1;
> +        return -1;
>
>     }
>     fd = qemu_socket(PF_INET, SOCK_DGRAM, 0);
> @@ -177,8 +178,8 @@ static int net_socket_mcast_create(struct sockaddr_in 
> *mcastaddr, struct in_addr
>     ret=setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
>                    (const char *)&val, sizeof(val));
>     if (ret < 0) {
> -       perror("setsockopt(SOL_SOCKET, SO_REUSEADDR)");
> -       goto fail;
> +        perror("setsockopt(SOL_SOCKET, SO_REUSEADDR)");
> +        goto fail;
>     }
>
>     ret = bind(fd, (struct sockaddr *)mcastaddr, sizeof(*mcastaddr));
> @@ -198,8 +199,8 @@ static int net_socket_mcast_create(struct sockaddr_in 
> *mcastaddr, struct in_addr
>     ret = setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
>                      (const char *)&imr, sizeof(struct ip_mreq));
>     if (ret < 0) {
> -       perror("setsockopt(IP_ADD_MEMBERSHIP)");
> -       goto fail;
> +        perror("setsockopt(IP_ADD_MEMBERSHIP)");
> +        goto fail;
>     }
>
>     /* Force mcast msgs to loopback (eg. several QEMUs in same host */
> @@ -207,8 +208,8 @@ static int net_socket_mcast_create(struct sockaddr_in 
> *mcastaddr, struct in_addr
>     ret=setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP,
>                    (const char *)&loop, sizeof(loop));
>     if (ret < 0) {
> -       perror("setsockopt(SOL_IP, IP_MULTICAST_LOOP)");
> -       goto fail;
> +        perror("setsockopt(SOL_IP, IP_MULTICAST_LOOP)");
> +        goto fail;
>     }
>
>     /* If a bind address is given, only send packets from that address */
> @@ -260,37 +261,38 @@ static NetSocketState 
> *net_socket_fd_init_dgram(VLANState *vlan,
>      */
>
>     if (is_connected) {
> -       if (getsockname(fd, (struct sockaddr *) &saddr, &saddr_len) == 0) {
> -           /* must be bound */
> -           if (saddr.sin_addr.s_addr==0) {
> -               fprintf(stderr, "qemu: error: init_dgram: fd=%d unbound, 
> cannot setup multicast dst addr\n",
> -                       fd);
> -               return NULL;
> -           }
> -           /* clone dgram socket */
> -           newfd = net_socket_mcast_create(&saddr, NULL);
> -           if (newfd < 0) {
> -               /* error already reported by net_socket_mcast_create() */
> -               close(fd);
> -               return NULL;
> -           }
> -           /* clone newfd to fd, close newfd */
> -           dup2(newfd, fd);
> -           close(newfd);
> -
> -       } else {
> -           fprintf(stderr, "qemu: error: init_dgram: fd=%d failed 
> getsockname(): %s\n",
> -                   fd, strerror(errno));
> -           return NULL;
> -       }
> +        if (getsockname(fd, (struct sockaddr *) &saddr, &saddr_len) == 0) {
> +            /* must be bound */
> +            if (saddr.sin_addr.s_addr == 0) {
> +                fprintf(stderr, "qemu: error: init_dgram: fd=%d unbound, "
> +                        "cannot setup multicast dst addr\n", fd);
> +                return NULL;
> +            }
> +            /* clone dgram socket */
> +            newfd = net_socket_mcast_create(&saddr, NULL);
> +            if (newfd < 0) {
> +                /* error already reported by net_socket_mcast_create() */
> +                close(fd);
> +                return NULL;
> +            }
> +            /* clone newfd to fd, close newfd */
> +            dup2(newfd, fd);
> +            close(newfd);
> +
The above white line should be removed.
> +        } else {
> +            fprintf(stderr,
> +                    "qemu: error: init_dgram: fd=%d failed getsockname(): 
> %s\n",
> +                    fd, strerror(errno));
> +            return NULL;
> +        }
>     }
>
>     nc = qemu_new_net_client(&net_dgram_socket_info, vlan, NULL, model, name);
>
>     snprintf(nc->info_str, sizeof(nc->info_str),
> -           "socket: fd=%d (%s mcast=%s:%d)",
> -           fd, is_connected ? "cloned" : "",
> -           inet_ntoa(saddr.sin_addr), ntohs

[Qemu-devel] [PATCH v4 10/15] kvm: Introduce core services for in-kernel irqchip support

2011-12-08 Thread Jan Kiszka

Add the basic infrastructure to active in-kernel irqchip support, inject
interrupts into these models, and maintain IRQ routes.

Routing is optional and depends on the host arch supporting
KVM_CAP_IRQ_ROUTING. When it's not available on x86, we looe the HPET as
we can't route GSI0 to IOAPIC pin 2.

In-kernel irqchip support will once be controlled by the machine
property 'kernel_irqchip', but this is not yet wired up.

Signed-off-by: Jan Kiszka 
---
 kvm-all.c |  149 +
 kvm.h |8 +++
 target-i386/kvm.c |   11 
 3 files changed, 168 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 4c466d6..8958abd 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -77,6 +77,13 @@ struct KVMState
 int pit_in_kernel;
 int xsave, xcrs;
 int many_ioeventfds;
+int irqchip_inject_ioctl;
+#ifdef KVM_CAP_IRQ_ROUTING
+struct kvm_irq_routing *irq_routes;
+int nr_allocated_irq_routes;
+uint32_t *used_gsi_bitmap;
+unsigned int max_gsi;
+#endif
 };
 
 KVMState *kvm_state;
@@ -693,6 +700,138 @@ static void kvm_handle_interrupt(CPUState *env, int mask)
 }
 }
 
+int kvm_irqchip_set_irq(KVMState *s, int irq, int level)
+{
+struct kvm_irq_level event;
+int ret;
+
+assert(s->irqchip_in_kernel);
+
+event.level = level;
+event.irq = irq;
+ret = kvm_vm_ioctl(s, s->irqchip_inject_ioctl, &event);
+if (ret < 0) {
+perror("kvm_set_irqchip_line");
+abort();
+}
+
+return (s->irqchip_inject_ioctl == KVM_IRQ_LINE) ? 1 : event.status;
+}
+
+#ifdef KVM_CAP_IRQ_ROUTING
+static void set_gsi(KVMState *s, unsigned int gsi)
+{
+assert(gsi < s->max_gsi);
+
+s->used_gsi_bitmap[gsi / 32] |= 1U << (gsi % 32);
+}
+
+static void kvm_init_irq_routing(KVMState *s)
+{
+int gsi_count;
+
+gsi_count = kvm_check_extension(s, KVM_CAP_IRQ_ROUTING);
+if (gsi_count > 0) {
+unsigned int gsi_bits, i;
+
+/* Round up so we can search ints using ffs */
+gsi_bits = (gsi_count + 31) / 32;
+s->used_gsi_bitmap = g_malloc0(gsi_bits / 8);
+s->max_gsi = gsi_bits;
+
+/* Mark any over-allocated bits as already in use */
+for (i = gsi_count; i < gsi_bits; i++) {
+set_gsi(s, i);
+}
+}
+
+s->irq_routes = g_malloc0(sizeof(*s->irq_routes));
+s->nr_allocated_irq_routes = 0;
+
+kvm_arch_init_irq_routing(s);
+}
+
+static void kvm_add_routing_entry(KVMState *s,
+  struct kvm_irq_routing_entry *entry)
+{
+struct kvm_irq_routing_entry *new;
+int n, size;
+
+if (s->irq_routes->nr == s->nr_allocated_irq_routes) {
+n = s->nr_allocated_irq_routes * 2;
+if (n < 64) {
+n = 64;
+}
+size = sizeof(struct kvm_irq_routing);
+size += n * sizeof(*new);
+s->irq_routes = g_realloc(s->irq_routes, size);
+s->nr_allocated_irq_routes = n;
+}
+n = s->irq_routes->nr++;
+new = &s->irq_routes->entries[n];
+memset(new, 0, sizeof(*new));
+new->gsi = entry->gsi;
+new->type = entry->type;
+new->flags = entry->flags;
+new->u = entry->u;
+
+set_gsi(s, entry->gsi);
+}
+
+void kvm_irqchip_add_route(KVMState *s, int irq, int irqchip, int pin)
+{
+struct kvm_irq_routing_entry e;
+
+e.gsi = irq;
+e.type = KVM_IRQ_ROUTING_IRQCHIP;
+e.flags = 0;
+e.u.irqchip.irqchip = irqchip;
+e.u.irqchip.pin = pin;
+kvm_add_routing_entry(s, &e);
+}
+
+int kvm_irqchip_commit_routes(KVMState *s)
+{
+s->irq_routes->flags = 0;
+return kvm_vm_ioctl(s, KVM_SET_GSI_ROUTING, s->irq_routes);
+}
+
+#else /* !KVM_CAP_IRQ_ROUTING */
+
+static void kvm_init_irq_routing(KVMState *s)
+{
+}
+#endif /* !KVM_CAP_IRQ_ROUTING */
+
+static int kvm_irqchip_create(KVMState *s)
+{
+QemuOptsList *list = qemu_find_opts("machine");
+int ret;
+
+if (QTAILQ_EMPTY(&list->head) ||
+!qemu_opt_get_bool(QTAILQ_FIRST(&list->head),
+   "kernel_irqchip", false) ||
+!kvm_check_extension(s, KVM_CAP_IRQCHIP)) {
+return 0;
+}
+
+ret = kvm_vm_ioctl(s, KVM_CREATE_IRQCHIP);
+if (ret < 0) {
+fprintf(stderr, "Create kernel irqchip failed\n");
+return ret;
+}
+
+s->irqchip_inject_ioctl = KVM_IRQ_LINE;
+if (kvm_check_extension(s, KVM_CAP_IRQ_INJECT_STATUS)) {
+s->irqchip_inject_ioctl = KVM_IRQ_LINE_STATUS;
+}
+s->irqchip_in_kernel = 1;
+
+kvm_init_irq_routing(s);
+
+return 0;
+}
+
 int kvm_init(void)
 {
 static const char upgrade_note[] =
@@ -788,6 +927,11 @@ int kvm_init(void)
 goto err;
 }
 
+ret = kvm_irqchip_create(s);
+if (ret < 0) {
+goto err;
+}
+
 kvm_state = s;
 cpu_register_phys_memory_client(&kvm_cpu_phys_memory_client);
 
@@ -1122,6 +1266,11 @@ int kvm_has_many_ioeventfds(void)
 return kvm_state->many_ioeventfds;
 }
 
+int kvm_has_gsi_routing(void)
+

[Qemu-devel] [PATCH v4 11/15] kvm: x86: Establish IRQ0 override control

2011-12-08 Thread Jan Kiszka

KVM is forced to disable the IRQ0 override when we run with in-kernel
irqchip but without IRQ routing support of the kernel. Set the fwcfg
value correspondingly. This aligns us with qemu-kvm.

Signed-off-by: Jan Kiszka 
---
 hw/pc.c|3 ++-
 kvm-all.c  |5 +
 kvm-stub.c |5 +
 kvm.h  |2 ++
 sysemu.h   |1 -
 vl.c   |1 -
 6 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index ee6e59b..066edc4 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -39,6 +39,7 @@
 #include "msi.h"
 #include "sysbus.h"
 #include "sysemu.h"
+#include "kvm.h"
 #include "blockdev.h"
 #include "ui/qemu-spice.h"
 #include "memory.h"
@@ -609,7 +610,7 @@ static void *bochs_bios_init(void)
 fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
 fw_cfg_add_bytes(fw_cfg, FW_CFG_ACPI_TABLES, (uint8_t *)acpi_tables,
  acpi_tables_len);
-fw_cfg_add_bytes(fw_cfg, FW_CFG_IRQ0_OVERRIDE, &irq0override, 1);
+fw_cfg_add_i32(fw_cfg, FW_CFG_IRQ0_OVERRIDE, kvm_allows_irq0_override());
 
 smbios_table = smbios_get_table(&smbios_len);
 if (smbios_table)
diff --git a/kvm-all.c b/kvm-all.c
index 8958abd..7387dd3 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1271,6 +1271,11 @@ int kvm_has_gsi_routing(void)
 return kvm_check_extension(kvm_state, KVM_CAP_IRQ_ROUTING);
 }
 
+int kvm_allows_irq0_override(void)
+{
+return !kvm_enabled() || !kvm_irqchip_in_kernel() || kvm_has_gsi_routing();
+}
+
 void kvm_setup_guest_memory(void *start, size_t size)
 {
 if (!kvm_has_sync_mmu()) {
diff --git a/kvm-stub.c b/kvm-stub.c
index 06064b9..6c2b06b 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -78,6 +78,11 @@ int kvm_has_many_ioeventfds(void)
 return 0;
 }
 
+int kvm_allows_irq0_override(void)
+{
+return 1;
+}
+
 void kvm_setup_guest_memory(void *start, size_t size)
 {
 }
diff --git a/kvm.h b/kvm.h
index 0d6c453..a3c87af 100644
--- a/kvm.h
+++ b/kvm.h
@@ -53,6 +53,8 @@ int kvm_has_xcrs(void);
 int kvm_has_many_ioeventfds(void);
 int kvm_has_gsi_routing(void);
 
+int kvm_allows_irq0_override(void);
+
 #ifdef NEED_CPU_H
 int kvm_init_vcpu(CPUState *env);
 
diff --git a/sysemu.h b/sysemu.h
index 22cd720..3bd896e 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -102,7 +102,6 @@ extern int vga_interface_type;
 extern int graphic_width;
 extern int graphic_height;
 extern int graphic_depth;
-extern uint8_t irq0override;
 extern DisplayType display_type;
 extern const char *keyboard_layout;
 extern int win2k_install_hack;
diff --git a/vl.c b/vl.c
index de5ecef..f9a8caf 100644
--- a/vl.c
+++ b/vl.c
@@ -218,7 +218,6 @@ int no_reboot = 0;
 int no_shutdown = 0;
 int cursor_hide = 1;
 int graphic_rotate = 0;
-uint8_t irq0override = 1;
 const char *watchdog;
 QEMUOptionRom option_rom[MAX_OPTION_ROMS];
 int nb_option_roms;
-- 
1.7.3.4

[Qemu-devel] [PATCH v4 04/15] apic: Inject external NMI events via LINT1

2011-12-08 Thread Jan Kiszka

On real hardware, NMI button events are injected via the LINT1 line of
the APICs. E.g. kdump expect this wiring and gets upset if the per-APIC
LINT1 mask is not respected, i.e. if NMIs are injected to VCPUs that
should not receive them. Change the APIC emulation code to reflect this.

Based on qemu-kvm patch by Lai Jiangshan.

CC: Lai Jiangshan 
Reported-by: Kenji Kaneshige 
Signed-off-by: Jan Kiszka 
---
 hw/apic.c |7 +++
 hw/apic.h |1 +
 monitor.c |6 +-
 3 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 4b97b17..b9d733c 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -205,6 +205,13 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
 }
 }
 
+void apic_deliver_nmi(DeviceState *d)
+{
+APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
+
+apic_local_deliver(s, APIC_LVT_LINT1);
+}
+
 #define foreach_apic(apic, deliver_bitmask, code) \
 {\
 int __i, __j, __mask;\
diff --git a/hw/apic.h b/hw/apic.h
index a5c910f..a62d83b 100644
--- a/hw/apic.h
+++ b/hw/apic.h
@@ -8,6 +8,7 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t 
delivery_mode,
   uint8_t vector_num, uint8_t trigger_mode);
 int apic_accept_pic_intr(DeviceState *s);
 void apic_deliver_pic_intr(DeviceState *s, int level);
+void apic_deliver_nmi(DeviceState *d);
 int apic_get_interrupt(DeviceState *s);
 void apic_reset_irq_delivered(void);
 int apic_get_irq_delivered(void);
diff --git a/monitor.c b/monitor.c
index 1be222e..6bd0fb1 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2354,7 +2354,11 @@ static int do_inject_nmi(Monitor *mon, const QDict 
*qdict, QObject **ret_data)
 CPUState *env;
 
 for (env = first_cpu; env != NULL; env = env->next_cpu) {
-cpu_interrupt(env, CPU_INTERRUPT_NMI);
+if (!env->apic_state) {
+cpu_interrupt(env, CPU_INTERRUPT_NMI);
+} else {
+apic_deliver_nmi(env->apic_state);
+}
 }
 
 return 0;
-- 
1.7.3.4

Re: [Qemu-devel] [PATCH 2/2] net: take ownership of fd in socket init functions

2011-12-08 Thread Zhi Yong Wu

On Wed, Dec 7, 2011 at 11:01 PM, Stefan Hajnoczi
 wrote:
> Today net/socket.c has no consistent policy for closing the socket file
> descriptor when initialization fails.  This means we leak the file
> descriptor in some cases or we could also try to close it twice.
>
> Make error paths consistent by taking ownership of the file descriptor
> and closing it on error.
>
> Signed-off-by: Stefan Hajnoczi 
> ---
>  net/socket.c |   17 +
>  1 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/net/socket.c b/net/socket.c
> index 613a7ef..f999c26 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -266,14 +266,13 @@ static NetSocketState 
> *net_socket_fd_init_dgram(VLANState *vlan,
>             if (saddr.sin_addr.s_addr == 0) {
>                 fprintf(stderr, "qemu: error: init_dgram: fd=%d unbound, "
>                         "cannot setup multicast dst addr\n", fd);
> -                return NULL;
> +                goto err;
>             }
>             /* clone dgram socket */
>             newfd = net_socket_mcast_create(&saddr, NULL);
>             if (newfd < 0) {
>                 /* error already reported by net_socket_mcast_create() */
> -                close(fd);
> -                return NULL;
> +                goto err;
>             }
>             /* clone newfd to fd, close newfd */
>             dup2(newfd, fd);
> @@ -283,7 +282,7 @@ static NetSocketState *net_socket_fd_init_dgram(VLANState 
> *vlan,
>             fprintf(stderr,
>                     "qemu: error: init_dgram: fd=%d failed getsockname(): 
> %s\n",
>                     fd, strerror(errno));
> -            return NULL;
> +            goto err;
>         }
>     }
>
> @@ -304,6 +303,10 @@ static NetSocketState 
> *net_socket_fd_init_dgram(VLANState *vlan,
>     if (is_connected) s->dgram_dst=saddr;
>
>     return s;
> +
> +err:
> +    closesocket(fd);
> +    return NULL;
>  }
>
>  static void net_socket_connect(void *opaque)
> @@ -353,6 +356,7 @@ static NetSocketState *net_socket_fd_init(VLANState *vlan,
>         (socklen_t *)&optlen)< 0) {
>         fprintf(stderr, "qemu: error: getsockopt(SO_TYPE) for fd=%d failed\n",
>                 fd);
> +        closesocket(fd);
>         return NULL;
>     }
>     switch(so_type) {
> @@ -386,9 +390,7 @@ static void net_socket_accept(void *opaque)
>         }
>     }
>     s1 = net_socket_fd_init(s->vlan, s->model, s->name, fd, 1);
> -    if (!s1) {
> -        closesocket(fd);
> -    } else {
Why is it not handled when s1 is NULL?
> +    if (s1) {
>         snprintf(s1->nc.info_str, sizeof(s1->nc.info_str),
>                  "socket: connection from %s:%d",
>                  inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
> @@ -549,7 +551,6 @@ int net_init_socket(QemuOpts *opts,
>         }
>
>         if (!net_socket_fd_init(vlan, "socket", name, fd, 1)) {
> -            close(fd);
>             return -1;
>         }
>     } else if (qemu_opt_get(opts, "listen")) {
> --
> 1.7.7.3
>
>



-- 
Regards,

Zhi Yong Wu

Re: [Qemu-devel] RFC: raw device support for block device targets

2011-12-08 Thread Kevin Wolf

Am 06.12.2011 17:20, schrieb Alex Bligh:
> qemu-img convert appears to support block devices as input, but not
> as output. That is irritating, as when using qemu-img convert to
> convert qcow to raw on a block partition, an intermediate file has
> to be used, which slows things down and pointlessly uses disk space.
> 
> The problem is that ftruncate() is being called on the output file
> in order ensure it is sufficiently large, and this fails on
> block devices.
> 
> I appreciate there may be other calls that fail depending on the
> input file format, but these will presumably be error checked
> at the time.
> 
> Is it therefore worth skipping the ftruncate() if the block device
> is large enough, and at least attempting to proceed further? Something
> like the following (not-even compile tested) patch?

Creating an image on a block device shouldn't even call raw_create(),
but only hdev_create(), which doesn't try to truncate the device, but
just uses lseek to make sure that it's large enough.

Which qemu version are you using and what's your command line?

Kevin

[Qemu-devel] [PATCH v4 07/15] i8259: Introduce backend/frontend infrastructure for KVM reuse

2011-12-08 Thread Jan Kiszka

Analogously to the APIC, we will reuse some parts of the user space
i8259 model for KVM. Again, we create a PIC backend infrastructure and
provide hooks for init, reset, and vmload/save. This also introduces a
common helper to instantiate a single i8259 chip from the cascade-
creating i8259_init function.

Signed-off-by: Jan Kiszka 
---
 Makefile.objs   |2 +-
 hw/i8259.c  |  127 +-
 hw/i8259_common.c   |  173 +++
 hw/i8259_internal.h |   82 
 4 files changed, 271 insertions(+), 113 deletions(-)
 create mode 100644 hw/i8259_common.c
 create mode 100644 hw/i8259_internal.h

diff --git a/Makefile.objs b/Makefile.objs
index d7a6539..72d8ee7 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -221,7 +221,7 @@ hw-obj-$(CONFIG_APPLESMC) += applesmc.o
 hw-obj-$(CONFIG_SMARTCARD) += usb-ccid.o ccid-card-passthru.o
 hw-obj-$(CONFIG_SMARTCARD_NSS) += ccid-card-emulated.o
 hw-obj-$(CONFIG_USB_REDIR) += usb-redir.o
-hw-obj-$(CONFIG_I8259) += i8259.o
+hw-obj-$(CONFIG_I8259) += i8259_common.o i8259.o
 
 # PPC devices
 hw-obj-$(CONFIG_PREP_PCI) += prep_pci.o
diff --git a/hw/i8259.c b/hw/i8259.c
index ab519de..413802c 100644
--- a/hw/i8259.c
+++ b/hw/i8259.c
@@ -26,6 +26,7 @@
 #include "isa.h"
 #include "monitor.h"
 #include "qemu-timer.h"
+#include "i8259_internal.h"
 
 /* debug PIC */
 //#define DEBUG_PIC
@@ -40,33 +41,6 @@
 //#define DEBUG_IRQ_LATENCY
 //#define DEBUG_IRQ_COUNT
 
-struct PicState {
-ISADevice dev;
-uint8_t last_irr; /* edge detection */
-uint8_t irr; /* interrupt request register */
-uint8_t imr; /* interrupt mask register */
-uint8_t isr; /* interrupt service register */
-uint8_t priority_add; /* highest irq priority */
-uint8_t irq_base;
-uint8_t read_reg_select;
-uint8_t poll;
-uint8_t special_mask;
-uint8_t init_state;
-uint8_t auto_eoi;
-uint8_t rotate_on_auto_eoi;
-uint8_t special_fully_nested_mode;
-uint8_t init4; /* true if 4 byte init */
-uint8_t single_mode; /* true if slave pic is not initialized */
-uint8_t elcr; /* PIIX edge/trigger selection*/
-uint8_t elcr_mask;
-qemu_irq int_out[1];
-uint32_t master; /* reflects /SP input pin */
-uint32_t iobase;
-uint32_t elcr_addr;
-MemoryRegion base_io;
-MemoryRegion elcr_io;
-};
-
 #if defined(DEBUG_PIC) || defined(DEBUG_IRQ_COUNT)
 static int irq_level[16];
 #endif
@@ -248,29 +222,12 @@ int pic_read_irq(PicState *s)
 
 static void pic_init_reset(PicState *s)
 {
-s->last_irr = 0;
-s->irr = 0;
-s->imr = 0;
-s->isr = 0;
-s->priority_add = 0;
-s->irq_base = 0;
-s->read_reg_select = 0;
-s->poll = 0;
-s->special_mask = 0;
-s->init_state = 0;
-s->auto_eoi = 0;
-s->rotate_on_auto_eoi = 0;
-s->special_fully_nested_mode = 0;
-s->init4 = 0;
-s->single_mode = 0;
-/* Note: ELCR is not reset */
+pic_reset_internal(s);
 pic_update_irq(s);
 }
 
-static void pic_reset(DeviceState *dev)
+static void pic_reset(PicState *s)
 {
-PicState *s = container_of(dev, PicState, dev.qdev);
-
 pic_init_reset(s);
 s->elcr = 0;
 }
@@ -418,32 +375,6 @@ static uint64_t elcr_ioport_read(void *opaque, 
target_phys_addr_t addr,
 return s->elcr;
 }
 
-static const VMStateDescription vmstate_pic = {
-.name = "i8259",
-.version_id = 1,
-.minimum_version_id = 1,
-.minimum_version_id_old = 1,
-.fields = (VMStateField[]) {
-VMSTATE_UINT8(last_irr, PicState),
-VMSTATE_UINT8(irr, PicState),
-VMSTATE_UINT8(imr, PicState),
-VMSTATE_UINT8(isr, PicState),
-VMSTATE_UINT8(priority_add, PicState),
-VMSTATE_UINT8(irq_base, PicState),
-VMSTATE_UINT8(read_reg_select, PicState),
-VMSTATE_UINT8(poll, PicState),
-VMSTATE_UINT8(special_mask, PicState),
-VMSTATE_UINT8(init_state, PicState),
-VMSTATE_UINT8(auto_eoi, PicState),
-VMSTATE_UINT8(rotate_on_auto_eoi, PicState),
-VMSTATE_UINT8(special_fully_nested_mode, PicState),
-VMSTATE_UINT8(init4, PicState),
-VMSTATE_UINT8(single_mode, PicState),
-VMSTATE_UINT8(elcr, PicState),
-VMSTATE_END_OF_LIST()
-}
-};
-
 static const MemoryRegionOps pic_base_ioport_ops = {
 .read = pic_ioport_read,
 .write = pic_ioport_write,
@@ -462,24 +393,13 @@ static const MemoryRegionOps pic_elcr_ioport_ops = {
 },
 };
 
-static int pic_initfn(ISADevice *dev)
+static void pic_backend_init(PicState *s)
 {
-PicState *s = DO_UPCAST(PicState, dev, dev);
-
 memory_region_init_io(&s->base_io, &pic_base_ioport_ops, s, "pic", 2);
 memory_region_init_io(&s->elcr_io, &pic_elcr_ioport_ops, s, "elcr", 1);
 
-isa_register_ioport(NULL, &s->base_io, s->iobase);
-if (s->elcr_addr != -1) {
-isa_register_ioport(NULL, &s->elcr_io, s->elcr_addr);
-}
-
-qdev_init_gpio_out(&dev->qdev, s->int_out, ARRAY

[Qemu-devel] qemu 1.0 : booting from scsi device (extboot removal)

2011-12-08 Thread Alexandre DERUMIER

Hi,

I'm working on the proxmox 2.0 distribution (kvm distrib) and I'm trying to 
boot from scsi device with qemu 1.0.

since this patch

Remove extboot support -- Linux KVM
http://www.spinics.net/lists/kvm/msg61865.html


it does'nt work anymore.


I know I can use lsi rom with

-option-rom 8xx_64.rom  


But isn't it possible to use another GPL compliant alternative ? 

Best regards,

Alexandre Derumier




<>

[Qemu-devel] [PATCH v4 00/15] uq/master: Introduce basic irqchip support

2011-12-08 Thread Jan Kiszka

Changes in v4:
- rebased of current uq/master
- fixed stupid bugs that broke bisectability and user space irqchip mode
- integrated NMI-over-LINT1 injection logic

CC: Lai Jiangshan 

Jan Kiszka (15):
  msi: Generalize msix_supported to msi_supported
  kvm: Move kvmclock into hw/kvm folder
  apic: Stop timer on reset
  apic: Inject external NMI events via LINT1
  apic: Introduce backend/frontend infrastructure for KVM reuse
  apic: Open-code timer save/restore
  i8259: Introduce backend/frontend infrastructure for KVM reuse
  ioapic: Introduce backend/frontend infrastructure for KVM reuse
  memory: Introduce memory_region_init_reservation
  kvm: Introduce core services for in-kernel irqchip support
  kvm: x86: Establish IRQ0 override control
  kvm: x86: Add user space part for in-kernel APIC
  kvm: x86: Add user space part for in-kernel i8259
  kvm: x86: Add user space part for in-kernel IOAPIC
  kvm: Arm in-kernel irqchip support

 Makefile.objs  |2 +-
 Makefile.target|6 +-
 configure  |1 +
 hw/apic.c  |  309 ---
 hw/apic.h  |1 +
 hw/apic_common.c   |  312 
 hw/apic_internal.h |  122 
 hw/i8259.c |  127 ++--
 hw/i8259_common.c  |  173 ++
 hw/i8259_internal.h|   82 +++
 hw/ioapic.c|  130 ++---
 hw/ioapic_common.c |  138 ++
 hw/ioapic_internal.h   |  106 ++
 hw/kvm/apic.c  |  154 
 hw/{kvmclock.c => kvm/clock.c} |4 +-
 hw/{kvmclock.h => kvm/clock.h} |0
 hw/kvm/i8259.c |  126 
 hw/kvm/ioapic.c|  101 +
 hw/msi.c   |8 +
 hw/msi.h   |2 +
 hw/msix.c  |9 +-
 hw/msix.h  |2 -
 hw/pc.c|   19 ++-
 hw/pc.h|1 +
 hw/pc_piix.c   |   66 -
 kvm-all.c  |  154 
 kvm-stub.c |5 +
 kvm.h  |   13 ++
 memory.c   |   36 +
 memory.h   |   16 ++
 monitor.c  |6 +-
 qemu-config.c  |4 +
 qemu-options.hx|5 +-
 sysemu.h   |1 -
 target-i386/kvm.c  |   19 +++
 trace-events   |2 +-
 vl.c   |1 -
 37 files changed, 1724 insertions(+), 539 deletions(-)
 create mode 100644 hw/apic_common.c
 create mode 100644 hw/apic_internal.h
 create mode 100644 hw/i8259_common.c
 create mode 100644 hw/i8259_internal.h
 create mode 100644 hw/ioapic_common.c
 create mode 100644 hw/ioapic_internal.h
 create mode 100644 hw/kvm/apic.c
 rename hw/{kvmclock.c => kvm/clock.c} (98%)
 rename hw/{kvmclock.h => kvm/clock.h} (100%)
 create mode 100644 hw/kvm/i8259.c
 create mode 100644 hw/kvm/ioapic.c

-- 
1.7.3.4

[Qemu-devel] [PATCH v4 13/15] kvm: x86: Add user space part for in-kernel i8259

2011-12-08 Thread Jan Kiszka

Introduce the alternative i8259 backend that exploits KVM in-kernel
acceleration.

The PIIX3 initialization code is furthermore extended by KVM specific
IRQ route setup. GSI injection differs in KVM mode from the user space
model. As we can dispatch ISA-range IRQs to both IOAPIC and PIC inside
the kernel, we do not need to inject them separately. This is reflected
by a KVM-specific GSI handler.

Signed-off-by: Jan Kiszka 
---
 Makefile.target |2 +-
 hw/kvm/i8259.c  |  126 +++
 hw/pc.h |1 +
 hw/pc_piix.c|   50 --
 4 files changed, 174 insertions(+), 5 deletions(-)
 create mode 100644 hw/kvm/i8259.c

diff --git a/Makefile.target b/Makefile.target
index 76de485..fb10143 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -236,7 +236,7 @@ obj-i386-y += vmport.o
 obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
 obj-i386-y += pc_piix.o
-obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o
+obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o kvm/i8259.o
 obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 
 # shared objects
diff --git a/hw/kvm/i8259.c b/hw/kvm/i8259.c
new file mode 100644
index 000..98d7141
--- /dev/null
+++ b/hw/kvm/i8259.c
@@ -0,0 +1,126 @@
+/*
+ * KVM in-kernel PIC (i8259) support
+ *
+ * Copyright (c) 2011 Siemens AG
+ *
+ * Authors:
+ *  Jan Kiszka  
+ *
+ * This work is licensed under the terms of the GNU GPL version 2.
+ * See the COPYING file in the top-level directory.
+ */
+#include "hw/i8259_internal.h"
+#include "hw/apic_internal.h"
+#include "kvm.h"
+
+static void kvm_pic_get(PicState *s)
+{
+struct kvm_irqchip chip;
+struct kvm_pic_state *kpic;
+int ret;
+
+chip.chip_id = s->master ? KVM_IRQCHIP_PIC_MASTER : KVM_IRQCHIP_PIC_SLAVE;
+ret = kvm_vm_ioctl(kvm_state, KVM_GET_IRQCHIP, &chip);
+if (ret < 0) {
+fprintf(stderr, "KVM_GET_IRQCHIP failed: %s\n", strerror(ret));
+abort();
+}
+
+kpic = &chip.chip.pic;
+
+s->last_irr = kpic->last_irr;
+s->irr = kpic->irr;
+s->imr = kpic->imr;
+s->isr = kpic->isr;
+s->priority_add = kpic->priority_add;
+s->irq_base = kpic->irq_base;
+s->read_reg_select = kpic->read_reg_select;
+s->poll = kpic->poll;
+s->special_mask = kpic->special_mask;
+s->init_state = kpic->init_state;
+s->auto_eoi = kpic->auto_eoi;
+s->rotate_on_auto_eoi = kpic->rotate_on_auto_eoi;
+s->special_fully_nested_mode = kpic->special_fully_nested_mode;
+s->init4 = kpic->init4;
+s->elcr = kpic->elcr;
+s->elcr_mask = kpic->elcr_mask;
+}
+
+static void kvm_pic_put(PicState *s)
+{
+struct kvm_irqchip chip;
+struct kvm_pic_state *kpic;
+int ret;
+
+chip.chip_id = s->master ? KVM_IRQCHIP_PIC_MASTER : KVM_IRQCHIP_PIC_SLAVE;
+
+kpic = &chip.chip.pic;
+
+kpic->last_irr = s->last_irr;
+kpic->irr = s->irr;
+kpic->imr = s->imr;
+kpic->isr = s->isr;
+kpic->priority_add = s->priority_add;
+kpic->irq_base = s->irq_base;
+kpic->read_reg_select = s->read_reg_select;
+kpic->poll = s->poll;
+kpic->special_mask = s->special_mask;
+kpic->init_state = s->init_state;
+kpic->auto_eoi = s->auto_eoi;
+kpic->rotate_on_auto_eoi = s->rotate_on_auto_eoi;
+kpic->special_fully_nested_mode = s->special_fully_nested_mode;
+kpic->init4 = s->init4;
+kpic->elcr = s->elcr;
+kpic->elcr_mask = s->elcr_mask;
+
+ret = kvm_vm_ioctl(kvm_state, KVM_SET_IRQCHIP, &chip);
+if (ret < 0) {
+fprintf(stderr, "KVM_GET_IRQCHIP failed: %s\n", strerror(ret));
+abort();
+}
+}
+
+static void kvm_pic_reset(PicState *s)
+{
+pic_reset_internal(s);
+s->elcr = 0;
+
+kvm_pic_put(s);
+}
+
+static void kvm_pic_set_irq(void *opaque, int irq, int level)
+{
+int delivered;
+
+delivered = kvm_irqchip_set_irq(kvm_state, irq, level);
+apic_set_irq_delivered(delivered);
+}
+
+static void kvm_pic_backend_init(PicState *s)
+{
+memory_region_init_reservation(&s->base_io, "kvm-pic", 2);
+memory_region_init_reservation(&s->elcr_io, "kvm-elcr", 1);
+}
+
+qemu_irq *kvm_i8259_init(void)
+{
+i8259_init_chip(true, "KVM");
+i8259_init_chip(false, "KVM");
+
+return qemu_allocate_irqs(kvm_pic_set_irq, NULL, ISA_NUM_IRQS);
+}
+
+static PICBackend kvm_pic_backend = {
+.name = "KVM",
+.init = kvm_pic_backend_init,
+.reset = kvm_pic_reset,
+.pre_save = kvm_pic_get,
+.post_load = kvm_pic_put,
+};
+
+static void kvm_pic_register(void)
+{
+pic_register_backend(&kvm_pic_backend);
+}
+
+device_init(kvm_pic_register)
diff --git a/hw/pc.h b/hw/pc.h
index b7b7e40..fc6f446 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -64,6 +64,7 @@ bool parallel_mm_init(MemoryRegion *address_space,
 typedef struct PicState PicState;
 extern PicState *isa_pic;
 qemu_irq *i8259_init(qemu_irq parent_irq);
+qemu_irq *kvm_i8259_init(void);
 int pic_read_irq(PicSta

[Qemu-devel] [PATCH v4 09/15] memory: Introduce memory_region_init_reservation

2011-12-08 Thread Jan Kiszka

Introduce a memory region type that can reserve I/O space. Such regions
are useful for modeling I/O that is only handled outside of QEMU, i.e.
in the context of an accelerator like KVM.

Any access to such a region from QEMU is a bug, but could theoretically
be triggered by guest code (DMA to reserved region). So only warning
about such events once, then ignore them.

Signed-off-by: Jan Kiszka 
---
 memory.c |   36 
 memory.h |   16 
 2 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/memory.c b/memory.c
index adfdf14..71a252a 100644
--- a/memory.c
+++ b/memory.c
@@ -1031,6 +1031,42 @@ void memory_region_init_rom_device(MemoryRegion *mr,
 mr->backend_registered = true;
 }
 
+static uint64_t invalid_read(void *opaque, target_phys_addr_t addr,
+ unsigned size)
+{
+MemoryRegion *mr = opaque;
+
+if (!mr->warning_printed) {
+fprintf(stderr, "Invalid read from memory region %s\n", mr->name);
+mr->warning_printed = true;
+}
+return -1U;
+}
+
+static void invalid_write(void *opaque, target_phys_addr_t addr, uint64_t data,
+  unsigned size)
+{
+MemoryRegion *mr = opaque;
+
+if (!mr->warning_printed) {
+fprintf(stderr, "Invalid write to memory region %s\n", mr->name);
+mr->warning_printed = true;
+}
+}
+
+static const MemoryRegionOps reservation_ops = {
+.read = invalid_read,
+.write = invalid_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+void memory_region_init_reservation(MemoryRegion *mr,
+const char *name,
+uint64_t size)
+{
+memory_region_init_io(mr, &reservation_ops, mr, name, size);
+}
+
 void memory_region_destroy(MemoryRegion *mr)
 {
 assert(QTAILQ_EMPTY(&mr->subregions));
diff --git a/memory.h b/memory.h
index 53bf261..1097eac 100644
--- a/memory.h
+++ b/memory.h
@@ -123,6 +123,7 @@ struct MemoryRegion {
 bool terminates;
 bool readable;
 bool readonly; /* For RAM regions */
+bool warning_printed; /* For reservations */
 MemoryRegion *alias;
 target_phys_addr_t alias_offset;
 unsigned priority;
@@ -250,6 +251,21 @@ void memory_region_init_rom_device(MemoryRegion *mr,
uint64_t size);
 
 /**
+ * memory_region_init_reservation: Initialize a memory region that reserves
+ * I/O space.
+ *
+ * A reservation region primariy serves debugging purposes.  It claims I/O
+ * space that is not supposed to be handled by QEMU itself.  Any access via
+ * the memory API will cause an abort().
+ *
+ * @mr: the #MemoryRegion to be initialized
+ * @name: used for debugging; not visible to the user or ABI
+ * @size: size of the region.
+ */
+void memory_region_init_reservation(MemoryRegion *mr,
+const char *name,
+uint64_t size);
+/**
  * memory_region_destroy: Destroy a memory region and relaim all resources.
  *
  * @mr: the region to be destroyed.  May not currently be a subregion
-- 
1.7.3.4

[Qemu-devel] [PATCH v4 12/15] kvm: x86: Add user space part for in-kernel APIC

2011-12-08 Thread Jan Kiszka

This introduces the alternative APIC backend which makes use of KVM's
in-kernel device model. External NMI injection via LINT1 is emulated by
checking the current state of the in-kernel APIC, only injecting a NMI
into the VCPU if LINT1 is unmasked and configured to DM_NMI.

MSI is not yet supported, so we disable this when the in-kernel model is
in use.

CC: Lai Jiangshan 
Signed-off-by: Jan Kiszka 
---
 Makefile.target   |2 +-
 hw/kvm/apic.c |  154 +
 hw/pc.c   |   15 --
 kvm.h |3 +
 target-i386/kvm.c |8 +++
 5 files changed, 176 insertions(+), 6 deletions(-)
 create mode 100644 hw/kvm/apic.c

diff --git a/Makefile.target b/Makefile.target
index b549988..76de485 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -236,7 +236,7 @@ obj-i386-y += vmport.o
 obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
 obj-i386-y += pc_piix.o
-obj-i386-$(CONFIG_KVM) += kvm/clock.o
+obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o
 obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 
 # shared objects
diff --git a/hw/kvm/apic.c b/hw/kvm/apic.c
new file mode 100644
index 000..3924f9e
--- /dev/null
+++ b/hw/kvm/apic.c
@@ -0,0 +1,154 @@
+/*
+ * KVM in-kernel APIC support
+ *
+ * Copyright (c) 2011 Siemens AG
+ *
+ * Authors:
+ *  Jan Kiszka  
+ *
+ * This work is licensed under the terms of the GNU GPL version 2.
+ * See the COPYING file in the top-level directory.
+ */
+#include "hw/apic_internal.h"
+#include "kvm.h"
+
+static inline void kvm_apic_set_reg(struct kvm_lapic_state *kapic,
+   int reg_id, uint32_t val)
+{
+*((uint32_t *)(kapic->regs + (reg_id << 4))) = val;
+}
+
+static inline uint32_t kvm_apic_get_reg(struct kvm_lapic_state *kapic,
+   int reg_id)
+{
+return *((uint32_t *)(kapic->regs + (reg_id << 4)));
+}
+
+int kvm_put_apic(CPUState *env)
+{
+APICState *s = DO_UPCAST(APICState, busdev.qdev, env->apic_state);
+struct kvm_lapic_state kapic;
+int i;
+
+if (s && kvm_enabled() && kvm_irqchip_in_kernel()) {
+memset(&kapic, 0, sizeof(kapic));
+kvm_apic_set_reg(&kapic, 0x2, s->id << 24);
+kvm_apic_set_reg(&kapic, 0x8, s->tpr);
+kvm_apic_set_reg(&kapic, 0xd, s->log_dest << 24);
+kvm_apic_set_reg(&kapic, 0xe, s->dest_mode << 28 | 0x0fff);
+kvm_apic_set_reg(&kapic, 0xf, s->spurious_vec);
+for (i = 0; i < 8; i++) {
+kvm_apic_set_reg(&kapic, 0x10 + i, s->isr[i]);
+kvm_apic_set_reg(&kapic, 0x18 + i, s->tmr[i]);
+kvm_apic_set_reg(&kapic, 0x20 + i, s->irr[i]);
+}
+kvm_apic_set_reg(&kapic, 0x28, s->esr);
+kvm_apic_set_reg(&kapic, 0x30, s->icr[0]);
+kvm_apic_set_reg(&kapic, 0x31, s->icr[1]);
+for (i = 0; i < APIC_LVT_NB; i++) {
+kvm_apic_set_reg(&kapic, 0x32 + i, s->lvt[i]);
+}
+kvm_apic_set_reg(&kapic, 0x38, s->initial_count);
+kvm_apic_set_reg(&kapic, 0x3e, s->divide_conf);
+
+return kvm_vcpu_ioctl(env, KVM_SET_LAPIC, &kapic);
+}
+
+return 0;
+}
+
+int kvm_get_apic(CPUState *env)
+{
+APICState *s = DO_UPCAST(APICState, busdev.qdev, env->apic_state);
+struct kvm_lapic_state kapic;
+int ret, i, v;
+
+if (s && kvm_enabled() && kvm_irqchip_in_kernel()) {
+ret = kvm_vcpu_ioctl(env, KVM_GET_LAPIC, &kapic);
+if (ret < 0) {
+return ret;
+}
+
+s->id = kvm_apic_get_reg(&kapic, 0x2) >> 24;
+s->tpr = kvm_apic_get_reg(&kapic, 0x8);
+s->arb_id = kvm_apic_get_reg(&kapic, 0x9);
+s->log_dest = kvm_apic_get_reg(&kapic, 0xd) >> 24;
+s->dest_mode = kvm_apic_get_reg(&kapic, 0xe) >> 28;
+s->spurious_vec = kvm_apic_get_reg(&kapic, 0xf);
+for (i = 0; i < 8; i++) {
+s->isr[i] = kvm_apic_get_reg(&kapic, 0x10 + i);
+s->tmr[i] = kvm_apic_get_reg(&kapic, 0x18 + i);
+s->irr[i] = kvm_apic_get_reg(&kapic, 0x20 + i);
+}
+s->esr = kvm_apic_get_reg(&kapic, 0x28);
+s->icr[0] = kvm_apic_get_reg(&kapic, 0x30);
+s->icr[1] = kvm_apic_get_reg(&kapic, 0x31);
+for (i = 0; i < APIC_LVT_NB; i++) {
+s->lvt[i] = kvm_apic_get_reg(&kapic, 0x32 + i);
+}
+s->initial_count = kvm_apic_get_reg(&kapic, 0x38);
+s->divide_conf = kvm_apic_get_reg(&kapic, 0x3e);
+
+v = (s->divide_conf & 3) | ((s->divide_conf >> 1) & 4);
+s->count_shift = (v + 1) & 7;
+
+s->initial_count_load_time = qemu_get_clock_ns(vm_clock);
+apic_next_timer(s, s->initial_count_load_time);
+}
+return 0;
+}
+
+static void kvm_apic_set_base(APICState *s, uint64_t val)
+{
+s->apicbase = val;
+}
+
+static void kvm_apic_set_tpr(APICState *s, uint8_t val)
+{
+s->tpr = (val & 0x0f) << 4;
+}
+
+static void do_inject_e

Re: [Qemu-devel] [PATCH 2/2] net: take ownership of fd in socket init functions

2011-12-08 Thread Stefan Hajnoczi

On Thu, Dec 8, 2011 at 12:29 PM, Zhi Yong Wu  wrote:
> On Wed, Dec 7, 2011 at 11:01 PM, Stefan Hajnoczi
>  wrote:
>> Today net/socket.c has no consistent policy for closing the socket file
>> descriptor when initialization fails.  This means we leak the file
>> descriptor in some cases or we could also try to close it twice.
>>
>> Make error paths consistent by taking ownership of the file descriptor
>> and closing it on error.
>>
>> Signed-off-by: Stefan Hajnoczi 
>> ---
>>  net/socket.c |   17 +
>>  1 files changed, 9 insertions(+), 8 deletions(-)
>>
>> diff --git a/net/socket.c b/net/socket.c
>> index 613a7ef..f999c26 100644
>> --- a/net/socket.c
>> +++ b/net/socket.c
>> @@ -266,14 +266,13 @@ static NetSocketState 
>> *net_socket_fd_init_dgram(VLANState *vlan,
>>             if (saddr.sin_addr.s_addr == 0) {
>>                 fprintf(stderr, "qemu: error: init_dgram: fd=%d unbound, "
>>                         "cannot setup multicast dst addr\n", fd);
>> -                return NULL;
>> +                goto err;
>>             }
>>             /* clone dgram socket */
>>             newfd = net_socket_mcast_create(&saddr, NULL);
>>             if (newfd < 0) {
>>                 /* error already reported by net_socket_mcast_create() */
>> -                close(fd);
>> -                return NULL;
>> +                goto err;
>>             }
>>             /* clone newfd to fd, close newfd */
>>             dup2(newfd, fd);
>> @@ -283,7 +282,7 @@ static NetSocketState 
>> *net_socket_fd_init_dgram(VLANState *vlan,
>>             fprintf(stderr,
>>                     "qemu: error: init_dgram: fd=%d failed getsockname(): 
>> %s\n",
>>                     fd, strerror(errno));
>> -            return NULL;
>> +            goto err;
>>         }
>>     }
>>
>> @@ -304,6 +303,10 @@ static NetSocketState 
>> *net_socket_fd_init_dgram(VLANState *vlan,
>>     if (is_connected) s->dgram_dst=saddr;
>>
>>     return s;
>> +
>> +err:
>> +    closesocket(fd);
>> +    return NULL;
>>  }
>>
>>  static void net_socket_connect(void *opaque)
>> @@ -353,6 +356,7 @@ static NetSocketState *net_socket_fd_init(VLANState 
>> *vlan,
>>         (socklen_t *)&optlen)< 0) {
>>         fprintf(stderr, "qemu: error: getsockopt(SO_TYPE) for fd=%d 
>> failed\n",
>>                 fd);
>> +        closesocket(fd);
>>         return NULL;
>>     }
>>     switch(so_type) {
>> @@ -386,9 +390,7 @@ static void net_socket_accept(void *opaque)
>>         }
>>     }
>>     s1 = net_socket_fd_init(s->vlan, s->model, s->name, fd, 1);
>> -    if (!s1) {
>> -        closesocket(fd);
>> -    } else {
> Why is it not handled when s1 is NULL?

The point of the patch is to introduce consistent error behavior -
net_socket_fd_init() will close the socket on error so we no longer
have to do that.  If you look at net_socket_accept() there is nothing
else to do on failure it was possible to just remove the if (!s1)
check.

Stefan

[Qemu-devel] [PATCH v4 01/15] msi: Generalize msix_supported to msi_supported

2011-12-08 Thread Jan Kiszka

Rename msix_supported to msi_supported and control MSI and MSI-X
activation this way. That was likely to original intention for this
flag, but MSI support came after MSI-X.

Signed-off-by: Jan Kiszka 
---
 hw/msi.c  |8 
 hw/msi.h  |2 ++
 hw/msix.c |9 -
 hw/msix.h |2 --
 hw/pc.c   |4 ++--
 5 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index f214fcf..5d6ceb6 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -36,6 +36,9 @@
 
 #define PCI_MSI_VECTORS_MAX 32
 
+/* Flag for interrupt controller to declare MSI/MSI-X support */
+bool msi_supported;
+
 /* If we get rid of cap allocator, we won't need this. */
 static inline uint8_t msi_cap_sizeof(uint16_t flags)
 {
@@ -116,6 +119,11 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
 uint16_t flags;
 uint8_t cap_size;
 int config_offset;
+
+if (!msi_supported) {
+return -ENOTSUP;
+}
+
 MSI_DEV_PRINTF(dev,
"init offset: 0x%"PRIx8" vector: %"PRId8
" 64bit %d mask %d\n",
diff --git a/hw/msi.h b/hw/msi.h
index 5766018..3040bb0 100644
--- a/hw/msi.h
+++ b/hw/msi.h
@@ -24,6 +24,8 @@
 #include "qemu-common.h"
 #include "pci.h"
 
+extern bool msi_supported;
+
 bool msi_enabled(const PCIDevice *dev);
 int msi_init(struct PCIDevice *dev, uint8_t offset,
  unsigned int nr_vectors, bool msi64bit, bool msi_per_vector_mask);
diff --git a/hw/msix.c b/hw/msix.c
index 149eed2..107d4e5 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -12,6 +12,7 @@
  */
 
 #include "hw.h"
+#include "msi.h"
 #include "msix.h"
 #include "pci.h"
 #include "range.h"
@@ -32,9 +33,6 @@
 #define MSIX_MAX_ENTRIES 32
 
 
-/* Flag for interrupt controller to declare MSI-X support */
-int msix_supported;
-
 /* Add MSI-X capability to the config space for the device. */
 /* Given a bar and its size, add MSI-X table on top of it
  * and fill MSI-X capability in the config space.
@@ -235,10 +233,11 @@ int msix_init(struct PCIDevice *dev, unsigned short 
nentries,
   unsigned bar_nr, unsigned bar_size)
 {
 int ret;
+
 /* Nothing to do if MSI is not supported by interrupt controller */
-if (!msix_supported)
+if (!msi_supported) {
 return -ENOTSUP;
-
+}
 if (nentries > MSIX_MAX_ENTRIES)
 return -EINVAL;
 
diff --git a/hw/msix.h b/hw/msix.h
index 7e04336..5aba22b 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -29,6 +29,4 @@ void msix_notify(PCIDevice *dev, unsigned vector);
 
 void msix_reset(PCIDevice *dev);
 
-extern int msix_supported;
-
 #endif
diff --git a/hw/pc.c b/hw/pc.c
index 7c4bfa8..240aaae 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -36,7 +36,7 @@
 #include "elf.h"
 #include "multiboot.h"
 #include "mc146818rtc.h"
-#include "msix.h"
+#include "msi.h"
 #include "sysbus.h"
 #include "sysemu.h"
 #include "blockdev.h"
@@ -896,7 +896,7 @@ static DeviceState *apic_init(void *env, uint8_t apic_id)
 apic_mapped = 1;
 }
 
-msix_supported = 1;
+msi_supported = true;
 
 return dev;
 }
-- 
1.7.3.4

Re: [Qemu-devel] qemu 1.0 : booting from scsi device (extboot removal)

2011-12-08 Thread Jan Kiszka

On 2011-12-08 13:41, Alexandre DERUMIER wrote:
> Hi,
> 
> I'm working on the proxmox 2.0 distribution (kvm distrib) and I'm trying to 
> boot from scsi device with qemu 1.0.
> 
> since this patch
> 
> Remove extboot support -- Linux KVM
> http://www.spinics.net/lists/kvm/msg61865.html
> 
> 
> it does'nt work anymore.
> 
> 
> I know I can use lsi rom with
> 
> -option-rom 8xx_64.rom  
> 
> 
> But isn't it possible to use another GPL compliant alternative ? 

It is, e.g. via native support in SeaBIOS. But someone has to implement
that. Or motivate a third party to do this.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH 0/3] MIPS64 user mode emulation in QEMU

2011-12-08 Thread Andreas Färber

Am 08.12.2011 06:25, schrieb kha...@kics.edu.pk:
> From: Khansa Butt 
> 
> This is the team work of Ehsan-ul-Haq, Abdul Qadeer, Abdul Waheed, Khansa Butt
> from HPCN Lab KICS UET Lahore.
> In previous patch set we were including Cavium specific instructions along 
> with 
> Cavium specifc registers in UME. Because of these register fields we had to 
> bump
> the cpu version up but I noticed that cpu_save() and cpu_load() are not 
> called in
> UME so we decided to postpone Octeon specific changes ( registers and 
> instructions)
> and will include them in our SME work( we are currently working on system 
> mode 
> emulation of Octeon board) so we closing the following thread
> http://lists.gnu.org/archive/html/qemu-devel/2011-10/msg02665.html
> Please review this new patch set which is without cavium instruction support.

I really hate to repeat myself... We receive lots of patches per day, so
please make your patches easy to read, easy to understand and easy to
review. If you check qemu-devel or the archives, you'll find many good
examples that get committed - most people get this right on second try.
Please don't ignore our comments from previous series:

You previously sent a v3 series, now there's two unversioned series
again. These should be v4 and v5 and should ALWAYS include a bulleted
change log since initial version. This is to tell apart which is the
latest series and which is superseded. v4 should include something like:

"Changes since v3" (or short "v3 -> v4"):
* Drop CPU load/save from series

v2 -> v3:
...

v1 -> v2:
...

If v5 changed nothing but the cover letter, there would be no need to
resend it, explanations for us can be added as a normal email reply.

If you're missing feedback, best just reply to your cover letter with
"Ping?" so that we have a chance of reviewing the email thread with
previous comments.

More in the individual patches.

Andreas

> 
>  configure |1 +
>  default-configs/mips64-linux-user.mak |1 +
>  linux-user/main.c |   21 ++-
>  linux-user/mips64/syscall.h   |2 +
>  linux-user/signal.c   |  429 
> -
>  target-mips/translate.c   |4 +
>  6 files changed, 444 insertions(+), 14 deletions(-)
>  create mode 100644 default-configs/mips64-linux-user.mak
>

Re: [Qemu-devel] [PATCH v3] network scripts: don't block SIGCHLD before forking

2011-12-08 Thread Jan Kiszka

On 2011-12-08 04:48, Michael Roth wrote:
> This patch fixes a bug where child processes of launch_script() can
> misbehave due to SIGCHLD being blocked. In the case of `sudo`, this
> causes a permanent hang.
> 
> Previously a SIGCHLD handler was added to reap fork_exec()'d zombie
> processes by calling waitpid(-1, ...). This required other
> fork()/waitpid() callers to temporarilly block SIGCHILD to avoid
> having the final wait status being intercepted by the SIGCHLD
> handler:
> 
> 7c3370d4fe3fa6cda8655f109e4659afc8ca4269
> 
> Since then, the qemu_add_child_watch() interface was added to allow
> registration of such processes and reap only from that specific set
> of PIDs:
> 
> 4d54ec7898bd951007cb6122d5315584bd41d0c4
> 
> As a result, we can now avoid blocking SIGCHLD in launch_script(), so
> drop that behavior.
> 
> Signed-off-by: Michael Roth 
> ---
>  net/tap.c |6 --
>  1 files changed, 0 insertions(+), 6 deletions(-)
> 
> diff --git a/net/tap.c b/net/tap.c
> index 1f26dc9..6c27a94 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -346,15 +346,10 @@ static TAPState *net_tap_fd_init(VLANState *vlan,
>  
>  static int launch_script(const char *setup_script, const char *ifname, int 
> fd)
>  {
> -sigset_t oldmask, mask;
>  int pid, status;
>  char *args[3];
>  char **parg;
>  
> -sigemptyset(&mask);
> -sigaddset(&mask, SIGCHLD);
> -sigprocmask(SIG_BLOCK, &mask, &oldmask);
> -
>  /* try to launch network script */
>  pid = fork();
>  if (pid == 0) {
> @@ -378,7 +373,6 @@ static int launch_script(const char *setup_script, const 
> char *ifname, int fd)
>  while (waitpid(pid, &status, 0) != pid) {
>  /* loop */
>  }
> -sigprocmask(SIG_SETMASK, &oldmask, NULL);
>  
>  if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
>  return 0;

Looks sane.

Reviewed-by: Jan Kiszka 

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] an error while loading checkpoint

2011-12-08 Thread sparsh mittal

I am using qemu as a part of Marss cycle-accurate simulator.
http://marss86.org/~marss86/index.php/Home

$ qemu/qemu-system-x86_64 -m 4G -snapshot -loadvm ffnn -vnc :3 -icount 0
-drive cache=unsafe,file=/local/sparsh/specimage_sftn3.qcow2 -simconfig
MyConfigs/ffnn_100M.cfg

Unknown savevm section type 32
qemu-system-x86_64: Error -22 while loading VM state
==
Another thing is that only in this checkpoint (ffnn), I get this error, and
not in any other checkpoint.
==
Here is the information about the image:

$ qemu/qemu-img info /local/sparsh/specimage_sftn3.qcow2
image: /local/sparsh/specimage_sftn3.qcow2
file format: qcow2
virtual size: 9.3G (100 bytes)
disk size: 30G
cluster_size: 65536
Snapshot list:
IDTAG VM SIZEDATE   VM CLOCK
1 sfff1  1.9G 2011-12-02 14:45:23   00:16:36.700
2 sfft1  2.4G 2011-12-02 15:13:21   00:16:12.117
3 stnn   2.5G 2011-12-02 15:42:51   00:16:46.837
4 ftnn   1.5G 2011-12-02 16:06:46   00:16:45.460
6 sftt1  2.3G 2011-12-02 17:06:53   00:16:13.254
7 ffnn   1.0G 2011-12-02 17:30:37   00:16:20.771
8 snnn   2.4G 2011-12-02 17:50:50   00:16:10.128
9 sfft2  1.6G 2011-12-02 18:45:32   00:16:11.053
11sftt2  2.3G 2011-12-02 20:04:26   00:16:15.169
12sffn   1.8G 2011-12-06 10:46:26   00:16:31.435
13ttff   2.0G 2011-12-06 11:05:40   00:16:05.729

I also have question whether -icount 0 or -icount auto can make a
simulation more deterministic and whether qemu is affected by the load on
the host (i.e. how many processes are running?)
Sparsh


On Thu, Dec 8, 2011 at 3:53 AM, Stefan Hajnoczi  wrote:

> On Wed, Dec 7, 2011 at 11:41 PM, sparsh mittal 
> wrote:
> > I get this warning while using a checkpoint. I issued the command -m 4G
> > -icount 0 and other parameters.
> > Unknown savevm section type 32
> > qemu-system-x86_64: Error -22 while loading VM state
> >
> > My computer has 48GB RAM, so I don't know what is the problem?
>
> Looks like the stream has gotten out of sync.
>
> Please post the full QEMU command-line as well as any monitor commands
> you issued so it is possible to reproduce this.
>
> Stefan
>

Re: [Qemu-devel] Error while booting

2011-12-08 Thread sparsh mittal

On Thu, Dec 8, 2011 at 6:20 AM, Kevin Wolf  wrote:

> Am 07.12.2011 17:26, schrieb sparsh mittal:
> > I use version 0.15 of qemu with marss cycle-accurate simulator.
> >
> > After making checkpoints in an image, I get this error, while trying to
> > boot it (i.e. not using loadvm to run checkpoint, but just starting the
> > image).
> >
> > This kernel requires an x86-64 CPU, but only detected an i686 CPU.
> > Unable to boot - please use a kernel appropriate for your CPU
>
> Try using 'qemu-system-x86_64' instead of 'qemu'.
>
> Kevin
>
Thanks. But actually I issued this command itself:
qemu/qemu-system-x86_64 -m 4G -snapshot  -icount 0 -drive
cache=unsafe,file=/home/sparsh/specimage_sftn3.qcow2

I also wanted to tell that, on the same image, I get another error (while
loading one checkpoint): I have sent mail about this to qemu-forum and a
user is helping me, so I just wanted to mention thinking that they may be
related.
Unknown savevm section type 32
qemu-system-x86_64: Error -22 while loading VM state

Re: [Qemu-devel] [PATCH v2 0/5] backdoor: lightweight guest-to-QEMU backdoor channel

2011-12-08 Thread Stefan Hajnoczi

2011/12/5 Lluís Vilanova :
> Provides the ability for the guest to communicate with user-provided code 
> inside
> QEMU itself, using a lightweight mechanism.
>
> See first commit for a full description.
>
> Signed-off-by: Lluís Vilanova 
> ---

First off, thank you for your continued efforts - including the
general tracing improvements you have contributed in the past.  I
think many people use QEMU as a tool for research and simulation, but
few end up working upstream.  So thanks for polishing up your work and
submitting it upstream.

Thoughts on the pieces you have described:

> * Support for tracing guest code through TCG.

I'm not clear on whether a backdoor mechanism is needed or not.  A
backdoor mechanism allows a modified guest to participate in tracing.

Depending on the type of analysis you are doing it might be possible
to achieve the same thing by emitting calls to custom instrumentation
in TCG (or is this too low-level for what you're trying to trace in
the guest?).  The advantage is also that you can do this to unmodified
guests.

> * Support for different sets of enabled guest tracing events on a per-vCPU 
> basis.

This feature is very specific but potentially useful.  I haven't
reviewed patches yet but I can imagine that this would be merged.

> * Support for instrumenting tracing events (including those for guest code).

Although I typically use post-processing rather than filtering, I can
see your point about the power and flexibility in being able to react
on the spot to events inside QEMU.  Given the ad-hoc nature of the
instrumentation, as well as that this is really in the guts of QEMU
and TCG, I think we should make it easy to write instrumentation but
keep the infrastructure and APIs minimal.

Is there even a need to add "instrumentation" support on top of QEMU
tracing?  As an instrumentation implementor you are working with the
QEMU source tree and need to be somewhat familiar with the internals.
Why not just add custom functions into QEMU and call them from
relevant points?

The documentation and tracetool extensions you posted provide some
structure but IMO they don't cut down the workflow significantly over
adding plain old functions in places of interest.  i.e. by the time I
have read the docs and begun trying to add instrumentation I could
have already added a custom function in QEMU and built a binary.  Plus
it's harder to understand/debug instrumentation code if it sits on top
of the tracing instrumentation - yet another layer to understand.

So I'm suggesting that we don't *need* explicit support for
instrumenting trace events.  Instead add plain old functions where you
need them.  You may decide to invoke trace events from your
instrumentation function, but that's already covered today by
docs/tracing.txt.

Am I missing the point of instrumentation tracing events or do you
agree that we can work well without it?

Stefan

Re: [Qemu-devel] an error while loading checkpoint

2011-12-08 Thread Stefan Hajnoczi

On Thu, Dec 8, 2011 at 1:57 PM, sparsh mittal  wrote:
> I am using qemu as a part of Marss cycle-accurate simulator.
> http://marss86.org/~marss86/index.php/Home
>
> $ qemu/qemu-system-x86_64 -m 4G -snapshot -loadvm ffnn -vnc :3 -icount 0
> -drive cache=unsafe,file=/local/sparsh/specimage_sftn3.qcow2 -simconfig
> MyConfigs/ffnn_100M.cfg

I wonder if the image loads in vanilla QEMU.  Try dropping the -icount
0 and -simconfig options just to see whether vanilla QEMU gets the
same error or not.

Also, have you asked on the Marss mailing list?  Please try that first
because the bug may be due to changes that were made to vanilla QEMU.
https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel

Stefan

Re: [Qemu-devel] [PATCH] use pci macro in virtio

2011-12-08 Thread Jan Kiszka

On 2011-12-08 11:17, Stefan Hajnoczi wrote:
> On Thu, Dec 8, 2011 at 5:49 AM, hkran  wrote:
>>
>> Signed-off-by: hkran 
> 
> Anthony asked that you put your real name in the Signed-off-by.  You
> don't need to resend the patch, please just reply to this mail with:
> Signed-off-by: $YOUR_NAME 

Resending would provide a chance to fix checkpatch.pl complaints. :)

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH v2 2/3] qed: add zero write detection support

2011-12-08 Thread Mark Wu

I tried to optimize the zero detecting code with SSE instruction.   The 
idea comes from Paolo's patch "migration: vectorize is_dup_page".  It's 
expected to give us an noticeable improvement. But I didn't find any 
improvement in the qemu-io test even though I increased the image size 
to 5GB.  The following is my test patch.  Could you please review it to 
see if I made any mistake and SSE can help for zero detecting?


Thanks.


diff --git a/block/qed.c b/block/qed.c
index 75a44f3..61e4a27 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -998,6 +998,14 @@ static void qed_aio_write_l2_update_cb(void 
*opaque, int ret)

 qed_aio_write_l2_update(acb, ret, acb->cur_cluster);
 }

+#ifdef __SSE2__
+#include 
+#define VECTYPE__m128i
+#define SPLAT(p)   _mm_set1_epi8(*(p))
+#define ALL_EQ(v1, v2) (_mm_movemask_epi8(_mm_cmpeq_epi8(v1, v2)) == 
0x)

+#define VECTYPE_ZERO   _mm_setzero_si128()
+#endif
+
 /**
  * Determine if we have a zero write to a block of clusters
  *
@@ -1027,6 +1035,19 @@ static bool qed_is_zero_write(QEDAIOCB *acb)
 }

 v = iov->iov_base;
+
+#ifdef __SSE2__
+   if ((iov->iov_len & 0x0f)) {
+VECTYPE zero = VECTYPE_ZERO;
+VECTYPE *p = (VECTYPE *)v;
+for(j = 0; j < iov->iov_len / sizeof(VECTYPE); j++) {
+ if (!ALL_EQ(p[j], zero)) {
+return false;
+ }
+}
+continue;
+}
+#endif
 for (j = 0; j < iov->iov_len; j += sizeof(v[0])) {
 if (v[j >> 3]) {
 return false;

Re: [Qemu-devel] an error while loading checkpoint

2011-12-08 Thread sparsh mittal

On Thu, Dec 8, 2011 at 8:18 AM, Stefan Hajnoczi  wrote:

> On Thu, Dec 8, 2011 at 1:57 PM, sparsh mittal 
> wrote:
> > I am using qemu as a part of Marss cycle-accurate simulator.
> > http://marss86.org/~marss86/index.php/Home
> >
> > $ qemu/qemu-system-x86_64 -m 4G -snapshot -loadvm ffnn -vnc :3 -icount 0
> > -drive cache=unsafe,file=/local/sparsh/specimage_sftn3.qcow2 -simconfig
> > MyConfigs/ffnn_100M.cfg
>
> I wonder if the image loads in vanilla QEMU.  Try dropping the -icount
> 0 and -simconfig options just to see whether vanilla QEMU gets the
> same error or not.
>
> I have tried dropping all these: snapshot, icount, vnc, cache=unsafe, but
still the error comes. I deleted this checkpoint and made another one (with
different name, but same benchmarks), still error comes.
I will also ask marss-forum.
Surprisingly, with other checkpoints, error does not come.


> Also, have you asked on the Marss mailing list?  Please try that first
> because the bug may be due to changes that were made to vanilla QEMU.
> https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
>
> Stefan
>

Re: [Qemu-devel] Insane virtio-serial semantics

2011-12-08 Thread Anthony Liguori


On 12/08/2011 04:11 AM, Markus Armbruster wrote:

Anthony Liguori  writes:

And yes, I can't help but think of Dave Millers comments long ago that
any PV transport is eventually going to reinvent TCP, poorly.


No doubt then, no doubt now.  But if I remember correctly, we didn't
create virtio-serial because we thought we could do better than TCP/IP.
We thought we need a zero-config communication channel that doesn't
interfere in any way with the guest's networking.  Since the network
folks were unwilling to give us one ("use TCP already"), we looked for
another bare metal thing to imitate, and chose serial lines.

Now, comparing serial lines to TCP/IP makes no sense.  Different layers.


virtio-serial is not a serial line.  It attempts to have connection semantics 
(like a socket) which is what the fundamental problem is.


It probably makes sense to have a virtio-serial2 that is exposed to the guest as 
a tty device and stick strictly to serial semantics.



Layering a real network protocol on top of serial line is possible; SLIP
exists.  But as long as we insist on "don't interfere in any way with
the guest's networking", we can't use TCP, and thus are doomed to
reinvent it, poorly.


I think we just got too clever.

Regards,

Anthony Liguori

Re: [Qemu-devel] [RFC] Device sandboxing

2011-12-08 Thread Corey Bryant




On 12/08/2011 04:47 AM, Stefan Hajnoczi wrote:

On Wed, Dec 7, 2011 at 7:32 PM, Corey Bryant  wrote:



On 12/07/2011 01:48 PM, Anthony Liguori wrote:


On 12/07/2011 12:25 PM, Corey Bryant wrote:

* The trusted helper thread would run beside the untrusted thread,
enabling the untrusted thread to make syscalls beyond read(),
write(), exit(), and sigreturn().



I assume you mean process, not thread BTW?



I do mean thread.  When making calls on behalf of the seccomp'd thread, I
think there will be syscalls that must be called from the same address
space.  That's where the the trusted helper thread would come into play.


It's worth pointing out that "isolation within the same process"
schemes work by running the trusted thread in a very special execution
environment.  It cannot trust memory and cannot use the stack for
control flow.  Everything must be done in registers.

This can be made to work but it's highly unportable across host
architectures and hard to make changes to the trusted helper because
you have to be so careful.

Stefan



That's a good point.  And maybe we would only need the trusted thread 
for a minimal number of syscalls that must be made from the same address 
space, like mmap.  I think another approach to safely making a call on 
behalf of an untrusted thread is to pass the call and parameters to a 
trusted process which sanitizes the parameters, writes them to memory 
shared with the trusted thread (read-only from the thread side), and the 
trusted thread can make the call.


--
Regards,
Corey

Re: [Qemu-devel] an error while loading checkpoint

2011-12-08 Thread sparsh mittal

I also have question whether -icount 0 or -icount auto can make a
simulation more deterministic and whether qemu is affected by the load on
the host (i.e. how many processes are running?) I would be grateful for a
reply.
Thanks and Regards
Sparsh Mittal




On Thu, Dec 8, 2011 at 8:36 AM, sparsh mittal wrote:

>
>
>
>
> On Thu, Dec 8, 2011 at 8:18 AM, Stefan Hajnoczi wrote:
>
>> On Thu, Dec 8, 2011 at 1:57 PM, sparsh mittal 
>> wrote:
>> > I am using qemu as a part of Marss cycle-accurate simulator.
>> > http://marss86.org/~marss86/index.php/Home
>> >
>> > $ qemu/qemu-system-x86_64 -m 4G -snapshot -loadvm ffnn -vnc :3 -icount 0
>> > -drive cache=unsafe,file=/local/sparsh/specimage_sftn3.qcow2 -simconfig
>> > MyConfigs/ffnn_100M.cfg
>>
>> I wonder if the image loads in vanilla QEMU.  Try dropping the -icount
>> 0 and -simconfig options just to see whether vanilla QEMU gets the
>> same error or not.
>>
>> I have tried dropping all these: snapshot, icount, vnc, cache=unsafe, but
> still the error comes. I deleted this checkpoint and made another one (with
> different name, but same benchmarks), still error comes.
> I will also ask marss-forum.
> Surprisingly, with other checkpoints, error does not come.
>
>
>> Also, have you asked on the Marss mailing list?  Please try that first
>> because the bug may be due to changes that were made to vanilla QEMU.
>> https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
>>
>> Stefan
>>
>
>

Re: [Qemu-devel] [libvirt] virDomainBlockJobAbort and block_job_cancel

2011-12-08 Thread Stefan Hajnoczi

On Wed, Dec 7, 2011 at 11:01 PM, Eric Blake  wrote:
> On 12/07/2011 03:35 PM, Adam Litke wrote:
> 4) Ask Stefan to make the HMP monitor command synchronous, but only
> expose the JSON command as asynchronous.  After all, it is only HMP
> where we can't wait for an event.

QEMU cannot do async commands, even for HMP.  My QEMU patches used to
work because there is a broken async flag for monitor commands.  I
didn't know it was broken at the time O:-).

I have CCed Luiz who has been working on moving commands off the old
MONITOR_CMD_ASYNC flag.  My current understanding is that patches
adding use of MONITOR_CMD_ASYNC will not be accepted.

>> 2) Poll the qemu monitor
>> To do it this way, I would write a function that repeatedly calls
>> virDomainGetBlockJobInfo() against the disk in question.  Once the job
>> disappears from the list I can return with confidence that the job is gone.
>> This is obviously sub-optimal because I need to poll and sleep.
>
> We've done this before, for both HMP and JSON - see
> qemuMigrationWaitForCompletion.  I agree that an event is nicer than
> polling, but we may be locked into this.

This seems like the safest option although it's ugly.

Stefan

Re: [Qemu-devel] [libvirt] virDomainBlockJobAbort and block_job_cancel

2011-12-08 Thread Adam Litke

On Wed, Dec 07, 2011 at 04:01:58PM -0700, Eric Blake wrote:
> On 12/07/2011 03:35 PM, Adam Litke wrote:
> > Stefan's qemu tree has a block_job_cancel command that always acts
> > asynchronously.  In order to provide the synchronous behavior in libvirt 
> > (when
> > flags is 0), I need to wait for the block job to go away.  I see two 
> > options:
> > 
> > 1) Use the event:
> > To implement this I would create an internal event callback function and
> > register it to receive the block job events.  When the cancel event comes 
> > in, I
> > can exit the qemu job context.  One problem I see with this is that events 
> > are
> > only available when using the json monitor mode.
> 
> I like this idea.  We have internally handled events before, and limited
> it to just JSON if that made life easier: for example, virDomainReboot
> on qemu is rejected if you only have the HMP monitor, and if you have
> the JSON monitor, the implementation internally handles the event to
> change the domain state.
> 
> Can we reliably detect whether qemu is new enough to provide the event,
> and if qemu was older and did not provide the event, do we reliably know
> that abort was blocking in that case?

I think we can say that qemu will operate in one of two modes:
a) Blocking abort AND event is not emitted
b) Non-blocking abort AND event is emitted

The difficulty is in detecting which case the current qemu supports.  I don't
believe there is a way to query qemu for a list of currently-supported events.
Therefore, we'd have to use version numbers.  If we do this, how do we avoid
breaking users of qemu git versions that fall between official qemu releases?

> It's okay to make things work that used to fail, but it is a regression
> to make blocking job cancel fail where it used to work, so rejecting
> blocking job cancel with HMP monitor is not a good idea.  If we can
> guarantee that all qemu new enough to have async cancel also support the
> event, while older qemu was always blocking, and that we can always use
> the JSON monitor to newer qemu, then we're set - merely ensure that we
> use only the JSON monitor and the event.  But if we can't make the
> guarantees, and insist on supporting newer qemu via HMP monitor, then we
> may need a hybrid combination of options 1 and 2, or for less code
> maintenance, just option 2.

Is there a deprecation plan for HMP with newer qemu versions?  I really hate the
idea of needing two implementations for this: one polling and one event-based.

> > 2) Poll the qemu monitor
> > To do it this way, I would write a function that repeatedly calls
> > virDomainGetBlockJobInfo() against the disk in question.  Once the job
> > disappears from the list I can return with confidence that the job is gone.
> > This is obviously sub-optimal because I need to poll and sleep.
> 
> We've done this before, for both HMP and JSON - see
> qemuMigrationWaitForCompletion.  I agree that an event is nicer than
> polling, but we may be locked into this.
> 
> > 
> > 3) Ask Stefan to provide a synchronous mode for the qemu monitor command
> > This one is the nicest from a libvirt perspective, but I doubt qemu wants 
> > to add
> > such an interface given that it basically has broken semantics as far as 
> > qemu is
> > concerned.
> 
> Or even:
> 
> 4) Ask Stefan to make the HMP monitor command synchronous, but only
> expose the JSON command as asynchronous.  After all, it is only HMP
> where we can't wait for an event.

Stefan, how 'bout it?

> > 
> > If this is all too nasty, we could probably just change the behavior of
> > blockJobAbort and make it always synchronous with a 'cancelled' event.
> 
> No, we can't change the behavior without breaking back-compat of
> existing clients of job cancellation.

-- 
Adam Litke 
IBM Linux Technology Center

Re: [Qemu-devel] [PATCH 1/3] linux-user:Support for MIPS64 user mode emulation in QEMU

2011-12-08 Thread Andreas Färber

This is about QEMU and linux-user is the user mode emulation, so please
change the subject to "linux-user: Add support for MIPS64" (note the
space that I reminded you of earlier, it looks weird without on Western
left-to-right screens).

Am 08.12.2011 06:25, schrieb kha...@kics.edu.pk:
> From: Khansa Butt 
> 

As requested earlier, since this is a non-trivial change, please include
a summary here of what the patch does below. Should mention that people
can use it via "mips64-linux-user" and should describe syscall differences.

> 
> Signed-off-by: Khansa Butt 
> ---
>  configure |1 +
>  default-configs/mips64-linux-user.mak |1 +
>  linux-user/main.c |   21 +++--
>  linux-user/mips64/syscall.h   |2 ++
>  4 files changed, 23 insertions(+), 2 deletions(-)
>  create mode 100644 default-configs/mips64-linux-user.mak
> 
> diff --git a/configure b/configure
> index ac4840d..e31229b 100755
> --- a/configure
> +++ b/configure
> @@ -914,6 +914,7 @@ m68k-linux-user \
>  microblaze-linux-user \
>  microblazeel-linux-user \
>  mips-linux-user \
> +mips64-linux-user \
>  mipsel-linux-user \

I would suggest to move your addition one line down, so that mips and
mipsel stay together.

For linux-user IIUC the ABI is relevant, so shouldn't this be
mipsn64-linux-user? We have a patch for mipsn32/mipsn32el. What about
mipsn64el?

>  ppc-linux-user \
>  ppc64-linux-user \
> diff --git a/default-configs/mips64-linux-user.mak 
> b/default-configs/mips64-linux-user.mak
> new file mode 100644
> index 000..1598bfc
> --- /dev/null
> +++ b/default-configs/mips64-linux-user.mak
> @@ -0,0 +1 @@
> +# Default configuration for mips64-linux-user

> diff --git a/linux-user/main.c b/linux-user/main.c
> index d1bbc57..17a74cd 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -2157,7 +2157,8 @@ static int do_store_exclusive(CPUMIPSState *env)
>  void cpu_loop(CPUMIPSState *env)
>  {
>  target_siginfo_t info;
> -int trapnr, ret;
> +int trapnr;
> +abi_long ret;
>  unsigned int syscall_num;
>  
>  for(;;) {
> @@ -2166,8 +2167,23 @@ void cpu_loop(CPUMIPSState *env)
>  cpu_exec_end(env);
>  switch(trapnr) {
>  case EXCP_SYSCALL:
> -syscall_num = env->active_tc.gpr[2] - 4000;
>  env->active_tc.PC += 4;
> +#if defined(TARGET_MIPS64)

TARGET_ABI_MIPSN64?

> +syscall_num = env->active_tc.gpr[2] - 5000;
> +/* MIPS64 has eight argument registers so there is
> + * no need to get arguments from stack
> + */
> +ret = do_syscall(env, env->active_tc.gpr[2],
> + env->active_tc.gpr[4],
> + env->active_tc.gpr[5],
> + env->active_tc.gpr[6],
> + env->active_tc.gpr[7],
> + env->active_tc.gpr[8],
> + env->active_tc.gpr[9],
> + env->active_tc.gpr[10],
> + env->active_tc.gpr[11]);
> +#else
> +syscall_num = env->active_tc.gpr[2] - 4000;
>  if (syscall_num >= sizeof(mips_syscall_args)) {
>  ret = -TARGET_ENOSYS;
>  } else {
> @@ -2205,6 +2221,7 @@ void cpu_loop(CPUMIPSState *env)
>   env->active_tc.gpr[7],
>   arg5, arg6, arg7, arg8);
>  }
> +#endif
>  done_syscall:
>  if (ret == -TARGET_QEMU_ESIGRETURN) {
>  /* Returning from a successful sigreturn syscall.

> diff --git a/linux-user/mips64/syscall.h b/linux-user/mips64/syscall.h
> index 668a2b9..96f03da 100644
> --- a/linux-user/mips64/syscall.h
> +++ b/linux-user/mips64/syscall.h
> @@ -218,4 +218,6 @@ struct target_pt_regs {
>  
>  
>  
> +#define TARGET_QEMU_ESIGRETURN 255
> +
>  #define UNAME_MACHINE "mips64"

Andreas

Re: [Qemu-devel] [libvirt] virDomainBlockJobAbort and block_job_cancel

2011-12-08 Thread Stefan Hajnoczi

On Thu, Dec 8, 2011 at 2:55 PM, Adam Litke  wrote:
> On Wed, Dec 07, 2011 at 04:01:58PM -0700, Eric Blake wrote:
>> On 12/07/2011 03:35 PM, Adam Litke wrote:
>> > Stefan's qemu tree has a block_job_cancel command that always acts
>> > asynchronously.  In order to provide the synchronous behavior in libvirt 
>> > (when
>> > flags is 0), I need to wait for the block job to go away.  I see two 
>> > options:
>> >
>> > 1) Use the event:
>> > To implement this I would create an internal event callback function and
>> > register it to receive the block job events.  When the cancel event comes 
>> > in, I
>> > can exit the qemu job context.  One problem I see with this is that events 
>> > are
>> > only available when using the json monitor mode.
>>
>> I like this idea.  We have internally handled events before, and limited
>> it to just JSON if that made life easier: for example, virDomainReboot
>> on qemu is rejected if you only have the HMP monitor, and if you have
>> the JSON monitor, the implementation internally handles the event to
>> change the domain state.
>>
>> Can we reliably detect whether qemu is new enough to provide the event,
>> and if qemu was older and did not provide the event, do we reliably know
>> that abort was blocking in that case?
>
> I think we can say that qemu will operate in one of two modes:
> a) Blocking abort AND event is not emitted
> b) Non-blocking abort AND event is emitted
>
> The difficulty is in detecting which case the current qemu supports.  I don't
> believe there is a way to query qemu for a list of currently-supported events.
> Therefore, we'd have to use version numbers.  If we do this, how do we avoid
> breaking users of qemu git versions that fall between official qemu releases?

I agree.  Checking version numbers is always problematic - distros may
backport features.

This isn't pretty but how about:

1. Issue block_job_cancel
2. Issue query-block-jobs and check if the job is there.
3. Check for QMP events (if applicable).
4. If the block job was visible then we must be async and need to
expect an event if we didn't already get one.
5. If the block job was not visible then it has been stopped.

Stefan

Re: [Qemu-devel] [PATCH v2 07/18] qom: add child properties (composition) (v2)

2011-12-08 Thread Kevin Wolf

Am 02.12.2011 21:20, schrieb Anthony Liguori:
> Child properties express a relationship of composition.
> 
> Signed-off-by: Anthony Liguori 
> ---
> v1 -> v2
>  - fix comments (Kevin)
>  - add a reference when adding a child property (Kevin)
> ---
>  hw/qdev.c |   26 ++
>  hw/qdev.h |   20 
>  2 files changed, 46 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/qdev.c b/hw/qdev.c
> index 2519f00..fa6b489 100644
> --- a/hw/qdev.c
> +++ b/hw/qdev.c
> @@ -1174,6 +1174,32 @@ DeviceState *qdev_get_root(void)
>  return qdev_root;
>  }
>  
> +static void qdev_get_child_property(DeviceState *dev, Visitor *v, void 
> *opaque,
> +const char *name, Error **errp)
> +{
> +DeviceState *child = opaque;
> +gchar *path;
> +
> +path = qdev_get_canonical_path(child);
> +visit_type_str(v, &path, name, errp);
> +g_free(path);
> +}
> +
> +void qdev_property_add_child(DeviceState *dev, const char *name,
> + DeviceState *child, Error **errp)
> +{
> +gchar *type;
> +
> +type = g_strdup_printf("child<%s>", child->info->name);
> +
> +qdev_property_add(dev, name, type, qdev_get_child_property,
> +  NULL, NULL, child, errp);
> +
> +qdev_ref(dev);

Shouldn't you increase the refcount for child rather than dev?

Kevin

Re: [Qemu-devel] [PATCH v2 2/3] qed: add zero write detection support

2011-12-08 Thread Stefan Hajnoczi

On Thu, Dec 8, 2011 at 2:29 PM, Mark Wu  wrote:
> I tried to optimize the zero detecting code with SSE instruction.   The idea
> comes from Paolo's patch "migration: vectorize is_dup_page".  It's expected
> to give us an noticeable improvement. But I didn't find any improvement in
> the qemu-io test even though I increased the image size to 5GB.  The
> following is my test patch.  Could you please review it to see if I made any
> mistake and SSE can help for zero detecting?

Please put the zero detection function in a common location before
adding serious optimization so that qemu-img.c:is_not_zero() can also
use it.

Out of interest here is the code generated by gcc 4.6.2 from the non-SSE code:

1d50:   89 c2   mov%eax,%edx
1d52:   c1 fa 03sar$0x3,%edx
1d55:   48 63 d2movslq %edx,%rdx
1d58:   48 83 3c d6 00  cmpq   $0x0,(%rsi,%rdx,8)
1d5d:   0f 85 03 ff ff ff   jne1c66 
1d63:   83 c0 08add$0x8,%eax
1d66:   48 63 d0movslq %eax,%rdx
1d69:   48 39 d1cmp%rdx,%rcx
1d6c:   77 e2   ja 1d50 

Once you have the zero detection code in a utility function it's easy
to write a small test program to run a performance benchmark.

Stefan

Re: [Qemu-devel] Still build failure for qemu-1.0

2011-12-08 Thread Erik Rull


Aneesh Kumar K.V wrote:

On Fri, 02 Dec 2011 14:22:03 +0100, erik.r...@rdsoftware.de wrote:

Hi all,

there was the promise to test the build failure from rc4 with the released
version and that it should work.

Its still present :-( please assist me here:

   CClibhw64/9pfs/coxattr.o
   CClibhw64/9pfs/virtio-9p-handle.o
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c: In function
'handle_update_file_cred':
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:95: warning: implicit
declaration of function 'openat'
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:95: warning: nested extern
declaration of 'openat'
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:103: warning: implicit
declaration of function 'fchownat'
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:103: warning: nested extern
declaration of 'fchownat'
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c: In function 'handle_lstat':
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:120: warning: implicit
declaration of function 'fstatat'
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:120: warning: nested extern
declaration of 'fstatat'
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c: In function 'handle_readlink':
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:135: warning: implicit
declaration of function 'readlinkat'
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:135: warning: nested extern
declaration of 'readlinkat'
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c: In function 'handle_opendir':
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:167: warning: implicit
declaration of function 'fdopendir'
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:167: warning: nested extern
declaration of 'fdopendir'
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:167: warning: assignment
makes pointer from integer without a cast
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c: In function 'handle_mknod':
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:265: warning: implicit
declaration of function 'mknodat'
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:265: warning: nested extern
declaration of 'mknodat'
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c: In function 'handle_mkdir':
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:283: warning: implicit
declaration of function 'mkdirat'
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:283: warning: nested extern
declaration of 'mkdirat'
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c: In function 'handle_symlink':
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:333: warning: implicit
declaration of function 'symlinkat'
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:333: warning: nested extern
declaration of 'symlinkat'
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c: In function 'handle_link':
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:363: warning: implicit
declaration of function 'linkat'
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:363: warning: nested extern
declaration of 'linkat'
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c: In function 'handle_renameat':
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:570: warning: implicit
declaration of function 'renameat'
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:570: warning: nested extern
declaration of 'renameat'
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c: In function 'handle_unlinkat':
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:593: warning: implicit
declaration of function 'unlinkat'
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:593: warning: nested extern
declaration of 'unlinkat'
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c: In function
'handle_ioc_getversion':
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:616: error:
'FS_IOC_GETVERSION' undeclared (first use in this function)
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:616: error: (Each undeclared
identifier is reported only once
/home/erik/qemu-1.0/hw/9pfs/virtio-9p-handle.c:616: error: for each
function it appears in.)
make[2]: *** [9pfs/virtio-9p-handle.o] Error 1
make[1]: *** [subdir-libhw64] Error 2
make[1]: Leaving directory `/home/erik/qemu-1.0'


Can you try the below patch ?

 From 08397638951fd3c131a6145c849f0a218aa52347 Mon Sep 17 00:00:00 2001
From: "Aneesh Kumar K.V"
Date: Tue, 29 Nov 2011 15:10:52 +0530
Subject: [PATCH] hw/9pfs: Improve portability to older systems

handle fs driver require a set of newly added syscalls. Don't
Compile handle FS driver if those syscalls are not available.
Instead of adding #ifdef for all those syscalls we check for
open by handle syscall. If that is available then rest of the
syscalls used by the driver should be available.

Signed-off-by: Aneesh Kumar K.V
---
  Makefile.objs  |4 ++--
  fsdev/qemu-fsdev.c |2 ++
  hw/9pfs/virtio-9p-handle.c |   33 -
  3 files changed, 4 insertions(+), 35 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index d7a6539..3a699ee 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -310,8 +310,8 @@ hw-obj-$(CONFIG_SOUND) += $(sound-obj-y)
  9pfs-nested-$(CONFIG_VIRTFS)

Re: [Qemu-devel] [PATCH v2 09/18] qapi: allow a 'gen' key to suppress code generation

2011-12-08 Thread Kevin Wolf

Am 02.12.2011 21:20, schrieb Anthony Liguori:
> Signed-off-by: Anthony Liguori 
> ---
>  scripts/qapi-commands.py |1 +
>  scripts/qapi-types.py|1 +
>  2 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/scripts/qapi-commands.py b/scripts/qapi-commands.py
> index f7def16..54d1f5d 100644
> --- a/scripts/qapi-commands.py
> +++ b/scripts/qapi-commands.py
> @@ -405,6 +405,7 @@ except os.error, e:
>  
>  exprs = parse_schema(sys.stdin)
>  commands = filter(lambda expr: expr.has_key('command'), exprs)
> +commands = filter(lambda expr: not expr.has_key('gen'), commands)

Might be confusing if someone uses something like gen = True. Shouldn't
we better check the value instead of just existence?

Kevin

Re: [Qemu-devel] [PATCH v2] trace: Provide a per-event status define for conditional compilation

2011-12-08 Thread Stefan Hajnoczi

2011/12/6 Lluís Vilanova :
> Adds a 'TRACE_${NAME}_ENABLED' preprocessor define for each tracing event in
> "trace.h".

I think we should take it a step further: support
TRACE_${NAME}_ENABLED at run-time.  This means skipping the trace
event code if the event is currently disabled.  If the user enables it
at runtime then the if (TRACE_${NAME}_ENABLED) will evaluate to true
and we execute the code.

SystemTap has support for this and it would be easy to do runtime
support for stderr and simple (where we can test
trace_list[event_id].state).

The SystemTap mechanism is a macro, e.g. QEMU_${NAME}_ENABLED(), which
fits perfectly with what you have posted.

What do you think?

(The difference is that your patch plus compiler dead-code elimination
would completely remove the code when built without a trace event.
What I'm suggesting becomes a test and branch to skip over the code
which always gets compiled in.)

Stefan

Re: [Qemu-devel] [PATCH v2 13/18] dev: add an anonymous peripheral container

2011-12-08 Thread Kevin Wolf

Am 02.12.2011 21:20, schrieb Anthony Liguori:
> Signed-off-by: Anthony Liguori 
> ---
>  hw/qdev.c |   21 -
>  1 files changed, 20 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/qdev.c b/hw/qdev.c
> index af4c6a2..5348f26 100644
> --- a/hw/qdev.c
> +++ b/hw/qdev.c
> @@ -240,6 +240,19 @@ static DeviceState *qdev_get_peripheral(void)
>  return dev;
>  }
>  
> +static DeviceState *qdev_get_peripheral_anon(void)
> +{
> +static DeviceState *dev;
> +
> +if (dev == NULL) {
> +dev = qdev_create(NULL, "container");
> +qdev_property_add_child(qdev_get_root(), "peripheral-anon", dev, 
> NULL);
> +qdev_init_nofail(dev);
> +}
> +
> +return dev;
> +}
> +
>  DeviceState *qdev_device_add(QemuOpts *opts)
>  {
>  const char *driver, *path, *id;
> @@ -292,7 +305,13 @@ DeviceState *qdev_device_add(QemuOpts *opts)
>  if (id) {
>  qdev->id = id;
>  qdev_property_add_child(qdev_get_peripheral(), qdev->id, qdev, NULL);
> -}
> +} else {
> +static int anon_count;
> +gchar *name = g_strdup_printf("device[%d]", anon_count++);

Does any code depend on this name? If not, I would suggest making it a
bit more convenient for users: g_strdump_printf("%s[%d]", info->name,
info->anon_count++)

Kevin

Re: [Qemu-devel] [PATCH v2 13/18] dev: add an anonymous peripheral container

2011-12-08 Thread Anthony Liguori


On 12/08/2011 10:27 AM, Kevin Wolf wrote:

Am 02.12.2011 21:20, schrieb Anthony Liguori:

Signed-off-by: Anthony Liguori
---
  hw/qdev.c |   21 -
  1 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/hw/qdev.c b/hw/qdev.c
index af4c6a2..5348f26 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -240,6 +240,19 @@ static DeviceState *qdev_get_peripheral(void)
  return dev;
  }

+static DeviceState *qdev_get_peripheral_anon(void)
+{
+static DeviceState *dev;
+
+if (dev == NULL) {
+dev = qdev_create(NULL, "container");
+qdev_property_add_child(qdev_get_root(), "peripheral-anon", dev, NULL);
+qdev_init_nofail(dev);
+}
+
+return dev;
+}
+
  DeviceState *qdev_device_add(QemuOpts *opts)
  {
  const char *driver, *path, *id;
@@ -292,7 +305,13 @@ DeviceState *qdev_device_add(QemuOpts *opts)
  if (id) {
  qdev->id = id;
  qdev_property_add_child(qdev_get_peripheral(), qdev->id, qdev, NULL);
-}
+} else {
+static int anon_count;
+gchar *name = g_strdup_printf("device[%d]", anon_count++);


Does any code depend on this name? If not, I would suggest making it a
bit more convenient for users: g_strdump_printf("%s[%d]", info->name,
info->anon_count++)


Nothing does and that's a great suggestion, thanks!

Regards,

Anthony Liguori



Kevin

Re: [Qemu-devel] [PATCH v2 07/18] qom: add child properties (composition) (v2)

2011-12-08 Thread Anthony Liguori


On 12/08/2011 09:38 AM, Kevin Wolf wrote:

Am 02.12.2011 21:20, schrieb Anthony Liguori:

Child properties express a relationship of composition.

Signed-off-by: Anthony Liguori
---
v1 ->  v2
  - fix comments (Kevin)
  - add a reference when adding a child property (Kevin)
---
  hw/qdev.c |   26 ++
  hw/qdev.h |   20 
  2 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/hw/qdev.c b/hw/qdev.c
index 2519f00..fa6b489 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -1174,6 +1174,32 @@ DeviceState *qdev_get_root(void)
  return qdev_root;
  }

+static void qdev_get_child_property(DeviceState *dev, Visitor *v, void *opaque,
+const char *name, Error **errp)
+{
+DeviceState *child = opaque;
+gchar *path;
+
+path = qdev_get_canonical_path(child);
+visit_type_str(v,&path, name, errp);
+g_free(path);
+}
+
+void qdev_property_add_child(DeviceState *dev, const char *name,
+ DeviceState *child, Error **errp)
+{
+gchar *type;
+
+type = g_strdup_printf("child<%s>", child->info->name);
+
+qdev_property_add(dev, name, type, qdev_get_child_property,
+  NULL, NULL, child, errp);
+
+qdev_ref(dev);


Shouldn't you increase the refcount for child rather than dev?


Indeed, I've fixed it in my tree.  Thanks.

Regards,

Anthony Liguori



Kevin

Re: [Qemu-devel] [PATCH v2 09/18] qapi: allow a 'gen' key to suppress code generation

2011-12-08 Thread Anthony Liguori


On 12/08/2011 10:04 AM, Kevin Wolf wrote:

Am 02.12.2011 21:20, schrieb Anthony Liguori:

Signed-off-by: Anthony Liguori
---
  scripts/qapi-commands.py |1 +
  scripts/qapi-types.py|1 +
  2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/scripts/qapi-commands.py b/scripts/qapi-commands.py
index f7def16..54d1f5d 100644
--- a/scripts/qapi-commands.py
+++ b/scripts/qapi-commands.py
@@ -405,6 +405,7 @@ except os.error, e:

  exprs = parse_schema(sys.stdin)
  commands = filter(lambda expr: expr.has_key('command'), exprs)
+commands = filter(lambda expr: not expr.has_key('gen'), commands)


Might be confusing if someone uses something like gen = True. Shouldn't
we better check the value instead of just existence?


Yes, it was a lazy short cut.  I'll fix appropriately.

Regards,

Anthony Liguori



Kevin

Re: [Qemu-devel] [PATCH 0/2] qemu-io tests: More fine grained control of qemu paths

2011-12-08 Thread Christoph Hellwig

Thanks a lot Lucas,

I've applied the patches.  And sorry for the delay, I'm pretty busy at the
moment.

Re: [Qemu-devel] [PATCH v2] trace: Provide a per-event status define for conditional compilation

2011-12-08 Thread Lluís Vilanova

Stefan Hajnoczi writes:

> 2011/12/6 Lluís Vilanova :
>> Adds a 'TRACE_${NAME}_ENABLED' preprocessor define for each tracing event in
>> "trace.h".

> I think we should take it a step further: support
> TRACE_${NAME}_ENABLED at run-time.  This means skipping the trace
> event code if the event is currently disabled.  If the user enables it
> at runtime then the if (TRACE_${NAME}_ENABLED) will evaluate to true
> and we execute the code.

> SystemTap has support for this and it would be easy to do runtime
> support for stderr and simple (where we can test
> trace_list[event_id].state).

> The SystemTap mechanism is a macro, e.g. QEMU_${NAME}_ENABLED(), which
> fits perfectly with what you have posted.

> What do you think?

I think both are useful. In fact, AFAIR Blue Swirl did already propose a hybrid
mechanism like you say, but I just didn't find the time to do it then as well as
forgot about it up until now.

My ideal hybrid solution would have three forms:

* Event is disabled in "trace-events" (which is then in fact using the nop
  backend):

static inline bool trace_${name}_enabled(void)
{
return false;
}

* Event is enabled in "trace-events" (e.g., using SystemTap):

static inline bool trace_${name}_enabled(void)
{
return QEMU_${NAME}_ENABLED();
}

  tracetool should generate extra per-backend code here.

* Event is enabled and instrumented:

Let the user decide, including returning a constant "true" to eliminate all
run-time checks.

The next logical extension is to have a name/number based generic interface to
get the state:

event_t trace_event_get_id(char *name);
char* trace_event_get_name(event_t event);
bool trace_event_get_state(event_t event);
bool trace_event_name_get_state(char *name)
{
return trace_event_get_state(trace_event_get_id(name));
}

with the corresponding setter counterparts:

bool trace_event_set_state(event_t e);
bool trace_event_name_set_state(char *name)
{
return trace_event_set_state(trace_event_get_id(name));
}

This needs some extra backend-specific code that I cannot do right now.

Besides, this loses the ability to inline constant checks unless
"trace_${name}_enabled" is maintained:

static inline bool trace_${name}_enabled(void)
{
return TRACE_${NAME}_ENABLED &&  /* in current patch   */
   /* trace_event_get_state? */; /* should be optional */
}

> (The difference is that your patch plus compiler dead-code elimination
> would completely remove the code when built without a trace event.
> What I'm suggesting becomes a test and branch to skip over the code
> which always gets compiled in.)

The problem here is how to have the three options available. You could have a
new "instrument" backend, but that would then require the user to do all the
work. As it is right now (the instrumentation), the user can simply put ad-hoc
code in between, but still use the regular QEMU backends when necessary.

Of course, one could argue this constant-based optimization is just not
necessary at all. I just don't have numbers.

Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth

[Qemu-devel] [RFC] qemu-ga: Introduce guest-hibernate command

2011-12-08 Thread Luiz Capitulino

This is basically suspend to disk on a Linux guest.

Signed-off-by: Luiz Capitulino 
---

This is an RFC because I did it as simple as possible and I'm open to
suggestions...

Now, while testing this or even "echo disk > /sys/power/state" I get several
funny results. Some times qemu just dies after printing that message:

 "Guest moved used index from 20151 to 1"

Some times it doesn't die, but I'm unable to log into the guest: I type
username & password but the terminal kind of locks (the shell doesn't run).

Some times it works...

 qapi-schema-guest.json |   11 +++
 qga/guest-agent-commands.c |   19 +++
 2 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/qapi-schema-guest.json b/qapi-schema-guest.json
index fde5971..2c5bbcf 100644
--- a/qapi-schema-guest.json
+++ b/qapi-schema-guest.json
@@ -215,3 +215,14 @@
 ##
 { 'command': 'guest-fsfreeze-thaw',
   'returns': 'int' }
+
+##
+# @guest-hibernate
+#
+# Save RAM contents to disk and powerdown the guest.
+#
+# Notes: This command doesn't return on success.
+#
+# Since: 1.1
+##
+{ 'command': 'guest-hibernate' }
diff --git a/qga/guest-agent-commands.c b/qga/guest-agent-commands.c
index 6da9904..9dd4060 100644
--- a/qga/guest-agent-commands.c
+++ b/qga/guest-agent-commands.c
@@ -550,6 +550,25 @@ int64_t qmp_guest_fsfreeze_thaw(Error **err)
 }
 #endif
 
+#define LINUX_SYS_STATE_FILE "/sys/power/state"
+
+void qmp_guest_hibernate(Error **err)
+{
+int fd;
+
+fd = open(LINUX_SYS_STATE_FILE, O_WRONLY);
+if (fd < 0) {
+error_set(err, QERR_OPEN_FILE_FAILED, LINUX_SYS_STATE_FILE);
+return;
+}
+
+if (write(fd, "disk", 4) < 0) {
+error_set(err, QERR_UNDEFINED_ERROR);
+}
+
+close(fd);
+}
+
 /* register init/cleanup routines for stateful command groups */
 void ga_command_state_init(GAState *s, GACommandState *cs)
 {
-- 
1.7.8.110.g4cb5d1.dirty

Re: [Qemu-devel] [PATCH v2 0/5] backdoor: lightweight guest-to-QEMU backdoor channel

2011-12-08 Thread Lluís Vilanova

Stefan Hajnoczi writes:
[...]
>> * Support for tracing guest code through TCG.

> I'm not clear on whether a backdoor mechanism is needed or not.  A
> backdoor mechanism allows a modified guest to participate in tracing.

> Depending on the type of analysis you are doing it might be possible
> to achieve the same thing by emitting calls to custom instrumentation
> in TCG (or is this too low-level for what you're trying to trace in
> the guest?).  The advantage is also that you can do this to unmodified
> guests.

You're right with the first; it exists for two reasons:

* simplicity: it's easier to have, for example, a backdoor + linux tracepoints
  than obtaining the IP of an "interesting" function (think of the
  multiprogramming on a full system)

* performance: it's faster to let the guest tell you rather than (ab)using
  breakpoints or checking the IP of every instruction

The main use of this is to get semanticful events on the guest (my previous
example was about knowing when linux changes the current process, but you could
use anything else as well - e.g., a certain function of a specific thread on the
guest -).

>> * Support for different sets of enabled guest tracing events on a per-vCPU 
>> basis.

> This feature is very specific but potentially useful.  I haven't
> reviewed patches yet but I can imagine that this would be merged.

I'll send it after the initial TCG tracing support.

>> * Support for instrumenting tracing events (including those for guest code).

> Although I typically use post-processing rather than filtering, I can
> see your point about the power and flexibility in being able to react
> on the spot to events inside QEMU.  Given the ad-hoc nature of the
> instrumentation, as well as that this is really in the guts of QEMU
> and TCG, I think we should make it easy to write instrumentation but
> keep the infrastructure and APIs minimal.

> Is there even a need to add "instrumentation" support on top of QEMU
> tracing?  As an instrumentation implementor you are working with the
> QEMU source tree and need to be somewhat familiar with the internals.
> Why not just add custom functions into QEMU and call them from
> relevant points?

Well, the point is to just use the existing tracing points in QEMU. Of course, I
could just modify tracetool and add support for a new tracing backend that uses
my library.

The intention was just to make it brain-dead simple and repeatable (no manual
editing of the auto-generated tracing files), as well as able to reuse existing
backends when the user sees fit.

> The documentation and tracetool extensions you posted provide some
> structure but IMO they don't cut down the workflow significantly over
> adding plain old functions in places of interest.  i.e. by the time I
> have read the docs and begun trying to add instrumentation I could
> have already added a custom function in QEMU and built a binary.  Plus
> it's harder to understand/debug instrumentation code if it sits on top
> of the tracing instrumentation - yet another layer to understand.

True. Maybe an unofficial [1] instrumentation tracing backend would be simpler
to use and maintain.

[1] I'm assuming from Anthony's comments that such a thing would not be accepted
upstream.

> So I'm suggesting that we don't *need* explicit support for
> instrumenting trace events.  Instead add plain old functions where you
> need them.  You may decide to invoke trace events from your
> instrumentation function, but that's already covered today by
> docs/tracing.txt.

> Am I missing the point of instrumentation tracing events or do you
> agree that we can work well without it?

The point is to avoid diving into QEMU's code and instead use the current
out-of-the-box tracing points as the instrumentation hooks, which can be
achieved both through the current approach or an unofficial tracing backend.

The nice thing about the current approach is that the user can do some extra
checks on the tracing event and then (maybe) call the original backend-generated
tracing routine (as a quick and easy way to extend what to actually trace).

Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth

Re: [Qemu-devel] [PATCH V4 00/13] Proxy FS driver for VirtFS

2011-12-08 Thread Stefan Hajnoczi

On Mon, Dec 05, 2011 at 09:48:37PM +0530, M. Mohan Kumar wrote:
> From: "M. Mohan Kumar" 
> 
> Pass-through security model in QEMU 9p server needs root privilege to do
> few file operations (like chown, chmod to any mode/uid:gid).  There are two
> issues in pass-through security model
> 
> 1) TOCTTOU vulnerability: Following symbolic links in the server could
> provide access to files beyond 9p export path.
> 
> 2) Running QEMU with root privilege could be a security issue.
> 
> To overcome above issues, following approach is used: A new filesytem
> type 'proxy' is introduced. Proxy FS uses chroot + socket combination
> for securing the vulnerability known with following symbolic links.
> Intention of adding a new filesystem type is to allow qemu to run
> in non-root mode, but doing privileged operations using socket IO.

Fails to build against qemu.git/master (217bfb4):

  CClibhw64/9pfs/virtio-9p-proxy.o
hw/9pfs/virtio-9p-proxy.c:1195:5: error: unknown field ‘parse_opts’ specified 
in initializer
hw/9pfs/virtio-9p-proxy.c:1195:5: warning: initialization from incompatible 
pointer type [enabled by default]
hw/9pfs/virtio-9p-proxy.c:1195:5: warning: (near initialization for 
‘proxy_ops.init’) [enabled by default]

Is this against another public tree?

Stefan

Re: [Qemu-devel] [PATCH V4 02/13] hw/9pfs: Add validation to {un}marshal code

2011-12-08 Thread Stefan Hajnoczi

On Mon, Dec 05, 2011 at 09:48:39PM +0530, M. Mohan Kumar wrote:
> @@ -187,59 +190,70 @@ size_t v9fs_unmarshal(struct iovec *out_sg, int 
> out_num, size_t offset,
>  }
>  case 's': {
>  V9fsString *str = va_arg(ap, V9fsString *);
> -offset += v9fs_unmarshal(out_sg, out_num, offset, bswap,
> -"w", &str->size);
> -/* FIXME: sanity check str->size */
> -str->data = g_malloc(str->size + 1);
> -offset += v9fs_unpack(str->data, out_sg, out_num, offset,
> -str->size);
> -str->data[str->size] = 0;
> +copied = v9fs_unmarshal(out_sg, out_num, offset, bswap,
> +"w", &str->size);
> +if (copied > 0) {
> +offset += copied;
> +str->data = g_malloc(str->size + 1);

str->size is signed int16_t, we need a check or the type should be
uint16_t.

Stefan

Re: [Qemu-devel] [PATCH V4 04/13] hw/9pfs: File system helper process for qemu 9p proxy FS

2011-12-08 Thread Stefan Hajnoczi

On Mon, Dec 05, 2011 at 09:48:41PM +0530, M. Mohan Kumar wrote:
> +static int read_request(int sockfd, struct iovec *iovec, ProxyHeader *header)
> +{
> +int retval;
> +
> +/*
> + * read the request header.
> + */
> +iovec->iov_len = 0;
> +retval = socket_read(sockfd, iovec->iov_base, PROXY_HDR_SZ);
> +if (retval < 0) {
> +return retval;
> +}
> +iovec->iov_len = PROXY_HDR_SZ;
> +retval = proxy_unmarshal(iovec, 0, "dd", &header->type, &header->size);
> +if (retval < 0) {
> +return retval;
> +}
> +/*
> + * We can't process message.size > PROXY_MAX_IO_SZ, read the complete
> + * message from the socket and ignore it. This ensures that
> + * we can correctly handle the next request. We also return
> + * ENOBUFS as error to indicate we ran out of buffer space.
> + */
> +if (header->size > PROXY_MAX_IO_SZ) {
> +int count, size;
> +size = header->size;
> +while (size > 0) {
> +count = MIN(PROXY_MAX_IO_SZ, size);
> +count = socket_read(sockfd, iovec->iov_base + PROXY_HDR_SZ, 
> count);
> +if (count < 0) {
> +return count;
> +}
> +size -= count;
> +}

I'm not sure recovery attempts are worthwhile here.  The client is
buggy, perhaps just refuse further work.

> +return -ENOBUFS;
> +}

header->size is (signed) int and we didn't check for header->size < 0.
Please use an unsigned type.

> +if (chroot(rpath) < 0) {
> +do_perror("chroot");
> +goto error;
> +}
> +umask(0);

We haven't changed into the chroot yet, we need chdir("/").  Otherwise
the current working directory is outside the chroot (and allows trivial
escape).

Re: [Qemu-devel] [RFC] qemu-ga: Introduce guest-hibernate command

2011-12-08 Thread Daniel P. Berrange

On Thu, Dec 08, 2011 at 04:52:58PM -0200, Luiz Capitulino wrote:
> This is basically suspend to disk on a Linux guest.

> Now, while testing this or even "echo disk > /sys/power/state" I get several
> funny results. Some times qemu just dies after printing that message:

I think you should really be invoking  'pm-hibernate', because IIUC there
are others tasks that need to be done, besides just telling the kernel
to hibernate via sysfs.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] [RFC] qemu-ga: Introduce guest-hibernate command

2011-12-08 Thread Luiz Capitulino

On Thu, 8 Dec 2011 19:07:00 +
"Daniel P. Berrange"  wrote:

> On Thu, Dec 08, 2011 at 04:52:58PM -0200, Luiz Capitulino wrote:
> > This is basically suspend to disk on a Linux guest.
> 
> > Now, while testing this or even "echo disk > /sys/power/state" I get several
> > funny results. Some times qemu just dies after printing that message:
> 
> I think you should really be invoking  'pm-hibernate', because IIUC there
> are others tasks that need to be done, besides just telling the kernel
> to hibernate via sysfs.

The good thing about it is that it will work among all distros... Maybe
we should try to run '/usr/sbin/pm-hibernate' first, and only write to
the sysfs file if that fails.

[Qemu-devel] Dropping the MONITOR_CMD_ASYNC

2011-12-08 Thread Luiz Capitulino

Hi there,

I'm about to completely drop the MONITOR_CMD_ASYNC API, but it turns out that
the command client_migrate_info uses it. That's a legacy interface and has to
be dropped, no command should be using it...

Something tells me that if I just drop it (and change the command to use the
regular interface), bad things will happen. Am I right? :)

Re: [Qemu-devel] [PULL 00/35] VMState port of all cpus

2011-12-08 Thread Blue Swirl

On Wed, Dec 7, 2011 at 18:42, Juan Quintela  wrote:
> The following changes since commit 217bfb445b54db618a30f3a39170bebd9fd9dbf2:
>
>  hw/arm_gic.c: Ignore attempts to complete nonexistent IRQs (2011-12-05 
> 21:38:56 +0100)
>
> are available in the git repository at:
>  ssh://repo.or.cz/srv/git/qemu/quintela.git vmstate-cpus-v4-for-anthony
>
> [v4]
> - change comment:
>    * Bassed on qemu-file code done by:
>  to
>   * Based on savevm serialization code by:
> - dropped sparc/ppc copyright notices.  After two tries, I haven't got
>  something that Blae is happy with.

I don't have a strong preference, except that if a list exists, it
should be a bit more accurate than just one or two random
contributors. I'd be happy with Paolo's generic "QEMU contributors"
version too, or for example a list of Fabrice, me, Red Hat and "QEMU
contributors" to cover the rest.

>
> [v3]
> - rebase to top
> - fix sparc/arm/i386 changes in upstream
> - all reviews were positive, Anthony, please pull
>
> [v2] Changes since v1
>
> - preserve arm comment that was missing (pbrook)
> - add copyright notice to the files that were empty
> - new patches:
>  * fix formating for i386
>  * remove unneeded includes
>  * rename machine.c to vmstate.c
>
> Later, Juan.
>
> [v1]
>
> This series port all cpus to use vmstate.
> - 1st patch is a fix of vmstate.
> - I discussed the arm changes over irc with Peter, he agreed that some
>  simplification could be good, but he didn't saw the patches O:-)
> - mips: no pci chipset has been ported, so migration don't work there.
>  I have embedded a couple of structs to improve vmstate checking.  Notice
>  that they were always allocated, so there shouldn't be any problem.
> - sparc: I changed the format a little bit to be able to use normal arrays.
> - sparc: If we always send the whole register windows, we don't need
>  VMSTATE_VARRAY_MULTIPLY.  As that array is quite big (520 elements), I am not
>  sure what is best.
> - cpsr_vmstate on arm: I am not sure if I could "abuse" uncached_cpsr for that
>  purpose?
>
> I have only tested on x86, for the rest, I double checked, but it is
> possible that I missed something.  I expect all patches to be
> integrated by Anthony in one go.  Architecture maintainers are CC'd
> for an ACK/NACK/comments.
>
> Please, review.
>
> PD. Is there an easy way of creating this "CC" list of mail addresses,
>    or the only way is to edit comments and write it by hand as I did?
>
>
> Juan Quintela (35):
>      vmstate: Fix VMSTATE_VARRAY_UINT32
>      vmstate: Simplify test for CPU_SAVE_VERSION
>      vmstate: make all architectures export a way to migrate cpu's
>      vmstate: unicore32 don't support cpu migration
>      vmstate: use new cpu style for x86
>      vmstate: use new style for lm32 cpus
>      vmstate: make microblaze cpus not migrateable
>      vmstate: port cris cpu to vmstate
>      vmstate: machine.c is only compiled for !CONFIG_USER_ONLY
>      vmstate: introduce float32 arrays
>      vmstate: introduce float64 arrays
>      vmstate: introduce CPU_DoubleU arrays
>      vmstate: Introduce VMSTATE_STRUCT_VARRAY_INT32_TEST
>      vmstate: port ppc cpu
>      vmstate: introduce VMSTATE_VARRAY_MULTIPLY
>      vmstate: define vmstate_info_uinttls
>      vmstate: port sparc cpu
>      vmstate: make incompatible change for sparc
>      mips_fulong2e: cpu vmstate already registered in cpu_exec_init
>      mips: make mvp an embedded struct instead of a pointer
>      mips: make tlb an embedded struct instead of a pointer
>      mips: bump migration version to 4
>      vmstate: port mips cpu
>      arm: save always 32 fpu registers
>      vmstate: port arm cpu
>      vmstate: all cpus converted
>      vmstate: fix vmstate formating for i386
>      vmstate: remove unneeded includes from target-*/machine.c
>      vmstate: rename machine.c to vmstate-cpu.c
>      vmstate: Add copyright info for alpha processor
>      vmstate: Add copyright info for lm32 processor
>      vmstate: Add copyright info for cris processor
>      vmstate: Add copyright info for arm processor
>      vmstate: Add copyright info for i386 processor
>      vmstate: Add copyright info for mips processor
>
>  Makefile.target                            |    3 +-
>  exec.c                                     |    7 +-
>  hw/hw.h                                    |   62 +-
>  hw/mips_fulong2e.c                         |    1 -
>  hw/mips_malta.c                            |    4 +-
>  hw/mips_timer.c                            |    2 +-
>  hw/sun4u.c                                 |   20 --
>  qemu-common.h                              |    4 -
>  savevm.c                                   |   92 +
>  target-alpha/{machine.c => vmstate-cpu.c}  |   28 ++-
>  target-arm/cpu.h                           |    5 +-
>  target-arm/machine.c                       |  225 
>  target-arm/vmstate-cpu.c                   |  188 +
>  target-cris/cpu.h

Re: [Qemu-devel] [PATCH v2 0/5] backdoor: lightweight guest-to-QEMU backdoor channel

2011-12-08 Thread Blue Swirl

On Wed, Dec 7, 2011 at 18:54, Anthony Liguori  wrote:
> On 12/07/2011 12:51 PM, Peter Maydell wrote:
>>
>> 2011/12/7 Lluís Vilanova:
>>>
>>> Anthony Liguori writes:
>>> [...]

 Why should this analyzer live outside of QEMU in the first place?  I
 fail to see
 the rationale for that other than not wanting to do the work of making
 it
 suitable for upstream consumption.
>>>
>>>
>>> For the same reason that SystemTap lets you add user-provided code into
>>> the
>>> trace hooks, you never know what the user actually wants. The difference
>>> is that
>>> when dealing with traces that require a high bandwidth, different people
>>> may be
>>> interested on analyzing different events and under different conditions,
>>> and
>>> having a JIT in QEMU comes in handy to optimize the traces. This includes
>>> the
>>> generation of extra tracing code at translation time (e.g., I generate
>>> checks in
>>> TCG to establish when I want to trace a specific event, and someone else
>>> may
>>> just want to increment a counter using TCG code).
>>>
>>> Guest trace instrumentation turns QEMU into a highly-performant tool for
>>> dynamic
>>> binary instrumentation, with all the benefits that stem from QEMU
>>> (support for a
>>> myriad of target architectures, as well as support for both full-system
>>> and
>>> user-level applications).
>>
>>
>> I think this *is* useful. However I also think that it *is* effectively
>> defining an API for people writing this hook code, and as such we ought
>> to do it properly if we're going to do it. (ie nail down what we are
>> providing for hook authors and don't let them grub around in the internals
>> otherwise).
>
>
> I strongly suspect that you could define interfaces in QEMU such that you
> could do most of what you want without needing to link any code against QEMU
> itself.

I don't see linking as the problem, the instrumenting user who
modifies QEMU needs to follow licensing if the result is ever
distributed.

But the API issue can be a problem. Reusing and extending tracepoints
should help API stability from an instrumenting user point of view and
a rich set of various static and dynamic tracepoint mechanisms should
only be helpful for development of QEMU. But also I agree that the
internals of QEMU shouldn't become an API for any user code which
happens to access them.

> This is probably a case where LUA integration might make a lot of sense.
>
> Regards,
>
> Anthony Liguori
>
>>
>> -- PMM
>>
>>
>

Re: [Qemu-devel] [PATCH v2 0/5] backdoor: lightweight guest-to-QEMU backdoor channel

2011-12-08 Thread Blue Swirl

2011/12/8 Lluís Vilanova :
> Stefan Hajnoczi writes:
> [...]
>>> * Support for tracing guest code through TCG.
>
>> I'm not clear on whether a backdoor mechanism is needed or not.  A
>> backdoor mechanism allows a modified guest to participate in tracing.
>
>> Depending on the type of analysis you are doing it might be possible
>> to achieve the same thing by emitting calls to custom instrumentation
>> in TCG (or is this too low-level for what you're trying to trace in
>> the guest?).  The advantage is also that you can do this to unmodified
>> guests.
>
> You're right with the first; it exists for two reasons:
>
> * simplicity: it's easier to have, for example, a backdoor + linux tracepoints
>  than obtaining the IP of an "interesting" function (think of the
>  multiprogramming on a full system)
>
> * performance: it's faster to let the guest tell you rather than (ab)using
>  breakpoints or checking the IP of every instruction

I still think that a breakpoint based system could be a useful method
in addition to others, simply because it is entirely invisible to the
guest and can then be used with unmodified and non-cooperating guests.
Properly implemented, it should not have much overhead.

> The main use of this is to get semanticful events on the guest (my previous
> example was about knowing when linux changes the current process, but you 
> could
> use anything else as well - e.g., a certain function of a specific thread on 
> the
> guest -).
>
>
>>> * Support for different sets of enabled guest tracing events on a per-vCPU 
>>> basis.
>
>> This feature is very specific but potentially useful.  I haven't
>> reviewed patches yet but I can imagine that this would be merged.
>
> I'll send it after the initial TCG tracing support.
>
>
>>> * Support for instrumenting tracing events (including those for guest code).
>
>> Although I typically use post-processing rather than filtering, I can
>> see your point about the power and flexibility in being able to react
>> on the spot to events inside QEMU.  Given the ad-hoc nature of the
>> instrumentation, as well as that this is really in the guts of QEMU
>> and TCG, I think we should make it easy to write instrumentation but
>> keep the infrastructure and APIs minimal.
>
>> Is there even a need to add "instrumentation" support on top of QEMU
>> tracing?  As an instrumentation implementor you are working with the
>> QEMU source tree and need to be somewhat familiar with the internals.
>> Why not just add custom functions into QEMU and call them from
>> relevant points?
>
> Well, the point is to just use the existing tracing points in QEMU. Of 
> course, I
> could just modify tracetool and add support for a new tracing backend that 
> uses
> my library.
>
> The intention was just to make it brain-dead simple and repeatable (no manual
> editing of the auto-generated tracing files), as well as able to reuse 
> existing
> backends when the user sees fit.
>
>
>> The documentation and tracetool extensions you posted provide some
>> structure but IMO they don't cut down the workflow significantly over
>> adding plain old functions in places of interest.  i.e. by the time I
>> have read the docs and begun trying to add instrumentation I could
>> have already added a custom function in QEMU and built a binary.  Plus
>> it's harder to understand/debug instrumentation code if it sits on top
>> of the tracing instrumentation - yet another layer to understand.
>
> True. Maybe an unofficial [1] instrumentation tracing backend would be simpler
> to use and maintain.
>
> [1] I'm assuming from Anthony's comments that such a thing would not be 
> accepted
>    upstream.
>
>
>> So I'm suggesting that we don't *need* explicit support for
>> instrumenting trace events.  Instead add plain old functions where you
>> need them.  You may decide to invoke trace events from your
>> instrumentation function, but that's already covered today by
>> docs/tracing.txt.
>
>> Am I missing the point of instrumentation tracing events or do you
>> agree that we can work well without it?
>
> The point is to avoid diving into QEMU's code and instead use the current
> out-of-the-box tracing points as the instrumentation hooks, which can be
> achieved both through the current approach or an unofficial tracing backend.
>
> The nice thing about the current approach is that the user can do some extra
> checks on the tracing event and then (maybe) call the original 
> backend-generated
> tracing routine (as a quick and easy way to extend what to actually trace).

Tracepoints shouldn't also suffer from bit rot so easily.

>
>
> Lluis
>
> --
>  "And it's much the same thing with knowledge, for whenever you learn
>  something new, the whole world becomes that much richer."
>  -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
>  Tollbooth

Re: [Qemu-devel] [PATCH v4 12/15] kvm: x86: Add user space part for in-kernel APIC

2011-12-08 Thread Blue Swirl

On Thu, Dec 8, 2011 at 11:52, Jan Kiszka  wrote:
> This introduces the alternative APIC backend which makes use of KVM's
> in-kernel device model. External NMI injection via LINT1 is emulated by
> checking the current state of the in-kernel APIC, only injecting a NMI
> into the VCPU if LINT1 is unmasked and configured to DM_NMI.
>
> MSI is not yet supported, so we disable this when the in-kernel model is
> in use.
>
> CC: Lai Jiangshan 
> Signed-off-by: Jan Kiszka 
> ---
>  Makefile.target   |    2 +-
>  hw/kvm/apic.c     |  154 
> +
>  hw/pc.c           |   15 --
>  kvm.h             |    3 +
>  target-i386/kvm.c |    8 +++
>  5 files changed, 176 insertions(+), 6 deletions(-)
>  create mode 100644 hw/kvm/apic.c
>
> diff --git a/Makefile.target b/Makefile.target
> index b549988..76de485 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -236,7 +236,7 @@ obj-i386-y += vmport.o
>  obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o
>  obj-i386-y += debugcon.o multiboot.o
>  obj-i386-y += pc_piix.o
> -obj-i386-$(CONFIG_KVM) += kvm/clock.o
> +obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o
>  obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
>
>  # shared objects
> diff --git a/hw/kvm/apic.c b/hw/kvm/apic.c
> new file mode 100644
> index 000..3924f9e
> --- /dev/null
> +++ b/hw/kvm/apic.c
> @@ -0,0 +1,154 @@
> +/*
> + * KVM in-kernel APIC support
> + *
> + * Copyright (c) 2011 Siemens AG
> + *
> + * Authors:
> + *  Jan Kiszka          
> + *
> + * This work is licensed under the terms of the GNU GPL version 2.
> + * See the COPYING file in the top-level directory.
> + */
> +#include "hw/apic_internal.h"
> +#include "kvm.h"
> +
> +static inline void kvm_apic_set_reg(struct kvm_lapic_state *kapic,
> +                                   int reg_id, uint32_t val)
> +{
> +    *((uint32_t *)(kapic->regs + (reg_id << 4))) = val;
> +}
> +
> +static inline uint32_t kvm_apic_get_reg(struct kvm_lapic_state *kapic,
> +                                       int reg_id)
> +{
> +    return *((uint32_t *)(kapic->regs + (reg_id << 4)));
> +}
> +
> +int kvm_put_apic(CPUState *env)
> +{
> +    APICState *s = DO_UPCAST(APICState, busdev.qdev, env->apic_state);

Please pass APICState instead of CPUState.

> +    struct kvm_lapic_state kapic;
> +    int i;
> +
> +    if (s && kvm_enabled() && kvm_irqchip_in_kernel()) {
> +        memset(&kapic, 0, sizeof(kapic));
> +        kvm_apic_set_reg(&kapic, 0x2, s->id << 24);
> +        kvm_apic_set_reg(&kapic, 0x8, s->tpr);
> +        kvm_apic_set_reg(&kapic, 0xd, s->log_dest << 24);
> +        kvm_apic_set_reg(&kapic, 0xe, s->dest_mode << 28 | 0x0fff);
> +        kvm_apic_set_reg(&kapic, 0xf, s->spurious_vec);
> +        for (i = 0; i < 8; i++) {
> +            kvm_apic_set_reg(&kapic, 0x10 + i, s->isr[i]);
> +            kvm_apic_set_reg(&kapic, 0x18 + i, s->tmr[i]);
> +            kvm_apic_set_reg(&kapic, 0x20 + i, s->irr[i]);
> +        }
> +        kvm_apic_set_reg(&kapic, 0x28, s->esr);
> +        kvm_apic_set_reg(&kapic, 0x30, s->icr[0]);
> +        kvm_apic_set_reg(&kapic, 0x31, s->icr[1]);
> +        for (i = 0; i < APIC_LVT_NB; i++) {
> +            kvm_apic_set_reg(&kapic, 0x32 + i, s->lvt[i]);
> +        }
> +        kvm_apic_set_reg(&kapic, 0x38, s->initial_count);
> +        kvm_apic_set_reg(&kapic, 0x3e, s->divide_conf);
> +
> +        return kvm_vcpu_ioctl(env, KVM_SET_LAPIC, &kapic);
> +    }
> +
> +    return 0;
> +}
> +
> +int kvm_get_apic(CPUState *env)

Same here.

> +{
> +    APICState *s = DO_UPCAST(APICState, busdev.qdev, env->apic_state);
> +    struct kvm_lapic_state kapic;
> +    int ret, i, v;
> +
> +    if (s && kvm_enabled() && kvm_irqchip_in_kernel()) {
> +        ret = kvm_vcpu_ioctl(env, KVM_GET_LAPIC, &kapic);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +
> +        s->id = kvm_apic_get_reg(&kapic, 0x2) >> 24;
> +        s->tpr = kvm_apic_get_reg(&kapic, 0x8);
> +        s->arb_id = kvm_apic_get_reg(&kapic, 0x9);
> +        s->log_dest = kvm_apic_get_reg(&kapic, 0xd) >> 24;
> +        s->dest_mode = kvm_apic_get_reg(&kapic, 0xe) >> 28;
> +        s->spurious_vec = kvm_apic_get_reg(&kapic, 0xf);
> +        for (i = 0; i < 8; i++) {
> +            s->isr[i] = kvm_apic_get_reg(&kapic, 0x10 + i);
> +            s->tmr[i] = kvm_apic_get_reg(&kapic, 0x18 + i);
> +            s->irr[i] = kvm_apic_get_reg(&kapic, 0x20 + i);
> +        }
> +        s->esr = kvm_apic_get_reg(&kapic, 0x28);
> +        s->icr[0] = kvm_apic_get_reg(&kapic, 0x30);
> +        s->icr[1] = kvm_apic_get_reg(&kapic, 0x31);
> +        for (i = 0; i < APIC_LVT_NB; i++) {
> +            s->lvt[i] = kvm_apic_get_reg(&kapic, 0x32 + i);
> +        }
> +        s->initial_count = kvm_apic_get_reg(&kapic, 0x38);
> +        s->divide_conf = kvm_apic_get_reg(&kapic, 0x3e);
> +
> +        v = (s->divide_conf & 3) | ((s->divide_conf >> 1) & 4);
> +        s->count_shift = (v + 1) & 7;

Re: [Qemu-devel] [PATCH v4 00/15] uq/master: Introduce basic irqchip support

2011-12-08 Thread Blue Swirl

On Thu, Dec 8, 2011 at 11:52, Jan Kiszka  wrote:
> Changes in v4:
> - rebased of current uq/master
> - fixed stupid bugs that broke bisectability and user space irqchip mode
> - integrated NMI-over-LINT1 injection logic

I had comments to one patch, others look fine.

Overall, string based subtype selection does not somehow seem to be a
hot idea, but this could be used as a starting point which should be
cleaned up later when we have proper device composition. APIC and x86
interrupt handling need more cleanup anyway.

> CC: Lai Jiangshan 
>
> Jan Kiszka (15):
>  msi: Generalize msix_supported to msi_supported
>  kvm: Move kvmclock into hw/kvm folder
>  apic: Stop timer on reset
>  apic: Inject external NMI events via LINT1
>  apic: Introduce backend/frontend infrastructure for KVM reuse
>  apic: Open-code timer save/restore
>  i8259: Introduce backend/frontend infrastructure for KVM reuse
>  ioapic: Introduce backend/frontend infrastructure for KVM reuse
>  memory: Introduce memory_region_init_reservation
>  kvm: Introduce core services for in-kernel irqchip support
>  kvm: x86: Establish IRQ0 override control
>  kvm: x86: Add user space part for in-kernel APIC
>  kvm: x86: Add user space part for in-kernel i8259
>  kvm: x86: Add user space part for in-kernel IOAPIC
>  kvm: Arm in-kernel irqchip support
>
>  Makefile.objs                  |    2 +-
>  Makefile.target                |    6 +-
>  configure                      |    1 +
>  hw/apic.c                      |  309 ---
>  hw/apic.h                      |    1 +
>  hw/apic_common.c               |  312 
> 
>  hw/apic_internal.h             |  122 
>  hw/i8259.c                     |  127 ++--
>  hw/i8259_common.c              |  173 ++
>  hw/i8259_internal.h            |   82 +++
>  hw/ioapic.c                    |  130 ++---
>  hw/ioapic_common.c             |  138 ++
>  hw/ioapic_internal.h           |  106 ++
>  hw/kvm/apic.c                  |  154 
>  hw/{kvmclock.c => kvm/clock.c} |    4 +-
>  hw/{kvmclock.h => kvm/clock.h} |    0
>  hw/kvm/i8259.c                 |  126 
>  hw/kvm/ioapic.c                |  101 +
>  hw/msi.c                       |    8 +
>  hw/msi.h                       |    2 +
>  hw/msix.c                      |    9 +-
>  hw/msix.h                      |    2 -
>  hw/pc.c                        |   19 ++-
>  hw/pc.h                        |    1 +
>  hw/pc_piix.c                   |   66 -
>  kvm-all.c                      |  154 
>  kvm-stub.c                     |    5 +
>  kvm.h                          |   13 ++
>  memory.c                       |   36 +
>  memory.h                       |   16 ++
>  monitor.c                      |    6 +-
>  qemu-config.c                  |    4 +
>  qemu-options.hx                |    5 +-
>  sysemu.h                       |    1 -
>  target-i386/kvm.c              |   19 +++
>  trace-events                   |    2 +-
>  vl.c                           |    1 -
>  37 files changed, 1724 insertions(+), 539 deletions(-)
>  create mode 100644 hw/apic_common.c
>  create mode 100644 hw/apic_internal.h
>  create mode 100644 hw/i8259_common.c
>  create mode 100644 hw/i8259_internal.h
>  create mode 100644 hw/ioapic_common.c
>  create mode 100644 hw/ioapic_internal.h
>  create mode 100644 hw/kvm/apic.c
>  rename hw/{kvmclock.c => kvm/clock.c} (98%)
>  rename hw/{kvmclock.h => kvm/clock.h} (100%)
>  create mode 100644 hw/kvm/i8259.c
>  create mode 100644 hw/kvm/ioapic.c
>
> --
> 1.7.3.4
>

Re: [Qemu-devel] [PATCH] tcg: Remove redundant declarations of TCG_TARGET_REG_BITS

2011-12-08 Thread Stuart Brady

On Thu, Dec 08, 2011 at 08:19:45AM +0100, Stefan Weil wrote:
> Am 08.12.2011 08:03, schrieb 陳韋任:
> >On Wed, Dec 07, 2011 at 11:31:46PM +0100, Stefan Weil wrote:
> >>TCG_TARGET_REG_BITS is declared in tcg.h for all TCG targets.
> >
> >Just want to make sure. When we talk about target in TCG, that
> >_always_ means
> >the host, right?
> >
> >Regards,
> >chenwj
> 
> Yes. See file tcg/README which says this:
> 
>The TCG "target" is the architecture for which we generate the
>code. It is of course not the same as the "target" of QEMU which is
>the emulated architecture. As TCG started as a generic C backend used
>for cross compiling, it is assumed that the TCG target is different
>from the host, although it is never the case for QEMU.

... although that's no longer quite the case when using TCI, as I'm
sure you're quite well aware! :-)

I suppose when using TCI:
 * The host is system that QEMU's running on.
 * The TCG target is the TCI bytecode interpreter.
 * The QEMU target is whatever you're emulating.

... but of course, the vast majority users do not use TCI (as I think
it's mainly intended for development?), so the description in tcg/README
is correct in most cases.

I suppose it's a bit more complicated than I've described, too, since
the TCI interpreter might behave differently depending on the host that
you're running on... e.g. TCI will use the same endianness as the host.

I'm not sure if it would be worth updating tcg/README along these lines,
but the distinction between TCI and the host seems a little murky...

Cheers,
-- 
Stuart

Re: [Qemu-devel] [RFC] Device sandboxing

2011-12-08 Thread Blue Swirl

On Wed, Dec 7, 2011 at 18:25, Corey Bryant  wrote:
> A group of us are starting to work on sandboxing QEMU device emulation
> code.  We're just getting started investigating various approaches, and
> want to engage the community to gather input.
>
> Following are the design points that we are currently considering:
>
> * Decompose QEMU into multiple processes:
>
>    * This could be done such that QEMU devices execute in separate
>      processes based on device type, e.g. all block devices in one
>      process and all network devices in a second process.  Another
>      alternative is executing a separate process per device.
>
>    * Decomposition would not only afford a level of security inherent
>      in process separation, it would also allow development of stricter
>      sVirt/SELinux policy for the decomposed QEMU processes (e.g. a
>      block device specific policy).  This would enable a true sandbox
>      with layers of defense.

I'd start by splitting QEMU into two processes: untrusted process
(which performs most of the work) running in a chroot() jail and
trusted process outside (handling access to drives, network etc.). The
untrusted process could then be split further later like you detail
below but this first step would already give minimal protection with a
reasonable amount of effort.

> * Decompose the device emulation process further into an untrusted and
>  trusted thread:
>
>    * The untrusted thread would be restricted by seccomp mode 1 and
>      would contain the device emulation code.
>
>    * The trusted helper thread would run beside the untrusted thread,
>      enabling the untrusted thread to make syscalls beyond read(),
>      write(), exit(), and sigreturn().

Why limit this to device emulation only? Where in QEMU would this
approach not work?

But the problem here is that conversion of a single-thread application
to multithreading, especially using only API like seccomp, will not be
so trivial or error free work. Therefore I'd propose to start with
something simpler at first.

> * IPC communication mechanisms:
>
>    * An IPC mechanism will be required to enable communication between
>      untrusted and trusted threads.
>
>    * An IPC mechanism will also be required to enable communication
>      between the main QEMU process and device processes.
>
>    * The communication mechanisms must provide secure communication,
>      be low overhead (easy to generate, parse, and validate), and must
>      play well with sVirt/LSMs.
>
>    * Some thoughts for IPC mechanisms are Unix sockets, pipes, virtio,
>      Google Native Client's IMC, and shared memory.
>
> * If seccomp mode 2 support becomes available, decomposition of device
>  emulation into untrusted/trusted threads may not be necessary.  This
>  could result in improved performance (no IPC overhead between trusted
>  and untrusted thread) and reduced complexity (no need for trusted
>  helper thread).
>
> * Execution of QEMU with the sandboxed device support should be an
>  optional run-time specification.
>
> * We will be focusing on legacy devices first, both for performance and
>  risk reasons.
>
> Once we settle on a direction, we will develop a proof of concept to
> share with the community.
>
> We appreciate your input.
>
> Regards,
>
> Ashley Lai
> Corey Bryant
> Eduardo Otubo
> Michael Halcrow
> Paul Moore
> Richa Marwaha
>
>

Re: [Qemu-devel] [PATCH v2 0/5] backdoor: lightweight guest-to-QEMU backdoor channel

2011-12-08 Thread Lluís Vilanova

Blue Swirl writes:

> 2011/12/8 Lluís Vilanova :
>> Stefan Hajnoczi writes:
>> [...]
 * Support for tracing guest code through TCG.
>> 
>>> I'm not clear on whether a backdoor mechanism is needed or not.  A
>>> backdoor mechanism allows a modified guest to participate in tracing.
>> 
>>> Depending on the type of analysis you are doing it might be possible
>>> to achieve the same thing by emitting calls to custom instrumentation
>>> in TCG (or is this too low-level for what you're trying to trace in
>>> the guest?).  The advantage is also that you can do this to unmodified
>>> guests.
>> 
>> You're right with the first; it exists for two reasons:
>> 
>> * simplicity: it's easier to have, for example, a backdoor + linux 
>> tracepoints
>>  than obtaining the IP of an "interesting" function (think of the
>>  multiprogramming on a full system)
>> 
>> * performance: it's faster to let the guest tell you rather than (ab)using
>>  breakpoints or checking the IP of every instruction

> I still think that a breakpoint based system could be a useful method
> in addition to others, simply because it is entirely invisible to the
> guest and can then be used with unmodified and non-cooperating guests.
> Properly implemented, it should not have much overhead.

Yup. My plan was to implement this sometime in the future :)

On the performance side, there's only the degenerated case of a very hot code
page that repeatedly triggers false mmu faults due to the (never reached)
breakpoint.

As for simplicity, I can use the backdoor to tell my analuzer to insert a
breakpoint in the given address, instead of having some complex system to get
symbol information from the outside (specially in softmmy mode).


[...]
>>> So I'm suggesting that we don't *need* explicit support for
>>> instrumenting trace events.  Instead add plain old functions where you
>>> need them.  You may decide to invoke trace events from your
>>> instrumentation function, but that's already covered today by
>>> docs/tracing.txt.
>> 
>>> Am I missing the point of instrumentation tracing events or do you
>>> agree that we can work well without it?
>> 
>> The point is to avoid diving into QEMU's code and instead use the current
>> out-of-the-box tracing points as the instrumentation hooks, which can be
>> achieved both through the current approach or an unofficial tracing backend.
>> 
>> The nice thing about the current approach is that the user can do some extra
>> checks on the tracing event and then (maybe) call the original 
>> backend-generated
>> tracing routine (as a quick and easy way to extend what to actually trace).

> Tracepoints shouldn't also suffer from bit rot so easily.

Exactly. That's why I thought that tracepoint instrumentation is not such a bad
option.


Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth

Re: [Qemu-devel] [PATCH] pc: map pc ram from user-specified file

2011-12-08 Thread Peter Feiner

>> Do you have any performance number for this?  And examples on how your
>> are using it?

> Our principal use case is implementing VM migration techniques.

There are other uses of a RAM file interface that I can imagine:

- debugging, e.g., inspecting the memory of a VM after it has crashed
- security research, e.g., extracting passwords from a running VM

[Qemu-devel] [RFC][PATCH 00/10] trace-tcg: Allow tracing guest events in TCG-generated code

2011-12-08 Thread Lluís Vilanova

Adds the basic ability to specify which events in the "trace-events" may be used
to trace guest activity in the TCG code.

Such events generate an extra set of tracing functions that can be called during
TCG code generation and will automatically redirect a call to the appropriate
backend-dependant tracing functions when the guest code is executed.

The extra routines and files generated by tracetool are:

* trace-tcg-helper.h

  Included on all "${target}/helper.h" headers.

  Declares a 'trace_${name}_proxy' TCG helper using the 'DEF_HELPER_*' macros
  for each TCG-aware event.

  Defines a 'gen_helper_trace_${name}' that ends up calling
  'gen_helper_trace_${name}_proxy'.

  In order to provide a more streamlined interface, this routine can take a mix
  of tracing-supported types and TCG types. In order to do this, it
  automatically casts the arguments and allocates them into the appropriate TCG
  types, calls 'gen_helper_trace_${name}_proxy' and frees the temporary TCG
  values it has allocated.

* trace-tcg-helper.c

  Defines a 'trace_${name}_proxy' TCG helper for each TCG-aware event. This
  helper will immediately call 'trace_${name}' with all the necessary argument
  type casts to match the signature of the callee.

NOTE: This patch series has been stripped out of all trace instrumentation
  features, and is thus applicable on top of QEMU 1.0

Signed-off-by: Lluís Vilanova 
---

Lluís Vilanova (10):
  trace: [doc] Document event properties on a separate section
  trace-tcg: Add documentation
  Trivial changes to eliminate auto-generated files
  [m68k,s390,xtensa] Move helpers.h to helper.h
  trace: [tracetool] Common functions to manage event arguments
  trace: [tracetool] Add 'get_api_name' to construct the name of tracing 
routines
  trace-tcg: [tracetool] Allow TCG types in trace event declarations
  trace-tcg: [tracetool] Declare TCG tracing helper routines
  trace-tcg: [tracetool] Define TCG tracing helper routines
  trace-tcg: [all] Include TCG-tracing helpers


 .gitignore |2 
 Makefile   |   11 +
 Makefile.objs  |   17 ++
 Makefile.target|2 
 def-helper.h   |9 +
 docs/tracing.txt   |   62 +-
 scripts/simpletrace.py |2 
 scripts/tracetool  |  449 +---
 target-alpha/helper.h  |2 
 target-arm/helper.h|2 
 target-cris/helper.h   |2 
 target-i386/helper.h   |2 
 target-lm32/helper.h   |2 
 target-m68k/helper.c   |2 
 target-m68k/helper.h   |   56 +
 target-m68k/helpers.h  |   54 -
 target-m68k/op_helper.c|2 
 target-m68k/translate.c|6 -
 target-microblaze/helper.h |2 
 target-mips/helper.h   |2 
 target-ppc/helper.h|2 
 target-s390x/helper.h  |  154 +++
 target-s390x/helpers.h |  152 ---
 target-s390x/op_helper.c   |2 
 target-s390x/translate.c   |4 
 target-sh4/helper.h|2 
 target-sparc/helper.h  |2 
 target-unicore32/helper.h  |2 
 target-xtensa/helper.h |   34 +++
 target-xtensa/helpers.h|   32 ---
 target-xtensa/op_helper.c  |2 
 target-xtensa/translate.c  |6 -
 xtensa-semi.c  |2 
 33 files changed, 796 insertions(+), 288 deletions(-)
 create mode 100644 target-m68k/helper.h
 delete mode 100644 target-m68k/helpers.h
 create mode 100644 target-s390x/helper.h
 delete mode 100644 target-s390x/helpers.h
 create mode 100644 target-xtensa/helper.h
 delete mode 100644 target-xtensa/helpers.h

[Qemu-devel] [PATCH 01/10] trace: [doc] Document event properties on a separate section

2011-12-08 Thread Lluís Vilanova

Signed-off-by: Lluís Vilanova 
---
 docs/tracing.txt |   22 --
 1 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/docs/tracing.txt b/docs/tracing.txt
index ea29f2c..853cbf8 100644
--- a/docs/tracing.txt
+++ b/docs/tracing.txt
@@ -98,12 +98,6 @@ respectively.  This ensures portability between 32- and 
64-bit platforms.
 4. Name trace events after their function.  If there are multiple trace events
in one function, append a unique distinguisher at the end of the name.
 
-5. If specific trace events are going to be called a huge number of times, this
-   might have a noticeable performance impact even when the trace events are
-   programmatically disabled. In this case you should declare the trace event
-   with the "disable" property, which will effectively disable it at compile
-   time (using the "nop" backend).
-
 == Generic interface and monitor commands ==
 
 You can programmatically query and control the dynamic state of trace events
@@ -234,3 +228,19 @@ probes:
   --target-type system \
   --target-arch x86_64 \
   qemu.stp
+
+== Trace event properties ==
+
+Each event in the "trace-events" file can be prefixed with a space-separated
+list of zero or more of the following event properties.
+
+=== "disable" ===
+
+If a specific trace event is going to be invoked a huge number of times, this
+might have a noticeable performance impact even when the event is
+programmatically disabled.
+
+In this case you should declare such event with the "disable" property. This
+will effectively disable the event at compile time (by using the "nop" 
backend),
+thus having no performance impact at all on regular builds (i.e., unless you
+edit the "trace-events" file).

[Qemu-devel] [PATCH 02/10] trace-tcg: Add documentation

2011-12-08 Thread Lluís Vilanova

Signed-off-by: Lluís Vilanova 
---
 docs/tracing.txt |   40 
 1 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/docs/tracing.txt b/docs/tracing.txt
index 853cbf8..44c5dba 100644
--- a/docs/tracing.txt
+++ b/docs/tracing.txt
@@ -244,3 +244,43 @@ In this case you should declare such event with the 
"disable" property. This
 will effectively disable the event at compile time (by using the "nop" 
backend),
 thus having no performance impact at all on regular builds (i.e., unless you
 edit the "trace-events" file).
+
+=== "tcg" ===
+
+Code from the guest generated by TCG can be traced by defining an event with 
the
+"tcg" event property.
+
+In addition to the regular "trace_" routine in the "trace.h" header,
+events with the "tcg" property will also provide the TCG helper routine
+"gen_helper_trace_". This routine can be called during TCG code
+generation as any other TCG helper to automatically generate TCG code to call
+"trace_" during TCG code execution.
+
+The notable difference is that these events can be declared in the
+"trace-events" file with both basic types as well as TCG types. The
+"gen_helper_trace_" routine will transparently take care of turning
+any non-TCG argument into a TCG value.
+
+For example, the event:
+
+tcg foo(uint8_t a1, TCGv_i32 a2) "a1=%d a2=%d"
+
+Can be invoked from TCG code as:
+
+uint8_t a1 = ...;
+TCGv_i32 a2 = ...;
+gen_helper_trace_foo(a1, a2);
+
+And the intermediate boilerplate code will take care of generating the TCG code
+to call:
+
+void trace_foo(uint8_t a1, uint32_t a2);
+
+=== "tcg-vcpu" ===
+
+Events with the "tcg" property will generate code helpers with the
+"TCG_CALL_CONST" flag (see "tcg/README" and "tcg/tcg.h"), which produces faster
+code that is unable to reliably access the state of the vCPU.
+
+Using the "tcg-vcpu" property will generate a slower TCG helper that will be
+able to reliably get and set values from the vCPU.

[Qemu-devel] [PATCH 03/10] Trivial changes to eliminate auto-generated files

2011-12-08 Thread Lluís Vilanova

Signed-off-by: Lluís Vilanova 
---
 Makefile |   11 ---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index 301c75e..506db1b 100644
--- a/Makefile
+++ b/Makefile
@@ -3,10 +3,14 @@
 # Always point to the root of the build tree (needs GNU make).
 BUILD_DIR=$(CURDIR)
 
-GENERATED_HEADERS = config-host.h trace.h qemu-options.def
+GENERATED_HEADERS = config-host.h qemu-options.def
+
+GENERATED_HEADERS += trace.h
 ifeq ($(TRACE_BACKEND),dtrace)
 GENERATED_HEADERS += trace-dtrace.h
 endif
+GENERATED_SOURCES += trace.c
+
 GENERATED_HEADERS += qmp-commands.h qapi-types.h qapi-visit.h
 GENERATED_SOURCES += qmp-marshal.c qapi-types.c qapi-visit.c
 
@@ -226,10 +230,11 @@ clean:
rm -f slirp/*.o slirp/*.d audio/*.o audio/*.d block/*.o block/*.d 
net/*.o net/*.d fsdev/*.o fsdev/*.d ui/*.o ui/*.d qapi/*.o qapi/*.d qga/*.o 
qga/*.d
rm -f qemu-img-cmds.h
rm -f trace/*.o trace/*.d
-   rm -f trace.c trace.h trace.c-timestamp trace.h-timestamp
rm -f trace-dtrace.dtrace trace-dtrace.dtrace-timestamp
+   @# May not be present in GENERATED_HEADERS
rm -f trace-dtrace.h trace-dtrace.h-timestamp
-   rm -f $(GENERATED_SOURCES)
+   rm -f $(foreach f,$(GENERATED_HEADERS) $(foreach 
h,$(GENERATED_HEADERS),$(h)-timestamp),$(f) */$(f))
+   rm -f $(foreach f,$(GENERATED_SOURCES) $(foreach 
h,$(GENERATED_SOURCES),$(h)-timestamp),$(f) */$(f))
rm -rf $(qapi-dir)
$(MAKE) -C tests clean
for d in $(ALL_SUBDIRS) $(QEMULIBS) libcacard; do \

[Qemu-devel] [PATCH 04/10] [m68k, s390, xtensa] Move helpers.h to helper.h

2011-12-08 Thread Lluís Vilanova

This provides a consistent file naming scheme across all targets.

Signed-off-by: Lluís Vilanova 
---
 target-m68k/helper.c  |2 -
 target-m68k/helper.h  |   54 
 target-m68k/helpers.h |   54 
 target-m68k/op_helper.c   |2 -
 target-m68k/translate.c   |6 +-
 target-s390x/helper.h |  152 +
 target-s390x/helpers.h|  152 -
 target-s390x/op_helper.c  |2 -
 target-s390x/translate.c  |4 +
 target-xtensa/helper.h|   32 +
 target-xtensa/helpers.h   |   32 -
 target-xtensa/op_helper.c |2 -
 target-xtensa/translate.c |6 +-
 xtensa-semi.c |2 -
 14 files changed, 251 insertions(+), 251 deletions(-)
 create mode 100644 target-m68k/helper.h
 delete mode 100644 target-m68k/helpers.h
 create mode 100644 target-s390x/helper.h
 delete mode 100644 target-s390x/helpers.h
 create mode 100644 target-xtensa/helper.h
 delete mode 100644 target-xtensa/helpers.h

diff --git a/target-m68k/helper.c b/target-m68k/helper.c
index 674c8e6..123e1d9 100644
--- a/target-m68k/helper.c
+++ b/target-m68k/helper.c
@@ -26,7 +26,7 @@
 #include "qemu-common.h"
 #include "gdbstub.h"
 
-#include "helpers.h"
+#include "helper.h"
 
 #define SIGNBIT (1u << 31)
 
diff --git a/target-m68k/helper.h b/target-m68k/helper.h
new file mode 100644
index 000..cb8a0c7
--- /dev/null
+++ b/target-m68k/helper.h
@@ -0,0 +1,54 @@
+#include "def-helper.h"
+
+DEF_HELPER_1(bitrev, i32, i32)
+DEF_HELPER_1(ff1, i32, i32)
+DEF_HELPER_2(sats, i32, i32, i32)
+DEF_HELPER_2(divu, void, env, i32)
+DEF_HELPER_2(divs, void, env, i32)
+DEF_HELPER_3(addx_cc, i32, env, i32, i32)
+DEF_HELPER_3(subx_cc, i32, env, i32, i32)
+DEF_HELPER_3(shl_cc, i32, env, i32, i32)
+DEF_HELPER_3(shr_cc, i32, env, i32, i32)
+DEF_HELPER_3(sar_cc, i32, env, i32, i32)
+DEF_HELPER_2(xflag_lt, i32, i32, i32)
+DEF_HELPER_2(set_sr, void, env, i32)
+DEF_HELPER_3(movec, void, env, i32, i32)
+
+DEF_HELPER_2(f64_to_i32, f32, env, f64)
+DEF_HELPER_2(f64_to_f32, f32, env, f64)
+DEF_HELPER_2(i32_to_f64, f64, env, i32)
+DEF_HELPER_2(f32_to_f64, f64, env, f32)
+DEF_HELPER_2(iround_f64, f64, env, f64)
+DEF_HELPER_2(itrunc_f64, f64, env, f64)
+DEF_HELPER_2(sqrt_f64, f64, env, f64)
+DEF_HELPER_1(abs_f64, f64, f64)
+DEF_HELPER_1(chs_f64, f64, f64)
+DEF_HELPER_3(add_f64, f64, env, f64, f64)
+DEF_HELPER_3(sub_f64, f64, env, f64, f64)
+DEF_HELPER_3(mul_f64, f64, env, f64, f64)
+DEF_HELPER_3(div_f64, f64, env, f64, f64)
+DEF_HELPER_3(sub_cmp_f64, f64, env, f64, f64)
+DEF_HELPER_2(compare_f64, i32, env, f64)
+
+DEF_HELPER_3(mac_move, void, env, i32, i32)
+DEF_HELPER_3(macmulf, i64, env, i32, i32)
+DEF_HELPER_3(macmuls, i64, env, i32, i32)
+DEF_HELPER_3(macmulu, i64, env, i32, i32)
+DEF_HELPER_2(macsats, void, env, i32)
+DEF_HELPER_2(macsatu, void, env, i32)
+DEF_HELPER_2(macsatf, void, env, i32)
+DEF_HELPER_2(mac_set_flags, void, env, i32)
+DEF_HELPER_2(set_macsr, void, env, i32)
+DEF_HELPER_2(get_macf, i32, env, i64)
+DEF_HELPER_1(get_macs, i32, i64)
+DEF_HELPER_1(get_macu, i32, i64)
+DEF_HELPER_2(get_mac_extf, i32, env, i32)
+DEF_HELPER_2(get_mac_exti, i32, env, i32)
+DEF_HELPER_3(set_mac_extf, void, env, i32, i32)
+DEF_HELPER_3(set_mac_exts, void, env, i32, i32)
+DEF_HELPER_3(set_mac_extu, void, env, i32, i32)
+
+DEF_HELPER_2(flush_flags, void, env, i32)
+DEF_HELPER_1(raise_exception, void, i32)
+
+#include "def-helper.h"
diff --git a/target-m68k/helpers.h b/target-m68k/helpers.h
deleted file mode 100644
index cb8a0c7..000
--- a/target-m68k/helpers.h
+++ /dev/null
@@ -1,54 +0,0 @@
-#include "def-helper.h"
-
-DEF_HELPER_1(bitrev, i32, i32)
-DEF_HELPER_1(ff1, i32, i32)
-DEF_HELPER_2(sats, i32, i32, i32)
-DEF_HELPER_2(divu, void, env, i32)
-DEF_HELPER_2(divs, void, env, i32)
-DEF_HELPER_3(addx_cc, i32, env, i32, i32)
-DEF_HELPER_3(subx_cc, i32, env, i32, i32)
-DEF_HELPER_3(shl_cc, i32, env, i32, i32)
-DEF_HELPER_3(shr_cc, i32, env, i32, i32)
-DEF_HELPER_3(sar_cc, i32, env, i32, i32)
-DEF_HELPER_2(xflag_lt, i32, i32, i32)
-DEF_HELPER_2(set_sr, void, env, i32)
-DEF_HELPER_3(movec, void, env, i32, i32)
-
-DEF_HELPER_2(f64_to_i32, f32, env, f64)
-DEF_HELPER_2(f64_to_f32, f32, env, f64)
-DEF_HELPER_2(i32_to_f64, f64, env, i32)
-DEF_HELPER_2(f32_to_f64, f64, env, f32)
-DEF_HELPER_2(iround_f64, f64, env, f64)
-DEF_HELPER_2(itrunc_f64, f64, env, f64)
-DEF_HELPER_2(sqrt_f64, f64, env, f64)
-DEF_HELPER_1(abs_f64, f64, f64)
-DEF_HELPER_1(chs_f64, f64, f64)
-DEF_HELPER_3(add_f64, f64, env, f64, f64)
-DEF_HELPER_3(sub_f64, f64, env, f64, f64)
-DEF_HELPER_3(mul_f64, f64, env, f64, f64)
-DEF_HELPER_3(div_f64, f64, env, f64, f64)
-DEF_HELPER_3(sub_cmp_f64, f64, env, f64, f64)
-DEF_HELPER_2(compare_f64, i32, env, f64)
-
-DEF_HELPER_3(mac_move, void, env, i32, i32)
-DEF_HELPER_3(macmulf, i64, env, i32, i32)
-DEF_HELPER_3(macmuls, i64, env, i32, i32)
-DEF_HELPER_3(macmulu, i64, env, i32, i32)
-DEF_HELPER_2(macsats, void, env, i32)
-DEF_HELP

[Qemu-devel] [PATCH 05/10] trace: [tracetool] Common functions to manage event arguments

2011-12-08 Thread Lluís Vilanova

This adds:

* zip_lists: Pair elements from two lists.
* get_argtypes: Get the list of types of the arguments; can optionally transform
  the types in the process.

The function 'get_args' is modified to use a combination of 'get_argnames',
'get_argtypes', 'zip_lists' and, optionally, use the type transformation
capabilities of 'get_argtypes'.

Signed-off-by: Lluís Vilanova 
---
 scripts/tracetool |  117 +
 1 files changed, 116 insertions(+), 1 deletions(-)

diff --git a/scripts/tracetool b/scripts/tracetool
index 4c9951d..f2b6680 100755
--- a/scripts/tracetool
+++ b/scripts/tracetool
@@ -74,15 +74,130 @@ has_property()
 return 1
 }
 
+# Convenience function to pair elements from two comma-separated lists of equal
+# size.
+# 1: first list
+# 2: second list
+# 3: printf format for each pair of elements (optional, default: "%s %s, ")
+zip_lists()
+{
+[ -n "$1" -a -n "$2" ] || return
+
+local format
+format=$3
+[ -n "$format" ] || format="%s %s, "
+
+local i elem accum
+i=1
+accum=""
+for elem in $1; do
+if [ "$elem" = "${elem%,}" ]; then
+accum="$accum $elem"
+else
+accum="$accum ${elem%,}"
+eval __elem_$i=\"$accum\"
+i=$(($i + 1))
+accum=""
+fi
+done
+eval __elem_$i=\"$accum\"
+
+local tmp res
+accum=""
+res=""
+i=1
+for elem in $2; do
+if [ "$elem" = "${elem%,}" ]; then
+accum="$accum $elem"
+else
+accum="$accum ${elem%,}"
+eval tmp=\$__elem_$i
+tmp=$(printf "$format" "$tmp" "$accum")
+res="$res$tmp"
+i=$(($i + 1))
+accum=""
+fi
+done
+eval tmp=\$__elem_$i
+tmp=$(printf "$format" "$tmp" "$accum")
+res="$res$tmp"
+res="${res%, }"
+echo $res
+}
+
 # Get the argument list of a trace event, including types and names
+# 1: event name
+# 2: passed as the second argument to 'get_argtypes' (optional)
 get_args()
 {
 local args
 args=${1#*\(}
 args=${args%%\)*}
-echo "$args"
+
+if [ -z "$2" -o "$args" = "void" ]; then
+echo "$args"
+else
+local argtypes argnames res
+argtypes=$(get_argtypes "$1" $2)
+argnames=$(get_argnames "$1" ",")
+res=$(zip_lists "$argtypes" "$argnames")
+echo "${res#, }"
+fi
 }
 
+# Get the argument type list of a trace event
+# 1: event name
+# 2: function to translate each of the types (optional, default: nil_type)
+get_argtypes()
+{
+local translate field type res
+
+translate=$2
+[ -n "$translate" ] || translate=nil_type
+
+res=""
+type=""
+for field in $(get_args "$1"); do
+if [ "${field%,}" != "${field}" ]; then
+# it's a name, but can have a star in it
+[ "${field#\*}" = "$field" ] || type="${type}*"
+# trim spaces
+type="${type## }"
+type="${type%% }"
+# translate
+type=$($translate "$type")
+res="$res$type, "
+type=""
+else
+# types can have spaces
+type="$type $field"
+fi
+done
+
+# remove the accumulated name
+field="${type##* }"
+type="${type% *}"
+
+# trim spaces
+type="${type## }"
+type="${type%% }"
+
+# it's a name, but can have a star in it
+[ "${field#\*}" = "$field" ] || type="${type}*"
+# translate
+type=$($translate "$type")
+res="$res$type"
+
+echo $res
+}
+
+# No-op type translator for get_argtypes
+nil_type()
+{
+echo "$1"
+}
+
+
 # Get the argument name list of a trace event
 get_argnames()
 {

[Qemu-devel] [PATCH 06/10] trace: [tracetool] Add 'get_api_name' to construct the name of tracing routines

2011-12-08 Thread Lluís Vilanova

All backends now use 'get_api_name' to build the name of the routines that are
used by QEMU code when invoking trace points. This is built based on the value
of 'get_api_name_fmt_default'.

The old 'get_name' is still available to refer to the name of an event.

This ensures proper naming across tracing backends.

Signed-off-by: Lluís Vilanova 
---
 scripts/tracetool |   40 
 1 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/scripts/tracetool b/scripts/tracetool
index f2b6680..853f1bd 100755
--- a/scripts/tracetool
+++ b/scripts/tracetool
@@ -57,6 +57,19 @@ get_name()
 echo "${name##* }"
 }
 
+get_api_name_fmt_default="trace_%s"
+
+# Get the tracepoint name of a trace event (the name of the function QEMU uses)
+# The result is 'printf "$fmt" $name', where 'fmt' is
+# 'get_api_name_fmt_default'.
+get_api_name()
+{
+local name fmt
+name=$(get_name "$1")
+fmt="$get_api_name_fmt_default"
+printf "$fmt" "$name"
+}
+
 # Get the given property of a trace event
 # 1: trace-events line
 # 2: property name
@@ -248,13 +261,13 @@ linetoh_begin_nop()
 
 linetoh_nop()
 {
-local name args
-name=$(get_name "$1")
+local api args
+api=$(get_api_name "$1")
 args=$(get_args "$1")
 
 # Define an empty function for the trace event
 cat <

[Qemu-devel] [PATCH 07/10] trace-tcg: [tracetool] Allow TCG types in trace event declarations

2011-12-08 Thread Lluís Vilanova

When found, TCG types are translated into the host native types when declaring
and defining the tracing routines in "trace.h" and "trace.c".
---
 scripts/tracetool |   31 +++
 1 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/scripts/tracetool b/scripts/tracetool
index 853f1bd..1d8a637 100755
--- a/scripts/tracetool
+++ b/scripts/tracetool
@@ -210,6 +210,21 @@ nil_type()
 echo "$1"
 }
 
+# Native type translator for get_argtypes
+# (e.g., TCGv_i32 -> uint32_t)
+host_type()
+{
+case "$1" in
+"TCGv_i32") echo "uint32_t" ;;
+"TCGv_i64") echo "uint64_t" ;;
+"TCGv_ptr") echo "void *"   ;;
+# force a fixed-size type in trace.{h,c}
+# (ideally would use a host-specific type)
+"TCGv") echo "uint64_t" ;;
+*)  echo "$1"   ;;
+esac
+}
+
 
 # Get the argument name list of a trace event
 get_argnames()
@@ -263,7 +278,7 @@ linetoh_nop()
 {
 local api args
 api=$(get_api_name "$1")
-args=$(get_args "$1")
+args=$(get_args "$1" host_type)
 
 # Define an empty function for the trace event
 cat <

[Qemu-devel] [PATCH 08/10] trace-tcg: [tracetool] Declare TCG tracing helper routines

2011-12-08 Thread Lluís Vilanova

Generate the necessary TCG helper declarations for tracing events in guest code.

Signed-off-by: Lluís Vilanova 
---
 .gitignore |1 
 Makefile   |2 
 Makefile.objs  |9 ++
 def-helper.h   |9 ++
 scripts/simpletrace.py |2 
 scripts/tracetool  |  202 ++--
 6 files changed, 214 insertions(+), 11 deletions(-)

diff --git a/.gitignore b/.gitignore
index 406f75f..d47b586 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,6 +6,7 @@ trace.h
 trace.c
 trace-dtrace.h
 trace-dtrace.dtrace
+trace-tcg-helper.h
 *-timestamp
 *-softmmu
 *-darwin-user
diff --git a/Makefile b/Makefile
index 506db1b..516622a 100644
--- a/Makefile
+++ b/Makefile
@@ -5,7 +5,7 @@ BUILD_DIR=$(CURDIR)
 
 GENERATED_HEADERS = config-host.h qemu-options.def
 
-GENERATED_HEADERS += trace.h
+GENERATED_HEADERS += trace.h trace-tcg-helper.h
 ifeq ($(TRACE_BACKEND),dtrace)
 GENERATED_HEADERS += trace-dtrace.h
 endif
diff --git a/Makefile.objs b/Makefile.objs
index d7a6539..8f97709 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -398,6 +398,15 @@ trace-obj-y += $(addprefix trace/, $(trace-nested-y))
 $(trace-obj-y): $(GENERATED_HEADERS)
 
 ##
+# trace tcg
+
+trace-tcg-helper.h-timestamp: $(SRC_PATH)/trace-events 
$(BUILD_DIR)/config-host.mak
+   $(call quiet-command,sh $(SRC_PATH)/scripts/tracetool 
$(TRACETOOL_EXTRA) --tcg --tcg-h < $< > $@,"  GEN 
$(TARGET_DIR)trace-tcg-helper.h")
+   @cmp -s $@ trace-tcg-helper.h || cp $@ trace-tcg-helper.h
+
+trace-tcg-helper.h: trace-tcg-helper.h-timestamp
+
+##
 # smartcard
 
 libcacard-y = cac.o event.o vcard.o vreader.o vcard_emul_nss.o 
vcard_emul_type.o card_7816.o
diff --git a/def-helper.h b/def-helper.h
index 8a822c7..c4fce2d 100644
--- a/def-helper.h
+++ b/def-helper.h
@@ -13,6 +13,11 @@
helper.c, defining:
 GEN_HELPER 1 to produce op generation functions (gen_helper_*)
 GEN_HELPER 2 to do runtime registration helper functions.
+
+   After including this header the GEN_HELPER_MACRO will have the following
+   values:
+1  : op generation functions were produced (GEN_HELPER was 1)
+-1 : otherwise
  */
 
 #ifndef DEF_HELPER_H
@@ -142,6 +147,7 @@ dh_ctype(ret) HELPER(name) (dh_ctype(t1), dh_ctype(t2), 
dh_ctype(t3), \
 
 #undef GEN_HELPER
 #define GEN_HELPER -1
+#define GEN_HELPER_CODE -1
 
 #elif GEN_HELPER == 1
 /* Gen functions.  */
@@ -205,6 +211,7 @@ static inline void glue(gen_helper_, 
name)(dh_retvar_decl(ret) dh_arg_decl(t1, 1
 
 #undef GEN_HELPER
 #define GEN_HELPER -1
+#define GEN_HELPER_CODE 1
 
 #elif GEN_HELPER == 2
 /* Register helpers.  */
@@ -226,6 +233,7 @@ DEF_HELPER_FLAGS_0(name, flags, ret)
 
 #undef GEN_HELPER
 #define GEN_HELPER -1
+#define GEN_HELPER_CODE -1
 
 #elif GEN_HELPER == -1
 /* Undefine macros.  */
@@ -236,5 +244,6 @@ DEF_HELPER_FLAGS_0(name, flags, ret)
 #undef DEF_HELPER_FLAGS_3
 #undef DEF_HELPER_FLAGS_4
 #undef GEN_HELPER
+#undef GEN_HELPER_CODE
 
 #endif
diff --git a/scripts/simpletrace.py b/scripts/simpletrace.py
index f55e5e6..3476d6c 100755
--- a/scripts/simpletrace.py
+++ b/scripts/simpletrace.py
@@ -20,7 +20,7 @@ dropped_event_id = 0xfffe
 
 trace_fmt = '='
 trace_len = struct.calcsize(trace_fmt)
-event_re  = re.compile(r'(disable\s+)?([a-zA-Z0-9_]+)\(([^)]*)\).*')
+event_re  = 
re.compile(r'(disable\s+|tcg\s+|tcg-vcpu\s+)*([a-zA-Z0-9_]+)\(([^)]*)\).*')
 
 def parse_events(fobj):
 """Parse a trace-events file into {event_num: (name, arg1, ...)}."""
diff --git a/scripts/tracetool b/scripts/tracetool
index 1d8a637..5431f2d 100755
--- a/scripts/tracetool
+++ b/scripts/tracetool
@@ -13,7 +13,7 @@ set -f
 usage()
 {
 cat >&2 <&2
+exit 1
+;;
+esac
+}
+
+tcg_helper_decl_type()
+{
+local type
+type=`tcg_type "$1"`
+case "$type" in
+"TCGv") echo "tl"  ;;
+"TCGv_ptr") echo "ptr" ;;
+"TCGv_i32") echo "i32" ;;
+"TCGv_i64") echo "i64" ;;
+*)  echo "Don't know how to translate type $1 into a TCG 
helper declaration type" >&2
+exit 1
+;;
+esac
+}
+
 
 # Get the argument name list of a trace event
 get_argnames()
@@ -662,6 +708,127 @@ linetostap_end_dtrace()
 return
 }
 
+linetotcg_h_nop()
+{
+local has_tcg has_tcg_vcpu
+has_property "$1" "tcg" && has_tcg=1
+has_property "$1" "tcg-vcpu" && has_tcg_vcpu=1
+if [ "$has_tcg" != "1" -a "$has_tcg_vcpu" != "1" ]; then
+return
+fi
+
+local api
+api=$(get_api_name "$1")
+
+cat <&2
+exit 1
+fi
+echo "DEF_HELPER_$argc(${api}_proxy, void$proxy_argtypes)"
+else
+echo "DEF_HELPER_FLAGS_$argc(${api}_proxy, TCG_CALL_CONST, 
void$proxy_argtypes)"
+fi
+
+# mixed-type to TCG helper bridge
+local argnames argname temps_

[Qemu-devel] [PATCH 10/10] trace-tcg: [all] Include TCG-tracing helpers

2011-12-08 Thread Lluís Vilanova

Signed-off-by: Lluís Vilanova 
---
 target-alpha/helper.h  |2 ++
 target-arm/helper.h|2 ++
 target-cris/helper.h   |2 ++
 target-i386/helper.h   |2 ++
 target-lm32/helper.h   |2 ++
 target-m68k/helper.h   |2 ++
 target-microblaze/helper.h |2 ++
 target-mips/helper.h   |2 ++
 target-ppc/helper.h|2 ++
 target-s390x/helper.h  |2 ++
 target-sh4/helper.h|2 ++
 target-sparc/helper.h  |2 ++
 target-unicore32/helper.h  |2 ++
 target-xtensa/helper.h |2 ++
 14 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/target-alpha/helper.h b/target-alpha/helper.h
index b693cee..e7ebed9 100644
--- a/target-alpha/helper.h
+++ b/target-alpha/helper.h
@@ -120,4 +120,6 @@ DEF_HELPER_FLAGS_0(get_time, TCG_CALL_CONST, i64)
 DEF_HELPER_FLAGS_1(set_alarm, TCG_CALL_CONST, void, i64)
 #endif
 
+#include "trace-tcg-helper.h"
+
 #include "def-helper.h"
diff --git a/target-arm/helper.h b/target-arm/helper.h
index 16dd5fc..7ab33b4 100644
--- a/target-arm/helper.h
+++ b/target-arm/helper.h
@@ -472,4 +472,6 @@ DEF_HELPER_3(neon_qzip8, void, env, i32, i32)
 DEF_HELPER_3(neon_qzip16, void, env, i32, i32)
 DEF_HELPER_3(neon_qzip32, void, env, i32, i32)
 
+#include "trace-tcg-helper.h"
+
 #include "def-helper.h"
diff --git a/target-cris/helper.h b/target-cris/helper.h
index 093063a..9bfb144 100644
--- a/target-cris/helper.h
+++ b/target-cris/helper.h
@@ -23,4 +23,6 @@ DEF_HELPER_FLAGS_2(evaluate_flags_move_2, TCG_CALL_PURE, i32, 
i32, i32)
 DEF_HELPER_0(evaluate_flags, void)
 DEF_HELPER_0(top_evaluate_flags, void)
 
+#include "trace-tcg-helper.h"
+
 #include "def-helper.h"
diff --git a/target-i386/helper.h b/target-i386/helper.h
index 6b518ad..68747aa 100644
--- a/target-i386/helper.h
+++ b/target-i386/helper.h
@@ -217,4 +217,6 @@ DEF_HELPER_2(rclq, tl, tl, tl)
 DEF_HELPER_2(rcrq, tl, tl, tl)
 #endif
 
+#include "trace-tcg-helper.h"
+
 #include "def-helper.h"
diff --git a/target-lm32/helper.h b/target-lm32/helper.h
index 9d335ef..52a310b 100644
--- a/target-lm32/helper.h
+++ b/target-lm32/helper.h
@@ -11,4 +11,6 @@ DEF_HELPER_0(rcsr_ip, i32)
 DEF_HELPER_0(rcsr_jtx, i32)
 DEF_HELPER_0(rcsr_jrx, i32)
 
+#include "trace-tcg-helper.h"
+
 #include "def-helper.h"
diff --git a/target-m68k/helper.h b/target-m68k/helper.h
index cb8a0c7..dc149c7 100644
--- a/target-m68k/helper.h
+++ b/target-m68k/helper.h
@@ -51,4 +51,6 @@ DEF_HELPER_3(set_mac_extu, void, env, i32, i32)
 DEF_HELPER_2(flush_flags, void, env, i32)
 DEF_HELPER_1(raise_exception, void, i32)
 
+#include "trace-tcg-helper.h"
+
 #include "def-helper.h"
diff --git a/target-microblaze/helper.h b/target-microblaze/helper.h
index b92aa34..729ffd3 100644
--- a/target-microblaze/helper.h
+++ b/target-microblaze/helper.h
@@ -36,4 +36,6 @@ DEF_HELPER_4(memalign, void, i32, i32, i32, i32)
 DEF_HELPER_2(get, i32, i32, i32)
 DEF_HELPER_3(put, void, i32, i32, i32)
 
+#include "trace-tcg-helper.h"
+
 #include "def-helper.h"
diff --git a/target-mips/helper.h b/target-mips/helper.h
index 442f684..d977b6b 100644
--- a/target-mips/helper.h
+++ b/target-mips/helper.h
@@ -297,4 +297,6 @@ DEF_HELPER_0(rdhwr_ccres, tl)
 DEF_HELPER_1(pmon, void, int)
 DEF_HELPER_0(wait, void)
 
+#include "trace-tcg-helper.h"
+
 #include "def-helper.h"
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 470e42f..86f6e8a 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -408,4 +408,6 @@ DEF_HELPER_2(store_601_batl, void, i32, tl)
 DEF_HELPER_2(store_601_batu, void, i32, tl)
 #endif
 
+#include "trace-tcg-helper.h"
+
 #include "def-helper.h"
diff --git a/target-s390x/helper.h b/target-s390x/helper.h
index 01c8d0e..d702605 100644
--- a/target-s390x/helper.h
+++ b/target-s390x/helper.h
@@ -149,4 +149,6 @@ DEF_HELPER_2(cksm, void, i32, i32)
 DEF_HELPER_FLAGS_4(calc_cc, TCG_CALL_PURE|TCG_CALL_CONST,
i32, i32, i64, i64, i64)
 
+#include "trace-tcg-helper.h"
+
 #include "def-helper.h"
diff --git a/target-sh4/helper.h b/target-sh4/helper.h
index 95e3c7c..2ea76fa 100644
--- a/target-sh4/helper.h
+++ b/target-sh4/helper.h
@@ -51,4 +51,6 @@ DEF_HELPER_1(ftrc_DT, i32, f64)
 DEF_HELPER_2(fipr, void, i32, i32)
 DEF_HELPER_1(ftrv, void, i32)
 
+#include "trace-tcg-helper.h"
+
 #include "def-helper.h"
diff --git a/target-sparc/helper.h b/target-sparc/helper.h
index 1f67b08..a4fa61a 100644
--- a/target-sparc/helper.h
+++ b/target-sparc/helper.h
@@ -167,4 +167,6 @@ VIS_CMPHELPER(cmpne);
 DEF_HELPER_1(compute_psr, void, env);
 DEF_HELPER_1(compute_C_icc, i32, env);
 
+#include "trace-tcg-helper.h"
+
 #include "def-helper.h"
diff --git a/target-unicore32/helper.h b/target-unicore32/helper.h
index 615de2a..63b8260 100644
--- a/target-unicore32/helper.h
+++ b/target-unicore32/helper.h
@@ -67,4 +67,6 @@ DEF_HELPER_2(ucf64_si2df, f64, f32, env)
 DEF_HELPER_2(ucf64_sf2si, f32, f32, env)
 DEF_HELPER_2(ucf64_df2si, f32, f64, env)
 
+#include "trace-tcg-helper.h"
+
 #include "def-helper.h"
diff --

Re: [Qemu-devel] [PATCH 1/2] guest agent: add RPC blacklist command-line option

2011-12-08 Thread Dor Laor


On 12/07/2011 06:45 PM, Michael Roth wrote:

On 12/07/2011 06:12 AM, Dor Laor wrote:

On 12/07/2011 12:52 PM, Daniel P. Berrange wrote:

On Wed, Dec 07, 2011 at 12:34:01PM +0200, Dor Laor wrote:

On 12/07/2011 06:03 AM, Michael Roth wrote:

This adds a command-line option, -b/--blacklist, that accepts a
comma-seperated list of RPCs to disable, or prints a list of
available RPCs if passed "?".

In consequence this also adds general blacklisting and RPC listing
facilities to the new QMP dispatch/registry facilities, should the
QMP monitor ever have a need for such a thing.


Beyond run time disablement, how easy it is to compile out some of
the general commands such as exec/file-handling?

Security certifications like common criteria usually ask to compile
out anything that might tamper security.


I don't think that's really relevant/needed. As discussed on the
call yesterday, this is security theatre, because nothing can prevent
the host admin from accessing guest RAM or disk data. AFAIK the
virtualization related security certifications acknowledge this
already& don't make any claims about security of guests against
a malicious host admin. In any case, a suitable SELinux policy for
the guest agent could prevent arbitrary file/binary access via
generic 'exec' / 'file-read' commands, in a manner that is sufficient
to satisfy security certications.


I absolutely agree that the hypervisor can tweak the guest in multiple
ways. Nevertheless there are two reasons I asked it:

1. Reduce code and noise from security reviewers eyes.
We were asked to do exactly that for other qemu functionality that
is included but does not run at all. It's just makes the review
faster.


Actually removing the code, or compiling it out?

If it's a matter of compiling it out, the best solution I can think of
is having the QAPI code generators create a #define  for each RPC,
then wrapping the implementations inside an #ifdef . That way you
could compile out the code by simply modifying the schema.

That said, I'd really like to avoid having distros get into the habit of
extensively modifying their guest agent source outside of bug fixes and
whatnot, I think it'll cause too many problems down the road. From a
management perspective, if you're running a cloud with multiple distros,
it'll be really difficult to account for agents that have been modified
or crippled in various ways.


I don't mind ignoring the guest side for security issues, but since 
we're discussing it, isn't the mechanism for capability exchange will 
take of command existence? We'll need it anyway to handle various agent 
versions.




Perhaps we only need, say, shutdown, for ovirt, and compile out the
rest, but maybe a customer wants to run their RHEL guest in home-brewed
environment where they use qemu-ga file read/write to handle a specific
set of guest activation procedures. Now they need a new agent package.

It's a whole lot of hassle for host/guest admins for the sake of saving
a security reviewer a bit of investigating that'll lead right back to
the general operating premise that you have to trust your host
administrators before any chain of trust can be established.

At least with this interface we can provide some semblance of relief to
users with specific security concerns, but don't have to work with
distros to re-package agents when those concerns collide with
requirements on the host side. We can just check to see if they disabled
the functionality and request they re-enable due to  by updating
their configs.



2. Every piece of code is a risk for exploit
Imagine that a bug/leak/use-after-free in the blacklist command or
the exec command on qemu exists and allows attacked to gain control
of qemu.


A host can never assume that a guest [agent] can be trusted. qemu-ga
might've been replaced completely by a malicious guest admin, thus
circumventing any steps a distro has taken to harden it. Fortunately a
guest can only affect memory outside it's address space by going through
the virtio-serial/QMP layer. So we can focus our efforts on hardening
the transport and json parser layers, and a lot of work has gone into
that already (placing limits on token size, recursion depth, etc). So
that's more an issue that needs to be addressed on the qemu side, and is
independent of any particular RPC implementation on the guest side.


I agree that the host side is relevant and this guest side is negligible 
here since a malicious guest will create its own agent.








Regards,
Daniel

[Qemu-devel] [PATCH 09/10] trace-tcg: [tracetool] Define TCG tracing helper routines

2011-12-08 Thread Lluís Vilanova

Generate the necessary TCG helper routines for tracing events in guest code.

Signed-off-by: Lluís Vilanova 
---
 .gitignore|1 +
 Makefile  |2 +-
 Makefile.objs |8 +++
 Makefile.target   |2 ++
 scripts/tracetool |   65 +
 5 files changed, 77 insertions(+), 1 deletions(-)

diff --git a/.gitignore b/.gitignore
index d47b586..1a6de79 100644
--- a/.gitignore
+++ b/.gitignore
@@ -7,6 +7,7 @@ trace.c
 trace-dtrace.h
 trace-dtrace.dtrace
 trace-tcg-helper.h
+trace-tcg-helper.c
 *-timestamp
 *-softmmu
 *-darwin-user
diff --git a/Makefile b/Makefile
index 516622a..b1d1249 100644
--- a/Makefile
+++ b/Makefile
@@ -9,7 +9,7 @@ GENERATED_HEADERS += trace.h trace-tcg-helper.h
 ifeq ($(TRACE_BACKEND),dtrace)
 GENERATED_HEADERS += trace-dtrace.h
 endif
-GENERATED_SOURCES += trace.c
+GENERATED_SOURCES += trace.c trace-tcg-helper.c
 
 GENERATED_HEADERS += qmp-commands.h qapi-types.h qapi-visit.h
 GENERATED_SOURCES += qmp-marshal.c qapi-types.c qapi-visit.c
diff --git a/Makefile.objs b/Makefile.objs
index 8f97709..f153fa5 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -406,6 +406,14 @@ trace-tcg-helper.h-timestamp: $(SRC_PATH)/trace-events 
$(BUILD_DIR)/config-host.
 
 trace-tcg-helper.h: trace-tcg-helper.h-timestamp
 
+trace-tcg-helper.c-timestamp: $(SRC_PATH)/trace-events 
$(BUILD_DIR)/config-host.mak
+   $(call quiet-command,sh $(SRC_PATH)/scripts/tracetool 
$(TRACETOOL_EXTRA) --tcg --tcg-c < $< > $@,"  GEN   
$(TARGET_DIR)trace-tcg-helper.c")
+   @cmp -s $@ trace-tcg-helper.c || cp $@ trace-tcg-helper.c
+
+trace-tcg-helper.c: trace-tcg-helper.c-timestamp
+
+trace-tcg-helper.o: trace-tcg-helper.c $(GENERATED_HEADERS)
+
 ##
 # smartcard
 
diff --git a/Makefile.target b/Makefile.target
index a111521..8b3ffd6 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -90,6 +90,8 @@ libobj-$(CONFIG_TCI_DIS) += tci-dis.o
 
 tci-dis.o: QEMU_CFLAGS += -I$(SRC_PATH)/tcg -I$(SRC_PATH)/tcg/tci
 
+libobj-y += trace-tcg-helper.o
+
 $(libobj-y): $(GENERATED_HEADERS)
 
 # libqemu
diff --git a/scripts/tracetool b/scripts/tracetool
index 5431f2d..197739a 100755
--- a/scripts/tracetool
+++ b/scripts/tracetool
@@ -30,6 +30,7 @@ Output formats:
   -d  Generate .d file (DTrace only)
   --stap  Generate .stp file (DTrace with SystemTAP only)
   --tcg-h Generate trace-tcg-helper.h file (tcg only)
+  --tcg-c Generate trace-tcg-helper.c file (tcg only)
 
 Options:
   --binary   [path]Full path to QEMU binary
@@ -241,6 +242,13 @@ tcg_compat_type()
 esac
 }
 
+host_tcg_compat_type()
+{
+local tcg_compat_type
+tcg_compat_type=`tcg_compat_type "$1"`
+host_type "$tcg_compat_type"
+}
+
 tcg_type()
 {
 local tcg_compat_type
@@ -829,6 +837,53 @@ linetotcg_h_end_tcg()
 return
 }
 
+linetotcg_c_nop()
+{
+return
+}
+
+linetotcg_c_tcg()
+{
+local has_tcg has_tcg_vcpu
+has_property "$1" "tcg" && has_tcg=1
+has_property "$1" "tcg-vcpu" && has_tcg_vcpu=1
+if [ "$has_tcg" != "1" -a "$has_tcg_vcpu" != "1" ]; then
+return
+fi
+
+# TCG-compatible to actual type conversion bridge
+local api args argtypes argnames values
+api=$(get_api_name "$1")
+args=$(get_args "$1" host_tcg_compat_type)
+argnames=$(get_argnames "$1" ",")
+argtypes=$(get_argtypes "$1" host_type)
+values=$(zip_lists "$argtypes" "$argnames" "(%s)%s, ")
+values=${values%, }
+
+cat <&2
+exit 1
+fi
+convert tcg_c
+}
+
 
 backend=
 output=
@@ -949,6 +1013,7 @@ do
 "-h" | "-c" | "-d") output="${1#-}" ;;
 "--stap") output="${1#--}" ;;
 "--tcg-h") output="tcg_h" ;;
+"--tcg-c") output="tcg_c" ;;
 
 "--check-backend") exit 0 ;; # used by ./configure to test for backend

Re: [Qemu-devel] [RFC] qemu-ga: Introduce guest-hibernate command

2011-12-08 Thread Andreas Färber

Am 08.12.2011 19:52, schrieb Luiz Capitulino:
> This is basically suspend to disk on a Linux guest.
> 
> Signed-off-by: Luiz Capitulino 
> ---
> 
> This is an RFC because I did it as simple as possible and I'm open to
> suggestions...
> 
> Now, while testing this or even "echo disk > /sys/power/state" I get several
> funny results. Some times qemu just dies after printing that message:
> 
>  "Guest moved used index from 20151 to 1"

On s390x that happened when virtio memory was not zeroed on init...

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

Re: [Qemu-devel] [PATCH 2/2] configure: remove --enable-cocoa (default), add --disable-cocoa.

2011-12-08 Thread Andreas Färber

Am 07.12.2011 22:21, schrieb Peter Maydell:
> On 7 December 2011 21:12, Andreas Färber  wrote:
>> Note that I have a patch that replaces uint16 with uint_fast16_t,
>> properly fixing the Cocoa build. What I don't have yet is all the other
>> conversions (Coccinelle doesn't fully do int16 conversion, for example)
>> to run the benchmarks Peter asked for.
> 
> For the benchmarks surely it suffices to flip the typedefs, ie compare
> typedef uint8_t uint8;
> typedef int8_t int8;
> typedef uint16_t uint16;
> typedef int16_t int16;
> typedef uint32_t uint32;
> typedef int32_t int32;
> typedef uint64_t uint64;
> typedef int64_t int64;
> 
> with
> typedef uint_fast8_t uint8;
> typedef int_fast8_t int8;
> typedef uint_fast16_t uint16;
> typedef int_fast16_t int16;
> typedef uint_fast32_t uint32;
> typedef int_fast32_t int32;
> typedef uint_fast64_t uint64;
> typedef int_fast64_t int64;
> 
> ?
> 
> We only need to do the full search-n-replace when we've picked
> which one we're going for...

FWIW, target-mips/cpu.h has this:

// uint_fast8_t and uint_fast16_t not in 
// XXX: move that elsewhere
#if defined(CONFIG_SOLARIS) && CONFIG_SOLARIS_VERSION < 10
typedef unsigned char   uint_fast8_t;
typedef unsigned intuint_fast16_t;
#endif

This shouldn't stop us from using these types, on the contrary, there is
prior art. We'd just have to move these to qemu-common.h or so.

We still don't build on OpenIndiana due to -std=gnu99 vs.
make_floatx80() initialization code BTW. Any ideas there appreciated.

Andreas

Re: [Qemu-devel] [PATCH 1/2] guest agent: add RPC blacklist command-line option

2011-12-08 Thread Michael Roth


On 12/08/2011 04:53 PM, Dor Laor wrote:

On 12/07/2011 06:45 PM, Michael Roth wrote:

On 12/07/2011 06:12 AM, Dor Laor wrote:

On 12/07/2011 12:52 PM, Daniel P. Berrange wrote:

On Wed, Dec 07, 2011 at 12:34:01PM +0200, Dor Laor wrote:

On 12/07/2011 06:03 AM, Michael Roth wrote:

This adds a command-line option, -b/--blacklist, that accepts a
comma-seperated list of RPCs to disable, or prints a list of
available RPCs if passed "?".

In consequence this also adds general blacklisting and RPC listing
facilities to the new QMP dispatch/registry facilities, should the
QMP monitor ever have a need for such a thing.


Beyond run time disablement, how easy it is to compile out some of
the general commands such as exec/file-handling?

Security certifications like common criteria usually ask to compile
out anything that might tamper security.


I don't think that's really relevant/needed. As discussed on the
call yesterday, this is security theatre, because nothing can prevent
the host admin from accessing guest RAM or disk data. AFAIK the
virtualization related security certifications acknowledge this
already& don't make any claims about security of guests against
a malicious host admin. In any case, a suitable SELinux policy for
the guest agent could prevent arbitrary file/binary access via
generic 'exec' / 'file-read' commands, in a manner that is sufficient
to satisfy security certications.


I absolutely agree that the hypervisor can tweak the guest in multiple
ways. Nevertheless there are two reasons I asked it:

1. Reduce code and noise from security reviewers eyes.
We were asked to do exactly that for other qemu functionality that
is included but does not run at all. It's just makes the review
faster.


Actually removing the code, or compiling it out?

If it's a matter of compiling it out, the best solution I can think of
is having the QAPI code generators create a #define  for each RPC,
then wrapping the implementations inside an #ifdef . That way you
could compile out the code by simply modifying the schema.

That said, I'd really like to avoid having distros get into the habit of
extensively modifying their guest agent source outside of bug fixes and
whatnot, I think it'll cause too many problems down the road. From a
management perspective, if you're running a cloud with multiple distros,
it'll be really difficult to account for agents that have been modified
or crippled in various ways.


I don't mind ignoring the guest side for security issues, but since
we're discussing it, isn't the mechanism for capability exchange will
take of command existence? We'll need it anyway to handle various agent
versions.


Agreed, and with the capabilities reporting introduced in patch 2 we'd 
be able to determine whether a guest command was simply disabled, or if 
it was compiled out. So that's not too much a concern.


The issue is that the latter case is much easier to rectify if the 
disabled command becomes a requirement on the host side, since it's a 
guest config change, rather than a re-spin of a guest agent package. For 
a homogenous environment, re-spinning the agent package isn't too 
difficult to deal with, but in a mixed environment there would be a lot 
of inertia in needing to coordinate requirements with multiple distro 
package maintainers to support new agent features and provide updated 
packages.


The only way to get around this, for mixed environments, is if our 
primary deployment model is to support the agent for a number of distros 
(RHEL/SLES/etc) and have the host push new versions as needed (via ISO, 
or unattended via guest distro-packaged agent with remote update support).


That way, each host/distro can push an agent that suites their specific 
requirements, while an upstream/community-supported guest tools ISO 
focuses on broader functionality. Kind of like the virtio-win drivers, 
where non-RHEL users can consume via community-supported unsigned drivers.


But that still requires certain agent functionality to remain 
"off-limits", such as remote update (which is currently possible via 
guest-file-write and guest-shutdown, or eventually without shutdown via 
guest-exec, though a specific update interface would probably be 
warranted for this scenario).






Perhaps we only need, say, shutdown, for ovirt, and compile out the
rest, but maybe a customer wants to run their RHEL guest in home-brewed
environment where they use qemu-ga file read/write to handle a specific
set of guest activation procedures. Now they need a new agent package.

It's a whole lot of hassle for host/guest admins for the sake of saving
a security reviewer a bit of investigating that'll lead right back to
the general operating premise that you have to trust your host
administrators before any chain of trust can be established.

At least with this interface we can provide some semblance of relief to
users with specific security concerns, but don't have to work with
distros to re-package agents when those concerns

Re: [Qemu-devel] [PATCH 2/3] target-mips:enabling of 64 bit user mode and floating point operations MIPS_HFLAG_UX is included in env->hflags so that the address computation for LD instruction does not

2011-12-08 Thread Andreas Färber

Thanks for extending the commit description. Please see this for a
template though:

http://live.gnome.org/Git/CommitMessages

Looks like there's an empty line missing between subject and description
(and the space after "target-mips:").

Am 08.12.2011 06:25, schrieb kha...@kics.edu.pk:
> From: Khansa Butt 
> 
> 
> Signed-off-by: Abdul Qadeer 
> ---
>  target-mips/translate.c |4 
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index d5b1c76..452a63b 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -12779,6 +12779,10 @@ void cpu_reset (CPUMIPSState *env)
>  env->hflags |= MIPS_HFLAG_FPU;
>  }
>  #ifdef TARGET_MIPS64
> +env->hflags |=  MIPS_HFLAG_UX;

So for those of us not knowing mips, it's defined as:

#define MIPS_HFLAG_UX 0x00200 /* 64-bit user mode */

The code above is inside CONFIG_USER_ONLY, so this looks right for n64
but not for n32 ABI.

If you put this into its own patch with a description of

---8<---
target-mips: Enable 64 bit user mode for n64

For user mode n64 ABI emulation, MIPS_HFLAG_UX is included in
env->hflags so that the address computation for LD instruction does not
get treated as 32 bit code, see gen_op_addr_add() in translate.c.

Signed-off-by: Abdul Qadeer 
Signed-off-by: (you)
---8<---

and make it depend on TARGET_ABI_MIPSN64 then I will happily add my
Acked-by.


> +/* if cpu has FPU, MIPS_HFLAG_F64 must be included in env->hflags
> +   so that floating point operations can be emulated */
> +env->active_fpu.fcr0 = env->cpu_model->CP1_fcr0;
>  if (env->active_fpu.fcr0 & (1 << FCR0_F64)) {
>  env->hflags |= MIPS_HFLAG_F64;
>  }

Nack. env->active_fpu.fcr0 gets initialized in translate_init.c based on
cpu_model->CR1_fcr0, where FCR0_F64 is set only for 24Kf, 34Kf,
MIPS64R2-generic. TARGET_ABI_MIPSN64 linux-user defaults to 20Kc. So it
seems to rather be an issue of using the right -cpu parameter or
changing the default for n64. [cc'ing Nathan, who introduced the if]

Andreas

Re: [Qemu-devel] [PATCH 1/3] linux-user:Support for MIPS64 user mode emulation in QEMU

2011-12-08 Thread Andreas Färber

Am 08.12.2011 16:15, schrieb Andreas Färber:
>> diff --git a/configure b/configure
>> index ac4840d..e31229b 100755
>> --- a/configure
>> +++ b/configure
>> @@ -914,6 +914,7 @@ m68k-linux-user \
>>  microblaze-linux-user \
>>  microblazeel-linux-user \
>>  mips-linux-user \
>> +mips64-linux-user \
>>  mipsel-linux-user \

> For linux-user IIUC the ABI is relevant, so shouldn't this be
> mipsn64-linux-user?

Self-nack. "mips64" already sets TARGET_ABI_MIPSN64 so the naming is OK.

> What about mipsn64el?

Question still applies, be it mipsel64 or mips64el.

Andreas

1 2 >

1 - 100 of 113 matches

Mail list logo