date:20111117

Re: [Qemu-devel] converging around a single guest agent

2011-11-17 Thread Ayal Baron



- Original Message -
> I have been following this thread pretty closely and the one sentence
> summary of
> the current argument is: ovirt-guest-agent is already featureful and
> tested, so
> let's drop qemu-ga and have everyone adopt ovirt-guest-agent.

What we're suggesting is let's drop *one* of the two agents (obviously it would 
be easier for us to drop qemu-ga, but we'd rather reach consensus and unite 
behind one agent regardless of which agent it is).

>  Unfortunately,
> this track strays completely away from the stated goal of
> convergence.  I have
> at least two examples of why the greater KVM community can never
> adopt
> ovirt-guest-agent as-is.  To address this, I would like to counter
> with an
> example on how qemu-ga can enable the deployment of ovirt-guest-agent
> features
> and satisfy the needs of the whole community at the same time.
> 
> 1) Scope:  The ovirt-guest-agent contains functionality that is
> incredibly
> useful within the context of oVirt.  Single Sign-on is very handy but
> KVM users
> outside the scope of oVirt will not want this extra complexity in
> their agent.
> For simplicity they will probably just write something small that
> does what they
> need (and we have failed to provide a ubiquitous KVM agent).

I totally agree, but that could easily be resolved using the plugin 
architecture suggested before.

> 
> 1) Deployment complexity: The more complex the guest agent is, the
> more often it
> will need to be updated (bug/security fixes, distro compatibility,
> new
> features).  Rolling out guest agent updates does not scale well in
> large
> environments (especially when the guest and host administrators are
> not the same
> person).

Using plugins, you just deploy the ones you need, keeping the attack surface / 
#bugs / need to update lower

> 
> For these reasons (and many others), I support having an agent with
> very basic
> primitives that can be orchestrated by the host to provide needed
> functionality.
> This agent would present a low-level, stable, extensible API that
> everyone can
> use.  Today qemu-ga supports the following verbs: sync ping info
> shutdown
> file-open file-close file-read file-write file-seek file-flush
> fsfreeze-status
> fsfreeze-freeze fsfreeze-thaw.  If we add a generic execute
> mechanism, then the
> agent can provide everything needed by oVirt to deploy SSO.
> 
> Let's assume that we have already agreed on some sort of security
> policy for the
> write-file and exec primitives.  Consensus is possible on this issue
> but I
> don't want to get bogged down with that here.
> 
> With the above primitives, SSO could be deployed automatically to a
> guest with
> the following sequence of commands:
> 
> file-open "/sso-package.bin" "w"
> file-write  
> file-close 
> file-open "/sso-package.bin" "x"
> file-exec  
> file-close 

The guest can run on any number of hosts.  currently, the guest tools contain 
all the relevant logic installed (specifically for the guest os version).
What you're suggesting here is that we keep all the relevant guest-agent 
variants code on the host, automatically detect the guest os version and inject 
the correct file (e.g. SSO on winXP and on win2k8 is totally different).
In addition, there might be things requiring boot for example. So to solve that 
we would instead need to install a set of tools on the guest like we do the 
guest agent today (it would be a separate package because it's management 
specific).  And then we would tell the guest-agent to run tools from that set?  
Sounds overly complex to me.

> 
> At this point, the package is installed.  It can contain whatever
> existing logic
> exists in the ovirt-guest-agent today.  To perform a user login,
> we'll assume
> that sso-package.bin contains an executable 'sso/do-user-sso':
> 
> file-open "/sso/do-user-sso" "x"
> exec  
> file-close 
> 
> At this point the user would be logged in as before.
> 
> Obviously, this type of approach could be made easier by providing a
> well
> designed exec API that returns command exit codes and (optionally)
> command
> output.  We could also formalize the install of additional components
> into some
> sort of plugin interface.  These are all relatively easy problems to
> solve.
> 
> If we go in this direction, we would have a simple, general-purpose
> agent with
> low-level primitives that everyone can use.  We would also be able to
> easily
> extend the agent based on the needs of individual deployments (not
> the least of
> which is an oVirt environment).  If certain plugins become popular
> enough, they
> can always be promoted to first-order API calls in future versions of
> the API.
> 
> What are your thoughts on this approach?
> 
> --
> Adam Litke 
> IBM Linux Technology Center
> 
> ___
> Arch mailing list
> a...@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/arch
>

Re: [Qemu-devel] converging around a single guest agent

2011-11-17 Thread Ayal Baron



- Original Message -
> On 11/16/2011 11:53 AM, Barak Azulay wrote:
> > On Wednesday 16 November 2011 17:28:16 Michael Roth wrote:
> >> 2) You'd also need a schema, similar to
> >> qemu.git/qapi-schema-guest.json,
> >> to describe the calls you're proxying. The existing infrastructure
> >> in
> >> QEMU will handle all the work of marshalling/unmarshalling
> >> responses
> >> back to the QMP client on the host-side.
> >>
> >> It's a bit of extra work, but the benefit is unifying the
> >> qemu/guest-level management interface into a single place that's
> >> easy
> >> for QMP/libvirt to consume.
> >>
> >
> > The issue is not whether it's possible or not or the amount of
> > efforts need to
> > be done for that to happen, either for qemu-ga or ovirt-guest-agent
> > this work
> > needs to be done.
> >
> > the question is whether all comminication should go through the
> > monitor (hence
> > double proxy) or ... only a subset of the commands that are closly
> > related to
> > hypervisor functionality and separate it from general
> > management-system
> > related actions (e.g. ovirt or any other management system that
> > wants to
> > communicate to the guest).
> 
> Yes, all guest interaction should be funnelled through QEMU.  QEMU
> has one job
> in life--to expose an interface to guests and turn it into something
> more useful
> to the host.  QEMU expose an emulated AHCI controller and turns that
> into VFS
> operations.
> 
> Likewise, QEMU should expose a paravirtual "agent" device to a guest,
> and then
> turn that into higher level management interfaces.

Exposing higher level management interfaces means that qemu would have to do 
policy.
I have no problem with this, but note that this is counter to what you've been 
advocating to up to now (e.g. high watermark event for disks).

Also, you would still have to have low level interfaces to accomplish things 
that qemu has not implemented yet or is not interested in implementing (the use 
case is too narrow).

> 
> QEMU's job is to sanitize information from the guest and try to turn
> that into
> something that is safer for the broader world to consume.  QEMU also
> deals with
> isolating state in order to support things like live migration.  This

So are you suggesting that when a user reads a file you would automatically 
encode the contents?

> ends up
> being non trivial when it comes to guest agents as it turns out.
> 
> When you bypass QEMU and have higher level components talk directly
> to the
> guest, you effectively skip through many layers of security and
> potentially
> break things like migration by spreading state beyond QEMU.  It's of
> course
> fixable given enough hacking but it makes for a brittle architecture.
> 
> VDSM runs as root, right?  That means that a guest driven attack that

No, vdsm runs as user vdsm.  Operations that need root privileges are in a 
separate process with root privileges and this process exposes a limited API 
which vdsm is allowed to invoke.

> exploits
> an issue with guest-agent protocol handling is going to compromise
> VDSM and gain
> root access.  OTOH, QEMU runs with greatly reduced privileges
> isolating the
> effect of such a compromise.
> 
> VDSM really shouldn't be talking directly to the guest.  libvirt
> shouldn't be
> either although it is now because we haven't properly plumbed the
> guest agent
> protocol through QMP.
> 
> Regards,
> 
> Anthony Liguori
> ___
> vdsm-devel mailing list
> vdsm-de...@lists.fedorahosted.org
> https://fedorahosted.org/mailman/listinfo/vdsm-devel
>

[Qemu-devel] [Bug 891525] [NEW] Guest kernel crashes when booting a NUMA guest without explicitly specifying cpus= in -numa option

2011-11-17 Thread Bharata B Rao

Public bug reported:

Target: x86_64-softmmu

Qemu Command line: [root@hs22 qemu-1.0-rc2]# ./x86_64-softmmu/qemu-
system-x86_64 -smp sockets=2,cores=4,threads=2 -numa
node,nodeid=0,mem=4g -numa node,nodeid=1,mem=1g -cpu core2duo -m 5g
/home/bharata/f15-lvm -nographic --enable-kvm -net
nic,macaddr=54:52:00:46:26:84,model=e1000 -net tap,script=/etc/qemu-
if,ifname=vnet0

Qemu version: 1.0-rc2

When guest is started with -numa option without explicitly specifying
the cpus=, guest kernel crashes as below:

[0.252159] divide error:  [#1] SMP 
[0.252970] last sysfs file: 
[0.252970] CPU 1 
[0.252970] Modules linked in:
[0.252970] 
[0.252970] Pid: 2, comm: kthreadd Not tainted 2.6.38.6-26.rc1.fc15.x86_64 
#1 Bochs Bochs
[0.252970] RIP: 0010:[]  [] 
select_task_rq_fair+0x44a/0x571
[0.252970] RSP: :88011767fc60  EFLAGS: 00010046
[0.252970] RAX:  RBX: 88015d6ad300 RCX: 
[0.252970] RDX:  RSI: 0100 RDI: 
[0.252970] RBP: 88011767fd10 R08: 0100 R09: 88015d6ad338
[0.252970] R10: 00013840 R11: 00800711 R12: 
[0.252970] R13: 88015fc0f810 R14: 0001 R15: 
[0.252970] FS:  () GS:88015fc0() 
knlGS:
[0.252970] CS:  0010 DS:  ES:  CR0: 8005003b
[0.252970] CR2:  CR3: 01a03000 CR4: 06e0
[0.252970] DR0:  DR1:  DR2: 
[0.252970] DR3:  DR6: 0ff0 DR7: 0400
[0.252970] Process kthreadd (pid: 2, threadinfo 88011767e000, task 
88015d671720)
[0.252970] Stack:
[0.252970]  81475873 81a02140 88011767fce0 
8106c5a3
[0.252970]  88015d6ad318 0001000e 00013840 
00013840
[0.252970]  88015d6ad318 007d0001 8801 
88015d6d81e8
[0.252970] Call Trace:
[0.252970]  [] ? _raw_spin_lock_irq+0x1c/0x1e
[0.252970]  [] ? alloc_pid+0x2e6/0x335
[0.252970]  [] select_task_rq+0x16/0x46
[0.252970]  [] wake_up_new_task+0x3a/0xde
[0.252970]  [] do_fork+0x1f1/0x2bf
[0.252970]  [] ? load_TLS+0x10/0x14
[0.252970]  [] ? __switch_to+0xc6/0x220
[0.252970]  [] kernel_thread+0x75/0x77
[0.252970]  [] ? kthread+0x0/0x8c
[0.252970]  [] ? kernel_thread_helper+0x0/0x10
[0.252970]  [] kthreadd+0xe7/0x124
[0.252970]  [] kernel_thread_helper+0x4/0x10
[0.252970]  [] ? kthreadd+0x0/0x124
[0.252970]  [] ? kernel_thread_helper+0x0/0x10
[0.252970] Code: 01 45 c0 8b 8d 78 ff ff ff 48 8b 75 90 89 cf e8 4a 28 ff 
ff 3b 05 bd 89 ae 00 89 c1 7c c5 48 8b 45 c0 8b 4b 08 31 d2 48 c1 e0 0a 
[0.252970]  f7 f1 45 85 e4 75 08 48 3b 45 b0 72 08 eb 0d 48 89 45 b8 eb 
[0.252970] RIP  [] select_task_rq_fair+0x44a/0x571
[0.252970]  RSP 

When cpus= is specified for each node explicitly, guest boots fine.

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/891525

Title:
  Guest kernel crashes when booting a NUMA guest without explicitly
  specifying cpus= in -numa option

Status in QEMU:
  New

Bug description:
  Target: x86_64-softmmu

  Qemu Command line: [root@hs22 qemu-1.0-rc2]# ./x86_64-softmmu/qemu-
  system-x86_64 -smp sockets=2,cores=4,threads=2 -numa
  node,nodeid=0,mem=4g -numa node,nodeid=1,mem=1g -cpu core2duo -m 5g
  /home/bharata/f15-lvm -nographic --enable-kvm -net
  nic,macaddr=54:52:00:46:26:84,model=e1000 -net tap,script=/etc/qemu-
  if,ifname=vnet0

  Qemu version: 1.0-rc2

  When guest is started with -numa option without explicitly specifying
  the cpus=, guest kernel crashes as below:

  [0.252159] divide error:  [#1] SMP 
  [0.252970] last sysfs file: 
  [0.252970] CPU 1 
  [0.252970] Modules linked in:
  [0.252970] 
  [0.252970] Pid: 2, comm: kthreadd Not tainted 2.6.38.6-26.rc1.fc15.x86_64 
#1 Bochs Bochs
  [0.252970] RIP: 0010:[]  [] 
select_task_rq_fair+0x44a/0x571
  [0.252970] RSP: :88011767fc60  EFLAGS: 00010046
  [0.252970] RAX:  RBX: 88015d6ad300 RCX: 

  [0.252970] RDX:  RSI: 0100 RDI: 

  [0.252970] RBP: 88011767fd10 R08: 0100 R09: 
88015d6ad338
  [0.252970] R10: 00013840 R11: 00800711 R12: 

  [0.252970] R13: 88015fc0f810 R14: 0001 R15: 

  [0.252970] FS:  () GS:88015fc0() 
knlGS:
  [0.252970] CS:  0010 DS:  ES:  CR0: 8005003b
  [0.252970] CR2:  CR3: 01a03000 CR4: 
06e0

[Qemu-devel] ncurses 5.3 conflicts with latest qemu

2011-11-17 Thread Caraman Mihai Claudiu-B02008

Hi,

A recent patch in qemu conflicts with old ncurses libraries (version 5.3). You 
will see this error cause by bool type redefinition in curses.h (with 
CONFIG_CURSES configured by default):

console.c: In function 'text_console_init':
console.c:1550:23: error: assignment from incompatible pointer type

the qemu patch exposing this problem is:

curses: fix garbling when chtype != long
author  Devin J. Pohly 
Wed, 7 Sep 2011 19:44:36 + (15:44 -0400)
committer   Anthony Liguori
Fri, 9 Sep 2011 17:58:16 + (12:58 -0500)
commit  df00bed0fa30a6f5712456e7add783e470c534c9

The problem seems to be fixed in newer versions of ncurses (5.7 and above). I 
just looked over the sources, so better if someone can confirm this. 
Here is a qemu patch that solve the conflict with old ncurses:


Signed-off-by: Mihai Caraman 
---
Fix compile errors with old ncurses libraries (version 5.3) caused by bool type 
redefinition.

 qemu-common.h |3 +++
 console.h |1 -

 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/qemu-common.h b/qemu-common.h index 5e87bdf..9ac15ba 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -23,6 +23,9 @@ typedef struct Monitor Monitor;  #include   
#include   #include 
+#ifdef CONFIG_CURSES
+#include 
+#endif
 #include 
 #include 
 #include 
diff --git a/console.h b/console.h
index 9c1487e..3327c43 100644
--- a/console.h
+++ b/console.h
@@ -329,7 +329,6 @@ static inline int ds_get_bytes_per_pixel(DisplayState *ds)  
}
 
 #ifdef CONFIG_CURSES
-#include 
 typedef chtype console_ch_t;
 #else
 typedef unsigned long console_ch_t;
--
1.7.4.1

Re: [Qemu-devel] [PATCH 1/4] Makefile: remove more generated files on clean

2011-11-17 Thread Paolo Bonzini


On 11/16/2011 10:58 PM, Michael S. Tsirkin wrote:

make clean missed the source qmp files generated
by python. Fix that.

Signed-off-by: Michael S. Tsirkin
---
  Makefile |2 ++
  1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/Makefile b/Makefile
index 168093c..b335f2a 100644
--- a/Makefile
+++ b/Makefile
@@ -8,6 +8,7 @@ ifeq ($(TRACE_BACKEND),dtrace)
  GENERATED_HEADERS += trace-dtrace.h
  endif
  GENERATED_HEADERS += qmp-commands.h qapi-types.h qapi-visit.h
+GENERATED_SOURCES += qmp-marshal.c qapi-types.c qapi-visit.c

  ifneq ($(wildcard config-host.mak),)
  # Put the all: rule here so that config-host.mak can contain dependencies.
@@ -227,6 +228,7 @@ clean:
rm -f trace.c trace.h trace.c-timestamp trace.h-timestamp
rm -f trace-dtrace.dtrace trace-dtrace.dtrace-timestamp
rm -f trace-dtrace.h trace-dtrace.h-timestamp
+   rm -f $(GENERATED_SOURCES)
rm -rf $(qapi-dir)
$(MAKE) -C tests clean
for d in $(ALL_SUBDIRS) $(QEMULIBS) libcacard; do \


Series looks good,

Reviewed-by: Paolo Bonzini 

Paolo

Re: [Qemu-devel] [PATCH 4/4] Makefile: fix qga dependencies

2011-11-17 Thread Paolo Bonzini


On 11/16/2011 10:58 PM, Michael S. Tsirkin wrote:

.c files include .h files, so .o depends on .h,
and the linked result depends on .o.
We got it wrong for qga rules, fix it up.


Another possible option is to make the "all" target depend on 
GENERATED_HEADERS and GENERATED_SOURCES, like


all: $(GENERATED_HEADERS) $(GENERATED_SOURCES)
@$(MAKE) build-all

and drop the dependency everywhere else.  This will check the dependency 
at the beginning of the build (should be fine since the generated files 
change rarely) and rely on automatic dependency generation for the 
.o->.h dependencies.


Paolo

Re: [Qemu-devel] [PATCH] Add -f option to qemu-nbd

2011-11-17 Thread Stefan Hajnoczi

On Wed, Nov 16, 2011 at 5:23 PM, Ian Campbell  wrote:
> On Wed, 2011-11-16 at 10:34 +, Stefan Hajnoczi wrote:
>> On Wed, Nov 16, 2011 at 6:57 AM, Chunyan Liu  wrote:
>> > Currently qemu-nbd does not support finding free nbd device for users like
>> > "losetup -f" and issuing "qemu-nbd -c /dev/nbdX disk.img" won't report 
>> > error
>> > message when /dev/nbd is already in use. It makes things a little 
>> > confusing.
>> > This patch adds "-f" option to qemu-nbd to support finding a free nbd 
>> > device
>> > for users. Please review and share your comments. Thanks.
>> >
>> > Signed-off-by: Chunyan Liu 
>> > ---
>> >  qemu-nbd.c |   65 
>> > +++-
>> >  1 files changed, 64 insertions(+), 1 deletions(-)
>>
>> This patch finds a free device but does not immediately attach to it
>> and use it.  Interfaces like this are prone to race conditions, I
>> think it would make more sense to combine the -f option with running
>> the actual NBD server.
>>
>> I suggest:
>> qemu-nbd -f disk.img
>>
>> That way it is safe to execute multiple qemu-nbd -f at the same time
>> without race conditions.
>
> I agree, but you'd also need some locking inside qemu-nbd wouldn't you?
> Or have it just keep trying devices until one works perhaps.

Right, I haven't checked the nbd driver iterface but you'd have to
actually open the device and claim it in order to atomically find out
if it is free.  But that's probably not much more work than what this
patch does, it just has the advantage of being safer.

Stefan

Re: [Qemu-devel] converging around a single guest agent

2011-11-17 Thread Alon Levy

On Wed, Nov 16, 2011 at 06:55:06PM +0100, Hans de Goede wrote:
> Hi,
> 
> On 11/16/2011 02:47 PM, Anthony Liguori wrote:
> >On 11/16/2011 06:07 AM, Alon Levy wrote:
> >>On Wed, Nov 16, 2011 at 08:53:45AM +0100, Hans de Goede wrote:
> >>>Hi,
> >>>
> >>>On 11/15/2011 11:39 PM, Ayal Baron wrote:
> 
> >>>
> >>>
> >>>
> >If you want to talk about convergence, the discussion should start
> >around
> >collecting requirements. We can then figure out if the two sets of
> >requirements
> >are strictly overlapping or if there are any requirements that are
> >fundamentally
> >in opposition.
> 
> Agreed.
> 
> So vdsm guest agent goal is to ease administration of VMs. This is not 
> saying much as it is quite broad so I will list what is provided today 
> and some things we need to add:
> 
> Assistance in VM life-cycle:
> "desktopShutdown" - Shuts the VM down gracefully from within the guest.
> "quiesce" - does not exist today. This is definitely a requirement for us.
> 
> SSO support for spice sessions (automatically login into guest OS using 
> provided credentials):
> "desktopLock" - lock current session, used when spice session gets 
> disconnected / before giving a new user access to spice session
> "desktopLogin"
> "desktopLogoff"
> In addition, guest reports relevant info (currently active user, session 
> state)
> 
> Monitoring and inventory:
> currently agent sends info periodically, which includes a lot of info 
> which should probably be broken down and served upon request. Info 
> includes -
> - memory usage
> - NICs info (name, hw, inet, inet6)
> - appslist (list of installed apps / rpms)
> - OS type
> - guest hostname
> - internal file systems info (path, fs type, total space, used space)
> 
> >>>
> >>>
> >>>
> >>>If we're gathering requirements and trying to come up with one agent to 
> >>>rule them all, don't forget
> >>>about VDI and the Spice agent. Currently the spice agent handles the 
> >>>following:
> >>>
> >>>1) Paravirtual mouse (needed to get mouse coordinates right with multi 
> >>>monitor setups)
> >
> >I thought there was wide agreement that pv mouse should be extracted from 
> >the guest agent into its own driver.
> 
> Yes AFAIK there is, but no-one has done that yet. I was merely listening what 
> the spice
> agent is doing today, hopefully tomorrow
> 
> >
> >>>2) Send client monitor configuration, so that the guest os can adjust its 
> >>>resolution
> >>>(and number and place of monitors) to match the client
> >
> >I also wonder if this should be part of QXL?
> 
> That is not really practically since this is something between the client and 
> the guest,
> where as the QXL device does communication between the hypervisor (qemu) and 
> the guest.
> Also there is a 1 head per QXL device relation, so that would mean that a 
> single qxl dev
> needs to be aware of other QXL devices in order to communicate the relative 
> position of
> its head to the other heads.

We do want to allow multiple heads on a single qxl device, since it
would make RandR work.

This only relates to the second point, Hans first point is still valid.
> 
> Regards,
> 
> Hans
>

Re: [Qemu-devel] [PATCH] LAN9118: Handling write to BYTE_TEST register

2011-11-17 Thread Peter Maydell

On 14 November 2011 09:39, Bertrand Cachet  wrote:
> @@ -977,6 +977,15 @@ static void lan9118_writel(void *opaque, 
> target_phys_addr_t offset,
>         s->pmt_ctrl &= ~0x34e;
>         s->pmt_ctrl |= (val & 0x34e);
>         break;
> +    case CSR_BYTE_TEST:
> +        /* Even if this register is marked ReadOnly in the datasheet,
> +           a write to this register will wake up the device when
> +           PM_MODE is currently in D1 or D2 mode.
> +
> +           As Power Modes are not handled in this driver, we will
> +           leave this case with no implementation.
> +         */
> +        break;
>     case CSR_GPIO_CFG:
>         /* Probably just enabling LEDs.  */
>         s->gpio_cfg = val & 0x071f;

Having thought about this a little, I think we should have the code
to modify the pmt_ctrl register here, but with a comment that explains
that this is currently a no-op since we are always in power mode D0.

There should also be a second patch in the series that corrects
the handling for the other RO and WO registers (which should
be read as zero and writes ignored, as per the datasheet).

-- PMM

Re: [Qemu-devel] Windows 7 shutdown causes BSOD

2011-11-17 Thread Gleb Natapov

On Thu, Nov 17, 2011 at 06:55:14PM +0800, hkran wrote:
> On 11/17/2011 02:37 PM, Gleb Natapov wrote:
> >On Thu, Nov 17, 2011 at 02:29:47PM +0800, hkran wrote:
> >>On 11/16/2011 06:51 PM, Gleb Natapov wrote:
> >>>On Wed, Nov 16, 2011 at 10:48:15AM +, Stefan Hajnoczi wrote:
> On Wed, Nov 16, 2011 at 10:14 AM, hkran   wrote:
> >On 11/15/2011 09:17 PM, Stefan Hajnoczi wrote:
> >>On Fri, Nov 4, 2011 at 11:25 AM, Stefan Hajnoczi
> >>  wrote:
> >>>On Fri, Nov 4, 2011 at 10:48 AM, Stefan Hajnoczi
> >>>  wrote:
> Windows 7 32-bit guest blue screens when I shut it down properly with
> Start | Shut Down.  The blue screen is only displayed for a split
> second before the guest reboots so I am not able to easily tell what
> it says.  My guess is that Windows is triple-faulting or soft
> rebooting - note that I told Windows to shut down, not reboot.
> 
> This issue happens on qemu.git/master (and Debian kvm 0.14.1+dfsg-3).
> Here is the QEMU command-line:
> 
> x86_64-softmmu/qemu-system-x86_64 -L pc-bios -cpu qemu32 -enable-kvm
> -m 1024 -rtc base=localtime -drive
> file=win7.img,if=none,id=drive-ide0-0-0,format=raw -device
> ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1
> 
> Questions:
> 
> Is anyone else experiencing this?
> 
> Is anyone fixing this?
> 
> If not I will play with it.  Disabling ACPI might reveal the source of
> the problem.  If that turns up nothing I will try to get the BSOD or
> WinDbg output.
> >>>Thanks to Andreas Faerber and Michael Tokarev I found out the
> >>>automatic reboot can be disabled in Windows.  Here is the BSOD
> >>>information:
> >>>
> >>>IRQL_NOT_LESS_OR_EQUAL
> >>>STOP: 0x000A (0x,0x00FF,0x0001,0x828B7220)
> >>This decodes to:
> >>"Windows or a kernel-mode driver accessed paged memory at
> >>DISPATCH_LEVEL or above."
> >>
> >>Memory referenced: 0x
> >>IRQL: 0xff
> >>Read/write: Write (1)
> >>Address which referenced memory: 0x828B7220
> >>
> >>http://msdn.microsoft.com/en-us/library/ff560129%28v=VS.85%29.aspx
> >>
> >>Looks like a NULL pointer reference or maybe a deliberate "we should
> >>never get here" failure.
> >>
> >>Stefan
> >>
> >I can reproduce this bug in my environment and found out that it has
> >something with the type of "CPU".
> >I tried the command line args as the same as Stefan's and definitely 
> >casue
> >the BSOD.
> >If i change the "-cpu qemu32" to "-cpu qemu64" or "-cpu core2duo" or
> >nothing. it will shutdown as expected, that means something?
> Thanks for sharing.  The guest is definitely sees a differed CPUID and
> can therefore take different code paths.  I'm not sure what
> specifically could have changed.
> 
> >>>Try adding/removing individual cpuid bits.
> >>>
> >>>--
> >>>   Gleb.
> >>>
> >>It seems that the .model = 3 for "qemu32" type in struct
> >>builtin_x86_defs in the file target-i386/cpuid.c make it failed.
> >>if I changed it to "2" which is the same as "qemu64". it will be OK.
> >Enable tracing like this:
> ># echo kvm:kvm_msr>  /sys/kernel/debug/tracing/set_event
> >and then reboot windows with qemu32. Look for strange things in the log.
> >Like msr read/write that caused #GP.
> >
> >--
> > Gleb.
> >
> the trace for kvm:kvm_msr, if it is not enough, I can enable more
> kvm tracing.
Hmm, no #GP. Now run with model=2 and do the same trace. Lets see if it
is different.

> # tracer: nop
> #
> #   TASK-PIDCPU#TIMESTAMP  FUNCTION
> #  | |   |  | |
>  qemu-system-x86-14634 [002] 30288.217803: kvm_msr: msr_write 8b = 0x0
>  qemu-system-x86-14634 [002] 30288.217808: kvm_msr: msr_read 8b = 0x0
>  qemu-system-x86-14634 [002] 30288.217842: kvm_msr: msr_write 8b = 0x0
>  qemu-system-x86-14634 [002] 30288.217844: kvm_msr: msr_read 8b = 0x0
>  qemu-system-x86-14634 [002] 30288.217846: kvm_msr: msr_write 8b = 0x0
>  qemu-system-x86-14634 [002] 30288.217849: kvm_msr: msr_read 8b = 0x0
>  qemu-system-x86-14634 [002] 30288.218326: kvm_msr: msr_write 10 = 0x0
>  qemu-system-x86-14634 [002] 30290.891908: kvm_msr: msr_write 277 =
> 0x7010600070106
>  qemu-system-x86-14634 [003] 30290.978139: kvm_msr: msr_read 179 = 0x20
>  qemu-system-x86-14634 [002] 30295.672706: kvm_msr: msr_read 179 = 0x20
>  qemu-system-x86-14634 [002] 30295.672709: kvm_msr: msr_read 401 = 0x0
>  qemu-system-x86-14634 [002] 30295.672710: kvm_msr: msr_read 405 = 0x0
>  qemu-system-x86-14634 [002] 30295.672711: kvm_msr: msr_read 409 = 0x0
>  qemu-system-x86-14634 [002] 30295.672712: kvm_msr: msr_read 40d = 0x0
>  qemu-system-x86-14634 [002] 30295.672713: kvm_msr: msr_read 411 = 0x0
>  qemu-system-x86-14634 [002] 30295.672

Re: [Qemu-devel] [RFC 0/4] virtio-mmio transport

2011-11-17 Thread Pawel Moll

On Wed, 2011-11-16 at 19:56 +, Anthony Liguori wrote:
> On 11/16/2011 12:41 PM, Peter Maydell wrote:
> > Pawel may have more detail, but to me the significant difference
> > is that virtio-mmio is an implementation of a specification extension
> > agreed with the virtio spec maintainers, whereas syborg doesn't seem
> > to be mentioned in the virtio spec anywhere, so I am unsure what it
> > is intended to be implementing.
> >
> > (There are some technical differences too, like virtio-mmio allowing
> > the guest to specify queue sizes and alignments; these mostly came
> > out of the process of agreeing the spec extension.)
> 
> Correct.  Syborg virtio was something Paul Brook did bit is not an "official" 
> virtio transport as far as Linux or the spec is concerned.

Honestly, that's the first time I hear about it, and as I'm not allowed
to look at qemu code (legal reasons, just don't ask ;-) it's hard for me
to comment. But during the discussions about virtio-mmio, no one
mentioned it at all! Is it similar in ideas?

I do apologise if I jeopardised somebody's work, but it wasn't on
purpose...

Cheers!

Paweł

Re: [Qemu-devel] [PATCH] Add -f option to qemu-nbd

2011-11-17 Thread Chun Yan Liu

Thanks for your suggestions. 
For the usage "qemu-nbd -f disk.img", adding some code could implement it. I 
think it could be like "losetup -f" usage. 
#qemu-nbd -f 
show the first free nbd device at this moment.  
user can choose to issue "qemu-nbd -c THAT_DEVICE disk.img" or not. 
#qemu-nbd -f disk.img 
find a free nbd device and connect disk.img to that device. 

How do you think? 

For the race conditions caused by executing multiple qemu-nbd -f at the same 
time, I've tried both ways (1. lock; 2. if one device not work, trying other 
devices until one works).  
In my testing, the 2nd way has problem. When issuing "qemu-nbd -c /dev/nbd0 
disk.img -v" and "qemu-nbd -c /dev/nbd0 disk1.img -v" at the same time, the 
latter one will eventually exit with EXIT_FAILURE, but the first one cannot 
work normally as well, it cannot show disk partitions. Executing multiple 
"qemu-nbd -f" has same problem.  
So, it seems using lock from a earlier time is more proper. In my testing, I'm 
using file lock (fcntl). For "qemu-nbd -c" case, if lock failed, qemu-nbd exits 
directly. For "qemu-nbd -f" case, if lock failed, redo find_free_nbd_device 
(there might be updates) and then try to connectdisk.img to the new free 
device. 

Will post V2 soon. 

>>> Ian Campbell  11/17/2011 1:23 AM >>>
On Wed, 2011-11-16 at 10:34 +, Stefan Hajnoczi wrote:
> On Wed, Nov 16, 2011 at 6:57 AM, Chunyan Liu  wrote:
> > Currently qemu-nbd does not support finding free nbd device for users like
> > "losetup -f" and issuing "qemu-nbd -c /dev/nbdX disk.img" won't report error
> > message when /dev/nbd is already in use. It makes things a little confusing.
> > This patch adds "-f" option to qemu-nbd to support finding a free nbd device
> > for users. Please review and share your comments. Thanks.
> >
> > Signed-off-by: Chunyan Liu 
> > ---
> >  qemu-nbd.c |   65 
> > +++-
> >  1 files changed, 64 insertions(+), 1 deletions(-)
>
> This patch finds a free device but does not immediately attach to it
> and use it.  Interfaces like this are prone to race conditions, I
> think it would make more sense to combine the -f option with running
> the actual NBD server.
>
> I suggest:
> qemu-nbd -f disk.img
>
> That way it is safe to execute multiple qemu-nbd -f at the same time
> without race conditions.

I agree, but you'd also need some locking inside qemu-nbd wouldn't you?
Or have it just keep trying devices until one works perhaps.

>   Plus it probably makes the user's life
> easier than having to say qemu-nbd -c $(qemu-nbd -f) disk.img.

Absolutely.

Ian.

[Qemu-devel] [PATCH V2] Add -f option to qemu-nbd

2011-11-17 Thread Chunyan Liu

Adding -f option to qemu-nbd to support finding a free nbd device and connect
disk image to that device. Usage of this option is similar to "losetup -f".
#qemu-nbd -f
show the first free nbd device found at the very moment.
#qemu-nbd -f disk.img
find a free nbd device and connect disk.img to that device.

Adding lock to the nbd device before connecting disk image to that device to
handling race conditions.

Signed-off-by: Chunyan Liu 
---
 qemu-nbd.c |   90 +++-
 1 files changed, 89 insertions(+), 1 deletions(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index 291cba2..64892a8 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define SOCKET_PATH"/var/lock/qemu-nbd-%s"
 
@@ -244,6 +245,60 @@ out:
 return (void *) EXIT_FAILURE;
 }
 
+static int is_nbd_used(int minor)
+{
+FILE *proc;
+int NBDMAJOR = 43;
+char buf[BUFSIZ];
+int find = 0;
+
+proc = fopen("/proc/partitions", "r");
+if (proc != NULL) {
+while (fgets(buf, sizeof(buf), proc)) {
+int m, n;
+unsigned long long sz;
+char name[16];
+char *pname = name;
+char *end;
+
+if (sscanf(buf, " %d %d %llu %128[^\n ]",
+&m, &n, &sz, name) != 4)
+continue;
+if (m != NBDMAJOR)
+continue;
+if (strncmp(name, "nbd", 3))
+continue;
+pname += 3;
+n = strtol(pname, &end, 10);
+if (end && end != pname && *end == '\0' && n == minor) {
+find = 1;
+break;
+}
+}
+fclose(proc);
+}
+
+return find;
+}
+
+static char *find_free_nbd_device(void)
+{
+int i;
+int nbds_max = 16;
+char name[16];
+char *devname = NULL;
+
+for (i = 0; i < nbds_max; i++) {
+if (!is_nbd_used(i)) {
+snprintf(name, sizeof(name), "/dev/nbd%d", i);
+devname = strdup(name);
+break;
+}
+}
+
+return devname;
+}
+
 int main(int argc, char **argv)
 {
 BlockDriverState *bs;
@@ -256,7 +311,7 @@ int main(int argc, char **argv)
 struct sockaddr_in addr;
 socklen_t addr_len = sizeof(addr);
 off_t fd_size;
-const char *sopt = "hVb:o:p:rsnP:c:dvk:e:t";
+const char *sopt = "hVb:o:p:rsnP:c:dvk:e:tf";
 struct option lopt[] = {
 { "help", 0, NULL, 'h' },
 { "version", 0, NULL, 'V' },
@@ -273,6 +328,7 @@ int main(int argc, char **argv)
 { "shared", 1, NULL, 'e' },
 { "persistent", 0, NULL, 't' },
 { "verbose", 0, NULL, 'v' },
+{ "find", 0, NULL, 'f' },
 { NULL, 0, NULL, 0 }
 };
 int ch;
@@ -292,6 +348,9 @@ int main(int argc, char **argv)
 int max_fd;
 int persistent = 0;
 pthread_t client_thread;
+char *devname = NULL;
+int find = 0;
+struct flock lock;
 
 /* The client thread uses SIGTERM to interrupt the server.  A signal
  * handler ensures that "qemu-nbd -v -c" exits with a nice status code.
@@ -374,6 +433,18 @@ int main(int argc, char **argv)
 case 'v':
 verbose = 1;
 break;
+case 'f':
+find = 1;
+devname = find_free_nbd_device();
+if (devname == NULL)
+exit(1);
+if (argc == optind) {
+printf("%s\n", devname);
+free(devname);
+exit(0);
+}
+device = devname;
+break;
 case 'V':
 version(argv[0]);
 exit(0);
@@ -464,11 +535,28 @@ int main(int argc, char **argv)
 /* Open before spawning new threads.  In the future, we may
  * drop privileges after opening.
  */
+retry:
 fd = open(device, O_RDWR);
 if (fd == -1) {
 err(EXIT_FAILURE, "Failed to open %s", device);
 }
 
+memset(&lock, 0, sizeof(lock));
+lock.l_type = F_WRLCK;
+lock.l_whence = SEEK_SET;
+if (fcntl(fd, F_SETLK, &lock) != 0) {
+if (find) {
+close(fd);
+free(device);
+device = find_free_nbd_device();
+if (!device)
+err(EXIT_FAILURE, "Could not find free nbd device");
+goto retry;
+} else {
+err(EXIT_FAILURE, "Could not lock %s", device);
+}
+}
+
 if (sockpath == NULL) {
 sockpath = g_malloc(128);
 snprintf(sockpath, 128, SOCKET_PATH, basename(device));
-- 
1.7.3.4

Re: [Qemu-devel] [RFC 0/4] virtio-mmio transport

2011-11-17 Thread Paolo Bonzini

On 11/17/2011 12:20 PM, Pawel Moll wrote:

>  Correct.  Syborg virtio was something Paul Brook did bit is not an "official"
>  virtio transport as far as Linux or the spec is concerned.

Honestly, that's the first time I hear about it, and as I'm not allowed
to look at qemu code (legal reasons, just don't ask;-)  it's hard for me
to comment. But during the discussions about virtio-mmio, no one
mentioned it at all! Is it similar in ideas?

Yes. :)

I do apologise if I jeopardised somebody's work, but it wasn't on
purpose...

No worries at all, I was more interested about code duplication in QEMU. 
 But perhaps syborg_virtio could simply go away.

Paolo

Re: [Qemu-devel] Windows 7 shutdown causes BSOD

2011-11-17 Thread hkran


On 11/17/2011 02:37 PM, Gleb Natapov wrote:

On Thu, Nov 17, 2011 at 02:29:47PM +0800, hkran wrote:

On 11/16/2011 06:51 PM, Gleb Natapov wrote:

On Wed, Nov 16, 2011 at 10:48:15AM +, Stefan Hajnoczi wrote:

On Wed, Nov 16, 2011 at 10:14 AM, hkran   wrote:

On 11/15/2011 09:17 PM, Stefan Hajnoczi wrote:

On Fri, Nov 4, 2011 at 11:25 AM, Stefan Hajnoczi
  wrote:

On Fri, Nov 4, 2011 at 10:48 AM, Stefan Hajnoczi
  wrote:

Windows 7 32-bit guest blue screens when I shut it down properly with
Start | Shut Down.  The blue screen is only displayed for a split
second before the guest reboots so I am not able to easily tell what
it says.  My guess is that Windows is triple-faulting or soft
rebooting - note that I told Windows to shut down, not reboot.

This issue happens on qemu.git/master (and Debian kvm 0.14.1+dfsg-3).
Here is the QEMU command-line:

x86_64-softmmu/qemu-system-x86_64 -L pc-bios -cpu qemu32 -enable-kvm
-m 1024 -rtc base=localtime -drive
file=win7.img,if=none,id=drive-ide0-0-0,format=raw -device
ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1

Questions:

Is anyone else experiencing this?

Is anyone fixing this?

If not I will play with it.  Disabling ACPI might reveal the source of
the problem.  If that turns up nothing I will try to get the BSOD or
WinDbg output.

Thanks to Andreas Faerber and Michael Tokarev I found out the
automatic reboot can be disabled in Windows.  Here is the BSOD
information:

IRQL_NOT_LESS_OR_EQUAL
STOP: 0x000A (0x,0x00FF,0x0001,0x828B7220)

This decodes to:
"Windows or a kernel-mode driver accessed paged memory at
DISPATCH_LEVEL or above."

Memory referenced: 0x
IRQL: 0xff
Read/write: Write (1)
Address which referenced memory: 0x828B7220

http://msdn.microsoft.com/en-us/library/ff560129%28v=VS.85%29.aspx

Looks like a NULL pointer reference or maybe a deliberate "we should
never get here" failure.

Stefan


I can reproduce this bug in my environment and found out that it has
something with the type of "CPU".
I tried the command line args as the same as Stefan's and definitely casue
the BSOD.
If i change the "-cpu qemu32" to "-cpu qemu64" or "-cpu core2duo" or
nothing. it will shutdown as expected, that means something?

Thanks for sharing.  The guest is definitely sees a differed CPUID and
can therefore take different code paths.  I'm not sure what
specifically could have changed.


Try adding/removing individual cpuid bits.

--
Gleb.


It seems that the .model = 3 for "qemu32" type in struct
builtin_x86_defs in the file target-i386/cpuid.c make it failed.
if I changed it to "2" which is the same as "qemu64". it will be OK.

Enable tracing like this:
# echo kvm:kvm_msr>  /sys/kernel/debug/tracing/set_event
and then reboot windows with qemu32. Look for strange things in the log.
Like msr read/write that caused #GP.

--
Gleb.

the trace for kvm:kvm_msr, if it is not enough, I can enable more kvm 
tracing.

# tracer: nop
#
#   TASK-PIDCPU#TIMESTAMP  FUNCTION
#  | |   |  | |
 qemu-system-x86-14634 [002] 30288.217803: kvm_msr: msr_write 8b = 0x0
 qemu-system-x86-14634 [002] 30288.217808: kvm_msr: msr_read 8b = 0x0
 qemu-system-x86-14634 [002] 30288.217842: kvm_msr: msr_write 8b = 0x0
 qemu-system-x86-14634 [002] 30288.217844: kvm_msr: msr_read 8b = 0x0
 qemu-system-x86-14634 [002] 30288.217846: kvm_msr: msr_write 8b = 0x0
 qemu-system-x86-14634 [002] 30288.217849: kvm_msr: msr_read 8b = 0x0
 qemu-system-x86-14634 [002] 30288.218326: kvm_msr: msr_write 10 = 0x0
 qemu-system-x86-14634 [002] 30290.891908: kvm_msr: msr_write 277 = 
0x7010600070106

 qemu-system-x86-14634 [003] 30290.978139: kvm_msr: msr_read 179 = 0x20
 qemu-system-x86-14634 [002] 30295.672706: kvm_msr: msr_read 179 = 0x20
 qemu-system-x86-14634 [002] 30295.672709: kvm_msr: msr_read 401 = 0x0
 qemu-system-x86-14634 [002] 30295.672710: kvm_msr: msr_read 405 = 0x0
 qemu-system-x86-14634 [002] 30295.672711: kvm_msr: msr_read 409 = 0x0
 qemu-system-x86-14634 [002] 30295.672712: kvm_msr: msr_read 40d = 0x0
 qemu-system-x86-14634 [002] 30295.672713: kvm_msr: msr_read 411 = 0x0
 qemu-system-x86-14634 [002] 30295.672714: kvm_msr: msr_read 415 = 0x0
 qemu-system-x86-14634 [002] 30295.672715: kvm_msr: msr_read 419 = 0x0
 qemu-system-x86-14634 [002] 30295.672716: kvm_msr: msr_read 41d = 0x0
 qemu-system-x86-14634 [002] 30295.672717: kvm_msr: msr_read 421 = 0x0
 qemu-system-x86-14634 [002] 30295.672718: kvm_msr: msr_read 425 = 0x0
 qemu-system-x86-14634 [002] 30295.672719: kvm_msr: msr_read 429 = 0x0
 qemu-system-x86-14634 [002] 30295.672720: kvm_msr: msr_read 42d = 0x0
 qemu-system-x86-14634 [002] 30295.672721: kvm_msr: msr_read 431 = 0x0
 qemu-system-x86-14634 [002] 30295.672722: kvm_msr: msr_read 435 = 0x0
 qemu-system-x86-14634 [002] 30295.672723: kvm_msr: msr_read 439 = 0x0
 qemu-system-x86-14634 [002] 30295.672724: kvm_msr: msr_read 43d = 0x0
 qemu-system-x86-14634 [0

[Qemu-devel] Demande de partenariat : Echange de liens entre nos sites

2011-11-17 Thread Enzo Delage


Bonjour,


Je suis chargé du référencement du site http://sens.guide-demenagement.com, 
site référence dans lunivers du déménagement.

Je me permets de vous contacter pour vous demander un échange de liens entre 
nos sites, ce qui améliorerait nos référencement respectifs.

Seriez-vous intéressé ?



Bien cordialement,

Enzo Delage

Chargé de projet Echange de liens



PS. Une page est particulièrement adaptée pour notre échange : 
http://sens.guide-demenagement.com/

Re: [Qemu-devel] [PATCH 4/4] Makefile: fix qga dependencies

2011-11-17 Thread Andreas Färber

Am 17.11.2011 10:31, schrieb Paolo Bonzini:
> On 11/16/2011 10:58 PM, Michael S. Tsirkin wrote:
>> .c files include .h files, so .o depends on .h,
>> and the linked result depends on .o.
>> We got it wrong for qga rules, fix it up.
> 
> Another possible option is to make the "all" target depend on
> GENERATED_HEADERS and GENERATED_SOURCES, like
> 
> all: $(GENERATED_HEADERS) $(GENERATED_SOURCES)
> @$(MAKE) build-all
> 
> and drop the dependency everywhere else.

Please don't. `make qemu-img`, for example, should work, too. That
bypasses the "all" target IIUC.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

[Qemu-devel] [PATCH 4/5] sh_intc: convert interrupt controller to memory API

2011-11-17 Thread Benoît Canet

Signed-off-by: Benoit Canet 
---
 hw/sh7750.c |2 +-
 hw/sh_intc.c|   87 ++
 hw/sh_intc.h|7 +++-
 target-sh4/helper.c |3 ++
 4 files changed, 68 insertions(+), 31 deletions(-)

diff --git a/hw/sh7750.c b/hw/sh7750.c
index e181305..930d212 100644
--- a/hw/sh7750.c
+++ b/hw/sh7750.c
@@ -756,7 +756,7 @@ SH7750State *sh7750_init(CPUSH4State * cpu, MemoryRegion 
*sysmem)
   "cache-and-tlb", 0x0800);
 memory_region_add_subregion(sysmem, 0xf000, &s->mmct_iomem);
 
-sh_intc_init(&s->intc, NR_SOURCES,
+sh_intc_init(sysmem, &s->intc, NR_SOURCES,
 _INTC_ARRAY(mask_registers),
 _INTC_ARRAY(prio_registers));
 
diff --git a/hw/sh_intc.c b/hw/sh_intc.c
index e07424f..38cefc9 100644
--- a/hw/sh_intc.c
+++ b/hw/sh_intc.c
@@ -219,7 +219,8 @@ static void sh_intc_toggle_mask(struct intc_desc *desc, 
intc_enum id,
 #endif
 }
 
-static uint32_t sh_intc_read(void *opaque, target_phys_addr_t offset)
+static uint64_t sh_intc_read(void *opaque, target_phys_addr_t offset,
+ unsigned size)
 {
 struct intc_desc *desc = opaque;
 intc_enum *enum_ids = NULL;
@@ -238,7 +239,7 @@ static uint32_t sh_intc_read(void *opaque, 
target_phys_addr_t offset)
 }
 
 static void sh_intc_write(void *opaque, target_phys_addr_t offset,
- uint32_t value)
+  uint64_t value, unsigned size)
 {
 struct intc_desc *desc = opaque;
 intc_enum *enum_ids = NULL;
@@ -282,16 +283,10 @@ static void sh_intc_write(void *opaque, 
target_phys_addr_t offset,
 #endif
 }
 
-static CPUReadMemoryFunc * const sh_intc_readfn[] = {
-sh_intc_read,
-sh_intc_read,
-sh_intc_read
-};
-
-static CPUWriteMemoryFunc * const sh_intc_writefn[] = {
-sh_intc_write,
-sh_intc_write,
-sh_intc_write
+static const struct MemoryRegionOps sh_intc_ops = {
+.read = sh_intc_read,
+.write = sh_intc_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 struct intc_source *sh_intc_source(struct intc_desc *desc, intc_enum id)
@@ -302,15 +297,36 @@ struct intc_source *sh_intc_source(struct intc_desc 
*desc, intc_enum id)
 return NULL;
 }
 
-static void sh_intc_register(struct intc_desc *desc, 
-unsigned long address)
+static unsigned int sh_intc_register(MemoryRegion *sysmem,
+ struct intc_desc *desc,
+ const unsigned long address,
+ const char *type,
+ const char *action,
+ const unsigned int index)
 {
-if (address) {
-cpu_register_physical_memory_offset(P4ADDR(address), 4,
-desc->iomemtype, INTC_A7(address));
-cpu_register_physical_memory_offset(A7ADDR(address), 4,
-desc->iomemtype, INTC_A7(address));
+char name[60];
+MemoryRegion *iomem, *iomem_p4, *iomem_a7;
+
+if (!address) {
+return 0;
 }
+
+iomem = &desc->iomem;
+iomem_p4 = desc->iomem_aliases + index;
+iomem_a7 = iomem_p4 + 1;
+
+#define SH_INTC_IOMEM_FORMAT "interrupt-controller-%s-%s-%s"
+snprintf(name, sizeof(name), SH_INTC_IOMEM_FORMAT, type, action, "p4");
+memory_region_init_alias(iomem_p4, name, iomem, INTC_A7(address), 4);
+memory_region_add_subregion(sysmem, P4ADDR(address), iomem_p4);
+
+snprintf(name, sizeof(name), SH_INTC_IOMEM_FORMAT, type, action, "a7");
+memory_region_init_alias(iomem_a7, name, iomem, INTC_A7(address), 4);
+memory_region_add_subregion(sysmem, A7ADDR(address), iomem_a7);
+#undef SH_INTC_IOMEM_FORMAT
+
+/* used to increment aliases index */
+return 2;
 }
 
 static void sh_intc_register_source(struct intc_desc *desc,
@@ -415,14 +431,15 @@ void sh_intc_register_sources(struct intc_desc *desc,
 }
 }
 
-int sh_intc_init(struct intc_desc *desc,
+int sh_intc_init(MemoryRegion *sysmem,
+ struct intc_desc *desc,
 int nr_sources,
 struct intc_mask_reg *mask_regs,
 int nr_mask_regs,
 struct intc_prio_reg *prio_regs,
 int nr_prio_regs)
 {
-unsigned int i;
+unsigned int i, j;
 
 desc->pending = 0;
 desc->nr_sources = nr_sources;
@@ -430,7 +447,11 @@ int sh_intc_init(struct intc_desc *desc,
 desc->nr_mask_regs = nr_mask_regs;
 desc->prio_regs = prio_regs;
 desc->nr_prio_regs = nr_prio_regs;
+/* Allocate 4 MemoryRegions per register (2 actions * 2 aliases). */
+desc->iomem_aliases = g_new0(MemoryRegion,
+ (nr_mask_regs + nr_prio_regs) * 4);
 
+j = 0;
 i = sizeof(struct intc_source) * nr_sources;
 desc->sources = g_malloc0(i);
 
@@ -442,15 +463,19 @@ int sh_intc_init(struct intc_desc *desc,
 
 desc->irqs = qemu_allocate_irqs(sh_intc_set_irq,

[Qemu-devel] [PATCH 3/5] sh_timer: convert to memory API

2011-11-17 Thread Benoît Canet

Signed-off-by: Benoit Canet 
---
 hw/sh.h   |3 ++-
 hw/sh7750.c   |4 ++--
 hw/sh_timer.c |   43 ---
 3 files changed, 28 insertions(+), 22 deletions(-)

diff --git a/hw/sh.h b/hw/sh.h
index cf3f6f6..c764be6 100644
--- a/hw/sh.h
+++ b/hw/sh.h
@@ -31,7 +31,8 @@ int sh7750_register_io_device(struct SH7750State *s,
 #define TMU012_FEAT_TOCR   (1 << 0)
 #define TMU012_FEAT_3CHAN  (1 << 1)
 #define TMU012_FEAT_EXTCLK (1 << 2)
-void tmu012_init(target_phys_addr_t base, int feat, uint32_t freq,
+void tmu012_init(struct MemoryRegion *sysmem, target_phys_addr_t base,
+ int feat, uint32_t freq,
 qemu_irq ch0_irq, qemu_irq ch1_irq,
 qemu_irq ch2_irq0, qemu_irq ch2_irq1);
 
diff --git a/hw/sh7750.c b/hw/sh7750.c
index fd48c4a..e181305 100644
--- a/hw/sh7750.c
+++ b/hw/sh7750.c
@@ -780,7 +780,7 @@ SH7750State *sh7750_init(CPUSH4State * cpu, MemoryRegion 
*sysmem)
   NULL,
   s->intc.irqs[SCIF_BRI]);
 
-tmu012_init(0x1fd8,
+tmu012_init(sysmem, 0x1fd8,
TMU012_FEAT_TOCR | TMU012_FEAT_3CHAN | TMU012_FEAT_EXTCLK,
s->periph_freq,
s->intc.irqs[TMU0],
@@ -804,7 +804,7 @@ SH7750State *sh7750_init(CPUSH4State * cpu, MemoryRegion 
*sysmem)
 sh_intc_register_sources(&s->intc,
 _INTC_ARRAY(vectors_tmu34),
 NULL, 0);
-tmu012_init(0x1e10, 0, s->periph_freq,
+tmu012_init(sysmem, 0x1e10, 0, s->periph_freq,
s->intc.irqs[TMU3],
s->intc.irqs[TMU4],
NULL, NULL);
diff --git a/hw/sh_timer.c b/hw/sh_timer.c
index dca3c94..9132207 100644
--- a/hw/sh_timer.c
+++ b/hw/sh_timer.c
@@ -11,6 +11,7 @@
 #include "hw.h"
 #include "sh.h"
 #include "qemu-timer.h"
+#include "exec-memory.h"
 
 //#define DEBUG_TIMER
 
@@ -210,6 +211,9 @@ static void *sh_timer_init(uint32_t freq, int feat, 
qemu_irq irq)
 }
 
 typedef struct {
+MemoryRegion iomem;
+MemoryRegion iomem_p4;
+MemoryRegion iomem_a7;
 void *timer[3];
 int level[3];
 uint32_t tocr;
@@ -217,7 +221,8 @@ typedef struct {
 int feat;
 } tmu012_state;
 
-static uint32_t tmu012_read(void *opaque, target_phys_addr_t offset)
+static uint64_t tmu012_read(void *opaque, target_phys_addr_t offset,
+unsigned size)
 {
 tmu012_state *s = (tmu012_state *)opaque;
 
@@ -248,7 +253,7 @@ static uint32_t tmu012_read(void *opaque, 
target_phys_addr_t offset)
 }
 
 static void tmu012_write(void *opaque, target_phys_addr_t offset,
-uint32_t value)
+uint64_t value, unsigned size)
 {
 tmu012_state *s = (tmu012_state *)opaque;
 
@@ -291,23 +296,17 @@ static void tmu012_write(void *opaque, target_phys_addr_t 
offset,
 }
 }
 
-static CPUReadMemoryFunc * const tmu012_readfn[] = {
-tmu012_read,
-tmu012_read,
-tmu012_read
+static const MemoryRegionOps tmu012_ops = {
+.read = tmu012_read,
+.write = tmu012_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
-static CPUWriteMemoryFunc * const tmu012_writefn[] = {
-tmu012_write,
-tmu012_write,
-tmu012_write
-};
-
-void tmu012_init(target_phys_addr_t base, int feat, uint32_t freq,
+void tmu012_init(MemoryRegion *sysmem, target_phys_addr_t base,
+ int feat, uint32_t freq,
 qemu_irq ch0_irq, qemu_irq ch1_irq,
 qemu_irq ch2_irq0, qemu_irq ch2_irq1)
 {
-int iomemtype;
 tmu012_state *s;
 int timer_feat = (feat & TMU012_FEAT_EXTCLK) ? TIMER_FEAT_EXTCLK : 0;
 
@@ -318,10 +317,16 @@ void tmu012_init(target_phys_addr_t base, int feat, 
uint32_t freq,
 if (feat & TMU012_FEAT_3CHAN)
 s->timer[2] = sh_timer_init(freq, timer_feat | TIMER_FEAT_CAPT,
ch2_irq0); /* ch2_irq1 not supported */
-iomemtype = cpu_register_io_memory(tmu012_readfn,
-   tmu012_writefn, s,
-   DEVICE_NATIVE_ENDIAN);
-cpu_register_physical_memory(P4ADDR(base), 0x1000, iomemtype);
-cpu_register_physical_memory(A7ADDR(base), 0x1000, iomemtype);
+
+memory_region_init_io(&s->iomem, &tmu012_ops, s,
+  "timer", 0x1ULL);
+
+memory_region_init_alias(&s->iomem_p4, "timer-p4",
+ &s->iomem, 0, 0x1000);
+memory_region_add_subregion(sysmem, P4ADDR(base), &s->iomem_p4);
+
+memory_region_init_alias(&s->iomem_a7, "timer-a7",
+ &s->iomem, 0, 0x1000);
+memory_region_add_subregion(sysmem, A7ADDR(base), &s->iomem_a7);
 /* ??? Save/restore.  */
 }
-- 
1.7.5.4

[Qemu-devel] [PATCH 1/5] sh7750: convert memory controller/ioport to memory API

2011-11-17 Thread Benoît Canet

Signed-off-by: Benoit Canet 
---
 hw/r2d.c|2 +-
 hw/sh.h |3 +-
 hw/sh7750.c |   72 --
 hw/shix.c   |2 +-
 4 files changed, 49 insertions(+), 30 deletions(-)

diff --git a/hw/r2d.c b/hw/r2d.c
index a9aefa2..9b6fcba 100644
--- a/hw/r2d.c
+++ b/hw/r2d.c
@@ -250,7 +250,7 @@ static void r2d_init(ram_addr_t ram_size,
 memory_region_init_ram(sdram, NULL, "r2d.sdram", SDRAM_SIZE);
 memory_region_add_subregion(address_space_mem, SDRAM_BASE, sdram);
 /* Register peripherals */
-s = sh7750_init(env);
+s = sh7750_init(env, address_space_mem);
 irq = r2d_fpga_init(address_space_mem, 0x0400, sh7750_irl(s));
 sysbus_create_varargs("sh_pci", 0x1e20, irq[PCI_INTA], irq[PCI_INTB],
   irq[PCI_INTC], irq[PCI_INTD], NULL);
diff --git a/hw/sh.h b/hw/sh.h
index d30e9f5..cf3f6f6 100644
--- a/hw/sh.h
+++ b/hw/sh.h
@@ -9,8 +9,9 @@
 
 /* sh7750.c */
 struct SH7750State;
+struct MemoryRegion;
 
-struct SH7750State *sh7750_init(CPUState * cpu);
+struct SH7750State *sh7750_init(CPUState * cpu, struct MemoryRegion *sysmem);
 
 typedef struct {
 /* The callback will be triggered if any of the designated lines change */
diff --git a/hw/sh7750.c b/hw/sh7750.c
index 9f3ea92..5cee76a 100644
--- a/hw/sh7750.c
+++ b/hw/sh7750.c
@@ -30,10 +30,18 @@
 #include "sh7750_regnames.h"
 #include "sh_intc.h"
 #include "cpu.h"
+#include "exec-memory.h"
 
 #define NB_DEVICES 4
 
 typedef struct SH7750State {
+MemoryRegion iomem;
+MemoryRegion iomem_1f0;
+MemoryRegion iomem_ff0;
+MemoryRegion iomem_1f8;
+MemoryRegion iomem_ff8;
+MemoryRegion iomem_1fc;
+MemoryRegion iomem_ffc;
 /* CPU */
 CPUSH4State *cpu;
 /* Peripheral frequency in Hz */
@@ -436,16 +444,16 @@ static void sh7750_mem_writel(void *opaque, 
target_phys_addr_t addr,
 }
 }
 
-static CPUReadMemoryFunc * const sh7750_mem_read[] = {
-sh7750_mem_readb,
-sh7750_mem_readw,
-sh7750_mem_readl
-};
-
-static CPUWriteMemoryFunc * const sh7750_mem_write[] = {
-sh7750_mem_writeb,
-sh7750_mem_writew,
-sh7750_mem_writel
+static const MemoryRegionOps sh7750_mem_ops = {
+.old_mmio = {
+.read = {sh7750_mem_readb,
+ sh7750_mem_readw,
+ sh7750_mem_readl },
+.write = {sh7750_mem_writeb,
+  sh7750_mem_writew,
+  sh7750_mem_writel },
+},
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 /* sh775x interrupt controller tables for sh_intc.c
@@ -706,30 +714,40 @@ static CPUWriteMemoryFunc * const sh7750_mmct_write[] = {
 sh7750_mmct_writel
 };
 
-SH7750State *sh7750_init(CPUSH4State * cpu)
+SH7750State *sh7750_init(CPUSH4State * cpu, MemoryRegion *sysmem)
 {
 SH7750State *s;
-int sh7750_io_memory;
 int sh7750_mm_cache_and_tlb; /* memory mapped cache and tlb */
 
 s = g_malloc0(sizeof(SH7750State));
 s->cpu = cpu;
 s->periph_freq = 6000; /* 60MHz */
-sh7750_io_memory = cpu_register_io_memory(sh7750_mem_read,
- sh7750_mem_write, s,
-  DEVICE_NATIVE_ENDIAN);
-cpu_register_physical_memory_offset(0x1f00, 0x1000,
-sh7750_io_memory, 0x1f00);
-cpu_register_physical_memory_offset(0xff00, 0x1000,
-sh7750_io_memory, 0x1f00);
-cpu_register_physical_memory_offset(0x1f80, 0x1000,
-sh7750_io_memory, 0x1f80);
-cpu_register_physical_memory_offset(0xff80, 0x1000,
-sh7750_io_memory, 0x1f80);
-cpu_register_physical_memory_offset(0x1fc0, 0x1000,
-sh7750_io_memory, 0x1fc0);
-cpu_register_physical_memory_offset(0xffc0, 0x1000,
-sh7750_io_memory, 0x1fc0);
+memory_region_init_io(&s->iomem, &sh7750_mem_ops, s,
+  "memory", 0x1fe0);
+
+memory_region_init_alias(&s->iomem_1f0, "memory-1f0",
+ &s->iomem, 0x1f00, 0x1000);
+memory_region_add_subregion(sysmem, 0x1f00, &s->iomem_1f0);
+
+memory_region_init_alias(&s->iomem_ff0, "memory-ff0",
+ &s->iomem, 0x1f00, 0x1000);
+memory_region_add_subregion(sysmem, 0xff00, &s->iomem_ff0);
+
+memory_region_init_alias(&s->iomem_1f8, "memory-1f8",
+ &s->iomem, 0x1f80, 0x1000);
+memory_region_add_subregion(sysmem, 0x1f80, &s->iomem_1f8);
+
+memory_region_init_alias(&s->iomem_ff8, "memory-ff8",
+ &s->iomem, 0x1f80, 0x1000);
+memory_region_add_subregion(sysmem, 0xff80, &s->iomem_ff8);
+
+memory_region_init_alias(&s->iomem_1fc, "memory-1fc",
+ &s->iomem, 0x1fc0, 0

[Qemu-devel] [PATCH 0/5] Convert remaining sh4 devices to memory API

2011-11-17 Thread Benoît Canet

These patches converts the remaining sh4 devices to the memory API.
The patch "sh_intc: convert interrupt controller to memory API" is
somewhat tricky.

Benoît Canet (5):
  sh7750: convert memory controller/ioport to memory API
  sh7750: convert cache and tlb to memory API
  sh_timer: convert to memory API
  sh_intc: convert interrupt controller to memory API
  sh_serial: convert to memory API

 hw/r2d.c|2 +-
 hw/sh.h |9 ++-
 hw/sh7750.c |  155 +--
 hw/sh_intc.c|   87 +++-
 hw/sh_intc.h|7 ++-
 hw/sh_serial.c  |   55 ++
 hw/sh_timer.c   |   43 --
 hw/shix.c   |2 +-
 target-sh4/helper.c |3 +
 9 files changed, 217 insertions(+), 146 deletions(-)

-- 
1.7.5.4

[Qemu-devel] [PATCH 2/5] sh7750: convert cache and tlb to memory API

2011-11-17 Thread Benoît Canet

Signed-off-by: Benoit Canet 
---
 hw/sh7750.c |   43 ++-
 1 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/hw/sh7750.c b/hw/sh7750.c
index 5cee76a..fd48c4a 100644
--- a/hw/sh7750.c
+++ b/hw/sh7750.c
@@ -42,6 +42,7 @@ typedef struct SH7750State {
 MemoryRegion iomem_ff8;
 MemoryRegion iomem_1fc;
 MemoryRegion iomem_ffc;
+MemoryRegion mmct_iomem;
 /* CPU */
 CPUSH4State *cpu;
 /* Peripheral frequency in Hz */
@@ -623,18 +624,23 @@ static struct intc_group groups_irl[] = {
 #define MM_UTLB_DATA (7)
 #define MM_REGION_TYPE(addr)  ((addr & MM_REGION_MASK) >> 24)
 
-static uint32_t invalid_read(void *opaque, target_phys_addr_t addr)
+static uint64_t invalid_read(void *opaque, target_phys_addr_t addr)
 {
 abort();
 
 return 0;
 }
 
-static uint32_t sh7750_mmct_readl(void *opaque, target_phys_addr_t addr)
+static uint64_t sh7750_mmct_read(void *opaque, target_phys_addr_t addr,
+ unsigned size)
 {
 SH7750State *s = opaque;
 uint32_t ret = 0;
 
+if (size != 4) {
+return invalid_read(opaque, addr);
+}
+
 switch (MM_REGION_TYPE(addr)) {
 case MM_ICACHE_ADDR:
 case MM_ICACHE_DATA:
@@ -664,16 +670,20 @@ static uint32_t sh7750_mmct_readl(void *opaque, 
target_phys_addr_t addr)
 }
 
 static void invalid_write(void *opaque, target_phys_addr_t addr,
- uint32_t mem_value)
+  uint64_t mem_value)
 {
 abort();
 }
 
-static void sh7750_mmct_writel(void *opaque, target_phys_addr_t addr,
-   uint32_t mem_value)
+static void sh7750_mmct_write(void *opaque, target_phys_addr_t addr,
+  uint64_t mem_value, unsigned size)
 {
 SH7750State *s = opaque;
 
+if (size != 4) {
+invalid_write(opaque, addr, mem_value);
+}
+
 switch (MM_REGION_TYPE(addr)) {
 case MM_ICACHE_ADDR:
 case MM_ICACHE_DATA:
@@ -702,22 +712,15 @@ static void sh7750_mmct_writel(void *opaque, 
target_phys_addr_t addr,
 }
 }
 
-static CPUReadMemoryFunc * const sh7750_mmct_read[] = {
-invalid_read,
-invalid_read,
-sh7750_mmct_readl
-};
-
-static CPUWriteMemoryFunc * const sh7750_mmct_write[] = {
-invalid_write,
-invalid_write,
-sh7750_mmct_writel
+static const struct MemoryRegionOps sh7750_mmct_ops = {
+.read = sh7750_mmct_read,
+.write = sh7750_mmct_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 SH7750State *sh7750_init(CPUSH4State * cpu, MemoryRegion *sysmem)
 {
 SH7750State *s;
-int sh7750_mm_cache_and_tlb; /* memory mapped cache and tlb */
 
 s = g_malloc0(sizeof(SH7750State));
 s->cpu = cpu;
@@ -749,11 +752,9 @@ SH7750State *sh7750_init(CPUSH4State * cpu, MemoryRegion 
*sysmem)
  &s->iomem, 0x1fc0, 0x1000);
 memory_region_add_subregion(sysmem, 0xffc0, &s->iomem_ffc);
 
-sh7750_mm_cache_and_tlb = cpu_register_io_memory(sh7750_mmct_read,
-sh7750_mmct_write, s,
- DEVICE_NATIVE_ENDIAN);
-cpu_register_physical_memory(0xf000, 0x0800,
-sh7750_mm_cache_and_tlb);
+memory_region_init_io(&s->mmct_iomem, &sh7750_mmct_ops, s,
+  "cache-and-tlb", 0x0800);
+memory_region_add_subregion(sysmem, 0xf000, &s->mmct_iomem);
 
 sh_intc_init(&s->intc, NR_SOURCES,
 _INTC_ARRAY(mask_registers),
-- 
1.7.5.4

[Qemu-devel] [PATCH 5/5] sh_serial: convert to memory API

2011-11-17 Thread Benoît Canet

Signed-off-by: Benoit Canet 
---
 hw/sh.h|3 ++-
 hw/sh7750.c|   28 +++-
 hw/sh_serial.c |   55 ++-
 3 files changed, 47 insertions(+), 39 deletions(-)

diff --git a/hw/sh.h b/hw/sh.h
index c764be6..0e45d61 100644
--- a/hw/sh.h
+++ b/hw/sh.h
@@ -39,7 +39,8 @@ void tmu012_init(struct MemoryRegion *sysmem, 
target_phys_addr_t base,
 
 /* sh_serial.c */
 #define SH_SERIAL_FEAT_SCIF (1 << 0)
-void sh_serial_init (target_phys_addr_t base, int feat,
+void sh_serial_init(MemoryRegion *sysmem,
+ target_phys_addr_t base, int feat,
 uint32_t freq, CharDriverState *chr,
 qemu_irq eri_source,
 qemu_irq rxi_source,
diff --git a/hw/sh7750.c b/hw/sh7750.c
index 930d212..318cdab 100644
--- a/hw/sh7750.c
+++ b/hw/sh7750.c
@@ -766,19 +766,21 @@ SH7750State *sh7750_init(CPUSH4State * cpu, MemoryRegion 
*sysmem)
 
 cpu->intc_handle = &s->intc;
 
-sh_serial_init(0x1fe0, 0, s->periph_freq, serial_hds[0],
-  s->intc.irqs[SCI1_ERI],
-  s->intc.irqs[SCI1_RXI],
-  s->intc.irqs[SCI1_TXI],
-  s->intc.irqs[SCI1_TEI],
-  NULL);
-sh_serial_init(0x1fe8, SH_SERIAL_FEAT_SCIF,
-  s->periph_freq, serial_hds[1],
-  s->intc.irqs[SCIF_ERI],
-  s->intc.irqs[SCIF_RXI],
-  s->intc.irqs[SCIF_TXI],
-  NULL,
-  s->intc.irqs[SCIF_BRI]);
+sh_serial_init(sysmem, 0x1fe0,
+   0, s->periph_freq, serial_hds[0],
+   s->intc.irqs[SCI1_ERI],
+   s->intc.irqs[SCI1_RXI],
+   s->intc.irqs[SCI1_TXI],
+   s->intc.irqs[SCI1_TEI],
+   NULL);
+sh_serial_init(sysmem, 0x1fe8,
+   SH_SERIAL_FEAT_SCIF,
+   s->periph_freq, serial_hds[1],
+   s->intc.irqs[SCIF_ERI],
+   s->intc.irqs[SCIF_RXI],
+   s->intc.irqs[SCIF_TXI],
+   NULL,
+   s->intc.irqs[SCIF_BRI]);
 
 tmu012_init(sysmem, 0x1fd8,
TMU012_FEAT_TOCR | TMU012_FEAT_3CHAN | TMU012_FEAT_EXTCLK,
diff --git a/hw/sh_serial.c b/hw/sh_serial.c
index a20c59e..43b0eb1 100644
--- a/hw/sh_serial.c
+++ b/hw/sh_serial.c
@@ -27,6 +27,7 @@
 #include "hw.h"
 #include "sh.h"
 #include "qemu-char.h"
+#include "exec-memory.h"
 
 //#define DEBUG_SERIAL
 
@@ -39,6 +40,9 @@
 #define SH_RX_FIFO_LENGTH (16)
 
 typedef struct {
+MemoryRegion iomem;
+MemoryRegion iomem_p4;
+MemoryRegion iomem_a7;
 uint8_t smr;
 uint8_t brr;
 uint8_t scr;
@@ -74,7 +78,8 @@ static void sh_serial_clear_fifo(sh_serial_state * s)
 s->rx_tail = 0;
 }
 
-static void sh_serial_write(void *opaque, uint32_t offs, uint32_t val)
+static void sh_serial_write(void *opaque, target_phys_addr_t offs,
+uint64_t val, unsigned size)
 {
 sh_serial_state *s = opaque;
 unsigned char ch;
@@ -185,7 +190,8 @@ static void sh_serial_write(void *opaque, uint32_t offs, 
uint32_t val)
 abort();
 }
 
-static uint32_t sh_serial_read(void *opaque, uint32_t offs)
+static uint64_t sh_serial_read(void *opaque, target_phys_addr_t offs,
+   unsigned size)
 {
 sh_serial_state *s = opaque;
 uint32_t ret = ~0;
@@ -338,28 +344,22 @@ static void sh_serial_event(void *opaque, int event)
 sh_serial_receive_break(s);
 }
 
-static CPUReadMemoryFunc * const sh_serial_readfn[] = {
-&sh_serial_read,
-&sh_serial_read,
-&sh_serial_read,
+static const MemoryRegionOps sh_serial_ops = {
+.read = sh_serial_read,
+.write = sh_serial_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
-static CPUWriteMemoryFunc * const sh_serial_writefn[] = {
-&sh_serial_write,
-&sh_serial_write,
-&sh_serial_write,
-};
-
-void sh_serial_init (target_phys_addr_t base, int feat,
-uint32_t freq, CharDriverState *chr,
-qemu_irq eri_source,
-qemu_irq rxi_source,
-qemu_irq txi_source,
-qemu_irq tei_source,
-qemu_irq bri_source)
+void sh_serial_init(MemoryRegion *sysmem,
+target_phys_addr_t base, int feat,
+uint32_t freq, CharDriverState *chr,
+qemu_irq eri_source,
+qemu_irq rxi_source,
+qemu_irq txi_source,
+qemu_irq tei_source,
+qemu_irq bri_source)
 {
 sh_serial_state *s;
-int s_io_memory;
 
 s = g_malloc0(sizeof(sh_serial_state));
 
@@ -381,11 +381,16 @@ void sh_serial_init (target_phys_addr_t base, int feat,
 
 sh_serial_clear_fifo(s);
 
-s_io_memory = cpu_register_io_memory(sh_serial_readfn,
-

Re: [Qemu-devel] [v9 Patch 1/6 - updated]Qemu: Enhance "info block" to display host cache setting

2011-11-17 Thread Luiz Capitulino

On Fri, 11 Nov 2011 15:39:29 +0530
Supriya Kannery  wrote:

> On 11/11/2011 12:17 PM, Supriya Kannery wrote:
>  > Enhance "info block" to display hostcache setting for each
>  > block device.
>  >
>  >
>  >   ##
>  > Index: qemu/qapi-types.h
>  > ===
>  > --- qemu.orig/qapi-types.h
>  > +++ qemu/qapi-types.h
>  > @@ -383,6 +383,7 @@ struct BlockInfo
>  >   {
>  >   char * device;
>  >   char * type;
>  > +bool hostcache;
>  >   bool removable;
>  >   bool locked;
>  >   bool has_inserted;
>  > Index: qemu/block.c
> 
> hostcache gets added to qapi-types.h from
> the change done in qapi-schema.json. Hence
> above change has to be ignored. Pls find
> updated patch.

git am says this patch is corrupted. Otherwise the QAPI changes look ok
to me.

> 
> 
> *
> Enhance "info block" to display hostcache setting for each
> block device.
> 
> Example:
> (qemu) info block
> ide0-hd0: removable=0 file=../rhel6-32.raw ro=0 drv=raw encrypted=0
> 
> Enhanced to display "hostcache" setting:
> (qemu) info block
> ide0-hd0: removable=0 hostcache=1 file=../rhel6-32.raw ro=0 drv=raw 
> encrypted=0
> 
> Signed-off-by: Supriya Kannery 
> 
> ---
>   block.c |   20 
>   qmp-commands.hx |2 ++
>   2 files changed, 18 insertions(+), 4 deletions(-)
> 
> Index: qemu/qapi-schema.json
> ===
> --- qemu.orig/qapi-schema.json
> +++ qemu/qapi-schema.json
> @@ -409,6 +409,8 @@
>   # @locked: True if the guest has locked this device from having its media
>   #  removed
>   #
> +# @hostcache: True if host pagecache is enabled.
> +#
>   # @tray_open: #optional True if the device has a tray and it is open
>   # (only present if removable is true)
>   #
> @@ -422,7 +424,7 @@
>   ##
>   { 'type': 'BlockInfo',
> 'data': {'device': 'str', 'type': 'str', 'removable': 'bool',
> -   'locked': 'bool', '*inserted': 'BlockDeviceInfo',
> +   'locked': 'bool','hostcache': 'bool', '*inserted': 
> 'BlockDeviceInfo',
>  '*tray_open': 'bool', '*io-status': 'BlockDeviceIoStatus'} }
> 
>   ##
> Index: qemu/block.c
> ===
> --- qemu.orig/block.c
> +++ qemu/block.c
> @@ -1839,6 +1839,7 @@ BlockInfoList *qmp_query_block(Error **e
>   info->value->device = g_strdup(bs->device_name);
>   info->value->type = g_strdup("unknown");
>   info->value->locked = bdrv_dev_is_medium_locked(bs);
> +info->value->hostcache = !(bs->open_flags & BDRV_O_NOCACHE);
>   info->value->removable = bdrv_dev_has_removable_media(bs);
> 
>   if (bdrv_dev_has_removable_media(bs)) {
> Index: qemu/hmp.c
> ===
> --- qemu.orig/hmp.c
> +++ qemu/hmp.c
> @@ -199,6 +199,8 @@ void hmp_info_block(Monitor *mon)
>   monitor_printf(mon, " tray-open=%d", info->value->tray_open);
>   }
> 
> +monitor_printf(mon, " hostcache=%d", info->value->hostcache);
> +
>   if (info->value->has_io_status) {
>   monitor_printf(mon, " io-status=%s",
>  
> BlockDeviceIoStatus_lookup[info->value->io_status]);
> 
>

Re: [Qemu-devel] [PATCH] ivshmem: use PIO for BAR0(Doorbell) instead of MMIO to reduce notification time

2011-11-17 Thread Zang Hongyong


于 2011/11/16,星期三 2:43, Cam Macdonell 写道:

On Sun, Nov 13, 2011 at 8:56 PM,  wrote:

From: Hongyong Zang

Ivshmem(nahanni) is a mechanism for sharing host memory with VMs running on the 
same host. Currently, guest notifies qemu by reading or writing ivshmem 
device's PCI MMIO BAR0(Doorbell).

This patch, changes this PCI MMIO BAR0(Doorbell) to PIO. And we find guest 
accesses PIO BAR 30% faster than MMIO BAR.

Nice work :)


Test it with:
Call 5,000,000 times writing PCI BAR0's DOORBELL register, we got the total 
time as follows:
linux command #time:
MMIO(regular interrupt)  PIO(regular interrupt)MMIO(msi+ioeventfd)  
PIO(msi+ioeventfd)
real101.441s 68.863s   70.720s  
49.521s
user0.391s   0.305s0.404s   
0.340s
sys 46.308s  30.634s   38.740s  
27.559s

Did you pin the VMs to cores?

No. We let the relation of vcpu and pcpu alone as default.


You're sending between 5-10 notifications per second, did you
confirm that they are all being received?  Since eventfds do not
buffer, some may be lost at that rate.  Of course, one would expect
that a single notification should be faster based on these results,
but I'm just curious.
Oh, we just measured from the sending side. At the receiver side, some 
notifications may be lost

when the receiver's notification service function is time-consuming.


Do you know of any issues with mapping a PIO region to user-space with
the UIO driver framework?

I'm not very familar with UIO yet. But I think UIO can do PIO operations.


Thanks,
Cam


Signed-off-by: Hongyong Zang
---
  hw/ivshmem.c |   26 +-
  kvm-all.c|   23 +++
  kvm.h|1 +
  3 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 242fbea..e68d0a7 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -28,7 +28,7 @@
  #define IVSHMEM_PEER0
  #define IVSHMEM_MASTER  1

-#define IVSHMEM_REG_BAR_SIZE 0x100
+#define IVSHMEM_REG_BAR_SIZE 0x10

  //#define DEBUG_IVSHMEM
  #ifdef DEBUG_IVSHMEM
@@ -56,9 +56,9 @@ typedef struct IVShmemState {

 CharDriverState **eventfd_chr;
 CharDriverState *server_chr;
-MemoryRegion ivshmem_mmio;
+MemoryRegion ivshmem_pio;

-pcibus_t mmio_addr;
+pcibus_t pio_addr;
 /* We might need to register the BAR before we actually have the memory.
  * So prepare a container MemoryRegion for the BAR immediately and
  * add a subregion when we have the memory.
@@ -234,7 +234,7 @@ static uint64_t ivshmem_io_read(void *opaque, 
target_phys_addr_t addr,
 return ret;
  }

-static const MemoryRegionOps ivshmem_mmio_ops = {
+static const MemoryRegionOps ivshmem_pio_ops = {
 .read = ivshmem_io_read,
 .write = ivshmem_io_write,
 .endianness = DEVICE_NATIVE_ENDIAN,
@@ -346,8 +346,8 @@ static void close_guest_eventfds(IVShmemState *s, int posn)
 guest_curr_max = s->peers[posn].nb_eventfds;

 for (i = 0; i<  guest_curr_max; i++) {
-kvm_set_ioeventfd_mmio_long(s->peers[posn].eventfds[i],
-s->mmio_addr + DOORBELL, (posn<<  16) | i, 0);
+kvm_set_ioeventfd_pio_long(s->peers[posn].eventfds[i],
+s->pio_addr + DOORBELL, (posn<<  16) | i, 0);
 close(s->peers[posn].eventfds[i]);
 }

@@ -361,7 +361,7 @@ static void setup_ioeventfds(IVShmemState *s) {

 for (i = 0; i<= s->max_peer; i++) {
 for (j = 0; j<  s->peers[i].nb_eventfds; j++) {
-memory_region_add_eventfd(&s->ivshmem_mmio,
+memory_region_add_eventfd(&s->ivshmem_pio,
   DOORBELL,
   4,
   true,
@@ -491,7 +491,7 @@ static void ivshmem_read(void *opaque, const uint8_t * buf, 
int flags)
 }

 if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
-if (kvm_set_ioeventfd_mmio_long(incoming_fd, s->mmio_addr + DOORBELL,
+if (kvm_set_ioeventfd_pio_long(incoming_fd, s->pio_addr + DOORBELL,
 (incoming_posn<<  16) | guest_max_eventfd, 1)<  0) {
 fprintf(stderr, "ivshmem: ioeventfd not available\n");
 }
@@ -656,16 +656,16 @@ static int pci_ivshmem_init(PCIDevice *dev)

 s->shm_fd = 0;

-memory_region_init_io(&s->ivshmem_mmio,&ivshmem_mmio_ops, s,
-  "ivshmem-mmio", IVSHMEM_REG_BAR_SIZE);
+memory_region_init_io(&s->ivshmem_pio,&ivshmem_pio_ops, s,
+  "ivshmem-pio", IVSHMEM_REG_BAR_SIZE);

 if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
 setup_ioeventfds(s);
 }

 /* region for registers*/
-pci_register_bar(&s->dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY,
-&s->ivshmem_mmio);
+pci_register_bar(&s->dev, 0, PCI_BASE_ADDRESS_SPACE_IO,
+&s->ivshmem_pio);

 memory_region_init(&s->bar, "ivshmem-bar2-container"

Re: [Qemu-devel] [PATCH 1/5] sh7750: convert memory controller/ioport to memory API

2011-11-17 Thread Avi Kivity

On 11/17/2011 02:24 PM, Benoît Canet wrote:
> Signed-off-by: Benoit Canet 
>  
>  /* sh775x interrupt controller tables for sh_intc.c
> @@ -706,30 +714,40 @@ static CPUWriteMemoryFunc * const sh7750_mmct_write[] = 
> {
>  sh7750_mmct_writel
>  };
>  
> -SH7750State *sh7750_init(CPUSH4State * cpu)
> +SH7750State *sh7750_init(CPUSH4State * cpu, MemoryRegion *sysmem)
>  {
>  SH7750State *s;
> -int sh7750_io_memory;
>  int sh7750_mm_cache_and_tlb; /* memory mapped cache and tlb */
>  
>  s = g_malloc0(sizeof(SH7750State));
>  s->cpu = cpu;
>  s->periph_freq = 6000;   /* 60MHz */
> -sh7750_io_memory = cpu_register_io_memory(sh7750_mem_read,
> -   sh7750_mem_write, s,
> -  DEVICE_NATIVE_ENDIAN);
> -cpu_register_physical_memory_offset(0x1f00, 0x1000,
> -sh7750_io_memory, 0x1f00);
> -cpu_register_physical_memory_offset(0xff00, 0x1000,
> -sh7750_io_memory, 0x1f00);
> -cpu_register_physical_memory_offset(0x1f80, 0x1000,
> -sh7750_io_memory, 0x1f80);
> -cpu_register_physical_memory_offset(0xff80, 0x1000,
> -sh7750_io_memory, 0x1f80);
> -cpu_register_physical_memory_offset(0x1fc0, 0x1000,
> -sh7750_io_memory, 0x1fc0);
> -cpu_register_physical_memory_offset(0xffc0, 0x1000,
> -sh7750_io_memory, 0x1fc0);
> +memory_region_init_io(&s->iomem, &sh7750_mem_ops, s,
> +  "memory", 0x1fe0);

Any size >= 0x1fc01000 will work here, why did you pick px1fe0? 
just curious.

I see serial starts at that address, but note that even a larger size
won't interfere, since we never add anything to sysmem at that address.

Anyway, no need to change the patch, since it will work just fine.

-- 
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [v9 Patch 2/6]Qemu: Error classes for file reopen and data sync failure

2011-11-17 Thread Luiz Capitulino

On Fri, 11 Nov 2011 12:17:35 +0530
Supriya Kannery  wrote:

> New error classes defined for file reopen failure and data
> sync error
> 
> Signed-off-by: Supriya Kannery 
> 
> ---
>  qerror.c |8 
>  qerror.h |6 ++
>  2 files changed, 14 insertions(+)
> 
> Index: qemu/qerror.c
> ===
> --- qemu.orig/qerror.c
> +++ qemu/qerror.c
> @@ -97,6 +97,14 @@ static const QErrorStringTable qerror_ta
>  .desc  = "Device '%(device)' is not removable",
>  },
>  {
> +.error_fmt = QERR_DATA_SYNC_FAILED,
> +.desc  = "Syncing of data failed for device '%(device)'",
> +},
> +{
> +.error_fmt = QERR_REOPEN_FILE_FAILED,
> +.desc  = "Could not reopen '%(filename)'",
> +},

Is this really needed? I think you could use QERR_OPEN_FILE_FAILED.

> +{
>  .error_fmt = QERR_DEVICE_NO_BUS,
>  .desc  = "Device '%(device)' has no child bus",
>  },
> Index: qemu/qerror.h
> ===
> --- qemu.orig/qerror.h
> +++ qemu/qerror.h
> @@ -87,6 +87,9 @@ QError *qobject_to_qerror(const QObject 
>  #define QERR_DEVICE_NOT_FOUND \
>  "{ 'class': 'DeviceNotFound', 'data': { 'device': %s } }"
>  
> +#define QERR_DATA_SYNC_FAILED \
> +"{ 'class': 'DataSyncFailed', 'data': { 'device': %s } }"
> +
>  #define QERR_DEVICE_NOT_REMOVABLE \
>  "{ 'class': 'DeviceNotRemovable', 'data': { 'device': %s } }"
>  
> @@ -144,6 +147,9 @@ QError *qobject_to_qerror(const QObject 
>  #define QERR_OPEN_FILE_FAILED \
>  "{ 'class': 'OpenFileFailed', 'data': { 'filename': %s } }"
>  
> +#define QERR_REOPEN_FILE_FAILED \
> +"{ 'class': 'ReopenFileFailed', 'data': { 'filename': %s } }"
> +
>  #define QERR_PROPERTY_NOT_FOUND \
>  "{ 'class': 'PropertyNotFound', 'data': { 'device': %s, 'property': %s } 
> }"
>  
>

Re: [Qemu-devel] [PATCH 4/5] sh_intc: convert interrupt controller to memory API

2011-11-17 Thread Peter Maydell

2011/11/17 Benoît Canet :
> Signed-off-by: Benoit Canet 
> --- a/hw/sh7750.c
> +++ b/hw/sh7750.c
> @@ -756,7 +756,7 @@ SH7750State *sh7750_init(CPUSH4State * cpu, MemoryRegion 
> *sysmem)
>                           "cache-and-tlb", 0x0800);
>     memory_region_add_subregion(sysmem, 0xf000, &s->mmct_iomem);
>
> -    sh_intc_init(&s->intc, NR_SOURCES,
> +    sh_intc_init(sysmem, &s->intc, NR_SOURCES,
>                 _INTC_ARRAY(mask_registers),
>                 _INTC_ARRAY(prio_registers));

This would be nicer as a sysbus device so we didn't have to hand
it the sysmem, but that can be done later if we care enough I guess.

> +    iomem = &desc->iomem;
> +    iomem_p4 = desc->iomem_aliases + index;
> +    iomem_a7 = iomem_p4 + 1;
> +
> +#define SH_INTC_IOMEM_FORMAT "interrupt-controller-%s-%s-%s"
> +    snprintf(name, sizeof(name), SH_INTC_IOMEM_FORMAT, type, action, "p4");
> +    memory_region_init_alias(iomem_p4, name, iomem, INTC_A7(address), 4);
> +    memory_region_add_subregion(sysmem, P4ADDR(address), iomem_p4);
> +
> +    snprintf(name, sizeof(name), SH_INTC_IOMEM_FORMAT, type, action, "a7");
> +    memory_region_init_alias(iomem_a7, name, iomem, INTC_A7(address), 4);
> +    memory_region_add_subregion(sysmem, A7ADDR(address), iomem_a7);
> +#undef SH_INTC_IOMEM_FORMAT
> +
> +    /* used to increment aliases index */
> +    return 2;

This is going to give us 6 * 2 * 2 = 24 four-byte memory regions,
incidentally. That should be OK, but one memory region per register
is an interesting arrangement.

> @@ -430,7 +447,11 @@ int sh_intc_init(struct intc_desc *desc,
>     desc->nr_mask_regs = nr_mask_regs;
>     desc->prio_regs = prio_regs;
>     desc->nr_prio_regs = nr_prio_regs;
> +    /* Allocate 4 MemoryRegions per register (2 actions * 2 aliases). */
> +    desc->iomem_aliases = g_new0(MemoryRegion,
> +                                 (nr_mask_regs + nr_prio_regs) * 4);

This should be exactly the right size...

> +    /* free unused MemoryRegions */
> +    desc->iomem_aliases = g_realloc(desc->iomem_aliases,
> +                                    sizeof(MemoryRegion)*j);

making this realloc unnecessary; or am I missing something?

-- PMM

Re: [Qemu-devel] [PATCH 4/5] sh_intc: convert interrupt controller to memory API

2011-11-17 Thread Avi Kivity

On 11/17/2011 02:56 PM, Peter Maydell wrote:
> 2011/11/17 Benoît Canet :
> > Signed-off-by: Benoit Canet 
> > --- a/hw/sh7750.c
> > +++ b/hw/sh7750.c
> > @@ -756,7 +756,7 @@ SH7750State *sh7750_init(CPUSH4State * cpu, 
> > MemoryRegion *sysmem)
> >   "cache-and-tlb", 0x0800);
> > memory_region_add_subregion(sysmem, 0xf000, &s->mmct_iomem);
> >
> > -sh_intc_init(&s->intc, NR_SOURCES,
> > +sh_intc_init(sysmem, &s->intc, NR_SOURCES,
> > _INTC_ARRAY(mask_registers),
> > _INTC_ARRAY(prio_registers));
>
> This would be nicer as a sysbus device so we didn't have to hand
> it the sysmem, but that can be done later if we care enough I guess.

Later, yes.

>
> > +iomem = &desc->iomem;
> > +iomem_p4 = desc->iomem_aliases + index;
> > +iomem_a7 = iomem_p4 + 1;
> > +
> > +#define SH_INTC_IOMEM_FORMAT "interrupt-controller-%s-%s-%s"
> > +snprintf(name, sizeof(name), SH_INTC_IOMEM_FORMAT, type, action, "p4");
> > +memory_region_init_alias(iomem_p4, name, iomem, INTC_A7(address), 4);
> > +memory_region_add_subregion(sysmem, P4ADDR(address), iomem_p4);
> > +
> > +snprintf(name, sizeof(name), SH_INTC_IOMEM_FORMAT, type, action, "a7");
> > +memory_region_init_alias(iomem_a7, name, iomem, INTC_A7(address), 4);
> > +memory_region_add_subregion(sysmem, A7ADDR(address), iomem_a7);
> > +#undef SH_INTC_IOMEM_FORMAT
> > +
> > +/* used to increment aliases index */
> > +return 2;
>
> This is going to give us 6 * 2 * 2 = 24 four-byte memory regions,
> incidentally. That should be OK, but one memory region per register
> is an interesting arrangement.

In fact if we introduce a Register class there's no reason it won't be a
MemoryRegion.  So any Register would be a MemoryRegion, we could have
thousands in a system.  I don't see anything wrong with it, do you?

>
> > @@ -430,7 +447,11 @@ int sh_intc_init(struct intc_desc *desc,
> > desc->nr_mask_regs = nr_mask_regs;
> > desc->prio_regs = prio_regs;
> > desc->nr_prio_regs = nr_prio_regs;
> > +/* Allocate 4 MemoryRegions per register (2 actions * 2 aliases). */
> > +desc->iomem_aliases = g_new0(MemoryRegion,
> > + (nr_mask_regs + nr_prio_regs) * 4);
>
> This should be exactly the right size...
>
> > +/* free unused MemoryRegions */
> > +desc->iomem_aliases = g_realloc(desc->iomem_aliases,
> > +sizeof(MemoryRegion)*j);
>
> making this realloc unnecessary; or am I missing something?
>

Not all calls to sh_intc_register() return 2.

However, calling realloc() in a MemoryRegion array is not a good idea, since 
the pointers may leak (memory_region_add_subregion() does this).  It's true 
that a size-reducing realloc doesn't change the pointer, yet it makes me 
uncomfortable.


-- 
error compiling committee.c: too many arguments to function

[Qemu-devel] [PATCH 1/5] Fix spelling in documentation and comments (similiar -> similar)

2011-11-17 Thread Stefan Hajnoczi

From: Stefan Weil 

This bug was detected by codespell.
In mips_mipssim.c a grammatical error was fixed, too.

Signed-off-by: Stefan Weil 
Signed-off-by: Stefan Hajnoczi 
---
 docs/libcacard.txt |2 +-
 hw/mips_mipssim.c  |2 +-
 qemu-doc.texi  |4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/libcacard.txt b/docs/libcacard.txt
index 5dee6fa..ae010d2 100644
--- a/docs/libcacard.txt
+++ b/docs/libcacard.txt
@@ -399,7 +399,7 @@ functions:
   This function will automatically generate the appropriate new reader
   events and add the reader to the list.
 
-  To create a new card, the virtual card emulator will call a similiar
+  To create a new card, the virtual card emulator will call a similar
   function.
 
VCard *vcard_new(VCardEmul *card_emul,
diff --git a/hw/mips_mipssim.c b/hw/mips_mipssim.c
index 7407158..b56cba6 100644
--- a/hw/mips_mipssim.c
+++ b/hw/mips_mipssim.c
@@ -1,7 +1,7 @@
 /*
  * QEMU/mipssim emulation
  *
- * Emulates a very simple machine model similiar to the one use by the
+ * Emulates a very simple machine model similar to the one used by the
  * proprietary MIPS emulator.
  * 
  * Copyright (c) 2007 Thiemo Seufer
diff --git a/qemu-doc.texi b/qemu-doc.texi
index 149e9bd..154b82d 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -1749,7 +1749,7 @@ PC Keyboard
 IDE controller
 @end itemize
 
-The mipssim pseudo board emulation provides an environment similiar
+The mipssim pseudo board emulation provides an environment similar
 to what the proprietary MIPS emulator uses for running Linux.
 It supports:
 
@@ -2141,7 +2141,7 @@ Xtensa emulator pseudo board "sim"
 Avnet LX60/LX110/LX200 board
 @end itemize
 
-The sim pseudo board emulation provides an environment similiar
+The sim pseudo board emulation provides an environment similar
 to one provided by the proprietary Tensilica ISS.
 It supports:
 
-- 
1.7.7.1

[Qemu-devel] [PULL 1.0 0/5] Trivial patches for 11 to 17 November 2011

2011-11-17 Thread Stefan Hajnoczi

These bug fixes and documentation fixes are suitable for 1.0.  The purely
internal trivial patches are being queued up for 1.1 in the
trivial-patches-next tree.

The following changes since commit 3f5bd4e1b874590d3d76e031530799a4610da6dc:

  Update version to 1.0-rc2 (2011-11-14 11:26:32 -0600)

are available in the git repository at:
  ssh://repo.or.cz/srv/git/qemu/stefanha.git trivial-patches

Markus Armbruster (1):
  monitor: Fix file_completion() to check for stat() failure

Matthias Brugger (1):
  Fixing some spelling in docs/libcacard.txt

Stefan Weil (2):
  Fix spelling in documentation and comments (similiar -> similar)
  Fix some spelling bugs in documentation and comments

Vagrant Cascadian (1):
  Fix typo: runnning -> running

 block-migration.c  |2 +-
 docs/libcacard.txt |   20 ++--
 docs/qapi-code-gen.txt |2 +-
 hw/mips_mipssim.c  |2 +-
 monitor.c  |4 ++--
 qemu-doc.texi  |4 ++--
 target-i386/kvm.c  |2 +-
 7 files changed, 18 insertions(+), 18 deletions(-)

-- 
1.7.7.1

[Qemu-devel] [PATCH 3/5] Fix typo: runnning -> running

2011-11-17 Thread Stefan Hajnoczi

From: Vagrant Cascadian 

One n too many for running, need we say more.

Signed-Off-By: Vagrant Cascadian 

Signed-off-by: Stefan Hajnoczi 
---
 target-i386/kvm.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index ddd115c..5bfc21f 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1845,7 +1845,7 @@ int kvm_arch_handle_exit(CPUState *env, struct kvm_run 
*run)
 code);
 if (host_supports_vmx() && code == VMX_INVALID_GUEST_STATE) {
 fprintf(stderr,
-"\nIf you're runnning a guest on an Intel machine without "
+"\nIf you're running a guest on an Intel machine without "
 "unrestricted mode\n"
 "support, the failure can be most likely due to the guest "
 "entering an invalid\n"
-- 
1.7.7.1

[Qemu-devel] [PATCH 5/5] monitor: Fix file_completion() to check for stat() failure

2011-11-17 Thread Stefan Hajnoczi

From: Markus Armbruster 

stat() can fail for a file name just read with readdir().  Easiest way
to trigger is a dangling symbolic link --- look ma, no race!  When it
fails, file_completion() uses sb.st_mode uninitialized.  If the
directory bit happens to be set, it appends a "/" to the completed
name.

Signed-off-by: Markus Armbruster 
Signed-off-by: Stefan Hajnoczi 
---
 monitor.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/monitor.c b/monitor.c
index 5ea35de..1be222e 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4207,9 +4207,9 @@ static void file_completion(const char *input)
 /* stat the file to find out if it's a directory.
  * In that case add a slash to speed up typing long paths
  */
-stat(file, &sb);
-if(S_ISDIR(sb.st_mode))
+if (stat(file, &sb) == 0 && S_ISDIR(sb.st_mode)) {
 pstrcat(file, sizeof(file), "/");
+}
 readline_add_completion(cur_mon->rs, file);
 }
 }
-- 
1.7.7.1

[Qemu-devel] [PATCH 4/5] Fixing some spelling in docs/libcacard.txt

2011-11-17 Thread Stefan Hajnoczi

From: Matthias Brugger 

Reviewed-by: Alon Levy 
Signed-off-by: Matthias Brugger 
Signed-off-by: Stefan Hajnoczi 
---
 docs/libcacard.txt |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/libcacard.txt b/docs/libcacard.txt
index 296706a..f7d7519 100644
--- a/docs/libcacard.txt
+++ b/docs/libcacard.txt
@@ -170,7 +170,7 @@ public entry point:
int cert_count);
 
   The parameters for this are:
-  card   - the virtual card structure which will prepresent this card.
+  card   - the virtual card structure which will represent this card.
   flags  - option flags that may be specific to this card type.
   cert   - array of binary certificates.
   cert_len   - array of lengths of each of the certificates specified in cert.
@@ -179,7 +179,7 @@ public entry point:
   cert_count - number of entries in cert, cert_len, and key arrays.
 
   Any cert, cert_len, or key with the same index are matching sets. That is
-  cert[0] is cert_len[0] long and has the corresponsing private key of key[0].
+  cert[0] is cert_len[0] long and has the corresponding private key of key[0].
 
 The card type emulator is expected to own the VCardKeys, but it should copy
 any raw cert data it wants to save. It can create new applets and add them to
@@ -261,7 +261,7 @@ Prior to processing calling the card type emulator's 
VCardProcessAPDU function,
apdu->a_Le   - The expected length of any returned data.
apdu->a_cla  - The raw apdu class.
apdu->a_channel - The channel (decoded from the class).
-   apdu->a_secure_messaging_type - The decoded secure messagin type
+   apdu->a_secure_messaging_type - The decoded secure messaging type
(from class).
apdu->a_type - The decode class type.
apdu->a_gen_type - the generic class type (7816, PROPRIETARY, RFU, PTS).
@@ -273,7 +273,7 @@ Creating a Response --
 
 The expected result of any APDU call is a response. The card type emulator must
 set *response with an appropriate VCardResponse value if it returns VCARD_DONE.
-Reponses could be as simple as returning a 2 byte status word response, to as
+Responses could be as simple as returning a 2 byte status word response, to as
 complex as returning a block of data along with a 2 byte response. Which is
 returned will depend on the semantics of the APDU. The following functions will
 create card responses.
@@ -282,12 +282,12 @@ create card responses.
 
 This is the most basic function to get a response. This function will
 return a response the consists solely one 2 byte status code. If that 
status
-code is defined in card_7816t.h, then this function is guarrenteed to
+code is defined in card_7816t.h, then this function is guaranteed to
 return a response with that status. If a cart type specific status code
 is passed and vcard_make_response fails to allocate the appropriate memory
 for that response, then vcard_make_response will return a VCardResponse
 of VCARD7816_STATUS_EXC_ERROR_MEMORY. In any case, this function is
-guarrenteed to return a valid VCardResponse.
+guaranteed to return a valid VCardResponse.
 
 VCardResponse *vcard_response_new(unsigned char *buf, int len,
   VCard7816Status status);
-- 
1.7.7.1

[Qemu-devel] [PATCH 2/5] Fix some spelling bugs in documentation and comments

2011-11-17 Thread Stefan Hajnoczi

From: Stefan Weil 

These errors were detected by codespell:

remaing -> remaining
soley -> solely
virutal -> virtual
seperate -> separate

libcacard.txt still needs some more patches.

Signed-off-by: Stefan Weil 
Signed-off-by: Stefan Hajnoczi 
---
 block-migration.c  |2 +-
 docs/libcacard.txt |6 +++---
 docs/qapi-code-gen.txt |2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index 0bff075..5f104864 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -521,7 +521,7 @@ static int is_stage2_completed(void)
 
 if ((remaining_dirty / bwidth) <=
 migrate_max_downtime()) {
-/* finish stage2 because we think that we can finish remaing work
+/* finish stage2 because we think that we can finish remaining work
below max_downtime */
 
 return 1;
diff --git a/docs/libcacard.txt b/docs/libcacard.txt
index ae010d2..296706a 100644
--- a/docs/libcacard.txt
+++ b/docs/libcacard.txt
@@ -281,7 +281,7 @@ create card responses.
 VCardResponse *vcard_make_response(VCard7816Status status);
 
 This is the most basic function to get a response. This function will
-return a response the consists soley one 2 byte status code. If that status
+return a response the consists solely one 2 byte status code. If that 
status
 code is defined in card_7816t.h, then this function is guarrenteed to
 return a response with that status. If a cart type specific status code
 is passed and vcard_make_response fails to allocate the appropriate memory
@@ -327,7 +327,7 @@ and applet.
 
 int vcard_emul_get_login_count(VCard *card);
 
-This function returns the the number of remaing login attempts for this
+This function returns the the number of remaining login attempts for this
 card. If the card emulator does not know, or the card does not have a
 way of giving this information, this function returns -1.
 
@@ -373,7 +373,7 @@ functions:
 
   The options structure is built by another function in the virtual card
   interface where a string of virtual card emulator specific strings are
-  mapped to the options. The actual structure is defined by the virutal card
+  mapped to the options. The actual structure is defined by the virtual card
   emulator and is used to determine the configuration of soft cards, or to
   determine which physical cards to present to the guest.
 
diff --git a/docs/qapi-code-gen.txt b/docs/qapi-code-gen.txt
index c0a9325..5831e37 100644
--- a/docs/qapi-code-gen.txt
+++ b/docs/qapi-code-gen.txt
@@ -14,7 +14,7 @@ To map QMP-defined interfaces to the native C QAPI 
implementations,
 a JSON-based schema is used to define types and function
 signatures, and a set of scripts is used to generate types/signatures,
 and marshaling/dispatch code. The QEMU Guest Agent also uses these
-scripts, paired with a seperate schema, to generate
+scripts, paired with a separate schema, to generate
 marshaling/dispatch code for the guest agent server running in the
 guest.
 
-- 
1.7.7.1

Re: [Qemu-devel] [PATCH 4/5] sh_intc: convert interrupt controller to memory API

2011-11-17 Thread Benoît Canet

Yes, allocating the exact size would make the realloc unnecessary. But it
involve more code to walk one more time through the mask_reg and prio_reg
array before allocating.
I made it this way to make code shorter.

The freed MemoryRegion are not initialized at all.

However as realloc seems to be a bad idea I need something to compute the
exact size in a minimum of code.

2011/11/17 Peter Maydell 

> 2011/11/17 Benoît Canet :
> > Signed-off-by: Benoit Canet 
> > --- a/hw/sh7750.c
> > +++ b/hw/sh7750.c
> > @@ -756,7 +756,7 @@ SH7750State *sh7750_init(CPUSH4State * cpu,
> MemoryRegion *sysmem)
> >   "cache-and-tlb", 0x0800);
> > memory_region_add_subregion(sysmem, 0xf000, &s->mmct_iomem);
> >
> > -sh_intc_init(&s->intc, NR_SOURCES,
> > +sh_intc_init(sysmem, &s->intc, NR_SOURCES,
> > _INTC_ARRAY(mask_registers),
> > _INTC_ARRAY(prio_registers));
>
> This would be nicer as a sysbus device so we didn't have to hand
> it the sysmem, but that can be done later if we care enough I guess.
>
> > +iomem = &desc->iomem;
> > +iomem_p4 = desc->iomem_aliases + index;
> > +iomem_a7 = iomem_p4 + 1;
> > +
> > +#define SH_INTC_IOMEM_FORMAT "interrupt-controller-%s-%s-%s"
> > +snprintf(name, sizeof(name), SH_INTC_IOMEM_FORMAT, type, action,
> "p4");
> > +memory_region_init_alias(iomem_p4, name, iomem, INTC_A7(address),
> 4);
> > +memory_region_add_subregion(sysmem, P4ADDR(address), iomem_p4);
> > +
> > +snprintf(name, sizeof(name), SH_INTC_IOMEM_FORMAT, type, action,
> "a7");
> > +memory_region_init_alias(iomem_a7, name, iomem, INTC_A7(address),
> 4);
> > +memory_region_add_subregion(sysmem, A7ADDR(address), iomem_a7);
> > +#undef SH_INTC_IOMEM_FORMAT
> > +
> > +/* used to increment aliases index */
> > +return 2;
>
> This is going to give us 6 * 2 * 2 = 24 four-byte memory regions,
> incidentally. That should be OK, but one memory region per register
> is an interesting arrangement.
>
> > @@ -430,7 +447,11 @@ int sh_intc_init(struct intc_desc *desc,
> > desc->nr_mask_regs = nr_mask_regs;
> > desc->prio_regs = prio_regs;
> > desc->nr_prio_regs = nr_prio_regs;
> > +/* Allocate 4 MemoryRegions per register (2 actions * 2 aliases). */
> > +desc->iomem_aliases = g_new0(MemoryRegion,
> > + (nr_mask_regs + nr_prio_regs) * 4);
>
> This should be exactly the right size...
>
> > +/* free unused MemoryRegions */
> > +desc->iomem_aliases = g_realloc(desc->iomem_aliases,
> > +sizeof(MemoryRegion)*j);
>
> making this realloc unnecessary; or am I missing something?
>
> -- PMM
>

Re: [Qemu-devel] [v9 Patch 3/6]Qemu: Cmd "block_set_hostcache" for dynamic cache change

2011-11-17 Thread Luiz Capitulino

On Fri, 11 Nov 2011 12:17:48 +0530
Supriya Kannery  wrote:

> New command "block_set_hostcache" added for dynamically changing 
> host pagecache setting of a block device.
> 
> Usage: 
>  block_set_hostcache   
> = block device
> = on/off
> 
> Example:
>  (qemu) block_set_hostcache ide0-hd0 off
> 
> Signed-off-by: Supriya Kannery 
> 
> ---
>  block.c |   54 ++
>  block.h |2 ++
>  blockdev.c  |   26 ++
>  blockdev.h  |2 ++
>  hmp-commands.hx |   14 ++
>  qmp-commands.hx |   27 +++
>  6 files changed, 125 insertions(+)
> 
> Index: qemu/block.c
> ===
> --- qemu.orig/block.c
> +++ qemu/block.c
> @@ -696,6 +696,35 @@ unlink_and_fail:
>  return ret;
>  }
>  
> +int bdrv_reopen(BlockDriverState *bs, int bdrv_flags)
> +{
> +BlockDriver *drv = bs->drv;
> +int ret = 0, open_flags;
> +
> +/* Quiesce IO for the given block device */
> +qemu_aio_flush();
> +ret = bdrv_flush(bs);
> +if (ret != 0) {
> +qerror_report(QERR_DATA_SYNC_FAILED, bs->device_name);
> +return ret;
> +}
> +open_flags = bs->open_flags;
> +bdrv_close(bs);
> +
> +ret = bdrv_open(bs, bs->filename, bdrv_flags, drv);
> +if (ret < 0) {
> +/* Reopen failed. Try to open with original flags */
> +qerror_report(QERR_REOPEN_FILE_FAILED, bs->filename);
> +ret = bdrv_open(bs, bs->filename, open_flags, drv);
> +if (ret < 0) {
> +/* Reopen failed with orig and modified flags */
> +abort();
> +}
> +}
> +
> +return ret;
> +}

In this thread:

 http://lists.gnu.org/archive/html/qemu-devel/2011-11/msg01271.html

Juan uses a similar method (well, at least it looks similar to me)
to fix a problem with migration. However, it was said that that method
can cause problems with -snapshot and encrypted images.

Won't we have the same sort of problems with this series?


> +
>  void bdrv_close(BlockDriverState *bs)
>  {
>  if (bs->drv) {
> @@ -733,6 +762,32 @@ void bdrv_close_all(void)
>  }
>  }
>  
> +int bdrv_change_hostcache(BlockDriverState *bs, bool enable_host_cache)
> +{
> +int bdrv_flags = bs->open_flags;
> +
> +/* set hostcache flags (without changing WCE/flush bits) */
> +if (enable_host_cache) {
> +bdrv_flags &= ~BDRV_O_NOCACHE;
> +} else {
> +bdrv_flags |= BDRV_O_NOCACHE;
> +}
> +
> +/* If no change in flags, no need to reopen */
> +if (bdrv_flags == bs->open_flags) {
> +return 0;
> +}
> +
> +if (bdrv_is_inserted(bs)) {
> +/* Reopen file with changed set of flags */
> +return bdrv_reopen(bs, bdrv_flags);
> +} else {
> +/* Save hostcache change for future use */
> +bs->open_flags = bdrv_flags;
> +return 0;
> +}

I'm wondering if the simplest (and best) thing to do here is to fail
if the drive is not inserted. Just wondering, not exactly asking you
to change it. But it should at least be clearly documented if you keep it.

> +}
> +
>  /* make a BlockDriverState anonymous by removing from bdrv_state list.
> Also, NULL terminate the device_name to prevent double remove */
>  void bdrv_make_anon(BlockDriverState *bs)
> Index: qemu/block.h
> ===
> --- qemu.orig/block.h
> +++ qemu/block.h
> @@ -104,6 +104,7 @@ int bdrv_parse_cache_flags(const char *m
>  int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
>  int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>BlockDriver *drv);
> +int bdrv_reopen(BlockDriverState *bs, int bdrv_flags);
>  void bdrv_close(BlockDriverState *bs);
>  int bdrv_attach_dev(BlockDriverState *bs, void *dev);
>  void bdrv_attach_dev_nofail(BlockDriverState *bs, void *dev);
> @@ -138,6 +139,7 @@ void bdrv_commit_all(void);
>  int bdrv_change_backing_file(BlockDriverState *bs,
>  const char *backing_file, const char *backing_fmt);
>  void bdrv_register(BlockDriver *bdrv);
> +int bdrv_change_hostcache(BlockDriverState *bs, bool enable_host_cache);
>  
>  
>  typedef struct BdrvCheckResult {
> Index: qemu/blockdev.c
> ===
> --- qemu.orig/blockdev.c
> +++ qemu/blockdev.c
> @@ -776,3 +776,29 @@ int do_block_resize(Monitor *mon, const 
>  
>  return 0;
>  }
> +
> +
> +/*
> + * Change host page cache setting while guest is running.
> +*/
> +int do_block_set_hostcache(Monitor *mon, const QDict *qdict,
> +   QObject **ret_data)
> +{
> +BlockDriverState *bs = NULL;
> +int enable;
> +const char *device;
> +
> +/* Validate device */
> +device = qdict_get_str(qdict, "device");
> +bs = bdrv_find(device);
> +if (!bs) {
> +qerror_report(QERR_DEVICE_NOT_F

Re: [Qemu-devel] [v9 Patch 3/6]Qemu: Cmd "block_set_hostcache" for dynamic cache change

2011-11-17 Thread Luiz Capitulino

On Thu, 17 Nov 2011 11:15:06 +0530
Supriya Kannery  wrote:

> On 11/17/2011 12:04 AM, Stefan Hajnoczi wrote:
> > On Fri, Nov 11, 2011 at 6:47 AM, Supriya Kannery
> >   wrote:
> >> +{
> >> +.name   = "block_set_hostcache",
> >> +.args_type  = "device:B,option:b",
> >> +.params = "device on|off",
> >> +.help   = "Change setting of host pagecache",
> >> +.user_print = monitor_user_noop,
> >> +.mhandler.cmd_new = do_block_set_hostcache,
> >> +},
> >> +STEXI
> >> +@item block_set_hostcache @var{device} @var{setting}
> >
> > @var{option}
> 
> Will send updated patch
> 
> >
> >> +@findex block_set_hostcache
> >> +Change host pagecache setting of a block device while guest is running.
> >> +ETEXI
> >> +
> >>
> >>  {
> >>  .name   = "eject",
> >> Index: qemu/qmp-commands.hx
> >> ===
> >> --- qemu.orig/qmp-commands.hx
> >> +++ qemu/qmp-commands.hx
> >> @@ -716,7 +716,34 @@ Example:
> >>
> >>   EQMP
> >>
> >> +
> >>  {
> >> +.name   = "block_set_hostcache",
> >> +.args_type  = "device:B,option:b",
> >> +.params = "device on|off",
> >> +.help   = "Change setting of host pagecache (true|false)",
> >
> > It would be more consistent to use "on|off" instead of "true|false".
> > Or eliminate it entirely by saying "Enable or disable host pagecache
> > usage".
> >
> > Stefan
> >
> 
> Followed similar way how set_link is done.
> Specified 'true/false' in brackets as 'on' or 'off' are not accepted as
> bool parameter in qmp prompt.

on/off is used in HMP, while true/false is used in QMP.

Re: [Qemu-devel] [v9 Patch 5/6]Qemu: Framework for reopening images safely

2011-11-17 Thread Luiz Capitulino

On Fri, 11 Nov 2011 12:18:18 +0530
Supriya Kannery  wrote:

> Struct BDRVReopenState along with three reopen related functions
> introduced for handling reopen state of images safely. This can be
> extended by each of the block drivers to reopen respective
> image files.

Shouldn't this patch come before the one introducing the QMP command?

> 
> Signed-off-by: Supriya Kannery 
> 
> Index: qemu/block.c
> ===
> --- qemu.orig/block.c
> +++ qemu/block.c
> @@ -696,10 +696,33 @@ unlink_and_fail:
>  return ret;
>  }
>  
> +int bdrv_reopen_prepare(BlockDriverState *bs, BDRVReopenState **prs, int 
> flags)
> +{
> + BlockDriver *drv = bs->drv;
> +
> + return drv->bdrv_reopen_prepare(bs, prs, flags);
> +}
> +
> +void bdrv_reopen_commit(BlockDriverState *bs, BDRVReopenState *rs, int flags)
> +{
> +BlockDriver *drv = bs->drv;
> +
> +drv->bdrv_reopen_commit(bs, rs, flags);
> +bs->open_flags = flags;
> +}
> +
> +void bdrv_reopen_abort(BlockDriverState *bs, BDRVReopenState *rs)
> +{
> +BlockDriver *drv = bs->drv;
> +
> +drv->bdrv_reopen_abort(bs, rs);
> +}
> +
>  int bdrv_reopen(BlockDriverState *bs, int bdrv_flags)
>  {
>  BlockDriver *drv = bs->drv;
>  int ret = 0, open_flags;
> +BDRVReopenState *reopen_state = NULL;
>  
>  /* Quiesce IO for the given block device */
>  qemu_aio_flush();
> @@ -708,17 +731,31 @@ int bdrv_reopen(BlockDriverState *bs, in
>  qerror_report(QERR_DATA_SYNC_FAILED, bs->device_name);
>  return ret;
>  }
> -open_flags = bs->open_flags;
> -bdrv_close(bs);
>  
> -ret = bdrv_open(bs, bs->filename, bdrv_flags, drv);
> -if (ret < 0) {
> -/* Reopen failed. Try to open with original flags */
> -qerror_report(QERR_REOPEN_FILE_FAILED, bs->filename);
> -ret = bdrv_open(bs, bs->filename, open_flags, drv);
> +/* Use driver specific reopen() if available */
> +if (drv->bdrv_reopen_prepare) {
> +ret = bdrv_reopen_prepare(bs, &reopen_state, bdrv_flags);
> + if (ret < 0) {
> +bdrv_reopen_abort(bs, reopen_state);
> +qerror_report(QERR_REOPEN_FILE_FAILED, bs->filename);
> +return ret;
> +}
> +
> +bdrv_reopen_commit(bs, reopen_state, bdrv_flags);
> +
> +} else {
> +   open_flags = bs->open_flags;
> +   bdrv_close(bs);
> +
> +   ret = bdrv_open(bs, bs->filename, bdrv_flags, drv);
>  if (ret < 0) {
> -/* Reopen failed with orig and modified flags */
> -abort();
> +/* Reopen failed. Try to open with original flags */
> +qerror_report(QERR_REOPEN_FILE_FAILED, bs->filename);
> +ret = bdrv_open(bs, bs->filename, open_flags, drv);
> +if (ret < 0) {
> +/* Reopen failed with orig and modified flags */
> +bs->drv = NULL;
> +}
>  }
>  }
>  
> Index: qemu/block_int.h
> ===
> --- qemu.orig/block_int.h
> +++ qemu/block_int.h
> @@ -56,6 +56,14 @@ struct BlockDriver {
>  int (*bdrv_probe)(const uint8_t *buf, int buf_size, const char 
> *filename);
>  int (*bdrv_probe_device)(const char *filename);
>  int (*bdrv_open)(BlockDriverState *bs, int flags);
> +
> +/* For handling image reopen for split or non-split files */
> +int (*bdrv_reopen_prepare)(BlockDriverState *bs,
> +   BDRVReopenState **prs,
> +   int flags);
> +void (*bdrv_reopen_commit)(BlockDriverState *bs, BDRVReopenState *rs,
> +   int flags);
> +void (*bdrv_reopen_abort)(BlockDriverState *bs, BDRVReopenState *rs);
>  int (*bdrv_file_open)(BlockDriverState *bs, const char *filename, int 
> flags);
>  int (*bdrv_read)(BlockDriverState *bs, int64_t sector_num,
>   uint8_t *buf, int nb_sectors);
> @@ -213,6 +221,11 @@ struct BlockDriverState {
>  void *private;
>  };
>  
> +struct BDRVReopenState {
> +BlockDriverState *bs;
> +int reopen_flags;
> +};
> +
>  struct BlockDriverAIOCB {
>  AIOPool *pool;
>  BlockDriverState *bs;
> Index: qemu/qemu-common.h
> ===
> --- qemu.orig/qemu-common.h
> +++ qemu/qemu-common.h
> @@ -203,6 +203,7 @@ typedef struct NICInfo NICInfo;
>  typedef struct HCIInfo HCIInfo;
>  typedef struct AudioState AudioState;
>  typedef struct BlockDriverState BlockDriverState;
> +typedef struct BDRVReopenState BDRVReopenState;
>  typedef struct DriveInfo DriveInfo;
>  typedef struct DisplayState DisplayState;
>  typedef struct DisplayChangeListener DisplayChangeListener;
> Index: qemu/block.h
> ===
> --- qemu.orig/block.h
> +++ qemu/block.h
> @@ -105,6 +105,9 @@ int bdrv_file_open(BlockDriverState **pb
>  int bdrv_ope

Re: [Qemu-devel] [PATCH 1/4] Makefile: remove more generated files on clean

2011-11-17 Thread Luiz Capitulino

On Wed, 16 Nov 2011 23:58:46 +0200
"Michael S. Tsirkin"  wrote:

> make clean missed the source qmp files generated
> by python. Fix that.
> 
> Signed-off-by: Michael S. Tsirkin 

Michael, this series is for 1.0, right?

> ---
>  Makefile |2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index 168093c..b335f2a 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -8,6 +8,7 @@ ifeq ($(TRACE_BACKEND),dtrace)
>  GENERATED_HEADERS += trace-dtrace.h
>  endif
>  GENERATED_HEADERS += qmp-commands.h qapi-types.h qapi-visit.h
> +GENERATED_SOURCES += qmp-marshal.c qapi-types.c qapi-visit.c
>  
>  ifneq ($(wildcard config-host.mak),)
>  # Put the all: rule here so that config-host.mak can contain dependencies.
> @@ -227,6 +228,7 @@ clean:
>   rm -f trace.c trace.h trace.c-timestamp trace.h-timestamp
>   rm -f trace-dtrace.dtrace trace-dtrace.dtrace-timestamp
>   rm -f trace-dtrace.h trace-dtrace.h-timestamp
> + rm -f $(GENERATED_SOURCES)
>   rm -rf $(qapi-dir)
>   $(MAKE) -C tests clean
>   for d in $(ALL_SUBDIRS) $(QEMULIBS) libcacard; do \

[Qemu-devel] [[PATCH V2] 0/5]

2011-11-17 Thread Benoît Canet

These patches converts the remaining sh4 devices to the memory API.
The patch "sh_intc: convert interrupt controller to memory API" is
somewhat tricky

V2:
Cosmetic change of a memory region size in
"sh7750: convert memory controller/ioport to memory API".

Remove the realloc in "convert interrupt controller to memory API"
to make is safer even we loose some extra pointers.

Benoît Canet (5):
  sh7750: convert memory controller/ioport to memory API
  sh7750: convert cache and tlb to memory API
  sh_timer: convert to memory API
  sh_intc: convert interrupt controller to memory API
  sh_serial: convert to memory API

 hw/r2d.c|2 +-
 hw/sh.h |9 ++-
 hw/sh7750.c |  155 +--
 hw/sh_intc.c|   85 +++-
 hw/sh_intc.h|7 ++-
 hw/sh_serial.c  |   55 ++
 hw/sh_timer.c   |   43 --
 hw/shix.c   |2 +-
 target-sh4/helper.c |3 +
 9 files changed, 215 insertions(+), 146 deletions(-)

-- 
1.7.5.4

[Qemu-devel] [[PATCH V2] 1/5] sh7750: convert memory controller/ioport to memory API

2011-11-17 Thread Benoît Canet

Signed-off-by: Benoit Canet 
---
 hw/r2d.c|2 +-
 hw/sh.h |3 +-
 hw/sh7750.c |   72 --
 hw/shix.c   |2 +-
 4 files changed, 49 insertions(+), 30 deletions(-)

diff --git a/hw/r2d.c b/hw/r2d.c
index a9aefa2..9b6fcba 100644
--- a/hw/r2d.c
+++ b/hw/r2d.c
@@ -250,7 +250,7 @@ static void r2d_init(ram_addr_t ram_size,
 memory_region_init_ram(sdram, NULL, "r2d.sdram", SDRAM_SIZE);
 memory_region_add_subregion(address_space_mem, SDRAM_BASE, sdram);
 /* Register peripherals */
-s = sh7750_init(env);
+s = sh7750_init(env, address_space_mem);
 irq = r2d_fpga_init(address_space_mem, 0x0400, sh7750_irl(s));
 sysbus_create_varargs("sh_pci", 0x1e20, irq[PCI_INTA], irq[PCI_INTB],
   irq[PCI_INTC], irq[PCI_INTD], NULL);
diff --git a/hw/sh.h b/hw/sh.h
index d30e9f5..cf3f6f6 100644
--- a/hw/sh.h
+++ b/hw/sh.h
@@ -9,8 +9,9 @@
 
 /* sh7750.c */
 struct SH7750State;
+struct MemoryRegion;
 
-struct SH7750State *sh7750_init(CPUState * cpu);
+struct SH7750State *sh7750_init(CPUState * cpu, struct MemoryRegion *sysmem);
 
 typedef struct {
 /* The callback will be triggered if any of the designated lines change */
diff --git a/hw/sh7750.c b/hw/sh7750.c
index 9f3ea92..3bf568d 100644
--- a/hw/sh7750.c
+++ b/hw/sh7750.c
@@ -30,10 +30,18 @@
 #include "sh7750_regnames.h"
 #include "sh_intc.h"
 #include "cpu.h"
+#include "exec-memory.h"
 
 #define NB_DEVICES 4
 
 typedef struct SH7750State {
+MemoryRegion iomem;
+MemoryRegion iomem_1f0;
+MemoryRegion iomem_ff0;
+MemoryRegion iomem_1f8;
+MemoryRegion iomem_ff8;
+MemoryRegion iomem_1fc;
+MemoryRegion iomem_ffc;
 /* CPU */
 CPUSH4State *cpu;
 /* Peripheral frequency in Hz */
@@ -436,16 +444,16 @@ static void sh7750_mem_writel(void *opaque, 
target_phys_addr_t addr,
 }
 }
 
-static CPUReadMemoryFunc * const sh7750_mem_read[] = {
-sh7750_mem_readb,
-sh7750_mem_readw,
-sh7750_mem_readl
-};
-
-static CPUWriteMemoryFunc * const sh7750_mem_write[] = {
-sh7750_mem_writeb,
-sh7750_mem_writew,
-sh7750_mem_writel
+static const MemoryRegionOps sh7750_mem_ops = {
+.old_mmio = {
+.read = {sh7750_mem_readb,
+ sh7750_mem_readw,
+ sh7750_mem_readl },
+.write = {sh7750_mem_writeb,
+  sh7750_mem_writew,
+  sh7750_mem_writel },
+},
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 /* sh775x interrupt controller tables for sh_intc.c
@@ -706,30 +714,40 @@ static CPUWriteMemoryFunc * const sh7750_mmct_write[] = {
 sh7750_mmct_writel
 };
 
-SH7750State *sh7750_init(CPUSH4State * cpu)
+SH7750State *sh7750_init(CPUSH4State * cpu, MemoryRegion *sysmem)
 {
 SH7750State *s;
-int sh7750_io_memory;
 int sh7750_mm_cache_and_tlb; /* memory mapped cache and tlb */
 
 s = g_malloc0(sizeof(SH7750State));
 s->cpu = cpu;
 s->periph_freq = 6000; /* 60MHz */
-sh7750_io_memory = cpu_register_io_memory(sh7750_mem_read,
- sh7750_mem_write, s,
-  DEVICE_NATIVE_ENDIAN);
-cpu_register_physical_memory_offset(0x1f00, 0x1000,
-sh7750_io_memory, 0x1f00);
-cpu_register_physical_memory_offset(0xff00, 0x1000,
-sh7750_io_memory, 0x1f00);
-cpu_register_physical_memory_offset(0x1f80, 0x1000,
-sh7750_io_memory, 0x1f80);
-cpu_register_physical_memory_offset(0xff80, 0x1000,
-sh7750_io_memory, 0x1f80);
-cpu_register_physical_memory_offset(0x1fc0, 0x1000,
-sh7750_io_memory, 0x1fc0);
-cpu_register_physical_memory_offset(0xffc0, 0x1000,
-sh7750_io_memory, 0x1fc0);
+memory_region_init_io(&s->iomem, &sh7750_mem_ops, s,
+  "memory", 0x1fc01000);
+
+memory_region_init_alias(&s->iomem_1f0, "memory-1f0",
+ &s->iomem, 0x1f00, 0x1000);
+memory_region_add_subregion(sysmem, 0x1f00, &s->iomem_1f0);
+
+memory_region_init_alias(&s->iomem_ff0, "memory-ff0",
+ &s->iomem, 0x1f00, 0x1000);
+memory_region_add_subregion(sysmem, 0xff00, &s->iomem_ff0);
+
+memory_region_init_alias(&s->iomem_1f8, "memory-1f8",
+ &s->iomem, 0x1f80, 0x1000);
+memory_region_add_subregion(sysmem, 0x1f80, &s->iomem_1f8);
+
+memory_region_init_alias(&s->iomem_ff8, "memory-ff8",
+ &s->iomem, 0x1f80, 0x1000);
+memory_region_add_subregion(sysmem, 0xff80, &s->iomem_ff8);
+
+memory_region_init_alias(&s->iomem_1fc, "memory-1fc",
+ &s->iomem, 0x1fc0, 0

[Qemu-devel] [[PATCH V2] 2/5] sh7750: convert cache and tlb to memory API

2011-11-17 Thread Benoît Canet

Signed-off-by: Benoit Canet 
---
 hw/sh7750.c |   43 ++-
 1 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/hw/sh7750.c b/hw/sh7750.c
index 3bf568d..6ad76df 100644
--- a/hw/sh7750.c
+++ b/hw/sh7750.c
@@ -42,6 +42,7 @@ typedef struct SH7750State {
 MemoryRegion iomem_ff8;
 MemoryRegion iomem_1fc;
 MemoryRegion iomem_ffc;
+MemoryRegion mmct_iomem;
 /* CPU */
 CPUSH4State *cpu;
 /* Peripheral frequency in Hz */
@@ -623,18 +624,23 @@ static struct intc_group groups_irl[] = {
 #define MM_UTLB_DATA (7)
 #define MM_REGION_TYPE(addr)  ((addr & MM_REGION_MASK) >> 24)
 
-static uint32_t invalid_read(void *opaque, target_phys_addr_t addr)
+static uint64_t invalid_read(void *opaque, target_phys_addr_t addr)
 {
 abort();
 
 return 0;
 }
 
-static uint32_t sh7750_mmct_readl(void *opaque, target_phys_addr_t addr)
+static uint64_t sh7750_mmct_read(void *opaque, target_phys_addr_t addr,
+ unsigned size)
 {
 SH7750State *s = opaque;
 uint32_t ret = 0;
 
+if (size != 4) {
+return invalid_read(opaque, addr);
+}
+
 switch (MM_REGION_TYPE(addr)) {
 case MM_ICACHE_ADDR:
 case MM_ICACHE_DATA:
@@ -664,16 +670,20 @@ static uint32_t sh7750_mmct_readl(void *opaque, 
target_phys_addr_t addr)
 }
 
 static void invalid_write(void *opaque, target_phys_addr_t addr,
- uint32_t mem_value)
+  uint64_t mem_value)
 {
 abort();
 }
 
-static void sh7750_mmct_writel(void *opaque, target_phys_addr_t addr,
-   uint32_t mem_value)
+static void sh7750_mmct_write(void *opaque, target_phys_addr_t addr,
+  uint64_t mem_value, unsigned size)
 {
 SH7750State *s = opaque;
 
+if (size != 4) {
+invalid_write(opaque, addr, mem_value);
+}
+
 switch (MM_REGION_TYPE(addr)) {
 case MM_ICACHE_ADDR:
 case MM_ICACHE_DATA:
@@ -702,22 +712,15 @@ static void sh7750_mmct_writel(void *opaque, 
target_phys_addr_t addr,
 }
 }
 
-static CPUReadMemoryFunc * const sh7750_mmct_read[] = {
-invalid_read,
-invalid_read,
-sh7750_mmct_readl
-};
-
-static CPUWriteMemoryFunc * const sh7750_mmct_write[] = {
-invalid_write,
-invalid_write,
-sh7750_mmct_writel
+static const struct MemoryRegionOps sh7750_mmct_ops = {
+.read = sh7750_mmct_read,
+.write = sh7750_mmct_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 SH7750State *sh7750_init(CPUSH4State * cpu, MemoryRegion *sysmem)
 {
 SH7750State *s;
-int sh7750_mm_cache_and_tlb; /* memory mapped cache and tlb */
 
 s = g_malloc0(sizeof(SH7750State));
 s->cpu = cpu;
@@ -749,11 +752,9 @@ SH7750State *sh7750_init(CPUSH4State * cpu, MemoryRegion 
*sysmem)
  &s->iomem, 0x1fc0, 0x1000);
 memory_region_add_subregion(sysmem, 0xffc0, &s->iomem_ffc);
 
-sh7750_mm_cache_and_tlb = cpu_register_io_memory(sh7750_mmct_read,
-sh7750_mmct_write, s,
- DEVICE_NATIVE_ENDIAN);
-cpu_register_physical_memory(0xf000, 0x0800,
-sh7750_mm_cache_and_tlb);
+memory_region_init_io(&s->mmct_iomem, &sh7750_mmct_ops, s,
+  "cache-and-tlb", 0x0800);
+memory_region_add_subregion(sysmem, 0xf000, &s->mmct_iomem);
 
 sh_intc_init(&s->intc, NR_SOURCES,
 _INTC_ARRAY(mask_registers),
-- 
1.7.5.4

[Qemu-devel] [[PATCH V2] 4/5] sh_intc: convert interrupt controller to memory API

2011-11-17 Thread Benoît Canet

Signed-off-by: Benoit Canet 
---
 hw/sh7750.c |2 +-
 hw/sh_intc.c|   85 ++-
 hw/sh_intc.h|7 +++-
 target-sh4/helper.c |3 ++
 4 files changed, 66 insertions(+), 31 deletions(-)

diff --git a/hw/sh7750.c b/hw/sh7750.c
index c659756..20ac605 100644
--- a/hw/sh7750.c
+++ b/hw/sh7750.c
@@ -756,7 +756,7 @@ SH7750State *sh7750_init(CPUSH4State * cpu, MemoryRegion 
*sysmem)
   "cache-and-tlb", 0x0800);
 memory_region_add_subregion(sysmem, 0xf000, &s->mmct_iomem);
 
-sh_intc_init(&s->intc, NR_SOURCES,
+sh_intc_init(sysmem, &s->intc, NR_SOURCES,
 _INTC_ARRAY(mask_registers),
 _INTC_ARRAY(prio_registers));
 
diff --git a/hw/sh_intc.c b/hw/sh_intc.c
index e07424f..b8ad2de 100644
--- a/hw/sh_intc.c
+++ b/hw/sh_intc.c
@@ -219,7 +219,8 @@ static void sh_intc_toggle_mask(struct intc_desc *desc, 
intc_enum id,
 #endif
 }
 
-static uint32_t sh_intc_read(void *opaque, target_phys_addr_t offset)
+static uint64_t sh_intc_read(void *opaque, target_phys_addr_t offset,
+ unsigned size)
 {
 struct intc_desc *desc = opaque;
 intc_enum *enum_ids = NULL;
@@ -238,7 +239,7 @@ static uint32_t sh_intc_read(void *opaque, 
target_phys_addr_t offset)
 }
 
 static void sh_intc_write(void *opaque, target_phys_addr_t offset,
- uint32_t value)
+  uint64_t value, unsigned size)
 {
 struct intc_desc *desc = opaque;
 intc_enum *enum_ids = NULL;
@@ -282,16 +283,10 @@ static void sh_intc_write(void *opaque, 
target_phys_addr_t offset,
 #endif
 }
 
-static CPUReadMemoryFunc * const sh_intc_readfn[] = {
-sh_intc_read,
-sh_intc_read,
-sh_intc_read
-};
-
-static CPUWriteMemoryFunc * const sh_intc_writefn[] = {
-sh_intc_write,
-sh_intc_write,
-sh_intc_write
+static const struct MemoryRegionOps sh_intc_ops = {
+.read = sh_intc_read,
+.write = sh_intc_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 struct intc_source *sh_intc_source(struct intc_desc *desc, intc_enum id)
@@ -302,15 +297,36 @@ struct intc_source *sh_intc_source(struct intc_desc 
*desc, intc_enum id)
 return NULL;
 }
 
-static void sh_intc_register(struct intc_desc *desc, 
-unsigned long address)
+static unsigned int sh_intc_register(MemoryRegion *sysmem,
+ struct intc_desc *desc,
+ const unsigned long address,
+ const char *type,
+ const char *action,
+ const unsigned int index)
 {
-if (address) {
-cpu_register_physical_memory_offset(P4ADDR(address), 4,
-desc->iomemtype, INTC_A7(address));
-cpu_register_physical_memory_offset(A7ADDR(address), 4,
-desc->iomemtype, INTC_A7(address));
+char name[60];
+MemoryRegion *iomem, *iomem_p4, *iomem_a7;
+
+if (!address) {
+return 0;
 }
+
+iomem = &desc->iomem;
+iomem_p4 = desc->iomem_aliases + index;
+iomem_a7 = iomem_p4 + 1;
+
+#define SH_INTC_IOMEM_FORMAT "interrupt-controller-%s-%s-%s"
+snprintf(name, sizeof(name), SH_INTC_IOMEM_FORMAT, type, action, "p4");
+memory_region_init_alias(iomem_p4, name, iomem, INTC_A7(address), 4);
+memory_region_add_subregion(sysmem, P4ADDR(address), iomem_p4);
+
+snprintf(name, sizeof(name), SH_INTC_IOMEM_FORMAT, type, action, "a7");
+memory_region_init_alias(iomem_a7, name, iomem, INTC_A7(address), 4);
+memory_region_add_subregion(sysmem, A7ADDR(address), iomem_a7);
+#undef SH_INTC_IOMEM_FORMAT
+
+/* used to increment aliases index */
+return 2;
 }
 
 static void sh_intc_register_source(struct intc_desc *desc,
@@ -415,14 +431,15 @@ void sh_intc_register_sources(struct intc_desc *desc,
 }
 }
 
-int sh_intc_init(struct intc_desc *desc,
+int sh_intc_init(MemoryRegion *sysmem,
+ struct intc_desc *desc,
 int nr_sources,
 struct intc_mask_reg *mask_regs,
 int nr_mask_regs,
 struct intc_prio_reg *prio_regs,
 int nr_prio_regs)
 {
-unsigned int i;
+unsigned int i, j;
 
 desc->pending = 0;
 desc->nr_sources = nr_sources;
@@ -430,7 +447,12 @@ int sh_intc_init(struct intc_desc *desc,
 desc->nr_mask_regs = nr_mask_regs;
 desc->prio_regs = prio_regs;
 desc->nr_prio_regs = nr_prio_regs;
+/* Allocate 4 MemoryRegions per register (2 actions * 2 aliases).
+ **/
+desc->iomem_aliases = g_new0(MemoryRegion,
+ (nr_mask_regs + nr_prio_regs) * 4);
 
+j = 0;
 i = sizeof(struct intc_source) * nr_sources;
 desc->sources = g_malloc0(i);
 
@@ -442,15 +464,19 @@ int sh_intc_init(struct intc_desc *desc,
 
 desc->irqs = qemu_allocate_irqs(sh_intc_s

Re: [Qemu-devel] [PATCH 1/4] Makefile: remove more generated files on clean

2011-11-17 Thread Michael S. Tsirkin

On Thu, Nov 17, 2011 at 11:21:01AM -0200, Luiz Capitulino wrote:
> On Wed, 16 Nov 2011 23:58:46 +0200
> "Michael S. Tsirkin"  wrote:
> 
> > make clean missed the source qmp files generated
> > by python. Fix that.
> > 
> > Signed-off-by: Michael S. Tsirkin 
> 
> Michael, this series is for 1.0, right?

Yes, I think so.

> > ---
> >  Makefile |2 ++
> >  1 files changed, 2 insertions(+), 0 deletions(-)
> > 
> > diff --git a/Makefile b/Makefile
> > index 168093c..b335f2a 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -8,6 +8,7 @@ ifeq ($(TRACE_BACKEND),dtrace)
> >  GENERATED_HEADERS += trace-dtrace.h
> >  endif
> >  GENERATED_HEADERS += qmp-commands.h qapi-types.h qapi-visit.h
> > +GENERATED_SOURCES += qmp-marshal.c qapi-types.c qapi-visit.c
> >  
> >  ifneq ($(wildcard config-host.mak),)
> >  # Put the all: rule here so that config-host.mak can contain dependencies.
> > @@ -227,6 +228,7 @@ clean:
> > rm -f trace.c trace.h trace.c-timestamp trace.h-timestamp
> > rm -f trace-dtrace.dtrace trace-dtrace.dtrace-timestamp
> > rm -f trace-dtrace.h trace-dtrace.h-timestamp
> > +   rm -f $(GENERATED_SOURCES)
> > rm -rf $(qapi-dir)
> > $(MAKE) -C tests clean
> > for d in $(ALL_SUBDIRS) $(QEMULIBS) libcacard; do \

[Qemu-devel] [PATCH v2 1/8] qemu-common: add QEMU_ALIGN_DOWN() and QEMU_ALIGN_UP() macros

2011-11-17 Thread Stefan Hajnoczi

Add macros for aligning a number to a multiple, for example:

QEMU_ALIGN_DOWN(500, 2000) = 0
QEMU_ALIGN_UP(500, 2000) = 2000

Since ALIGN_UP() is a common macro name use the QEMU_* namespace prefix.
Hopefully this will protect us from included headers that leak something
with a similar name.

Signed-off-by: Stefan Hajnoczi 
---
 qemu-common.h |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/qemu-common.h b/qemu-common.h
index 2ce47aa..44870fe 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -341,6 +341,12 @@ static inline uint64_t muldiv64(uint64_t a, uint32_t b, 
uint32_t c)
 return res.ll;
 }
 
+/* Round number down to multiple */
+#define QEMU_ALIGN_DOWN(n, m) ((n) / (m) * (m))
+
+/* Round number up to multiple */
+#define QEMU_ALIGN_UP(n, m) QEMU_ALIGN_DOWN((n) + (m) - 1, (m))
+
 #include "module.h"
 
 #endif
-- 
1.7.7.1

[Qemu-devel] [PATCH v2 7/8] block: core copy-on-read logic

2011-11-17 Thread Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi 
---
 block.c  |   72 ++
 trace-events |1 +
 2 files changed, 73 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 0eef122..d5faa6c 100644
--- a/block.c
+++ b/block.c
@@ -1464,6 +1464,61 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t 
offset,
 return 0;
 }
 
+static int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
+int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
+{
+/* Perform I/O through a temporary buffer so that users who scribble over
+ * their read buffer while the operation is in progress do not end up
+ * modifying the image file.  This is critical for zero-copy guest I/O
+ * where anything might happen inside guest memory.
+ */
+void *bounce_buffer;
+
+struct iovec iov;
+QEMUIOVector bounce_qiov;
+int64_t cluster_sector_num;
+int cluster_nb_sectors;
+size_t skip_bytes;
+int ret;
+
+/* Cover entire cluster so no additional backing file I/O is required when
+ * allocating cluster in the image file.
+ */
+round_to_clusters(bs, sector_num, nb_sectors,
+  &cluster_sector_num, &cluster_nb_sectors);
+
+trace_bdrv_co_copy_on_readv(bs, sector_num, nb_sectors,
+cluster_sector_num, cluster_nb_sectors);
+
+iov.iov_len = cluster_nb_sectors * BDRV_SECTOR_SIZE;
+iov.iov_base = bounce_buffer = qemu_blockalign(bs, iov.iov_len);
+qemu_iovec_init_external(&bounce_qiov, &iov, 1);
+
+ret = bs->drv->bdrv_co_readv(bs, cluster_sector_num, cluster_nb_sectors,
+ &bounce_qiov);
+if (ret < 0) {
+goto err;
+}
+
+ret = bs->drv->bdrv_co_writev(bs, cluster_sector_num, cluster_nb_sectors,
+  &bounce_qiov);
+if (ret < 0) {
+/* It might be okay to ignore write errors for guest requests.  If this
+ * is a deliberate copy-on-read then we don't want to ignore the error.
+ * Simply report it in all cases.
+ */
+goto err;
+}
+
+skip_bytes = (sector_num - cluster_sector_num) * BDRV_SECTOR_SIZE;
+qemu_iovec_from_buffer(qiov, bounce_buffer + skip_bytes,
+   nb_sectors * BDRV_SECTOR_SIZE);
+
+err:
+qemu_vfree(bounce_buffer);
+return ret;
+}
+
 /*
  * Handle a read request in coroutine context
  */
@@ -1491,7 +1546,24 @@ static int coroutine_fn 
bdrv_co_do_readv(BlockDriverState *bs,
 }
 
 tracked_request_begin(&req, bs, sector_num, nb_sectors, false);
+
+if (bs->copy_on_read) {
+int pnum;
+
+ret = bdrv_co_is_allocated(bs, sector_num, nb_sectors, &pnum);
+if (ret < 0) {
+goto out;
+}
+
+if (!ret || pnum != nb_sectors) {
+ret = bdrv_co_copy_on_readv(bs, sector_num, nb_sectors, qiov);
+goto out;
+}
+}
+
 ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
+
+out:
 tracked_request_end(&req);
 return ret;
 }
diff --git a/trace-events b/trace-events
index 962caca..518b76b 100644
--- a/trace-events
+++ b/trace-events
@@ -69,6 +69,7 @@ bdrv_lock_medium(void *bs, bool locked) "bs %p locked %d"
 bdrv_co_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num 
%"PRId64" nb_sectors %d"
 bdrv_co_writev(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num 
%"PRId64" nb_sectors %d"
 bdrv_co_io_em(void *bs, int64_t sector_num, int nb_sectors, int is_write, void 
*acb) "bs %p sector_num %"PRId64" nb_sectors %d is_write %d acb %p"
+bdrv_co_copy_on_readv(void *bs, int64_t sector_num, int nb_sectors, int64_t 
cluster_sector_num, int cluster_nb_sectors) "bs %p sector_num %"PRId64" 
nb_sectors %d cluster_sector_num %"PRId64" cluster_nb_sectors %d"
 
 # hw/virtio-blk.c
 virtio_blk_req_complete(void *req, int status) "req %p status %d"
-- 
1.7.7.1

Re: [Qemu-devel] [PATCH V2] Add -f option to qemu-nbd

2011-11-17 Thread Paolo Bonzini


On 11/17/2011 12:36 PM, Chunyan Liu wrote:


Adding lock to the nbd device before connecting disk image to that device to
handling race conditions.


This removes the possibility for other programs to lock.  Have you 
checked what happens if you use the same device twice and whether you 
can piggyback on e.g. an EBUSY from the NBD_SET_SOCK ioctl?


Paolo

Re: [Qemu-devel] [PATCH v2 5/8] block: wait for overlapping requests

2011-11-17 Thread Paolo Bonzini


On 11/17/2011 02:40 PM, Stefan Hajnoczi wrote:

When copy-on-read is enabled it is necessary to wait for overlapping
requests before issuing new requests.  This prevents races between the
copy-on-read and a write request.


What about discards?

Paolo

Re: [Qemu-devel] [PATCH v2 5/8] block: wait for overlapping requests

2011-11-17 Thread Stefan Hajnoczi

On Thu, Nov 17, 2011 at 1:43 PM, Paolo Bonzini  wrote:
> On 11/17/2011 02:40 PM, Stefan Hajnoczi wrote:
>>
>> When copy-on-read is enabled it is necessary to wait for overlapping
>> requests before issuing new requests.  This prevents races between the
>> copy-on-read and a write request.
>
> What about discards?

To get into an interesting scenario the guest would need to issue
overlapping read and discard requests.  QEMU with copy-on-read turns
this into either:

discard, read-from-backing-file, write-to-image-file
read-from-backing-file, discard, write-to-image-file
read-from-backing-file, write-to-image-file, discard

There is no issue with any of these orderings.  In the worst case we
end up with allocated image space where the guest issued a discard.
But since discard is a hint anyway it doesn't matter.

Stefan

Re: [Qemu-devel] [PATCH V2] Add -f option to qemu-nbd

2011-11-17 Thread Stefan Hajnoczi

On Thu, Nov 17, 2011 at 1:41 PM, Paolo Bonzini  wrote:
> On 11/17/2011 12:36 PM, Chunyan Liu wrote:
>>
>> Adding lock to the nbd device before connecting disk image to that device
>> to
>> handling race conditions.
>
> This removes the possibility for other programs to lock.  Have you checked
> what happens if you use the same device twice and whether you can piggyback
> on e.g. an EBUSY from the NBD_SET_SOCK ioctl?

Yes, I just sent an email showing how we can use NBD_SET_LOCK for -EBUSY.

Stefan

[Qemu-devel] [Bug 891625] Re: [qemu-kvm] add vhost-net to kvm group udev rules 65-kvm.rules

2011-11-17 Thread Alon Bar-Lev

** Patch added: "udef.diff"
   https://bugs.launchpad.net/bugs/891625/+attachment/2599656/+files/udev.diff

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/891625

Title:
  [qemu-kvm] add vhost-net to kvm group udev rules 65-kvm.rules

Status in QEMU:
  New

Bug description:
  Please consider authorizing the kvm group to access vhost-net device, similar 
to the kvm device.
  Thanks!

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/891625/+subscriptions

[Qemu-devel] [Bug 891625] [NEW] [qemu-kvm] add vhost-net to kvm group udev rules 65-kvm.rules

2011-11-17 Thread Alon Bar-Lev

Public bug reported:

Please consider authorizing the kvm group to access vhost-net device, similar 
to the kvm device.
Thanks!

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/891625

Title:
  [qemu-kvm] add vhost-net to kvm group udev rules 65-kvm.rules

Status in QEMU:
  New

Bug description:
  Please consider authorizing the kvm group to access vhost-net device, similar 
to the kvm device.
  Thanks!

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/891625/+subscriptions

[Qemu-devel] Host networking hang-ups in a qemu cluster

2011-11-17 Thread Kfir Lavi

Hi,
I have a cluster of similar qemu machines, for doing some simulation.
I'm using e1000 as network driver.
When loading few machines, we encounter hangs of networking inside a
virtual machine, and
it will hang all the interfaces inside the machine.
If I'll do down/up to all interfaces, the networking will resume.
Reboot of the vm will solve the problem the same way.
Please not that there are some bursts of packets, but really not a lot.
To mitigate burst problems, I connected each tap to a bridge, and it seems
to
have some effect. (each tap has different vlan, so it gets only its
traffic).
I tried to use ethtool rx tx as 4096 in the virtual machines.
Virtio cause more problems then e1000.
What is your take on it, where should I check? Is there any way to debug
the
buffers of qemu?
Each vm has 6 interfaces connected to tap device on the host.
Is it possible that qemu networking buffer is shared between all the vm
interfaces?
A software we are running inside the virtual machines opens a raw socket
(read, write) to intercept packages.
Is it possible that this raw socket can hang all the interfaces in the
machine?

The virtual machines are running Gentoo with 2.6.28 kernel (Embedded), and
the qemu host running
Gentoo with kernel 3.1.1 with nested kvm and unstable (recent) qemu.

Can it be the mismatch of the 2.6.28 in the vm, and the recent qemu and
latest kvm from 3.1.1?

Thanks for your help,
Kfir

[Qemu-devel] [PATCH v2 6/8] block: request overlap detection

2011-11-17 Thread Stefan Hajnoczi

Detect overlapping requests and remember to align to cluster boundaries
if the image format uses them.  This assumes that allocating I/O is
performed in cluster granularity - which is true for qcow2, qed, etc.

Signed-off-by: Stefan Hajnoczi 
---
 block.c |   40 ++--
 1 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index da7aaa2..0eef122 100644
--- a/block.c
+++ b/block.c
@@ -1133,21 +1133,57 @@ static void tracked_request_begin(BdrvTrackedRequest 
*req,
 QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
 }
 
+/**
+ * Round a region to cluster boundaries
+ */
+static void round_to_clusters(BlockDriverState *bs,
+  int64_t sector_num, int nb_sectors,
+  int64_t *cluster_sector_num,
+  int *cluster_nb_sectors)
+{
+BlockDriverInfo bdi;
+
+if (bdrv_get_info(bs, &bdi) < 0 || bdi.cluster_size == 0) {
+*cluster_sector_num = sector_num;
+*cluster_nb_sectors = nb_sectors;
+} else {
+int64_t c = bdi.cluster_size / BDRV_SECTOR_SIZE;
+*cluster_sector_num = QEMU_ALIGN_DOWN(sector_num, c);
+*cluster_nb_sectors = QEMU_ALIGN_UP(sector_num - *cluster_sector_num +
+nb_sectors, c);
+}
+}
+
 static bool tracked_request_overlaps(BdrvTrackedRequest *req,
  int64_t sector_num, int nb_sectors) {
-return false; /* not yet implemented */
+/*    */
+if (sector_num >= req->sector_num + req->nb_sectors) {
+return false;
+}
+/*    */
+if (req->sector_num >= sector_num + nb_sectors) {
+return false;
+}
+return true;
 }
 
 static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors)
 {
 BdrvTrackedRequest *req;
+int64_t cluster_sector_num;
+int cluster_nb_sectors;
 bool retry;
 
+/* If we touch the same cluster it counts as an overlap */
+round_to_clusters(bs, sector_num, nb_sectors,
+  &cluster_sector_num, &cluster_nb_sectors);
+
 do {
 retry = false;
 QLIST_FOREACH(req, &bs->tracked_requests, list) {
-if (tracked_request_overlaps(req, sector_num, nb_sectors)) {
+if (tracked_request_overlaps(req, cluster_sector_num,
+ cluster_nb_sectors)) {
 qemu_co_queue_wait(&req->wait_queue);
 retry = true;
 break;
-- 
1.7.7.1

[Qemu-devel] [PATCH v2 5/8] block: wait for overlapping requests

2011-11-17 Thread Stefan Hajnoczi

When copy-on-read is enabled it is necessary to wait for overlapping
requests before issuing new requests.  This prevents races between the
copy-on-read and a write request.

Signed-off-by: Stefan Hajnoczi 
---
 block.c |   35 +++
 1 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index c90880b..da7aaa2 100644
--- a/block.c
+++ b/block.c
@@ -1099,6 +1099,7 @@ struct BdrvTrackedRequest {
 int nb_sectors;
 bool is_write;
 QLIST_ENTRY(BdrvTrackedRequest) list;
+CoQueue wait_queue; /* coroutines blocked on this request */
 };
 
 /**
@@ -1109,6 +1110,7 @@ struct BdrvTrackedRequest {
 static void tracked_request_end(BdrvTrackedRequest *req)
 {
 QLIST_REMOVE(req, list);
+qemu_co_queue_restart_all(&req->wait_queue);
 }
 
 /**
@@ -1126,9 +1128,34 @@ static void tracked_request_begin(BdrvTrackedRequest 
*req,
 .is_write = is_write,
 };
 
+qemu_co_queue_init(&req->wait_queue);
+
 QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
 }
 
+static bool tracked_request_overlaps(BdrvTrackedRequest *req,
+ int64_t sector_num, int nb_sectors) {
+return false; /* not yet implemented */
+}
+
+static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
+int64_t sector_num, int nb_sectors)
+{
+BdrvTrackedRequest *req;
+bool retry;
+
+do {
+retry = false;
+QLIST_FOREACH(req, &bs->tracked_requests, list) {
+if (tracked_request_overlaps(req, sector_num, nb_sectors)) {
+qemu_co_queue_wait(&req->wait_queue);
+retry = true;
+break;
+}
+}
+} while (retry);
+}
+
 /*
  * Return values:
  * 0- success
@@ -1423,6 +1450,10 @@ static int coroutine_fn 
bdrv_co_do_readv(BlockDriverState *bs,
 bdrv_io_limits_intercept(bs, false, nb_sectors);
 }
 
+if (bs->copy_on_read) {
+wait_for_overlapping_requests(bs, sector_num, nb_sectors);
+}
+
 tracked_request_begin(&req, bs, sector_num, nb_sectors, false);
 ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
 tracked_request_end(&req);
@@ -1462,6 +1493,10 @@ static int coroutine_fn 
bdrv_co_do_writev(BlockDriverState *bs,
 bdrv_io_limits_intercept(bs, true, nb_sectors);
 }
 
+if (bs->copy_on_read) {
+wait_for_overlapping_requests(bs, sector_num, nb_sectors);
+}
+
 tracked_request_begin(&req, bs, sector_num, nb_sectors, true);
 
 ret = drv->bdrv_co_writev(bs, sector_num, nb_sectors, qiov);
-- 
1.7.7.1

[Qemu-devel] [PATCH v2 0/8] block: generic copy-on-read

2011-11-17 Thread Stefan Hajnoczi

The new -drive copy-on-read=on|off feature populates the image file with data
from the backing file on read.  This is useful when accessing images backed
over a slow medium (e.g. http over internet).  All read data will be stored in
the local image file so it does not need to be fetched again in the future.

This series is a prerequisite for the image streaming feature, which uses
copy-on-read to populate the image file in the background while the VM is
running.  However, the copy-on-read feature is useful on its own.

Copy-on-read is implemented by checking whether or not data is allocated in the
image file before reading it.  If data is not allocated then it needs to be
read and written back to the image file.

The tricky bit is avoiding races with other I/O requests.  These patches add
request tracking to BlockDriverState so that the list of pending requests is
available.  Copy-on-read prevents races by serializing overlapping requests.

Finally, there is a performance impact when enabling this feature since an
additional write is performed.  Serializing overlapping requests also means
that I/O patterns where multiple requests access the same cluster will see a
loss in parallelism.  Perhaps we can be smarter about preventing corruption in
the future and win back some performance.

v2:
 * Based on bdrv_co_is_allocated patch series - now safe in coroutine context
 * Use QEMU_ALIGN_DOWN/UP() macros for copy-on-read cluster calculations [Zhi 
Yong]
 * Reset bs->copy_on_read on bdrv_close() [Kevin]
 * Refcount bs->copy_on_read so it doesn't get clobbered by multiple users 
[Marcelo]
 * Use bool instead of int where appropriate [Kevin]
 * Use compound literal assignment to ensure BdrvTrackedRequest fields always 
get zeroed [Kevin]
 * Comment rationale for copy-on-read bounce buffer [Kevin]

Stefan Hajnoczi (8):
  qemu-common: add QEMU_ALIGN_DOWN() and QEMU_ALIGN_UP() macros
  coroutine: add qemu_co_queue_restart_all()
  block: add request tracking
  block: add bdrv_set_copy_on_read()
  block: wait for overlapping requests
  block: request overlap detection
  block: core copy-on-read logic
  block: add -drive copy-on-read=on|off

 block.c   |  213 -
 block.h   |3 +
 block/qcow2.c |2 +-
 block_int.h   |6 ++
 blockdev.c|6 ++
 hmp-commands.hx   |5 +-
 qemu-common.h |6 ++
 qemu-config.c |4 +
 qemu-coroutine-lock.c |   15 ++--
 qemu-coroutine.h  |5 +
 qemu-options.hx   |9 ++-
 trace-events  |1 +
 12 files changed, 263 insertions(+), 12 deletions(-)

-- 
1.7.7.1

[Qemu-devel] [PATCH v2 4/8] block: add bdrv_set_copy_on_read()

2011-11-17 Thread Stefan Hajnoczi

The bdrv_set_copy_on_read() function can be used to programmatically
enable or disable copy-on-read for a block device.  Later patches add
the actual copy-on-read logic.

Signed-off-by: Stefan Hajnoczi 
---
 block.c |   22 ++
 block.h |3 +++
 block_int.h |2 ++
 3 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 27c4e84..c90880b 100644
--- a/block.c
+++ b/block.c
@@ -538,6 +538,22 @@ int bdrv_parse_cache_flags(const char *mode, int *flags)
 return 0;
 }
 
+/**
+ * Enable/disable copy-on-read
+ *
+ * This is based on a reference count so multiple users may call this function
+ * without worrying about clobbering the previous state.  Copy-on-read stays
+ * enabled until all users have called to disable it.
+ */
+void bdrv_set_copy_on_read(BlockDriverState *bs, bool enable)
+{
+if (enable) {
+bs->copy_on_read++;
+} else {
+bs->copy_on_read--;
+}
+}
+
 /*
  * Common part for opening disk images and files
  */
@@ -559,6 +575,11 @@ static int bdrv_open_common(BlockDriverState *bs, const 
char *filename,
 bs->growable = 0;
 bs->buffer_alignment = 512;
 
+assert(bs->copy_on_read == 0); /* bdrv_new() and bdrv_close() make it so */
+if (flags & BDRV_O_RDWR) {
+bdrv_set_copy_on_read(bs, !!(flags & BDRV_O_COPY_ON_READ));
+}
+
 pstrcpy(bs->filename, sizeof(bs->filename), filename);
 bs->backing_file[0] = '\0';
 
@@ -801,6 +822,7 @@ void bdrv_close(BlockDriverState *bs)
 #endif
 bs->opaque = NULL;
 bs->drv = NULL;
+bs->copy_on_read = 0;
 
 if (bs->file != NULL) {
 bdrv_close(bs->file);
diff --git a/block.h b/block.h
index ad8dd48..68b4b14 100644
--- a/block.h
+++ b/block.h
@@ -70,6 +70,7 @@ typedef struct BlockDevOps {
 #define BDRV_O_NATIVE_AIO  0x0080 /* use native AIO instead of the thread pool 
*/
 #define BDRV_O_NO_BACKING  0x0100 /* don't open the backing file */
 #define BDRV_O_NO_FLUSH0x0200 /* disable flushing on this disk */
+#define BDRV_O_COPY_ON_READ 0x0400 /* copy read backing sectors into image */
 
 #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)
 
@@ -308,6 +309,8 @@ void bdrv_reset_dirty(BlockDriverState *bs, int64_t 
cur_sector,
   int nr_sectors);
 int64_t bdrv_get_dirty_count(BlockDriverState *bs);
 
+void bdrv_set_copy_on_read(BlockDriverState *bs, bool enable);
+
 void bdrv_set_in_use(BlockDriverState *bs, int in_use);
 int bdrv_in_use(BlockDriverState *bs);
 
diff --git a/block_int.h b/block_int.h
index 788fde9..3c5bacb 100644
--- a/block_int.h
+++ b/block_int.h
@@ -193,6 +193,8 @@ struct BlockDriverState {
 int encrypted; /* if true, the media is encrypted */
 int valid_key; /* if true, a valid encryption key has been set */
 int sg;/* if true, the device is a /dev/sg* */
+int copy_on_read; /* if true, copy read backing sectors into image
+ note this is a reference count */
 
 BlockDriver *drv; /* NULL means no media */
 void *opaque;
-- 
1.7.7.1

Re: [Qemu-devel] [v9 Patch 4/6]Qemu: Add commandline -drive option 'hostcache'

2011-11-17 Thread Stefan Hajnoczi

On Thu, Nov 17, 2011 at 5:18 AM, Supriya Kannery
 wrote:
> On 11/17/2011 01:36 AM, Stefan Hajnoczi wrote:
>>
>> On Fri, Nov 11, 2011 at 6:48 AM, Supriya Kannery
>>   wrote:
>>>
>>> +        if ((hostcache = qemu_opt_get_bool(opts, "hostcache", -1)) !=
>>> -1) {
>>
>> This does not work.  qemu_opt_get_bool() takes a bool default argument
>> and returns a bool.  (bool)-1 == true.  But (int)true == 1 and you
>> cannot expect it to ever equal -1.
>>
>> Try this:
>>
>> if (qemu_opt_get(opts, "hostcache")&&
>>     !qemu_opt_get_bool(opts, "hostcache", false)) {
>>     bdrv_flags |= BDRV_O_NOCACHE;
>> }
>>
>> Stefan
>>
>
> Thanks! for pointing this.
> Does the following look ok?
>
>  if ((hostcache = qemu_opt_get_bool(opts, "hostcache", 1) == 0) {
>     bdrv_flags |= BDRV_O_NOCACHE;
>  }
>
> If either "hostcache" is not at all specified or it is specified
> as "on", qemu_opt_get_bool will return 1, which can be ignored
> as bdrv_flags is initialized to 0.

It depends on the overall way this should work.  I think this captures
all the cases:

1. cache= and hostcache= may not be used together.
2. cache= sets bdrv_flags.
3. hostcache= may |= BDRV_O_NOCACHE.
4. No option defaults to cache=writethrough (bdrv_flags &= ~BDRV_O_CACHE_MASK).

The code you posted will work although I find it a bit weird how it
also includes case #4.  IMO it's cleanest to just do case #3 by
testing whether or not the hostcache= option is set.

BTW, is there a check for case #1 in your patch series.  I thought I
saw one earlier but now I can't find it.

Stefan

[Qemu-devel] [PATCH v2 2/8] coroutine: add qemu_co_queue_restart_all()

2011-11-17 Thread Stefan Hajnoczi

It's common to wake up all waiting coroutines.  Introduce the
qemu_co_queue_restart_all() function to do this instead of looping over
qemu_co_queue_next() in every caller.

Signed-off-by: Stefan Hajnoczi 
---
 block/qcow2.c |2 +-
 qemu-coroutine-lock.c |   15 ---
 qemu-coroutine.h  |5 +
 3 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index eab35d1..195e1b1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -514,7 +514,7 @@ static void run_dependent_requests(BDRVQcowState *s, 
QCowL2Meta *m)
 /* Restart all dependent requests */
 if (!qemu_co_queue_empty(&m->dependent_requests)) {
 qemu_co_mutex_unlock(&s->lock);
-while(qemu_co_queue_next(&m->dependent_requests));
+qemu_co_queue_restart_all(&m->dependent_requests);
 qemu_co_mutex_lock(&s->lock);
 }
 }
diff --git a/qemu-coroutine-lock.c b/qemu-coroutine-lock.c
index 9549c07..26ad76b 100644
--- a/qemu-coroutine-lock.c
+++ b/qemu-coroutine-lock.c
@@ -84,6 +84,13 @@ bool qemu_co_queue_next(CoQueue *queue)
 return (next != NULL);
 }
 
+void qemu_co_queue_restart_all(CoQueue *queue)
+{
+while (qemu_co_queue_next(queue)) {
+/* Do nothing */
+}
+}
+
 bool qemu_co_queue_empty(CoQueue *queue)
 {
 return (QTAILQ_FIRST(&queue->entries) == NULL);
@@ -144,13 +151,7 @@ void qemu_co_rwlock_unlock(CoRwlock *lock)
 assert(qemu_in_coroutine());
 if (lock->writer) {
 lock->writer = false;
-while (!qemu_co_queue_empty(&lock->queue)) {
-/*
- * Wakeup every body. This will include some
- * writers too.
- */
-qemu_co_queue_next(&lock->queue);
-}
+qemu_co_queue_restart_all(&lock->queue);
 } else {
 lock->reader--;
 assert(lock->reader >= 0);
diff --git a/qemu-coroutine.h b/qemu-coroutine.h
index 8a2e5d2..8a55fe1 100644
--- a/qemu-coroutine.h
+++ b/qemu-coroutine.h
@@ -131,6 +131,11 @@ void coroutine_fn qemu_co_queue_wait_insert_head(CoQueue 
*queue);
 bool qemu_co_queue_next(CoQueue *queue);
 
 /**
+ * Restarts all coroutines in the CoQueue and leaves the queue empty.
+ */
+void qemu_co_queue_restart_all(CoQueue *queue);
+
+/**
  * Checks if the CoQueue is empty.
  */
 bool qemu_co_queue_empty(CoQueue *queue);
-- 
1.7.7.1

[Qemu-devel] [PATCH v2 3/8] block: add request tracking

2011-11-17 Thread Stefan Hajnoczi

The block layer does not know about pending requests.  This information
is necessary for copy-on-read since overlapping requests must be
serialized to prevent races that corrupt the image.

The BlockDriverState gets a new tracked_request list field which
contains all pending requests.  Each request is a BdrvTrackedRequest
record with sector_num, nb_sectors, and is_write fields.

Note that request tracking is always enabled but hopefully this extra
work is so small that it doesn't justify adding an enable/disable flag.

Signed-off-by: Stefan Hajnoczi 
---
 block.c |   48 +++-
 block_int.h |4 
 2 files changed, 51 insertions(+), 1 deletions(-)

diff --git a/block.c b/block.c
index 0df7eb9..27c4e84 100644
--- a/block.c
+++ b/block.c
@@ -1071,6 +1071,42 @@ void bdrv_commit_all(void)
 }
 }
 
+struct BdrvTrackedRequest {
+BlockDriverState *bs;
+int64_t sector_num;
+int nb_sectors;
+bool is_write;
+QLIST_ENTRY(BdrvTrackedRequest) list;
+};
+
+/**
+ * Remove an active request from the tracked requests list
+ *
+ * This function should be called when a tracked request is completing.
+ */
+static void tracked_request_end(BdrvTrackedRequest *req)
+{
+QLIST_REMOVE(req, list);
+}
+
+/**
+ * Add an active request to the tracked requests list
+ */
+static void tracked_request_begin(BdrvTrackedRequest *req,
+  BlockDriverState *bs,
+  int64_t sector_num,
+  int nb_sectors, bool is_write)
+{
+*req = (BdrvTrackedRequest){
+.bs = bs,
+.sector_num = sector_num,
+.nb_sectors = nb_sectors,
+.is_write = is_write,
+};
+
+QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
+}
+
 /*
  * Return values:
  * 0- success
@@ -1350,6 +1386,8 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState 
*bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
 {
 BlockDriver *drv = bs->drv;
+BdrvTrackedRequest req;
+int ret;
 
 if (!drv) {
 return -ENOMEDIUM;
@@ -1363,7 +1401,10 @@ static int coroutine_fn 
bdrv_co_do_readv(BlockDriverState *bs,
 bdrv_io_limits_intercept(bs, false, nb_sectors);
 }
 
-return drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
+tracked_request_begin(&req, bs, sector_num, nb_sectors, false);
+ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
+tracked_request_end(&req);
+return ret;
 }
 
 int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
@@ -1381,6 +1422,7 @@ static int coroutine_fn 
bdrv_co_do_writev(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
 {
 BlockDriver *drv = bs->drv;
+BdrvTrackedRequest req;
 int ret;
 
 if (!bs->drv) {
@@ -1398,6 +1440,8 @@ static int coroutine_fn 
bdrv_co_do_writev(BlockDriverState *bs,
 bdrv_io_limits_intercept(bs, true, nb_sectors);
 }
 
+tracked_request_begin(&req, bs, sector_num, nb_sectors, true);
+
 ret = drv->bdrv_co_writev(bs, sector_num, nb_sectors, qiov);
 
 if (bs->dirty_bitmap) {
@@ -1408,6 +1452,8 @@ static int coroutine_fn 
bdrv_co_do_writev(BlockDriverState *bs,
 bs->wr_highest_sector = sector_num + nb_sectors - 1;
 }
 
+tracked_request_end(&req);
+
 return ret;
 }
 
diff --git a/block_int.h b/block_int.h
index f9e2c9a..788fde9 100644
--- a/block_int.h
+++ b/block_int.h
@@ -51,6 +51,8 @@
 #define BLOCK_OPT_PREALLOC  "preallocation"
 #define BLOCK_OPT_SUBFMT"subformat"
 
+typedef struct BdrvTrackedRequest BdrvTrackedRequest;
+
 typedef struct AIOPool {
 void (*cancel)(BlockDriverAIOCB *acb);
 int aiocb_size;
@@ -250,6 +252,8 @@ struct BlockDriverState {
 int in_use; /* users other than guest access, eg. block migration */
 QTAILQ_ENTRY(BlockDriverState) list;
 void *private;
+
+QLIST_HEAD(, BdrvTrackedRequest) tracked_requests;
 };
 
 struct BlockDriverAIOCB {
-- 
1.7.7.1

[Qemu-devel] [PATCH v2 8/8] block: add -drive copy-on-read=on|off

2011-11-17 Thread Stefan Hajnoczi

This patch adds the -drive copy-on-read=on|off command-line option:

  copy-on-read=on|off
  copy-on-read is "on" or "off" and enables whether to copy read backing
  file sectors into the image file.  Copy-on-read avoids accessing the
  same backing file sectors repeatedly and is useful when the backing
  file is over a slow network.  By default copy-on-read is off.

Signed-off-by: Stefan Hajnoczi 
---
 blockdev.c  |6 ++
 hmp-commands.hx |5 +++--
 qemu-config.c   |4 
 qemu-options.hx |9 -
 4 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 9068c5b..af4e239 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -257,6 +257,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
 DriveInfo *dinfo;
 BlockIOLimit io_limits;
 int snapshot = 0;
+bool copy_on_read;
 int ret;
 
 translation = BIOS_ATA_TRANSLATION_AUTO;
@@ -273,6 +274,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
 
 snapshot = qemu_opt_get_bool(opts, "snapshot", 0);
 ro = qemu_opt_get_bool(opts, "readonly", 0);
+copy_on_read = qemu_opt_get_bool(opts, "copy-on-read", false);
 
 file = qemu_opt_get(opts, "file");
 serial = qemu_opt_get(opts, "serial");
@@ -546,6 +548,10 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
 bdrv_flags |= (BDRV_O_SNAPSHOT|BDRV_O_CACHE_WB|BDRV_O_NO_FLUSH);
 }
 
+if (copy_on_read) {
+bdrv_flags |= BDRV_O_COPY_ON_READ;
+}
+
 if (media == MEDIA_CDROM) {
 /* CDROM is fine for any interface, don't check.  */
 ro = 1;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index f8d855e..79a9195 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -860,9 +860,10 @@ ETEXI
 .args_type  = "pci_addr:s,opts:s",
 .params = "[[:]:]\n"
   "[file=file][,if=type][,bus=n]\n"
-  "[,unit=m][,media=d][index=i]\n"
+  "[,unit=m][,media=d][,index=i]\n"
   "[,cyls=c,heads=h,secs=s[,trans=t]]\n"
-  "[snapshot=on|off][,cache=on|off]",
+  "[,snapshot=on|off][,cache=on|off]\n"
+  "[,readonly=on|off][,copy-on-read=on|off]",
 .help   = "add drive to PCI storage controller",
 .mhandler.cmd = drive_hot_add,
 },
diff --git a/qemu-config.c b/qemu-config.c
index 1aa080f..18f3020 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -109,6 +109,10 @@ static QemuOptsList qemu_drive_opts = {
 .name = "bps_wr",
 .type = QEMU_OPT_NUMBER,
 .help = "limit write bytes per second",
+},{
+.name = "copy-on-read",
+.type = QEMU_OPT_BOOL,
+.help = "copy read data from backing file into image file",
 },
 { /* end of list */ }
 },
diff --git a/qemu-options.hx b/qemu-options.hx
index 25a7be7..b3db10c 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -135,7 +135,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
 "   [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
 "   
[,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n"
 "   [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
-"   [,readonly=on|off]\n"
+"   [,readonly=on|off][,copy-on-read=on|off]\n"
 "   
[[,bps=b]|[[,bps_rd=r][,bps_wr=w]]][[,iops=i]|[[,iops_rd=r][,iops_wr=w]]\n"
 "use 'file' as a drive image\n", QEMU_ARCH_ALL)
 STEXI
@@ -187,6 +187,9 @@ host disk is full; report the error to the guest otherwise).
 The default setting is @option{werror=enospc} and @option{rerror=report}.
 @item readonly
 Open drive @option{file} as read-only. Guest write attempts will fail.
+@item copy-on-read=@var{copy-on-read}
+@var{copy-on-read} is "on" or "off" and enables whether to copy read backing
+file sectors into the image file.
 @end table
 
 By default, writethrough caching is used for all block device.  This means that
@@ -218,6 +221,10 @@ like your host losing power, the disk storage getting 
disconnected accidently,
 etc. you're image will most probably be rendered unusable.   When using
 the @option{-snapshot} option, unsafe caching is always used.
 
+Copy-on-read avoids accessing the same backing file sectors repeatedly and is
+useful when the backing file is over a slow network.  By default copy-on-read
+is off.
+
 Instead of @option{-cdrom} you can use:
 @example
 qemu -drive file=file,index=2,media=cdrom
-- 
1.7.7.1

[Qemu-devel] [[PATCH V2] 5/5] sh_serial: convert to memory API

2011-11-17 Thread Benoît Canet

Signed-off-by: Benoit Canet 
---
 hw/sh.h|3 ++-
 hw/sh7750.c|   28 +++-
 hw/sh_serial.c |   55 ++-
 3 files changed, 47 insertions(+), 39 deletions(-)

diff --git a/hw/sh.h b/hw/sh.h
index c764be6..0e45d61 100644
--- a/hw/sh.h
+++ b/hw/sh.h
@@ -39,7 +39,8 @@ void tmu012_init(struct MemoryRegion *sysmem, 
target_phys_addr_t base,
 
 /* sh_serial.c */
 #define SH_SERIAL_FEAT_SCIF (1 << 0)
-void sh_serial_init (target_phys_addr_t base, int feat,
+void sh_serial_init(MemoryRegion *sysmem,
+ target_phys_addr_t base, int feat,
 uint32_t freq, CharDriverState *chr,
 qemu_irq eri_source,
 qemu_irq rxi_source,
diff --git a/hw/sh7750.c b/hw/sh7750.c
index 20ac605..4f4d8e7 100644
--- a/hw/sh7750.c
+++ b/hw/sh7750.c
@@ -766,19 +766,21 @@ SH7750State *sh7750_init(CPUSH4State * cpu, MemoryRegion 
*sysmem)
 
 cpu->intc_handle = &s->intc;
 
-sh_serial_init(0x1fe0, 0, s->periph_freq, serial_hds[0],
-  s->intc.irqs[SCI1_ERI],
-  s->intc.irqs[SCI1_RXI],
-  s->intc.irqs[SCI1_TXI],
-  s->intc.irqs[SCI1_TEI],
-  NULL);
-sh_serial_init(0x1fe8, SH_SERIAL_FEAT_SCIF,
-  s->periph_freq, serial_hds[1],
-  s->intc.irqs[SCIF_ERI],
-  s->intc.irqs[SCIF_RXI],
-  s->intc.irqs[SCIF_TXI],
-  NULL,
-  s->intc.irqs[SCIF_BRI]);
+sh_serial_init(sysmem, 0x1fe0,
+   0, s->periph_freq, serial_hds[0],
+   s->intc.irqs[SCI1_ERI],
+   s->intc.irqs[SCI1_RXI],
+   s->intc.irqs[SCI1_TXI],
+   s->intc.irqs[SCI1_TEI],
+   NULL);
+sh_serial_init(sysmem, 0x1fe8,
+   SH_SERIAL_FEAT_SCIF,
+   s->periph_freq, serial_hds[1],
+   s->intc.irqs[SCIF_ERI],
+   s->intc.irqs[SCIF_RXI],
+   s->intc.irqs[SCIF_TXI],
+   NULL,
+   s->intc.irqs[SCIF_BRI]);
 
 tmu012_init(sysmem, 0x1fd8,
TMU012_FEAT_TOCR | TMU012_FEAT_3CHAN | TMU012_FEAT_EXTCLK,
diff --git a/hw/sh_serial.c b/hw/sh_serial.c
index a20c59e..43b0eb1 100644
--- a/hw/sh_serial.c
+++ b/hw/sh_serial.c
@@ -27,6 +27,7 @@
 #include "hw.h"
 #include "sh.h"
 #include "qemu-char.h"
+#include "exec-memory.h"
 
 //#define DEBUG_SERIAL
 
@@ -39,6 +40,9 @@
 #define SH_RX_FIFO_LENGTH (16)
 
 typedef struct {
+MemoryRegion iomem;
+MemoryRegion iomem_p4;
+MemoryRegion iomem_a7;
 uint8_t smr;
 uint8_t brr;
 uint8_t scr;
@@ -74,7 +78,8 @@ static void sh_serial_clear_fifo(sh_serial_state * s)
 s->rx_tail = 0;
 }
 
-static void sh_serial_write(void *opaque, uint32_t offs, uint32_t val)
+static void sh_serial_write(void *opaque, target_phys_addr_t offs,
+uint64_t val, unsigned size)
 {
 sh_serial_state *s = opaque;
 unsigned char ch;
@@ -185,7 +190,8 @@ static void sh_serial_write(void *opaque, uint32_t offs, 
uint32_t val)
 abort();
 }
 
-static uint32_t sh_serial_read(void *opaque, uint32_t offs)
+static uint64_t sh_serial_read(void *opaque, target_phys_addr_t offs,
+   unsigned size)
 {
 sh_serial_state *s = opaque;
 uint32_t ret = ~0;
@@ -338,28 +344,22 @@ static void sh_serial_event(void *opaque, int event)
 sh_serial_receive_break(s);
 }
 
-static CPUReadMemoryFunc * const sh_serial_readfn[] = {
-&sh_serial_read,
-&sh_serial_read,
-&sh_serial_read,
+static const MemoryRegionOps sh_serial_ops = {
+.read = sh_serial_read,
+.write = sh_serial_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
-static CPUWriteMemoryFunc * const sh_serial_writefn[] = {
-&sh_serial_write,
-&sh_serial_write,
-&sh_serial_write,
-};
-
-void sh_serial_init (target_phys_addr_t base, int feat,
-uint32_t freq, CharDriverState *chr,
-qemu_irq eri_source,
-qemu_irq rxi_source,
-qemu_irq txi_source,
-qemu_irq tei_source,
-qemu_irq bri_source)
+void sh_serial_init(MemoryRegion *sysmem,
+target_phys_addr_t base, int feat,
+uint32_t freq, CharDriverState *chr,
+qemu_irq eri_source,
+qemu_irq rxi_source,
+qemu_irq txi_source,
+qemu_irq tei_source,
+qemu_irq bri_source)
 {
 sh_serial_state *s;
-int s_io_memory;
 
 s = g_malloc0(sizeof(sh_serial_state));
 
@@ -381,11 +381,16 @@ void sh_serial_init (target_phys_addr_t base, int feat,
 
 sh_serial_clear_fifo(s);
 
-s_io_memory = cpu_register_io_memory(sh_serial_readfn,
-

[Qemu-devel] [[PATCH V2] 3/5] sh_timer: convert to memory API

2011-11-17 Thread Benoît Canet

Signed-off-by: Benoit Canet 
---
 hw/sh.h   |3 ++-
 hw/sh7750.c   |4 ++--
 hw/sh_timer.c |   43 ---
 3 files changed, 28 insertions(+), 22 deletions(-)

diff --git a/hw/sh.h b/hw/sh.h
index cf3f6f6..c764be6 100644
--- a/hw/sh.h
+++ b/hw/sh.h
@@ -31,7 +31,8 @@ int sh7750_register_io_device(struct SH7750State *s,
 #define TMU012_FEAT_TOCR   (1 << 0)
 #define TMU012_FEAT_3CHAN  (1 << 1)
 #define TMU012_FEAT_EXTCLK (1 << 2)
-void tmu012_init(target_phys_addr_t base, int feat, uint32_t freq,
+void tmu012_init(struct MemoryRegion *sysmem, target_phys_addr_t base,
+ int feat, uint32_t freq,
 qemu_irq ch0_irq, qemu_irq ch1_irq,
 qemu_irq ch2_irq0, qemu_irq ch2_irq1);
 
diff --git a/hw/sh7750.c b/hw/sh7750.c
index 6ad76df..c659756 100644
--- a/hw/sh7750.c
+++ b/hw/sh7750.c
@@ -780,7 +780,7 @@ SH7750State *sh7750_init(CPUSH4State * cpu, MemoryRegion 
*sysmem)
   NULL,
   s->intc.irqs[SCIF_BRI]);
 
-tmu012_init(0x1fd8,
+tmu012_init(sysmem, 0x1fd8,
TMU012_FEAT_TOCR | TMU012_FEAT_3CHAN | TMU012_FEAT_EXTCLK,
s->periph_freq,
s->intc.irqs[TMU0],
@@ -804,7 +804,7 @@ SH7750State *sh7750_init(CPUSH4State * cpu, MemoryRegion 
*sysmem)
 sh_intc_register_sources(&s->intc,
 _INTC_ARRAY(vectors_tmu34),
 NULL, 0);
-tmu012_init(0x1e10, 0, s->periph_freq,
+tmu012_init(sysmem, 0x1e10, 0, s->periph_freq,
s->intc.irqs[TMU3],
s->intc.irqs[TMU4],
NULL, NULL);
diff --git a/hw/sh_timer.c b/hw/sh_timer.c
index dca3c94..9132207 100644
--- a/hw/sh_timer.c
+++ b/hw/sh_timer.c
@@ -11,6 +11,7 @@
 #include "hw.h"
 #include "sh.h"
 #include "qemu-timer.h"
+#include "exec-memory.h"
 
 //#define DEBUG_TIMER
 
@@ -210,6 +211,9 @@ static void *sh_timer_init(uint32_t freq, int feat, 
qemu_irq irq)
 }
 
 typedef struct {
+MemoryRegion iomem;
+MemoryRegion iomem_p4;
+MemoryRegion iomem_a7;
 void *timer[3];
 int level[3];
 uint32_t tocr;
@@ -217,7 +221,8 @@ typedef struct {
 int feat;
 } tmu012_state;
 
-static uint32_t tmu012_read(void *opaque, target_phys_addr_t offset)
+static uint64_t tmu012_read(void *opaque, target_phys_addr_t offset,
+unsigned size)
 {
 tmu012_state *s = (tmu012_state *)opaque;
 
@@ -248,7 +253,7 @@ static uint32_t tmu012_read(void *opaque, 
target_phys_addr_t offset)
 }
 
 static void tmu012_write(void *opaque, target_phys_addr_t offset,
-uint32_t value)
+uint64_t value, unsigned size)
 {
 tmu012_state *s = (tmu012_state *)opaque;
 
@@ -291,23 +296,17 @@ static void tmu012_write(void *opaque, target_phys_addr_t 
offset,
 }
 }
 
-static CPUReadMemoryFunc * const tmu012_readfn[] = {
-tmu012_read,
-tmu012_read,
-tmu012_read
+static const MemoryRegionOps tmu012_ops = {
+.read = tmu012_read,
+.write = tmu012_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
-static CPUWriteMemoryFunc * const tmu012_writefn[] = {
-tmu012_write,
-tmu012_write,
-tmu012_write
-};
-
-void tmu012_init(target_phys_addr_t base, int feat, uint32_t freq,
+void tmu012_init(MemoryRegion *sysmem, target_phys_addr_t base,
+ int feat, uint32_t freq,
 qemu_irq ch0_irq, qemu_irq ch1_irq,
 qemu_irq ch2_irq0, qemu_irq ch2_irq1)
 {
-int iomemtype;
 tmu012_state *s;
 int timer_feat = (feat & TMU012_FEAT_EXTCLK) ? TIMER_FEAT_EXTCLK : 0;
 
@@ -318,10 +317,16 @@ void tmu012_init(target_phys_addr_t base, int feat, 
uint32_t freq,
 if (feat & TMU012_FEAT_3CHAN)
 s->timer[2] = sh_timer_init(freq, timer_feat | TIMER_FEAT_CAPT,
ch2_irq0); /* ch2_irq1 not supported */
-iomemtype = cpu_register_io_memory(tmu012_readfn,
-   tmu012_writefn, s,
-   DEVICE_NATIVE_ENDIAN);
-cpu_register_physical_memory(P4ADDR(base), 0x1000, iomemtype);
-cpu_register_physical_memory(A7ADDR(base), 0x1000, iomemtype);
+
+memory_region_init_io(&s->iomem, &tmu012_ops, s,
+  "timer", 0x1ULL);
+
+memory_region_init_alias(&s->iomem_p4, "timer-p4",
+ &s->iomem, 0, 0x1000);
+memory_region_add_subregion(sysmem, P4ADDR(base), &s->iomem_p4);
+
+memory_region_init_alias(&s->iomem_a7, "timer-a7",
+ &s->iomem, 0, 0x1000);
+memory_region_add_subregion(sysmem, A7ADDR(base), &s->iomem_a7);
 /* ??? Save/restore.  */
 }
-- 
1.7.5.4

Re: [Qemu-devel] [PATCH] Add -f option to qemu-nbd

2011-11-17 Thread Stefan Hajnoczi

On Thu, Nov 17, 2011 at 11:34 AM, Chun Yan Liu  wrote:
> Thanks for your suggestions.
>
> For the usage "qemu-nbd -f disk.img", adding some code could implement it. I
> think it could be like "losetup -f" usage.
>
> #qemu-nbd -f
>
> show the first free nbd device at this moment.
>
> user can choose to issue "qemu-nbd -c THAT_DEVICE disk.img" or not.
>
> #qemu-nbd -f disk.img
>
> find a free nbd device and connect disk.img to that device.
>
> How do you think?
>
> For the race conditions caused by executing multiple qemu-nbd -f at the same
> time, I've tried both ways (1. lock; 2. if one device not work, trying other
> devices until one works).
>
> In my testing, the 2nd way has problem. When issuing "qemu-nbd -c /dev/nbd0
> disk.img -v" and "qemu-nbd -c /dev/nbd0 disk1.img -v" at the same time, the
> latter one will eventually exit with EXIT_FAILURE, but the first one cannot
> work normally as well, it cannot show disk partitions. Executing multiple
> "qemu-nbd -f" has same problem.

The problem is that qemu/nbd.c is doing it wrong.

The linux/drivers/block/nbd.c driver returns -EBUSY from
ioctl(NBD_SET_SOCK) if the device already has a socket configured.
However, qemu/nbd.c issues NBD_SET_BLKSIZE and other settings ioctls
*before* NBD_SET_SOCK.

This clobbers the existing nbd connection settings.  Then we take over
the device using NBD_CLEAR_SOCK and NBD_SET_SOCK instead of noticing
there is already a socket configured.

I think qemu/nbd.c:nbd_init() should be fixed to:

1. NBD_SET_SOCK.  This fails with -EBUSY if an existing socket is
configured.  This is the atomic acquire/test operation.
2. NBD_SET_BLKSIZE and other settings ioctls.

The only user-visible change is that qemu-nbd -c /dev/nbd0 no longer
hijacks the nbd device.  Instead it properly detects -EBUSY.  This
seems like a reasonable change to make although we could restrict it
to only qemu-nbd -f disk.img in order to preserve the current behavior
for qemu-nbd -c /dev/nbd0.

I don't think using file locks is necessary since it can be done
properly through the nbd ioctl interface.

Stefan

Re: [Qemu-devel] converging around a single guest agent

2011-11-17 Thread Jamie Lokier

>>> On 11/16/2011 03:36 PM, Anthony Liguori wrote:
 We have another requirement. We need to embed the source for the guest
 agent in the QEMU release tarball. This is for GPL compliance since we
 want to include an ISO (eventually) that contains binaries.

Paolo Bonzini wrote:
> ovirt-guest-agent is licensed under GPLv3, so you do not need to;
> the options in GPLv3 include this one:
> 
> d) Convey the object code by offering access from a designated
> place (gratis or for a charge), and offer equivalent access to the
> Corresponding Source in the same way through the same place at no
> further charge.  You need not require recipients to copy the
> Corresponding Source along with the object code.  If the place to
> copy the object code is a network server, the Corresponding Source
> may be on a different server (operated by you or a third party)
> that supports equivalent copying facilities, provided you maintain
> clear directions next to the object code saying where to find the
> Corresponding Source.  Regardless of what server hosts the
> Corresponding Source, you remain obligated to ensure that it is
> available for as long as needed to satisfy these requirements.

Hi,

GPLv2 also has a clause similar to the above.  In GPLv2 it's not
enumerated, but says:

If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.

I'm not sure why "mere aggregation" (GPLv2) and "aggregate" (GPLv3)
aren't sufficient to allow shipping the different binaries together in
a single ISO regardless of where the source code lives or how it's licensed.

-- Jamie

Re: [Qemu-devel] [v9 Patch 5/6]Qemu: Framework for reopening images safely

2011-11-17 Thread Stefan Hajnoczi

On Fri, Nov 11, 2011 at 6:48 AM, Supriya Kannery
 wrote:
> @@ -708,17 +731,31 @@ int bdrv_reopen(BlockDriverState *bs, in
>         qerror_report(QERR_DATA_SYNC_FAILED, bs->device_name);
>         return ret;
>     }
> -    open_flags = bs->open_flags;
> -    bdrv_close(bs);
>
> -    ret = bdrv_open(bs, bs->filename, bdrv_flags, drv);
> -    if (ret < 0) {
> -        /* Reopen failed. Try to open with original flags */
> -        qerror_report(QERR_REOPEN_FILE_FAILED, bs->filename);
> -        ret = bdrv_open(bs, bs->filename, open_flags, drv);
> +    /* Use driver specific reopen() if available */
> +    if (drv->bdrv_reopen_prepare) {

This seems weird to me because we're saying a driver may have
drv->bdrv_reopen_prepare == NULL but the public bdrv_reopen_prepare()
function doesn't check and return -ENOTSUP.

This check can be moved into bdrv_reopen_prepare().  We can test for
the -ENOTSUP return value here instead.

> +        ret = bdrv_reopen_prepare(bs, &reopen_state, bdrv_flags);
> +         if (ret < 0) {

Indentation is off here.

Stefan

Re: [Qemu-devel] [PATCH] ivshmem: use PIO for BAR0(Doorbell) instead of MMIO to reduce notification time

2011-11-17 Thread Avi Kivity

On 11/14/2011 05:56 AM, zanghongy...@huawei.com wrote:
> From: Hongyong Zang 
>
> Ivshmem(nahanni) is a mechanism for sharing host memory with VMs running on 
> the same host. Currently, guest notifies qemu by reading or writing ivshmem 
> device's PCI MMIO BAR0(Doorbell).
>
> This patch, changes this PCI MMIO BAR0(Doorbell) to PIO. And we find guest 
> accesses PIO BAR 30% faster than MMIO BAR.
>  
>  CharDriverState **eventfd_chr;
>  CharDriverState *server_chr;
> -MemoryRegion ivshmem_mmio;
> +MemoryRegion ivshmem_pio;
>  
> -pcibus_t mmio_addr;
> +pcibus_t pio_addr;

This is a backwards incompatible change.  The way to accomplish this is
to add a new BAR which aliases the old one.  The new BAR should not be
visible on guests created with -M pc-1.0 and below.  Please also update
the spec so that driver authors can make use of the new feature.

-- 
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [v9 Patch 6/6]Qemu: raw posix implementation of reopen functions

2011-11-17 Thread Stefan Hajnoczi

On Fri, Nov 11, 2011 at 6:48 AM, Supriya Kannery
 wrote:
> +static int raw_reopen_prepare(BlockDriverState *bs, BDRVReopenState **prs,
> +                              int flags)
> +{
> +    BDRVRawReopenState *raw_rs = g_malloc0(sizeof(BDRVRawReopenState));
> +    BDRVRawState *s = bs->opaque;
> +    int ret = 0;
> +
> +    raw_rs->reopen_state.reopen_flags = s->open_flags;
> +    raw_rs->reopen_state.bs = bs;
> +    raw_rs->reopen_fd = -1;
> +    *prs = &(raw_rs->reopen_state);
> +
> +    /* Flags that can be set using fcntl */
> +    int fcntl_flags = BDRV_O_NOCACHE;
> +
> +    if ((bs->open_flags & ~fcntl_flags) == (flags & ~fcntl_flags)) {
> +        raw_rs->reopen_fd = dup(s->fd);
> +        if (raw_rs->reopen_fd <= 0) {
> +            return -1;

return -errno;

> +        }
> +        if ((flags & BDRV_O_NOCACHE)) {
> +            raw_rs->reopen_state.reopen_flags |= O_DIRECT;
> +        } else {
> +            raw_rs->reopen_state.reopen_flags &= ~O_DIRECT;
> +        }
> +        ret = fcntl_setfl(raw_rs->reopen_fd, 
> raw_rs->reopen_state.reopen_flags);

I wonder if this works on Solaris, FreeBSD, etc?

Perhaps there needs to be a fallback to the missing "else" case below...

> +    } else {
> +
> +        /* TBD: Handle O_DSYNC and other flags. For now return error */
> +        ret = -1;

...and this needs to be implemented.

> +    }
> +    return ret;
> +}

Stefan

Re: [Qemu-devel] converging around a single guest agent

2011-11-17 Thread Anthony Liguori


On 11/17/2011 02:59 AM, Ayal Baron wrote:



- Original Message -

On 11/16/2011 11:53 AM, Barak Azulay wrote:

On Wednesday 16 November 2011 17:28:16 Michael Roth wrote:

2) You'd also need a schema, similar to
qemu.git/qapi-schema-guest.json,
to describe the calls you're proxying. The existing infrastructure
in
QEMU will handle all the work of marshalling/unmarshalling
responses
back to the QMP client on the host-side.

It's a bit of extra work, but the benefit is unifying the
qemu/guest-level management interface into a single place that's
easy
for QMP/libvirt to consume.



The issue is not whether it's possible or not or the amount of
efforts need to
be done for that to happen, either for qemu-ga or ovirt-guest-agent
this work
needs to be done.

the question is whether all comminication should go through the
monitor (hence
double proxy) or ... only a subset of the commands that are closly
related to
hypervisor functionality and separate it from general
management-system
related actions (e.g. ovirt or any other management system that
wants to
communicate to the guest).


Yes, all guest interaction should be funnelled through QEMU.  QEMU
has one job
in life--to expose an interface to guests and turn it into something
more useful
to the host.  QEMU expose an emulated AHCI controller and turns that
into VFS
operations.

Likewise, QEMU should expose a paravirtual "agent" device to a guest,
and then
turn that into higher level management interfaces.


Exposing higher level management interfaces means that qemu would have to do 
policy.


No, the way we plan on doing this is having a guest agent schema 
(qapi-schema-guest.json) that we can use to (1) white list valid operations and 
(2) decode and re-encode operations.


(1) let's us validate that guest state isn't escaping which keeps migration safe

(2) let's us scrub any potentially malicious input from the guest before we hand 
it off to the management tool.


Otherwise, we don't get in the middle and don't really care what the verbs are.



QEMU's job is to sanitize information from the guest and try to turn
that into
something that is safer for the broader world to consume.  QEMU also
deals with
isolating state in order to support things like live migration.  This


So are you suggesting that when a user reads a file you would automatically 
encode the contents?


I'm not sure I understand what you're suggesting.

Here's another way to think of this.  In a typical enterprise environment, you 
would secure your network infrastructure using isolated zones.  You may have a 
red zone (guest networking), a yellow zone (management network), and a green 
zone (broader intranet).  The zones are physically separate with very few things 
that exist on two zones.


You pay special attention to anything that crosses zones and try to minimize 
them as much as possible.  You never allow something to live on more than two zones.


The guest is the red zone and the rest of the host environment is the yellow 
zone.  QEMU bridges between the red and yellow zone.  That is fundamentally its 
job in the stack.


Other than the guest agent, VDSM lives purely in the yellow zone.  In fact, VDSM 
bridges from the yellow zone to the green zone (broader management infrastructure).


It may be easier to skip QEMU and have VDSM also stride into the red zone.  It's 
always easier to cross zones.  But it's not good security practice.  There is 
tremendous value in having clean security layers.


Regards,

Anthony Liguori

Re: [Qemu-devel] [PATCH] ivshmem: use PIO for BAR0(Doorbell) instead of MMIO to reduce notification time

2011-11-17 Thread Sasha Levin

On Thu, 2011-11-17 at 16:36 +0200, Avi Kivity wrote:
> On 11/14/2011 05:56 AM, zanghongy...@huawei.com wrote:
> > From: Hongyong Zang 
> >
> > Ivshmem(nahanni) is a mechanism for sharing host memory with VMs running on 
> > the same host. Currently, guest notifies qemu by reading or writing ivshmem 
> > device's PCI MMIO BAR0(Doorbell).
> >
> > This patch, changes this PCI MMIO BAR0(Doorbell) to PIO. And we find guest 
> > accesses PIO BAR 30% faster than MMIO BAR.
> >  
> >  CharDriverState **eventfd_chr;
> >  CharDriverState *server_chr;
> > -MemoryRegion ivshmem_mmio;
> > +MemoryRegion ivshmem_pio;
> >  
> > -pcibus_t mmio_addr;
> > +pcibus_t pio_addr;
> 
> 
> This is a backwards incompatible change.  The way to accomplish this is
> to add a new BAR which aliases the old one.  The new BAR should not be
> visible on guests created with -M pc-1.0 and below.  Please also update
> the spec so that driver authors can make use of the new feature.

Can we add an optional BAR 3 which does exactly what BAR 0 does, but is
in PIO space?

This will allow us to extend the spec instead of changing it, and in
turn drivers could remain compatible with QEMU and other device
implementations.

-- 

Sasha.

Re: [Qemu-devel] converging around a single guest agent

2011-11-17 Thread Michael Roth


On 11/17/2011 02:46 AM, Ayal Baron wrote:



- Original Message -

I have been following this thread pretty closely and the one sentence
summary of
the current argument is: ovirt-guest-agent is already featureful and
tested, so
let's drop qemu-ga and have everyone adopt ovirt-guest-agent.


What we're suggesting is let's drop *one* of the two agents (obviously it would 
be easier for us to drop qemu-ga, but we'd rather reach consensus and unite 
behind one agent regardless of which agent it is).


  Unfortunately,
this track strays completely away from the stated goal of
convergence.  I have
at least two examples of why the greater KVM community can never
adopt
ovirt-guest-agent as-is.  To address this, I would like to counter
with an
example on how qemu-ga can enable the deployment of ovirt-guest-agent
features
and satisfy the needs of the whole community at the same time.

1) Scope:  The ovirt-guest-agent contains functionality that is
incredibly
useful within the context of oVirt.  Single Sign-on is very handy but
KVM users
outside the scope of oVirt will not want this extra complexity in
their agent.
For simplicity they will probably just write something small that
does what they
need (and we have failed to provide a ubiquitous KVM agent).


I totally agree, but that could easily be resolved using the plugin 
architecture suggested before.



1) Deployment complexity: The more complex the guest agent is, the
more often it
will need to be updated (bug/security fixes, distro compatibility,
new
features).  Rolling out guest agent updates does not scale well in
large
environments (especially when the guest and host administrators are
not the same
person).


Using plugins, you just deploy the ones you need, keeping the attack surface / 
#bugs / need to update lower


But you still need to deploy those plugins somehow, so the logistics of 
distributing this code to multiple types/levels of guests remains, and 
plugins are insufficient to handle security fixes in the core code 
(however small that attack surface may be). Eventually you'll need a 
newer version of the guest agent installed.


qemu-ga could be the vehicle for delivering those ovirt plugins/updates, 
and qemu-ga can upgrade itself to handle it's own security fixes/updates.


With this model you can keep your agent functionality closely tied to 
the high-level management infrastructure, take liberties in what 
features/changes you need to add/make, and push-deploy those changes 
through qemu-ga. Low-level primitives to build high-level interfaces 
higher up the stack has always been a primary design goal so this all 
fits together fairly well from a QEMU perspective. The extra 
orchestration required is worth it, IMO, as the alternative is limiting 
customers to a particular distro, installing a similar backend, or 
shooting out emails to everyone asking them to update their guest agent 
so you can leverage feature X.






For these reasons (and many others), I support having an agent with
very basic
primitives that can be orchestrated by the host to provide needed
functionality.
This agent would present a low-level, stable, extensible API that
everyone can
use.  Today qemu-ga supports the following verbs: sync ping info
shutdown
file-open file-close file-read file-write file-seek file-flush
fsfreeze-status
fsfreeze-freeze fsfreeze-thaw.  If we add a generic execute
mechanism, then the
agent can provide everything needed by oVirt to deploy SSO.

Let's assume that we have already agreed on some sort of security
policy for the
write-file and exec primitives.  Consensus is possible on this issue
but I
don't want to get bogged down with that here.

With the above primitives, SSO could be deployed automatically to a
guest with
the following sequence of commands:

file-open "/sso-package.bin" "w"
file-write  
file-close
file-open "/sso-package.bin" "x"
file-exec  
file-close


The guest can run on any number of hosts.  currently, the guest tools contain 
all the relevant logic installed (specifically for the guest os version).
What you're suggesting here is that we keep all the relevant guest-agent 
variants code on the host, automatically detect the guest os version and inject 
the correct file (e.g. SSO on winXP and on win2k8 is totally different).
In addition, there might be things requiring boot for example. So to solve that 
we would instead need to install a set of tools on the guest like we do the 
guest agent today (it would be a separate package because it's management 
specific).  And then we would tell the guest-agent to run tools from that set?  
Sounds overly complex to me.



The nature of the tools is more an implementation detail. It could also 
be distributed the same way it is now, except with a CLI interface or 
something rather than via virtio-serial.


Going even further, I posted another approach where ovirt-guest-agent 
just speaks to a local pipe, and qemu-ga execs ovirt-guest-agent and 
proxies RPCs via it's existing

Re: [Qemu-devel] [PATCH] ivshmem: use PIO for BAR0(Doorbell) instead of MMIO to reduce notification time

2011-11-17 Thread Avi Kivity

On 11/17/2011 04:48 PM, Sasha Levin wrote:
> On Thu, 2011-11-17 at 16:36 +0200, Avi Kivity wrote:
> > On 11/14/2011 05:56 AM, zanghongy...@huawei.com wrote:
> > > From: Hongyong Zang 
> > >
> > > Ivshmem(nahanni) is a mechanism for sharing host memory with VMs running 
> > > on the same host. Currently, guest notifies qemu by reading or writing 
> > > ivshmem device's PCI MMIO BAR0(Doorbell).
> > >
> > > This patch, changes this PCI MMIO BAR0(Doorbell) to PIO. And we find 
> > > guest accesses PIO BAR 30% faster than MMIO BAR.
> > >  
> > >  CharDriverState **eventfd_chr;
> > >  CharDriverState *server_chr;
> > > -MemoryRegion ivshmem_mmio;
> > > +MemoryRegion ivshmem_pio;
> > >  
> > > -pcibus_t mmio_addr;
> > > +pcibus_t pio_addr;
> > 
> > 
> > This is a backwards incompatible change.  The way to accomplish this is
> > to add a new BAR which aliases the old one.  The new BAR should not be
> > visible on guests created with -M pc-1.0 and below.  Please also update
> > the spec so that driver authors can make use of the new feature.
>
> Can we add an optional BAR 3 which does exactly what BAR 0 does, but is
> in PIO space?
>
> This will allow us to extend the spec instead of changing it, and in
> turn drivers could remain compatible with QEMU and other device
> implementations.

Yes, that's what I meant.

-- 
error compiling committee.c: too many arguments to function

[Qemu-devel] [PATCH 3/8] qcow2: Cleanups and memleak fix in qcow2_snapshot_create

2011-11-17 Thread Kevin Wolf

sn->id_str could be leaked before this. The rest of this patch changes
comments, fixes coding style or removes checks that are unnecessary with
g_malloc.

Signed-off-by: Kevin Wolf 
---
 block/qcow2-snapshot.c |   26 +++---
 1 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 4d387a7..2df7858 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -290,21 +290,20 @@ int qcow2_snapshot_create(BlockDriverState *bs, 
QEMUSnapshotInfo *sn_info)
 
 memset(sn, 0, sizeof(*sn));
 
+/* Generate an ID if it wasn't passed */
 if (sn_info->id_str[0] == '\0') {
-/* compute a new id */
 find_new_snapshot_id(bs, sn_info->id_str, sizeof(sn_info->id_str));
 }
 
-/* check that the ID is unique */
-if (find_snapshot_by_id(bs, sn_info->id_str) >= 0)
+/* Check that the ID is unique */
+if (find_snapshot_by_id(bs, sn_info->id_str) >= 0) {
 return -ENOENT;
+}
 
+/* Populate sn with passed data */
 sn->id_str = g_strdup(sn_info->id_str);
-if (!sn->id_str)
-goto fail;
 sn->name = g_strdup(sn_info->name);
-if (!sn->name)
-goto fail;
+
 sn->vm_state_size = sn_info->vm_state_size;
 sn->date_sec = sn_info->date_sec;
 sn->date_nsec = sn_info->date_nsec;
@@ -314,7 +313,7 @@ int qcow2_snapshot_create(BlockDriverState *bs, 
QEMUSnapshotInfo *sn_info)
 if (ret < 0)
 goto fail;
 
-/* create the L1 table of the snapshot */
+/* Allocate the L1 table of the snapshot and copy the current one there. */
 l1_table_offset = qcow2_alloc_clusters(bs, s->l1_size * sizeof(uint64_t));
 if (l1_table_offset < 0) {
 goto fail;
@@ -324,12 +323,7 @@ int qcow2_snapshot_create(BlockDriverState *bs, 
QEMUSnapshotInfo *sn_info)
 sn->l1_table_offset = l1_table_offset;
 sn->l1_size = s->l1_size;
 
-if (s->l1_size != 0) {
-l1_table = g_malloc(s->l1_size * sizeof(uint64_t));
-} else {
-l1_table = NULL;
-}
-
+l1_table = g_malloc(s->l1_size * sizeof(uint64_t));
 for(i = 0; i < s->l1_size; i++) {
 l1_table[i] = cpu_to_be64(s->l1_table[i]);
 }
@@ -356,7 +350,9 @@ int qcow2_snapshot_create(BlockDriverState *bs, 
QEMUSnapshotInfo *sn_info)
 }
 #endif
 return 0;
- fail:
+
+fail:
+g_free(sn->id_str);
 g_free(sn->name);
 g_free(l1_table);
 return -1;
-- 
1.7.6.4

[Qemu-devel] [PATCH 2/8] qcow2: Return real error code in qcow2_write_snapshots

2011-11-17 Thread Kevin Wolf

Doesn't immediately fix anything as the callers don't use the return
value, but they will be fixed next.

Signed-off-by: Kevin Wolf 
---
 block/qcow2-snapshot.c |   48 ++--
 1 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index db49bb3..4d387a7 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -144,6 +144,7 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
 uint64_t data64;
 uint32_t data32;
 int64_t offset, snapshots_offset;
+int ret;
 
 /* compute the size of the snapshots */
 offset = 0;
@@ -156,6 +157,7 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
 }
 snapshots_size = offset;
 
+/* Allocate space for the new snapshot list */
 snapshots_offset = qcow2_alloc_clusters(bs, snapshots_size);
 bdrv_flush(bs->file);
 offset = snapshots_offset;
@@ -163,6 +165,7 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
 return offset;
 }
 
+/* Write all snapshots to the new list */
 for(i = 0; i < s->nb_snapshots; i++) {
 sn = s->snapshots + i;
 memset(&h, 0, sizeof(h));
@@ -178,34 +181,59 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
 h.id_str_size = cpu_to_be16(id_str_size);
 h.name_size = cpu_to_be16(name_size);
 offset = align_offset(offset, 8);
-if (bdrv_pwrite_sync(bs->file, offset, &h, sizeof(h)) < 0)
+
+ret = bdrv_pwrite(bs->file, offset, &h, sizeof(h));
+if (ret < 0) {
 goto fail;
+}
 offset += sizeof(h);
-if (bdrv_pwrite_sync(bs->file, offset, sn->id_str, id_str_size) < 0)
+
+ret = bdrv_pwrite(bs->file, offset, sn->id_str, id_str_size);
+if (ret < 0) {
 goto fail;
+}
 offset += id_str_size;
-if (bdrv_pwrite_sync(bs->file, offset, sn->name, name_size) < 0)
+
+ret = bdrv_pwrite(bs->file, offset, sn->name, name_size);
+if (ret < 0) {
 goto fail;
+}
 offset += name_size;
 }
 
-/* update the various header fields */
+/*
+ * Update the header to point to the new snapshot table. This requires the
+ * new table and its refcounts to be stable on disk.
+ *
+ * FIXME This should be done with a single write
+ */
+ret = bdrv_flush(bs);
+if (ret < 0) {
+goto fail;
+}
+
 data64 = cpu_to_be64(snapshots_offset);
-if (bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, snapshots_offset),
-&data64, sizeof(data64)) < 0)
+ret = bdrv_pwrite(bs->file, offsetof(QCowHeader, snapshots_offset),
+  &data64, sizeof(data64));
+if (ret < 0) {
 goto fail;
+}
+
 data32 = cpu_to_be32(s->nb_snapshots);
-if (bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, nb_snapshots),
-&data32, sizeof(data32)) < 0)
+ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, nb_snapshots),
+   &data32, sizeof(data32));
+if (ret < 0) {
 goto fail;
+}
 
 /* free the old snapshot table */
 qcow2_free_clusters(bs, s->snapshots_offset, s->snapshots_size);
 s->snapshots_offset = snapshots_offset;
 s->snapshots_size = snapshots_size;
 return 0;
- fail:
-return -1;
+
+fail:
+return ret;
 }
 
 static void find_new_snapshot_id(BlockDriverState *bs,
-- 
1.7.6.4

[Qemu-devel] [PATCH 4/8] qcow2: Rework qcow2_snapshot_create error handling

2011-11-17 Thread Kevin Wolf

Increase refcounts only after allocating a new L1 table has succeeded in
order to make leaks less likely. If writing the snapshot table fails,
revert in-memory state to be consistent with that on disk.

While at it, make it return the real error codes instead of -1.

Signed-off-by: Kevin Wolf 
---
 block/qcow2-snapshot.c |   55 +++
 1 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 2df7858..066d56b 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -283,7 +283,9 @@ static int find_snapshot_by_id_or_name(BlockDriverState 
*bs, const char *name)
 int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
 {
 BDRVQcowState *s = bs->opaque;
-QCowSnapshot *snapshots1, sn1, *sn = &sn1;
+QCowSnapshot *new_snapshot_list = NULL;
+QCowSnapshot *old_snapshot_list = NULL;
+QCowSnapshot sn1, *sn = &sn1;
 int i, ret;
 uint64_t *l1_table = NULL;
 int64_t l1_table_offset;
@@ -309,16 +311,12 @@ int qcow2_snapshot_create(BlockDriverState *bs, 
QEMUSnapshotInfo *sn_info)
 sn->date_nsec = sn_info->date_nsec;
 sn->vm_clock_nsec = sn_info->vm_clock_nsec;
 
-ret = qcow2_update_snapshot_refcount(bs, s->l1_table_offset, s->l1_size, 
1);
-if (ret < 0)
-goto fail;
-
 /* Allocate the L1 table of the snapshot and copy the current one there. */
 l1_table_offset = qcow2_alloc_clusters(bs, s->l1_size * sizeof(uint64_t));
 if (l1_table_offset < 0) {
+ret = l1_table_offset;
 goto fail;
 }
-bdrv_flush(bs->file);
 
 sn->l1_table_offset = l1_table_offset;
 sn->l1_size = s->l1_size;
@@ -327,22 +325,50 @@ int qcow2_snapshot_create(BlockDriverState *bs, 
QEMUSnapshotInfo *sn_info)
 for(i = 0; i < s->l1_size; i++) {
 l1_table[i] = cpu_to_be64(s->l1_table[i]);
 }
-if (bdrv_pwrite_sync(bs->file, sn->l1_table_offset,
-l1_table, s->l1_size * sizeof(uint64_t)) < 0)
+
+ret = bdrv_pwrite(bs->file, sn->l1_table_offset, l1_table,
+  s->l1_size * sizeof(uint64_t));
+if (ret < 0) {
 goto fail;
+}
+
 g_free(l1_table);
 l1_table = NULL;
 
-snapshots1 = g_malloc((s->nb_snapshots + 1) * sizeof(QCowSnapshot));
+/*
+ * Increase the refcounts of all clusters and make sure everything is
+ * stable on disk before updating the snapshot table to contain a pointer
+ * to the new L1 table.
+ */
+ret = qcow2_update_snapshot_refcount(bs, s->l1_table_offset, s->l1_size, 
1);
+if (ret < 0) {
+goto fail;
+}
+
+ret = bdrv_flush(bs->file);
+if (ret < 0) {
+goto fail;
+}
+
+/* Append the new snapshot to the snapshot list */
+new_snapshot_list = g_malloc((s->nb_snapshots + 1) * sizeof(QCowSnapshot));
 if (s->snapshots) {
-memcpy(snapshots1, s->snapshots, s->nb_snapshots * 
sizeof(QCowSnapshot));
-g_free(s->snapshots);
+memcpy(new_snapshot_list, s->snapshots,
+   s->nb_snapshots * sizeof(QCowSnapshot));
+old_snapshot_list = s->snapshots;
 }
-s->snapshots = snapshots1;
+s->snapshots = new_snapshot_list;
 s->snapshots[s->nb_snapshots++] = *sn;
 
-if (qcow2_write_snapshots(bs) < 0)
+ret = qcow2_write_snapshots(bs);
+if (ret < 0) {
+g_free(s->snapshots);
+s->snapshots = old_snapshot_list;
 goto fail;
+}
+
+g_free(old_snapshot_list);
+
 #ifdef DEBUG_ALLOC
 {
   BdrvCheckResult result = {0};
@@ -355,7 +381,8 @@ fail:
 g_free(sn->id_str);
 g_free(sn->name);
 g_free(l1_table);
-return -1;
+
+return ret;
 }
 
 /* copy the snapshot 'snapshot_name' into the current disk image */
-- 
1.7.6.4

[Qemu-devel] [PATCH 6/8] qcow2: Fix order of refcount updates in qcow2_snapshot_goto

2011-11-17 Thread Kevin Wolf

The refcount updates must be moved so that in the worst case we can get
cluster leaks, but refcounts may never be too low.

Signed-off-by: Kevin Wolf 
---
 block/qcow2-refcount.c |7 -
 block/qcow2-snapshot.c |   61 ++-
 2 files changed, 50 insertions(+), 18 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 9605367..2db2ede 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -700,6 +700,10 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
 l2_table = NULL;
 l1_table = NULL;
 l1_size2 = l1_size * sizeof(uint64_t);
+
+/* WARNING: qcow2_snapshot_goto relies on this function not using the
+ * l1_table_offset when it is the current s->l1_table_offset! Be careful
+ * when changing this! */
 if (l1_table_offset != s->l1_table_offset) {
 if (l1_size2 != 0) {
 l1_table = g_malloc0(align_offset(l1_size2, 512));
@@ -819,7 +823,8 @@ fail:
 qcow2_cache_set_writethrough(bs, s->refcount_block_cache,
 old_refcount_writethrough);
 
-if (l1_modified) {
+/* Update L1 only if it isn't deleted anyway (addend = -1) */
+if (addend >= 0 && l1_modified) {
 for(i = 0; i < l1_size; i++)
 cpu_to_be64s(&l1_table[i]);
 if (bdrv_pwrite_sync(bs->file, l1_table_offset, l1_table,
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 9f6647f..d6b5506 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -393,6 +393,7 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char 
*snapshot_id)
 int i, snapshot_index;
 int cur_l1_bytes, sn_l1_bytes;
 int ret;
+uint64_t *sn_l1_table = NULL;
 
 /* Search the snapshot */
 snapshot_index = find_snapshot_by_id_or_name(bs, snapshot_id);
@@ -401,14 +402,6 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char 
*snapshot_id)
 }
 sn = &s->snapshots[snapshot_index];
 
-/* Decrease refcount of clusters of current L1 table.
- * FIXME This is too early! */
-ret = qcow2_update_snapshot_refcount(bs, s->l1_table_offset,
- s->l1_size, -1);
-if (ret < 0) {
-goto fail;
-}
-
 /*
  * Make sure that the current L1 table is big enough to contain the whole
  * L1 table of the snapshot. If the snapshot L1 table is smaller, the
@@ -422,32 +415,65 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char 
*snapshot_id)
 cur_l1_bytes = s->l1_size * sizeof(uint64_t);
 sn_l1_bytes = sn->l1_size * sizeof(uint64_t);
 
-if (cur_l1_bytes > sn_l1_bytes) {
-memset(s->l1_table + sn->l1_size, 0, cur_l1_bytes - sn_l1_bytes);
-}
-
 /*
  * Copy the snapshot L1 table to the current L1 table.
  *
  * Before overwriting the old current L1 table on disk, make sure to
  * increase all refcounts for the clusters referenced by the new one.
+ * Decrease the refcount referenced by the old one only when the L1
+ * table is overwritten.
  */
-ret = bdrv_pread(bs->file, sn->l1_table_offset, s->l1_table, sn_l1_bytes);
+sn_l1_table = g_malloc0(cur_l1_bytes);
+
+ret = bdrv_pread(bs->file, sn->l1_table_offset, sn_l1_table, sn_l1_bytes);
+if (ret < 0) {
+goto fail;
+}
+
+ret = qcow2_update_snapshot_refcount(bs, sn->l1_table_offset,
+ sn->l1_size, 1);
 if (ret < 0) {
 goto fail;
 }
 
-ret = bdrv_pwrite(bs->file, s->l1_table_offset, s->l1_table, cur_l1_bytes);
+ret = bdrv_pwrite_sync(bs->file, s->l1_table_offset, sn_l1_table,
+   cur_l1_bytes);
 if (ret < 0) {
 goto fail;
 }
 
+/*
+ * Decrease refcount of clusters of current L1 table.
+ *
+ * At this point, the in-memory s->l1_table points to the old L1 table,
+ * whereas on disk we already have the new one.
+ *
+ * qcow2_update_snapshot_refcount special cases the current L1 table to use
+ * the in-memory data instead of really using the offset to load a new one,
+ * which is why this works.
+ */
+ret = qcow2_update_snapshot_refcount(bs, s->l1_table_offset,
+ s->l1_size, -1);
+
+/*
+ * Now update the in-memory L1 table to be in sync with the on-disk one. We
+ * need to do this even if updating refcounts failed.
+ */
 for(i = 0;i < s->l1_size; i++) {
-be64_to_cpus(&s->l1_table[i]);
+s->l1_table[i] = be64_to_cpu(sn_l1_table[i]);
 }
 
-/* FIXME This is too late! */
-ret = qcow2_update_snapshot_refcount(bs, s->l1_table_offset, s->l1_size, 
1);
+if (ret < 0) {
+goto fail;
+}
+
+g_free(sn_l1_table);
+
+/*
+ * Update QCOW_OFLAG_COPIED in the active L1 table (it may have changed
+ * when we decreased the refcount of the old snapshot.
+ */
+ret = qcow2_update_snapshot_refcount(bs, s->l1_table_offset, s->l1_s

[Qemu-devel] [PATCH 5/8] qcow2: Return real error in qcow2_snapshot_goto

2011-11-17 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
---
 block/qcow2-snapshot.c |   50 +--
 1 files changed, 39 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 066d56b..9f6647f 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -392,17 +392,32 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char 
*snapshot_id)
 QCowSnapshot *sn;
 int i, snapshot_index;
 int cur_l1_bytes, sn_l1_bytes;
+int ret;
 
+/* Search the snapshot */
 snapshot_index = find_snapshot_by_id_or_name(bs, snapshot_id);
-if (snapshot_index < 0)
+if (snapshot_index < 0) {
 return -ENOENT;
+}
 sn = &s->snapshots[snapshot_index];
 
-if (qcow2_update_snapshot_refcount(bs, s->l1_table_offset, s->l1_size, -1) 
< 0)
+/* Decrease refcount of clusters of current L1 table.
+ * FIXME This is too early! */
+ret = qcow2_update_snapshot_refcount(bs, s->l1_table_offset,
+ s->l1_size, -1);
+if (ret < 0) {
 goto fail;
+}
 
-if (qcow2_grow_l1_table(bs, sn->l1_size, true) < 0)
+/*
+ * Make sure that the current L1 table is big enough to contain the whole
+ * L1 table of the snapshot. If the snapshot L1 table is smaller, the
+ * current one must be padded with zeros.
+ */
+ret = qcow2_grow_l1_table(bs, sn->l1_size, true);
+if (ret < 0) {
 goto fail;
+}
 
 cur_l1_bytes = s->l1_size * sizeof(uint64_t);
 sn_l1_bytes = sn->l1_size * sizeof(uint64_t);
@@ -411,19 +426,31 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char 
*snapshot_id)
 memset(s->l1_table + sn->l1_size, 0, cur_l1_bytes - sn_l1_bytes);
 }
 
-/* copy the snapshot l1 table to the current l1 table */
-if (bdrv_pread(bs->file, sn->l1_table_offset,
-   s->l1_table, sn_l1_bytes) < 0)
+/*
+ * Copy the snapshot L1 table to the current L1 table.
+ *
+ * Before overwriting the old current L1 table on disk, make sure to
+ * increase all refcounts for the clusters referenced by the new one.
+ */
+ret = bdrv_pread(bs->file, sn->l1_table_offset, s->l1_table, sn_l1_bytes);
+if (ret < 0) {
 goto fail;
-if (bdrv_pwrite_sync(bs->file, s->l1_table_offset,
-s->l1_table, cur_l1_bytes) < 0)
+}
+
+ret = bdrv_pwrite(bs->file, s->l1_table_offset, s->l1_table, cur_l1_bytes);
+if (ret < 0) {
 goto fail;
+}
+
 for(i = 0;i < s->l1_size; i++) {
 be64_to_cpus(&s->l1_table[i]);
 }
 
-if (qcow2_update_snapshot_refcount(bs, s->l1_table_offset, s->l1_size, 1) 
< 0)
+/* FIXME This is too late! */
+ret = qcow2_update_snapshot_refcount(bs, s->l1_table_offset, s->l1_size, 
1);
+if (ret < 0) {
 goto fail;
+}
 
 #ifdef DEBUG_ALLOC
 {
@@ -432,8 +459,9 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char 
*snapshot_id)
 }
 #endif
 return 0;
- fail:
-return -EIO;
+
+fail:
+return ret;
 }
 
 int qcow2_snapshot_delete(BlockDriverState *bs, const char *snapshot_id)
-- 
1.7.6.4

[Qemu-devel] [PATCH 8/8] qcow2: Fix error path in qcow2_snapshot_load_tmp

2011-11-17 Thread Kevin Wolf

If the bdrv_read() of the snapshot's L1 table fails, return the right
error code and make sure that the old L1 table is still loaded and we
don't break the BlockDriverState completely.

Signed-off-by: Kevin Wolf 
---
 block/qcow2-snapshot.c |   33 +
 1 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 4c2fbe8..0214b95 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -578,32 +578,41 @@ int qcow2_snapshot_list(BlockDriverState *bs, 
QEMUSnapshotInfo **psn_tab)
 
 int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name)
 {
-int i, snapshot_index, l1_size2;
+int i, snapshot_index;
 BDRVQcowState *s = bs->opaque;
 QCowSnapshot *sn;
+uint64_t *new_l1_table;
+int new_l1_bytes;
+int ret;
 
+assert(bs->read_only);
+
+/* Search the snapshot */
 snapshot_index = find_snapshot_by_id_or_name(bs, snapshot_name);
 if (snapshot_index < 0) {
 return -ENOENT;
 }
-
 sn = &s->snapshots[snapshot_index];
-s->l1_size = sn->l1_size;
-l1_size2 = s->l1_size * sizeof(uint64_t);
-if (s->l1_table != NULL) {
-g_free(s->l1_table);
-}
 
-s->l1_table_offset = sn->l1_table_offset;
-s->l1_table = g_malloc0(align_offset(l1_size2, 512));
+/* Allocate and read in the snapshot's L1 table */
+new_l1_bytes = s->l1_size * sizeof(uint64_t);
+new_l1_table = g_malloc0(align_offset(new_l1_bytes, 512));
 
-if (bdrv_pread(bs->file, sn->l1_table_offset,
-   s->l1_table, l1_size2) != l1_size2) {
-return -1;
+ret = bdrv_pread(bs->file, sn->l1_table_offset, new_l1_table, 
new_l1_bytes);
+if (ret < 0) {
+g_free(new_l1_table);
+return ret;
 }
 
+/* Switch the L1 table */
+s->l1_size = sn->l1_size;
+s->l1_table_offset = sn->l1_table_offset;
+s->l1_table = new_l1_table;
+g_free(s->l1_table);
+
 for(i = 0;i < s->l1_size; i++) {
 be64_to_cpus(&s->l1_table[i]);
 }
+
 return 0;
 }
-- 
1.7.6.4

[Qemu-devel] [PATCH V4 00/10] Xen PCI Passthrough

2011-11-17 Thread Anthony PERARD


Hi all,

This patch series introduces the PCI passthrough for Xen.

First, we have HostPCIDevice that help to access one PCI device of the host.

Then, there is an additions in the QEMU code, pci_check_bar_overlap.

There are also several change in pci_ids and pci_regs.

Last part, but not least, the PCI passthrough device himself. Cut in 3 parts
(or file), there is one to take care of the initialisation of a passthrough
device. The second one handle everything about the config address space, there
are specifics functions for every config register. The third one is to handle
MSI.

There is a patch series on xen-devel (applied to xen-unstable) that add the
support of setting a PCI passthrough device through QMP from libxl (xen tool
stack). It is just a call to device_add, with the driver parametter
hostaddr=":00:1b.0".


Change since v3:
  - host_pci_get_* can now return an error, and take an extra parameter, a
pointer to store the wanted value.
  - The memory_region for the PCI BAR are handled "manualy" because calling
pci_default_write_config was not possible, because the XenPT handle the
PCIIORegion it self. This make possible to do a device_remove.
  - Introduction of PT_ERR and PT_WARN macro to print debug and error messages.
Also, these macro as well as PT_LOG will always print the short BDF of the
device in the guest point of view.
  - PT_ERR is print by default (for all error messages).
  - Some debug/error message have been improve and should be a bit more useful.
  - hw_error have been removed from the code, and have been replaced by either
a call to qemu_system_shudown_request() (that lead to a domain destroy) or
a failed in the initialisation of the device.
  - Now, every patchs should compile with no error.


Change v2-v3;
  - in host-pci-device.c:
- Return more usefull error code in get_ressource().
- Use macro in host_pci_find_ext_cap_offset instead of raw number. But I
  still not sure if PCI_MAX_EXT_CAP is right, it's result is 480 like it
  was before, so it's maybe ok.
  - All use of MSI stuff in two first pci passthrough patch have been removed
and move to the last patch.


Change v1-v2:
  - fix style issue (checkpatch.pl)
  - set the original authors, add some missing copyright headers
  - HostPCIDevice:
- introduce HostPCIIORegions (with base_addr, size, flags)
- save all flags from ./resource and store it in a separate field.
- fix endianess on write
- new host_pci_dev_put function
- use pci.c like interface host_pci_get/set_byte/word/long (instead of
  host_pci_read/write_)
  - compile HostPCIDevice only on linux (as well as xen_pci_passthrough)
  - introduce apic-msidef.h file.
  - no more run_one_timer, if a pci device is in the middle of a power
transition, just "return an error" in config read/write
  - use a global var mapped_machine_irq (local to xen_pci_passthrough.c)
  - add msitranslate and power-mgmt ad qdev property



Allen Kay (2):
  Introduce Xen PCI Passthrough, qdevice (1/3)
  Introduce Xen PCI Passthrough, PCI config space helpers (2/3)

Anthony PERARD (6):
  pci_ids: Add INTEL_82599_VF id.
  pci_regs: Fix value of PCI_EXP_TYPE_RC_EC.
  pci_regs: Add PCI_EXP_TYPE_PCIE_BRIDGE
  configure: Introduce --enable-xen-pci-passthrough.
  Introduce HostPCIDevice to access a pci device on the host.
  Introduce apic-msidef.h

Jiang Yunhong (1):
  Introduce Xen PCI Passthrough, MSI (3/3)

Yuji Shimada (1):
  pci.c: Add pci_check_bar_overlap

 Makefile.target  |6 +
 configure|   25 +
 hw/apic-msidef.h |   30 +
 hw/apic.c|   11 +-
 hw/host-pci-device.c |  279 
 hw/host-pci-device.h |   75 +
 hw/pci.c |   47 +
 hw/pci.h |3 +
 hw/pci_ids.h |1 +
 hw/pci_regs.h|3 +-
 hw/xen_common.h  |3 +
 hw/xen_pci_passthrough.c |  902 
 hw/xen_pci_passthrough.h |  337 +
 hw/xen_pci_passthrough_config_init.c | 2637 ++
 hw/xen_pci_passthrough_msi.c |  678 +
 xen-all.c|   12 +
 16 files changed, 5038 insertions(+), 11 deletions(-)
 create mode 100644 hw/apic-msidef.h
 create mode 100644 hw/host-pci-device.c
 create mode 100644 hw/host-pci-device.h
 create mode 100644 hw/xen_pci_passthrough.c
 create mode 100644 hw/xen_pci_passthrough.h
 create mode 100644 hw/xen_pci_passthrough_config_init.c
 create mode 100644 hw/xen_pci_passthrough_msi.c

-- 
Anthony PERARD

[Qemu-devel] [PATCH V4 01/10] pci_ids: Add INTEL_82599_VF id.

2011-11-17 Thread Anthony PERARD

Signed-off-by: Anthony PERARD 
---
 hw/pci_ids.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/hw/pci_ids.h b/hw/pci_ids.h
index 83f3893..2ea5ec2 100644
--- a/hw/pci_ids.h
+++ b/hw/pci_ids.h
@@ -117,6 +117,7 @@
 #define PCI_DEVICE_ID_INTEL_82801I_UHCI6 0x2939
 #define PCI_DEVICE_ID_INTEL_82801I_EHCI1 0x293a
 #define PCI_DEVICE_ID_INTEL_82801I_EHCI2 0x293c
+#define PCI_DEVICE_ID_INTEL_82599_VF 0x10ed
 
 #define PCI_VENDOR_ID_XEN   0x5853
 #define PCI_DEVICE_ID_XEN_PLATFORM  0x0001
-- 
Anthony PERARD

[Qemu-devel] [PATCH V4 05/10] Introduce HostPCIDevice to access a pci device on the host.

2011-11-17 Thread Anthony PERARD

Signed-off-by: Anthony PERARD 
---
 Makefile.target  |1 +
 hw/host-pci-device.c |  279 ++
 hw/host-pci-device.h |   75 ++
 3 files changed, 355 insertions(+), 0 deletions(-)
 create mode 100644 hw/host-pci-device.c
 create mode 100644 hw/host-pci-device.h

diff --git a/Makefile.target b/Makefile.target
index 2e881ce..e527c1b 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -220,6 +220,7 @@ obj-$(CONFIG_NO_XEN) += xen-stub.o
 obj-i386-$(CONFIG_XEN) += xen_platform.o
 
 # Xen PCI Passthrough
+obj-i386-$(CONFIG_XEN_PCI_PASSTHROUGH) += host-pci-device.o
 
 # Inter-VM PCI shared memory
 CONFIG_IVSHMEM =
diff --git a/hw/host-pci-device.c b/hw/host-pci-device.c
new file mode 100644
index 000..06f7761
--- /dev/null
+++ b/hw/host-pci-device.c
@@ -0,0 +1,279 @@
+/*
+ * Copyright (C) 2011   Citrix Ltd.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "host-pci-device.h"
+
+#define PCI_MAX_EXT_CAP \
+((PCIE_CONFIG_SPACE_SIZE - PCI_CONFIG_SPACE_SIZE) / (PCI_CAP_SIZEOF + 4))
+
+enum error_code {
+ERROR_SYNTAX = 1,
+};
+
+static int path_to(const HostPCIDevice *d,
+   const char *name, char *buf, ssize_t size)
+{
+return snprintf(buf, size, "/sys/bus/pci/devices/%04x:%02x:%02x.%x/%s",
+d->domain, d->bus, d->dev, d->func, name);
+}
+
+static int get_resource(HostPCIDevice *d)
+{
+int i, rc = 0;
+FILE *f;
+char path[PATH_MAX];
+unsigned long long start, end, flags, size;
+
+path_to(d, "resource", path, sizeof (path));
+f = fopen(path, "r");
+if (!f) {
+fprintf(stderr, "Error: Can't open %s: %s\n", path, strerror(errno));
+return -errno;
+}
+
+for (i = 0; i < PCI_NUM_REGIONS; i++) {
+if (fscanf(f, "%llx %llx %llx", &start, &end, &flags) != 3) {
+fprintf(stderr, "Error: Syntax error in %s\n", path);
+rc = ERROR_SYNTAX;
+break;
+}
+if (start) {
+size = end - start + 1;
+} else {
+size = 0;
+}
+
+if (i < PCI_ROM_SLOT) {
+d->io_regions[i].base_addr = start;
+d->io_regions[i].size = size;
+d->io_regions[i].flags = flags;
+} else {
+d->rom.base_addr = start;
+d->rom.size = size;
+d->rom.flags = flags;
+}
+}
+
+fclose(f);
+return rc;
+}
+
+static int get_value(HostPCIDevice *d, const char *name, unsigned long *pvalue)
+{
+char path[PATH_MAX];
+FILE *f;
+unsigned long value;
+
+path_to(d, name, path, sizeof (path));
+f = fopen(path, "r");
+if (!f) {
+fprintf(stderr, "Error: Can't open %s: %s\n", path, strerror(errno));
+return -1;
+}
+if (fscanf(f, "%lx\n", &value) != 1) {
+fprintf(stderr, "Error: Syntax error in %s\n", path);
+return -1;
+}
+fclose(f);
+*pvalue = value;
+return 0;
+}
+
+static bool pci_dev_is_virtfn(HostPCIDevice *d)
+{
+int rc;
+char path[PATH_MAX];
+struct stat buf;
+
+path_to(d, "physfn", path, sizeof (path));
+rc = !stat(path, &buf);
+
+return rc;
+}
+
+static int host_pci_config_fd(HostPCIDevice *d)
+{
+char path[PATH_MAX];
+
+if (d->config_fd < 0) {
+path_to(d, "config", path, sizeof (path));
+d->config_fd = open(path, O_RDWR);
+if (d->config_fd < 0) {
+fprintf(stderr, "HostPCIDevice: Can not open '%s': %s\n",
+path, strerror(errno));
+}
+}
+return d->config_fd;
+}
+static int host_pci_config_read(HostPCIDevice *d, int pos, void *buf, int len)
+{
+int fd = host_pci_config_fd(d);
+int res = 0;
+
+again:
+res = pread(fd, buf, len, pos);
+if (res != len) {
+if (res < 0 && (errno == EINTR || errno == EAGAIN)) {
+goto again;
+}
+fprintf(stderr, "host_pci_config: read failed: %s (fd: %i)\n",
+strerror(errno), fd);
+return -errno;
+}
+return 0;
+}
+static int host_pci_config_write(HostPCIDevice *d,
+ int pos, const void *buf, int len)
+{
+int fd = host_pci_config_fd(d);
+int res = 0;
+
+again:
+res = pwrite(fd, buf, len, pos);
+if (res != len) {
+if (res < 0 && (errno == EINTR || errno == EAGAIN)) {
+goto again;
+}
+fprintf(stderr, "host_pci_config: write failed: %s\n",
+strerror(errno));
+return -errno;
+}
+return 0;
+}
+
+int host_pci_get_byte(HostPCIDevice *d, int pos, uint8_t *p)
+{
+  uint8_t buf;
+  if (host_pci_config_read(d, pos, &buf, 1)) {
+  return -1;
+  }
+  *p = buf;
+  return 0;
+}
+int host_pci_get_word(HostPCIDevice *d, int pos, uint16_t *p)
+{
+  uint16_t buf;
+  if (host_pci_config_read(d, pos, &buf, 2)) {
+

[Qemu-devel] [PATCH V4 06/10] pci.c: Add pci_check_bar_overlap

2011-11-17 Thread Anthony PERARD

From: Yuji Shimada 

This function help Xen PCI Passthrough device to check for overlap.

Signed-off-by: Yuji Shimada 
Signed-off-by: Anthony PERARD 
---
 hw/pci.c |   47 +++
 hw/pci.h |3 +++
 2 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 399227f..563bb37 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -2038,3 +2038,50 @@ MemoryRegion *pci_address_space_io(PCIDevice *dev)
 {
 return dev->bus->address_space_io;
 }
+
+int pci_check_bar_overlap(PCIDevice *dev,
+  pcibus_t addr, pcibus_t size, uint8_t type)
+{
+PCIBus *bus = dev->bus;
+PCIDevice *devices = NULL;
+PCIIORegion *r;
+int i, j;
+int rc = 0;
+
+/* check Overlapped to Base Address */
+for (i = 0; i < ARRAY_SIZE(bus->devices); i++) {
+devices = bus->devices[i];
+if (!devices) {
+continue;
+}
+
+/* skip itself */
+if (devices->devfn == dev->devfn) {
+continue;
+}
+
+for (j = 0; j < PCI_NUM_REGIONS; j++) {
+r = &devices->io_regions[j];
+
+/* skip different resource type, but don't skip when
+ * prefetch and non-prefetch memory are compared.
+ */
+if (type != r->type) {
+if (type == PCI_BASE_ADDRESS_SPACE_IO ||
+r->type == PCI_BASE_ADDRESS_SPACE_IO) {
+continue;
+}
+}
+
+if ((addr < (r->addr + r->size)) && ((addr + size) > r->addr)) {
+printf("Overlapped to device[%02x:%02x.%x][Region:%d]"
+   "[Address:%"PRIx64"h][Size:%"PRIx64"h]\n",
+   pci_bus_num(bus), PCI_SLOT(devices->devfn),
+   PCI_FUNC(devices->devfn), j, r->addr, r->size);
+rc = 1;
+}
+}
+}
+
+return rc;
+}
diff --git a/hw/pci.h b/hw/pci.h
index 4b2e785..307fa13 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -550,4 +550,7 @@ static inline void pci_dma_sglist_init(QEMUSGList *qsg, 
PCIDevice *dev,
 qemu_sglist_init(qsg, alloc_hint);
 }
 
+int pci_check_bar_overlap(PCIDevice *dev,
+  pcibus_t addr, pcibus_t size, uint8_t type);
+
 #endif
-- 
Anthony PERARD

[Qemu-devel] [PATCH V4 02/10] pci_regs: Fix value of PCI_EXP_TYPE_RC_EC.

2011-11-17 Thread Anthony PERARD

Value check in PCI Express Base Specification rev 1.1

Signed-off-by: Anthony PERARD 
---
 hw/pci_regs.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/pci_regs.h b/hw/pci_regs.h
index e8357c3..6b42515 100644
--- a/hw/pci_regs.h
+++ b/hw/pci_regs.h
@@ -393,7 +393,7 @@
 #define  PCI_EXP_TYPE_DOWNSTREAM 0x6   /* Downstream Port */
 #define  PCI_EXP_TYPE_PCI_BRIDGE 0x7   /* PCI/PCI-X Bridge */
 #define  PCI_EXP_TYPE_RC_END   0x9 /* Root Complex Integrated Endpoint */
-#define  PCI_EXP_TYPE_RC_EC0x10/* Root Complex Event Collector */
+#define  PCI_EXP_TYPE_RC_EC 0xa /* Root Complex Event Collector */
 #define PCI_EXP_FLAGS_SLOT 0x0100  /* Slot implemented */
 #define PCI_EXP_FLAGS_IRQ  0x3e00  /* Interrupt message number */
 #define PCI_EXP_DEVCAP 4   /* Device capabilities */
-- 
Anthony PERARD

[Qemu-devel] [PATCH V4 04/10] configure: Introduce --enable-xen-pci-passthrough.

2011-11-17 Thread Anthony PERARD

Signed-off-by: Anthony PERARD 
---
 Makefile.target |2 ++
 configure   |   25 +
 2 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index a111521..2e881ce 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -219,6 +219,8 @@ obj-$(CONFIG_NO_XEN) += xen-stub.o
 
 obj-i386-$(CONFIG_XEN) += xen_platform.o
 
+# Xen PCI Passthrough
+
 # Inter-VM PCI shared memory
 CONFIG_IVSHMEM =
 ifeq ($(CONFIG_KVM), y)
diff --git a/configure b/configure
index 6c77fbb..1e6ea91 100755
--- a/configure
+++ b/configure
@@ -127,6 +127,7 @@ vnc_png=""
 vnc_thread="no"
 xen=""
 xen_ctrl_version=""
+xen_pci_passthrough=""
 linux_aio=""
 attr=""
 libattr=""
@@ -644,6 +645,10 @@ for opt do
   ;;
   --enable-xen) xen="yes"
   ;;
+  --disable-xen-pci-passthrough) xen_pci_passthrough="no"
+  ;;
+  --enable-xen-pci-passthrough) xen_pci_passthrough="yes"
+  ;;
   --disable-brlapi) brlapi="no"
   ;;
   --enable-brlapi) brlapi="yes"
@@ -990,6 +995,8 @@ echo "   (affects only QEMU, not 
qemu-img)"
 echo "  --enable-mixemu  enable mixer emulation"
 echo "  --disable-xendisable xen backend driver support"
 echo "  --enable-xen enable xen backend driver support"
+echo "  --disable-xen-pci-passthrough"
+echo "  --enable-xen-pci-passthrough"
 echo "  --disable-brlapi disable BrlAPI"
 echo "  --enable-brlapi  enable BrlAPI"
 echo "  --disable-vnc-tlsdisable TLS encryption for VNC server"
@@ -1357,6 +1364,21 @@ EOF
   fi
 fi
 
+if test "$xen_pci_passthrough" != "no"; then
+  if test "$xen" = "yes" && test "$linux" = "yes"; then
+xen_pci_passthrough=yes
+  else
+if test "$xen_pci_passthrough" = "yes"; then
+  echo "ERROR"
+  echo "ERROR: User requested feature Xen PCI Passthrough"
+  echo "ERROR: but this feature require /sys from Linux"
+  echo "ERROR"
+  exit 1;
+fi
+xen_pci_passthrough=no
+  fi
+fi
+
 ##
 # pkg-config probe
 
@@ -3462,6 +3484,9 @@ case "$target_arch2" in
 if test "$xen" = "yes" -a "$target_softmmu" = "yes" ; then
   target_phys_bits=64
   echo "CONFIG_XEN=y" >> $config_target_mak
+  if test "$xen_pci_passthrough" = yes; then
+echo "CONFIG_XEN_PCI_PASSTHROUGH=y" >> "$config_target_mak"
+  fi
 else
   echo "CONFIG_NO_XEN=y" >> $config_target_mak
 fi
-- 
Anthony PERARD

[Qemu-devel] [PATCH V4 03/10] pci_regs: Add PCI_EXP_TYPE_PCIE_BRIDGE

2011-11-17 Thread Anthony PERARD

Signed-off-by: Anthony PERARD 
---
 hw/pci_regs.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/hw/pci_regs.h b/hw/pci_regs.h
index 6b42515..56a404b 100644
--- a/hw/pci_regs.h
+++ b/hw/pci_regs.h
@@ -392,6 +392,7 @@
 #define  PCI_EXP_TYPE_UPSTREAM 0x5 /* Upstream Port */
 #define  PCI_EXP_TYPE_DOWNSTREAM 0x6   /* Downstream Port */
 #define  PCI_EXP_TYPE_PCI_BRIDGE 0x7   /* PCI/PCI-X Bridge */
+#define  PCI_EXP_TYPE_PCIE_BRIDGE 0x8   /* PCI/PCI-X to PCIE Bridge */
 #define  PCI_EXP_TYPE_RC_END   0x9 /* Root Complex Integrated Endpoint */
 #define  PCI_EXP_TYPE_RC_EC 0xa /* Root Complex Event Collector */
 #define PCI_EXP_FLAGS_SLOT 0x0100  /* Slot implemented */
-- 
Anthony PERARD

[Qemu-devel] [PATCH V4 09/10] Introduce apic-msidef.h

2011-11-17 Thread Anthony PERARD

This patch move the msi definition from apic.c to apic-msidef.h. So it can be
used also by other .c files.

Signed-off-by: Anthony PERARD 
Cc: Michael S. Tsirkin 
---
 hw/apic-msidef.h |   28 
 hw/apic.c|   11 +--
 2 files changed, 29 insertions(+), 10 deletions(-)
 create mode 100644 hw/apic-msidef.h

diff --git a/hw/apic-msidef.h b/hw/apic-msidef.h
new file mode 100644
index 000..3182f0b
--- /dev/null
+++ b/hw/apic-msidef.h
@@ -0,0 +1,28 @@
+#ifndef HW_APIC_MSIDEF_H
+#define HW_APIC_MSIDEF_H
+
+/*
+ * Intel APIC constants: from include/asm/msidef.h
+ */
+
+/*
+ * Shifts for MSI data
+ */
+
+#define MSI_DATA_VECTOR_SHIFT   0
+#define  MSI_DATA_VECTOR_MASK   0x00ff
+
+#define MSI_DATA_DELIVERY_MODE_SHIFT8
+#define MSI_DATA_LEVEL_SHIFT14
+#define MSI_DATA_TRIGGER_SHIFT  15
+
+/*
+ * Shift/mask fields for msi address
+ */
+
+#define MSI_ADDR_DEST_MODE_SHIFT2
+
+#define MSI_ADDR_DEST_ID_SHIFT  12
+#define  MSI_ADDR_DEST_ID_MASK  0x000
+
+#endif /* HW_APIC_MSIDEF_H */
diff --git a/hw/apic.c b/hw/apic.c
index 8289eef..18c4a87 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -24,6 +24,7 @@
 #include "sysbus.h"
 #include "trace.h"
 #include "pc.h"
+#include "apic-msidef.h"
 
 /* APIC Local Vector Table */
 #define APIC_LVT_TIMER   0
@@ -65,16 +66,6 @@
 #define MAX_APICS 255
 #define MAX_APIC_WORDS 8
 
-/* Intel APIC constants: from include/asm/msidef.h */
-#define MSI_DATA_VECTOR_SHIFT  0
-#define MSI_DATA_VECTOR_MASK   0x00ff
-#define MSI_DATA_DELIVERY_MODE_SHIFT   8
-#define MSI_DATA_TRIGGER_SHIFT 15
-#define MSI_DATA_LEVEL_SHIFT   14
-#define MSI_ADDR_DEST_MODE_SHIFT   2
-#define MSI_ADDR_DEST_ID_SHIFT 12
-#defineMSI_ADDR_DEST_ID_MASK   0x000
-
 #define MSI_ADDR_SIZE   0x10
 
 typedef struct APICState APICState;
-- 
Anthony PERARD

[Qemu-devel] [PATCH V4 07/10] Introduce Xen PCI Passthrough, qdevice (1/3)

2011-11-17 Thread Anthony PERARD

From: Allen Kay 

A more complete history can be found here:
git://xenbits.xensource.com/qemu-xen-unstable.git

Signed-off-by: Allen Kay 
Signed-off-by: Guy Zana 
Signed-off-by: Anthony PERARD 
---
 Makefile.target  |2 +
 hw/xen_common.h  |3 +
 hw/xen_pci_passthrough.c |  831 ++
 hw/xen_pci_passthrough.h |  282 
 hw/xen_pci_passthrough_config_init.c |   11 +
 xen-all.c|   12 +
 6 files changed, 1141 insertions(+), 0 deletions(-)
 create mode 100644 hw/xen_pci_passthrough.c
 create mode 100644 hw/xen_pci_passthrough.h
 create mode 100644 hw/xen_pci_passthrough_config_init.c

diff --git a/Makefile.target b/Makefile.target
index e527c1b..33435a3 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -221,6 +221,8 @@ obj-i386-$(CONFIG_XEN) += xen_platform.o
 
 # Xen PCI Passthrough
 obj-i386-$(CONFIG_XEN_PCI_PASSTHROUGH) += host-pci-device.o
+obj-i386-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pci_passthrough.o
+obj-i386-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pci_passthrough_config_init.o
 
 # Inter-VM PCI shared memory
 CONFIG_IVSHMEM =
diff --git a/hw/xen_common.h b/hw/xen_common.h
index 0409ac7..48916fd 100644
--- a/hw/xen_common.h
+++ b/hw/xen_common.h
@@ -135,4 +135,7 @@ static inline int xc_fd(xc_interface *xen_xc)
 
 void destroy_hvm_domain(void);
 
+/* shutdown/destroy current domain because of an error */
+void xen_shutdown_fatal_error(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
+
 #endif /* QEMU_HW_XEN_COMMON_H */
diff --git a/hw/xen_pci_passthrough.c b/hw/xen_pci_passthrough.c
new file mode 100644
index 000..998470b
--- /dev/null
+++ b/hw/xen_pci_passthrough.c
@@ -0,0 +1,831 @@
+/*
+ * Copyright (c) 2007, Neocleus Corporation.
+ * Copyright (c) 2007, Intel Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Alex Novik 
+ * Allen Kay 
+ * Guy Zana 
+ *
+ * This file implements direct PCI assignment to a HVM guest
+ */
+
+/*
+ * Interrupt Disable policy:
+ *
+ * INTx interrupt:
+ *   Initialize(register_real_device)
+ * Map INTx(xc_physdev_map_pirq):
+ *   
+ * - Set real Interrupt Disable bit to '1'.
+ * - Set machine_irq and assigned_device->machine_irq to '0'.
+ * * Don't bind INTx.
+ *
+ * Bind INTx(xc_domain_bind_pt_pci_irq):
+ *   
+ * - Set real Interrupt Disable bit to '1'.
+ * - Unmap INTx.
+ * - Decrement mapped_machine_irq[machine_irq]
+ * - Set assigned_device->machine_irq to '0'.
+ *
+ *   Write to Interrupt Disable bit by guest software(pt_cmd_reg_write)
+ * Write '0'
+ *   msi_trans_en is false>
+ * - Set real bit to '0' if assigned_device->machine_irq isn't '0'.
+ *
+ * Write '1'
+ *   msi_trans_en is false>
+ * - Set real bit to '1'.
+ */
+
+#include 
+
+#include "pci.h"
+#include "xen.h"
+#include "xen_backend.h"
+#include "xen_pci_passthrough.h"
+
+#define PCI_BAR_ENTRIES (6)
+
+#define PT_NR_IRQS  (256)
+char mapped_machine_irq[PT_NR_IRQS] = {0};
+
+void pt_log(const PCIDevice *d, const char *f, ...)
+{
+va_list ap;
+
+va_start(ap, f);
+if (d) {
+fprintf(stderr, "[%02x:%02x.%x] ", pci_bus_num(d->bus),
+PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
+}
+vfprintf(stderr, f, ap);
+va_end(ap);
+}
+
+
+/* Config Space */
+static int pt_pci_config_access_check(PCIDevice *d, uint32_t address, int len)
+{
+/* check offset range */
+if (address >= 0xFF) {
+PT_ERR(d, "Failed to access register with offset exceeding 0xFF. "
+   "(addr: 0x%02x, len: %d)\n", address, len);
+return -1;
+}
+
+/* check read size */
+if ((len != 1) && (len != 2) && (len != 4)) {
+PT_ERR(d, "Failed to access register with invalid access length. "
+   "(addr: 0x%02x, len: %d)\n", address, len);
+return -1;
+}
+
+/* check offset alignment */
+if (address & (len - 1)) {
+PT_ERR(d, "Failed to access register with invalid access size "
+   "alignment. (addr: 0x%02x, len: %d)\n", address, len);
+return -1;
+}
+
+return 0;
+}
+
+int pt_bar_offset_to_index(uint32_t offset)
+{
+int index = 0;
+
+/* check Exp ROM BAR */
+if (offset == PCI_ROM_ADDRESS) {
+return PCI_ROM_SLOT;
+}
+
+/* calculate BAR index */
+index = (offset - PCI_BASE_ADDRESS_0) >> 2;
+if (index >= PCI_NUM_REGIONS) {
+return -1;
+}
+
+return index;
+}
+
+static uint32_t pt_pci_read_config(PCIDevice *d, uint32_t address, int len)
+{
+XenPCIPassthroughState *s = DO_UPCAST(XenPCIPassthroughState, dev, d);
+uint32_t val = 0;
+XenPTRegGroup *reg_grp_entry = NULL;
+XenPTReg *reg_entry = NULL;
+int rc = 0;
+int emul_len = 0;
+uint32_t find_addr = address;
+
+if (pt_pci_config_access_

[Qemu-devel] [PATCH V4 10/10] Introduce Xen PCI Passthrough, MSI (3/3)

2011-11-17 Thread Anthony PERARD

From: Jiang Yunhong 

A more complete history can be found here:
git://xenbits.xensource.com/qemu-xen-unstable.git

Signed-off-by: Jiang Yunhong 
Signed-off-by: Shan Haitao 
Signed-off-by: Anthony PERARD 
---
 Makefile.target  |1 +
 hw/apic-msidef.h |2 +
 hw/xen_pci_passthrough.c |   60 +++-
 hw/xen_pci_passthrough.h |   55 +++
 hw/xen_pci_passthrough_config_init.c |  505 +-
 hw/xen_pci_passthrough_msi.c |  678 ++
 6 files changed, 1294 insertions(+), 7 deletions(-)
 create mode 100644 hw/xen_pci_passthrough_msi.c

diff --git a/Makefile.target b/Makefile.target
index 33435a3..81cff70 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -223,6 +223,7 @@ obj-i386-$(CONFIG_XEN) += xen_platform.o
 obj-i386-$(CONFIG_XEN_PCI_PASSTHROUGH) += host-pci-device.o
 obj-i386-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pci_passthrough.o
 obj-i386-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pci_passthrough_config_init.o
+obj-i386-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pci_passthrough_msi.o
 
 # Inter-VM PCI shared memory
 CONFIG_IVSHMEM =
diff --git a/hw/apic-msidef.h b/hw/apic-msidef.h
index 3182f0b..6e2eb71 100644
--- a/hw/apic-msidef.h
+++ b/hw/apic-msidef.h
@@ -22,6 +22,8 @@
 
 #define MSI_ADDR_DEST_MODE_SHIFT2
 
+#define MSI_ADDR_REDIRECTION_SHIFT  3
+
 #define MSI_ADDR_DEST_ID_SHIFT  12
 #define  MSI_ADDR_DEST_ID_MASK  0x000
 
diff --git a/hw/xen_pci_passthrough.c b/hw/xen_pci_passthrough.c
index c816ed5..cd7e3c7 100644
--- a/hw/xen_pci_passthrough.c
+++ b/hw/xen_pci_passthrough.c
@@ -38,6 +38,39 @@
  * Write '1'
  *   msi_trans_en is false>
  * - Set real bit to '1'.
+ *
+ * MSI-INTx translation.
+ *   Initialize(xc_physdev_map_pirq_msi/pt_msi_setup)
+ * Bind MSI-INTx(xc_domain_bind_pt_irq)
+ *   
+ * - Unmap MSI.
+ *   
+ * - Set dev->msi->pirq to '-1'.
+ *   
+ * - Do nothing.
+ *
+ *   Write to Interrupt Disable bit by guest software(pt_cmd_reg_write)
+ * Write '0'
+ *   msi_trans_en is true>
+ * - Set MSI Enable bit to '1'.
+ *
+ * Write '1'
+ *   msi_trans_en is true>
+ * - Set MSI Enable bit to '0'.
+ *
+ * MSI interrupt:
+ *   Initialize MSI register(pt_msi_setup, pt_msi_update)
+ * Bind MSI(xc_domain_update_msi_irq)
+ *   
+ * - Unmap MSI.
+ * - Set dev->msi->pirq to '-1'.
+ *
+ * MSI-X interrupt:
+ *   Initialize MSI-X register(pt_msix_update_one)
+ * Bind MSI-X(xc_domain_update_msi_irq)
+ *   
+ * - Unmap MSI-X.
+ * - Set entry->pirq to '-1'.
  */
 
 #include 
@@ -389,6 +422,7 @@ static void pt_iomem_map(XenPCIPassthroughState *s, int i,
 }
 
 if (!first_map && old_ebase != PT_PCI_BAR_UNMAPPED) {
+pt_add_msix_mapping(s, i);
 /* Remove old mapping */
 memory_region_del_subregion(r->address_space,
 r->memory);
@@ -417,6 +451,15 @@ static void pt_iomem_map(XenPCIPassthroughState *s, int i,
 if (ret != 0) {
 PT_ERR(&s->dev, "create new mapping failed!\n");
 }
+
+ret = pt_remove_msix_mapping(s, i);
+if (ret != 0) {
+PT_ERR(&s->dev, "Remove MSI-X MMIO mapping failed!\n");
+}
+
+if (old_ebase != e_phys && old_ebase != -1) {
+pt_msix_update_remap(s, i);
+}
 }
 }
 
@@ -744,6 +787,9 @@ static int pt_initfn(PCIDevice *pcidev)
 mapped_machine_irq[machine_irq]++;
 }
 
+/* setup MSI-INTx translation if support */
+rc = pt_enable_msi_translate(s);
+
 /* bind machine_irq to device */
 if (rc < 0 && machine_irq != 0) {
 uint8_t e_device = PCI_SLOT(s->dev.devfn);
@@ -773,7 +819,8 @@ static int pt_initfn(PCIDevice *pcidev)
 
 out:
 PT_LOG(pcidev, "Real physical device %02x:%02x.%x registered successfuly!"
-   "\nIRQ type = %s\n", bus, slot, func, "INTx");
+   "\nIRQ type = %s\n", bus, slot, func,
+   s->msi_trans_en ? "MSI-INTx" : "INTx");
 
 return 0;
 }
@@ -790,7 +837,7 @@ static int pt_unregister_device(PCIDevice *pcidev)
 e_intx = pci_intx(s);
 machine_irq = s->machine_irq;
 
-if (machine_irq) {
+if (s->msi_trans_en == 0 && machine_irq) {
 rc = xc_domain_unbind_pt_irq(xen_xc, xen_domid, machine_irq,
  PT_IRQ_TYPE_PCI, 0, e_device, e_intx, 0);
 if (rc < 0) {
@@ -798,6 +845,13 @@ static int pt_unregister_device(PCIDevice *pcidev)
 }
 }
 
+if (s->msi) {
+pt_msi_disable(s);
+}
+if (s->msix) {
+pt_msix_disable(s);
+}
+
 if (machine_irq) {
 mapped_machine_irq[machine_irq]--;
 
@@ -832,6 +886,8 @@ static PCIDeviceInfo xen_pci_passthrough = {
 .is_express = 0,
 .qdev.props = (Property[]) {
 DEFINE_PROP_STRING("hostaddr", XenPCIPassthroughState, hostaddr),
+

[Qemu-devel] [PATCH 7/8] qcow2: Fix order in qcow2_snapshot_delete

2011-11-17 Thread Kevin Wolf

First the snapshot must be deleted and only then the refcounts can be
decreased.

Signed-off-by: Kevin Wolf 
---
 block/qcow2-snapshot.c |   48 +---
 1 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index d6b5506..4c2fbe8 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -494,32 +494,50 @@ fail:
 int qcow2_snapshot_delete(BlockDriverState *bs, const char *snapshot_id)
 {
 BDRVQcowState *s = bs->opaque;
-QCowSnapshot *sn;
+QCowSnapshot sn;
 int snapshot_index, ret;
 
+/* Search the snapshot */
 snapshot_index = find_snapshot_by_id_or_name(bs, snapshot_id);
-if (snapshot_index < 0)
+if (snapshot_index < 0) {
 return -ENOENT;
-sn = &s->snapshots[snapshot_index];
+}
+sn = s->snapshots[snapshot_index];
 
-ret = qcow2_update_snapshot_refcount(bs, sn->l1_table_offset, sn->l1_size, 
-1);
-if (ret < 0)
+/* Remove it from the snapshot list */
+memmove(s->snapshots + snapshot_index,
+s->snapshots + snapshot_index + 1,
+(s->nb_snapshots - snapshot_index - 1) * sizeof(sn));
+s->nb_snapshots--;
+ret = qcow2_write_snapshots(bs);
+if (ret < 0) {
 return ret;
-/* must update the copied flag on the current cluster offsets */
-ret = qcow2_update_snapshot_refcount(bs, s->l1_table_offset, s->l1_size, 
0);
-if (ret < 0)
+}
+
+/*
+ * The snapshot is now unused, clean up. If we fail after this point, we
+ * won't recover but just leak clusters.
+ */
+g_free(sn.id_str);
+g_free(sn.name);
+
+/*
+ * Now decrease the refcounts of clusters referenced by the snapshot and
+ * free the L1 table.
+ */
+ret = qcow2_update_snapshot_refcount(bs, sn.l1_table_offset,
+ sn.l1_size, -1);
+if (ret < 0) {
 return ret;
-qcow2_free_clusters(bs, sn->l1_table_offset, sn->l1_size * 
sizeof(uint64_t));
+}
+qcow2_free_clusters(bs, sn.l1_table_offset, sn.l1_size * sizeof(uint64_t));
 
-g_free(sn->id_str);
-g_free(sn->name);
-memmove(sn, sn + 1, (s->nb_snapshots - snapshot_index - 1) * sizeof(*sn));
-s->nb_snapshots--;
-ret = qcow2_write_snapshots(bs);
+/* must update the copied flag on the current cluster offsets */
+ret = qcow2_update_snapshot_refcount(bs, s->l1_table_offset, s->l1_size, 
0);
 if (ret < 0) {
-/* XXX: restore snapshot if error ? */
 return ret;
 }
+
 #ifdef DEBUG_ALLOC
 {
 BdrvCheckResult result = {0};
-- 
1.7.6.4

Re: [Qemu-devel] converging around a single guest agent

2011-11-17 Thread Alon Levy

On Wed, Nov 16, 2011 at 08:59:35AM -0600, Michael Roth wrote:
> On 11/16/2011 02:16 AM, Ayal Baron wrote:
> >
> >
> >- Original Message -
> >>Hi,
> >>
> >>On 11/15/2011 11:39 PM, Ayal Baron wrote:
> >>>
> >>
> >>
> >>
> If you want to talk about convergence, the discussion should start
> around
> collecting requirements.  We can then figure out if the two sets
> of
> requirements
> are strictly overlapping or if there are any requirements that are
> fundamentally
> in opposition.
> >>>
> >>>Agreed.
> >>>
> >>>So vdsm guest agent goal is to ease administration of VMs.  This is
> >>>not saying much as it is quite broad so I will list what is
> >>>provided today and some things we need to add:
> >>>
> >>>Assistance in VM life-cycle:
> >>>"desktopShutdown" - Shuts the VM down gracefully from within the
> >>>guest.
> >>>"quiesce" - does not exist today.  This is definitely a requirement
> >>>for us.
> >>>
> >>>SSO support for spice sessions (automatically login into guest OS
> >>>using provided credentials):
> >>>"desktopLock" - lock current session, used when spice session gets
> >>>disconnected / before giving a new user access to spice session
> >>>"desktopLogin"
> >>>"desktopLogoff"
> >>>In addition, guest reports relevant info (currently active user,
> >>>session state)
> >>>
> >>>Monitoring and inventory:
> >>>currently agent sends info periodically, which includes a lot of
> >>>info which should probably be broken down and served upon request.
> >>>Info includes -
> >>>- memory usage
> >>>- NICs info (name, hw, inet, inet6)
> >>>- appslist (list of installed apps / rpms)
> >>>- OS type
> >>>- guest hostname
> >>>- internal file systems info (path, fs type, total space, used
> >>>space)
> >>>
> >>
> >>
> >>
> >>If we're gathering requirements and trying to come up with one agent
> >>to rule them all, don't forget
> >
> >I don't think we're trying to come up with one agent to rule them all, just 
> >avoid duplication of efforts if most of what the 2 agents are doing overlaps.
> >I think we can safely say that seeing as oVirt is KVM centric, 
> >ovirt-guest-agent wants to leverage qemu/kvm to the fullest which aligns 
> >with what qemu-guest-agent is doing.
> >However, ovirt-guest-agent is required to do a lot more, so we need to see 
> >if and how we resolve this.
> >
> >>about VDI and the Spice agent. Currently the spice agent handles the
> >>following:
> >>
> >>1) Paravirtual mouse (needed to get mouse coordinates right with
> >>multi monitor setups)
> >>2) Send client monitor configuration, so that the guest os can adjust
> >>its resolution
> >> (and number and place of monitors) to match the client
> >>3) Copy and paste in a platform neutral manner, if anyone wishes to
> >>add this to another agent
> >> please, please contact us (me) first. This is easy to get wrong
> >> (we went through 2 revisions
> >> of the protocol for this).
> >>4) Allow the client to request the guest to tone down the bling (for
> >>low spec clients)
> >>
> >>Notes:
> >>1) All of these are client<->  guest communication, rather then the
> >>host<->  guest communication
> >>which the other agents seem to focus on.
> >>
> >>2) Getting copy paste right requires a system level guest agent
> >>process as well as a per user
> >>session agent process.
> >
> >Neither qemu-guest-agent nor ovirt-guest-agent is aligned with doing any of 
> >the above, so I'm not sure there is any justification in uniting the spice 
> >agent with the rest.
> >
> 
> copy/paste was actually one of the initial use cases motivating
> qemu-ga; it's just that the requirements (system+user-level agents)
> were so different from the more pressing use cases of things like
> reliable shutdown/reboot that it's been put off for now. At some
> point we had a basic plan on how to approach it, but that needs to
> be re-assessed.
> 

I think for large opaque copies in/out to the guest, like image copy
paste, or guest<->guest copy paste (word OLE) it would be nice to
implemant a side channel scheme:
 message to allocate a channel
 message to deallocate a channel and signal successfull completion or
 error
 the channel is just another virtio-serial that is used for this
 communication only

The benefits would be:
 no need to slow down other operations
 no base64 conversion (both sides)

This of course means that this data is not being parsed by qemu, so it
can't benefit from any whitelisting / schema description. That's why it
should only be used for data that is undescribable - like the
aformentioned image/guest copy case (for instance for text copy it makes
possibly less sense - although again that's completely unstructured
text, so perhaps it makes sense as well).


> >>
> >>Regards,
> >>
> >>Hans
> >>
> >
> 
>

Re: [Qemu-devel] [PATCH V2 04/12] hw/9pfs: Open and create files

2011-11-17 Thread Stefan Hajnoczi

On Tue, Nov 15, 2011 at 11:57 AM, M. Mohan Kumar  wrote:
> +static void send_fd(int sockfd, int fd)
> +{
> +    struct msghdr msg = { };
> +    struct iovec iov;
> +    struct cmsghdr *cmsg;
> +    int retval, data;
> +    union MsgControl msg_control;
> +
> +    iov.iov_base = &data;
> +    iov.iov_len = sizeof(data);
> +
> +    memset(&msg, 0, sizeof(msg));
> +    msg.msg_iov = &iov;
> +    msg.msg_iovlen = 1;
> +    /* No ancillary data on error */
> +    if (fd < 0) {
> +        /*
> +         * fd is really negative errno if the request failed. Or simply
> +         * zero if the request is successful and it doesn't need a file
> +         * descriptor.
> +         */

It cannot be zero because the if statement is fd < 0.  The comment is confusing.

> +/*
> + * create a file and send fd on success
> + * return -errno on error
> + */
> +static int do_create(struct iovec *iovec)
> +{
> +    V9fsString path;
> +    int flags, fd, mode, uid, gid, cur_uid, cur_gid;
> +    proxy_unmarshal(iovec, 1, HDR_SZ, "s",
> +                   &path, &flags, &mode, &uid, &gid);

Unmarshalling can fail if the iovec size does not match what the
format string describes.  We should fail here rather than continuing
on.  If execution continues some of the variables may be
uninitialized.

Stefan

[Qemu-devel] [PATCH 0/8] qcow2: Fix error paths for internal snapshots

2011-11-17 Thread Kevin Wolf

This is more or less the same kind of fixes that we made in the rest of qcow2
last year: Return the right error codes and make the order of operations safe
so that a crash can lead to no more than cluster leaks.

Although all of these are bug fixes, I'm not so sure about taking them into
1.0. Maybe we can take some of the easier ones and leave others out, or just
move the whole series to 1.1. Feedback on this would appreciated.

Kevin Wolf (8):
  qcow2: Return real error code in qcow2_read_snapshots
  qcow2: Return real error code in qcow2_write_snapshots
  qcow2: Cleanups and memleak fix in qcow2_snapshot_create
  qcow2: Rework qcow2_snapshot_create error handling
  qcow2: Return real error in qcow2_snapshot_goto
  qcow2: Fix order of refcount updates in qcow2_snapshot_goto
  qcow2: Fix order in qcow2_snapshot_delete
  qcow2: Fix error path in qcow2_snapshot_load_tmp

 block/qcow2-refcount.c |7 +-
 block/qcow2-snapshot.c |  322 +++-
 block/qcow2.c  |5 +-
 3 files changed, 244 insertions(+), 90 deletions(-)

-- 
1.7.6.4

Re: [Qemu-devel] converging around a single guest agent

2011-11-17 Thread Adam Litke

On Thu, Nov 17, 2011 at 03:46:37AM -0500, Ayal Baron wrote:
> 
> 
> - Original Message -
> > I have been following this thread pretty closely and the one sentence
> > summary of the current argument is: ovirt-guest-agent is already featureful
> > and tested, so let's drop qemu-ga and have everyone adopt ovirt-guest-agent.
> 
> What we're suggesting is let's drop *one* of the two agents (obviously it
> would be easier for us to drop qemu-ga, but we'd rather reach consensus and
> unite behind one agent regardless of which agent it is).
> 
> >  Unfortunately, this track strays completely away from the stated goal of
> >  convergence.  I have at least two examples of why the greater KVM community
> >  can never adopt ovirt-guest-agent as-is.  To address this, I would like to
> >  counter with an example on how qemu-ga can enable the deployment of
> >  ovirt-guest-agent features and satisfy the needs of the whole community at
> >  the same time.
> > 
> > 1) Scope:  The ovirt-guest-agent contains functionality that is incredibly
> > useful within the context of oVirt.  Single Sign-on is very handy but KVM
> > users outside the scope of oVirt will not want this extra complexity in
> > their agent.  For simplicity they will probably just write something small
> > that does what they need (and we have failed to provide a ubiquitous KVM
> > agent).
> 
> I totally agree, but that could easily be resolved using the plugin
> architecture suggested before.
> 
> > 
> > 1) Deployment complexity: The more complex the guest agent is, the more
> > often it will need to be updated (bug/security fixes, distro compatibility,
> > new features).  Rolling out guest agent updates does not scale well in large
> > environments (especially when the guest and host administrators are not the
> > same person).
> 
> Using plugins, you just deploy the ones you need, keeping the attack surface /
> #bugs / need to update lower

In order for any KVM guest agent to become ubiquitous, I think the code _must_ 
live
in the qemu repository.  This includes the base infrastructure and a core set of
plugins to provide the current set of qemu-ga APIs.  This way, both endpoints
(host/guest) can evolve together.  How easy would it be to extract this basic
infrastructure from the ovirt-guest-agent?  Is the qemu project opposed to a
Python agent?

> > For these reasons (and many others), I support having an agent with very
> > basic primitives that can be orchestrated by the host to provide needed
> > functionality.  This agent would present a low-level, stable, extensible API
> > that everyone can use.  Today qemu-ga supports the following verbs: sync
> > ping info shutdown file-open file-close file-read file-write file-seek
> > file-flush fsfreeze-status fsfreeze-freeze fsfreeze-thaw.  If we add a
> > generic execute mechanism, then the agent can provide everything needed by
> > oVirt to deploy SSO.
> > 
> > Let's assume that we have already agreed on some sort of security policy for
> > the write-file and exec primitives.  Consensus is possible on this issue but
> > I don't want to get bogged down with that here.
> > 
> > With the above primitives, SSO could be deployed automatically to a guest
> > with the following sequence of commands:
> > 
> > file-open "/sso-package.bin" "w" file-write   file-close
> >  file-open "/sso-package.bin" "x" file-exec  
> > file-close 
> 
> The guest can run on any number of hosts.  currently, the guest tools contain
> all the relevant logic installed (specifically for the guest os version).
> What you're suggesting here is that we keep all the relevant guest-agent
> variants code on the host, automatically detect the guest os version and
> inject the correct file (e.g. SSO on winXP and on win2k8 is totally
> different).  In addition, there might be things requiring boot for example. So
> to solve that we would instead need to install a set of tools on the guest
> like we do the guest agent today (it would be a separate package because it's
> management specific).  And then we would tell the guest-agent to run tools
> from that set?  Sounds overly complex to me.

We already have that packaging complexity today.  You must already maintain the
various Windows packages somewhere.  You'd just be pushing them from the host
instead.  Could you provide examples of the things required for boot?  If you
are talking virtio drivers, I think this is a separate problem.  I would argue
that vdsm should have a hardware "safe-mode" when the guest tools are not
installed.  This would be a set of hardware exposed that is known to work with
all guests.  Then, when the guest tools are installed, the hardware can be
"upgraded" since we will know the guest can support paravirt hw.

> > At this point, the package is installed.  It can contain whatever existing
> > logic exists in the ovirt-guest-agent today.  To perform a user login, we'll
> > assume that sso-package.bin contains an executable 'sso/do-user-sso':
> > 
> > file-open "/s

Re: [Qemu-devel] [PATCH V2 00/12] Proxy FS driver for VirtFS

2011-11-17 Thread Stefan Hajnoczi

On Tue, Nov 15, 2011 at 12:09 PM, M. Mohan Kumar  wrote:
> Changes from previous version:
>
> 1) Communication between qemu and helper process is similar to 9p way of
> packing
> elements (pdu marshaling).

There is code I haven't reviewed yet but I think it will change as you
add input validation, so I will wait for v3.

Stefan

Re: [Qemu-devel] [[PATCH V2] 0/5]

2011-11-17 Thread Avi Kivity

On 11/17/2011 03:22 PM, Benoît Canet wrote:
> These patches converts the remaining sh4 devices to the memory API.
> The patch "sh_intc: convert interrupt controller to memory API" is
> somewhat tricky
>
>

Thanks, applied.  Please adjust your editor to display tabs as 8
positions, not 4.  Your patches had wierd indentation due to that, which
I adjusted before applying.

-- 
error compiling committee.c: too many arguments to function

[Qemu-devel] [PATCH 1/8] qcow2: Return real error code in qcow2_read_snapshots

2011-11-17 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
---
 block/qcow2-snapshot.c |   25 -
 block/qcow2.c  |5 +++--
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index fb7f58c..db49bb3 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -72,6 +72,7 @@ int qcow2_read_snapshots(BlockDriverState *bs)
 int i, id_str_size, name_size;
 int64_t offset;
 uint32_t extra_data_size;
+int ret;
 
 if (!s->nb_snapshots) {
 s->snapshots = NULL;
@@ -81,10 +82,15 @@ int qcow2_read_snapshots(BlockDriverState *bs)
 
 offset = s->snapshots_offset;
 s->snapshots = g_malloc0(s->nb_snapshots * sizeof(QCowSnapshot));
+
 for(i = 0; i < s->nb_snapshots; i++) {
+/* Read statically sized part of the snapshot header */
 offset = align_offset(offset, 8);
-if (bdrv_pread(bs->file, offset, &h, sizeof(h)) != sizeof(h))
+ret = bdrv_pread(bs->file, offset, &h, sizeof(h));
+if (ret < 0) {
 goto fail;
+}
+
 offset += sizeof(h);
 sn = s->snapshots + i;
 sn->l1_table_offset = be64_to_cpu(h.l1_table_offset);
@@ -98,25 +104,34 @@ int qcow2_read_snapshots(BlockDriverState *bs)
 id_str_size = be16_to_cpu(h.id_str_size);
 name_size = be16_to_cpu(h.name_size);
 
+/* Skip extra data */
 offset += extra_data_size;
 
+/* Read snapshot ID */
 sn->id_str = g_malloc(id_str_size + 1);
-if (bdrv_pread(bs->file, offset, sn->id_str, id_str_size) != 
id_str_size)
+ret = bdrv_pread(bs->file, offset, sn->id_str, id_str_size);
+if (ret < 0) {
 goto fail;
+}
 offset += id_str_size;
 sn->id_str[id_str_size] = '\0';
 
+/* Read snapshot name */
 sn->name = g_malloc(name_size + 1);
-if (bdrv_pread(bs->file, offset, sn->name, name_size) != name_size)
+ret = bdrv_pread(bs->file, offset, sn->name, name_size);
+if (ret < 0) {
 goto fail;
+}
 offset += name_size;
 sn->name[name_size] = '\0';
 }
+
 s->snapshots_size = offset - s->snapshots_offset;
 return 0;
- fail:
+
+fail:
 qcow2_free_snapshots(bs);
-return -1;
+return ret;
 }
 
 /* add at the end of the file a new list of snapshots */
diff --git a/block/qcow2.c b/block/qcow2.c
index a56b011..27cbbeb 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -272,8 +272,9 @@ static int qcow2_open(BlockDriverState *bs, int flags)
 }
 bs->backing_file[len] = '\0';
 }
-if (qcow2_read_snapshots(bs) < 0) {
-ret = -EINVAL;
+
+ret = qcow2_read_snapshots(bs);
+if (ret < 0) {
 goto fail;
 }
 
-- 
1.7.6.4

Re: [Qemu-devel] converging around a single guest agent

2011-11-17 Thread Daniel P. Berrange

On Thu, Nov 17, 2011 at 09:58:33AM -0600, Adam Litke wrote:
> On Thu, Nov 17, 2011 at 03:46:37AM -0500, Ayal Baron wrote:
> > 
> > 
> > - Original Message -
> > > I have been following this thread pretty closely and the one sentence
> > > summary of the current argument is: ovirt-guest-agent is already 
> > > featureful
> > > and tested, so let's drop qemu-ga and have everyone adopt 
> > > ovirt-guest-agent.
> > 
> > What we're suggesting is let's drop *one* of the two agents (obviously it
> > would be easier for us to drop qemu-ga, but we'd rather reach consensus and
> > unite behind one agent regardless of which agent it is).
> > 
> > >  Unfortunately, this track strays completely away from the stated goal of
> > >  convergence.  I have at least two examples of why the greater KVM 
> > > community
> > >  can never adopt ovirt-guest-agent as-is.  To address this, I would like 
> > > to
> > >  counter with an example on how qemu-ga can enable the deployment of
> > >  ovirt-guest-agent features and satisfy the needs of the whole community 
> > > at
> > >  the same time.
> > > 
> > > 1) Scope:  The ovirt-guest-agent contains functionality that is incredibly
> > > useful within the context of oVirt.  Single Sign-on is very handy but KVM
> > > users outside the scope of oVirt will not want this extra complexity in
> > > their agent.  For simplicity they will probably just write something small
> > > that does what they need (and we have failed to provide a ubiquitous KVM
> > > agent).
> > 
> > I totally agree, but that could easily be resolved using the plugin
> > architecture suggested before.
> > 
> > > 
> > > 1) Deployment complexity: The more complex the guest agent is, the more
> > > often it will need to be updated (bug/security fixes, distro 
> > > compatibility,
> > > new features).  Rolling out guest agent updates does not scale well in 
> > > large
> > > environments (especially when the guest and host administrators are not 
> > > the
> > > same person).
> > 
> > Using plugins, you just deploy the ones you need, keeping the attack 
> > surface /
> > #bugs / need to update lower
> 
> In order for any KVM guest agent to become ubiquitous, I think the code 
> _must_ live
> in the qemu repository.  This includes the base infrastructure and a core set 
> of
> plugins to provide the current set of qemu-ga APIs.  This way, both endpoints
> (host/guest) can evolve together.  How easy would it be to extract this basic
> infrastructure from the ovirt-guest-agent?  Is the qemu project opposed to a
> Python agent?

IMHO Python would be a really bad choice for the agent. An agent wants to be
maximally portable to any guest OS, regardless of its vintage. The changes
between each python release, even within the 2.x stream, let alone between
2.x and 3.x would cause us endless compatibility problems upon deployment.
And while python is common on Linux, we don't really want to get into the
business of installing the python runtime on Windows or other OS, simply to
run an agent.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] wiki summary

2011-11-17 Thread Barak Azulay

On Thursday 17 November 2011 02:48:50 Michael Roth wrote:
> I've tried to summarize the pros/cons, points, and proposals outlined in
> this thread at the following wiki:
> 
> http://www.ovirt.org/wiki/Guest_agent_proposals
> 
> Please feel free to add/edit as needed. If you don't have an account on
> ovirt.org let me know.
>

Thanks Michael, it's a good start.


A few questions about the qemu-ga's requirements:

#1
  - same repo ? why is this a requirement ?
  - distributable via ISO  - can you elaborate? 
  - upgradeable via hypervisor push - by the title it sounds like it belongs 
to deployment, which sounds to me like it belongs to a higher management 
level

#3 a few questions come up when I read it: 
  - some may consider those primitives as a security breach
  - I understand the motivation of being able to do everything on the guest 
(exe) but we need to keep in mind it's various guest OSs, and it means 
that there should be a script for every OS type. to me the option of 
having a well defined interface is much more appealing  

Thanks
Barak
 
> Thanks!

Re: [Qemu-devel] converging around a single guest agent

2011-11-17 Thread Eric Gaulin

On Thu, Nov 17, 2011 at 11:14 AM, Daniel P. Berrange
 wrote:
> On Thu, Nov 17, 2011 at 09:58:33AM -0600, Adam Litke wrote:
>> On Thu, Nov 17, 2011 at 03:46:37AM -0500, Ayal Baron wrote:
>> >
>> >
>> > - Original Message -
>> > > I have been following this thread pretty closely and the one sentence
>> > > summary of the current argument is: ovirt-guest-agent is already 
>> > > featureful
>> > > and tested, so let's drop qemu-ga and have everyone adopt 
>> > > ovirt-guest-agent.
>> >
>> > What we're suggesting is let's drop *one* of the two agents (obviously it
>> > would be easier for us to drop qemu-ga, but we'd rather reach consensus and
>> > unite behind one agent regardless of which agent it is).
>> >
>> > >  Unfortunately, this track strays completely away from the stated goal of
>> > >  convergence.  I have at least two examples of why the greater KVM 
>> > > community
>> > >  can never adopt ovirt-guest-agent as-is.  To address this, I would like 
>> > > to
>> > >  counter with an example on how qemu-ga can enable the deployment of
>> > >  ovirt-guest-agent features and satisfy the needs of the whole community 
>> > > at
>> > >  the same time.
>> > >
>> > > 1) Scope:  The ovirt-guest-agent contains functionality that is 
>> > > incredibly
>> > > useful within the context of oVirt.  Single Sign-on is very handy but KVM
>> > > users outside the scope of oVirt will not want this extra complexity in
>> > > their agent.  For simplicity they will probably just write something 
>> > > small
>> > > that does what they need (and we have failed to provide a ubiquitous KVM
>> > > agent).
>> >
>> > I totally agree, but that could easily be resolved using the plugin
>> > architecture suggested before.
>> >
>> > >
>> > > 1) Deployment complexity: The more complex the guest agent is, the more
>> > > often it will need to be updated (bug/security fixes, distro 
>> > > compatibility,
>> > > new features).  Rolling out guest agent updates does not scale well in 
>> > > large
>> > > environments (especially when the guest and host administrators are not 
>> > > the
>> > > same person).
>> >
>> > Using plugins, you just deploy the ones you need, keeping the attack 
>> > surface /
>> > #bugs / need to update lower
>>
>> In order for any KVM guest agent to become ubiquitous, I think the code 
>> _must_ live
>> in the qemu repository.  This includes the base infrastructure and a core 
>> set of
>> plugins to provide the current set of qemu-ga APIs.  This way, both endpoints
>> (host/guest) can evolve together.  How easy would it be to extract this basic
>> infrastructure from the ovirt-guest-agent?  Is the qemu project opposed to a
>> Python agent?
>
> IMHO Python would be a really bad choice for the agent. An agent wants to be
> maximally portable to any guest OS, regardless of its vintage. The changes
> between each python release, even within the 2.x stream, let alone between
> 2.x and 3.x would cause us endless compatibility problems upon deployment.
> And while python is common on Linux, we don't really want to get into the
> business of installing the python runtime on Windows or other OS, simply to
> run an agent.
>
> Regards,
> Daniel
> --

I agree with Daniel,

A good example to get inspired from is the ZABBIX agent. A single C
source tree that can be compiled to many Unix and Windows binaries.

Eric Gaulin
___

1 2 >

1 - 100 of 119 matches

Mail list logo