date:20110919

On 09/19/2011 08:55 AM, Thomas Huth wrote:

>  Note that in addition to that, the PAPR spec specifies only one
>  "device" (whatever that means) per vscsi instance.

Really? In that case, I wonder why Linux is using the "Logical unit
addressing format" with target IDs and bus numbers instead of the
"Flat space addressing method" for vscsi ... according to
drivers/scsi/ibmvscsi/ibmvscsi.c :

static inline u16 lun_from_dev(struct scsi_device *dev)
{
return (0x2<<  14) | (dev->id<<  8) | (dev->channel<<  5) | dev->lun;
}

In case there's really only one device per vscsi instance, shouldn't
that code use addressing method 0x1 instead of 0x2 here?

As long as dev->id == 0, dev->channel == 0, dev->lun < 31, the three 
addressing methods are all equivalent.

Some comments in ibmvscsi.c say that iOS needs non-zero channels, so 
there does seem to be someone else who doesn't follow the PAPR spec too 
well. :)

Paolo

Re: [Qemu-devel] [PATCH 00/14] qdev: assign unique names to all devices (part 1)

On 2011-09-16 20:03, Anthony Liguori wrote:
> On 09/16/2011 12:11 PM, Kevin Wolf wrote:
>> Am 16.09.2011 18:54, schrieb Anthony Liguori:
>>> This series just asks the device model developer to come up with a unique 
>>> *when*
>>> they're doing device composition.  Even with a totally path based interface,
>>> this is always going to be a firm requirement.
>>>
>>> I think it may be possible to eliminate required device names by having a 
>>> formal
>>> notion of composition and have the devices store the names of the composed
>>> devices as part of the reference to that device.  You could then have user
>>> created devices use a separate hash table to track the names of those 
>>> devices.
>>>
>>> But, we can't easily do this today.  Having either a fully qualified name 
>>> or a
>>> composition name as part of qdev_create() is the Right Thing IMHO so I think
>>> this is the stepping stone to something more sophisticated.
>>
>> Actually, as I said, I think this naming scheme is already by far too
>> sophisticated. Jan is completely right, there is no point in duplicating
>> paths in a name. Either we assign names only if the user specified one,
>> or we auto-generate a really simple name that doesn't resemble a path,
>> can be easily typed and is actually useful to have in addition to paths
>> (my "#foo-1" suggestion).
> 
> Names have to be relatively stable.  Instantiation ordering should not affect 
> names nor should hotplug.  Otherwise two device models will end up with 
> devices 
> with different names.
> 
> Jan's point is that there is a stable path that could be used for the name 
> and 
> satisfy these purposes.  This is the composition path.  Either a device is 
> created by the user (and therefore has a stable name and sits on the '/' part 
> of 
> the path) or is created through composition and has a derived named from a 
> user 
> created device.  Since the composed device is tied to its parent's lifecycle, 
> the path is always valid.
> 
> So this is a simplification that I plan on running with.  For now, I think 
> this 
> series is the right next step because it gives us a path name for the name 
> (although different syntax) and let's us enforce that all devices has a 
> canonical path.

For something that changes lots of devices and, at the same time, is
going to be removed again, I'm hesitating to call it the right direction.

A right step would be, IMHO, to introduce a generic introspectable
device link so that parent devices can reference their children and a
visitor can derive a child's relative name from that link name. Then
make sure this link type is consistently used.

I really dislike this focusing on assigning names internally and using
them in QEMU-internal APIs. They should just fall out of the core when
external interaction is required.

> 
> Independent of that, Jan suggested that we could have what's essentially an 
> alias.  This is just a short name (could be in the form '%s-%d' % 
> (class.lower(), object_count).  This alias is just a hash table.  It has 
> nothing 
> to do with the core device model.

I can't remember suggesting such thing.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH 04/14] qdev: take ownership of id pointer

2011-09-19 Thread Gerd Hoffmann


On 09/16/11 18:00, Anthony Liguori wrote:

qdev has this quirk that it owns a seemingly arbitrary QemuOpts pointer.  That's
because qdev expects a static string for the id (which really makes no sense
since ids are supposed to be provided by the user).  Instead of managing just
the id pointer, we currently take ownership of the entire QemuOpts structure
that was used to create the device just to keep the name around.


FYI: Keeping the pointer to the QemuOpts has one more reason:  It will 
free the QemuOpts on hot-unplug, which is needed to free the id from 
QemuOpts point of view, which in turn allows to re-use the id when 
hot-plugging the same (or another) device later on.


cheers,
  Gerd

Re: [Qemu-devel] [PATCH 00/14] qdev: assign unique names to all devices (part 1)

On 2011-09-16 20:21, Anthony Liguori wrote:
> On 09/16/2011 11:48 AM, Jan Kiszka wrote:
>> On 2011-09-16 18:00, Anthony Liguori wrote:
>>> This series introduces an infrastructure to remove anonymous devices from 
>>> qdev.
>>> Anonymous devices are one of the big gaps between qdev and QOM so removing 
>>> is
>>> a prerequisite to incrementally merging QOM.
>>>
>>> Besides the infrastructure, I also converted almost all of the possible PC
>>> devices to have unique names.  Please not that naming is not a property of
>>> devices but rather of the thing that creates the devices (usually machines).
>>>
>>> The names are ugly but this is because of the alternating device/bus 
>>> hierarchy
>>> in qdev.  For now, the names use '::' as deliminators but I think Jan has
>>> convinced me that down the road, we should use '/' as a deliminator such 
>>> that
>>> the resulting names are actually valid paths (using a canonical path 
>>> format).
>>
>> I still don't see why we need to store strings as device references.
>> Everyone that lacks a reference (QEMU-external users) can pass in a path
>> - which can be a device name in the simple case.
> 
> Thinking more about this.  I think a critical requirement is to be able to 
> ask a 
> device how to reference itself.  IOW, there needs to be a 
> device_get_name(dev) 
> that returns something that can be meaningfully used to later reference the 
> device.
> 
> With your no name stored in a device proposal, you would have something like 
> this:

I would not even store a name in the device unless it is user-assigned.
That can be created on demand if we did our homework correctly.

> 
> const char *device_get_name(Device *dev)
> {
> if (dev->parent) { // created through composition, ask parent
> return device_get_child_name(dev->parent, dev);
> } else { // user created, return user supplied name
> return dev->name;
> }
> }
> 
> device_get_child_name() ends up becoming complicated unless you maintain a 
> list 
> of children and their name mappings.  That means Device needs to store a hash 
> table even though those pointers are not the canonical references since the 
> composition devices are embedded in the parent Device.

As said in the other mail, the link name of the parent to its child is
the name we need to look up here. In turn that means a child likely
should know what link is referring to it in the composing parent. Can be
a feature of a generic device link so that we can also easily walk
graphs in both directions.

> 
> I think this leads to a lot of complexity without much real life gain.  I 
> think 
> having the parent generate and set the child's name during creation is a 
> significant simplification.

The parent creates the name, no question, but in a different way you
still think of: by naming its link. Let's focus on how to do
inter-device referencing. This is actually the more important topic IMO,
but if we do it right, naming just falls out of it as a byproduct.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH 05/14] qdev: remove opts pointer tracking

2011-09-19 Thread Gerd Hoffmann


On 09/16/11 18:00, Anthony Liguori wrote:

This was only used because id's memory was stored in opts.


No, see reply for patch #4.

cheers,
  Gerd

Re: [Qemu-devel] [PATCH 00/14] qdev: assign unique names to all devices (part 1)

Am 16.09.2011 20:03, schrieb Anthony Liguori:
> On 09/16/2011 12:11 PM, Kevin Wolf wrote:
>> Am 16.09.2011 18:54, schrieb Anthony Liguori:
>>> This series just asks the device model developer to come up with a unique 
>>> *when*
>>> they're doing device composition.  Even with a totally path based interface,
>>> this is always going to be a firm requirement.
>>>
>>> I think it may be possible to eliminate required device names by having a 
>>> formal
>>> notion of composition and have the devices store the names of the composed
>>> devices as part of the reference to that device.  You could then have user
>>> created devices use a separate hash table to track the names of those 
>>> devices.
>>>
>>> But, we can't easily do this today.  Having either a fully qualified name 
>>> or a
>>> composition name as part of qdev_create() is the Right Thing IMHO so I think
>>> this is the stepping stone to something more sophisticated.
>>
>> Actually, as I said, I think this naming scheme is already by far too
>> sophisticated. Jan is completely right, there is no point in duplicating
>> paths in a name. Either we assign names only if the user specified one,
>> or we auto-generate a really simple name that doesn't resemble a path,
>> can be easily typed and is actually useful to have in addition to paths
>> (my "#foo-1" suggestion).
> 
> Names have to be relatively stable.  Instantiation ordering should not affect 
> names nor should hotplug.  Otherwise two device models will end up with 
> devices 
> with different names.

But that's what you have paths for. If you introduce name that are meant
to fulfill the same requirements, the only thing you have achieved is
duplication.

This is why I think that _if_ we have names, they should address a
different requirement, like being easy to remember and type. That there
are user-assigned names inclines me to think that this is what they are
meant for. But if they do just the same thing as paths already do, there
is no reason to have them.

> Independent of that, Jan suggested that we could have what's essentially an 
> alias.  This is just a short name (could be in the form '%s-%d' % 
> (class.lower(), object_count).  This alias is just a hash table.  It has 
> nothing 
> to do with the core device model.
> 
> I think what you're really getting at is that you want to be able to do 
> something like 'ide0-hd0' to refer to a device via the command line or HMP.
> 
> I don't really disagree with that either.  But that's a layer on top of the 
> core 
> device model.  We shouldn't push every UI feature we want into the core 
> device 
> model.

Doing it outside the device model certainly works, too. I was just
trying to make your names somewhat useful. If it turns out that we have
no use for them in the device mode, let's use paths for everything and
abandon names.

Kevin

Re: [Qemu-devel] [PATCH v2 03/15] sheepdog: move coroutine send/recv function to generic code

Am 17.09.2011 16:49, schrieb Paolo Bonzini:
> On 09/17/2011 08:29 AM, MORITA Kazutaka wrote:
  +#else
  +struct iovec *p = iov;
  +ret = 0;
  +while (iovlen>  0) {
  +int rc;
  +if (do_sendv) {
  +rc = send(sockfd, p->iov_base, p->iov_len, 0);
  +} else {
  +rc = qemu_recv(sockfd, p->iov_base, p->iov_len, 0);
  +}
  +if (rc == -1) {
  +if (errno == EINTR) {
  +continue;
  +}
  +if (ret == 0) {
  +ret = -1;
  +}
  +break;
  +}
  +iovlen--, p++;
  +ret += rc;
  +}
>> This code can be called inside coroutines with a non-blocking fd, so
>> should we avoid busy waiting?
> 
> It doesn't busy wait, it exits with EAGAIN.  I'll squash in here the 
> first hunk of patch 4, which is needed.
> 
> qemu_co_recvv already handles reads that return zero, unlike sheepdog's 
> do_readv_writev.  I probably moved it there inadvertently while moving 
> code around to cutils.c, but in order to fix qemu-ga I need to create a 
> new file qemu-coroutine-io.c.
> 
> Kevin, do you want me to resubmit everything, or are you going to apply 
> some more patches to the block branch (5 to 12 should be fine)?

As long as it's clear what the current version is, I don't mind. Do I
understand right that there will be a v3 for patches 3 and 4?

Kevin

Re: [Qemu-devel] [PATCH] isapc: give system address space when pci is disabled

On 2011-09-18 18:04, Hervé Poussineau wrote:
> 
> Signed-off-by: Hervé Poussineau 
> ---
>  hw/pc_piix.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/pc_piix.c b/hw/pc_piix.c
> index da6fa55..c0b8a3a 100644
> --- a/hw/pc_piix.c
> +++ b/hw/pc_piix.c
> @@ -121,7 +121,7 @@ static void pc_init1(MemoryRegion *system_memory,
>  pc_memory_init(system_memory,
> kernel_filename, kernel_cmdline, initrd_filename,
> below_4g_mem_size, above_4g_mem_size,
> -   pci_memory, &ram_memory);
> +   pci_enabled ? pci_memory : system_memory, 
> &ram_memory);
>  }
>  
>  if (!xen_enabled()) {

Makes sense. I guess we should rename 'pci_memory' inside pc_memory_init
to reflect this - 'rom_memory'?

But -M isapc still only gives me a black screen. Are there further fixes
required?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH] Add astyle option file for code formatting

2011-09-19 Thread Markus Armbruster

Anthony Liguori  writes:

[...]
> FWIW, this is what I use with emacs:
>
> (c-add-style "qemu"
> '("stroustrup"
> (indent-tabs-mode . nil)
> (c-basic-offset   . 4)
> (tab-width . 8)
> )
> nil) ; t = set this style, nil = don't

Here's my configuration (excuse liberal use of sledgehammer):

(setq c-initialization-hook
  (lambda()
(setcar (member '(other . "gnu") c-default-style)
(cons 'other "stroustrup"
(set-default 'c-recognize-knr-p nil)
(setq c-electric-pound-behavior '(alignleft))

(defun linux-c-mode ()
  "C mode with adjusted defaults for use with the Linux kernel."
  (interactive)
  (c-mode)
  (c-set-style "K&R")
  (setq tab-width 8)
  (setq indent-tabs-mode t)
  (setq c-basic-offset 8))

(setq auto-mode-alist
  (cons '("/linux.*\\.[ch]\\'" . linux-c-mode)
auto-mode-alist))

;; avoid tabs for some projects
(defun my-choose-tabs ()
  (let ((fname (buffer-file-name)))
(and fname (string-match "/qemu" fname)
 (setq indent-tabs-mode nil
(add-hook 'after-change-major-mode-hook 'my-choose-tabs)

Re: [Qemu-devel] [PATCH 50/58] pseries: Update SLOF firmware image


On 14.09.2011, at 14:59, Anthony Liguori wrote:

> On 09/14/2011 07:28 AM, Peter Maydell wrote:
>> On 14 September 2011 13:24, Alexander Graf  wrote:
>>> Am 14.09.2011 um 13:01 schrieb Peter Maydell:
 I confess to not really understanding how we keep the git
 submodules and the binary blobs in sync, but shouldn't there
 be a reference in the commit message to the git commit hash
 for the slof sources corresponding to this blob, and maybe
 also an update to roms/SLOF here? (cf commit d67c3f2c for
 example) ?
>>> 
>>> Oh? Since I have absolutely no idea on git submodules, it might
>>> be helpful to add some description on how to do a blob update
>>> into README?
>> 
>> Sounds like a good idea -- I think Anthony is the expert here.
> 
> You should be able to just checkout the desired version of the submodule (you 
> may need to refetch from git.qemu.org), then build the binary and copy the 
> results to pc-bios/.  Then in the top level, do a single commit that includes 
> the submodule commit change and the new binary blob.
> 
> For seabios, it would look something like:
> 

agraf@lychee:/home/agraf/release/qemu> git submodule init
Submodule 'roms/SLOF' (git://git.qemu.org/SLOF.git) registered for path 
'roms/SLOF'
Submodule 'roms/ipxe' (git://git.qemu.org/ipxe.git) registered for path 
'roms/ipxe'
Submodule 'roms/seabios' (git://git.qemu.org/seabios.git/) registered for path 
'roms/seabios'
Submodule 'roms/vgabios' (git://git.qemu.org/vgabios.git/) registered for path 
'roms/vgabios'
agraf@lychee:/home/agraf/release/qemu> cd roms/SLOF/
agraf@lychee:/home/agraf/release/qemu/roms/SLOF> l
total 8
drwxr-xr-x 2 agraf suse 4096 2011-04-01 18:34 ./
drwxr-xr-x 7 agraf suse 4096 2011-05-01 00:42 ../
agraf@lychee:/home/agraf/release/qemu/roms/SLOF> git checkout master
Switched to branch "master"

... which switches to "master" on my qemu git tree.


Alex

Re: [Qemu-devel] [PATCH] Add astyle option file for code formatting

2011-09-19 Thread Peter Maydell

On 19 September 2011 09:01, Markus Armbruster  wrote:
> Anthony Liguori  writes:
>> FWIW, this is what I use with emacs:
>
> Here's my configuration (excuse liberal use of sledgehammer):

Anybody seeking yet a third emacs configuration can try

https://wiki.linaro.org/PeterMaydell/QemuEmacsStyle

:-)

-- PMM

Re: [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)

2011-09-19 Thread David Gilbert

On 18 September 2011 16:13, Stefan Weil  wrote:
> Am 18.09.2011 17:02, schrieb Mulyadi Santosa:
>>
>> Hi :)
>>
>> On Sun, Sep 18, 2011 at 02:59, Stefan Weil  wrote:
>>>
>>> Hello,
>>>
>>> these patches add a new code generator (TCG target) to qemu.
>>
>> I personally congrats you for your hard work. So, here's a question
>> from who are not so keen with Qemu internals: what is the biggest
>> advantage of using TCI instead of directly using TCG?
>
> TCG with native code support is much faster (6x to 10x),
> so for emulation on a supported host, TCI has no advantage
> for normal users.

Is it possible to dynamically switch between the two?

The two cases I'm thinking of are:
  1) Using the interpreter to execute one or two instructions in an exception
handling case
  2) Avoiding TCG code generation on the first few runs of a piece of
code that might only be init code, and only bothering with TCG for hotter
code.

Dave

Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode


On 17.09.2011, at 23:40, Blue Swirl wrote:

> On Thu, Sep 15, 2011 at 11:31 AM, Avi Kivity  wrote:
>> On 09/15/2011 01:01 PM, Benjamin Herrenschmidt wrote:
>>> 
  Sure :). So the problem is that when emulating the G3 Beige machine in
  QEMU (default ppc32 target) we also add a PCI VGA adapter. Apparently,
  on x86 that PCI VGA adapter can map the special VGA regions to
  somewhere, namely 0xa. With the memory api overhaul, this also
  slipped into the PPC world where mapping 0xa with VGA adapters is
  a pretty bad idea, as it's occupied by RAM.
 
  Now the discussion was on which level that mapping would happen and
  which devices go through which buses which then would filter certain
  ranges from being mapped. Basically, which way does a memory request
  from the CPU go on a G3 Beige machine until it arrives the VGA
  adapter?
 
  I hope that concludes the actual question. Avi, if I explained this
  wrong, please correct me.
>>> 
>>> Ok so there's several things here.
>>> 
>>> First, the mapping from CPU addresses to PCI addresses. This depends on
>>> the host bridge chip. The MPC106, used in the Beige G3, itself supports
>>> different type of mappings.
>>> 
>>>  From memory, the way it's configured in a G3 is to have a 1:1 mapping of
>>> 8000 CPU to 8000 PCI.
>>> 
>>> That means that with this basic mapping, you cannot generate memory
>>> accesses to low PCI addresses such as 0xa.
>> 
>> Alex, what this means (I think is) that: pci_grackle_init() needs to create
>> a container memory region and pass it to pc_register_bus() as the pci
>> address space, and create and alias starting at 0x8000 of the pci
>> address space, and map that alias at address 0x8000 of the system
>> address space.
>> 
>> See pc_init1() creating pci_memory and passing it to i440fx_init(), which
>> then maps some aliases into the system address space and also gives it to
>> pci_bus_new().  It's essentially the same thing with different details.
> 
> I think the attached patch (on top of ppc-next) should do it, but it
> doesn't. Only the top area of the screen is shown, the rest is black.

Without your patch:

(qemu) info mtree 
memory
-fffe : system
  800a-800a : vga.chain4
  8088-808f : macio
0006-0007 : macio-nvram
0002-00020fff : pmac-ide
00016000-00017fff : cuda
00013000-0001303f : escc-bar
8000-8fff : dbdma
-0fff : heathrow-pic
  8080-8080 : vga.rom
  8000-807f : vga.vram
  800a-800b : vga-lowmem
  80013000-8001303f : escc
  fee0-fee00fff : pci-data-idx
  fec0-fec00fff : pci-conf-idx
  fe00-fe1f : isa-mmio


With your patch:

(qemu) info mtree 
memory
-fffe : system
  80013000-8001303f : escc
  fee0-fee00fff : pci-data-idx
  fec0-fec00fff : pci-conf-idx
  8000-fdff : pci-hole
  fe00-fe1f : isa-mmio



Since the VRAM is mapped to 0x8000 which is now occupied by the hole and 
nothing behind it, nothing gets to write there? Not sure - I still haven't 
understood how the memory api works.


Alex

Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode


On 09/19/2011 12:15 PM, Alexander Graf wrote:

On 17.09.2011, at 23:40, Blue Swirl wrote:

>  On Thu, Sep 15, 2011 at 11:31 AM, Avi Kivity  wrote:
>>  On 09/15/2011 01:01 PM, Benjamin Herrenschmidt wrote:
>>>
   Sure :). So the problem is that when emulating the G3 Beige machine in
   QEMU (default ppc32 target) we also add a PCI VGA adapter. Apparently,
   on x86 that PCI VGA adapter can map the special VGA regions to
   somewhere, namely 0xa. With the memory api overhaul, this also
   slipped into the PPC world where mapping 0xa with VGA adapters is
   a pretty bad idea, as it's occupied by RAM.

   Now the discussion was on which level that mapping would happen and
   which devices go through which buses which then would filter certain
   ranges from being mapped. Basically, which way does a memory request
   from the CPU go on a G3 Beige machine until it arrives the VGA
   adapter?

   I hope that concludes the actual question. Avi, if I explained this
   wrong, please correct me.
>>>
>>>  Ok so there's several things here.
>>>
>>>  First, the mapping from CPU addresses to PCI addresses. This depends on
>>>  the host bridge chip. The MPC106, used in the Beige G3, itself supports
>>>  different type of mappings.
>>>
>>>From memory, the way it's configured in a G3 is to have a 1:1 mapping of
>>>  8000 CPU to 8000 PCI.
>>>
>>>  That means that with this basic mapping, you cannot generate memory
>>>  accesses to low PCI addresses such as 0xa.
>>
>>  Alex, what this means (I think is) that: pci_grackle_init() needs to create
>>  a container memory region and pass it to pc_register_bus() as the pci
>>  address space, and create and alias starting at 0x8000 of the pci
>>  address space, and map that alias at address 0x8000 of the system
>>  address space.
>>
>>  See pc_init1() creating pci_memory and passing it to i440fx_init(), which
>>  then maps some aliases into the system address space and also gives it to
>>  pci_bus_new().  It's essentially the same thing with different details.
>
>  I think the attached patch (on top of ppc-next) should do it, but it
>  doesn't. Only the top area of the screen is shown, the rest is black.

Without your patch:

(qemu) info mtree
memory
-fffe : system
   800a-800a : vga.chain4
This is here due to the isa_mem_base problem - I think we can get that 
part of the patch merged?



   8088-808f : macio
 0006-0007 : macio-nvram
 0002-00020fff : pmac-ide
 00016000-00017fff : cuda
 00013000-0001303f : escc-bar
 8000-8fff : dbdma
 -0fff : heathrow-pic
   8080-8080 : vga.rom
   8000-807f : vga.vram
   800a-800b : vga-lowmem
   80013000-8001303f : escc
   fee0-fee00fff : pci-data-idx
   fec0-fec00fff : pci-conf-idx
   fe00-fe1f : isa-mmio


With your patch:

(qemu) info mtree
memory
-fffe : system
   80013000-8001303f : escc
   fee0-fee00fff : pci-data-idx
   fec0-fec00fff : pci-conf-idx
   8000-fdff : pci-hole
   fe00-fe1f : isa-mmio



Since the VRAM is mapped to 0x8000 which is now occupied by the hole and 
nothing behind it, nothing gets to write there?


The memory printer doesn't follow aliases, so this is incomplete.


Not sure - I still haven't understood how the memory api works.



Thanks for reminding me to write up slides for tomorrow's code overview 
session.  I'll start with a general explanation.


--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] Memory API code review

2011-09-19 Thread Peter Maydell

On 14 September 2011 16:07, Avi Kivity  wrote:
> I would like to carry out an online code review of the memory API so that
> more people are familiar with the internals, and perhaps even to catch some
> bugs or deficiency.  I'd like to use the next kvm conference call slot for
> this (Tuesday 1400 UTC) since many people already have it reserved in the
> schedule.
>
> It would be great if people from the wider qemu community be present, rather
> than the usual "x86 is everything" crowd (+Jan) that usually participates in
> the kvm weekly call.

I can dial in if that's useful -- are the dialin details available somewhere?

thanks
-- PMM

Re: [Qemu-devel] windows XP fail to enter standby mode

2011-09-19 Thread hkran


On 09/18/2011 03:53 PM, Alon Levy wrote:

On Fri, Sep 16, 2011 at 02:28:05PM +0800, hkran wrote:

On 09/15/2011 06:42 PM, Alon Levy wrote:

On Thu, Sep 15, 2011 at 05:23:01PM +0800, hkran wrote:

Hi,

If I select to let my guest XP enter standby mode by clicking the button
manually. qemu will exit with the following message left:

ioport_write: PANIC d->guest_slots[val].active failed

the following is my command to start qemu:

/home/huikai/qemu15/bin/qemu --enable-kvm  -m 768  -drive
file=/home/huikai/winxp_dev.img,if=virtio  -net nic,model=virtio -net
user -usb -usbdevice tablet -vga qxl  -localtime  -device virtio-serial
-chardev spicevmc,name=vdagent,id=vdagent -device
virtserialport,chardev=vdagent,name=spice0 -spice
port=1234,disable-ticketing -monitor telnet:localhost:12341,server,nowait

Qemu is 0.15v
gxl driver 
isqxl-0.10-20112808.zip
from http://spice-space.org/download.html

What bios are you using? Is there a reset event? can you paste the complete log 
before the PANIC?

Thanks,
Alon


BIOS:
(qemu) info roms
addr=fffe size=0x02 mem=rom name="bios.bin"

This just shows you have a bios, but where is that file from? i.e. is
it compiled by you, or from an rpm, if so which and what version?


the entire log:

[root@oc0100708617 ~]# /home/huikai/qemu15/bin/qemu  --enable-kvm
-m 768  -drive file=/home/huikai/winxp_dev.img,if=virtio  -net
nic,model=virtio -net user -usb -usbdevice tablet  -localtime -vga
qxl -device virtio-serial -chardev spicevmc,name=vdagent,id=vdagent
-device virtserialport,chardev=vdagent,name=spice0 -spice
port=1234,disable-ticketing   -monitor
telnet:localhost:12341,server,nowait -cdrom
/home/huikai/iso/GRMWDK_EN_7600_1.ISO
do_spice_init: starting 0.8.0
spice_server_add_interface: SPICE_INTERFACE_KEYBOARD
spice_server_add_interface: SPICE_INTERFACE_MOUSE
spice_server_add_interface: SPICE_INTERFACE_QXL
red_worker_main: begin
handle_dev_destroy_surfaces:
handle_dev_destroy_surfaces:
handle_dev_input: start
spice_server_add_interface: SPICE_INTERFACE_TABLET
handle_dev_destroy_surfaces:
reds_handle_main_link:
reds_show_new_channel: channel 1:0, connected successfully, over Non
Secure link
handle_dev_input: mouse mode 2
reds_main_handle_message: net test: latency 1.043000 ms, bitrate
3413 bps (325520.83 Mbps)
reds_show_new_channel: channel 4:0, connected successfully, over Non
Secure link
red_dispatcher_set_cursor_peer:
handle_dev_input: cursor connect
reds_show_new_channel: channel 2:0, connected successfully, over Non
Secure link
red_dispatcher_set_peer:
handle_dev_input: connect
handle_new_display_channel: jpeg disabled
handle_new_display_channel: zlib-over-glz disabled
reds_show_new_channel: channel 3:0, connected successfully, over Non
Secure link
inputs_link:
handle_dev_destroy_surfaces:
handle_dev_destroy_surfaces:
ioport_write: PANIC d->guest_slots[val].active failed

As for reset event, Alon, Do you know how to tell a reset event
posted in qemu?
In addition, if I use the default -vga std, qemu will not exit when
VM entering standby.It seems that there is something wrong about qxl
driver.

You can enable bios debug, that should show it. Also, you can do a reset (via
guest for instance) and see the qemu messages you get from it, and then compare.

To enable bios debugging add the following to the command line for upstream 
qemu:
  -chardev stdio,id=muxstdio,mux=on -mon chardev=muxstdio,mode=readline
  -device isa-debugcon,iobase=0x402,chardev=muxstdio

This assumes you don't mind/want to have the monitor on stdio, if you use a qmp 
or hmp
on something else the stdio (unix domain socket / tcp) then you can just forgoe
the whole mux part.

For rhel6 that device is not backported, but you can still get bios debugging 
information
by uncommenting theline "//#define DEBUG_BIOS" in hw/pc.c

Also you can just enable debugging for the qxl device, for the primary (-vga 
qxl) you need to add:
-global qxl-vga.debug=1

For instance, do system_reset from monitor after running:
qemu -vga qxl -monitor stdio -global qxl-vga.debug=1,

I don't see a reset happening. You should see something like:

qxl-0: qxl_hard_reset: start
qxl-0: qxl_reset_surfaces:
qxl-0: qxl_reset_memslots:
qxl-0: qxl_soft_reset:
qxl-0: qxl_enter_vga_mode
qxl-0: qxl_hard_reset: done

(It repeats a few times - system_reset actually calls reset more then once on 
each device).


___
kvm-cstl mailing list
To unsubscribe from the list, change your list options
or if you have forgotten your list password visit:
http://lists.linux.ibm.com/mailman/listinfo/kvm-cstl




The bios I am using is shipped with Qemu.
I got qemu from http://wiki.qemu.org/download/qemu-0.15.0.tar.gz
and configure and install it.
After I enable bios-debug, I saw this:

Start bios (version pre-0.6.3-20110315_112143-titi)

Is that the BIOS version we wanted?

The following is the l

Re: [Qemu-devel] [PATCH v2 03/15] sheepdog: move coroutine send/recv function to generic code


On 09/19/2011 09:47 AM, Kevin Wolf wrote:

As long as it's clear what the current version is, I don't mind. Do I
understand right that there will be a v3 for patches 3 and 4?


Yes.

Paolo

Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode


On 19.09.2011, at 11:22, Avi Kivity wrote:

> On 09/19/2011 12:15 PM, Alexander Graf wrote:
>> On 17.09.2011, at 23:40, Blue Swirl wrote:
>> 
>> >  On Thu, Sep 15, 2011 at 11:31 AM, Avi Kivity  wrote:
>> >>  On 09/15/2011 01:01 PM, Benjamin Herrenschmidt wrote:
>> >>>
>>    Sure :). So the problem is that when emulating the G3 Beige machine in
>>    QEMU (default ppc32 target) we also add a PCI VGA adapter. Apparently,
>>    on x86 that PCI VGA adapter can map the special VGA regions to
>>    somewhere, namely 0xa. With the memory api overhaul, this also
>>    slipped into the PPC world where mapping 0xa with VGA adapters is
>>    a pretty bad idea, as it's occupied by RAM.
>> 
>>    Now the discussion was on which level that mapping would happen and
>>    which devices go through which buses which then would filter certain
>>    ranges from being mapped. Basically, which way does a memory request
>>    from the CPU go on a G3 Beige machine until it arrives the VGA
>>    adapter?
>> 
>>    I hope that concludes the actual question. Avi, if I explained this
>>    wrong, please correct me.
>> >>>
>> >>>  Ok so there's several things here.
>> >>>
>> >>>  First, the mapping from CPU addresses to PCI addresses. This depends on
>> >>>  the host bridge chip. The MPC106, used in the Beige G3, itself supports
>> >>>  different type of mappings.
>> >>>
>> >>>From memory, the way it's configured in a G3 is to have a 1:1 mapping 
>> >>> of
>> >>>  8000 CPU to 8000 PCI.
>> >>>
>> >>>  That means that with this basic mapping, you cannot generate memory
>> >>>  accesses to low PCI addresses such as 0xa.
>> >>
>> >>  Alex, what this means (I think is) that: pci_grackle_init() needs to 
>> >> create
>> >>  a container memory region and pass it to pc_register_bus() as the pci
>> >>  address space, and create and alias starting at 0x8000 of the pci
>> >>  address space, and map that alias at address 0x8000 of the system
>> >>  address space.
>> >>
>> >>  See pc_init1() creating pci_memory and passing it to i440fx_init(), which
>> >>  then maps some aliases into the system address space and also gives it to
>> >>  pci_bus_new().  It's essentially the same thing with different details.
>> >
>> >  I think the attached patch (on top of ppc-next) should do it, but it
>> >  doesn't. Only the top area of the screen is shown, the rest is black.
>> 
>> Without your patch:
>> 
>> (qemu) info mtree
>> memory
>> -fffe : system
>>   800a-800a : vga.chain4
> This is here due to the isa_mem_base problem - I think we can get that part 
> of the patch merged?
> 
>>   8088-808f : macio
>> 0006-0007 : macio-nvram
>> 0002-00020fff : pmac-ide
>> 00016000-00017fff : cuda
>> 00013000-0001303f : escc-bar
>> 8000-8fff : dbdma
>> -0fff : heathrow-pic
>>   8080-8080 : vga.rom
>>   8000-807f : vga.vram
>>   800a-800b : vga-lowmem

Well what I don't understand is how this corresponds to the lowmem region here. 
If I simply #if 0 the chain4 code out, everything works as expected.

>>   80013000-8001303f : escc
>>   fee0-fee00fff : pci-data-idx
>>   fec0-fec00fff : pci-conf-idx
>>   fe00-fe1f : isa-mmio
>> 
>> 
>> With your patch:
>> 
>> (qemu) info mtree
>> memory
>> -fffe : system
>>   80013000-8001303f : escc
>>   fee0-fee00fff : pci-data-idx
>>   fec0-fec00fff : pci-conf-idx
>>   8000-fdff : pci-hole
>>   fe00-fe1f : isa-mmio
>> 
>> 
>> 
>> Since the VRAM is mapped to 0x8000 which is now occupied by the hole and 
>> nothing behind it, nothing gets to write there?
> 
> The memory printer doesn't follow aliases, so this is incomplete.

Ugh :(.

> 
>> Not sure - I still haven't understood how the memory api works.
>> 
> 
> Thanks for reminding me to write up slides for tomorrow's code overview 
> session.  I'll start with a general explanation.

What code overview session?


Alex

[Qemu-devel] [Bug 739785] Re: qemu-i386 user mode on ARMv5 host fails (bash: fork: Invalid argument)

2011-09-19 Thread Justin Shafer

I would like to run wine. I have compiled qemu 0.15.5 from git and I
noticed we can enable nptl without a patch. I can run an x86 nptl bash
but I havent tried to chroot. I have the posix wine running with qemu...
So i tried it with wine and I am able to get a --version out of it, but
when I actually run it, it does a fork. It will ask for library files
and I will stick them in /usr/gnemul/qemu-i386/usr/lib one at a time and
everntually it stops asking for files and just errors out.

Tried 0.9.22, 1.01 and 1.21... Stole files from Ubuntu 5, Ubuntu 9,
etc.

root@localhost:/wine/usr/lib# wine-pthread
wine-pthread
qemu: Unsupported syscall: 240
Usage: wine PROGRAM [ARGUMENTS...] Run the specified program
wine --help Display this help and exit
wine --versionOutput version information and exit

root@localhost:/wine/usr/lib# wine-pthread notepad.exe
wine-pthread notepad.exe
qemu: Unsupported syscall: 240
qemu: Unsupported syscall: 240
wine: fork : Invalid argument

On Posix it does an unsuportive syscall of 254 but works. I have seen it
do a 240 though, but thats when its asking for library files.. I cant
help but wonder if its needing a file, though not asking me.

--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/739785

Title:
qemu-i386 user mode on ARMv5 host fails (bash: fork: Invalid argument)

Status in QEMU:
New

Bug description:
Good time of day everybody,

I have been trying to make usermode qemu on ARM with plugapps
(archlinux) with archlinux i386 chroot to work.

1. I installed arch linux in a virtuabox and created a chroot for it with
mkarchroot. Transferred it to my pogo plug into /i386/
2. I comiled qemu-i386 static and put it into /i386/usr/bin/
./configure --static --disable-blobs --disable-system
--target-list=i386-linux-user
make

3. I also compiled linux kernel 2.6.38 with CONFIG_BINFMT_MISC=y and
installed it.
uname -a
Linux Plugbox 2.6.38 #4 PREEMPT Fri Mar 18 22:19:10 CDT 2011 armv5tel
Feroceon 88FR131 rev 1 (v5l) Marvell SheevaPlug Reference Board GNU/Linux

4. Added the following options into /etc/rc.local
/sbin/modprobe binfmt_misc
/bin/mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc
echo
':qemu-i386:M::\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff\xff:/usr/bin/qemu-i386:'
>/proc/sys/fs/binfmt_misc/register

5. Also copied ld-linux.so.3 (actually ld-2.13.so because ld-
linux.so.3 is a link to that file) from /lib/ to /i386/lib/

6.Now i chroot into /i386 and I get this:
[root@Plugbox i386]# chroot .
[II aI hnve ao n@P /]# pacman -Suy
bash: fork: Invalid argument

7.I also downloaded linux-user-test-0.3 from qemu website and ran the test:
[root@Plugbox linux-user-test-0.3]# make
./qemu-linux-user.sh
[qemu-i386]
../qemu-0.14.0/i386-linux-user/qemu-i386 -L ./gnemul/qemu-i386 i386/ls -l
dummyfile
BUG IN DYNAMIC LINKER ld.so: dl-version.c: 210: _dl_check_map_versions:
Assertion `needed != ((void *)0)' failed!
make: *** [test] Error 127

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/739785/+subscriptions

Re: [Qemu-devel] [net-next RFC V2 PATCH 0/5] Multiqueue support in tun/tap

2011-09-19 Thread Jason Wang


On 09/18/2011 03:17 AM, Michael S. Tsirkin wrote:

On Sat, Sep 17, 2011 at 02:02:04PM +0800, Jason Wang wrote:

A wiki-page was created to narrate the detail design of all parts
involved in the multi queue implementation:
http://www.linux-kvm.org/page/Multiqueue and some basic tests result
could be seen in this page
http://www.linux-kvm.org/page/Multiqueue-performance-Sep-13. I would
post the detail numbers in attachment as the reply of this thread.

Does it make sense to test both with and without RPS in guest?


I've tested with RPS in guest, but didn't see improvements.

Re: [Qemu-devel] [PATCH v3 5/6] vga: Use linear mapping + dirty logging in chain 4 memory access mode

On 09/19/2011 12:36 PM, Alexander Graf wrote:

>
>>8088-808f : macio
>>  0006-0007 : macio-nvram
>>  0002-00020fff : pmac-ide
>>  00016000-00017fff : cuda
>>  00013000-0001303f : escc-bar
>>  8000-8fff : dbdma
>>  -0fff : heathrow-pic
>>8080-8080 : vga.rom
>>8000-807f : vga.vram
>>800a-800b : vga-lowmem

Well what I don't understand is how this corresponds to the lowmem region here. 
If I simply #if 0 the chain4 code out, everything works as expected.

It's at the wrong address.  It should be at 000a, but not be visible 
since it's outside the pci hole.  IIUC.

>
>>  Not sure - I still haven't understood how the memory api works.
>>
>
>  Thanks for reminding me to write up slides for tomorrow's code overview 
session.  I'll start with a general explanation.

What code overview session?

At the usual kvm weekly time & place, I'll do a walkthrough of the 
memory API theory and practice.

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH v8 3/4] block: add block timer and throttling algorithm

2011-09-19 Thread Zhi Yong Wu

On Wed, Sep 14, 2011 at 6:50 PM, Marcelo Tosatti  wrote:
> On Tue, Sep 13, 2011 at 11:09:46AM +0800, Zhi Yong Wu wrote:
>> On Fri, Sep 9, 2011 at 10:44 PM, Marcelo Tosatti  wrote:
>> > On Thu, Sep 08, 2011 at 06:11:07PM +0800, Zhi Yong Wu wrote:
>> >> Note:
>> >>      1.) When bps/iops limits are specified to a small value such as 511 
>> >> bytes/s, this VM will hang up. We are considering how to handle this 
>> >> senario.
>> >
>> > You can increase the length of the slice, if the request is larger than
>> > slice_time * bps_limit.
>> Yeah, but it is a challenge for how to increase it. Do you have some nice 
>> idea?
>
> If the queue is empty, and the request being processed does not fit the
> queue, increase the slice so that the request fits.
Sorry for late reply. actually, do you think that this scenario is
meaningful for the user?
Since we implement this, if the user limits the bps below 512
bytes/second, the VM can also not run every task.
Can you let us know why we need to make such effort?

>
> That is, make BLOCK_IO_SLICE_TIME dynamic and adjust it as described
> above (if the bps or io limits change, reset it to the default
> BLOCK_IO_SLICE_TIME).
>
>> >>      2.) When "dd" command is issued in guest, if its option bs is set to 
>> >> a large value such as "bs=1024K", the result speed will slightly bigger 
>> >> than the limits.
>> >
>> > Why?
>> This issue has not existed. I will remove it.
>> When drive bps=100, i did some testings on guest VM.
>> 1.) bs=1024K
>> 18+0 records in
>> 18+0 records out
>> 18874368 bytes (19 MB) copied, 26.6268 s, 709 kB/s
>> 2.) bs=2048K
>> 18+0 records in
>> 18+0 records out
>> 37748736 bytes (38 MB) copied, 46.5336 s, 811 kB/s
>>
>> >
>> > There is lots of debugging leftovers in the patch.
>> sorry, i forgot to remove them.
>> >
>> >
>
>



-- 
Regards,

Zhi Yong Wu

Re: [Qemu-devel] [PATCH v2 0/5] block: preparatory patches for scatter/gather support

Am 16.09.2011 16:39, schrieb Paolo Bonzini:
> These patches are preparatory work for supporting scatter/gather in
> the SCSI subsystem.  Since there would be no HBA actually using it,
> I am just posting the cleanups, and the fix for CVE-2011-3346 (buffer
> overflow in the handling of READ CAPACITY 16) that comes for free
> with the last patch.
> 
> v1->v2: made to_dev a bool; fixes in patch 3
> 
> Paolo Bonzini (5):
>   dma-helpers: rename is_write to to_dev
>   dma-helpers: allow including from target-independent code
>   dma-helpers: rewrite completion/cancellation
>   scsi-disk: commonize iovec creation between reads and writes
>   scsi-disk: lazily allocate bounce buffer

Thanks, applied all to the block branch.

Kevin

Re: [Qemu-devel] [PATCH] VMDK: fix leak of extent_file

Am 19.09.2011 04:26, schrieb Fam Zheng:
> Release extent_file on error in vmdk_parse_extents. Added closing files
> in freeing extents.
> 
> Signed-off-by: Fam Zheng 

Thanks, applied to the block branch.

Kevin

Re: [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)

On Mon, Sep 19, 2011 at 9:40 AM, David Gilbert  wrote:
> On 18 September 2011 16:13, Stefan Weil  wrote:
>> Am 18.09.2011 17:02, schrieb Mulyadi Santosa:
>>>
>>> Hi :)
>>>
>>> On Sun, Sep 18, 2011 at 02:59, Stefan Weil  wrote:

 Hello,

 these patches add a new code generator (TCG target) to qemu.
>>>
>>> I personally congrats you for your hard work. So, here's a question
>>> from who are not so keen with Qemu internals: what is the biggest
>>> advantage of using TCI instead of directly using TCG?
>>
>> TCG with native code support is much faster (6x to 10x),
>> so for emulation on a supported host, TCI has no advantage
>> for normal users.
>
> Is it possible to dynamically switch between the two?
>
> The two cases I'm thinking of are:
>  1) Using the interpreter to execute one or two instructions in an exception
> handling case
>  2) Avoiding TCG code generation on the first few runs of a piece of
> code that might only be init code, and only bothering with TCG for hotter
> code.

The tricky thing with using the interpeter for lesser run code is that
it has a bunch of machinery in front of it which probably makes it
relatively similar to actually emitting native code.  The interesting
benchmark would be to translate blocks but never cache them for future
executions - compare this with TCI to see how much difference there is
between executing with interpretation vs translation.  If the
interpreter is almost as expensive as the translator then it's not
worth it.

Stefan

Re: [Qemu-devel] [PATCH] removed usused offset variable

Am 16.09.2011 13:34, schrieb Frediano Ziglio:
> Signed-off-by: Frediano Ziglio 
> ---
>  posix-aio-compat.c |5 ++---
>  1 files changed, 2 insertions(+), 3 deletions(-)

Thanks, applied to the block branch.

Kevin

Re: [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)

2011-09-19 Thread David Gilbert

On 19 September 2011 11:20, Stefan Hajnoczi  wrote:
> On Mon, Sep 19, 2011 at 9:40 AM, David Gilbert  
> wrote:



>> Is it possible to dynamically switch between the two?
>>
>> The two cases I'm thinking of are:
>>  1) Using the interpreter to execute one or two instructions in an exception
>> handling case
>>  2) Avoiding TCG code generation on the first few runs of a piece of
>> code that might only be init code, and only bothering with TCG for hotter
>> code.
>
> The tricky thing with using the interpeter for lesser run code is that
> it has a bunch of machinery in front of it which probably makes it
> relatively similar to actually emitting native code.  The interesting
> benchmark would be to translate blocks but never cache them for future
> executions - compare this with TCI to see how much difference there is
> between executing with interpretation vs translation.  If the
> interpreter is almost as expensive as the translator then it's not
> worth it.

Right; the trick is if you have a passably fast interpreter you can afford
to do some more expensive optimisations in the code generator which would
be interesting.  It's not unusual to find an awful lot of executed once-or-twice
code.

Dave

[Qemu-devel] Question on block chaining

2011-09-19 Thread 陳韋任

Hi, all

  I am tracing how block linking is done in QEMU. I find
there is a comment in struct TranslationBlock says,

/* list of TBs jumping to this one. This is a circular list using
   the two least significant bits of the pointers to tell what is
   the next pointer: 0 = jmp_next[0], 1 = jmp_next[1], 2 =
   jmp_first */
struct TranslationBlock *jmp_next[2];
struct TranslationBlock *jmp_first;

But after tracing the code, I think the comment might be
wrong. For example, if we want to link tb1 to tb2, i.e.,
tb1 -> tb2. Then roughly speaking, tb1->jmp_next[n] should
be tb2, and tb2->jmp_first should be tb1. So the comment
"list of TBs jumping to this one" looks weird to me.
Do I misunderstand how the block chaining is done?

  Thanks!

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667

Re: [Qemu-devel] AHCI Port Interrupt Enable register cleaning on soft reset

Am 12.09.2011 10:19, schrieb Alexander Motin:
> Alexander Graf wrote:
>> Am 11.09.2011 um 16:43 schrieb Alexander Motin :
>>> I've found that FreeBSD AHCI driver doesn't work with AHCI hardware
>>> emulation of QEMU 0.15.0. I believe the problem is on QEMU's side. As I
>>> see, it clears port's Interrupt Enable register each time when reset of
>>> any level happens. Is is reasonable for the global controller reset. It
>>> is probably not good, but acceptable for FreeBSD driver for the port
>>> hard reset. But it is IMO wrong for the device soft reset. None of real
>>> hardware I know behaves that way.
>>>
>>> This patch fixes the problem for me:
>>> http://people.freebsd.org/~mav/qemu.ahci.patch
>>
>> Ah, cool! So FreeBSD works with AHCI using this patch? 
> 
> Yes. I haven't done deep testing to guarantee there is no other issues,
> but at least disk is properly detected now.
> 
>> Please send it again as an inline patch (if really really hard not 100% 
>> important) and add a signed-off-by line (very important) to the patch.
> 
> OK. Here it is:
> 
> Signed-off-by: Alexander Motin 
> 
> --- hw/ide/ahci.c.prev2011-09-11 16:39:53.0 +0300
> +++ hw/ide/ahci.c 2011-09-11 16:39:48.0 +0300
> @@ -505,10 +505,7 @@ static void ahci_reset_port(AHCIState *s
>  ide_bus_reset(&d->port);
>  ide_state->ncq_queues = AHCI_MAX_CMDS;
> 
> -pr->irq_stat = 0;
> -pr->irq_mask = 0;
>  pr->scr_stat = 0;
> -pr->scr_ctl = 0;
>  pr->scr_err = 0;
>  pr->scr_act = 0;
>  d->busy_slot = -1;
> @@ -1157,12 +1154,17 @@ void ahci_uninit(AHCIState *s)
>  void ahci_reset(void *opaque)
>  {
>  struct AHCIPCIState *d = opaque;
> +AHCIPortRegs *pr;
>  int i;
> 
>  d->ahci.control_regs.irqstatus = 0;
>  d->ahci.control_regs.ghc = 0;
> 
>  for (i = 0; i < d->ahci.ports; i++) {
> +pr = &d->ahci.dev[i].port_regs;
> +pr->irq_stat = 0;
> +pr->irq_mask = 0;
> +pr->scr_ctl = 0;
>  ahci_reset_port(&d->ahci, i);
>  }
>  }
> 

Thanks, applied to the block branch.

Kevin

Re: [Qemu-devel] [PATCH 1/1] qemu-img: async write to block device when converting image

Am 12.09.2011 16:26, schrieb Yehuda Sadeh:
> In order to improve image conversion process, instead of synchronously
> writing the destingation image, we keep a window of async writes.
> 
> Signed-off-by: Yehuda Sadeh 

Hm, now that I'm really looking at the code, it seems that I
misunderstood why you're doing here. With this patch applied, qemu-img
will synchronously read in its buffer from the source file and then
write it out in multiple parallel AIO requests. The next buffer is read
in only when all AIO requests have completed.

What I thought it was is that the whole thing is async, including
reading new buffers from the source file while other AIO requests are
still writing. I think that would be much more useful.

> ---
>  qemu-img.c |   47 +++
>  1 files changed, 43 insertions(+), 4 deletions(-)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index 6a39731..a45f5f2 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -646,6 +646,29 @@ static int compare_sectors(const uint8_t *buf1, const 
> uint8_t *buf2, int n,
>  }
>  
>  #define IO_BUF_SIZE (2 * 1024 * 1024)
> +#define IO_WRITE_WINDOW_THRESHOLD (32 * 1024 * 1024)
> +
> +static int write_window = 0;
> +static int write_ret = 0;
> +
> +struct write_info {
> +int64_t sector;
> +QEMUIOVector qiov;
> +};
> +
> +static void img_write_cb(void *opaque, int ret)
> +{
> +struct write_info *wr = (struct write_info *)opaque;
> +QEMUIOVector *qiov = &wr->qiov;
> +if (ret < 0) {
> +error_report("error while writing sector %" PRId64
> + ": %s", wr->sector, strerror(-ret));
> +write_ret = ret;
> +}
> +write_window -=  qiov->iov->iov_len / 512;

I would rather use bytes as unit for write window (especially as
IO_WRITE_WINDOW_THRESHOLD is in bytes).

> +qemu_iovec_destroy(qiov);
> +g_free(wr);
> +}
>  
>  static int img_convert(int argc, char **argv)
>  {
> @@ -1019,6 +1042,9 @@ static int img_convert(int argc, char **argv)
> should add a specific call to have the info to go faster */
>  buf1 = buf;
>  while (n > 0) {
> +while (write_window > IO_WRITE_WINDOW_THRESHOLD / 512) {
> +qemu_aio_wait();
> +}
>  /* If the output image is being created as a copy on write 
> image,
> copy all sectors even the ones containing only NUL bytes,
> because they may differ from the sectors in the base 
> image.
> @@ -1028,12 +1054,21 @@ static int img_convert(int argc, char **argv)
> already there is garbage, not 0s. */
>  if (!has_zero_init || out_baseimg ||
>  is_allocated_sectors_min(buf1, n, &n1, min_sparse)) {
> -ret = bdrv_write(out_bs, sector_num, buf1, n1);
> -if (ret < 0) {
> -error_report("error while writing sector %" PRId64
> - ": %s", sector_num, strerror(-ret));
> +QEMUIOVector *qiov;
> +struct write_info *wr;
> +BlockDriverAIOCB *acb;
> +wr = g_malloc0(sizeof(struct write_info));
> +qiov = &wr->qiov;
> +qemu_iovec_init(qiov, 1);
> +qemu_iovec_add(qiov, (void *)buf1, n1 * 512);

Casts to void* are unnecessary.

You could use qemu_iovec_init_external in order to avoid the malloc here
(cf. bdrv_read)...

> +wr->sector = sector_num;
> +acb = bdrv_aio_writev(out_bs, sector_num, qiov, n1, 
> img_write_cb, wr);
> +if (!acb) {

...then you wouldn't have the chance to forget qemu_iodev_destroy()
here. :-)

> +g_free(wr);
> +error_report("I/O error while writing sector %" 
> PRId64, sector_num);
>  goto out;
>  }
> +write_window += n1;
>  }
>  sector_num += n1;
>  n -= n1;
> @@ -1041,6 +1076,9 @@ static int img_convert(int argc, char **argv)
>  }
>  qemu_progress_print(local_progress, 100);
>  }
> +while (write_window > 0) {
> +qemu_aio_wait();
> +}
>  }
>  out:
>  qemu_progress_end();
> @@ -1048,6 +1086,7 @@ out:
>  free_option_parameters(param);
>  qemu_vfree(buf);
>  if (out_bs) {
> +bdrv_flush(out_bs);
>  bdrv_delete(out_bs);
>  }
>  if (bs) {

The bdrv_flush() looks unrelated to introducing AIO.

Kevin

Re: [Qemu-devel] [PATCH 14/58] device tree: add nop_node


On 17.09.2011, at 18:48, Blue Swirl wrote:

> On Wed, Sep 14, 2011 at 8:42 AM, Alexander Graf  wrote:
>> We have a qemu internal abstraction layer on FDT. While I'm not fully 
>> convinced
>> we need it at all, it's missing the nop_node functionality that we now need
>> on e500. So let's add it and think about the general future of that API 
>> later.
>> 
>> Signed-off-by: Alexander Graf 
>> ---
>>  device_tree.c |   11 +++
>>  device_tree.h |1 +
>>  2 files changed, 12 insertions(+), 0 deletions(-)
>> 
>> diff --git a/device_tree.c b/device_tree.c
>> index 3a224d1..23e89e3 100644
>> --- a/device_tree.c
>> +++ b/device_tree.c
>> @@ -107,3 +107,14 @@ int qemu_devtree_setprop_string(void *fdt, const char 
>> *node_path,
>> 
>> return fdt_setprop_string(fdt, offset, property, string);
>>  }
>> +
>> +int qemu_devtree_nop_node(void *fdt, const char *node_path)
>> +{
>> +int offset;
>> +
>> +offset = fdt_path_offset(fdt, node_path);
>> +if (offset < 0)
> 
> -EBRACES

That code gets changed right after that patch anyways. The version that 
actually ends up being visible to the user has proper braces :)


Alex

Re: [Qemu-devel] [PATCH 24/58] PPC: E500: Add PV spinning code

On 17.09.2011, at 19:40, Blue Swirl wrote:

> On Sat, Sep 17, 2011 at 5:15 PM, Alexander Graf  wrote:
>> 
>> Am 17.09.2011 um 18:58 schrieb Blue Swirl :
>> 
>>> On Wed, Sep 14, 2011 at 8:42 AM, Alexander Graf  wrote:
 CPUs that are not the boot CPU need to run in spinning code to check if 
 they
 should run off to execute and if so where to jump to. This usually happens
 by leaving secondary CPUs looping and checking if some variable in memory
 changed.

 In an environment like Qemu however we can be more clever. We can just 
 export
 the spin table the primary CPU modifies as MMIO region that would event 
 based
 wake up the respective secondary CPUs. That saves us quite some cycles 
 while
 the secondary CPUs are not up yet.

 So this patch adds a PV device that simply exports the spinning table into 
 the
 guest and thus allows the primary CPU to wake up secondary ones.
>>> 
>>> On Sparc32, there is no need for a PV device. The CPU is woken up from
>>> halted state with an IPI. Maybe you could use this approach?
>> 
>> The way it's done here is defined by u-boot and now also nailed down in the 
>> ePAPR architecture spec. While alternatives might be more appealing, this is 
>> how guests work today :).
> 
> OK. I hoped that there were no implementations yet. The header (btw
> missing) should point to the spec.

IIUC the spec that includes these bits is not finalized yet. It is however in 
use on all u-boot versions for e500 that I'm aware of and the method Linux uses 
to bring up secondary CPUs.

Stuart / Scott, do you have any pointers to documentation where the spinning is 
explained?

Alex

[Qemu-devel] [PATCH 0/2][RFC?] Remove SIGUSR2 from posix-aio

Now that iothread is always compiled sending a signal seems only an
additional step. This patch also avoid writing to two pipe (one from signal
and one in qemu_service_io).

Tested and works correctly with KVM enabled. Performances are only sligthly
better (as I expected). strace output is more readable.

Doubts:
- any sense having two patches and not only last one?
- comment is perhaps wrong as is also affect main core
- is ok if KVM is disabled? more testing required

Frediano Ziglio (2):
  block: avoid storing a constant in qemu_paiocb structure
  block: avoid SIGUSR2

 cpus.c |5 -
 posix-aio-compat.c |   17 -
 2 files changed, 4 insertions(+), 18 deletions(-)

[Qemu-devel] [PATCH 1/2] block: avoid storing a constant in qemu_paiocb structure


Signed-off-by: Frediano Ziglio 
---
 posix-aio-compat.c |5 +
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/posix-aio-compat.c b/posix-aio-compat.c
index 3193dbf..7ea63a1 100644
--- a/posix-aio-compat.c
+++ b/posix-aio-compat.c
@@ -42,7 +42,6 @@ struct qemu_paiocb {
 int aio_niov;
 size_t aio_nbytes;
 #define aio_ioctl_cmd   aio_nbytes /* for QEMU_AIO_IOCTL */
-int ev_signo;
 off_t aio_offset;
 
 QTAILQ_ENTRY(qemu_paiocb) node;
@@ -381,7 +380,7 @@ static void *aio_thread(void *unused)
 aiocb->ret = ret;
 mutex_unlock(&lock);
 
-if (kill(pid, aiocb->ev_signo)) die("kill failed");
+if (kill(pid, SIGUSR2)) die("kill failed");
 }
 
 cur_threads--;
@@ -623,7 +622,6 @@ BlockDriverAIOCB *paio_submit(BlockDriverState *bs, int fd,
 return NULL;
 acb->aio_type = type;
 acb->aio_fildes = fd;
-acb->ev_signo = SIGUSR2;
 
 if (qiov) {
 acb->aio_iov = qiov->iov;
@@ -651,7 +649,6 @@ BlockDriverAIOCB *paio_ioctl(BlockDriverState *bs, int fd,
 return NULL;
 acb->aio_type = QEMU_AIO_IOCTL;
 acb->aio_fildes = fd;
-acb->ev_signo = SIGUSR2;
 acb->aio_offset = 0;
 acb->aio_ioctl_buf = buf;
 acb->aio_ioctl_cmd = req;
-- 
1.7.1

[Qemu-devel] [PATCH 2/2] block: avoid SIGUSR2


Signed-off-by: Frediano Ziglio 
---
 cpus.c |5 -
 posix-aio-compat.c |   14 --
 2 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/cpus.c b/cpus.c
index 54c188c..d0cfe91 100644
--- a/cpus.c
+++ b/cpus.c
@@ -380,11 +380,6 @@ static int qemu_signal_init(void)
 int sigfd;
 sigset_t set;
 
-/* SIGUSR2 used by posix-aio-compat.c */
-sigemptyset(&set);
-sigaddset(&set, SIGUSR2);
-pthread_sigmask(SIG_UNBLOCK, &set, NULL);
-
 /*
  * SIG_IPI must be blocked in the main thread and must not be caught
  * by sigwait() in the signal thread. Otherwise, the cpu thread will
diff --git a/posix-aio-compat.c b/posix-aio-compat.c
index 7ea63a1..72c6dbb 100644
--- a/posix-aio-compat.c
+++ b/posix-aio-compat.c
@@ -308,6 +308,8 @@ static ssize_t handle_aiocb_rw(struct qemu_paiocb *aiocb)
 return nbytes;
 }
 
+static void posix_aio_notify_event(void);
+
 static void *aio_thread(void *unused)
 {
 pid_t pid;
@@ -380,7 +382,7 @@ static void *aio_thread(void *unused)
 aiocb->ret = ret;
 mutex_unlock(&lock);
 
-if (kill(pid, SIGUSR2)) die("kill failed");
+posix_aio_notify_event();
 }
 
 cur_threads--;
@@ -547,7 +549,7 @@ static int posix_aio_flush(void *opaque)
 
 static PosixAioState *posix_aio_state;
 
-static void aio_signal_handler(int signum)
+static void posix_aio_notify_event(void)
 {
 if (posix_aio_state) {
 char byte = 0;
@@ -557,8 +559,6 @@ static void aio_signal_handler(int signum)
 if (ret < 0 && errno != EAGAIN)
 die("write()");
 }
-
-qemu_service_io();
 }
 
 static void paio_remove(struct qemu_paiocb *acb)
@@ -662,7 +662,6 @@ BlockDriverAIOCB *paio_ioctl(BlockDriverState *bs, int fd,
 
 int paio_init(void)
 {
-struct sigaction act;
 PosixAioState *s;
 int fds[2];
 int ret;
@@ -672,11 +671,6 @@ int paio_init(void)
 
 s = g_malloc(sizeof(PosixAioState));
 
-sigfillset(&act.sa_mask);
-act.sa_flags = 0; /* do not restart syscalls to interrupt select() */
-act.sa_handler = aio_signal_handler;
-sigaction(SIGUSR2, &act, NULL);
-
 s->first_aio = NULL;
 if (qemu_pipe(fds) == -1) {
 fprintf(stderr, "failed to create pipe\n");
-- 
1.7.1

Re: [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode

On 09/19/2011 09:52 AM, Andi Kleen wrote:

>  I think it also improves branch target prediction - if you have a tight
>  loop of a few opcodes the predictor can guess where you're headed (since
>  there is a separate lookup key for each opcode), whereas with the
>  original code, there's a single key which cannot be used to predict the
>  branch target.

At least usually. I caught at least one version of gcc to CSE the jump
instruction into a common location in one of my interpreters
:-( But not all do it at least, and I hope gcc gets fixed.

You generally want CSE, yes?  So you can't blame gcc for getting it 
wrong sometimes.

--
error compiling committee.c: too many arguments to function

[Qemu-devel] [PATCH v2 0/2] Make simpletrace work on Windows

The 'simple' trace backend uses pthreads and does not work on Windows.  These
patches switch from pthreads to glib so that the code builds on all platforms
supported by glib.

v2:
 * Block signals in trace write-out thread

Stefan Hajnoczi (2):
  trace: portable simple trace backend using glib
  trace: use binary file open mode in simpletrace

 trace/simple.c |   72 +++-
 1 files changed, 45 insertions(+), 27 deletions(-)

-- 
1.7.5.4

[Qemu-devel] [PATCH v2 2/2] trace: use binary file open mode in simpletrace

For Windows portability the simple trace backend must use the 'b' file
open mode.  This prevents the stdio library from mangling 0x0a/0x0d
newline characters.

Signed-off-by: Stefan Hajnoczi 
---
 trace/simple.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/trace/simple.c b/trace/simple.c
index e29e001..b729c34 100644
--- a/trace/simple.c
+++ b/trace/simple.c
@@ -234,7 +234,7 @@ void st_set_trace_file_enabled(bool enable)
 .x1 = HEADER_VERSION,
 };
 
-trace_fp = fopen(trace_file_name, "w");
+trace_fp = fopen(trace_file_name, "wb");
 if (!trace_fp) {
 return;
 }
-- 
1.7.5.4

[Qemu-devel] [PATCH v2 1/2] trace: portable simple trace backend using glib

Convert the simple trace backend to glib so that it works under Windows.
We cannot use pthread directly but glib provides portable abstractions.
Also use glib atomics instead of newish gcc builtins which may not be
supported on Windows toolchains.

Signed-off-by: Stefan Hajnoczi 
---
 trace/simple.c |   70 +++
 1 files changed, 44 insertions(+), 26 deletions(-)

diff --git a/trace/simple.c b/trace/simple.c
index a609368..e29e001 100644
--- a/trace/simple.c
+++ b/trace/simple.c
@@ -12,8 +12,10 @@
 #include 
 #include 
 #include 
+#ifndef _WIN32
 #include 
 #include 
+#endif
 #include "qemu-timer.h"
 #include "trace.h"
 #include "trace/control.h"
@@ -54,9 +56,9 @@ enum {
  * Trace records are written out by a dedicated thread.  The thread waits for
  * records to become available, writes them out, and then waits again.
  */
-static pthread_mutex_t trace_lock = PTHREAD_MUTEX_INITIALIZER;
-static pthread_cond_t trace_available_cond = PTHREAD_COND_INITIALIZER;
-static pthread_cond_t trace_empty_cond = PTHREAD_COND_INITIALIZER;
+static GStaticMutex trace_lock = G_STATIC_MUTEX_INIT;
+static GCond *trace_available_cond;
+static GCond *trace_empty_cond;
 static bool trace_available;
 static bool trace_writeout_enabled;
 
@@ -93,29 +95,30 @@ static bool get_trace_record(unsigned int idx, TraceRecord 
*record)
  */
 static void flush_trace_file(bool wait)
 {
-pthread_mutex_lock(&trace_lock);
+g_static_mutex_lock(&trace_lock);
 trace_available = true;
-pthread_cond_signal(&trace_available_cond);
+g_cond_signal(trace_available_cond);
 
 if (wait) {
-pthread_cond_wait(&trace_empty_cond, &trace_lock);
+g_cond_wait(trace_empty_cond, g_static_mutex_get_mutex(&trace_lock));
 }
 
-pthread_mutex_unlock(&trace_lock);
+g_static_mutex_unlock(&trace_lock);
 }
 
 static void wait_for_trace_records_available(void)
 {
-pthread_mutex_lock(&trace_lock);
+g_static_mutex_lock(&trace_lock);
 while (!(trace_available && trace_writeout_enabled)) {
-pthread_cond_signal(&trace_empty_cond);
-pthread_cond_wait(&trace_available_cond, &trace_lock);
+g_cond_signal(trace_empty_cond);
+g_cond_wait(trace_available_cond,
+g_static_mutex_get_mutex(&trace_lock));
 }
 trace_available = false;
-pthread_mutex_unlock(&trace_lock);
+g_static_mutex_unlock(&trace_lock);
 }
 
-static void *writeout_thread(void *opaque)
+static gpointer writeout_thread(gpointer opaque)
 {
 TraceRecord record;
 unsigned int writeout_idx = 0;
@@ -159,7 +162,7 @@ static void trace(TraceEventID event, uint64_t x1, uint64_t 
x2, uint64_t x3,
 
 timestamp = get_clock();
 
-idx = __sync_fetch_and_add(&trace_idx, 1) % TRACE_BUF_LEN;
+idx = g_atomic_int_exchange_and_add((gint *)&trace_idx, 1) % TRACE_BUF_LEN;
 trace_buf[idx] = (TraceRecord){
 .event = event,
 .timestamp_ns = timestamp,
@@ -331,28 +334,43 @@ bool trace_event_set_state(const char *name, bool state)
 return false;
 }
 
-bool trace_backend_init(const char *events, const char *file)
+/* Helper function to create a thread with signals blocked */
+static GThread *trace_thread_create(GThreadFunc fn)
 {
-pthread_t thread;
-pthread_attr_t attr;
+GThread *thread;
+#ifndef _WIN32
 sigset_t set, oldset;
-int ret;
-
-pthread_attr_init(&attr);
-pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
 
 sigfillset(&set);
 pthread_sigmask(SIG_SETMASK, &set, &oldset);
-ret = pthread_create(&thread, &attr, writeout_thread, NULL);
+#endif
+thread = g_thread_create(writeout_thread, NULL, FALSE, NULL);
+#ifndef _WIN32
 pthread_sigmask(SIG_SETMASK, &oldset, NULL);
+#endif
+
+return thread;
+}
+
+bool trace_backend_init(const char *events, const char *file)
+{
+GThread *thread;
+
+if (!g_thread_supported()) {
+g_thread_init(NULL);
+}
 
-if (ret != 0) {
+trace_available_cond = g_cond_new();
+trace_empty_cond = g_cond_new();
+
+thread = trace_thread_create(writeout_thread);
+if (!thread) {
 fprintf(stderr, "warning: unable to initialize simple trace 
backend\n");
-} else {
-atexit(st_flush_trace_buffer);
-trace_backend_init_events(events);
-st_set_trace_file(file);
+return false;
 }
 
+atexit(st_flush_trace_buffer);
+trace_backend_init_events(events);
+st_set_trace_file(file);
 return true;
 }
-- 
1.7.5.4

[Qemu-devel] op-helper.c vs helper.c

2011-09-19 Thread Xin Tong Utoronto

There are 2 files on helpers in target-ppc and target-i386 ( op-helper.c
 helper.c), what are their differences ? also, what kind of functions are
typically emulated using helpers ?


Thanks

Xin

Re: [Qemu-devel] op-helper.c vs helper.c

On 19.09.2011, at 14:06, Xin Tong Utoronto wrote:

> There are 2 files on helpers in target-ppc and target-i386 ( op-helper.c  
> helper.c), what are their differences ?

General rule:

  op_helper.c: Code snippets called from TCG generated code. Implement more 
complex operations that gcc gets better than TCG.
  helper.c: Helper functions specific to the CPU, but called from multiple 
places around QEMU. For example the MMU code belongs here.

> also, what kind of functions are typically emulated using helpers ?

IIRC the rule of thumb is anything containing branches or bigger than 10(?) TCG 
ops should be in a helper. But it's up to the target developer really.

Alex

Re: [Qemu-devel] [PATCH] isa: Avoid using obsolete memory_region_set_offset for old portio

On 09/18/2011 10:04 PM, Jan Kiszka wrote:

>
>  If you make the core patch add both mr->offset and mrp->offset, then
>  change isa to drop memory_region_set_offset(), instead adding the delta
>  to mrp->offset, does that not work out?

Nope. The old API accepted arbitrary portio lists per memory region, the
new requires one region with a consistent offset per range. I should
have documented it...

What does "a consistent offset per range" mean?  You aren't actually 
changing the caller's ranges.

>
>>  >And I
>>  >   don't want to remove memory_region_set_offset() until everything (that
>>  >   can potentially use it, at least) has been converted.
>>
>>  IMO it's easier to fix those potential users before converting them. You
>>  need to review them anyway to decide if an offset might be needed, and
>>  which one precisely.
>>
>>  Are you aware of any candidates? For PIO, there should be none now.
>
>  For pio, none, but mmio has some:
>
>  hw/sh7750.c:cpu_register_physical_memory_offset(0x1f00, 0x1000,
>  hw/sh7750.c:cpu_register_physical_memory_offset(0xff00, 0x1000,
>  hw/sh7750.c:cpu_register_physical_memory_offset(0x1f80, 0x1000,
>  hw/sh7750.c:cpu_register_physical_memory_offset(0xff80, 0x1000,
>  hw/sh7750.c:cpu_register_physical_memory_offset(0x1fc0, 0x1000,
>  hw/sh7750.c:cpu_register_physical_memory_offset(0xffc0, 0x1000,
>  hw/sh_intc.c:
>  cpu_register_physical_memory_offset(P4ADDR(address), 4,
>  hw/sh_intc.c:
>  cpu_register_physical_memory_offset(A7ADDR(address), 4,

Cool, that's all. Trivial to fix, just push the offset math into those
few handler. Then we can drop cpu_register_physical_memory_offset as well.

They all use the same handler, so you need to split e.g. 
sh7750_io_memory into six MemoryRegionsOps. Or use tricks with aliases - 
have one giant 4G region with one handler, and map six 4k aliases into 
the system address space.

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH] isa: Avoid using obsolete memory_region_set_offset for old portio

On 09/18/2011 10:16 PM, Jan Kiszka wrote:

On 2011-09-18 18:49, Richard Henderson wrote:
>  On 09/18/2011 05:54 AM, Jan Kiszka wrote:
>>  @@ -375,8 +375,7 @@ static const MemoryRegionPortio 
*find_portio(MemoryRegion *mr, uint64_t offset,
>>   const MemoryRegionPortio *mrp;
>>
>>   for (mrp = mr->ops->old_portio; mrp->size; ++mrp) {
>>  -if (offset>= mrp->offset&&  offset<  mrp->offset + mrp->len
>>  -&&  width == mrp->size
>>  +if (offset<  mrp->len&&  width == mrp->size
>
>  This change looks broken to me.  How, exactly, are you
>  disambiguating different entries?

See my reply to Avi: all offsets of an portio region must be the same.

Said Avi doesn't understand.  VGA for example has many ports.

Or are you saying, split the input into sets of discontinuous ports, 
within each set you can use only one offset?

They should actually only differ in access width, but there is still at
least one counter example (of course IDE...). Given that this is just a
portability helper, all this will likely be reviewed and cleaned up when
getting rid of old portio.

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH v2 1/2] memory: Fix old portio word accesses

On 09/18/2011 10:07 PM, Jan Kiszka wrote:

On 2011-09-18 18:49, Avi Kivity wrote:
>  On 09/18/2011 07:28 PM, Jan Kiszka wrote:
>>  >>
>>  >>   This is PIO, limited by the x86 address space to 16 bit. Will add a
>>  >>   comment.
>>  >
>>  >   x86 PIO is not limited to 16 bits, just ISA, which memory.c knows
>>  >   nothing about.
>>
>>  Confused address and data, the former is limited 16, the latter can be
>>  32 as well. But I guess only ISA models made use of the core's split up
>>  service, and that's why QEMU limited itself accordingly.
>
>  Let's not bury such details in the core.

It's already in the core (ioport), and would refrain from changing it in
this fix.

That's the bad old core we're trying to get away from.

>
>  I don't think this holds for pci; there the bus always generates 32-bit
>  writes with separate byte enables for each lane.  The device need not
>  even be aware of a sub-word access, for reads.

The problem is that once we "enhance" the core with such a support to
potentially help one use case, we need to validate all users again if
they depend on the old behavior. That's tricky as breakage may only show
up with odd guests that issue invalid but so far harmless requests.

It's opt-in.  If a device sets 
MemoryRegionOps::impl.{min,max}_access_size = 1, it will only be fed 
byte accesses (the core will take care of breaking apart larger 
writes).  If it sets MemoryRegionOps::impl.{min,max}_access_size = 4, it 
will only get long accesses (and the core will/should shift/mask or 
RMW).  Refusing illegal access sizes is done using 
MemoryRegionOps::valid.  Most of this is unimplemented unfortunately.

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH 2/2] vga: Fix portio list conversion fallouts

On 09/18/2011 10:07 PM, Jan Kiszka wrote:

>  I don't think it should be deduplicated.  The device is providing two
>  separate ABIs.

Yes, two ABIs, and the only difference is the offset of the data register.

How about

#ifdef TARGET_I386
# define VBE_DATA_REG 1
#else
# define VBE_DATA_REG 2
#endif

>
>  Why disallow it?

Did anyone check that something useful or at least valid comes out of
the handlers when doing this so far impossible access?

It's a general problem.  It can't be fixed in devices, the core has to 
handle this (and devices have to tell it how to react to such accesses).

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH] isa: Avoid using obsolete memory_region_set_offset for old portio

On 2011-09-19 14:14, Avi Kivity wrote:
> On 09/18/2011 10:04 PM, Jan Kiszka wrote:
>> >
>> >  If you make the core patch add both mr->offset and mrp->offset, then
>> >  change isa to drop memory_region_set_offset(), instead adding the
>> delta
>> >  to mrp->offset, does that not work out?
>>
>> Nope. The old API accepted arbitrary portio lists per memory region, the
>> new requires one region with a consistent offset per range. I should
>> have documented it...
> 
> What does "a consistent offset per range" mean?  You aren't actually
> changing the caller's ranges.

I'm changing the way isa_register_portio_1 registers portios with the
core: only one per offset. The new commit log says:

"This implies that MemoryRegionPortio::offset is no longer used as
offset within the memory region but just as a correction value for the
offset passed to legacy handlers that expect absolute port addresses."

> 
> 
>>
>> >
>> >>  >And I
>> >>  >   don't want to remove memory_region_set_offset() until
>> everything (that
>> >>  >   can potentially use it, at least) has been converted.
>> >>
>> >>  IMO it's easier to fix those potential users before converting
>> them. You
>> >>  need to review them anyway to decide if an offset might be needed,
>> and
>> >>  which one precisely.
>> >>
>> >>  Are you aware of any candidates? For PIO, there should be none now.
>> >
>> >  For pio, none, but mmio has some:
>> >
>> >  hw/sh7750.c:cpu_register_physical_memory_offset(0x1f00,
>> 0x1000,
>> >  hw/sh7750.c:cpu_register_physical_memory_offset(0xff00,
>> 0x1000,
>> >  hw/sh7750.c:cpu_register_physical_memory_offset(0x1f80,
>> 0x1000,
>> >  hw/sh7750.c:cpu_register_physical_memory_offset(0xff80,
>> 0x1000,
>> >  hw/sh7750.c:cpu_register_physical_memory_offset(0x1fc0,
>> 0x1000,
>> >  hw/sh7750.c:cpu_register_physical_memory_offset(0xffc0,
>> 0x1000,
>> >  hw/sh_intc.c:
>> >  cpu_register_physical_memory_offset(P4ADDR(address), 4,
>> >  hw/sh_intc.c:
>> >  cpu_register_physical_memory_offset(A7ADDR(address), 4,
>>
>> Cool, that's all. Trivial to fix, just push the offset math into those
>> few handler. Then we can drop cpu_register_physical_memory_offset as
>> well.
> 
> They all use the same handler, so you need to split e.g.
> sh7750_io_memory into six MemoryRegionsOps. Or use tricks with aliases -
> have one giant 4G region with one handler, and map six 4k aliases into
> the system address space.

Looks more like 3 regions with one alias each. But we likely need to
disentangle all that logic first. I would be surprised if there wasn't a
more readable way to express it via the memory API.

Jan



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v2 1/2] memory: Fix old portio word accesses

On 2011-09-19 14:19, Avi Kivity wrote:
> On 09/18/2011 10:07 PM, Jan Kiszka wrote:
>> On 2011-09-18 18:49, Avi Kivity wrote:
>> >  On 09/18/2011 07:28 PM, Jan Kiszka wrote:
>> >>  >>
>> >>  >>   This is PIO, limited by the x86 address space to 16 bit. Will
>> add a
>> >>  >>   comment.
>> >>  >
>> >>  >   x86 PIO is not limited to 16 bits, just ISA, which memory.c knows
>> >>  >   nothing about.
>> >>
>> >>  Confused address and data, the former is limited 16, the latter
>> can be
>> >>  32 as well. But I guess only ISA models made use of the core's
>> split up
>> >>  service, and that's why QEMU limited itself accordingly.
>> >
>> >  Let's not bury such details in the core.
>>
>> It's already in the core (ioport), and would refrain from changing it in
>> this fix.
> 
> That's the bad old core we're trying to get away from.

We overcome it at the point the last portio user was converted and that
legacy is removed.

> 
>> >
>> >  I don't think this holds for pci; there the bus always generates
>> 32-bit
>> >  writes with separate byte enables for each lane.  The device need not
>> >  even be aware of a sub-word access, for reads.
>>
>> The problem is that once we "enhance" the core with such a support to
>> potentially help one use case, we need to validate all users again if
>> they depend on the old behavior. That's tricky as breakage may only show
>> up with odd guests that issue invalid but so far harmless requests.
> 
> It's opt-in.  If a device sets
> MemoryRegionOps::impl.{min,max}_access_size = 1, it will only be fed
> byte accesses (the core will take care of breaking apart larger
> writes).  If it sets MemoryRegionOps::impl.{min,max}_access_size = 4, it
> will only get long accesses (and the core will/should shift/mask or
> RMW).  Refusing illegal access sizes is done using
> MemoryRegionOps::valid.  Most of this is unimplemented unfortunately.

That makes sense (for non-old_portio users).

Jan



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] isa: Avoid using obsolete memory_region_set_offset for old portio

On 09/19/2011 03:29 PM, Jan Kiszka wrote:

On 2011-09-19 14:14, Avi Kivity wrote:
>  On 09/18/2011 10:04 PM, Jan Kiszka wrote:
>>  >
>>  >   If you make the core patch add both mr->offset and mrp->offset, then
>>  >   change isa to drop memory_region_set_offset(), instead adding the
>>  delta
>>  >   to mrp->offset, does that not work out?
>>
>>  Nope. The old API accepted arbitrary portio lists per memory region, the
>>  new requires one region with a consistent offset per range. I should
>>  have documented it...
>
>  What does "a consistent offset per range" mean?  You aren't actually
>  changing the caller's ranges.

I'm changing the way isa_register_portio_1 registers portios with the
core: only one per offset. The new commit log says:

"This implies that MemoryRegionPortio::offset is no longer used as
offset within the memory region but just as a correction value for the
offset passed to legacy handlers that expect absolute port addresses."

Ah:

-/* If we see a hole, break the region.  */
+/* If we see a new offset, break the region. */

But, sorry for being slow, I don't see why it requires a core update 
(other for adding mrp->offset).

>  They all use the same handler, so you need to split e.g.
>  sh7750_io_memory into six MemoryRegionsOps. Or use tricks with aliases -
>  have one giant 4G region with one handler, and map six 4k aliases into
>  the system address space.

Looks more like 3 regions with one alias each. But we likely need to
disentangle all that logic first. I would be surprised if there wasn't a
more readable way to express it via the memory API.

Depends if you subscribe to the "blindly make it work exactly the same 
way" or "understand the details and rewrite it cleanly" brands of masochism.

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH v2 1/2] memory: Fix old portio word accesses

On 09/19/2011 03:32 PM, Jan Kiszka wrote:

>  It's opt-in.  If a device sets
>  MemoryRegionOps::impl.{min,max}_access_size = 1, it will only be fed
>  byte accesses (the core will take care of breaking apart larger
>  writes).  If it sets MemoryRegionOps::impl.{min,max}_access_size = 4, it
>  will only get long accesses (and the core will/should shift/mask or
>  RMW).  Refusing illegal access sizes is done using
>  MemoryRegionOps::valid.  Most of this is unimplemented unfortunately.

That makes sense (for non-old_portio users).

The trick of having a way to register N callbacks with one shot is worth 
growing.  Ideally each register in a BAR would have a callback and we'd 
do something like

MemoryRegionOps mydev_ops = {
.registers = {
 { MYDEV_REG_x, 4, 4, mydev_reg_x_read, mydev_reg_x_write, },
 ...
 },
}

with hints to the core like "this register sits at this offset, use it 
for reads instead of a callback", or, "this is a read-only register".

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH] isa: Avoid using obsolete memory_region_set_offset for old portio

On 2011-09-19 14:37, Avi Kivity wrote:
> On 09/19/2011 03:29 PM, Jan Kiszka wrote:
>> On 2011-09-19 14:14, Avi Kivity wrote:
>> >  On 09/18/2011 10:04 PM, Jan Kiszka wrote:
>> >>  >
>> >>  >   If you make the core patch add both mr->offset and
>> mrp->offset, then
>> >>  >   change isa to drop memory_region_set_offset(), instead adding the
>> >>  delta
>> >>  >   to mrp->offset, does that not work out?
>> >>
>> >>  Nope. The old API accepted arbitrary portio lists per memory
>> region, the
>> >>  new requires one region with a consistent offset per range. I should
>> >>  have documented it...
>> >
>> >  What does "a consistent offset per range" mean?  You aren't actually
>> >  changing the caller's ranges.
>>
>> I'm changing the way isa_register_portio_1 registers portios with the
>> core: only one per offset. The new commit log says:
>>
>> "This implies that MemoryRegionPortio::offset is no longer used as
>> offset within the memory region but just as a correction value for the
>> offset passed to legacy handlers that expect absolute port addresses."
> 
> 
> Ah:
> 
> -/* If we see a hole, break the region.  */
> +/* If we see a new offset, break the region. */
> 
> 
> But, sorry for being slow, I don't see why it requires a core update
> (other for adding mrp->offset).

So far we matched accesses in find_portio by considering the portio
offset as well. If we want to replace the region offset with the portio
one (which confines legacy to a legacy-only place), we need to make the
portio offset a pure correction value on handler invocation and exclude
it from any range matching. And that means an old_portio memory region
can only describe one range starting exactly at MemoryRegion::addr.

> 
>>
>> >  They all use the same handler, so you need to split e.g.
>> >  sh7750_io_memory into six MemoryRegionsOps. Or use tricks with
>> aliases -
>> >  have one giant 4G region with one handler, and map six 4k aliases into
>> >  the system address space.
>>
>> Looks more like 3 regions with one alias each. But we likely need to
>> disentangle all that logic first. I would be surprised if there wasn't a
>> more readable way to express it via the memory API.
>>
> 
> Depends if you subscribe to the "blindly make it work exactly the same
> way" or "understand the details and rewrite it cleanly" brands of
> masochism.

We generally used to convert from APIv to APIv by adding legacy
wrappers, rarely removing any of them. This doesn't scale, but - granted
- it requires some masochism to make progress.

Jan



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v2 1/2] memory: Fix old portio word accesses

On 2011-09-19 14:42, Avi Kivity wrote:
> On 09/19/2011 03:32 PM, Jan Kiszka wrote:
>> >  It's opt-in.  If a device sets
>> >  MemoryRegionOps::impl.{min,max}_access_size = 1, it will only be fed
>> >  byte accesses (the core will take care of breaking apart larger
>> >  writes).  If it sets MemoryRegionOps::impl.{min,max}_access_size =
>> 4, it
>> >  will only get long accesses (and the core will/should shift/mask or
>> >  RMW).  Refusing illegal access sizes is done using
>> >  MemoryRegionOps::valid.  Most of this is unimplemented unfortunately.
>>
>> That makes sense (for non-old_portio users).
>>
>>
> 
> The trick of having a way to register N callbacks with one shot is worth
> growing.  Ideally each register in a BAR would have a callback and we'd
> do something like
> 
> MemoryRegionOps mydev_ops = {
> .registers = {
>  { MYDEV_REG_x, 4, 4, mydev_reg_x_read, mydev_reg_x_write, },
>  ...
>  },
> }
> 
> with hints to the core like "this register sits at this offset, use it
> for reads instead of a callback", or, "this is a read-only register".

This has pros and cons. If you have n registers to dispatch, you then
have to write n function prologues and maybe epilogues instead of just
one. Specifically if the register access is trivial, that could case
quite some LoC blowup on the device side.

What may have a better ratio are generic register get/set handlers.

Jan



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] isa: Avoid using obsolete memory_region_set_offset for old portio

On 09/19/2011 03:48 PM, Jan Kiszka wrote:

>
>  Ah:
>
>  -/* If we see a hole, break the region.  */
>  +/* If we see a new offset, break the region. */
>
>
>  But, sorry for being slow, I don't see why it requires a core update
>  (other for adding mrp->offset).

So far we matched accesses in find_portio by considering the portio
offset as well. If we want to replace the region offset with the portio
one (which confines legacy to a legacy-only place), we need to make the
portio offset a pure correction value on handler invocation and exclude
it from any range matching. And that means an old_portio memory region
can only describe one range starting exactly at MemoryRegion::addr.

Thanks for the explanation.  But I think you're trying to remove 
->offset by moving it somewhere else.  That's not removal, that's 
renaming, and it reduces functionality for other old_portio users.

If users need absolute addresses, then we should provide them via 
set_offset(), not pretend the need doesn't exist.

(btw, another way to emulate set_offset() is via aliases, as detailed in 
the other thread).

>
>>
>>  >   They all use the same handler, so you need to split e.g.
>>  >   sh7750_io_memory into six MemoryRegionsOps. Or use tricks with
>>  aliases -
>>  >   have one giant 4G region with one handler, and map six 4k aliases into
>>  >   the system address space.
>>
>>  Looks more like 3 regions with one alias each. But we likely need to
>>  disentangle all that logic first. I would be surprised if there wasn't a
>>  more readable way to express it via the memory API.
>>
>
>  Depends if you subscribe to the "blindly make it work exactly the same
>  way" or "understand the details and rewrite it cleanly" brands of
>  masochism.

We generally used to convert from APIv  to APIv  by adding legacy
wrappers, rarely removing any of them. This doesn't scale, but - granted
- it requires some masochism to make progress.

Wrappers reduce the risk of regression from a bad conversion by a tired 
coder.  But yes, they increase the amount of cruft immensely.

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH v2 1/2] memory: Fix old portio word accesses

On 09/19/2011 03:55 PM, Jan Kiszka wrote:

>
>  The trick of having a way to register N callbacks with one shot is worth
>  growing.  Ideally each register in a BAR would have a callback and we'd
>  do something like
>
>  MemoryRegionOps mydev_ops = {
>  .registers = {
>   { MYDEV_REG_x, 4, 4, mydev_reg_x_read, mydev_reg_x_write, },
>   ...
>   },
>  }
>
>  with hints to the core like "this register sits at this offset, use it
>  for reads instead of a callback", or, "this is a read-only register".

This has pros and cons. If you have n registers to dispatch, you then
have to write n function prologues and maybe epilogues instead of just
one. Specifically if the register access is trivial, that could case
quite some LoC blowup on the device side.

What may have a better ratio are generic register get/set handlers.

With C++ pointers-to-members and pointers-to-member-functions, you 
actually get some nice representation:

class MyDev {

void reg_1_read(...) { return some_computation(); }
void reg_1_write(...) { do_something(); }

uint32_t reg_2;
void reg_2_write(...) { reg_2 = value; do_something(); }

uint64_t reg_3;

static const Register registers[] = {
  Register(REG_1, &MyDev::reg_1_read, &MyDev::reg_1_write),
  Register(REG_2, &MyDev::reg_2, &MyDev::reg_2_write),
  Register(REG_1, &MyDev::reg_3),
};
};

... and the Register class generates the appropriate accessors.  We can 
emulate some of this with macros, but the conversion from opaque to the 
actual type will always be ugly.

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH v1 0/6]: block: Add I/O status support

Am 01.09.2011 20:37, schrieb Luiz Capitulino:
> This series adds support to the block layer to keep track of devices'
> I/O status. That information is also made available in QMP and HMP.
> 
> The goal here is to allow management applications that miss the
> BLOCK_IO_ERROR event to able to query the VM to determine if any device has
> caused the VM to stop and which device caused it.
> 
> Please, note that this series depends on the following series:
> 
>  o [PATCH v3 0/8]: Introduce the RunState type
>  o http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg00118.html
> 
> And to be able to properly test it you'll also need:
> 
>  o [PATCH 0/3] qcow2/coroutine fixes
>  o http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg00074.html
> 
> Here's an HMP example:
> 
>   (qemu) info status 
>   VM status: paused (io-error)
>   (qemu) info block
>   ide0-hd0: removable=0 io-status=ok file=disks/test2.img ro=0 drv=qcow2 
> encrypted=0
>   ide0-hd1: removable=0 io-status=nospace file=/dev/vg_doriath/kvmtest ro=0 
> drv=qcow2 encrypted=0
>   ide1-cd0: removable=1 locked=0 io-status=ok [not inserted]
>   floppy0: removable=1 locked=0 [not inserted]
>   sd0: removable=1 locked=0 [not inserted]
> 
> The "info status" command shows that the VM is stopped due to an I/O error.
> By issuing "info block" it's possible to determine that the 'ide0-hd1' device
> caused the error, which turns out to be due to no space.

Looks like I didn't reply here yet?

I still don't quite like that the devices are involved, but their part
is minimal and it makes the implementation much easier, so for me that's
acceptable.

Kevin

[Qemu-devel] [Bug 739785] Re: qemu-i386 user mode on ARMv5 host fails (bash: fork: Invalid argument)

2011-09-19 Thread Justin Shafer

Here is a strace.. fumex calls???

strace -ff -o /ls-strace-alignment-6.log /usr/bin/qemu-i386 /usr/bin
/wine-pthread notepad.exe

** Attachment added: "wine.log"
   
https://bugs.launchpad.net/qemu/+bug/739785/+attachment/2426371/+files/wine.log

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/739785

Title:
  qemu-i386 user mode on ARMv5 host fails (bash: fork: Invalid argument)

Status in QEMU:
  New

Bug description:
  Good time of day everybody,

  I have been trying to make usermode qemu on ARM with plugapps
  (archlinux) with archlinux i386 chroot to work.

  1. I installed arch linux in a virtuabox and created a chroot for it with 
mkarchroot. Transferred it to my pogo plug into /i386/
  2. I comiled qemu-i386 static and put it into /i386/usr/bin/
  ./configure --static --disable-blobs --disable-system 
--target-list=i386-linux-user
  make

  3. I also compiled linux kernel 2.6.38 with CONFIG_BINFMT_MISC=y and 
installed it.
  uname -a
  Linux Plugbox 2.6.38 #4 PREEMPT Fri Mar 18 22:19:10 CDT 2011 armv5tel 
Feroceon 88FR131 rev 1 (v5l) Marvell SheevaPlug Reference Board GNU/Linux

  4. Added the following options into /etc/rc.local
  /sbin/modprobe binfmt_misc
  /bin/mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc
  echo 
':qemu-i386:M::\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff\xff:/usr/bin/qemu-i386:'
 >/proc/sys/fs/binfmt_misc/register

  5. Also copied ld-linux.so.3 (actually ld-2.13.so because ld-
  linux.so.3 is a link to that file) from /lib/ to /i386/lib/

  6.Now i chroot into /i386 and I get this:
  [root@Plugbox i386]# chroot .
  [II aI hnve ao n@P /]# pacman -Suy
  bash: fork: Invalid argument

  7.I also downloaded linux-user-test-0.3 from qemu website and ran the test:
  [root@Plugbox linux-user-test-0.3]# make
  ./qemu-linux-user.sh
  [qemu-i386]
  ../qemu-0.14.0/i386-linux-user/qemu-i386 -L ./gnemul/qemu-i386 i386/ls -l 
dummyfile
  BUG IN DYNAMIC LINKER ld.so: dl-version.c: 210: _dl_check_map_versions: 
Assertion `needed != ((void *)0)' failed!
  make: *** [test] Error 127

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/739785/+subscriptions

Re: [Qemu-devel] op-helper.c vs helper.c

2011-09-19 Thread Peter Maydell

On 19 September 2011 13:06, Xin Tong Utoronto  wrote:
> There are 2 files on helpers in target-ppc and target-i386 ( op-helper.c
>  helper.c), what are their differences ? also, what kind of functions are
> typically emulated using helpers ?

The key difference is that op_helper.c is compiled with compiler
flags and includes header files that give it access to a global
variable 'CPUState *env' which is kept in a fixed CPU register
during execution of translated code. helper.c (and other foo_helper.c
files) are built as regular C files, and so if they need access to the
CPU state it has to be passed into the helper as an explicit parameter.

We're currently trying to cut back on the use of the implicit global,
so new helper functions should probably go in helper.c.

> also, what kind of functions are typically emulated using helpers ?

Anything that seems too hard to do inline :-) tcg/README has a
paragraph at the end giving some rules of thumb.

-- PMM

Re: [Qemu-devel] [PATCH v2 1/2] memory: Fix old portio word accesses

2011-09-19 Thread Gerd Hoffmann


  Hi,


The trick of having a way to register N callbacks with one shot is worth
growing.  Ideally each register in a BAR would have a callback and we'd
do something like

 MemoryRegionOps mydev_ops = {
 .registers = {
  { MYDEV_REG_x, 4, 4, mydev_reg_x_read, mydev_reg_x_write, },
  ...
  },
 }

with hints to the core like "this register sits at this offset, use it
for reads instead of a callback", or, "this is a read-only register".


This has pros and cons. If you have n registers to dispatch, you then
have to write n function prologues and maybe epilogues instead of just
one. Specifically if the register access is trivial, that could case
quite some LoC blowup on the device side.


If the register access is trivial then you don't need to call into the 
driver at all ...


You can have a look at hw/intel-hda.c which actually implements 
something like this, with some commonly needed features:


  * The "offset" field already mentioned by avi is there, so trivial
reads/writes can be handled by the core.
  * A "wmask" field to specify which bits are guest writable.
  * A "wclear" field to specify which bits have write-one-to-clear
semantics.
  * A "reset" field which specified the value this field has after
device reset.  Also serves as value for read-only registers.
  * read/write handlers of course.  The write handler is called after
the core applied sanity checks and calculated the new register
value (using wmask+wclear).
  * A "name" field (for debug logging).

It's pretty nice, alot more readable that a big switch, forces you to 
think which bits the guest can set (not specifying a wmask gives you a 
read-only register ;).


Also no bloat.  With this moving to memory core the all the handlers 
will gain a line with a container_of(), but that isn't too bad too IMHO.


cheers,
  Gerd

Re: [Qemu-devel] [PATCH v2 1/2] memory: Fix old portio word accesses


On 09/19/2011 04:55 PM, Gerd Hoffmann wrote:


If the register access is trivial then you don't need to call into the 
driver at all ...


You can have a look at hw/intel-hda.c which actually implements 
something like this, with some commonly needed features:


  * The "offset" field already mentioned by avi is there, so trivial
reads/writes can be handled by the core.
  * A "wmask" field to specify which bits are guest writable.
  * A "wclear" field to specify which bits have write-one-to-clear
semantics.
  * A "reset" field which specified the value this field has after
device reset.  Also serves as value for read-only registers.
  * read/write handlers of course.  The write handler is called after
the core applied sanity checks and calculated the new register
value (using wmask+wclear).
  * A "name" field (for debug logging).

It's pretty nice, alot more readable that a big switch, forces you to 
think which bits the guest can set (not specifying a wmask gives you a 
read-only register ;).


Also no bloat.  With this moving to memory core the all the handlers 
will gain a line with a container_of(), but that isn't too bad too IMHO.


It's also more secure.  Move as much as possible into the core, and 
review (and fuzz) that like hell.


--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH 2/4] rbd: allow escaping in config string

Am 15.09.2011 23:11, schrieb Sage Weil:
> The config string is variously delimited by =, @, and /, depending on the
> field.  Allow these characters to be escaped by preceeding them with \.
> 
> Signed-off-by: Sage Weil 
> ---
>  block/rbd.c |   29 +++--
>  1 files changed, 27 insertions(+), 2 deletions(-)
> 
> diff --git a/block/rbd.c b/block/rbd.c
> index f64b2e0..43f0e63 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -104,8 +104,15 @@ static int qemu_rbd_next_tok(char *dst, int dst_len,
>  *p = NULL;
>  
>  if (delim != '\0') {
> -end = strchr(src, delim);
> -if (end) {
> +for (end = src; *end; ++end) {
> +if (*end == delim) {
> +break;
> +}
> +if (*end == '\\') {
> +end++;
> +}
> +}

If src ends with a backslash, you read beyond the end of the string.

> +if (*end == delim) {
>  *p = end + 1;
>  *end = '\0';
>  }
> @@ -124,6 +131,19 @@ static int qemu_rbd_next_tok(char *dst, int dst_len,
>  return 0;
>  }
>  
> +static void qemu_rbd_unescape(char *src)
> +{
> +char *p;
> +
> +for (p = src; *src; ++src, ++p) {
> +if (*src == '\\') {
> +src++;
> +}
> +*p = *src;
> +}
> +*p = '\0';
> +}

This has the same problem.

Wouldn't it make sense to have the unescape integrated in
qemu_rbd_next_tok? Or are there any places where you would want to call
it without doing a qemu_rbd_unescape() afterwards?

Kevin

Re: [Qemu-devel] [PATCH 00/14] qdev: assign unique names to all devices (part 1)

2011-09-19 Thread Anthony Liguori


On 09/19/2011 02:26 AM, Jan Kiszka wrote:

On 2011-09-16 20:03, Anthony Liguori wrote:

So this is a simplification that I plan on running with.  For now, I think this
series is the right next step because it gives us a path name for the name
(although different syntax) and let's us enforce that all devices has a
canonical path.


For something that changes lots of devices and, at the same time, is
going to be removed again, I'm hesitating to call it the right direction.

A right step would be, IMHO, to introduce a generic introspectable
device link so that parent devices can reference their children and a
visitor can derive a child's relative name from that link name. Then
make sure this link type is consistently used.

I really dislike this focusing on assigning names internally and using
them in QEMU-internal APIs. They should just fall out of the core when
external interaction is required.


I thought a lot about this over the weekend and decided that I should go in a 
different direction based on this discussion.


Starting with the requirement that you have to be able to ask the device for an 
unambiguous reference to itself, I see two possible approaches:


1) Store the unambiguous reference within the child, derive unambiguous 
reference from the composition hierarchy.


2) Store a reference to the composition parent in the child.  When asked for the 
unambiguous reference, use parent point to generate the reference.


These two approaches have subtle differences.  For (1), all devices must be 
given globally unique names.  For (2), all devices must be given names that are 
unique to its composition parent.


I really dislike having a 'Device *parent' pointer because I fear it will be 
abused, but there a number of elegant advantages to this model.  For instance:


a) There is nothing special about user created devices.  All user created 
devices sit on the root of the composition hierarchy.  The names must be unique 
within that level of the composition tree.  However, once you get one level 
down, it's a new namespace.  This means you can achieve uniqueness without 
special characters or any of that nonsense.


b) We have proper path components without parsing a string and assigning special 
meaning to characters.


Point (b) is especially important because I spent a lot of time thinking about 
how to address Kevin's concerns about usability.  I think we can significantly 
improve path usability by trying to do an unambiguous match from right-to-left 
on the path components.  For instance, if you have:


/i440fx
/i440fx/piix3
/i440fx/piix3/ide
/i440fx/slot[1.0]
/i440fx/slot[2.0]

You can do:

 -device isa-serial,bus=piix3  (or)
 -device isa-serial,bus=i440fx/piix3 (or)
 -device isa-serial,bus=/i440fx/piix3

The first two examples are relevant, but unambiguous paths, matched from 
right-to-left.  The last example is an absolute path.  Since the vast majority 
of the time, you have an unambiguous path by simple taking the last and possibly 
second-to-last path component, I think that most of the time you can use very 
short relative paths.


I think this ends up being a big usability improvement and the only way to 
implement this (1) is parsing device names and extract paths which is even more 
evil than having a composition parent link in the Device.


So I'm now rebasing my series and switching to this model.  It's a rather 
significant change but I think I have a grip on how to do it.


Regards,

Anthony Liguori






Independent of that, Jan suggested that we could have what's essentially an
alias.  This is just a short name (could be in the form '%s-%d' %
(class.lower(), object_count).  This alias is just a hash table.  It has nothing
to do with the core device model.


I can't remember suggesting such thing.

Jan

Re: [Qemu-devel] [PATCH 2/2] block: avoid SIGUSR2

On Mon, Sep 19, 2011 at 12:54 PM, Frediano Ziglio  wrote:
> @@ -547,7 +549,7 @@ static int posix_aio_flush(void *opaque)
>
>  static PosixAioState *posix_aio_state;
>
> -static void aio_signal_handler(int signum)
> +static void posix_aio_notify_event(void)
>  {
>     if (posix_aio_state) {

Seems like this if statement is always true, hence the condition can
be removed.  The posix_aio_state global is always non-NULL here.

Stefan

Re: [Qemu-devel] [PATCH 0/4] More RBD updates

Am 15.09.2011 23:11, schrieb Sage Weil:
> Hi,
> 
> Here are a few more improvements to the qemu rbd support.  The first 
> patch makes the configuration file handling cleaner (do not error out if 
> /etc/ceph/ceph.conf doesn't exist).  One allows characters in the conf 
> string to be escaped, so you can (for example) specify an ip\:port (':' 
> is used as a delimiter).
> 
> The last patch implements flush when rbd_flush() is available.  This lets 
> us take advantage of write buffering in newer versions of librbd, which 
> improves performance significantly for many workloads (including the 
> trivial qemu-img convert).
> 
> Thanks!
> sage
> 
> 
> Sage Weil (4):
>   rbd: ignore failures when reading from default conf location
>   rbd: allow escaping in config string
>   rbd: update comment heading
>   rbd: call flush, if available
> 
>  block/rbd.c |   83 +++---
>  1 files changed, 56 insertions(+), 27 deletions(-)
> 

Thanks, applied patches 1/3/4, commented on patch 2.

Kevin

Re: [Qemu-devel] [PATCH 0/2][RFC?] Remove SIGUSR2 from posix-aio

On Mon, Sep 19, 2011 at 12:54 PM, Frediano Ziglio  wrote:
> Now that iothread is always compiled sending a signal seems only an
> additional step. This patch also avoid writing to two pipe (one from signal
> and one in qemu_service_io).
>
> Tested and works correctly with KVM enabled. Performances are only sligthly
> better (as I expected). strace output is more readable.
>
> Doubts:
> - any sense having two patches and not only last one?

Personally I would just send one here.

> - is ok if KVM is disabled? more testing required

If I understand correctly it used to interrupt guest code execution
whereas now it simply invoked a callback in the I/O thread.  Normally
this callback will raise a guest interrupt and get the vCPU to notice
that I/O has completed.  So I think this should work.

Stefan

Re: [Qemu-devel] [PATCH v1 0/6]: block: Add I/O status support

2011-09-19 Thread Luiz Capitulino

On Mon, 19 Sep 2011 15:40:06 +0200
Kevin Wolf  wrote:

> Am 01.09.2011 20:37, schrieb Luiz Capitulino:
> > This series adds support to the block layer to keep track of devices'
> > I/O status. That information is also made available in QMP and HMP.
> > 
> > The goal here is to allow management applications that miss the
> > BLOCK_IO_ERROR event to able to query the VM to determine if any device has
> > caused the VM to stop and which device caused it.
> > 
> > Please, note that this series depends on the following series:
> > 
> >  o [PATCH v3 0/8]: Introduce the RunState type
> >  o http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg00118.html
> > 
> > And to be able to properly test it you'll also need:
> > 
> >  o [PATCH 0/3] qcow2/coroutine fixes
> >  o http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg00074.html
> > 
> > Here's an HMP example:
> > 
> >   (qemu) info status 
> >   VM status: paused (io-error)
> >   (qemu) info block
> >   ide0-hd0: removable=0 io-status=ok file=disks/test2.img ro=0 drv=qcow2 
> > encrypted=0
> >   ide0-hd1: removable=0 io-status=nospace file=/dev/vg_doriath/kvmtest ro=0 
> > drv=qcow2 encrypted=0
> >   ide1-cd0: removable=1 locked=0 io-status=ok [not inserted]
> >   floppy0: removable=1 locked=0 [not inserted]
> >   sd0: removable=1 locked=0 [not inserted]
> > 
> > The "info status" command shows that the VM is stopped due to an I/O error.
> > By issuing "info block" it's possible to determine that the 'ide0-hd1' 
> > device
> > caused the error, which turns out to be due to no space.
> 
> Looks like I didn't reply here yet?

No, you didn't.

> I still don't quite like that the devices are involved, but their part
> is minimal and it makes the implementation much easier, so for me that's
> acceptable.

Suggestions on better ways of implementing this are welcome! :)

Re: [Qemu-devel] [PATCH 00/14] qdev: assign unique names to all devices (part 1)

Am 19.09.2011 16:05, schrieb Anthony Liguori:
> On 09/19/2011 02:26 AM, Jan Kiszka wrote:
>> On 2011-09-16 20:03, Anthony Liguori wrote:
>>> So this is a simplification that I plan on running with.  For now, I think 
>>> this
>>> series is the right next step because it gives us a path name for the name
>>> (although different syntax) and let's us enforce that all devices has a
>>> canonical path.
>>
>> For something that changes lots of devices and, at the same time, is
>> going to be removed again, I'm hesitating to call it the right direction.
>>
>> A right step would be, IMHO, to introduce a generic introspectable
>> device link so that parent devices can reference their children and a
>> visitor can derive a child's relative name from that link name. Then
>> make sure this link type is consistently used.
>>
>> I really dislike this focusing on assigning names internally and using
>> them in QEMU-internal APIs. They should just fall out of the core when
>> external interaction is required.
> 
> I thought a lot about this over the weekend and decided that I should go in a 
> different direction based on this discussion.
> [...]

Sounds like a good direction to me. (At least until someone brings up
the details :-))

Kevin

Re: [Qemu-devel] [PATCH v1 0/6]: block: Add I/O status support

Am 19.09.2011 16:09, schrieb Luiz Capitulino:
> On Mon, 19 Sep 2011 15:40:06 +0200
> Kevin Wolf  wrote:
> 
>> Am 01.09.2011 20:37, schrieb Luiz Capitulino:
>>> This series adds support to the block layer to keep track of devices'
>>> I/O status. That information is also made available in QMP and HMP.
>>>
>>> The goal here is to allow management applications that miss the
>>> BLOCK_IO_ERROR event to able to query the VM to determine if any device has
>>> caused the VM to stop and which device caused it.
>>>
>>> Please, note that this series depends on the following series:
>>>
>>>  o [PATCH v3 0/8]: Introduce the RunState type
>>>  o http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg00118.html
>>>
>>> And to be able to properly test it you'll also need:
>>>
>>>  o [PATCH 0/3] qcow2/coroutine fixes
>>>  o http://lists.gnu.org/archive/html/qemu-devel/2011-09/msg00074.html
>>>
>>> Here's an HMP example:
>>>
>>>   (qemu) info status 
>>>   VM status: paused (io-error)
>>>   (qemu) info block
>>>   ide0-hd0: removable=0 io-status=ok file=disks/test2.img ro=0 drv=qcow2 
>>> encrypted=0
>>>   ide0-hd1: removable=0 io-status=nospace file=/dev/vg_doriath/kvmtest ro=0 
>>> drv=qcow2 encrypted=0
>>>   ide1-cd0: removable=1 locked=0 io-status=ok [not inserted]
>>>   floppy0: removable=1 locked=0 [not inserted]
>>>   sd0: removable=1 locked=0 [not inserted]
>>>
>>> The "info status" command shows that the VM is stopped due to an I/O error.
>>> By issuing "info block" it's possible to determine that the 'ide0-hd1' 
>>> device
>>> caused the error, which turns out to be due to no space.
>>
>> Looks like I didn't reply here yet?
> 
> No, you didn't.
> 
>> I still don't quite like that the devices are involved, but their part
>> is minimal and it makes the implementation much easier, so for me that's
>> acceptable.
> 
> Suggestions on better ways of implementing this are welcome! :)

I don't have one. :-)

Surely it would be possible to do everything in block.c, but I see that
this would make things much more complicated (would need to get an AIO
callback added to the chain that can save the error code). It's not
worth the trouble.

Kevin

[Qemu-devel] [PATCH v2] block: avoid SIGUSR2

Now that iothread is always compiled sending a signal seems only an
additional step. This patch also avoid writing to two pipe (one from signal
and one in qemu_service_io).

Work with kvm enabled or disabled. strace output is more readable (less 
syscalls).

Signed-off-by: Frediano Ziglio 
---
 cpus.c |5 -
 posix-aio-compat.c |   29 +
 2 files changed, 9 insertions(+), 25 deletions(-)

diff --git a/cpus.c b/cpus.c
index 54c188c..d0cfe91 100644
--- a/cpus.c
+++ b/cpus.c
@@ -380,11 +380,6 @@ static int qemu_signal_init(void)
 int sigfd;
 sigset_t set;
 
-/* SIGUSR2 used by posix-aio-compat.c */
-sigemptyset(&set);
-sigaddset(&set, SIGUSR2);
-pthread_sigmask(SIG_UNBLOCK, &set, NULL);
-
 /*
  * SIG_IPI must be blocked in the main thread and must not be caught
  * by sigwait() in the signal thread. Otherwise, the cpu thread will
diff --git a/posix-aio-compat.c b/posix-aio-compat.c
index 3193dbf..185d5b2 100644
--- a/posix-aio-compat.c
+++ b/posix-aio-compat.c
@@ -42,7 +42,6 @@ struct qemu_paiocb {
 int aio_niov;
 size_t aio_nbytes;
 #define aio_ioctl_cmd   aio_nbytes /* for QEMU_AIO_IOCTL */
-int ev_signo;
 off_t aio_offset;
 
 QTAILQ_ENTRY(qemu_paiocb) node;
@@ -309,6 +308,8 @@ static ssize_t handle_aiocb_rw(struct qemu_paiocb *aiocb)
 return nbytes;
 }
 
+static void posix_aio_notify_event(void);
+
 static void *aio_thread(void *unused)
 {
 pid_t pid;
@@ -381,7 +382,7 @@ static void *aio_thread(void *unused)
 aiocb->ret = ret;
 mutex_unlock(&lock);
 
-if (kill(pid, aiocb->ev_signo)) die("kill failed");
+posix_aio_notify_event();
 }
 
 cur_threads--;
@@ -548,18 +549,14 @@ static int posix_aio_flush(void *opaque)
 
 static PosixAioState *posix_aio_state;
 
-static void aio_signal_handler(int signum)
+static void posix_aio_notify_event(void)
 {
-if (posix_aio_state) {
-char byte = 0;
-ssize_t ret;
-
-ret = write(posix_aio_state->wfd, &byte, sizeof(byte));
-if (ret < 0 && errno != EAGAIN)
-die("write()");
-}
+char byte = 0;
+ssize_t ret;
 
-qemu_service_io();
+ret = write(posix_aio_state->wfd, &byte, sizeof(byte));
+if (ret < 0 && errno != EAGAIN)
+die("write()");
 }
 
 static void paio_remove(struct qemu_paiocb *acb)
@@ -623,7 +620,6 @@ BlockDriverAIOCB *paio_submit(BlockDriverState *bs, int fd,
 return NULL;
 acb->aio_type = type;
 acb->aio_fildes = fd;
-acb->ev_signo = SIGUSR2;
 
 if (qiov) {
 acb->aio_iov = qiov->iov;
@@ -651,7 +647,6 @@ BlockDriverAIOCB *paio_ioctl(BlockDriverState *bs, int fd,
 return NULL;
 acb->aio_type = QEMU_AIO_IOCTL;
 acb->aio_fildes = fd;
-acb->ev_signo = SIGUSR2;
 acb->aio_offset = 0;
 acb->aio_ioctl_buf = buf;
 acb->aio_ioctl_cmd = req;
@@ -665,7 +660,6 @@ BlockDriverAIOCB *paio_ioctl(BlockDriverState *bs, int fd,
 
 int paio_init(void)
 {
-struct sigaction act;
 PosixAioState *s;
 int fds[2];
 int ret;
@@ -675,11 +669,6 @@ int paio_init(void)
 
 s = g_malloc(sizeof(PosixAioState));
 
-sigfillset(&act.sa_mask);
-act.sa_flags = 0; /* do not restart syscalls to interrupt select() */
-act.sa_handler = aio_signal_handler;
-sigaction(SIGUSR2, &act, NULL);
-
 s->first_aio = NULL;
 if (qemu_pipe(fds) == -1) {
 fprintf(stderr, "failed to create pipe\n");
-- 
1.7.1

[Qemu-devel] [RFC] New Migration Protocol using Visitor Interface

OVERVIEW

This patch series implements a QEMUFile Visitor class that's intended to 
abstract away direct calls to qemu_put_*/qemu_get_* for save/load functions. 
Currently this is done by always creating a 
QEMUFileInputVisitor/QEMUFileOutputVisitor pair with each call to 
qemu_fopen_ops() and maintaining a QEMUFile->I/O Visitor mapping. save/load 
functions that are to be converted would them simply use lookup routines to get 
a Visitor based on their QEMUFile arguments. Once these save/load functions are 
all converted, we can then change the interfaces in bulk and switch over to 
passing in the Visitor directly.

An example conversion of Slirp, which still uses old-style save/load routines, 
is included as part of this series. Anthony I believe has VMState converted 
over to using Visitors, so theoretically all VMStatified devices are good to go 
(Anthony: if that's the case feel free to rebase on this or point me to the 
repo to pull in, and I'll work off the resulting branch).

PLANS

Ultimately, the goal is to implement a new migration protocol that is 
self-describing, such that incompatibilities or migration errors can be more 
easilly detected. Currently, a simple change in data types for a particular 
device can introduce subtle bugs that won't be detected by the target, since 
the target interprets the data according to it's own expectation of what those 
data types are. Currently the plan is to use ASN.1 BER in place of QEMUFile.

By using a Visitor interface for migration, we also gain the ability to 
generate a migration schema (via, say, a JSONSchemaVisitor). This is similar to 
the VMState schema generator in Anthony's "Add live migration unit tests" 
series (http://thread.gmane.org/gmane.comp.emulators.qemu/97754/focus=97754), 
except that it is applicable to both VMState and old-style save/load functions.

There is still quite a bit a work to get to that point. Anthony addressed some 
of the issues on the VMState side of things in the aforementioned series, and I 
believe he already has some patches that convert VMState over to using visitors 
insteads of qemu_put_*/qemu_get_* directly. That being the case, what's left is:

1) Convert old-style save/load functions over to using visitors. I'm not sure 
what the best approach is for this. We basically have 2 options: either by 
converting them to VMState, or converting the functions over to using visitors, 
such as with the example slirp conversion. Even if we do option 2 it would 
likely need to be done in 2 phases:

phase 1: a mechanical conversion where we basically replace each 
qemu_put*/qemu_get* with a visit_*. This allows use to plop in a new, say 
BERVisitor to switch to using the new migration protocol. It should also be 
possible to do this without breaking migration compatibility with older 
versions. It does not however result in a well-specified schema if we were to 
plop in, say, a JSONSchemaVisitor. This is where we need phase 2.

phase 2: Currently, runtime state modifies our "schema" the way things are 
currently done. Slirp instance and virtqueue enumeration, for instance:

slirp/slirp.c:
for (ex_ptr = slirp->exec_list; ex_ptr; ex_ptr = ex_ptr->ex_next)
if (ex_ptr->ex_pty == 3) {
struct socket *so;
so = slirp_find_ctl_socket(slirp, ex_ptr->ex_addr,
   ntohs(ex_ptr->ex_fport));
if (!so)
continue;

qemu_put_byte(f, 42);
slirp_socket_save(f, so);
}

hw/virtio.c:
for (i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
if (vdev->vq[i].vring.num == 0)
break;

qemu_put_be32(f, vdev->vq[i].vring.num);
qemu_put_be64(f, vdev->vq[i].pa);
qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);
if (vdev->binding->save_queue)
vdev->binding->save_queue(vdev->binding_opaque, i, f);
}


To be able to build a schema around this we'd need to do a visit_start_array() 
or visit_start_list(), to denote a list of entries, and ideally we'd also 
describe structure of these entries. But often there is no structure of these 
entries...they may just be values taken from various data structures. So, 
either we do hacky things like using visitor intefaces to create "fake" structs 
(we say we're dealing with a struct when we're really not), or we actually put 
these into an intermediate struct which we then use to assign to store/load 
migration fields from.

phase 2 sounds a whole lot like converting over to VMState. So I think we'll 
only want to go with option 2 in places where converting to VMState is not 
practical/planned. Or we can hold off on the phase 2 stuff and just focus on 
getting the new protocol in place ASAP. After which, we can rework things to 
support generating a well-specified schema.

2) Once the conversion stuff is done, we can modify the save/load interfaces in 
bulk to accept a Visitor in place of a QEMUFile (but continue using the old 
protoco

[Qemu-devel] [RFC 1/8] qapi: add Visitor interfaces for uint_t and int_t


Signed-off-by: Michael Roth 
---
 qapi/qapi-visit-core.c |   78 
 qapi/qapi-visit-core.h |   30 ++
 2 files changed, 108 insertions(+), 0 deletions(-)

diff --git a/qapi/qapi-visit-core.c b/qapi/qapi-visit-core.c
index ddef3ed..d2e7d3e 100644
--- a/qapi/qapi-visit-core.c
+++ b/qapi/qapi-visit-core.c
@@ -66,6 +66,28 @@ void visit_end_list(Visitor *v, Error **errp)
 }
 }
 
+void visit_start_array(Visitor *v, void **obj, const char *name, size_t 
elem_count,
+   size_t elem_size, Error **errp)
+{
+if (!error_is_set(errp)) {
+v->start_array(v, obj, name, elem_count, elem_size, errp);
+}
+}
+
+void visit_next_array(Visitor *v, Error **errp)
+{
+if (!error_is_set(errp)) {
+v->next_array(v, errp);
+}
+}
+
+void visit_end_array(Visitor *v, Error **errp)
+{
+if (!error_is_set(errp)) {
+v->end_array(v, errp);
+}
+}
+
 void visit_start_optional(Visitor *v, bool *present, const char *name,
   Error **errp)
 {
@@ -96,6 +118,62 @@ void visit_type_int(Visitor *v, int64_t *obj, const char 
*name, Error **errp)
 }
 }
 
+void visit_type_uint8_t(Visitor *v, uint8_t *obj, const char *name, Error 
**errp)
+{
+if (!error_is_set(errp)) {
+v->type_uint8_t(v, obj, name, errp);
+}
+}
+
+void visit_type_uint16_t(Visitor *v, uint16_t *obj, const char *name, Error 
**errp)
+{
+if (!error_is_set(errp)) {
+v->type_uint16_t(v, obj, name, errp);
+}
+}
+
+void visit_type_uint32_t(Visitor *v, uint32_t *obj, const char *name, Error 
**errp)
+{
+if (!error_is_set(errp)) {
+v->type_uint32_t(v, obj, name, errp);
+}
+}
+
+void visit_type_uint64_t(Visitor *v, uint64_t *obj, const char *name, Error 
**errp)
+{
+if (!error_is_set(errp)) {
+v->type_uint64_t(v, obj, name, errp);
+}
+}
+
+void visit_type_int8_t(Visitor *v, int8_t *obj, const char *name, Error **errp)
+{
+if (!error_is_set(errp)) {
+v->type_int8_t(v, obj, name, errp);
+}
+}
+
+void visit_type_int16_t(Visitor *v, int16_t *obj, const char *name, Error 
**errp)
+{
+if (!error_is_set(errp)) {
+v->type_int16_t(v, obj, name, errp);
+}
+}
+
+void visit_type_int32_t(Visitor *v, int32_t *obj, const char *name, Error 
**errp)
+{
+if (!error_is_set(errp)) {
+v->type_int32_t(v, obj, name, errp);
+}
+}
+
+void visit_type_int64_t(Visitor *v, int64_t *obj, const char *name, Error 
**errp)
+{
+if (!error_is_set(errp)) {
+v->type_int64_t(v, obj, name, errp);
+}
+}
+
 void visit_type_bool(Visitor *v, bool *obj, const char *name, Error **errp)
 {
 if (!error_is_set(errp)) {
diff --git a/qapi/qapi-visit-core.h b/qapi/qapi-visit-core.h
index e850746..3e3ac4e 100644
--- a/qapi/qapi-visit-core.h
+++ b/qapi/qapi-visit-core.h
@@ -22,6 +22,11 @@ typedef struct GenericList
 struct GenericList *next;
 } GenericList;
 
+typedef struct GenericItem
+{
+void *value;
+} GenericItem;
+
 typedef struct Visitor Visitor;
 
 struct Visitor
@@ -35,10 +40,23 @@ struct Visitor
 GenericList *(*next_list)(Visitor *v, GenericList **list, Error **errp);
 void (*end_list)(Visitor *v, Error **errp);
 
+void (*start_array)(Visitor *v, void **obj, const char *name,
+size_t elem_count, size_t elem_size, Error **errp);
+void (*next_array)(Visitor *v, Error **errp);
+void (*end_array)(Visitor *v, Error **errp);
+
 void (*type_enum)(Visitor *v, int *obj, const char *strings[],
   const char *kind, const char *name, Error **errp);
 
 void (*type_int)(Visitor *v, int64_t *obj, const char *name, Error **errp);
+void (*type_uint8_t)(Visitor *v, uint8_t *obj, const char *name, Error 
**errp);
+void (*type_uint16_t)(Visitor *v, uint16_t *obj, const char *name, Error 
**errp);
+void (*type_uint32_t)(Visitor *v, uint32_t *obj, const char *name, Error 
**errp);
+void (*type_uint64_t)(Visitor *v, uint64_t *obj, const char *name, Error 
**errp);
+void (*type_int8_t)(Visitor *v, int8_t *obj, const char *name, Error 
**errp);
+void (*type_int16_t)(Visitor *v, int16_t *obj, const char *name, Error 
**errp);
+void (*type_int32_t)(Visitor *v, int32_t *obj, const char *name, Error 
**errp);
+void (*type_int64_t)(Visitor *v, int64_t *obj, const char *name, Error 
**errp);
 void (*type_bool)(Visitor *v, bool *obj, const char *name, Error **errp);
 void (*type_str)(Visitor *v, char **obj, const char *name, Error **errp);
 void (*type_number)(Visitor *v, double *obj, const char *name,
@@ -63,12 +81,24 @@ void visit_end_struct(Visitor *v, Error **errp);
 void visit_start_list(Visitor *v, const char *name, Error **errp);
 GenericList *visit_next_list(Visitor *v, GenericList **list, Error **errp);
 void visit_end_list(Visitor *v, Error **errp);
+void visit_start_array(Visitor *v, void **obj, const char *name, size_t 
elem_count,
+

[Qemu-devel] [RFC 4/8] savevm: move QEMUFile interfaces into qemu-file.c


Signed-off-by: Michael Roth 
---
 Makefile.objs |2 +-
 hw/hw.h   |1 +
 qemu-file.c   |  521 +
 savevm.c  |  494 --
 4 files changed, 523 insertions(+), 495 deletions(-)
 create mode 100644 qemu-file.c

diff --git a/Makefile.objs b/Makefile.objs
index 6bc8555..34bf58f 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -119,7 +119,7 @@ common-obj-$(CONFIG_SD) += sd.o
 common-obj-y += bt.o bt-host.o bt-vhci.o bt-l2cap.o bt-sdp.o bt-hci.o bt-hid.o 
usb-bt.o
 common-obj-y += bt-hci-csr.o
 common-obj-y += buffered_file.o migration.o migration-tcp.o
-common-obj-y += qemu-char.o savevm.o #aio.o
+common-obj-y += qemu-char.o savevm.o qemu-file.o #aio.o
 common-obj-y += msmouse.o ps2.o
 common-obj-y += qdev.o qdev-properties.o
 common-obj-y += block-migration.o iohandler.o
diff --git a/hw/hw.h b/hw/hw.h
index a124da9..244bfb9 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -58,6 +58,7 @@ void qemu_fflush(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
 void qemu_put_byte(QEMUFile *f, int v);
+int qemu_peek_byte(QEMUFile *f);
 
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
diff --git a/qemu-file.c b/qemu-file.c
new file mode 100644
index 000..fb91b91
--- /dev/null
+++ b/qemu-file.c
@@ -0,0 +1,521 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu-common.h"
+#include "qemu_socket.h"
+#include "hw/hw.h"
+
+#define IO_BUF_SIZE 32768
+
+struct QEMUFile {
+QEMUFilePutBufferFunc *put_buffer;
+QEMUFileGetBufferFunc *get_buffer;
+QEMUFileCloseFunc *close;
+QEMUFileRateLimit *rate_limit;
+QEMUFileSetRateLimit *set_rate_limit;
+QEMUFileGetRateLimit *get_rate_limit;
+void *opaque;
+int is_write;
+
+int64_t buf_offset; /* start of buffer when writing, end of buffer
+   when reading */
+int buf_index;
+int buf_size; /* 0 when writing */
+uint8_t buf[IO_BUF_SIZE];
+
+int has_error;
+};
+
+QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer,
+ QEMUFileGetBufferFunc *get_buffer,
+ QEMUFileCloseFunc *close,
+ QEMUFileRateLimit *rate_limit,
+ QEMUFileSetRateLimit *set_rate_limit,
+ QEMUFileGetRateLimit *get_rate_limit)
+{
+QEMUFile *f;
+
+f = g_malloc0(sizeof(QEMUFile));
+
+f->opaque = opaque;
+f->put_buffer = put_buffer;
+f->get_buffer = get_buffer;
+f->close = close;
+f->rate_limit = rate_limit;
+f->set_rate_limit = set_rate_limit;
+f->get_rate_limit = get_rate_limit;
+f->is_write = 0;
+
+return f;
+}
+
+int qemu_file_has_error(QEMUFile *f)
+{
+return f->has_error;
+}
+
+void qemu_file_set_error(QEMUFile *f)
+{
+f->has_error = 1;
+}
+
+void qemu_fflush(QEMUFile *f)
+{
+if (!f->put_buffer)
+return;
+
+if (f->is_write && f->buf_index > 0) {
+int len;
+
+len = f->put_buffer(f->opaque, f->buf, f->buf_offset, f->buf_index);
+if (len > 0)
+f->buf_offset += f->buf_index;
+else
+f->has_error = 1;
+f->buf_index = 0;
+}
+}
+
+static void qemu_fill_buffer(QEMUFile *f)
+{
+int len;
+
+if (!f->get_buffer)
+return;
+
+if (f->is_write)
+abort();
+
+len = f->get_buffer(f->opaque, f->buf, f->buf_offset, IO_BUF_SIZE);
+if (len > 0) {
+f->buf_index = 0;
+f->buf_size = len;
+f->buf_offset += len;
+} else if (len != -EAGAIN)
+f->has_error = 1;
+}
+
+int qemu_fclose(QEMUFile *f)
+{
+int ret = 0;
+qemu_fflush(f);
+if (f->close)
+ret = f->close(f->opaque);
+g_free(f);
+return ret;
+}
+
+voi

[Qemu-devel] [RFC 2/8] qapi: add QemuFileOutputVisitor

Visitor interface to write values to a QEMUFile.

Signed-off-by: Michael Roth 
---
 Makefile.objs   |   16 +-
 qapi/qemu-file-output-visitor.c |  353 +++
 qapi/qemu-file-output-visitor.h |   26 +++
 3 files changed, 388 insertions(+), 7 deletions(-)
 create mode 100644 qapi/qemu-file-output-visitor.c
 create mode 100644 qapi/qemu-file-output-visitor.h

diff --git a/Makefile.objs b/Makefile.objs
index 62020d7..48fe0c4 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -20,6 +20,13 @@ coroutine-obj-$(CONFIG_POSIX) += coroutine-gthread.o
 endif
 coroutine-obj-$(CONFIG_WIN32) += coroutine-win32.o
 
+##
+# qapi
+
+qapi-nested-y = qapi-visit-core.o qmp-input-visitor.o qmp-output-visitor.o 
qapi-dealloc-visitor.o
+qapi-nested-y += qmp-registry.o qmp-dispatch.o
+qapi-obj-y = $(addprefix qapi/, $(qapi-nested-y))
+
 ###
 # block-obj-y is code used by both qemu system emulation and qemu-img
 
@@ -73,6 +80,8 @@ fsdev-obj-$(CONFIG_VIRTFS) += $(addprefix fsdev/, 
$(fsdev-nested-y))
 # CPUs and machines.
 
 common-obj-y = $(block-obj-y) blockdev.o
+common-obj-y += $(qapi-obj-y)
+common-obj-y += qapi/qemu-file-output-visitor.o
 common-obj-y += $(net-obj-y)
 common-obj-y += $(qobject-obj-y)
 common-obj-$(CONFIG_LINUX) += $(fsdev-obj-$(CONFIG_LINUX))
@@ -399,13 +408,6 @@ $(trace-obj-y): $(GENERATED_HEADERS)
 libcacard-y = cac.o event.o vcard.o vreader.o vcard_emul_nss.o 
vcard_emul_type.o card_7816.o
 
 ##
-# qapi
-
-qapi-nested-y = qapi-visit-core.o qmp-input-visitor.o qmp-output-visitor.o 
qapi-dealloc-visitor.o
-qapi-nested-y += qmp-registry.o qmp-dispatch.o
-qapi-obj-y = $(addprefix qapi/, $(qapi-nested-y))
-
-##
 # guest agent
 
 qga-nested-y = guest-agent-commands.o guest-agent-command-state.o
diff --git a/qapi/qemu-file-output-visitor.c b/qapi/qemu-file-output-visitor.c
new file mode 100644
index 000..c44b76c
--- /dev/null
+++ b/qapi/qemu-file-output-visitor.c
@@ -0,0 +1,353 @@
+/*
+ * QEMUFile Output Visitor
+ *
+ * Copyright IBM, Corp. 2011
+ *
+ * Authors:
+ *  Michael Roth   
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qemu-file-output-visitor.h"
+#include "qemu-queue.h"
+#include "qemu-common.h"
+#include "qemu-objects.h"
+#include "hw/hw.h"
+#include "qerror.h"
+
+typedef struct {
+size_t elem_count;
+size_t elem_size;
+size_t pos;
+} ArrayInfo;
+
+typedef struct StackEntry
+{
+enum {
+QFOV_ARRAY,
+QFOV_LIST,
+QFOV_STRUCT,
+} type;
+ArrayInfo array_info;
+bool is_list_head;
+QTAILQ_ENTRY(StackEntry) node;
+} StackEntry;
+
+struct QemuFileOutputVisitor
+{
+Visitor visitor;
+QTAILQ_HEAD(, StackEntry) stack;
+QEMUFile *file;
+};
+
+static QemuFileOutputVisitor *to_ov(Visitor *v)
+{
+return container_of(v, QemuFileOutputVisitor, visitor);
+}
+
+static void qemu_file_output_push(QemuFileOutputVisitor *ov, StackEntry *e)
+{
+QTAILQ_INSERT_HEAD(&ov->stack, e, node);
+}
+
+static void qemu_file_output_push_array(QemuFileOutputVisitor *ov, ArrayInfo 
ai)
+{
+StackEntry *e = g_malloc0(sizeof(*e));
+e->type = QFOV_ARRAY;
+e->array_info = ai;
+qemu_file_output_push(ov, e);
+}
+
+static void qemu_file_output_push_list(QemuFileOutputVisitor *ov)
+{
+StackEntry *e = g_malloc0(sizeof(*e));
+e->type = QFOV_LIST;
+e->is_list_head = true;
+qemu_file_output_push(ov, e);
+}
+
+static void qemu_file_output_push_struct(QemuFileOutputVisitor *ov)
+{
+StackEntry *e = g_malloc0(sizeof(*e));
+e->type = QFOV_STRUCT;
+qemu_file_output_push(ov, e);
+}
+
+static StackEntry *qemu_file_output_pop(QemuFileOutputVisitor *ov)
+{
+StackEntry *e = QTAILQ_FIRST(&ov->stack);
+QTAILQ_REMOVE(&ov->stack, e, node);
+return e;
+}
+
+static bool qemu_file_output_is_array(QemuFileOutputVisitor *ov)
+{
+StackEntry *e = QTAILQ_FIRST(&ov->stack);
+return e && e->type == QFOV_ARRAY;
+}
+
+static bool qemu_file_output_is_list(QemuFileOutputVisitor *ov)
+{
+StackEntry *e = QTAILQ_FIRST(&ov->stack);
+return e && e->type == QFOV_LIST;
+}
+
+static void qemu_file_output_start_struct(Visitor *v, void **obj,
+  const char *kind, const char *name,
+  size_t unused, Error **errp)
+{
+QemuFileOutputVisitor *ov = to_ov(v);
+
+if (!obj || *obj == NULL) {
+error_set(errp, QERR_INVALID_PARAMETER, "obj");
+}
+qemu_file_output_push_struct(ov);
+}
+
+static void qemu_file_output_end_struct(Visitor *v, Error **errp)
+{
+QemuFileOutputVisitor *ov = to_ov(v);
+StackEntry *e = qemu_file_o

[Qemu-devel] [RFC 7/8] cutil: add strocat(), to concat a string to an offset in another


Signed-off-by: Michael Roth 
---
 cutils.c  |8 
 qemu-common.h |1 +
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/cutils.c b/cutils.c
index c91f887..0835834 100644
--- a/cutils.c
+++ b/cutils.c
@@ -84,6 +84,14 @@ int stristart(const char *str, const char *val, const char 
**ptr)
 return 1;
 }
 
+/* concat first 'offset' chars in 'dest' with 'src' */
+char *strocat(char *buf, const char *src, int offset)
+{
+memcpy(&buf[offset], src, strlen(src));
+buf[offset+strlen(src)] = '\0';
+return buf;
+}
+
 /* XXX: use host strnlen if available ? */
 int qemu_strnlen(const char *s, int max_len)
 {
diff --git a/qemu-common.h b/qemu-common.h
index 404c421..0a9b919 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -138,6 +138,7 @@ void pstrcpy(char *buf, int buf_size, const char *str);
 char *pstrcat(char *buf, int buf_size, const char *s);
 int strstart(const char *str, const char *val, const char **ptr);
 int stristart(const char *str, const char *val, const char **ptr);
+char *strocat(char *dest, const char *src, int offset);
 int qemu_strnlen(const char *s, int max_len);
 time_t mktimegm(struct tm *tm);
 int qemu_fls(int i);
-- 
1.7.0.4

[Qemu-devel] [RFC 3/8] qapi: add QemuFileInputVisitor

Visitor interfaces to read values from a QEMUFile

Signed-off-by: Michael Roth 
---
 Makefile.objs  |1 +
 qapi/qemu-file-input-visitor.c |  350 
 qapi/qemu-file-input-visitor.h |   26 +++
 3 files changed, 377 insertions(+), 0 deletions(-)
 create mode 100644 qapi/qemu-file-input-visitor.c
 create mode 100644 qapi/qemu-file-input-visitor.h

diff --git a/Makefile.objs b/Makefile.objs
index 48fe0c4..6bc8555 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -82,6 +82,7 @@ fsdev-obj-$(CONFIG_VIRTFS) += $(addprefix fsdev/, 
$(fsdev-nested-y))
 common-obj-y = $(block-obj-y) blockdev.o
 common-obj-y += $(qapi-obj-y)
 common-obj-y += qapi/qemu-file-output-visitor.o
+common-obj-y += qapi/qemu-file-input-visitor.o
 common-obj-y += $(net-obj-y)
 common-obj-y += $(qobject-obj-y)
 common-obj-$(CONFIG_LINUX) += $(fsdev-obj-$(CONFIG_LINUX))
diff --git a/qapi/qemu-file-input-visitor.c b/qapi/qemu-file-input-visitor.c
new file mode 100644
index 000..7217125
--- /dev/null
+++ b/qapi/qemu-file-input-visitor.c
@@ -0,0 +1,350 @@
+/*
+ * QEMUFile Output Visitor
+ *
+ * Copyright IBM, Corp. 2011
+ *
+ * Authors:
+ *  Michael Roth   
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qemu-file-input-visitor.h"
+#include "qemu-queue.h"
+#include "qemu-common.h"
+#include "qemu-objects.h"
+#include "hw/hw.h"
+#include "qerror.h"
+
+typedef struct {
+size_t elem_count;
+size_t elem_size;
+size_t pos;
+} ArrayInfo;
+
+typedef struct StackEntry
+{
+enum {
+QFIV_ARRAY,
+QFIV_LIST,
+QFIV_STRUCT,
+} type;
+ArrayInfo array_info;
+QTAILQ_ENTRY(StackEntry) node;
+} StackEntry;
+
+struct QemuFileInputVisitor
+{
+Visitor visitor;
+QTAILQ_HEAD(, StackEntry) stack;
+QEMUFile *file;
+};
+
+static QemuFileInputVisitor *to_iv(Visitor *v)
+{
+return container_of(v, QemuFileInputVisitor, visitor);
+}
+
+static void qemu_file_input_push(QemuFileInputVisitor *iv, StackEntry *e)
+{
+QTAILQ_INSERT_HEAD(&iv->stack, e, node);
+}
+
+static void qemu_file_input_push_array(QemuFileInputVisitor *iv, ArrayInfo ai)
+{
+StackEntry *e = g_malloc0(sizeof(*e));
+e->type = QFIV_ARRAY;
+e->array_info = ai;
+qemu_file_input_push(iv, e);
+}
+
+static void qemu_file_input_push_list(QemuFileInputVisitor *iv)
+{
+StackEntry *e = g_malloc0(sizeof(*e));
+e->type = QFIV_LIST;
+qemu_file_input_push(iv, e);
+}
+
+static void qemu_file_input_push_struct(QemuFileInputVisitor *iv)
+{
+StackEntry *e = g_malloc0(sizeof(*e));
+e->type = QFIV_STRUCT;
+qemu_file_input_push(iv, e);
+}
+
+static void *qemu_file_input_pop(QemuFileInputVisitor *iv)
+{
+StackEntry *e = QTAILQ_FIRST(&iv->stack);
+QTAILQ_REMOVE(&iv->stack, e, node);
+return e;
+}
+
+static bool qemu_file_input_is_array(QemuFileInputVisitor *iv)
+{
+StackEntry *e = QTAILQ_FIRST(&iv->stack);
+return e->type == QFIV_ARRAY;
+}
+
+static bool qemu_file_input_is_list(QemuFileInputVisitor *ov)
+{
+StackEntry *e = QTAILQ_FIRST(&ov->stack);
+return e && e->type == QFIV_LIST;
+}
+
+static void qemu_file_input_start_struct(Visitor *v, void **obj,
+ const char *kind,
+ const char *name, size_t size,
+ Error **errp)
+{
+QemuFileInputVisitor *iv = to_iv(v);
+
+if (obj && *obj == NULL) {
+*obj = g_malloc0(size);
+}
+qemu_file_input_push_struct(iv);
+}
+
+static void qemu_file_input_end_struct(Visitor *v, Error **errp)
+{
+QemuFileInputVisitor *iv = to_iv(v);
+StackEntry *e = qemu_file_input_pop(iv);
+
+if (!e || e->type != QFIV_STRUCT) {
+error_set(errp, QERR_UNDEFINED_ERROR);
+return;
+}
+g_free(e);
+}
+
+static void qemu_file_input_start_list(Visitor *v, const char *name,
+   Error **errp)
+{
+QemuFileInputVisitor *iv = to_iv(v);
+qemu_file_input_push_list(iv);
+}
+
+static GenericList *qemu_file_input_next_list(Visitor *v, GenericList **list,
+   Error **errp)
+{
+QemuFileInputVisitor *iv = to_iv(v);
+GenericList *entry;
+
+if (!qemu_file_input_is_list(iv)) {
+error_set(errp, QERR_UNDEFINED_ERROR);
+}
+
+entry = g_malloc0(sizeof(*entry));
+if (*list) {
+(*list)->next = entry;
+}
+
+*list = entry;
+return entry;
+}
+
+static void qemu_file_input_end_list(Visitor *v, Error **errp)
+{
+QemuFileInputVisitor *iv = to_iv(v);
+StackEntry *e = qemu_file_input_pop(iv);
+if (!e || e->type != QFIV_LIST) {
+error_set(errp, QERR_UNDEFINED_ERROR);
+return;
+}
+g_free(e);
+}
+
+static void qemu_file_input_start_array(Visitor *v, void **obj,
+const char *na

[Qemu-devel] [RFC 5/8] qapi: test cases for QEMUFile input/output visitors


Signed-off-by: Michael Roth 
---
 Makefile   |4 +-
 test-visitor.c |  404 
 2 files changed, 406 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index 7e9382f..2b68e39 100644
--- a/Makefile
+++ b/Makefile
@@ -155,7 +155,7 @@ check-qint.o check-qstring.o check-qdict.o check-qlist.o 
check-qfloat.o check-qj
 
 CHECK_PROG_DEPS = $(oslib-obj-y) $(trace-obj-y) qemu-tool.o
 
-check-qint: check-qint.o qint.o $(CHECK_PROG_DEPS)
+CHECK-QINt: check-qint.o qint.o $(CHECK_PROG_DEPS)
 check-qstring: check-qstring.o qstring.o $(CHECK_PROG_DEPS)
 check-qdict: check-qdict.o qdict.o qfloat.o qint.o qstring.o qbool.o qlist.o 
$(CHECK_PROG_DEPS)
 check-qlist: check-qlist.o qlist.o qint.o $(CHECK_PROG_DEPS)
@@ -188,7 +188,7 @@ $(qapi-dir)/qga-qmp-marshal.c: 
$(SRC_PATH)/qapi-schema-guest.json $(SRC_PATH)/sc
$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-commands.py -o 
"$(qapi-dir)" -p "qga-" < $<, "  GEN   $@")
 
 test-visitor.o: $(addprefix $(qapi-dir)/, test-qapi-types.c test-qapi-types.h 
test-qapi-visit.c test-qapi-visit.h) $(qapi-obj-y)
-test-visitor: test-visitor.o qfloat.o qint.o qdict.o qstring.o qlist.o qbool.o 
$(qapi-obj-y) error.o osdep.o $(oslib-obj-y) qjson.o json-streamer.o 
json-lexer.o json-parser.o qerror.o qemu-error.o qemu-tool.o 
$(qapi-dir)/test-qapi-visit.o $(qapi-dir)/test-qapi-types.o
+test-visitor: test-visitor.o qfloat.o qint.o qdict.o qstring.o qlist.o qbool.o 
$(qapi-obj-y) error.o osdep.o $(oslib-obj-y) qjson.o json-streamer.o 
json-lexer.o json-parser.o qerror.o qemu-error.o qemu-tool.o 
$(qapi-dir)/test-qapi-visit.o $(qapi-dir)/test-qapi-types.o 
qapi/qemu-file-input-visitor.o qapi/qemu-file-output-visitor.o qemu-file.o
 
 test-qmp-commands.o: $(addprefix $(qapi-dir)/, test-qapi-types.c 
test-qapi-types.h test-qapi-visit.c test-qapi-visit.h test-qmp-marshal.c 
test-qmp-commands.h) $(qapi-obj-y)
 test-qmp-commands: test-qmp-commands.o qfloat.o qint.o qdict.o qstring.o 
qlist.o qbool.o $(qapi-obj-y) error.o osdep.o $(oslib-obj-y) qjson.o 
json-streamer.o json-lexer.o json-parser.o qerror.o qemu-error.o qemu-tool.o 
$(qapi-dir)/test-qapi-visit.o $(qapi-dir)/test-qapi-types.o 
$(qapi-dir)/test-qmp-marshal.o module.o
diff --git a/test-visitor.c b/test-visitor.c
index b7717de..1e8fd0c 100644
--- a/test-visitor.c
+++ b/test-visitor.c
@@ -4,6 +4,10 @@
 #include "test-qapi-types.h"
 #include "test-qapi-visit.h"
 #include "qemu-objects.h"
+#include "qapi/qemu-file-output-visitor.h"
+#include "qapi/qemu-file-input-visitor.h"
+#include "hw/hw.h"
+#include "qemu-common.h"
 
 typedef struct TestStruct
 {
@@ -291,6 +295,404 @@ static void test_nested_enums(void)
 qapi_free_NestedEnumsOne(nested_enums_cpy);
 }
 
+#define TEST_QEMU_FILE_PATH "/tmp/test_qemu_file_visitors"
+
+typedef struct QEMUFileValue {
+union {
+bool boolean;
+double number;
+uint8_t u8;
+uint16_t u16;
+uint32_t u32;
+uint64_t u64;
+int8_t s8;
+int16_t s16;
+int32_t s32;
+int64_t s64;
+uintmax_t umax;
+} value;
+union {
+uint8_t u8[32];
+uint16_t u16[32];
+uint32_t u32[32];
+uint64_t u64[32];
+int8_t s8[32];
+int16_t s16[32];
+int32_t s32[32];
+int64_t s64[32];
+uintmax_t umax[32];
+} array;
+size_t array_len;
+enum {
+QFV_BOOL = 0,
+QFV_NUMBER,
+QFV_U8,
+QFV_U16,
+QFV_U32,
+QFV_U64,
+QFV_S8,
+QFV_S16,
+QFV_S32,
+QFV_S64,
+QFV_U8_ARRAY,
+QFV_EOL,
+} type;
+} QEMUFileValue;
+
+QEMUFileValue qfvalues[] = {
+{ .value.boolean = true, .type = QFV_BOOL },
+{ .value.boolean = false, .type = QFV_BOOL },
+{ .value.number = 3.14159265, .type = QFV_NUMBER },
+{ .value.u8 = 0, .type = QFV_U8 },
+{ .value.u8 = 1, .type = QFV_U8 },
+{ .value.u8 = 128, .type = QFV_U8 },
+{ .value.u8 = 255u, .type = QFV_U8 },
+{ .value.u16 = 0, .type = QFV_U16 },
+{ .value.u16 = 1, .type = QFV_U16 },
+{ .value.u16 = 32768, .type = QFV_U16 },
+{ .value.u16 = 65535u, .type = QFV_U16 },
+{ .value.u32 = 0, .type = QFV_U32 },
+{ .value.u32 = 1, .type = QFV_U32 },
+{ .value.u32 = 2147483648, .type = QFV_U32 },
+{ .value.u32 = 4294967295u, .type = QFV_U32 },
+{ .value.u64 = 0, .type = QFV_U64 },
+{ .value.u64 = 1, .type = QFV_U64 },
+{ .value.u64 = 9223372036854775808u, .type = QFV_U64 },
+{ .value.u64 = 18446744073709551615u, .type = QFV_U64 },
+{ .value.s8 = 0, .type = QFV_S8 },
+{ .value.s8 = 1, .type = QFV_S8 },
+{ .value.s8 = 128, .type = QFV_S8 },
+{ .value.s8 = -1, .type = QFV_S8 },
+{ .value.s16 = 0, .type = QFV_S16 },
+{ .value.s16 = 1, .type = QFV_S16 },
+{ .value.s16 = 32768, .type = QFV_S16 },
+{ .value.s16 = -1, .type = QFV_S16 },
+{ .value.s32 = 0, .type = QFV_S32 },
+{ .va

Re: [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode

2011-09-19 Thread Andi Kleen

> You generally want CSE, yes?  So you can't blame gcc for getting it 
> wrong sometimes.

There are cases where CSE pessimizes the code, .e.g when it increases
memory pressure too much or caches something that is easier recomputed.
This is just another one.

BTW I checked again and the problem seems to be fixed in gcc 4.6, still
there in 4.5, at least in my example.

-Andi

Re: [Qemu-devel] [PATCH] scsi: fix sign extension problems

Am 09.09.2011 16:47, schrieb Paolo Bonzini:
> When assigning a 32-bit value to cmd->xfer (which is 64-bits)
> it can be erroneously sign extended because the intermediate
> 32-bit computation is signed.  Fix this by standardizing on
> the ld*_be_p functions.
> 
> Signed-off-by: Paolo Bonzini 

Thanks, applied to the block branch.

Kevin

[Qemu-devel] [RFC 6/8] savevm: add QEMUFile->visitor lookup routines


Signed-off-by: Michael Roth 
---
 hw/hw.h |   19 +
 qemu-file.c |   64 +++
 2 files changed, 83 insertions(+), 0 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index 244bfb9..e6d7c6a 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -10,6 +10,25 @@
 
 #include "ioport.h"
 #include "irq.h"
+#include "qemu-queue.h"
+#include "qapi/qapi-visit-core.h"
+#include "qapi/qemu-file-output-visitor.h"
+#include "qapi/qemu-file-input-visitor.h"
+
+/* QEMUFile->Visitor lookup routines to support legacy qemu_put_* calls */
+typedef struct QemuFileVisitorNode {
+void *opaque; /* Visitor type */
+QemuFileOutputVisitor *ov;
+QemuFileInputVisitor *iv;
+QEMUFile *f;
+QTAILQ_ENTRY(QemuFileVisitorNode) entry;
+} QemuFileVisitorNode;
+
+Visitor *qemu_file_get_input_visitor(QEMUFile *f);
+Visitor *qemu_file_get_output_visitor(QEMUFile *f);
+QemuFileVisitorNode *qemu_file_get_visitor_node(QEMUFile *f);
+void qemu_file_put_visitor_node(QemuFileVisitorNode *n);
+void qemu_file_remove_visitor_node(QemuFileVisitorNode *n);
 
 /* VM Load/Save */
 
diff --git a/qemu-file.c b/qemu-file.c
index fb91b91..ec69627 100644
--- a/qemu-file.c
+++ b/qemu-file.c
@@ -25,6 +25,8 @@
 #include "qemu-common.h"
 #include "qemu_socket.h"
 #include "hw/hw.h"
+#include "qapi/qemu-file-output-visitor.h"
+#include "qapi/qemu-file-input-visitor.h"
 
 #define IO_BUF_SIZE 32768
 
@@ -47,6 +49,61 @@ struct QEMUFile {
 int has_error;
 };
 
+/* TODO: temporary mechanism to support existing function signatures by
+ * creating a 1-to-1 mapping between QEMUFile's and the actual Visitor type.
+ * In the case of QemuFileOutputVisitor/QemuFileInputVisitor, the QEMUFile
+ * arg corresponds to the handle used by the visitor for reads/writes. For
+ * other visitors, the QEMUFile will serve purely as a key.
+ *
+ * Once all interfaces are converted to using Visitor directly, we will
+ * remove this lookup logic and pass the Visitor to the registered save/load
+ * functions directly.
+ */
+static QTAILQ_HEAD(, QemuFileVisitorNode) qemu_file_visitors =
+QTAILQ_HEAD_INITIALIZER(qemu_file_visitors);
+
+QemuFileVisitorNode *qemu_file_get_visitor_node(QEMUFile *f)
+{
+QemuFileVisitorNode *node;
+QTAILQ_FOREACH(node, &qemu_file_visitors, entry) {
+if (node->f == f) {
+return node;
+}
+}
+return NULL;
+}
+
+Visitor *qemu_file_get_output_visitor(QEMUFile *f)
+{
+QemuFileVisitorNode *node = qemu_file_get_visitor_node(f);
+
+if (node && node->ov) {
+return qemu_file_output_get_visitor(node->ov);
+}
+return NULL;
+}
+
+Visitor *qemu_file_get_input_visitor(QEMUFile *f)
+{
+QemuFileVisitorNode *node = qemu_file_get_visitor_node(f);
+
+if (node && node->iv) {
+return qemu_file_input_get_visitor(node->iv);
+}
+return NULL;
+}
+
+void qemu_file_put_visitor_node(QemuFileVisitorNode *node)
+{
+QTAILQ_INSERT_TAIL(&qemu_file_visitors, node, entry);
+}
+
+void qemu_file_remove_visitor_node(QemuFileVisitorNode *node)
+{
+QTAILQ_REMOVE(&qemu_file_visitors, node, entry);
+g_free(node);
+}
+
 QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer,
  QEMUFileGetBufferFunc *get_buffer,
  QEMUFileCloseFunc *close,
@@ -55,6 +112,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc 
*put_buffer,
  QEMUFileGetRateLimit *get_rate_limit)
 {
 QEMUFile *f;
+QemuFileVisitorNode *node;
 
 f = g_malloc0(sizeof(QEMUFile));
 
@@ -67,6 +125,12 @@ QEMUFile *qemu_fopen_ops(void *opaque, 
QEMUFilePutBufferFunc *put_buffer,
 f->get_rate_limit = get_rate_limit;
 f->is_write = 0;
 
+node = g_malloc0(sizeof(QemuFileVisitorNode));
+node->ov = qemu_file_output_visitor_new(f);
+node->iv = qemu_file_input_visitor_new(f);
+node->f = f;
+qemu_file_put_visitor_node(node);
+
 return f;
 }
 
-- 
1.7.0.4

[Qemu-devel] [RFC 8/8] slirp: convert save/load function to visitor interface

Where possible common routines are used for both input and output, thus
save on lines of code, theoretically. The added lines here are mostly
due to extra logic for each save/load routine to manipulate strings into
a unique field name for each saved field, and in some cases a few extra
Visitor calls to due list/struct i/o. With some reworking we can
probably optimize all of these to reduce the amount of added code.

Signed-off-by: Michael Roth 
---
 slirp/slirp.c |  366 +---
 1 files changed, 216 insertions(+), 150 deletions(-)

diff --git a/slirp/slirp.c b/slirp/slirp.c
index 19d69eb..8783626 100644
--- a/slirp/slirp.c
+++ b/slirp/slirp.c
@@ -26,6 +26,9 @@
 #include "qemu-char.h"
 #include "slirp.h"
 #include "hw/hw.h"
+#include "qemu-error.h"
+
+#define SLIRP_DELIMITER 42 /* used to separate slirp instances in save/load */
 
 /* host loopback address */
 struct in_addr loopback_addr;
@@ -871,96 +874,171 @@ void slirp_socket_recv(Slirp *slirp, struct in_addr 
guest_addr, int guest_port,
 tcp_output(sototcpcb(so));
 }
 
-static void slirp_tcp_save(QEMUFile *f, struct tcpcb *tp)
+static void slirp_tcp_visit(Visitor *v, struct tcpcb *tp, const char *pfield, 
Error *err)
 {
-int i;
-
-qemu_put_sbe16(f, tp->t_state);
-for (i = 0; i < TCPT_NTIMERS; i++)
-qemu_put_sbe16(f, tp->t_timer[i]);
-qemu_put_sbe16(f, tp->t_rxtshift);
-qemu_put_sbe16(f, tp->t_rxtcur);
-qemu_put_sbe16(f, tp->t_dupacks);
-qemu_put_be16(f, tp->t_maxseg);
-qemu_put_sbyte(f, tp->t_force);
-qemu_put_be16(f, tp->t_flags);
-qemu_put_be32(f, tp->snd_una);
-qemu_put_be32(f, tp->snd_nxt);
-qemu_put_be32(f, tp->snd_up);
-qemu_put_be32(f, tp->snd_wl1);
-qemu_put_be32(f, tp->snd_wl2);
-qemu_put_be32(f, tp->iss);
-qemu_put_be32(f, tp->snd_wnd);
-qemu_put_be32(f, tp->rcv_wnd);
-qemu_put_be32(f, tp->rcv_nxt);
-qemu_put_be32(f, tp->rcv_up);
-qemu_put_be32(f, tp->irs);
-qemu_put_be32(f, tp->rcv_adv);
-qemu_put_be32(f, tp->snd_max);
-qemu_put_be32(f, tp->snd_cwnd);
-qemu_put_be32(f, tp->snd_ssthresh);
-qemu_put_sbe16(f, tp->t_idle);
-qemu_put_sbe16(f, tp->t_rtt);
-qemu_put_be32(f, tp->t_rtseq);
-qemu_put_sbe16(f, tp->t_srtt);
-qemu_put_sbe16(f, tp->t_rttvar);
-qemu_put_be16(f, tp->t_rttmin);
-qemu_put_be32(f, tp->max_sndwnd);
-qemu_put_byte(f, tp->t_oobflags);
-qemu_put_byte(f, tp->t_iobc);
-qemu_put_sbe16(f, tp->t_softerror);
-qemu_put_byte(f, tp->snd_scale);
-qemu_put_byte(f, tp->rcv_scale);
-qemu_put_byte(f, tp->request_r_scale);
-qemu_put_byte(f, tp->requested_s_scale);
-qemu_put_be32(f, tp->ts_recent);
-qemu_put_be32(f, tp->ts_recent_age);
-qemu_put_be32(f, tp->last_ack_sent);
+char f[128];
+int i, l = 0;
+int16_t *ptr;
+
+if (pfield) {
+assert(strlen(pfield) < sizeof(f));
+strcpy(f, pfield);
+l = strlen(pfield);
+}
+
+visit_type_int16_t(v, &tp->t_state, strocat(f, ".t_state", l), &err);
+ptr = tp->t_timer;
+visit_start_array(v, (void **)&ptr, strocat(f, ".t_timer", l), 
TCPT_NTIMERS, sizeof(*ptr), &err);
+for (i = 0; i < TCPT_NTIMERS; ++i) {
+visit_type_int16_t(v, &ptr[i], NULL, &err);
+}
+visit_end_array(v, &err);
+visit_type_int16_t(v, &tp->t_rxtshift, strocat(f, ".t_rxtshift", l), 
&err); 
+visit_type_int16_t(v, &tp->t_rxtcur, strocat(f, ".t_rxtcur", l), &err);
+visit_type_int16_t(v, &tp->t_dupacks, strocat(f, ".t_dupacks", l), &err);
+visit_type_uint16_t(v, &tp->t_maxseg, strocat(f, ".t_maxseg", l), &err); 
+visit_type_uint8_t(v, (uint8_t *)&tp->t_force, strocat(f, ".t_force", l), 
&err); 
+visit_type_uint16_t(v, &tp->t_flags, strocat(f, ".t_flags", l), &err); 
+visit_type_uint32_t(v, &tp->snd_una, strocat(f, ".snd_una", l), &err); 
+visit_type_uint32_t(v, &tp->snd_nxt, strocat(f, ".snd_nxt", l), &err); 
+visit_type_uint32_t(v, &tp->snd_up, strocat(f, ".snd_up", l), &err); 
+visit_type_uint32_t(v, &tp->snd_wl1, strocat(f, ".snd_wl1", l), &err); 
+visit_type_uint32_t(v, &tp->snd_wl2, strocat(f, ".snd_wl2", l), &err); 
+visit_type_uint32_t(v, &tp->iss, strocat(f, ".iss", l), &err); 
+visit_type_uint32_t(v, &tp->snd_wnd, strocat(f, ".snd_wnd", l), &err); 
+visit_type_uint32_t(v, &tp->rcv_wnd, strocat(f, ".rcv_wnd", l), &err); 
+visit_type_uint32_t(v, &tp->rcv_nxt, strocat(f, ".rcv_nxt", l), &err); 
+visit_type_uint32_t(v, &tp->rcv_up, strocat(f, ".rcv_up", l), &err); 
+visit_type_uint32_t(v, &tp->irs, strocat(f, ".irs", l), &err); 
+visit_type_uint32_t(v, &tp->rcv_adv, strocat(f, ".rcv_adv", l), &err); 
+visit_type_uint32_t(v, &tp->snd_max, strocat(f, ".snd_max", l), &err); 
+visit_type_uint32_t(v, &tp->snd_cwnd, strocat(f, ".snd_cwnd", l), &err); 
+visit_type_uint32_t(v, &tp->snd_ssthresh, strocat(f, ".snd_ssthresh", l), 
&err); 
+visit_type_int16_t(v, &tp->t_id

Re: [Qemu-devel] [PATCH v2] block: avoid SIGUSR2


On 09/19/2011 04:37 PM, Frediano Ziglio wrote:

Now that iothread is always compiled sending a signal seems only an
additional step. This patch also avoid writing to two pipe (one from signal
and one in qemu_service_io).

Work with kvm enabled or disabled. strace output is more readable (less 
syscalls).

Signed-off-by: Frediano Ziglio
---
  cpus.c |5 -
  posix-aio-compat.c |   29 +
  2 files changed, 9 insertions(+), 25 deletions(-)

diff --git a/cpus.c b/cpus.c
index 54c188c..d0cfe91 100644
--- a/cpus.c
+++ b/cpus.c
@@ -380,11 +380,6 @@ static int qemu_signal_init(void)
  int sigfd;
  sigset_t set;

-/* SIGUSR2 used by posix-aio-compat.c */
-sigemptyset(&set);
-sigaddset(&set, SIGUSR2);
-pthread_sigmask(SIG_UNBLOCK,&set, NULL);
-
  /*
   * SIG_IPI must be blocked in the main thread and must not be caught
   * by sigwait() in the signal thread. Otherwise, the cpu thread will
diff --git a/posix-aio-compat.c b/posix-aio-compat.c
index 3193dbf..185d5b2 100644
--- a/posix-aio-compat.c
+++ b/posix-aio-compat.c
@@ -42,7 +42,6 @@ struct qemu_paiocb {
  int aio_niov;
  size_t aio_nbytes;
  #define aio_ioctl_cmd   aio_nbytes /* for QEMU_AIO_IOCTL */
-int ev_signo;
  off_t aio_offset;

  QTAILQ_ENTRY(qemu_paiocb) node;
@@ -309,6 +308,8 @@ static ssize_t handle_aiocb_rw(struct qemu_paiocb *aiocb)
  return nbytes;
  }

+static void posix_aio_notify_event(void);
+
  static void *aio_thread(void *unused)
  {
  pid_t pid;
@@ -381,7 +382,7 @@ static void *aio_thread(void *unused)
  aiocb->ret = ret;
  mutex_unlock(&lock);

-if (kill(pid, aiocb->ev_signo)) die("kill failed");
+posix_aio_notify_event();
  }

  cur_threads--;
@@ -548,18 +549,14 @@ static int posix_aio_flush(void *opaque)

  static PosixAioState *posix_aio_state;

-static void aio_signal_handler(int signum)
+static void posix_aio_notify_event(void)
  {
-if (posix_aio_state) {
-char byte = 0;
-ssize_t ret;
-
-ret = write(posix_aio_state->wfd,&byte, sizeof(byte));
-if (ret<  0&&  errno != EAGAIN)
-die("write()");
-}
+char byte = 0;
+ssize_t ret;

-qemu_service_io();
+ret = write(posix_aio_state->wfd,&byte, sizeof(byte));
+if (ret<  0&&  errno != EAGAIN)
+die("write()");
  }

  static void paio_remove(struct qemu_paiocb *acb)
@@ -623,7 +620,6 @@ BlockDriverAIOCB *paio_submit(BlockDriverState *bs, int fd,
  return NULL;
  acb->aio_type = type;
  acb->aio_fildes = fd;
-acb->ev_signo = SIGUSR2;

  if (qiov) {
  acb->aio_iov = qiov->iov;
@@ -651,7 +647,6 @@ BlockDriverAIOCB *paio_ioctl(BlockDriverState *bs, int fd,
  return NULL;
  acb->aio_type = QEMU_AIO_IOCTL;
  acb->aio_fildes = fd;
-acb->ev_signo = SIGUSR2;
  acb->aio_offset = 0;
  acb->aio_ioctl_buf = buf;
  acb->aio_ioctl_cmd = req;
@@ -665,7 +660,6 @@ BlockDriverAIOCB *paio_ioctl(BlockDriverState *bs, int fd,

  int paio_init(void)
  {
-struct sigaction act;
  PosixAioState *s;
  int fds[2];
  int ret;
@@ -675,11 +669,6 @@ int paio_init(void)

  s = g_malloc(sizeof(PosixAioState));

-sigfillset(&act.sa_mask);
-act.sa_flags = 0; /* do not restart syscalls to interrupt select() */
-act.sa_handler = aio_signal_handler;
-sigaction(SIGUSR2,&act, NULL);
-
  s->first_aio = NULL;
  if (qemu_pipe(fds) == -1) {
  fprintf(stderr, "failed to create pipe\n");


I think it is possible to go a step further, turn 
posix_aio_process_queue into a bottom half and get rid of the pipe 
altogether.  This in turn would remove the only real user of 
io_process_queue in qemu_aio_set_fd_handler.  However, this is already a 
nice improvement.


Reviewed-by: Paolo Bonzini

[Qemu-devel] [PATCH] linux-aio: remove process requests callback

Signed-off-by: Paolo Bonzini 
---
 linux-aio.c |   11 +--
 1 files changed, 1 insertions(+), 10 deletions(-)

diff --git a/linux-aio.c b/linux-aio.c
index 5265a02..bffa6cd 100644
--- a/linux-aio.c
+++ b/linux-aio.c
@@ -68,15 +68,6 @@ static void qemu_laio_process_completion(struct 
qemu_laio_state *s,
 qemu_aio_release(laiocb);
 }
 
-/*
- * All requests are directly processed when they complete, so there's nothing
- * left to do during qemu_aio_wait().
- */
-static int qemu_laio_process_requests(void *opaque)
-{
-return 0;
-}
-
 static void qemu_laio_completion_cb(void *opaque)
 {
 struct qemu_laio_state *s = opaque;
@@ -215,7 +206,7 @@ void *laio_init(void)
 goto out_close_efd;
 
 qemu_aio_set_fd_handler(s->efd, qemu_laio_completion_cb, NULL,
-qemu_laio_flush_cb, qemu_laio_process_requests, s);
+qemu_laio_flush_cb, NULL, s);
 
 return s;
 
-- 
1.7.6

Re: [Qemu-devel] Help needed -- vvfat.c

Am 15.09.2011 14:49, schrieb Pintu Kumar:
> Hi,
> 
> This is regarding qemu block/vvfat.c.
> 
> Currently vvfat scans all directories and sub-directories in the
> beginning during init_directories().
> 
> I want to modify vvfat such that it should scan only the TOP directory
> content and not the sub-directory content before mounting.
> And after mounting vvfat I should be able to dynamically get the TOP
> directory content when I do "ls -l" on that directory.
> How can I do that?
> 
> I tried skipping sub-directory in read_directory() as follows.
> else if(S_ISDIR(st.st_mode))
> {
> LOGD("Skipping sub-directory : %s\n", buffer);
> s->current_mapping=NULL;
> continue;
> }
> After that the sub-directories are visible after mounting and part1 is 
> working.
> But after mounting when I do "ls -l" on vvfat mounted path, on any
> directory, vvfat_read() is not getting called.
> 
> Somebody who knows about vvfat implementation please let me know where
> is the problem.

I doubt that this is possible. A guest OS doesn't expect that data on
the disk changes when it didn't write to it. So for example, it could
read in the whole FAT and never really look at the on-disk FAT any more.

In practice, probably only parts of the FAT are cached, but that doesn't
help you. You never know which parts are cached, and anyway, even if you
knew which parts you may change, restricting changes to these parts of
the FATs would be tricky... vvfat is already complicated enough without
doing anything like this.

Kevin

Re: [Qemu-devel] [net-next RFC V2 PATCH 0/5] Multiqueue support in tun/tap

2011-09-19 Thread Ben Hutchings

On Sat, 2011-09-17 at 14:02 +0800, Jason Wang wrote:
[...]
> 2 Current implementation may also get regression for single session
> packet transmission.
> 
> The reason is packets from each flow were not handled by the same
> queue/vhost thread.
> 
> Various method could be done to handle this:
> 
> 2.1 hack the guest driver, and store the queue index into the rxhash and
> use it when choosing tx in guest. This need some hack to store the
> rxhash into sk and pass it in to skb again in
> skb_orphan_try(). sk_rxhash is only used by RPS now, so some more
> clean method is needed.
[...]

I have previously suggested doing this as a general rule.  However, I
now think we can do much better with accelerated RFS and automatic XPS
(but the latter is not yet implemented).  For virtio_net, accelerated
RFS would effectively push the guest's RFS socket map out to the host.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

Re: [Qemu-devel] [PATCH v2] block: avoid SIGUSR2

Am 19.09.2011 17:02, schrieb Paolo Bonzini:
> On 09/19/2011 04:37 PM, Frediano Ziglio wrote:
>> Now that iothread is always compiled sending a signal seems only an
>> additional step. This patch also avoid writing to two pipe (one from signal
>> and one in qemu_service_io).
>>
>> Work with kvm enabled or disabled. strace output is more readable (less 
>> syscalls).
>>
>> Signed-off-by: Frediano Ziglio
>> ---
>>   cpus.c |5 -
>>   posix-aio-compat.c |   29 +
>>   2 files changed, 9 insertions(+), 25 deletions(-)
>>
>> diff --git a/cpus.c b/cpus.c
>> index 54c188c..d0cfe91 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -380,11 +380,6 @@ static int qemu_signal_init(void)
>>   int sigfd;
>>   sigset_t set;
>>
>> -/* SIGUSR2 used by posix-aio-compat.c */
>> -sigemptyset(&set);
>> -sigaddset(&set, SIGUSR2);
>> -pthread_sigmask(SIG_UNBLOCK,&set, NULL);
>> -
>>   /*
>>* SIG_IPI must be blocked in the main thread and must not be caught
>>* by sigwait() in the signal thread. Otherwise, the cpu thread will
>> diff --git a/posix-aio-compat.c b/posix-aio-compat.c
>> index 3193dbf..185d5b2 100644
>> --- a/posix-aio-compat.c
>> +++ b/posix-aio-compat.c
>> @@ -42,7 +42,6 @@ struct qemu_paiocb {
>>   int aio_niov;
>>   size_t aio_nbytes;
>>   #define aio_ioctl_cmd   aio_nbytes /* for QEMU_AIO_IOCTL */
>> -int ev_signo;
>>   off_t aio_offset;
>>
>>   QTAILQ_ENTRY(qemu_paiocb) node;
>> @@ -309,6 +308,8 @@ static ssize_t handle_aiocb_rw(struct qemu_paiocb *aiocb)
>>   return nbytes;
>>   }
>>
>> +static void posix_aio_notify_event(void);
>> +
>>   static void *aio_thread(void *unused)
>>   {
>>   pid_t pid;
>> @@ -381,7 +382,7 @@ static void *aio_thread(void *unused)
>>   aiocb->ret = ret;
>>   mutex_unlock(&lock);
>>
>> -if (kill(pid, aiocb->ev_signo)) die("kill failed");
>> +posix_aio_notify_event();
>>   }
>>
>>   cur_threads--;
>> @@ -548,18 +549,14 @@ static int posix_aio_flush(void *opaque)
>>
>>   static PosixAioState *posix_aio_state;
>>
>> -static void aio_signal_handler(int signum)
>> +static void posix_aio_notify_event(void)
>>   {
>> -if (posix_aio_state) {
>> -char byte = 0;
>> -ssize_t ret;
>> -
>> -ret = write(posix_aio_state->wfd,&byte, sizeof(byte));
>> -if (ret<  0&&  errno != EAGAIN)
>> -die("write()");
>> -}
>> +char byte = 0;
>> +ssize_t ret;
>>
>> -qemu_service_io();
>> +ret = write(posix_aio_state->wfd,&byte, sizeof(byte));
>> +if (ret<  0&&  errno != EAGAIN)
>> +die("write()");
>>   }
>>
>>   static void paio_remove(struct qemu_paiocb *acb)
>> @@ -623,7 +620,6 @@ BlockDriverAIOCB *paio_submit(BlockDriverState *bs, int 
>> fd,
>>   return NULL;
>>   acb->aio_type = type;
>>   acb->aio_fildes = fd;
>> -acb->ev_signo = SIGUSR2;
>>
>>   if (qiov) {
>>   acb->aio_iov = qiov->iov;
>> @@ -651,7 +647,6 @@ BlockDriverAIOCB *paio_ioctl(BlockDriverState *bs, int 
>> fd,
>>   return NULL;
>>   acb->aio_type = QEMU_AIO_IOCTL;
>>   acb->aio_fildes = fd;
>> -acb->ev_signo = SIGUSR2;
>>   acb->aio_offset = 0;
>>   acb->aio_ioctl_buf = buf;
>>   acb->aio_ioctl_cmd = req;
>> @@ -665,7 +660,6 @@ BlockDriverAIOCB *paio_ioctl(BlockDriverState *bs, int 
>> fd,
>>
>>   int paio_init(void)
>>   {
>> -struct sigaction act;
>>   PosixAioState *s;
>>   int fds[2];
>>   int ret;
>> @@ -675,11 +669,6 @@ int paio_init(void)
>>
>>   s = g_malloc(sizeof(PosixAioState));
>>
>> -sigfillset(&act.sa_mask);
>> -act.sa_flags = 0; /* do not restart syscalls to interrupt select() */
>> -act.sa_handler = aio_signal_handler;
>> -sigaction(SIGUSR2,&act, NULL);
>> -
>>   s->first_aio = NULL;
>>   if (qemu_pipe(fds) == -1) {
>>   fprintf(stderr, "failed to create pipe\n");
> 
> I think it is possible to go a step further, turn 
> posix_aio_process_queue into a bottom half and get rid of the pipe 
> altogether.  This in turn would remove the only real user of 
> io_process_queue in qemu_aio_set_fd_handler.  However, this is already a 
> nice improvement.

But without the fd, wouldn't the I/O thread possibly wait for much
longer until its select() times out and it starts processing BHs?

Kevin

[Qemu-devel] mesavantages.net, bénéficiez d'avantages pour vous et vos salariés

2011-09-19 Thread Canalce

Canalce lance mesavantages.net
Si ce message n'est pas lisible, merci de suivre ce lien 














Si vous ne voulez plus recevoir cet email, merci de suivre ce lien 
Si vous souhaitez vous désabonner, recopiez cette adresse dans la barre 
d'adresse de votre navigateur : 
http://www.ccmail-gmc.com/u-1.1.php?param=3Rbi_1IvEkRQjb1KxEXDWVukfGahjZsOBcEkRQjb1KwEjb1KEkRQjb1KxE2/jGEg/W/KEkRQjb1KSEIQIvIGmQRvEkRQjb1KwEiQI3bi3EkRQjb1KxEwJCLLLSLEkRQjb1KwE3sk/lPbY/EkRQjb1KxExEkRQjb1KwE1glibjkbvI/EkRQjb1KxExLL

Re: [Qemu-devel] [PATCH] linux-aio: remove process requests callback

Am 19.09.2011 17:05, schrieb Paolo Bonzini:
> Signed-off-by: Paolo Bonzini 
> ---
>  linux-aio.c |   11 +--
>  1 files changed, 1 insertions(+), 10 deletions(-)

Thanks applied to the block branch.

Kevin

Re: [Qemu-devel] [PATCH v2] block: avoid SIGUSR2

Am 19.09.2011 16:37, schrieb Frediano Ziglio:
> Now that iothread is always compiled sending a signal seems only an
> additional step. This patch also avoid writing to two pipe (one from signal
> and one in qemu_service_io).
> 
> Work with kvm enabled or disabled. strace output is more readable (less 
> syscalls).
> 
> Signed-off-by: Frediano Ziglio 

Thanks applied to the block branch.

Kevin

Re: [Qemu-devel] [PATCH 0/8] tcg/interpreter: Add TCG + interpreter for bytecode (virtual machine)

2011-09-19 Thread Mulyadi Santosa

Hi Stefan

On Mon, Sep 19, 2011 at 03:15, Stefan Weil  wrote:
> Am 18.09.2011 18:39, schrieb Mulyadi Santosa:
>> On Sun, Sep 18, 2011 at 22:13, Stefan Weil  wrote:
>> So, that interpreter, should it be build inside Qemu too? Or can we
>> use/write external one? let's say creating one in python and TCI
>> passes the generated bytecode via UNIX socket to the listening Python
>> script, is that doable or one of the goal your design?
>
> Do you think of something like http://bellard.org/jslinux/?

None specific, but yes, that could be something that describe my idea
:) (anyway, that jslinux is awesome so to speak).

> The current interpreter is built inside QEMU, and I'm afraid
> that separating code generator and interpreter in different
> processes might be a lot of work. Maybe running both in
> separate threads would be possible, so the code generator
> could prepare new bytecode while the interpreter is still
> running the previous one.

Hm, got it...thanks for your kind explanation. I am very appreciate it.


-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com

Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files

2011-09-19 Thread Alex Williamson

On Fri, 2011-09-09 at 08:11 -0500, Stuart Yoder wrote:
> Based on the discussions over the last couple of weeks
> I have updated the device fd file layout proposal and
> tried to specify it a bit more formally.
> 
> ===
> 
> 1.  Overview
> 
>   This specification describes the layout of device files
>   used in the context of vfio, which gives user space
>   direct access to I/O devices that have been bound to
>   vfio.
> 
>   When a device fd is opened and read, offset 0x0 contains
>   a fixed sized header followed by a number of variable length
>   records that describe different characteristics
>   of the device-- addressable regions, interrupts, etc.
> 
>   0x0  +-+-+
>| magic | u32  // identifies this as a vfio
> device file
>+---+ and identifies the type of bus
>| version   | u32  // specifies the version of this
>+---+
>| flags | u32  // encodes any flags
>+---+
>|  dev info record 0|
>|type   | u32   // type of record
>|rec_len| u32   // length in bytes of record
>|   |  (including record header)
>|flags  | u32   // type specific flags
>|...content...  |   // record content, which could
>+---+   // include sub-records
>|  dev info record 1|
>+---+
>|  dev info record N|
>+---+
> 
>   The device info records following the file header may have
>   the following record types each with content encoded in
>   a record specific way:
> 
>   +---+--
>   |  type |
>Region |  num  | Description
>   ---
>   REGION   1describes an addressable address range for the device
>   DTPATH   2describes the device tree path for the device
>   DTINDEX  3describes the index into the related device tree
>   property (reg,ranges,interrupts,interrupt-map)
>   INTERRUPT4describes an interrupt for the device
>   PCI_CONFIG_SPACE 5property identifying a region as PCI config space
>   PCI_BAR_INDEX6describes the BAR index for a PCI region
>   PHYS_ADDR7describes the physical address of the region
>   ---
> 
> 2. Header
> 
> The header is located at offset 0x0 in the device fd
> and has the following format:
> 
> struct devfd_header {
> __u32 magic;
> __u32 version;
> __u32 flags;
> };
> 
> The 'magic' field contains a magic value that will
> identify the type bus the device is on.  Valid values
> are:
> 
> 0x70636900   // "pci" - PCI device
> 0x6474   // "dt" - device tree (system bus)
> 
> 3. Region
> 
>   A REGION record an addressable address region for the device.
> 
> struct devfd_region {
> __u32 type;   // must be 0x1
> __u32 record_len;
> __u32 flags;
> __u64 offset; // seek offset to region from beginning
>   // of file
> __u64 len   ; // length of the region
> };
> 
>   The 'flags' field supports one flag:
> 
>   IS_MMAPABLE
> 
> 4. Device Tree Path (DTPATH)
> 
>   A DTPATH record is a sub-record of a REGION and describes
>   the path to a device tree node for the region

Can we better distinguish sub-records from records?  I assume we're
trying to be as versatile as possible by having a single "type" address
space, but is this going to lead to implementation problems?  A DTPATH
as a record, an INTERRUPT as a sub-record, etc.  Should we instead have
a "subtype" address space per "type" and per device type?  For a "dt"
device, it looks like we really have:

  * REGION (type 0)
  * DTPATH (subtype 0)
  * DTINDEX (subtype 1)
  * PHYS_ADDR (subtype 2)
  * INTERRUPT (type 1)
  * DTPATH (subtype 0)
  * DTINDEX (subtype 1)

While "pci" is:

  * REGION (type 0)
  * PCI_CONFIG_SPACE (subtype 0)
  * PCI_BAR_INDEX (subtype 1)
  * INTERRUPT (type 1)

> struct devfd_dtpath {
> __u32 type;   // must be 0x2
> __u32 record_len;
> __u64 char[]   ; // length of the region
> };
> 
> 5. Device Tree Index (DTINDEX)
> 
>   A DTINDEX record is a sub-record of a REGION and specifies
>   the index into the resource list encoded in the associated
>   device tree property-- "reg", "ranges", "interrupts", or

[Qemu-devel] [PATCH] qcow2: Unlock during COW

Unlocking during COW allows for more parallelism. One change it requires is
that buffers are dynamically allocated instead of just using a per-image
buffer.

While touching the code, drop the synchronous qcow2_read() function and replace
it by a bdrv_read() call.

Signed-off-by: Kevin Wolf 
---
 block/qcow2-cluster.c |   95 +---
 1 files changed, 26 insertions(+), 69 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index c79323a..61b8e62 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -289,89 +289,42 @@ void qcow2_encrypt_sectors(BDRVQcowState *s, int64_t 
sector_num,
 }
 }
 
-
-static int qcow2_read(BlockDriverState *bs, int64_t sector_num,
-  uint8_t *buf, int nb_sectors)
-{
-BDRVQcowState *s = bs->opaque;
-int ret, index_in_cluster, n, n1;
-uint64_t cluster_offset;
-struct iovec iov;
-QEMUIOVector qiov;
-
-while (nb_sectors > 0) {
-n = nb_sectors;
-
-ret = qcow2_get_cluster_offset(bs, sector_num << 9, &n,
-&cluster_offset);
-if (ret < 0) {
-return ret;
-}
-
-index_in_cluster = sector_num & (s->cluster_sectors - 1);
-if (!cluster_offset) {
-if (bs->backing_hd) {
-/* read from the base image */
-iov.iov_base = buf;
-iov.iov_len = n * 512;
-qemu_iovec_init_external(&qiov, &iov, 1);
-
-n1 = qcow2_backing_read1(bs->backing_hd, &qiov, sector_num, n);
-if (n1 > 0) {
-BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING);
-ret = bdrv_read(bs->backing_hd, sector_num, buf, n1);
-if (ret < 0)
-return -1;
-}
-} else {
-memset(buf, 0, 512 * n);
-}
-} else if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
-if (qcow2_decompress_cluster(bs, cluster_offset) < 0)
-return -1;
-memcpy(buf, s->cluster_cache + index_in_cluster * 512, 512 * n);
-} else {
-BLKDBG_EVENT(bs->file, BLKDBG_READ);
-ret = bdrv_pread(bs->file, cluster_offset + index_in_cluster * 
512, buf, n * 512);
-if (ret != n * 512)
-return -1;
-if (s->crypt_method) {
-qcow2_encrypt_sectors(s, sector_num, buf, buf, n, 0,
-&s->aes_decrypt_key);
-}
-}
-nb_sectors -= n;
-sector_num += n;
-buf += n * 512;
-}
-return 0;
-}
-
 static int copy_sectors(BlockDriverState *bs, uint64_t start_sect,
 uint64_t cluster_offset, int n_start, int n_end)
 {
 BDRVQcowState *s = bs->opaque;
 int n, ret;
+void *buf;
 
 n = n_end - n_start;
-if (n <= 0)
+if (n <= 0) {
 return 0;
+}
+
+buf = qemu_blockalign(bs, n * BDRV_SECTOR_SIZE);
+
 BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
-ret = qcow2_read(bs, start_sect + n_start, s->cluster_data, n);
-if (ret < 0)
-return ret;
+ret = bdrv_read(bs, start_sect + n_start, buf, n);
+if (ret < 0) {
+goto out;
+}
+
 if (s->crypt_method) {
 qcow2_encrypt_sectors(s, start_sect + n_start,
-s->cluster_data,
-s->cluster_data, n, 1,
+buf, buf, n, 1,
 &s->aes_encrypt_key);
 }
+
 BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
-ret = bdrv_write(bs->file, (cluster_offset >> 9) + n_start,
-s->cluster_data, n);
-if (ret < 0)
-return ret;
-return 0;
+ret = bdrv_write(bs->file, (cluster_offset >> 9) + n_start, buf, n);
+if (ret < 0) {
+goto out;
+}
+
+ret = 0;
+out:
+qemu_vfree(buf);
+return ret;
 }
 
 
@@ -620,7 +573,9 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, 
QCowL2Meta *m)
 start_sect = (m->offset & ~(s->cluster_size - 1)) >> 9;
 if (m->n_start) {
 cow = true;
+qemu_co_mutex_unlock(&s->lock);
 ret = copy_sectors(bs, start_sect, cluster_offset, 0, m->n_start);
+qemu_co_mutex_lock(&s->lock);
 if (ret < 0)
 goto err;
 }
@@ -628,8 +583,10 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, 
QCowL2Meta *m)
 if (m->nb_available & (s->cluster_sectors - 1)) {
 uint64_t end = m->nb_available & ~(uint64_t)(s->cluster_sectors - 1);
 cow = true;
+qemu_co_mutex_unlock(&s->lock);
 ret = copy_sectors(bs, start_sect + end, cluster_offset + (end << 9),
 m->nb_available - end, s->cluster_sectors);
+qemu_co_mutex_lock(&s->lock);
 if (ret < 0)
 goto err;
 }
-- 
1.7.6.2

Re: [Qemu-devel] [PATCH v2] block: avoid SIGUSR2

On 09/19/2011 05:11 PM, Kevin Wolf wrote:

>  I think it is possible to go a step further, turn
>  posix_aio_process_queue into a bottom half and get rid of the pipe
>  altogether.  This in turn would remove the only real user of
>  io_process_queue in qemu_aio_set_fd_handler.  However, this is already a
>  nice improvement.

But without the fd, wouldn't the I/O thread possibly wait for much
longer until its select() times out and it starts processing BHs?

Hmm, in qemu_aio_wait yes...  In the normal qemu event loop, however, 
bottom halves exit the select loop with qemu_notify_event().  qemu 
currently has a 1-second timeout for the select, but it should work just 
as well with an infinite timeout.  If it doesn't, it's a bug.

It should be possible to turn posix_aio_process_queue into a bottom 
half, but the pipe is still necessary in order to exit the qemu_aio_wait 
select loop and schedule the bottom half.

Paolo

Re: [Qemu-devel] [PATCH 24/58] PPC: E500: Add PV spinning code

2011-09-19 Thread Scott Wood

On 09/19/2011 06:35 AM, Alexander Graf wrote:
> 
> On 17.09.2011, at 19:40, Blue Swirl wrote:
> 
>> On Sat, Sep 17, 2011 at 5:15 PM, Alexander Graf  wrote:
>>>
>>> Am 17.09.2011 um 18:58 schrieb Blue Swirl :
>>>
 On Sparc32, there is no need for a PV device. The CPU is woken up from
 halted state with an IPI. Maybe you could use this approach?
>>>
>>> The way it's done here is defined by u-boot and now also nailed down in the 
>>> ePAPR architecture spec. While alternatives might be more appealing, this 
>>> is how guests work today :).
>>
>> OK. I hoped that there were no implementations yet. The header (btw
>> missing) should point to the spec.

The goal with the spin table stuff, suboptimal as it is, was something
that would work on any powerpc implementation.  Other
implementation-specific release mechanisms are allowed, and are
indicated by a property in the cpu node, but only if the loader knows
that the OS supports it.

> IIUC the spec that includes these bits is not finalized yet. It is however in 
> use on all u-boot versions for e500 that I'm aware of and the method Linux 
> uses to bring up secondary CPUs.

It's in ePAPR 1.0, which has been out for a while now.  ePAPR 1.1 was
just released which clarifies some things such as WIMG.

> Stuart / Scott, do you have any pointers to documentation where the spinning 
> is explained?

https://www.power.org/resources/downloads/Power_ePAPR_APPROVED_v1.1.pdf

-Scott

Re: [Qemu-devel] blobstore disk format (was Re: Design of the blobstore)

2011-09-19 Thread Stefan Berger


On 09/17/2011 03:28 PM, Michael S. Tsirkin wrote:

On Fri, Sep 16, 2011 at 12:46:40PM -0400, Stefan Berger wrote:

On 09/16/2011 10:44 AM, Michael S. Tsirkin wrote:

On Thu, Sep 15, 2011 at 10:33:13AM -0400, Stefan Berger wrote:

On 09/15/2011 08:28 AM, Michael S. Tsirkin wrote:

So the below is a proposal for a directory scheme
for storing (optionally multiple) nvram images,
along with any metadata.
Data is encoded using BER:
http://en.wikipedia.org/wiki/Basic_Encoding_Rules
Specifically, we mostly use the subsets.


Would it change anything if we were to think of the NVRAM image as
another piece of metadata?

Yes, we can do that, sure. I had the feeling that it will help to lay
out the image at the end, to make directory listing
more efficient - the rest of metadata is usually small,
image might be somewhat large.


Why not let a convenience library handle the metadata on the device
level, having it create the blob that the NVRAM layer ends up
writing and parsing before the device uses it? Otherwise I should
maybe rename the nvram to meatdata_store :-/

Maybe we are talking about different things. All I agrue for
is using a common standard format for storing metadata,
instead of having each device roll its own.
That's fine. The TPM code inside libtpms serializes all internal data 
structures for later resumption. It doesn't use ASN.1 but in effect 
endianess-normalizes them and stores them in a packed format that can 
later be resumed along with a version tag prepended where need . Are you 
suggesting to change that ? I hope not...





I am also wondering whether each device shouldn't just handle the
metadata itself,

It could be that just means we will have custom code with
different bugs in each device.
Note that from experience with formats, the problem with
time becomes less trivial than it seems as we
need to provide forward and backward compatibility
guarantees.


Is that guaranteed just by using ASN.1 ?

At least for BER, yes. We can always skip an optional field
that we don't recognize without knowing anything about
its internal format.



Do we need to add a
revision to the metadata?

IMO, no. Instead we add optional attributes as long as we can
preserve backwards compatibility, and madatory attributes
if we can't.

Are devices doing this right now or are these future changes to devices' 
code?



encryption, integrity value (crc32 or sha1) and so on. What
metadata should there be that really need to be handled on the NVRAM
API and below level rather than on the device-specific code level?

So checksum  (checksum value and type) 'and so on' are what I call
metadata :) Doing it at device level seems wrong.


You mean doing it at the NVRAM level seems wrong. Of course, again
something a device could write into a header prepended to the actual
blob. Maybe every device that needs it should do that so that if we
were to support encryption of blobs and the key for decryption was
wrong one could detect it early without feeding badly decrypted /
corrupted state into the device and see what happens.

Do what? Checksum the data? Well, error detection is nice,
but it could be that people actually care about not losing
all of the data on nvram if qemu is killed.  I also wonder whether
invalidating all data because of a single bit corruption is a bug or a
feature.

The checksuming I think makes sense if encryption is being added so 
decryption and testing for proper key material remains an NVRAM 
operation rather than a device operation.

We use a directory as a SET in a CER format.
This allows generating directory online without scanning
the entries beforehand.


I guess it is the 'unknown' for me... but what is the advantage of
using ASN1 for this rather than just writing out packed and
endianess-normalized data structures (with revision value),

If you want an example of where this 'custom formats are easy
so let us write one' leads to in the end,
look no further than live migration code.
It's a mess of hacks that does not even work across
upstream qemu versions, leave alone across
downstreams (different linux distros).


So is ASN1 the answer or does one still need to add a revision tag
to each blob putting in custom code for parsing the different
revisions of data structures (I guess) that may be extended/changed
over time?

Stefan

We don't need revisions. We can always parse a new structure
skipping optional attributes we don't recognize. In case we want to
break old qemu versions intentially, we can add
a mandatory attribute.
So you said you had some code for the handling of ASN.1. Can sketch how 
the interaction of devices would work with mandatory and optional 
attributes along with an API? I'd prefer to NOT have the attributes and 
values be a part of the NVRAM API itself but let a (mandatory) library 
handle the serialization and deserialization of these metadata when a 
device wants to write or read state respectively. But maybe I just want 
to keep the NVRAM API 'too simple'.


   Stefan

Re: [Qemu-devel] [PATCH 04/14] qdev: take ownership of id pointer

2011-09-19 Thread Anthony Liguori


On 09/19/2011 02:34 AM, Gerd Hoffmann wrote:

On 09/16/11 18:00, Anthony Liguori wrote:

qdev has this quirk that it owns a seemingly arbitrary QemuOpts pointer. That's
because qdev expects a static string for the id (which really makes no sense
since ids are supposed to be provided by the user). Instead of managing just
the id pointer, we currently take ownership of the entire QemuOpts structure
that was used to create the device just to keep the name around.


FYI: Keeping the pointer to the QemuOpts has one more reason: It will free the
QemuOpts on hot-unplug, which is needed to free the id from QemuOpts point of
view, which in turn allows to re-use the id when hot-plugging the same (or
another) device later on.


You mean, tie QemuOpts life cycle to devices life cycle such that you cannot 
accidentally create a non-device QemuOpts that conflicts with the id of a device?


Regard,

Anthony Liguori



cheers,
Gerd

Re: [Qemu-devel] [PATCH 5/8] tcg: Add interpreter for bytecode

2011-09-19 Thread Richard Henderson

On 09/17/2011 01:00 PM, Stefan Weil wrote:
> +#if TCG_TARGET_HAS_ext8s_i32
> +case INDEX_op_ext8s_i32:
> +t0 = *tb_ptr++;
> +t1 = tci_read_r8s(&tb_ptr);
> +tci_write_reg32(t0, t1);
> +break;
> +#endif

You really ought not need all these ifdefs.

> +#if TCG_TARGET_HAS_rot_i64
> +case INDEX_op_rotl_i64:
> +case INDEX_op_rotr_i64:
> +TODO();
> +break;
> +#endif

Eh?  Just the same as your rot_i32 implementations,
with different types.


r~

Re: [Qemu-devel] [RFC PATCH 3/5] VFIO: Base framework for new VFIO driver

2011-09-19 Thread Alex Williamson

Sorry for the delay, just getting back from LPC and some time off...

On Wed, 2011-09-07 at 10:52 -0400, Konrad Rzeszutek Wilk wrote:
> > +static long vfio_iommu_unl_ioctl(struct file *filep,
> > +unsigned int cmd, unsigned long arg)
> > +{
> > +   struct vfio_iommu *viommu = filep->private_data;
> > +   struct vfio_dma_map dm;
> > +   int ret = -ENOSYS;
> > +
> > +   switch (cmd) {
> > +   case VFIO_IOMMU_MAP_DMA:
> > +   if (copy_from_user(&dm, (void __user *)arg, sizeof dm))
> > +   return -EFAULT;
> > +   ret = 0; // XXX - Do something
> 
> 

Truly an RFC ;)

> > +   if (!ret && copy_to_user((void __user *)arg, &dm, sizeof dm))
> > +   ret = -EFAULT;
> > +   break;
> > +
> > +   case VFIO_IOMMU_UNMAP_DMA:
> > +   if (copy_from_user(&dm, (void __user *)arg, sizeof dm))
> > +   return -EFAULT;
> > +   ret = 0; // XXX - Do something
> > +   if (!ret && copy_to_user((void __user *)arg, &dm, sizeof dm))
> > +   ret = -EFAULT;
> > +   break;
> > +   }
> > +   return ret;
> > +}
> > +
> > +#ifdef CONFIG_COMPAT
> > +static long vfio_iommu_compat_ioctl(struct file *filep,
> > +   unsigned int cmd, unsigned long arg)
> > +{
> > +   arg = (unsigned long)compat_ptr(arg);
> > +   return vfio_iommu_unl_ioctl(filep, cmd, arg);
> > +}
> > +#endif /* CONFIG_COMPAT */
> > +
> > +const struct file_operations vfio_iommu_fops = {
> > +   .owner  = THIS_MODULE,
> > +   .release= vfio_iommu_release,
> > +   .unlocked_ioctl = vfio_iommu_unl_ioctl,
> > +#ifdef CONFIG_COMPAT
> > +   .compat_ioctl   = vfio_iommu_compat_ioctl,
> > +#endif
> > +};
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> .. snip..
> > +int vfio_group_add_dev(struct device *dev, void *data)
> > +{
> > +   struct vfio_device_ops *ops = data;
> > +   struct list_head *pos;
> > +   struct vfio_group *vgroup = NULL;
> > +   struct vfio_device *vdev = NULL;
> > +   unsigned int group;
> > +   int ret = 0, new_group = 0;
> 
> 'new_group' should probably be 'bool'.

ok

> > +
> > +   if (iommu_device_group(dev, &group))
> > +   return 0;
> 
> -EEXIST?

I think I made this return 0 because it's called from device add
notifiers and walking devices lists.  It's ok for it to fail, not all
devices have to be backed by an iommu, they just won't show up in vfio.
Maybe I should leave that to the leaf callers though.  EINVAL is
probably more appropriate.

> > +
> > +   mutex_lock(&vfio.group_lock);
> > +
> > +   list_for_each(pos, &vfio.group_list) {
> > +   vgroup = list_entry(pos, struct vfio_group, next);
> > +   if (vgroup->group == group)
> > +   break;
> > +   vgroup = NULL;
> > +   }
> > +
> > +   if (!vgroup) {
> > +   int id;
> > +
> > +   if (unlikely(idr_pre_get(&vfio.idr, GFP_KERNEL) == 0)) {
> > +   ret = -ENOMEM;
> > +   goto out;
> > +   }
> > +   vgroup = kzalloc(sizeof(*vgroup), GFP_KERNEL);
> > +   if (!vgroup) {
> > +   ret = -ENOMEM;
> > +   goto out;
> > +   }
> > +
> > +   vgroup->group = group;
> > +   INIT_LIST_HEAD(&vgroup->device_list);
> > +
> > +   ret = idr_get_new(&vfio.idr, vgroup, &id);
> > +   if (ret == 0 && id > MINORMASK) {
> > +   idr_remove(&vfio.idr, id);
> > +   kfree(vgroup);
> > +   ret = -ENOSPC;
> > +   goto out;
> > +   }
> > +
> > +   vgroup->devt = MKDEV(MAJOR(vfio.devt), id);
> > +   list_add(&vgroup->next, &vfio.group_list);
> > +   device_create(vfio.class, NULL, vgroup->devt,
> > + vgroup, "%u", group);
> > +
> > +   new_group = 1;
> > +   } else {
> > +   list_for_each(pos, &vgroup->device_list) {
> > +   vdev = list_entry(pos, struct vfio_device, next);
> > +   if (vdev->dev == dev)
> > +   break;
> > +   vdev = NULL;
> > +   }
> > +   }
> > +
> > +   if (!vdev) {
> > +   /* Adding a device for a group that's already in use? */
> > +   /* Maybe we should attach to the domain so others can't */
> > +   BUG_ON(vgroup->container &&
> > +  vgroup->container->iommu &&
> > +  vgroup->container->iommu->refcnt);
> > +
> > +   vdev = ops->new(dev);
> > +   if (IS_ERR(vdev)) {
> > +   /* If we just created this vgroup, tear it down */
> > +   if (new_group) {
> > +   device_destroy(vfio.class, vgroup->devt);
> > +   idr_remove(&vfio.idr, MINOR(vgroup->devt));
> > +   list_del(&vgroup->next);
> > +   kfree(vgroup)

Re: [Qemu-devel] [Qemu-ppc] [PATCH 33/58] KVM: update kernel headers

2011-09-19 Thread Scott Wood

On 09/17/2011 11:59 AM, Blue Swirl wrote:
> On Wed, Sep 14, 2011 at 8:42 AM, Alexander Graf  wrote:
>> diff --git a/linux-headers/linux/kvm_para.h b/linux-headers/linux/kvm_para.h
>> index 7bdcf93..b315e27 100644
>> --- a/linux-headers/linux/kvm_para.h
>> +++ b/linux-headers/linux/kvm_para.h
>> @@ -26,3 +26,4 @@
>>  #include 
>>
>>  #endif /* __LINUX_KVM_PARA_H */
>> +
> 
> Can we avoid this?

It could be fixed in the kernel, but I don't think we should be making
local changes to this in qemu.  It'll just get reintroduced the next
time somebody runs the update script.

-Scott

Re: [Qemu-devel] [Qemu-ppc] [PATCH 33/58] KVM: update kernel headers