Re: [PATCH] qemu: virtio save/load bindings

2009-05-26 Thread Paul Brook
> > -/* FIXME: load/save binding.  */
> > -//pci_device_save(&vdev->pci_dev, f);
> > -//msix_save(&vdev->pci_dev, f);
>
> qdev regressed save/restore?  What else is broken right now from the
> qdev commit?
>
> I'm beginning to think committing in the state it was in was a mistake.
> Paul, can you put together a TODO so that we know all of the things that
> have regressed so we can get things back into shape?

Sorry, this one slipped past. I'd intended to fix it, but apparently never
ported that bit of the patch.

Paul


Re: [PATCH RFC] qemu: fix hot remove assigned device

2009-06-08 Thread Paul Brook
On Monday 08 June 2009, Weidong Han wrote:
> When hot removing an assigned device, a segmentation fault was triggered
> by qemu_free(&pci_dev->qdev) in pci_unregister_device().
> pci_register_device() doesn't initialize or set pci_dev->qdev. For an
> assigned device, the qdev variable isn't touched at all. So a segmentation
> fault happens when freeing a non-initialized qdev.

Better would be to just disable hot remove for devices still using the legacy 
pci_register_device API.

Paul


Re: [PATCH RFC] qemu: fix hot remove assigned device

2009-06-09 Thread Paul Brook
On Tuesday 09 June 2009, Han, Weidong wrote:
> Paul Brook wrote:
> > On Monday 08 June 2009, Weidong Han wrote:
> >> When hot removing an assigned device, a segmentation fault was triggered
> >> by qemu_free(&pci_dev->qdev) in pci_unregister_device().
> >> pci_register_device() doesn't initialize or set pci_dev->qdev. For an
> >> assigned device, the qdev variable isn't touched at all. So a segmentation
> >> fault happens when freeing a non-initialized qdev.
> >
> > Better would be to just disable hot remove for devices still using
> > the legacy pci_register_device API.
>
> PCI passthrough uses pci_register_device to register assigned device to
> qemu. Is there newer API to do so?

Yes. See e.g. LSI scsi emulation.

Paul


Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions

2009-06-09 Thread Paul Brook
On Monday 25 May 2009, Michael S. Tsirkin wrote:
> Add functions implementing MSI-X support. First user will be virtio-pci.
> Note that platform must set a flag to declare MSI supported.
> For PC this will be set by APIC.

This sounds wrong. The device shouldn't know or care whether the system has an 
MSI-capable interrupt controller. That's for the guest OS to figure out.

Paul 


Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions

2009-06-10 Thread Paul Brook
> > > Note that platform must set a flag to declare MSI supported.
> > > For PC this will be set by APIC.
> >
> > This sounds wrong. The device shouldn't know or care whether the system
> > has an MSI-capable interrupt controller. That's for the guest OS to figure
> > out.
>
> You are right of course. In theory there's nothing that breaks if I
> set this flag to on, on all platforms. OTOH if qemu emulates some
> controller incorrectly, guest might misdetect MSI support in the
> controller, and things will break horribly.
>
> It seems safer to have a flag that can be enabled by people
> that know about a specific platform.

No. The solution is to fix whatever is broken.

If we really need to avoid MSI-X capable devices then that should be done 
explicitly per-device. i.e. you have a different virtio-net device that does 
not use MSI-X.

Paul


Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions

2009-06-10 Thread Paul Brook
> > If we really need to avoid MSI-X capable devices then that should be done
> > explicitly per-device. i.e. you have a different virtio-net device that
> > does not use MSI-X.
> >
> > Paul
>
> Why should it be done per-device?


Because otherwise you end up with the horrible hacks that you're currently 
tripping over: devices have to magically morph into a different device when 
you load a VM. That seems just plain wrong to me. Loading a VM shouldn't 
do anything that can't happen during normal operation.

Paul


Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions

2009-06-10 Thread Paul Brook
> > That seems just plain wrong to me.
> > Loading a VM shouldn't
> > do anything that can't happen during normal operation.
>
> At least wrt pci, we are very far from this state: load just overwrites
> all registers, readonly or not, which can never happen during normal
> operation.

IMO that code is wrong. We should only be loading things that the guest can 
change (directly or indirectly).
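
For illustration, a minimal sketch of "only load the guest-writable bits", using a per-byte write mask like the wmask discussed elsewhere in this thread (this is not the actual pci_device_load code, just the shape of the idea):

/* Sketch only: merge a loaded PCI config space into the device, keeping
 * read-only bits and taking only the bits the guest could have written. */
#include <stdint.h>

#define PCI_CONFIG_SIZE 256

typedef struct {
    uint8_t config[PCI_CONFIG_SIZE];
    uint8_t wmask[PCI_CONFIG_SIZE];   /* set bits = guest-writable */
} PCIDevSketch;

static void load_writable_config(PCIDevSketch *d, const uint8_t *loaded)
{
    int i;
    for (i = 0; i < PCI_CONFIG_SIZE; i++) {
        d->config[i] = (d->config[i] & ~d->wmask[i])   /* keep read-only bits */
                     | (loaded[i]    &  d->wmask[i]);  /* take writable bits  */
    }
}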

Paul


Re: [Qemu-devel] [PATCHv3 03/13] qemu: add routines to manage PCI capabilities

2009-06-10 Thread Paul Brook
> > caps can be anywhere, but we don't expect it to change during machine
> > execution lifetime.
> >
> > Or I am just confused by the name "pci_device_load" ?
>
> Right. So I want to load an image and it has capability X at offset Y.
> wmask has to match. I don't want to assume that we never change Y
> for the device without breaking old images, so I clear wmask here
> and set it up again after looking up capabilities that I loaded.

We should not be loading state into a different device (or a similar device 
with a different set of capabilities).

If you want to provide backwards compatibility then you should do that by 
creating a device that is the same as the original.  As I mentioned in my 
earlier mail, loading a snapshot should never do anything that can not be 
achieved through normal operation.

Paul



Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions

2009-06-10 Thread Paul Brook
On Wednesday 10 June 2009, Michael S. Tsirkin wrote:
> On Wed, Jun 10, 2009 at 04:15:04PM +0100, Paul Brook wrote:
> > > > That seems just plain wrong to me.
> > > > Loading a VM shouldn't
> > > > do anything that can't happen during normal operation.
> > >
> > > At least wrt pci, we are very far from this state: load just overwrites
> > > all registers, readonly or not, which can never happen during normal
> > > operation.
> >
> > IMO that code is wrong. We should only be loading things that the guest
> > can change (directly or indirectly).
>
> Making it work this way will mean that minor changes to a device can
> break backwards compatibility with old images, often in surprising ways.
> What are the advantages?

If you can't create an identical machine from scratch then I don't consider 
snapshot/migration to be a useful feature. i.e. as soon as you shut down and 
restart the guest it is liable to break anyway.

It may be that the snapshot/migration code wants to include a machine config, 
and create a new machine from that. However this is a separate issue, and 
arguably something your VM manager should be handling for you.

Paul


Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions

2009-06-10 Thread Paul Brook
> > If you can't create an identical machine from scratch then I don't
> > consider snapshot/migration to be a useful feature. i.e. as soon as you
> > shut down and restart the guest it is liable to break anyway.
>
> Why is liable to break?

A VM booted on an old version of qemu and migrated to a new version will 
behave differently from the same VM booted on a new version of qemu.
I hope I don't need to explain why this is bad.

As previously discussed, any guest visible changes are liable to break a guest 
OS, particularly guests like Windows which deliberately break when run on 
"different" hardware. Personally I don't particularly care, but if we support 
live migration we also need to support "cold" migration - i.e. shutdown and 
restart.

>So once you load an image with MSIX capability off,
>it will stay off across guest restarts.

I'm assuming guest restart includes restarting qemu.

Paul


Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions

2009-06-10 Thread Paul Brook
On Wednesday 10 June 2009, Michael S. Tsirkin wrote:
> On Wed, Jun 10, 2009 at 05:46:03PM +0100, Paul Brook wrote:
> > > > If you can't create an identical machine from scratch then I don't
> > > > consider snapshot/migration to be a useful feature. i.e. as soon as
> > > > you shut down and restart the guest it is liable to break anyway.
> > >
> > > Why is liable to break?
> >
> > A VM booted on an old version of qemu and migrated to a new version will
> > behave differently from the same VM booted on a new version of qemu.
>
> It will behave identically. That's what the patch does: discover
> how did the device behave on old qemu, and make it behave the same way
> on new qemu.

You're missing the point.  After doing a live migration from old-qemu to new-
qemu, there is no snapshot to load.  We need to be able to shut down the guest, 
kill qemu (without saving a snapshot), then start qemu with the exact same 
hardware.

If we can't start a new qemu with the same hardware configuration then we 
should not be allowing migration or loading of snapshots.

Paul


Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions

2009-06-10 Thread Paul Brook
> > If we can't start a new qemu with the same hardware configuration then we
> > should not be allowing migration or loading of snapshots.
>
> OK, so I'll add an option in virtio-net to disable msi-x, and such
> an option will be added in any device with msi-x support.
> Will that address your concern?

Yes, as long as migration fails when you try to migrate to the wrong kind of 
device.

Paul


Re: [Qemu-devel] [PATCHv5 08/13] qemu: add support for resizing regions

2009-06-18 Thread Paul Brook
On Thursday 18 June 2009, Michael S. Tsirkin wrote:
> Make it possible to resize PCI regions.  This will be used by virtio
> with MSI-X, where the region size depends on whether MSI-X is enabled,
> and can change across load/save.

I thought we'd agreed we shouldn't be doing this.
i.e. if the user tries to load the wrong device, just bail out.

Paul


Re: [Qemu-devel] Re: [PATCH 1/2] allow hypervisor CPUID bit to be overriden

2009-06-23 Thread Paul Brook
On Tuesday 23 June 2009, Avi Kivity wrote:
> On 06/23/2009 12:47 AM, Andre Przywara wrote:
> > KVM defaults to the hypervisor CPUID bit to be set, whereas pure QEMU
> > clears it. On some occasions one wants to set or clear it the other way
> > round (for instance to get HyperV running inside a guest).
> > Allow the default to be overridden on the command line and fix some
> > whitespace damage on the way.
>
> It makes sense for qemu to set the hypervisor bit unconditionally.  A
> guest running under qemu is not bare metal.

I see no reason why a guest has to be told that it's running inside a VM.
In principle an appropriately configured qemu should be indistinguishable from 
real hardware. In practice it's technically infeasible to cover absolutely 
everything, but if we set this bit we're not even trying.

I have no objection to the bit being set by default for the QEMU CPU types.
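
For reference, the bit in question is CPUID leaf 1, ECX bit 31 ("hypervisor present"); a guest can check it along these lines (minimal sketch using GCC's cpuid.h helper, nothing QEMU-specific):

/* Sketch: guest-side check for the CPUID hypervisor flag (leaf 1, ECX bit 31). */
#include <stdbool.h>
#include <cpuid.h>

static bool hypervisor_bit_set(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return false;   /* CPUID leaf 1 not supported */
    }
    return (ecx >> 31) & 1;
}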

Paul


Re: [Qemu-devel] Re: [PATCH 1/2] allow hypervisor CPUID bit to be overriden

2009-06-23 Thread Paul Brook
> I agree it's pointless, but it is a Microsoft requirement for passing
> their SVVP tests.  Enabling it by default makes life a little easier for
> users who wish to validate their hypervisor and has no drawbacks.

I wasn't arguing against setting it by default (for QEMU CPU types), just 
against enabling it unconditionally.

Paul


Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication

2009-06-23 Thread Paul Brook
> Here are two patches. One implements a virtio-serial device in qemu
> and the other is the driver for a guest kernel.

So I'll ask again. Why is this separate from virtio-console?

Paul


Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication

2009-06-23 Thread Paul Brook
On Tuesday 23 June 2009, Amit Shah wrote:
> On (Tue) Jun 23 2009 [13:55:52], Paul Brook wrote:
> > > Here are two patches. One implements a virtio-serial device in qemu
> > > and the other is the driver for a guest kernel.
> >
> > So I'll ask again. Why is this separate from virtio-console?
>
> I'm basically writing a vmchannel and found out that a lot can be shared
> between some virtio devices. So I'm just trying to abstract out those
> things in virtio-serial. Once we're sure virtio-serial is good and ready
> to be merged, I will look at converting over virtio-console to the
> virtio-serial interface.

That doesn't really answer my question. We already have a virtual serial 
device (called virtio-console). Why are you inventing another one?

Paul



Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication

2009-06-23 Thread Paul Brook
On Tuesday 23 June 2009, Christian Bornträger wrote:
> On Tuesday 23 June 2009 14:55:52, Paul Brook wrote:
> > > Here are two patches. One implements a virtio-serial device in qemu
> > > and the other is the driver for a guest kernel.
> >
> > So I'll ask again. Why is this separate from virtio-console?
>
> I did some work on virtio-console, since kvm on s390 does not provide any
> other. I dont think we should mix two different types of devices into one
> driver. The only thing that these drivers have in common, is the fact that
> there are two virtqueues, piping data (single bytes or larger chunks). So
> you could make the same argument with the first virtio_net driver (the one
> before GSO) - which is obviously wrong. The common part of the transport is
> already factored out to virtio_ring and the transports.

virtio-net is packet based, not stream based.

> In addition there are two ABIs involved: a userspace ABI (/dev/hvc0) and a
> guest/host ABI for this console. (and virtio was not meant to be a KVM-only
> interface, that we can change all the time). David A. Wheeler's 'SLOCCount'
> gives me 141 lines of code for virtio_console.c. I am quite confident that
> the saving we could achieve by merging these two drivers is not worth the
> hassle.

AFAICS the functionality provided is exactly the same. The host API is 
identical, and the guest userspace API only has trivial differences (which 
could be eliminated with a simple udev rule). By my reading virtio-serial 
makes virtio-console entirely redundant.

> Discussion about merging the console code into this distracts from the main
> problem: To get the interface and functionality right before it becomes an
> ABI (is it /dev/ttyS, network like or is it something completely
> different?).

Ah, now that's a different question. I don't know what the requirements are 
for the higher level vmchannel interface. However I also don't care.

Paul


Re: [Qemu-devel] [PATCH] Warn if a qcow (not qcow2) file is opened

2009-06-30 Thread Paul Brook
> >>> The qcow block driver format is no longer maintained and likely
> >>> contains
> >>> serious data corruptors.  Urge users to stay away from it, and advertise
> >>> the new and improved replacement.
> >
> > I'm not sure how I feel about this.  Can we prove qcow is broken?  Is
> > it only broken for writes and not reads?
>
> Well, Kevin posted a patch, so it is.  It's definitely unmaintained.
> Given it's a qemu native format, there is no interoperability value
> except with old qemu versions.
>
> > If we're printing a warning, does that mean we want to deprecate qcow
> > and eventually remove it (or remove write support, at least)?
>
> Yes.

IMHO there's little value in just printing a warning. Until it actually goes 
away, people are liable to assume we're just being paranoid/awkward and keep 
using it anyway.

I suggest crippling it now and, assuming no one steps up to fix+maintain it, 
ripping out the write support altogether at the next release.
I'm assuming the readonly code is in better shape, and can be supported with 
relatively little effort.

Paul


Re: [Qemu-devel] [RFC] allow multi-core guests: introduce cores= option to -cpu

2009-07-03 Thread Paul Brook
> currently SMP guests happen to see  vCPUs as  different sockets.
> Some guests (Windows comes to mind) have license restrictions and refuse
> to run on multi-socket machines.
> So lets introduce a "cores=" parameter to the -cpu option to let the user
> specify the number of _cores_ the guest should see.

Sounds like this should be part of the -numa option.

Paul


Re: [Qemu-devel] [RFC] allow multi-core guests: introduce cores= option to -cpu

2009-07-03 Thread Paul Brook
On Saturday 04 July 2009, Andre Przywara wrote:
> Paul Brook wrote:
> >> currently SMP guests happen to see  vCPUs as  different sockets.
> >> Some guests (Windows comes to mind) have license restrictions and refuse
> >> to run on multi-socket machines.
> >> So lets introduce a "cores=" parameter to the -cpu option to let the
> >> user specify the number of _cores_ the guest should see.
> >
> > Sounds like this should be part of the -numa option.
>
> Sounds reasonable at first glance, but would make it rather
> complicated in real life. I suppose multi-core is far more interesting
> to most of the people than multi-node, so I would opt for the easier:
> -smp 2,cores=2 to specify a dual core guest.

I disagree. I think it makes sense for the topology of nodes, cores and 
threads to all be specified in the same place. All the options you don't 
specify should have sensible defaults.
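
To make "sensible defaults" concrete: the only hard constraint is that sockets x cores x threads equals the total vCPU count, so anything the user leaves out can be derived (a sketch of the arithmetic, not the actual option-parsing code):

/* Sketch: derive unspecified topology values; e.g. -smp 4,cores=2 with the
 * default threads=1 yields 2 sockets x 2 cores x 1 thread. */
#include <stdio.h>

int main(void)
{
    int smp_cpus = 4;   /* total vCPUs requested */
    int cores    = 2;   /* specified by the user */
    int threads  = 1;   /* default when not specified */
    int sockets  = smp_cpus / (cores * threads);

    printf("%d socket(s) x %d core(s) x %d thread(s) = %d vCPUs\n",
           sockets, cores, threads, sockets * cores * threads);
    return 0;
}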

Paul


Re: [Qemu-devel] Planning for the 0.11.0 release

2009-07-10 Thread Paul Brook
> Right, that part I'm okay with.  But the vCont based gdb model presumes
> a unified address space which while usually true for kernel address
> spaces, isn't universally true and certainly not true when PC is in
> userspace.  That's what I understood to be the major objection to vCont.

The thread bits are the wrong way to do things, but are probably relatively 
harmless for now. Expect me to remove them at the first available opportunity.

The 32/64-bit switching is just plain wrong, and makes it absolutely 
impossible for a client debugger to work correctly.
If you really can't be bothered fixing gdb (and you *really* should), then it 
should be some form of user switch that tells qemu to always report a 32-bit 
register set.

Paul


Re: [Qemu-devel] Planning for the 0.11.0 release

2009-07-10 Thread Paul Brook
> As pointed out before, it doesn't break anything but adds a workaround
> for scenarios which are _now_ broken (16/32 bit target code exported as
> 64 bit is widely useless for gdb today). Sorry, but you never explained
> to me how user are _currently_ supposed to debug under that conditions,
> namely 16/32 bit code executed by qemu-system-x86_64.

You're working around a gdb bug in a way that means a fixed gdb can't 
possibly work.  IMO the cure is worse than the disease.  Changing the register 
set reported to gdb part way through a session will always break. There's no 
possible way for gdb to know what state the target is going to be in until it 
actually queries it.

Paul


Re: [Qemu-devel] [PATCH] rev5: support colon in filenames

2009-07-15 Thread Paul Brook
On Wednesday 15 July 2009, Anthony Liguori wrote:
> Blue Swirl wrote:
> > I bet this won't compile on win32.
> >
> > Instead of this (IMHO doomed) escape approach, maybe the filename
> > parameter could be specified as the next argument, for example:
> > -hda format=qcow2,blah,blah,filename_is_next_arg -hda "filename with
> > funky characters like ',' ':' & '!'"
>
> -drive name=hda,if=ide,cache=off -hda foo.img
> -drive name=vda,if=virtio,cache=writeback -vda foo.img
> -drive name=sdb,if=scsi,unit=1 -sdb boo.img
>
> But Paul has long objected to having -vda or -sda syntaxes.  I do agree
> though that the most sane thing to do is to make the filename an
> independent argument.

I like implicit ordering of arguments even less. Having -foo -bar behave 
differently from -bar -foo is bad. We already have this a bit for things like 
-net, but splitting something into two as a parsing hack seems really lame.

How about -drive args=whatever,unparsed_name=blah

Where unparsed_name is defined to be the last argument, and blah is 
interpreted literally.
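
A minimal sketch of that parsing rule (hypothetical helper, not existing QEMU code): since unparsed_name is defined to be the last option, its value is simply the rest of the string, commas and colons included.

/* Sketch: unparsed_name must be the last option, so its value is the rest of
 * the option string with no further parsing applied. */
#include <stdio.h>
#include <string.h>

static const char *get_unparsed_name(const char *optstr)
{
    const char *key = "unparsed_name=";
    const char *p = strstr(optstr, key);
    return p ? p + strlen(key) : NULL;
}

int main(void)
{
    const char *arg = "args=if=virtio,cache=off,unparsed_name=nbd:host:1234,weird";
    /* prints "nbd:host:1234,weird" - colons and commas survive untouched */
    printf("%s\n", get_unparsed_name(arg));
    return 0;
}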

Paul


Re: [Qemu-devel] [PATCH] rev5: support colon in filenames

2009-07-15 Thread Paul Brook
> Instead of using '-drive if=none' we could use some other syntax where
> the filename can be passed as separate argument.  Can switches have two
> arguments?  If so, maybe this:
>
>-hostdrive $file $options

This only works for a single mandatory argument that needs to contain awkward 
characters.

Paul


Re: [Qemu-devel] [PATCH] rev3: support colon in filenames

2009-07-16 Thread Paul Brook
> So I propose this as a universal quoting scheme:
>
> \<char> where <char> is not ASCII alphanumeric.

No thank you. This sounds dangerously like the Windows command shell quoting 
rules. At first glance they appear to "just work", but when you actually 
try to figure out what's going on it gets horribly messy. For example UNC 
paths (which start with a double backslash) come out really weird with your 
suggestion.
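
To make the UNC objection concrete, here is the proposed rule applied naively to a UNC path (hypothetical helper, not anything in QEMU):

/* Sketch: escape every non-alphanumeric byte with a backslash, then look at
 * what a UNC path turns into. */
#include <ctype.h>
#include <stdio.h>

static void escape(const char *in, char *out)
{
    while (*in) {
        if (!isalnum((unsigned char)*in)) {
            *out++ = '\\';
        }
        *out++ = *in++;
    }
    *out = '\0';
}

int main(void)
{
    char buf[256];

    escape("\\\\server\\share\\disk.img", buf);
    /* prints \\\\server\\share\\disk\.img - every backslash doubled and the
     * dot escaped, which then has to survive another round of shell quoting */
    printf("%s\n", buf);
    return 0;
}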

Paul


Re: [Qemu-devel] [PATCH RFC] seabios: add OSHP method stub

2012-02-14 Thread Paul Brook
> > In a nutshell, I don't know what a SHPC is (nor OSHP), so I'm looking
> > for an additional Ack.
> 
> No problem, I'll get an Ack :)
> Meanwhile - here's a summary, as far as I understand it.
> 
> Originally PCI SIG only defined the electrical
> and mechanical requirements from hotplug, no standard
> software interface. So it needed ACPI to drive device-specific registers
> to actually do hotplug.
> At some point PCISIG defined standard interfaces
> for PCI hotplug. There are two of them: standard
> hot plug controller (SHPC) for PCI and PCIE hotplug
> for Express.
> 
> Now an OS can have a standard driver and use it
> to activate hotplug functionality. This is OS hotplug (OSHP).

So presumably this will work on targets that don't have ACPI?
Assuming a competent guest OS of course.  Have you tested this?

Paul


Re: [Qemu-devel] [PATCH RFC] seabios: add OSHP method stub

2012-02-14 Thread Paul Brook
> > > Now an OS can have a standard driver and use it
> > > to activate hotplug functionality. This is OS hotplug (OSHP).
> > 
> > So presumably this will work on targets that don't have ACPI?
> > Assuming a competent guest OS of course.  Have you tested this?
> 
> This being the qemu side of things? I run Linux
> and verified that it calls OSHP and afterwards,
> runs the native driver and handles hotplug/unplug
> without invoking ACPI at all.

I mean using your shiny new hotplug PCI-PCI bridge on arm/ppc/mips targets 
(i.e. anything other than an x86 PC).  From your description it sounds like it 
*should* work.
 
> It seems that at least the SHPC driver in linux
> doesn't work if you don't have an acpi table
> with the OSHP method - not many people run with acpi=off
> nowadays, so it's probably just a bug.
> I'll check how hard it is to fix this.

Targets other than x86 don't have ACPI to start with.

Paul


Re: [Qemu-devel] [PATCHv2] virtio-pci: add MMIO property

2012-03-20 Thread Paul Brook
> @@ -682,10 +733,18 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
>      if (size & (size-1))
>          size = 1 << qemu_fls(size);
>
> +    proxy->bar0_mask = size - 1;

You'll get better performance if you use page-sized mappings.  You're already 
creating a mapping bigger than the actual data (rounding up to power-of-two), 
so you may as well pick a value that's convenient for qemu to map into the 
address space.
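
Concretely, something along these lines (a sketch only; the page size value is an assumption, and the power-of-two rounding is what the quoted hunk already does with qemu_fls):

/* Sketch: round the BAR size up to a power of two, then make sure it is at
 * least one page so the region can be mapped directly by the host. */
#include <stdint.h>

#define TARGET_PAGE_SIZE 4096   /* assumption for this sketch */

static uint64_t round_bar_size(uint64_t size)
{
    uint64_t p = 1;

    while (p < size) {          /* next power of two >= size */
        p <<= 1;
    }
    if (p < TARGET_PAGE_SIZE) { /* a power of two >= the page size is already
                                   page aligned, so only small sizes matter */
        p = TARGET_PAGE_SIZE;
    }
    return p;
}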

Paul


Re: [Qemu-devel] [patch 0/2] USB UHCI global suspend / remote wakeup

2010-11-25 Thread Paul Brook
> This patch enables USB UHCI global suspend/resume feature. The OS will
> stop the HC once all ports are suspended. If there is activity on the
> port(s), an interrupt signalling remote wakeup will be triggered.

I'm pretty sure this is wrong.  Suspend/resume works based on physical 
topology, i.e. the resume notification should go to the port/hub to which 
the device is connected, not directly to the host controller.

Paul


Re: [Qemu-devel] [patch 0/2] USB UHCI global suspend / remote wakeup

2010-11-26 Thread Paul Brook
> On 11/26/10 03:15, Marcelo Tosatti wrote:
> > On Fri, Nov 26, 2010 at 12:38:28AM +0000, Paul Brook wrote:
> >>> This patch enables USB UHCI global suspend/resume feature. The OS will
> >>> stop the HC once all ports are suspended. If there is activity on the
> >>> port(s), an interrupt signalling remote wakeup will be triggered.
> >> 
> >> I'm pretty sure this is wrong.  Suspend/resume works based on physical
> >> topology, i.e. the resume notification should go to the port/hub to
> >> which the device is connected, not directly to the host controller.
> > 
> > You are correct in that USB HUB emulation does not propagate resume, but
> > this does not make this patch incorrect.
> 
> Well, it does.  When the notification is port based our software model
> should better reflect that, so we have the chance to add resume
> propagation to the hub emulation later on.

Exactly. The patch assumes the device is connected to a root hub port. This 
assumption is incorrect.

The device should be sending the resume signal to the port/hub to which it is 
connected. If that hub is still active it will reactivate the port, and flag a 
port change notification in the normal manner. If the hub is also suspended it 
will propagate the resume notification upstream (which may or may not be the 
root hub).
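
A self-contained sketch of that port-based propagation (the structs below are illustrative stand-ins, not QEMU's actual USB types):

/* Illustrative only: a device signals resume on the port it is attached to,
 * never directly on the host controller. */
#include <stdbool.h>

typedef struct Hub Hub;

typedef struct Port {
    Hub *hub;               /* hub (or root hub) this port belongs to */
    bool suspended;
} Port;

struct Hub {
    Port *upstream;         /* NULL for the root hub */
    bool suspended;
    void (*port_change)(Hub *hub, Port *port);
};

static void port_remote_wakeup(Port *port)
{
    Hub *hub = port->hub;

    port->suspended = false;

    if (!hub->suspended) {
        hub->port_change(hub, port);        /* normal port change notification */
    } else if (hub->upstream) {
        hub->suspended = false;             /* wake the hub ... */
        port_remote_wakeup(hub->upstream);  /* ... and propagate upstream */
    }
    /* if the suspended hub is the root hub, the host controller handles it */
}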

Paul


Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2

2010-11-27 Thread Paul Brook
> One question I have about Kemari is whether it adds new constraints to
> the QEMU codebase?  Fault tolerance seems like a cross-cutting concern
> - everyone writing device emulation or core QEMU code may need to be
> aware of new constraints.  For example, "you are not allowed to
> release I/O operations to the outside world directly, instead you need
> to go through Kemari code which makes I/O transactional and
> communicates with the passive host".  You have converted e1000,
> virtio-net, and virtio-blk.  How do we make sure new devices that are
> merged into qemu.git don't break Kemari?  How do we go about
> supporting the existing hw/* devices?

IMO anything that requires devices to act differently is wrong.  All external 
IO already goes through a common API (e.g. qemu_send_packet). You should be 
putting your transaction code there, not hacking individual devices.
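
A sketch of the kind of interposition I mean, assuming all guest network output already funnels through one common call (the names here are illustrative stand-ins, not the actual Kemari or QEMU functions):

/* Illustrative sketch: wrap the single external-output choke point so the
 * fault-tolerance logic lives in one place, not in every NIC model. */
#include <stdint.h>

typedef struct EventTap {
    int  (*begin)(void);    /* synchronise with the passive host */
    void (*commit)(void);   /* release the buffered output */
} EventTap;

static EventTap *event_tap;  /* NULL when fault tolerance is disabled */

/* stand-in for the existing common API (e.g. the real send-packet call) */
void backend_send_packet(void *vc, const uint8_t *buf, int size);

void tapped_send_packet(void *vc, const uint8_t *buf, int size)
{
    if (event_tap && event_tap->begin() < 0) {
        return;   /* replication failed: drop rather than let hosts diverge */
    }
    backend_send_packet(vc, buf, size);
    if (event_tap) {
        event_tap->commit();
    }
}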

Paul


Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2

2010-11-29 Thread Paul Brook
> >> Could you formulate the constraints so developers are aware of them in
> >> the future and can protect the codebase.  How about expanding the
> >> Kemari wiki pages?
> > 
> > If you like the idea above, I'm happy to make the list also on
> > the wiki page.
> 
> Here's a different question: what requirements must an emulated device
> meet in order to be added to the Kemari supported whitelist?  That's
> what I want to know so that I don't break existing devices and can add
> new devices that work with Kemari :).

Why isn't it completely device agnostic? i.e. if a device has to care about 
Kemari at all (or vice versa) then IMO you're doing it wrong. The whole point 
of the internal block/net APIs is that they isolate the host implementation 
details from the device emulation.

Paul


Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2

2010-11-29 Thread Paul Brook
> 2010/11/29 Paul Brook :
> >> >> Could you formulate the constraints so developers are aware of them
> >> >> in the future and can protect the codebase.  How about expanding the
> >> >> Kemari wiki pages?
> >> > 
> >> > If you like the idea above, I'm happy to make the list also on
> >> > the wiki page.
> >> 
> >> Here's a different question: what requirements must an emulated device
> >> meet in order to be added to the Kemari supported whitelist?  That's
> >> what I want to know so that I don't break existing devices and can add
> >> new devices that work with Kemari :).
> > 
> > Why isn't it completely device agnostic? i.e. if a device has to care
> > about Kemari at all (or vice versa) then IMO you're doing it wrong. The
> > whole point of the internal block/net APIs is that they isolate the host
> > implementation details from the device emulation.
> 
> You're right "theoretically".  But what I've learned so far,
> there are cases like virtio-net and e1000 woks but virtio-blk
> doesn't.  "Theoretically", any emulated device should be able to
> get into the whitelist if the event-tap is properly implemented
> but sometimes it doesn't seem to be that simple.
> 
> To answer Stefan's question, there shouldn't be any requirement
> for a device, but must be tested with Kemari.  If it doesn't work
> correctly, the problems must be fixed before adding to the list.

What exactly are the problems? Is this a device bug or a Kemari bug?
If it's the former then that implies you're imposing additional requirements 
that weren't previously part of the API.  If the latter, then it's a bug like 
any other.

Paul


Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2

2010-11-29 Thread Paul Brook
> >> To answer Stefan's question, there shouldn't be any requirement
> >> for a device, but must be tested with Kemari.  If it doesn't work
> >> correctly, the problems must be fixed before adding to the list.
> > 
> > What exactly are the problems? Is this a device bug or a Kemari bug?
> > If it's the former then that implies you're imposing additional
> > requirements that weren't previously part of the API.  If the latter,
> > then it's a bug like any other.
> 
> It's a problem if devices don't continue correctly upon failover.
> I would say it's a bug of live migration (not all of course)
> because Kemari is just live migrating at specific points.

Ah, now we're getting somewhere.  So you're saying that these devices are 
broken anyway, and Kemari happens to trigger that brokenness more frequently?

If the requirement is that a device must support live migration, then that 
should be the criteria for enabling Kemari, not some arbitrary whitelist.
If devices incorrectly claim support for live migration, then that should also 
be fixed, either by removing the broken code or by making it work.

AFAICT your current proposal is just feeding back the results of some fairly 
specific QA testing.  I'd rather not get into that game.  The correct response 
in the context of upstream development is to file a bug and/or fix the code.
We already have config files that allow third party packagers to remove 
devices they don't want to support.

Paul


Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2

2010-11-29 Thread Paul Brook
> > If devices incorrectly claim support for live migration, then that should
> > also be fixed, either by removing the broken code or by making it work.
> 
> I totally agree with you.
> 
> > AFAICT your current proposal is just feeding back the results of some
> > fairly specific QA testing.  I'd rather not get into that game.  The
> > correct response in the context of upstream development is to file a bug
> > and/or fix the code. We already have config files that allow third party
> > packagers to remove devices they don't want to support.
> 
> Sorry, I didn't get what you're trying to tell me.  My plan would
> be to initially start from a subset of devices, and gradually
> grow the number of devices that Kemari works with.  While this
> process, it'll include what you said above, file a bug and/or fix
> the code.  Am I missing what you're saying?

My point is that the whitelist shouldn't exist at all.  Devices either support 
migration or they don't.  Having some sort of separate whitelist is the wrong 
way to determine which devices support migration.

Paul


Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2

2010-11-29 Thread Paul Brook
> >> Sorry, I didn't get what you're trying to tell me.  My plan would
> >> be to initially start from a subset of devices, and gradually
> >> grow the number of devices that Kemari works with.  While this
> >> process, it'll include what you said above, file a bug and/or fix
> >> the code.  Am I missing what you're saying?
> > 
> > My point is that the whitelist shouldn't exist at all.  Devices either
> > support migration or they don't.  Having some sort of separate whitelist
> > is the wrong way to determine which devices support migration.
> 
> Alright!
> 
> Then if a user encounters a problem with Kemari, we'll fix Kemari
> or the devices or both. Correct?

Correct.

Paul


Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2

2010-11-29 Thread Paul Brook
> > Is this a fair summary: any device that supports live migration works
> > under Kemari?
> 
> It might be fair summary but practically we barely have live migration
> working w/o Kemari. In addition, last I checked Kemari needs additional
> hooks and it will be too hard to keep that out of tree until all devices
> get it.

That's not what I've been hearing earlier in this thread.
The responses from Yoshi indicate that Stefan's summary is correct.  i.e. the 
current Kemari implementation may require per-device hooks, but that's a bug 
and should be fixed before merging.

Paul


Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2

2010-11-29 Thread Paul Brook
> On 11/29/2010 10:53 AM, Paul Brook wrote:
> >>> Is this a fair summary: any device that supports live migration works
> >>> under Kemari?
> >> 
> >> It might be fair summary but practically we barely have live migration
> >> working w/o Kemari. In addition, last I checked Kemari needs additional
> >> hooks and it will be too hard to keep that out of tree until all devices
> >> get it.
> > 
> > That's not what I've been hearing earlier in this thread.
> > The responses from Yoshi indicate that Stefan's summary is correct.  i.e.
> > the current Kemari implementation may require per-device hooks, but
> > that's a bug and should be fixed before merging.
> 
> It's actually really important that Kemari make use of an intermediate
> layer such that the hooks can distinguish between a device access and a
> recursive access.

I'm failing to understand how this is anything other than running sed over 
block/*.c (or hw/*.c, depending on whether you choose to rename the internal or 
external API).

Paul


Re: [Qemu-devel] [PATCH v4 1/2] Minimal RAM API support

2010-12-15 Thread Paul Brook
> This adds a minimum chunk of Anthony's RAM API support so that we
> can identify actual VM RAM versus all the other things that make
> use of qemu_ram_alloc.

Why do we care? How are you defining "actual VM RAM"?

Surely the whole point of qemu_ram_alloc is to allocate a chunk of memory that 
can be mapped into the guest physical address space, so all uses of 
qemu_ram_alloc should be using this API.

Paul


Re: [Qemu-devel] Re: virtio-serial: An interface for host-guest communication

2009-08-12 Thread Paul Brook
> However, as I've mentioned repeatedly, the reason I won't merge
> virtio-serial is that it duplicates functionality with virtio-console.
> If the two are converged, I'm happy to merge it.  I'm not opposed to
> having more functionality.

I strongly agree.

Paul


Re: [Qemu-devel] [PATCH] virtio-spec: document block CMD and FLUSH

2010-04-20 Thread Paul Brook
> Does this mean that virtio-blk supports all three combinations?
> 
>1. FLUSH that isn't a barrier
>2. FLUSH that is also a barrier
>3. Barrier that is not a flush
> 
> 1 is good for fsync-like operations;
> 2 is good for journalling-like ordered operations.
> 3 sounds like it doesn't mean a lot as the host cache provides no
> guarantees and has no ordering facility that can be used.

(3) allows the guest to queue overlapping transfers with well defined results.
I have no idea how useful this is in practice, but it's certainly plausible.

Paul


Re: [Qemu-devel] [RFC PATCH 3/6] RAMBlock: Add a name field

2010-06-08 Thread Paul Brook
> The offset given to a block created via qemu_ram_alloc/map() is arbitrary,
> let the caller specify a name so we can make a positive match.

> @@ -1924,7 +1925,9 @@ static int pci_add_option_rom(PCIDevice *pdev)
> +snprintf(name, sizeof(name), "pci:%02x.%x.rom",
> + PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
> +pdev->rom_offset = qemu_ram_alloc(name, size);

This looks pretty bogus.  It should be associated with the device, rather than 
incorrectly trying to generate a globally unique name.

Paul


Re: [Qemu-devel] [RFC PATCH 3/6] RAMBlock: Add a name field

2010-06-08 Thread Paul Brook
> On 06/08/2010 09:30 PM, Paul Brook wrote:
> >> The offset given to a block created via qemu_ram_alloc/map() is
> >> arbitrary, let the caller specify a name so we can make a positive
> >> match.
> >> 
> >> 
> >> @@ -1924,7 +1925,9 @@ static int pci_add_option_rom(PCIDevice *pdev)
> >> +snprintf(name, sizeof(name), "pci:%02x.%x.rom",
> >> + PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
> >> +pdev->rom_offset = qemu_ram_alloc(name, size);
> > 
> > This looks pretty bogus.  It should be associated with the device, rather
> > than incorrectly trying to generate a globally unique name.
> 
> Not all ram is associated with a device.

Maybe not, but where it is we should be using that information.
At an absolute minimum we should be using the existing qdev address rather than 
inventing a new one.  Duplicating this logic inside every device seems like a 
bad idea, so I suggest identifying ram blocks by a (name, device) pair. For now 
we can allow a NULL device for system memory.

Paul


Re: [Qemu-devel] [RFC PATCH 3/6] RAMBlock: Add a name field

2010-06-09 Thread Paul Brook
> > > Not all ram is associated with a device.
> > 
> > Maybe not, but where it is we should be using that information.
> > Absolute minimum we should be using the existing qdev address rather than
> > inventing a new one.  Duplicating this logic inside every device seems
> > like a bad idea so I suggest identifying ram blocks by a (name, device)
> > pair. For now we can allow a NULL device for system memory.
> 
> I'm not sure if this is what you're after, but what if we change it to
> something like:
> 
> +snprintf(name, sizeof(name), "%s.%02x.%x.rom-%s",
> + pdev->bus->qbus.name, PCI_SLOT(pdev->devfn),
> + PCI_FUNC(pdev->devfn), pdev->qdev.info->name);
> 
> This gives us things like "pci.0.03.0.rom-rtl8139".  For pci-assign, we
> can expand on the host details, such as:
> ..
> Giving us "pci.0.04.0.bar0-pci-assign(:01:10.0)".  Anywhere closer?

Not really.  This identifier is device and bus independent, which is why I 
suggested passing the device to qemu_ram_alloc.  This can then figure out how 
to identify the device. It should probably do this the same way that we 
identify the saved state for the device.  Currently I think this is an 
arbitrary vmstate name/id, but I expect this to change to a qdev address
(e.g. "/i440FX-pcihost/pci.0/_addr_04.0").

Paul


Re: [Qemu-devel] Re: [RFC PATCH 3/6] RAMBlock: Add a name field

2010-06-09 Thread Paul Brook
> * Alex Williamson (alex.william...@redhat.com) wrote:
> > +// XXX check duplicates
> 
> Yes, definitely.  You created a notion of a hierarchical namespace,
> can this be formalized any more?

We already have one: The qdev tree.

Paul


Re: [Qemu-devel] [RFC PATCH 3/6] RAMBlock: Add a name field

2010-06-09 Thread Paul Brook
> Keep in mind, this has to be a stable string across versions of qemu
> since this is savevm/migration.  Are we absolutely confident that the
> full qdev path isn't going to change?  I'm more confident that a unique
> device name is going to be static across qemu versions.

The actual representation of the device address is a secondary issue. The 
important point is that ram blocks should be associated with devices[*], and 
matched in exactly the same way. Devices should not be duplicating this on an 
ad-hoc basis.

Paul

[*] Ignore that we don't currently have a root system device node. A null 
device will suffice for now.


Re: [Qemu-devel] [RFC PATCH 3/6] RAMBlock: Add a name field

2010-06-09 Thread Paul Brook
> > Not really.  This identifier is device and bus independent, which is why
> > I suggested passing the device to qemu_ram_alloc.  This can then figure
> > out how to identify the device. It should probably do this the same
> > way that we identify the saved state for the device.  Currently I think
> > this is an arbitrary vmstate name/id, but I expect this to change to a
> > qdev address (e.g. /i440FX-pcihost/pci.0/_addr_04.0").
> 
> Ok, that seems fairly reasonable, so from a device pointer we can get
> something like "/i440FX-pcihost/pci.0/_addr_04.0", then we can add
> something like ":rom" or ":bar.0" to it via an extra string.
> 
> qemu_ram_alloc(DeviceState *dev, const char *info, size)

Exactly - though personally I wouldn't call the second argument "info".
 
> Does a function already exist to print out a qdev address path?

No.

I may have been a bit misleading here. What we really want to do is use the 
same matching algorithm as is used by the rest of the device state. Currently 
this is a vmstate name and [arbitrary] numeric id. I don't remember whether 
there's a convenient link from a device to its associated vmstate - if there 
isn't there probably should be.
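
As a sketch of the shape I mean (the struct and helper below are illustrative stand-ins, not existing QEMU code): the ram block id is derived from the same name/instance pair used to match the rest of the device's saved state, plus a per-block suffix such as "rom" or "bar.0".

/* Illustrative only: build a ram block id from the device's vmstate name and
 * savevm instance id. */
#include <stdio.h>

typedef struct DeviceIdSketch {
    const char *vmsd_name;   /* the name the device's vmstate registers under */
    int instance_id;         /* the savevm instance id */
} DeviceIdSketch;

static void ram_block_id(const DeviceIdSketch *dev, const char *suffix,
                         char *buf, size_t len)
{
    if (dev) {
        snprintf(buf, len, "%s/%d/%s", dev->vmsd_name, dev->instance_id, suffix);
    } else {
        snprintf(buf, len, "%s", suffix);   /* system memory: no device yet */
    }
}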

Paul


Re: [Qemu-devel] [RFC PATCH 3/6] RAMBlock: Add a name field

2010-06-10 Thread Paul Brook
> On Thu, 2010-06-10 at 10:23 +0200, Gerd Hoffmann wrote:
> > > I may have been a bit misleading here. What we really want to do is use
> > > the same matching algorithm as is used by the rest of the device
> > > state. Currently this is a vmstate name and [arbitrary] numeric id. I
> > > don't remember whether there's a convenient link from a device to its
> > > associated vmstate - if there isn't there probably should be.
> > 
> > DeviceState->info->vmsd->name for the name.
> > Dunno about the numeric id, I think savevm.c doesn't export it.
> 
> Ok, we can certainly do <name>.<instance>.
> It seems like this highlights a deficiency in the vmstate matching

Why are you forcing this to be a string?
 
> Then we start the target, ordering the nics sequentially, are we going
> to store the vmstate into the opposite nics?

That's a separate problem. As long as you use the same matching as for the 
rest of the device state then it should just work. If it doesn't work then 
migration is already broken, so it doesn't matter.

Paul


Re: [Qemu-devel] [RFC PATCH 3/6] RAMBlock: Add a name field

2010-06-10 Thread Paul Brook
> > > to the identify the device. It should probably do this the same way
> > > that we identify the saved state for the device.  Currently I think
> > > this is an arbitrary vmstate name/id, but I expect this to change to a
> > > qdev address (e.g. /i440FX-pcihost/pci.0/_addr_04.0").
> > 
> > Ok, that seems fairly reasonable, so from a device pointer we can get
> > something like "/i440FX-pcihost/pci.0/_addr_04.0", then we can add
> > something like ":rom" or ":bar.0" to it via an extra string.
> 
> In the fun game of what ifs...
> 
> The cmdline starts w/ device A placed at pci bus addr 00:04.0 (so
> matched on source and target).  The source does hotunplug of 04.0 and
> replaces it w/ new device.  I think we need something that is more
> uniquely identifying the block.  Not sure that device name is correct or
> a generation ID.

You shouldn't be solving this problem for RAM blocks. You should be solving it 
for the device state.

Paul


Re: [Qemu-devel] [RFC PATCH 3/6] RAMBlock: Add a name field

2010-06-11 Thread Paul Brook
> The trouble I'm running into is that the SaveStateEntry.instance_id is
> effectively private, and there's no easy way to associate a
> SaveStateEntry to a device since it passes an opaque pointer, which
> could be whatever the driver decides it wants.  I'm wondering if we
> should pass the DeviceState pointer in when registering the vmstate so
> that we can stuff the instance_id into the DeviceInfo.  Or maybe
> DeviceInfo should include a pointer to the SaveStateEntry.  It all seems
> pretty messy.  Any thoughts?

DeviceInfo is effectively a const structure (it may be modified at startup, 
but there's only one of it shared between multiple devices). I suspect you 
mean DeviceState.

Most likely a lot of the messiness is because we still have devices that have 
not been qdev-ified, so the VMState code can't assume it has a device. We 
should fix this.

Paul


Re: [Qemu-devel] [PATCH 3/10] Add bdrv_flush_all()

2008-09-10 Thread Paul Brook
> I'm actually liking bdrv_flush_all() less and less.  If there are any
> outstanding IO requests, it will increase the down time associated with
> live migration.  I think we definitely need to add a live save handler
> that waits until there are no outstanding IO requests to converge.  I'm
> concerned though that it's not totally unreasonable to expect a guest to
> always have an IO request in flight.  That leads me to think that maybe
> we should be cancelling outstanding requests, and somehow saving their
> state?

That's not possible with the current code because the IO callbacks 
(particularly when you start involving the SCSI layer) are generated 
dynamically.

Paul


Re: [Qemu-devel] [PATCH 4/4] Reallocate dma buffers in read/write path if needed

2008-10-03 Thread Paul Brook
On Friday 03 October 2008, Ryan Harper wrote:
> The default buffer size breaks up larger read/write requests unnecessarily.
> When we encounter requests larger than the default dma buffer, reallocate
> the buffer to support the request.

Allocating unboundedly large host buffers based on guest input seems like a 
bad idea.

Paul


Re: [Qemu-devel] [PATCH 4/4] Reallocate dma buffers in read/write path if needed

2008-10-03 Thread Paul Brook
On Saturday 04 October 2008, Anthony Liguori wrote:
> Paul Brook wrote:
> > On Friday 03 October 2008, Ryan Harper wrote:
> >> The default buffer size breaks up larger read/write requests
> >> unnecessarily. When we encounter requests larger than the default dma
> >> buffer, reallocate the buffer to support the request.
> >
> > Allocating unboundedly large host buffers based on guest input seems like
> > a bad idea.
>
> Perhaps they could be at least bound to phys_ram_size.

That's still way too large.  It means that the maximum host footprint of qemu 
is many times the size of the guest RAM. There's a good chance that the host 
machine doesn't even have enough virtual address space to satisfy this 
request.

I expect the only situation where you can avoid breaking up large transfers 
is when you have zero-copy IO.  Previous zero-copy/vectored IO patches 
suffered from a similar problem: it is not acceptable to allocate huge chunks 
of host RAM when you fall back to normal IO.

> In general, I don't think there's a correct size to bound them that's
> less than phys_ram_size.  The guest may be issuing really big IO requests.

Qemu is perfectly capable of handling large IO requests by splitting them into 
multiple smaller requests. Enlarging the size of this buffer is just a 
secondary performance optimisation.

Admittedly we don't currently limit the number of simultaneous commands a 
guest can submit, but that's relatively easy to fix.
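
For illustration, serving a large transfer through a fixed-size buffer is just a loop (a sketch only; the buffer size and the transfer callback are assumptions, not the IDE/SCSI code):

/* Sketch: serve an arbitrarily large request through a bounded bounce buffer
 * instead of growing the buffer to match whatever the guest asked for. */
#include <stdint.h>

#define DMA_BUF_SIZE (128 * 1024)   /* fixed host-side cap, assumption */

typedef int (*xfer_fn)(uint64_t offset, void *buf, int len);

static int do_transfer(uint64_t offset, uint64_t total, xfer_fn xfer)
{
    static uint8_t bounce[DMA_BUF_SIZE];

    while (total > 0) {
        int chunk = total > DMA_BUF_SIZE ? DMA_BUF_SIZE : (int)total;
        if (xfer(offset, bounce, chunk) < 0) {
            return -1;              /* report the error to the guest */
        }
        offset += chunk;
        total  -= chunk;
    }
    return 0;
}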

Paul


Re: [Qemu-devel] [PATCH 4/4] Reallocate dma buffers in read/write path if needed

2008-10-04 Thread Paul Brook
On Saturday 04 October 2008, Ryan Harper wrote:
> In all, it seems silly to worry about this sort of thing since the
> entire process could be contained with process ulimits if this is really
> a concern.  Are we any more concerned that by splitting the requests
> into many smaller requests that we're wasting cpu, pegging the
> processor to 100% in some cases?

Using small requests may be a bit inefficient, but it still works and allows 
the guest to make progress.

Allocating very large quantities of memory is very likely to kill the VM one 
way or another. This is not acceptable, especially when the guest hasn't even 
done anything wrong. There are legitimate circumstances where the size of the 
outstanding IO requests may be comparable to the guest ram size.

Paul


Re: [Qemu-devel] [PATCH 1/2] Add qemu-test for automated testing

2008-10-15 Thread Paul Brook
On Wednesday 15 October 2008, Ryan Harper wrote:
> This patch places the qemu-test framework and tests into the qemu source
> tree. There are a number of components to this patch:

Is there any point having this in the qemu repository?

AFAICS it gains nothing from being "integrated" with qemu. It should work 
equally well with other hypervisors, and even real hardware.

I'm not saying this is a bad thing, just that it seems like it should be a 
separate project, and I'd be surprised if such projects don't already exist.

Paul


Re: [libvirt] Re: [Qemu-devel] Changing the QEMU svn VERSION string

2009-04-07 Thread Paul Brook
On Tuesday 07 April 2009, Daniel Jacobowitz wrote:
> On Tue, Apr 07, 2009 at 08:52:46AM -0500, Anthony Liguori wrote:
> > I think that's going to lead to even more confusion.  While I'm inclined
> > to not greatly mind 0.10.99 for the development tree, when we do release
> > candidates for the next release, it's going to be 0.11.0-rc1.  I don't
> > expect RPMs to ever be created from non-release versions of QEMU provided
> > we stick to our plan of frequent releases.
>
> FWIW, GDB uses 6.8.50 (devel branch), 6.8.90 (release branch), 6.8.91
> (rc1).  That's worked out well for us.

I like this one.

I'm extremely sceptical of anything that claims to need a fine grained version 
number. In practice version numbers for open source projects are fairly 
arbitrary and meaningless because almost everyone has their own set of 
patches and backported fixes anyway.

Paul


Re: [Qemu-devel] Re: [PATCH] virtio-blk: add SGI_IO passthru support

2009-04-29 Thread Paul Brook
On Wednesday 29 April 2009, Christoph Hellwig wrote:
> On Tue, Apr 28, 2009 at 11:37:01AM -0500, Anthony Liguori wrote:
> > Ah, excellent.  I think that's a great thing to do.  So do you think
> > virtio-scsi would deprecate virtio-blk?
>
> I don't think so.  If you have an image format or a non-scsi blockdevice
> underneath virtio-block avoids the encoding into SCSI CDBs and back and
> should be faster.

Is this actually measurably faster, or just infinitesimally faster in theory?

I can maybe see that virtio-blk is slightly simpler for dumb drivers, though 
even then a basic scsi host is pretty straightforward and I find it hard to 
believe there's much real benefit.

Paul


Re: [Qemu-devel] Re: [PATCH] virtio-blk: add SGI_IO passthru support

2009-04-29 Thread Paul Brook
On Wednesday 29 April 2009, Christian Borntraeger wrote:
> Am Wednesday 29 April 2009 13:11:19 schrieb Paul Brook:
> > On Wednesday 29 April 2009, Christoph Hellwig wrote:
> > > On Tue, Apr 28, 2009 at 11:37:01AM -0500, Anthony Liguori wrote:
> > > > Ah, excellent.  I think that's a great thing to do.  So do you think
> > > > virtio-scsi would deprecate virtio-blk?
> > >
> > > I don't think so.  If you have an image format or a non-scsi
> > > blockdevice underneath virtio-block avoids the encoding into SCSI CDBs
> > > and back and should be faster.
> >
> > Is this actually measurably faster, or just infinitesimally faster in
> > theory?
>
> If the underlying backing is 4k block size and the emulated scsi disk has
> 512 byte block size, write performance can be a lot slower.

Rubbish. If this is really a problem you can just use a scsi disk with 4k 
sectors.

Paul


Re: [Qemu-devel] Re: [PATCH] virtio-blk: add SGI_IO passthru support

2009-04-29 Thread Paul Brook
On Wednesday 29 April 2009, Christoph Hellwig wrote:
> On Wed, Apr 29, 2009 at 12:11:19PM +0100, Paul Brook wrote:
> > Is this actually measurably faster, or just infinitesimally faster in
> > theory?
>
> On normal disks it's rather theoretical.  On high-end SSDs and arrays the
> impact is noticeable, mostly due to the additional latency.

How exactly does it introduce additional latency? A scsi command block is 
hardly large or complicated. Are you suggesting that a 16/32byte scsi command 
takes significantly longer to process than a 16byte virtio command 
descriptor? I'd expect any extra processing to be a small fraction of the 
host syscall latency, let alone the latency of the physical host adapter. It 
probably even fits on the same CPU cache line.
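
For scale, the two formats being compared are the same size anyway. Sketch from 
memory - don't take the field names as gospel:

#include <stdint.h>

/* virtio-blk request header: 16 bytes per request. */
struct virtio_blk_outhdr {
    uint32_t type;      /* request type */
    uint32_t ioprio;
    uint64_t sector;
};

/* A SCSI READ(16)/WRITE(16) CDB is also 16 bytes:
 * opcode, flags, 64-bit LBA, 32-bit transfer length, group, control. */
typedef uint8_t scsi_cdb16[16];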

Paul


Re: [Qemu-devel] Re: [PATCH] virtio-blk: add SGI_IO passthru support

2009-04-30 Thread Paul Brook
On Thursday 30 April 2009, Christoph Hellwig wrote:
> On Wed, Apr 29, 2009 at 12:37:20PM +0100, Paul Brook wrote:
> > How exactly does it introduce additional latency? A scsi command block is
> > hardly large or complicated. Are you suggesting that a 16/32byte scsi
> > command takes significantly longer to process than a 16byte virtio
> > command descriptor? I'd expect any extra processing to be a small
> > fraction of the host syscall latency, let alone the latency of the
> > physical host adapter. It probably even fits on the same CPU cache line.
>
> Encoding the scsi CDB is additional work but I would be surprised if it
> is measurable.  Just using scsi cdbs would be simple enough, the bigger
> issue is emulating a full blown scsi bus because then you need to do all
> kinds queueing decisions at target levels etc and drag in a complicated
> scsi stack and not just a simple block driver in the guest.  And at
> least on current linux kernels that does introduce measurable latency.

Only if you emulate a crufty old parallel scsi bus, and that's just silly.
One of the nice things about scsi is it separates the command set from the 
transport layer. cf. USB mass-storage, SAS, SBP2(firewire), and probably 
several others I've forgotten.

Paul


Re: [Qemu-devel] [RFC] Bring in all the Linux headers we depend on in QEMU

2009-05-04 Thread Paul Brook
> I think we need to use the output of 'make headers-install', which
> removes things like __user and CONFIG_*.

Yes. Assuming we do decide to import a set of headers, they should definitely 
be the sanitised version created by make headers-install.

Paul


Re: [Qemu-devel] [PATCH] bios: Use the correct mask to size the PCI option ROM BAR

2009-05-12 Thread Paul Brook
On Tuesday 12 May 2009, Alex Williamson wrote:
> Bit 0 is the enable bit, which we not only don't want to set, but
> it will stick and make us think it's an I/O port resource.

Why is the ROM slot special? Doesn't the same apply to all BARs?

Paul


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
On Wednesday 20 May 2009, Michael S. Tsirkin wrote:
> define api for allocating/setting up msi-x irqs, and for updating them
> with msi-x vector information, supply implementation in ioapic. Please
> comment on this API: I intend to port my msi-x patch to work on top of
> it.

I thought the point of MSI is that they are just regular memory writes, and 
don't require any special bus support.

Paul


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
> The PCI bus doesn't need any special support (I think) but something on
> the other end needs to interpret those writes.

Sure. But there's definitely nothing PCI specific about it. I assumed this 
would all be contained within the APIC.

> In any case we need some internal API for this, and qemu_irq looks like
> a good choice.

What do you expect to be using this API?

Paul


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
> >> The PCI bus doesn't need any special support (I think) but something on
> >> the other end needs to interpret those writes.
> >
> > Sure. But there's definitely nothing PCI specific about it. I assumed
> > this would all be contained within the APIC.
>
> MSIs are defined by PCI and their configuration is done using the PCI
> configuration space.

A MSI is just a regular memory write, and the PCI spec explicitly states that 
a target (e.g. the APIC) is unable to distinguish between a MSI and any other 
write. The PCI config bits just provide a way of telling the device where/what 
to write.

> >> In any case we need some internal API for this, and qemu_irq looks like
> >> a good choice.
> >
> > What do you expect to be using this API?
>
> virtio, emulated devices capable of supporting MSI (e1000?), device
> assignment (not yet in qemu.git).

It probably makes sense to have common infrastructure in pci.c to 
expose/implement device side MSI functionality. However I see no need for a 
direct API between the device and the APIC. We already have an API for memory 
accesses and MMIO regions. I'm pretty sure a system could implement MSI by 
pointing the device at system ram, and having the CPU periodically poll that.

Paul


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
On Thursday 21 May 2009, Avi Kivity wrote:
> Paul Brook wrote:
> >>>> In any case we need some internal API for this, and qemu_irq looks
> >>>> like a good choice.
> >>>
> >>> What do you expect to be using this API?
> >>
> >> virtio, emulated devices capable of supporting MSI (e1000?), device
> >> assignment (not yet in qemu.git).
> >
> > It probably makes sense to have common infrastructure in pci.c to
> > expose/implement device side MSI functionality. However I see no need for
> > a direct API between the device and the APIC. We already have an API for
> > memory accesses and MMIO regions. I'm pretty sure a system could
> > implement MSI by pointing the device at system ram, and having the CPU
> > periodically poll that.
>
> Instead of writing directly, let's abstract it behind a qemu_set_irq().
> This is easier for device authors.  The default implementation of the
> irq callback could write to apic memory, while for kvm we can directly
> trigger the interrupt via the kvm APIs.

I'm still not convinced.

A tight coupling between PCI devices and the APIC is just going to cause us 
problems later on. I'm going to come back to the fact that these are memory 
writes so once we get IOMMU support they will presumably be subject to 
remapping by that, just like any other memory access.

Even ignoring that, qemu_irq isn't really the right interface. A MSI is a one-
off event, not a level state. OTOH stl_phys is exactly the right interface.

The KVM interface should be contained within the APIC implementation.

Paul



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
> > A tight coupling between PCI devices and the APIC is just going to cause
> > us problems later on. I'm going to come back to the fact that these are
> > memory writes so once we get IOMMU support they will presumably be
> > subject to remapping by that, just like any other memory access.
>
> I'm not suggesting the qemu_irq will extend all the way to the apic.
> Think of it as connecting the device core with its interrupt unit.
>
> > Even ignoring that, qemu_irq isn't really the right interface. A MSI is a
> > one- off event, not a level state. OTOH stl_phys is exactly the right
> > interface.
>
> The qemu_irq callback should do an stl_phys().  The device is happy
> since it's using the same API it uses for non-MSI. 

MSI provides multiple edge triggered interrupts, whereas traditional mode 
provides a single level triggered interrupt. My guess is most devices will 
want to treat these differently anyway.

Either way, this is an implementation detail between pci.c and individual 
devices. It has nothing to do with the APIC.

Paul


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
> > MSI provides multiple edge triggered interrupts, whereas traditional mode
> > provides a single level triggered interrupt. My guess is most devices
> > will want to treat these differently anyway.
>
> So, is qemu_send_msi better than qemu_set_irq.

Neither. pci_send_msi, which is a trivial wrapper around stl_phys.
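
Something along these lines - a rough sketch, not code that exists today; 
assume the caller has already read the address/data pair out of the device's 
MSI capability:

/* Hypothetical helper in pci.c, not current QEMU API.  'addr' and 'data'
 * are the values the guest programmed into the MSI capability. */
static inline void pci_send_msi(target_phys_addr_t addr, uint32_t data)
{
    /* To whatever is mapped at 'addr' - the APIC on a PC, or even plain
     * RAM - this is indistinguishable from any other memory write. */
    stl_phys(addr, data);
}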

Paul


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
On Thursday 21 May 2009, Paul Brook wrote:
> > > MSI provides multiple edge triggered interrupts, whereas traditional
> > > mode provides a single level triggered interrupt. My guess is most
> > > devices will want to treat these differently anyway.
> >
> > So, is qemu_send_msi better than qemu_set_irq.
>
> Neither. pci_send_msi, which is a trivial wrapper around stl_phys.

To clarify, you seem to be trying to fuse two largely separate features 
together.

MSI is a standard PCI device capability[1] that involves the device performing 
a 32-bit memory write when something interesting occurs. These writes may or 
may not be directed at a APIC.

The x86 APIC has a memory mapped interface that allows generation of CPU 
interrupts in response to memory writes. These may or may not come 
from an MSI capable PCI device.

Paul

[1] Note a *device* capability, not a bus capability.



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
> > which is a trivial wrapper around stl_phys.
>
> OK, but I'm adding another level of indirection in the middle,
> to allow us to tie in a kvm backend.

kvm has no business messing with the PCI device code.

Paul



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
On Thursday 21 May 2009, Avi Kivity wrote:
> Paul Brook wrote:
> >>> which is a trivial wrapper around stl_phys.
> >>
> >> OK, but I'm adding another level of indirection in the middle,
> >> to allow us to tie in a kvm backend.
> >
> > kvm has no business messing with the PCI device code.
>
> kvm has a fast path for irq injection.  If qemu wants to support it we
> need some abstraction here.

Fast path from where to where? Having the PCI layer bypass/re-implement the 
APIC and inject the interrupt directly into the cpu core sounds a particularly 
bad idea.

Paul



Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
> >>> kvm has no business messing with the PCI device code.
> >>
> >> kvm has a fast path for irq injection.  If qemu wants to support it we
> >> need some abstraction here.
> >
> > Fast path from where to where? Having the PCI layer bypass/re-implement
> > the APIC and inject the interrupt directly into the cpu core sounds a
> > particularly bad idea.
>
> kvm implements the APIC in the host kernel (qemu upstream doesn't
> support this yet).  The fast path is wired to the in-kernel APIC, not
> the cpu core directly.
>
> The idea is to wire it to UIO for device assignment, to a virtio-device
> implemented in the kernel, and to qemu.

I still don't see why you're trying to bypass straight from the pci layer to 
the apic. Why can't you just pass the apic MMIO writes to the kernel? You've 
presumably got to update the apic state anyway.

Paul


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Paul Brook
On Thursday 21 May 2009, Avi Kivity wrote:
> Paul Brook wrote:
> >> kvm implements the APIC in the host kernel (qemu upstream doesn't
> >> support this yet).  The fast path is wired to the in-kernel APIC, not
> >> the cpu core directly.
> >>
> >> The idea is to wire it to UIO for device assignment, to a virtio-device
> >> implemented in the kernel, and to qemu.
> >
> > I still don't see why you're trying to bypass straight from the pci layer
> > to the apic. Why can't you just pass the apic MMIO writes to the kernel?
> > You've presumably got to update the apic state anyway.
>
> The fast path is an eventfd so that we don't have to teach all the
> clients about the details of MSI.  Userspace programs the MSI details
> into kvm and hands the client an eventfd.  All the client has to do is
> bang on the eventfd for the interrupt to be queued.  The eventfd
> provides event coalescing and is equally useful from the kernel and
> userspace, and can be used with targets other than kvm.

So presumably if a device triggers an APIC interrupt using a write that isn't 
from one of the currently configured PCI devices, it all explodes horribly?

Paul


Re: [Qemu-devel] [patch uq/master 2/2] Add option to use file backed guest memory

2010-02-27 Thread Paul Brook
>+/*
>+ * ftruncate is not supported by hugetlbfs in older
>+ * hosts, so don't bother checking for errors.
>+ * If anything goes wrong with it under other filesystems,
>+ * mmap will fail.
>+ */
>+if (ftruncate(fd, memory))
>+   perror("ftruncate");

Code does not match comment.
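
Either the perror() should go or the comment should change. If the intent 
really is to ignore the failure, something like this (sketch) would match:

    /*
     * ftruncate is not supported by hugetlbfs in older hosts, so the
     * failure is expected and deliberately ignored here; if something
     * is really wrong the subsequent mmap will fail instead.
     */
    (void)ftruncate(fd, memory);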

>+if (asprintf(&filename, "%s/kvm.XX", path) == -1) {
>+   return NULL;
>+}

This isn't kvm any more :-)

>+flags = mem_prealloc ? MAP_POPULATE|MAP_SHARED : MAP_PRIVATE;

Missing spaces round logic operator (plus several other occurrences).

>+static void *file_ram_alloc(ram_addr_t memory, const char *path)
>+{
>+return NULL;
>+}

Silently ignoring commandline options is bad. 
Especially as the other option you added (-mem-prealloc) causes an error if 
not supported.

>+if (kvm_enabled() && !kvm_has_sync_mmu()) {
>+fprintf(stderr, "kvm: host lacks mmu notifiers, disabling
> -mem-path\n"); +return NULL;
>+}

Code does not match error message.  Users are liable to see this many times.

>+new_block->host = file_ram_alloc(size, mem_path);

IMHO it would be better to check the mem_path != NULL here, rather that 
burying the check in file_ram_alloc.

>+if (memory < hpagesize) {
>+return NULL;
>+}

Ah, so it's actually "allocate memory in $path, if you feel like it". Good job 
we aren't relying on this for correctness.  At minimum I recommend documenting 
this heuristic.

>+if (!new_block->host) {
> #if defined(TARGET_S390X) && defined(CONFIG_KVM)
>-/* XXX S390 KVM requires the topmost vma of the RAM to be < 256GB */

By my reading this implies -mempath is probably broken on s390 KVM?

>+DEF("mem-path", HAS_ARG, QEMU_OPTION_mempath,
>+"-mem-path FILE  provide backing storage for guest RAM\n")
>+STEXI
>+@item -mem-path @var{path}
>+Allocate guest RAM from a temporarily created file in @var{path}.
>+ETEXI

You should mention that this is only useful when PATH happens to be a linux 
hugetlbfs mount.

>+#ifdef MAP_POPULATE
>+case QEMU_OPTION_mem_prealloc:
>+mem_prealloc = !mem_prealloc;
>+#endif

This looks highly suspect.  Having redundant options toggle the state seems 
like a particularly bad UI.

Paul


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-07 Thread Paul Brook
> Support an inter-vm shared memory device that maps a shared-memory object
> as a PCI device in the guest.  This patch also supports interrupts between
> guest by communicating over a unix domain socket.  This patch applies to
>  the qemu-kvm repository.

No. All new devices should be fully qdev based.

I suspect you've also ignored a load of coherency issues, especially when not 
using KVM. As soon as you have shared memory in more than one host 
thread/process you have to worry about memory barriers.

Paul


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-08 Thread Paul Brook
> On 03/08/2010 12:53 AM, Paul Brook wrote:
> >> Support an inter-vm shared memory device that maps a shared-memory
> >> object as a PCI device in the guest.  This patch also supports
> >> interrupts between guest by communicating over a unix domain socket. 
> >> This patch applies to the qemu-kvm repository.
> >
> > No. All new devices should be fully qdev based.
> >
> > I suspect you've also ignored a load of coherency issues, especially when
> > not using KVM. As soon as you have shared memory in more than one host
> > thread/process you have to worry about memory barriers.
> 
> Shouldn't it be sufficient to require the guest to issue barriers (and
> to ensure tcg honours the barriers, if someone wants this with tcg)?.

In a cross environment that becomes extremely hairy.  For example the x86 
architecture effectively has an implicit write barrier before every store, and 
an implicit read barrier before every load.
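
To make that concrete, here's the usual flag-passing idiom, which is safe on 
x86 with no explicit fences but not on a weakly ordered host unless the 
emulator inserts barriers. Illustrative fragment only, not from the patch:

static int payload;
static volatile int ready;

/* Writer, as an x86 guest would happily write it: the architecture
 * promises the two stores become visible in program order. */
static void producer(void)
{
    payload = 42;
    ready = 1;
}

/* Reader on the other side of the shared region. */
static int consumer(void)
{
    while (!ready) {
        /* spin */
    }
    /* Run this guest on an ARM or PPC host and nothing orders the two
     * host-side stores above: we can observe ready == 1 but a stale
     * payload, unless the emulator inserts something like
     * __sync_synchronize() between them. */
    return payload;
}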

Paul


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-08 Thread Paul Brook
> However, coherence could be made host-type-independent by the host
> mapping and unampping pages, so that each page is only mapped into one
> guest (or guest CPU) at a time.  Just like some clustering filesystems
> do to maintain coherence.

You're assuming that a TLB flush implies a write barrier, and a TLB miss 
implies a read barrier.  I'd be surprised if this were true in general.

Paul


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-09 Thread Paul Brook
> > In a cross environment that becomes extremely hairy.  For example the x86
> > architecture effectively has an implicit write barrier before every
> > store, and an implicit read barrier before every load.
> 
> Btw, x86 doesn't have any implicit barriers due to ordinary loads.
> Only stores and atomics have implicit barriers, afaik.

As of March 2009[1] Intel guarantees that memory reads occur in order (they 
may only be reordered relative to writes). It appears AMD do not provide this 
guarantee, which could be an interesting problem for heterogeneous migration.

Paul

[1] The most recent docs I have handy. Up to and including Core-2 Duo.


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-10 Thread Paul Brook
> >> As of March 2009[1] Intel guarantees that memory reads occur in order
> >> (they may only be reordered relative to writes). It appears AMD do not
> >> provide this guarantee, which could be an interesting problem for
> >> heterogeneous migration..
> >
> > Interesting, but what ordering would cause problems that AMD would do
> > but Intel wouldn't?  Wouldn't that ordering cause the same problems
> > for POSIX shared memory in general (regardless of Qemu) on AMD?
> 
> If some code was written for the Intel guarantees it would break if
> migrated to AMD.  Of course, it would also break if run on AMD in the
> first place.

Right. This is independent of shared memory, and is a case where reporting an 
Intel CPUID on an AMD host might get you into trouble.

Paul


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-10 Thread Paul Brook
> > You're much better off using a bulk-data transfer API that relaxes
> > coherency requirements.  IOW, shared memory doesn't make sense for TCG
> 
> Rather, tcg doesn't make sense for shared memory smp.  But we knew that
> already.

I think TCG SMP is a hard, but soluble problem, especially when you're 
running guests used to coping with NUMA.

TCG interacting with third parties via shared memory is probably never going 
to make sense.

Paul


Re: [Qemu-devel] [PATCH] Inter-VM shared memory PCI device

2010-03-11 Thread Paul Brook
> On 03/10/2010 07:41 PM, Paul Brook wrote:
> >>> You're much better off using a bulk-data transfer API that relaxes
> >>> coherency requirements.  IOW, shared memory doesn't make sense for TCG
> >>
> >> Rather, tcg doesn't make sense for shared memory smp.  But we knew that
> >> already.
> >
> > I think TCG SMP is a hard, but soluble problem, especially when you're
> > running guests used to coping with NUMA.
> 
> Do you mean by using a per-cpu tlb?  These kind of solutions are
> generally slow, but tcg's slowness may mask this out.

Yes.

> > TCG interacting with third parties via shared memory is probably never
> > going to make sense.
> 
> The third party in this case is qemu.

Maybe. But it's a different instance of qemu, and once this feature exists I 
bet people will use it for other things.

Paul


Re: [Qemu-devel] Re: [PATCH 2/6] qemu-kvm: Modify and introduce wrapper functions to access phys_ram_dirty.

2010-03-16 Thread Paul Brook
> Where does the translator need access to this original code?  I was
> just thinking about this problem today, wondering how much overhead
> there is with this SMC page protection thing.

When an MMU fault occurs qemu re-translates the TB with additional annotations 
to determine which guest instruction caused the fault.
See translate-all.c:cpu_restore_state().

Paul


Re: [Qemu-devel] Re: >2 serial ports?

2010-03-17 Thread Paul Brook
> Oh, well, yes, I remember.  qemu is more strict on ISA irq sharing now.
>   A bit too strict.
> 
> /me goes dig out a old patch which never made it upstream for some
> reason I forgot.  Attached.

This is wrong. Two devices should never be manipulating the same qemu_irq 
object.  If you want multiple devices connected to the same IRQ then you need 
an explicit multiplexer. e.g. arm_timer.c:sp804_set_irq.
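
The multiplexer is only a few lines. Roughly like this (sketch with invented 
names - see sp804_set_irq for the real thing, which ORs the input levels onto 
one output):

typedef struct {
    qemu_irq out;       /* the single shared output line    */
    int level[2];       /* last level driven by each source */
} IRQMux;

/* Handler given to each device via qemu_allocate_irqs(); 'n' identifies
 * which source changed.  The shared line is high iff any input is high. */
static void irq_mux_set(void *opaque, int n, int level)
{
    IRQMux *s = opaque;

    s->level[n] = level;
    qemu_set_irq(s->out, s->level[0] || s->level[1]);
}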

Paul


Re: [Qemu-devel] Re: [PATCH 2/6] qemu-kvm: Modify and introduce wrapper functions to access phys_ram_dirty.

2010-03-17 Thread Paul Brook
> On 03/16/2010 10:10 PM, Blue Swirl wrote:
> >>   Yes, and is what tlb_protect_code() does and it's called from
> >> tb_alloc_page() which is what's code when a TB is created.
> >
> > Just a tangential note: a long time ago, I tried to disable self
> > modifying code detection for Sparc. On most RISC architectures, SMC
> > needs explicit flushing so in theory we need not track code memory
> > writes. However, during exceptions the translator needs to access the
> > original unmodified code that was used to generate the TB. But maybe
> > there are other ways to avoid SMC tracking, on x86 it's still needed
> 
> On x86 you're supposed to execute a serializing instruction (one of
> INVD, INVEPT, INVLPG, INVVPID, LGDT, LIDT, LLDT, LTR, MOV (to control
> register, with the exception of MOV CR8), MOV (to debug register),
> WBINVD, WRMSR, CPUID, IRET, and RSM) before running modified code.

Last time I checked, a jump instruction was sufficient to ensure coherency 
within a core.  Serializing instructions are only required for coherency 
between cores on SMP systems.

QEMU effectively has a very large physically tagged icache[1] with very 
expensive cache loads.  AFAIK The only practical way to maintain that cache on 
x86 targets is to do write snooping via dirty bits. On targets that mandate 
explicit icache invalidation we might be able to get away with this, however I 
doubt it actually gains you anything - a correctly written guest is going to 
invalidate at least as much as we get from dirty tracking, and we still need 
to provide correct behaviour when executing with cache disabled.

> > but I suppose SMC is pretty rare.
> 
> Every time you demand load a code page from disk, you're running self
> modifying code (though it usually doesn't exist in the tlb, so there's
> no previous version that can cause trouble).

I think you're confusing TLB flushes with TB flushes.

Paul

[1] Even modern x86 CPUs only have relatively small icaches. The large L2/L3 caches 
aren't relevant as they are unified I/D caches.


Re: [Qemu-devel] Re: [questions] savevm|loadvm

2010-04-01 Thread Paul Brook
> Wenhao Xu  wrote:
> > Hi, Juan,
> >I am fresh to both QEMU and KVM. But so far, I notice that QEMU
> > uses "KVM_SET_USER_MEMORY_REGION" to set memory region that KVM can
> > use and uses cpu_register_physical_memory_offset to register the same
> > memory to QEMU emulator, which means QEMU and KVM use the same host
> > virtual memory. And therefore the memory KVM modified could be
> > directly reflected to QEMU. I don't quite understand the different
> > memory layout problem between the two. So I don't know exactly what
> > you mean to "fix" it?
> 
> 1st. qemu-kvm.git and qemu.git memory layouts are different, indeed with
> qemu.git kvm mode. (yes it is complex and weird).
> 
> kvm vs qemu initialization is different.  Expecting to stop kvm, and run
> tcg from there is not going to work.  I guess it would need a lot of
> changes, but I haven't looked at it myself.

FWIW I think this really *should* work, and any failure to do so is definitely 
a bug.

Paul


Re: [Qemu-devel] High CPU use of -usbdevice tablet (was Re: KVM usability)

2010-04-04 Thread Paul Brook
> > Looks like the tablet is set to 100 Hz polling rate.  We may be able
> > to get away with 30 Hz or even less (ep_bInterval, in ms, in
> > hw/usb-wacom.c).
>
> Changing the USB tablet polling interval from 10ms to 100ms in both
> hw/usb-wacom.c and hw/usb-hid.c made no difference except the an increase
>  in bInterval shown in lsusb -v in the guest and the hint of jerky mouse
>  movement I expected from setting this value so high. A similar change to
>  the polling interval for the keyboard and mouse also made no difference to
>  their performance impact.

The USB HID devices implement the SET_IDLE command, so the polling interval 
will have no real effect on performance.

My guess is that the overhead you're seeing is entirely from the USB host 
adapter having to wake up and check the transfer descriptor lists. This will 
only result in the guest being woken if a device actually responds (as 
mentioned above it should not).

>Taking the FRAME_TIMER_FREQ down to 100 in hw/usb-uhci.c does seem to reduce
>the CPU load quite a bit, but at the expense of making the USB tablet (and
>presumably all other USB devices) very laggy.

The guest USB driver explicitly decides which devices to poll each frame. 
Slowing down the frame rate will effectively change the polling period by the 
same factor. e.g. the HID device requests a polling rate of 10ms, you slowed 
down frame rate by 10x, so you're effectively only polling every 100ms.

If you want a quick and nasty hack then you can probably make the device wake 
up less often, and process multiple frames every wakeup.  However this is 
probably going to do bad things (at best extremely poor performance) when 
using actual USB devices.

Fixing this properly is hard because the transfer descriptor lists are stored 
in system RAM, and polled by the host adapter.  The first step is to read the 
whole table of descriptors, and calculate when the next event is due. However 
the guest will not explicitly notify the HBA when these tables are modified, 
so you also need some sort of MMU trap to trigger recalculation.

This only gets you down to the base polling interval requested by the device.  
Increasing this interval causes significant user visible latency, so 
increasing it is not an option. The guest is also likely to distribute polling 
events evenly, further reducing the effective sleep interval.  To fix this you 
need additional APIs so that a device can report when an endpoint will become 
unblocked, rather than just waiting to be polled and NAKing the request.
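
In other words something like an extra device-to-core callback. Very rough 
sketch of the kind of API I mean - nothing like this exists today:

/* Hypothetical: let a device tell the core when an endpoint stops NAKing,
 * so the HBA only reschedules polling at that point instead of every frame. */
typedef void (*usb_endpoint_wakeup_fn)(USBDevice *dev, int ep);

struct USBWakeupOps {
    usb_endpoint_wakeup_fn wakeup;   /* device -> core: "poll this ep now" */
};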

Paul



Re: [Qemu-devel] High CPU use of -usbdevice tablet (was Re: KVM usability)

2010-04-04 Thread Paul Brook
> > The USB HID devices implement the SET_IDLE command, so the polling
> > interval will have no real effect on performance.
> 
> On a Linux guest (F12), I see 125 USB interrupts per second with no
> mouse movement, so something is broken (on the guest or host).

Turns out to be a bug in the UHCI emulation. It should only raise an 
interrupt if the transfer actually completes (i.e. the active bit is set to 
zero). Fixed by 5bd2c0d7.

I was testing with an OHCI controller, which does not have this bug.

Paul


Re: [Qemu-devel] High CPU use of -usbdevice tablet (was Re: KVM usability)

2010-04-04 Thread Paul Brook
> > My guess is that the overhead you're seeing is entirely from the USB host
> > adapter having to wake up and check the transport descriptor lists. This
> > will only result in the guest being woken if a device actually responds
> > (as mentioned above it should not).
> 
> A quick profile on the host side doesn't show this.  Instead, I see a
> lot of select() overhead.

This actually confirms my hypothesis. After fixing the UHCI bug the guest is 
completely idle, but the host still needs to wake up at 1ms intervals to do 
UHCI emulation. I can believe that the most visible part of this is the 
select() syscall.

> Surprising as there are ~10 descriptors being
> polled, so ~1200 polls per second.  Maybe epoll will help here.

I'm not sure where you get 1200 from.  select will be called once per host 
wakeup. i.e. if the USB controller is enabled then 1k times per second due to 
the frame tick.

Are you sure there are actually 10 descriptors being polled? Remember that the 
nfds argument is the value of the largest fd in the set (+1), not the number 
of descriptors in the set.

Paul


Re: [Qemu-devel] net packet storms with multiple NICs

2009-11-10 Thread Paul Brook
> I immediately reproduced the problem locally.  It turns out that
> kvm reflects packets coming from one guest NIC on another guest
> NIC, and since both are connected to the same bridge we're getting
> endless packet storm.  To a level when kvm process becomes 100%
> busy and does not respond to anything but `kill -9'.

You created a network loop. It is working exactly as expected.
Create the same topology with a physical network hub and a pair of NICs and 
you'll get the same end result.

Paul


Re: [Qemu-devel] [PATCH 0/4] megaraid_sas HBA emulation

2009-11-10 Thread Paul Brook
> But I certainly do _not_ want to update the SCSI disk
> emulation, as this is really quite tied to the SCSI parallel
> interface used by the old lsi53c895a.c.

This is completely untrue. scsi-disk.c contains no transport-specific code. It 
is deliberately designed to be independent of both the transport (e.g. 
parallel SCSI or USB) and the mechanism by which the initiator transfers data 
to the host.

Paul


Re: [Qemu-devel] [RFC PATCH 1/5] qdev: Create qdev_get_dev_path()

2010-06-14 Thread Paul Brook
> > > "/main-system-bus/pci.0,addr=09.0/virtio-blk-pci"

There's a device missing between the main system bus and the pci bus.  Should 
be something like:

/main-system-bus/piix4-pcihost/pci.0/_09.0

> > Could you explain why you add "identified properties of the immediate
> > parent bus and device"?  They make the result ver much *not* a "dev
> > path" in the qdev sense...
> 
> In order to try to get a unique string.  Without looking into device
> properties, two e1000s would both be:
> 
> /main-system-bus/pci.0/e1000
> /main-system-bus/pci.0/e1000
> 
> Which is no better than simply "e1000" and would require us to fall back
> to instance ids again.  The goal here is that anything that makes use of
> passing a dev when registering a vmstate gets an instance id of zero.

You already got the information you need, you just put it in the wrong place. 
The canonical ID for the device could be its bus address. In practice we'd 
probably want to allow the user to specify it by name, provided these are 
unique. e.g. in the above machine we could accept [...]/virtio-blk-pci as an 
alias for [...]:_09.0. Device names have a restricted namespace, so we 
can use an initial prefix to disambiguate a name/label from a bus address.

For busses that don't have a consistent addressing scheme then some sort of 
instance ID is unavoidable. I guess it may be possible to invent something 
based on other device properties (e.g. address of the first IO port/memory 
region).

Paul


Re: [Qemu-devel] [RFC PATCH 1/5] qdev: Create qdev_get_dev_path()

2010-06-14 Thread Paul Brook
> On Mon, 2010-06-14 at 14:09 +0100, Paul Brook wrote:
> > > > > "/main-system-bus/pci.0,addr=09.0/virtio-blk-pci"
> > 
> > There's a device missing between the main system bus and the pci bus. 
> > Should be something like:
> > 
> > /main-system-bus/piix4-pcihost/pci.0/_09.0
> 
> Ok, I can easily come up with:
> 
> /System/main-system-bus/i440FX-pcihost/PCI/pci.0/_09.0/virtio-blk-pci/virtio-blk

No. Now you've got way too many elements in the path.  My point is that you 
replace the name of the device with the bus address of the device.

However you determine the element names, the path should be /busname/devicename 
pairs. I'm undecided whether the root element should be the main system bus, 
or a system device node that contains the main system bus.

> > You already got the information you need, you just put it in the wrong
> > place. The canonical ID for the device could be its bus address. In
> > practice we'd probably want to allow the user to specify it by name,
> > provided these are unique. e.g. in the above machine we could accept
> > [...]/virtio-blk-pci as an alias for [...]:_09.0.
> 
> Sure, if there was a guaranteed unique char* on a DeviceState that was
> consistent between guest invocations, we could just use that instead.  I
> suppose all devices could have a global system id property and if that
> was present we'd use that instead of creating the device path.

All we require is some way of uniquely addressing each device on the bus. For 
PCI that's trivial (devfn). For other busses we get to pick something 
appropriate.
 
> > Device names have a restricted namespace, so we
> > can use an initial prefix to disambiguate a name/label from a bus
> > address.
> 
> I'm not sure it's necessary.  It seems like it would only be necessary
> to differentiate the two if we wanted to translate between namespaces.
> But I think it's reasonable to require that if a global name is
> provided, it must always be provided for the life of the VM.

I don't think requiring that all devices are given a globally unique name is 
going to fly. Locally (per-bus) unique user-specified names are still a PITA, but 
may be acceptable. Having a canonical addressing system that's independent of 
user assigned IDs seems like a good thing.

> > For busses that don't have a consistent addressing scheme then some sort
> > of instance ID is unavoidable. I guess it may be possible to invent
> > something based on other device properties (e.g. address of the first IO
> > port/memory region).
> 
> Instance IDs aren't always bad, we just run into trouble with hotplug
> and maybe creating unique ramblock names.  But, such busses typically
> don't support hotplug and don't have multiple instances of the same
> device on the bus, so I don't expect us to hit many issues there as long
> as we can get a stable address path.  Thanks,

Simple consecutive instance IDs are inherently unstable. They depend on the 
order of device creation. The ID really wants to be something that can be 
reliably determined from the device bus itself (and/or its interaction with 
its parent bus).

Paul


Re: [Qemu-devel] [RFC PATCH 1/5] qdev: Create qdev_get_dev_path()

2010-06-14 Thread Paul Brook
> On Mon, 2010-06-14 at 18:49 +0200, Jan Kiszka wrote:
> > Alex Williamson wrote:
> > > On Mon, 2010-06-14 at 18:00 +0200, Jan Kiszka wrote:
> > >> And instead of introducing another hierarchy level with the bus
> > >> address, I would also prefer to add this as prefix or suffix to the
> > >> device name, e.g. ..
> > > 
> > > That's what I had started with.  The first post in this thread has
> > > "pci.0,addr=09.0" as a single hierarchy level.  The "addr=" may be
> > > unnecessary there, but I also prefer something along those lines.  For
> > > PCI it'd make sense to have <bus>:<addr>, which comes out to
> > > "pci.0:09.0".  (Maybe rather than flagging properties as being relevant
> > > to the path and printing them generically, we should extract specific
> > > properties based on the bus type.)
> > 
> > Not bus.addr, driver.addr. We only have one PCI bus here, not as many as
> > there are slots on that bus.
> 
> Ok, I can get it down to something like:
> 
> /i440FX-pcihost/pci.0/virtio-blk-pci,09.0
> 
> The addr on the device is initially a little non-intuitive to me since
> it's a property of the bus, but I guess it make sense if we think of
> that level as slot, which includes an address and driver.

That indicates you're thinking about things the wrong way.
The point of this path is to uniquely identify an entity.

/i440FX-pcihost/pci.0 identifies a PCI bus attached to the i440FX-pcihost 
device. To identify a device attached to that bus all you need to know is the 
devfn of the device.

For an end-user it may be helpful to allow devices to be identified by the 
device type (assuming only one device of a particular type on that bus), or 
specify the device type as a crude error checking mechanism. However we're 
talking about canonical addresses. These need not include the device type. 
Verifying that the device is actually what you expect is a separate problem, 
and the device type is not sufficient for that.

i.e. /i440FX-pcihost/pci.0/,09.0 is an appropriate canonical address.
 
> > > I started down that path, but it still breaks for hotplug.  If we start
> > > a VM with two e1000 NICs, then remove the first, we can no longer
> > > migrate because the target can't represent having a single e1000 with a
> > > non-zero instance ID.
> > 
> > That's indeed a good point.

That's a feature. If you start with two NICs and remove the first, the chances 
are that the second will be in a different place to the NIC created in a 
single-nic config. Allowing migration between these two will cause a PCI 
device to suddenly change location. This is not physically or logically 
possible, and everyone loses.

Hot-removing a nic and then hotplugging a new nic in the same location may 
result in something that is reasonable to migrate.

Paul


Re: [Qemu-devel] [RFC PATCH 1/5] qdev: Create qdev_get_dev_path()

2010-06-14 Thread Paul Brook
> > > Ok, I can get it down to something like:
> > > 
> > > /i440FX-pcihost/pci.0/virtio-blk-pci,09.0
> > > 
> > > The addr on the device is initially a little non-intuitive to me since
> > > it's a property of the bus, but I guess it make sense if we think of
> > > that level as slot, which includes an address and driver.
> > 
> > That indicates you're thinking about things the wrong way.
> > The point of this path is to uniquely identify an entity.
> > 
> > /i440FX-pcihost/pci.0 identifies a PCI bus attached to the i440FX-pcihost
> > device. To identify a device attached to that bus all you need to know is
> > the devfn of the device.
> 
> Hmm, I think that identifies where the device is, not what the device
> is.  It's more helpful to say "the e1000 in slot 7" than it is to say
> "the device in slot 7".

Why is this more useful? Canonical addresses should not be helpful. They 
should identify entities within a machine that is already known to be 
consistent. Making them "helpful" just makes them more volatile.
 
> > For an end-user it may be helpful to allow devices to be identified by
> > the device type (assuming only one device of a particular type on that
> > bus), or specify the device type as a crude error checking mechanism.
> > However we're talking about canonical addresses. These need not include
> > the device type. Verifying that the device is actually what you expect
> > is a separate problem, and the device type is not sufficient for that.
> > 
> > i.e. /i440FX-pcihost/pci.0/,09.0 is an appropriate canonical address.
> 
> We seem to keep introducing new problems, and I'm not sure this one
> exists.  If I drop the device name/type and use only the devfn, then
> what's to prevent the /e1000,09.0/rom (/,09.0/rom) from being stuffed
> into the /rtl8139,09.0/rom (/,09.0/rom) on a migration?  (or we match it
> to /09.0/savevm when trying to migrate state)  We can argue that "e1000"
> isn't a sufficient identifier, but I can't think of a case where it'd
> fail.

The migration code needs to check that the devices are actually compatible. 
I'd expect this to require much more than just the device name. What you 
actually need is more like "An e1000 with 64k eeprom, fast ethernet PHY, and 
frobnitz B". In fact what you really want to do is transfer the device tree 
(including properties), and create the machine from scratch, not load state 
into a pre-supplied device tree.

> > > > > I started down that path, but it still breaks for hotplug.  If we
> > > > > start a VM with two e1000 NICs, then remove the first, we can no
> > > > > longer migrate because the target can't represent having a single
> > > > > e1000 with a non-zero instance ID.
> > > > 
> > > > That's indeed a good point.
> > 
> > That's a feature. If you start with two NICs and remove the first, the
> > chances are that the second will be in a different place to the NIC
> > created in a single-nic config. Allowing migration between these two
> > will cause a PCI device to suddenly change location. This is not
> > physically or logically possible, and everyone loses.
> 
> If the BAR addresses don't follow the VM when it's migrated, that's
> another bug that needs to be fixed, but we can't get there until we at
> least have some infrastructure in place to make that bug possible.

Not BAR addresses, the actual PCI device addresses. Devices on the PCI bus are 
addressed by device and function.  This is guest visible.  The device part of 
this address corresponds to the physical slot, which typically effects IRQ 
routing (amongst other things).  If you arbitrarily move a device from slot A 
to slot B then this will have catastrophic effects on a running machine.

Paul


Re: [Qemu-devel] [RFC PATCH 1/5] qdev: Create qdev_get_dev_path()

2010-06-15 Thread Paul Brook
> > In fact what you really want to do is transfer the device tree
> > (including properties), and create the machine from scratch, not load
> > state into a pre-supplied device tree.
> 
> Well, I agree, but that's a lot more of an overhaul, and once again
> we're changing the problem.

I think it's you that's changing the problem.
The requirement is to uniquely identify a device within a machine.
Verifying that this device is compatible with the device at the same 
address in a different machine is a separate problem. We should not be trying 
to encode this information in the canonical device path.

Paul


Re: [Qemu-devel] [RFC PATCH 1/5] qdev: Create qdev_get_dev_path()

2010-06-15 Thread Paul Brook
> > Alex proposed to disambiguate by adding "identified properties of the
> > immediate parent bus and device" to the path component.  For PCI, these
> > are dev.fn.  Likewise for any other bus where devices have unambigous
> > bus address.  The driver name carries no information!
> 
> From user POV, driver names are very handy to address a device
> intuitively - except for the case you have tons of devices on the same
> bus that are handled by the same driver. For that case we need to
> augment the device name with a useful per-bus ID, derived from the bus
> address where available, otherwise based on instance numbers.

This is where I think you're missing a trick. We don't need to augment the 
name, we just need to allow the bus id to be used instead.
 
> > For other buses, we need to make something up.
> > 
> > Note that addressing by bus address rather than name is generally
> > useful, not just in the context of savevm.  For instance, I'd appreciate
> > being able to say something like "device_del pci.0/04.0".
> 
> And I prefer "device_del [.../]pci.0/e1000". Otherwise you need to dump
> the bus first before you can identify which device you want to remove.

We can allow both.

A bus address is sufficient to uniquely identify a device.  I see no reason to 
require the driver name,  or to include it in the canonical device address.

> > An easy way to get that is to reserve part of the name space for bus
> > addresses.  If the path component starts with a letter, it's an ID or
> > driver name.  If it starts with say '@', it's a bus address in
> > bus-specific syntax.  The bus provides a method to look it up.
> 
> I would prefer [@|.]. The former is
> set for buses that implement some to-be-defined device addressing
> service, the latter is the default on buses where that service is not
> available.

If we have bus-address then I see no good reason to also add instance-no.
For busses that have no natural address, we can define the address to be an 
instance number.

> > That way, we gain a useful feature, and avoid having an savevm-specific
> > "device path" that isn't recognized anywhere else.
> 
> Agreed, we should find one solution for all use cases.

I wasn't aware that there was any suggestion of a separate savevm-specific 
path.  The whole point of a device path is to uniquely identify a device 
within a machine. There may be many different paths that identify the same 
device.  When given a device and asked to generate a path, the result should be 
the canonical address.  IMO this should be the least volatile, and avoid 
redundant information.

Paul


Re: [Qemu-devel] [RFC PATCH 1/5] qdev: Create qdev_get_dev_path()

2010-06-15 Thread Paul Brook
> >> From user POV, driver names are very handy to address a device
> >> intuitively - except for the case you have tons of devices on the same
> >> bus that are handled by the same driver. For that case we need to
> >> augment the device name with a useful per-bus ID, derived from the bus
> >> address where available, otherwise based on instance numbers.
> > 
> > This is where I think you're missing a trick. We don't need to augment
> > the name, we just need to allow the bus id to be used instead.
> 
> I prefer having one name per device, both unique AND human-friendly.
> Adding yet another alias will solve only the first requirement. E.g.,
> which one should I present to the monitor user when listing a bus for
> auto-completion or path error reporting?

Autocompletion can report all of them. Errors should report either what the 
user specified (if the problem is with the address), or the canonical address 
(if the problem is unrelated to the address).

Remember that we also have global aliases (paths that do not begin with "/").

> >>> An easy way to get that is to reserve part of the name space for bus
> >>> addresses.  If the path component starts with a letter, it's an ID or
> >>> driver name.  If it starts with say '@', it's a bus address in
> >>> bus-specific syntax.  The bus provides a method to look it up.
> >> 
> >> I would prefer <driver-name>[@<bus-address>|.<instance-no>]. The former is
> >> set for buses that implement some to-be-defined device addressing
> >> service, the latter is the default on buses where that service is not
> >> available.
> > 
> > If we have bus-address then I see no good reason to also add instance-no.
> > For busses that have no natural address, we can define the address to be an
> > instance number.
> 
> Again readability: isa-serial.0 & isa-serial.1 is more intuitive than
> isa-serial.6 & isa-serial.7 just because there happen to be 6 other ISA
> devices registered before them.

I don't think either of these is intuitive. There's a good chance that
isa-serial.0 will not correspond to the first serial port in the guest.
Much better would be allowing use of IO port or MMIO addresses to identify ISA 
devices.  Some modification to the ISABus code may be required to implement 
this.
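
Roughly along these lines (untested sketch; it assumes ISADevice grows a
field recording the first IO base it registers, which it does not have
today, and that ISABus implements the same get_dev_path hook this RFC adds
for PCI):

static char *isabus_get_dev_path(DeviceState *dev)
{
    ISADevice *d = DO_UPCAST(ISADevice, qdev, dev);
    char path[16];

    /* Name the device by the first IO port it claimed, e.g. "@03f8" for
       the first UART, instead of by driver name or creation order.
       'first_iobase' is the hypothetical new field mentioned above. */
    snprintf(path, sizeof(path), "@%04x", d->first_iobase);
    return strdup(path);
}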

Paul


Re: [Qemu-devel] [RFC PATCH 1/5] qdev: Create qdev_get_dev_path()

2010-06-15 Thread Paul Brook
> Paul Brook wrote:
> >>>> From user POV, driver names are very handy to address a device
> >>>> intuitively - except for the case you have tons of devices on the
> >>>> same bus that are handled by the same driver. For that case we need
> >>>> to augment the device name with a useful per-bus ID, derived from the
> >>>> bus address where available, otherwise based on instance numbers.
> >>> 
> >>> This is where I think you're missing a trick. We don't need to augment
> >>> the name, we just need to allow the bus id to be used instead.
> >> 
> >> I prefer having one name per device, both unique AND human-friendly.
> >> Adding yet another alias will solve only the first requirement. E.g.,
> >> which one should I present to the monitor user when listing a bus for
> >> auto-completion or path error reporting?
> > 
> > Autocompletion can report all of them.
> 
> Autocompletion can only provide what is later on parseable.

Of course.

> It doesn't
> help to see "e1000" and "03.0" as device addresses because you do not
> know their relation that way. Only combining the information into a
> single name does.

I don't get your argument here. Why shouldn't e1000 and @03.0 refer to the 
same device? Querying the device itself will tell you both, so it's not hard 
to figure out that they refer to the same thing. Either piece of information 
is sufficient, so why do we require both?

Note that autocompletion and enumeration for mechanical traversal are 
different problems. The former should include useful aliases for humans (i.e. 
both e1000 and @03.0). The latter should be limited to the unique values 
corresponding to canonical addresses (i.e. @03.0).

> >> Again readability: isa-serial.0 & isa-serial.1 is more intuitive than
> >> isa-serial.6 & isa-serial.7 just because there happen to be 6 other ISA
> >> devices registered before them.
> > 
> > I don't think either of these is intuitive. There's a good chance that
> > isa-serial.0 will not correspond to the first serial port in the guest.
> 
> Only if you start tweaking the base addresses. Then it will still
> correspond to the addition order AND the user should be aware of this
> special setup.

I'm pretty sure there are some machines that have both internal UARTs 
(considered to be the primary ports), and secondary ports on an ISA bus.

If you really want instance numbers, then they may make most sense appended to 
the driver name. However I think this should be independent of bus addressing, 
and bus addresses make most sense as the canonical address.
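
With made-up paths: a second e1000 behind the root bus would canonically be
pci.0/@05.0, with pci.0/e1000.1 accepted as a human-friendly alias; only
the former should end up in savevm sections or query output.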

> > Much better would be allowing use of IO port or MMIO addresses to
> > identify ISA devices.  Some modification to the ISABus code may be
> > required to implement this.
> 
> Works for serial, but fails for ISA devices not occupying an address.

An ISA device without any IO/MMIO capabilities seems extremely unlikely. What 
exactly would such a device do?

Paul


Re: [Qemu-devel] [RFC PATCH 1/5] qdev: Create qdev_get_dev_path()

2010-06-15 Thread Paul Brook
> >> Works for serial, but fails for ISA devices not occupying an address.
> > 
> > An ISA device without any IO/MMIO capabilities seems extremely unlikely.
> > What exactly would such a device do?
> 
> Inject interrupts via that bus (while exposing registers in some other
> way). The m48t59 seems to fall in this category (unless I'm missing
> something ATM).

m48t59_isa responds to IO ports.  You're probably confused because it has not 
been properly converted to qdev.

Paul

