[Qemu-devel] Re: [PATCH 2/4] KVM: Rework VCPU state writeback API

2010-03-02 Thread Jan Kiszka
Marcelo Tosatti wrote:
> On Mon, Mar 01, 2010 at 07:10:30PM +0100, Jan Kiszka wrote:
>> This grand cleanup drops all reset and vmsave/load related
>> synchronization points in favor of four(!) generic hooks:
>>
>> - cpu_synchronize_all_states in qemu_savevm_state_complete
>>   (initial sync from kernel before vmsave)
>> - cpu_synchronize_all_post_init in qemu_loadvm_state
>>   (writeback after vmload)
>> - cpu_synchronize_all_post_init in main after machine init
>> - cpu_synchronize_all_post_reset in qemu_system_reset
>>   (writeback after system reset)
>>
>> These writeback points + the existing one of VCPU exec after
>> cpu_synchronize_state map on three levels of writeback:
>>
>> - KVM_PUT_RUNTIME_STATE (during runtime, other VCPUs continue to run)
>> - KVM_PUT_RESET_STATE   (on synchronous system reset, all VCPUs stopped)
>> - KVM_PUT_FULL_STATE(on init or vmload, all VCPUs stopped as well)
>>
>> This level is passed to the arch-specific VCPU state writing function
>> that will decide which concrete substates need to be written. That way,
>> no writer of load, save or reset functions that interact with in-kernel
>> KVM states will ever have to worry about synchronization again. That
>> also means that a lot of reasons for races, segfaults and deadlocks are
>> eliminated.
>>
>> cpu_synchronize_state remains untouched, just as Anthony suggested. We
>> continue to need it before reading or writing of VCPU states that are
>> also tracked by in-kernel KVM subsystems.
>>
>> Consequently, this patch removes many cpu_synchronize_state calls that
>> are now redundant, just like remaining explicit register syncs.
>>
>> Signed-off-by: Jan Kiszka 
> 
> Jan,
> 
> This patch breaks system reset of WinXP.32 install (more easily
> reproducible without iothread enabled).
> 
> Screenshot attached.
> 

Strange - no issues with qemu-kvm? Any special command line switch? /me
goes scrounging for some installation XP32 CD in the meantime...

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH RFC 12/48] error: New error_printf() and error_vprintf()

2010-03-02 Thread Markus Armbruster
Luiz Capitulino  writes:

> On Mon, 01 Mar 2010 09:54:32 +0100
> Markus Armbruster  wrote:
>
>> Luiz Capitulino  writes:
>> 
>> > On Wed, 24 Feb 2010 18:55:24 +0100
>> > Markus Armbruster  wrote:
>> >
>> >> FIXME They should return int, so callers can calculate width.
>> >> 
>> >> Signed-off-by: Markus Armbruster 
>> >> ---
>> >>  qemu-error.c |   49 ++---
>> >>  qemu-error.h |   14 ++
>> >>  2 files changed, 56 insertions(+), 7 deletions(-)
>> >> 
>> >> diff --git a/qemu-error.c b/qemu-error.c
>> >> index 63bcdcf..76c660a 100644
>> >> --- a/qemu-error.c
>> >> +++ b/qemu-error.c
>> >> @@ -1,18 +1,53 @@
[...]
>> >> +/*
>> >> + * Print to current monitor if we have one, else to stderr.
>> >> + * FIXME should return int, so callers can calculate width, but that
>> >> + * requires surgery to monitor_printf().  Left for another day.
>> >> + */
>> >> +void error_vprintf(const char *fmt, va_list ap)
>> >>  {
>> >> -va_list args;
>> >> -
>> >> -va_start(args, fmt);
>> >>  if (cur_mon) {
>> >> -monitor_vprintf(cur_mon, fmt, args);
>> >> +monitor_vprintf(cur_mon, fmt, ap);
>> >>  } else {
>> >> -vfprintf(stderr, fmt, args);
>> >> +vfprintf(stderr, fmt, ap);
>> >>  }
>> >> -va_end(args);
>> >> +}
>> >
>> >  This can be static.
>> 
>> Yes.  But why would that be useful?  It's neither a name space pollution
>> nor does it poke a hole into an abstraction.
>
>  Well, IMHO unused public symbols serve only one purpose: to pollute the
> global namespace :)
>
>  So, I think the question is: if it doesn't have any user and if you
> don't expect it to be used anytime soon: why make it public?

It's a basic building block.  Uses will come.

>> >> +
>> >> +/*
>> >> + * Print to current monitor if we have one, else to stderr.
>> >> + * FIXME just like error_vprintf()
>> >> + */
>> >> +void error_printf(const char *fmt, ...)
>> >> +{
>> >> +va_list ap;
>> >> +
>> >> +va_start(ap, fmt);
>> >> +error_vprintf(fmt, ap);
>> >> +va_end(ap);
>> >> +}
>> >
>> >  This function's name is inconsistent with qemu_error() and
>> > qemu_error_new().
>> 
>> I'm fond of prepending qemu_ to random symbols left and right.  Yes, I
>> know I'm reading QEMU source code, thank you :)
>> 
>> If the names here are really important: What about stripping qemu_ from
>> qemu_error() & friends?
>
>  I'm ok with that (and Paolo gave some suggestions), but I hope you
> submit a patch soon. It's ok to criticize/improve bad consistency policies,
> but it's not ok to break them.

Paolo's error_raise() works for me, although I like error_report() a bit
better.

But we need two names, one for simple, direct error reporting (now
qemu_error()), and one for QMP-compatible error reporting (now
qemu_error_new()).

Call them error_report() and qerror_report()?  Or is that too similar?




Re: [Qemu-devel] [PATCH RFC 35/48] monitor: New in_qmp_mon()

2010-03-02 Thread Markus Armbruster
Luiz Capitulino  writes:

> On Mon, 01 Mar 2010 10:19:49 +0100
> Markus Armbruster  wrote:
>
>> Luiz Capitulino  writes:
>> 
>> > On Wed, 24 Feb 2010 18:55:47 +0100
>> > Markus Armbruster  wrote:
>> >
>> >> 
>> >> Signed-off-by: Markus Armbruster 
>> >> ---
>> >>  monitor.c |5 +
>> >>  monitor.h |2 ++
>> >>  2 files changed, 7 insertions(+), 0 deletions(-)
>> >> 
>> >> diff --git a/monitor.c b/monitor.c
>> >> index a4263af..5c87a98 100644
>> >> --- a/monitor.c
>> >> +++ b/monitor.c
>> >> @@ -194,6 +194,11 @@ static inline int monitor_ctrl_mode(const Monitor 
>> >> *mon)
>> >>  return (mon->flags & MONITOR_USE_CONTROL);
>> >>  }
>> >>  
>> >> +int in_qmp_mon(void)
>> >> +{
>> >> +return cur_mon && monitor_ctrl_mode(cur_mon);
>> >> +}
>> >> +
>> >
>> >  Afaik, all public monitor functions begin with 'monitor_'. While it's
>> > debatable if it's a good name, let's keep the consistency.
>> >
>> >  Also, I'm going to rename monitor_ctrl_mode() to something like
>> > monitor_qmp_mode() or monitor_is_qmp(). In this case the difference with
>> > in_qmp_mon() is not clear.
>> 
>> Care to suggest a name?
>
>  Maybe, monitor_ctrl_mode() should be monitor_is_qmp() and the new one
> monitor_cur_is_qmp()?

Works for me.

>  Or should we start exporting monitor names as 'mon_' to make them short?

Fine with me, but not in my patch series, it's plenty long already :)




[Qemu-devel] Re: [PATCHv2 10/12] tap: add vhost/vhostfd options

2010-03-02 Thread Michael S. Tsirkin
On Mon, Mar 01, 2010 at 03:54:00PM -0600, Anthony Liguori wrote:
> On 03/01/2010 01:27 PM, Michael S. Tsirkin wrote:
>> On Sun, Feb 28, 2010 at 10:39:21PM +, Paul Brook wrote:
>>
 I'm sympathetic to your arguments though.  As qemu is today, the above
 is definitely the right thing to do.  But ram is always ram and ram
 always has a fixed (albeit non-linear) mapping within a guest.

>>> I think this assumption is unsafe. There are machines where RAM mappings can
>>> change. It's not uncommon for a chip select (i.e. physical memory address
>>> region) to be switchable to several different sources, one of which may be
>>> RAM.  I'm pretty sure this functionality is present (but not actually
>>> implemented) on some of the current qemu targets.
>>>
>>> I agree that changing RAM mappings under an active DMA is a fairly suspect
>>> thing to do. However I think we need to avoid cache mappings between 
>>> separate
>>> DMA transactions i.e. when the guest can know that no DMA will occur, and
>>> safely remap things.
>>>
>>> I'm also of the opinion that virtio devices should behave the same as any
>>> other device. i.e. if you put a virtio-net-pci device on a PCI bus behind an
>>> IOMMU, then it should see the same address space as any other PCI device in
>>> that location.
>>>  
>> It already doesn't. virtio passes physical memory addresses
>> to device instead of DMA addresses.
>>
>
> That's technically a bug.
>
>>> Apart from anything else, failure to do this breaks nested
>>> virtualization.
>>>  
>> Assigning PV device in nested virtualization? It could work, but not
>> sure what the point would be.
>>
>
> It misses the point really.
>
> vhost-net is not a device model and it shouldn't have to care about  
> things like PCI IOMMU.  If we did ever implement a PCI IOMMU, then we  
> would perform ring translation (or not use vhost-net).
>
> Regards,
>
> Anthony Liguori

Right.

>>>   While qemu doesn't currently implement an IOMMU, the DMA
>>> interfaces have been designed to allow it.
>>>
>>>  
 void cpu_ram_add(target_phys_addr_t start, ram_addr_t size);

>>> We need to support aliased memory regions. For example the ARM RealView 
>>> boards
>>> expose the first 256M RAM at both address 0x0 and 0x7000. It's also 
>>> common
>>> for systems to create aliases by ignoring certain address bits. e.g. each 
>>> sim
>>> slot is allocated a fixed 256M region. Populating that slot with a 128M 
>>> stick
>>> will cause the contents to be aliased in both the top and bottom halves of
>>> that region.
>>>
>>> Paul
>>>  




Re: [Qemu-devel] [PATCH] VirtIO: Fix QEMU crash during Windows PNP tests

2010-03-02 Thread Michael S. Tsirkin
On Mon, Mar 01, 2010 at 01:26:37PM +0200, Gleb Natapov wrote:
> On Mon, Mar 01, 2010 at 12:14:43PM +0100, Alexander Graf wrote:
> > 
> > On 14.09.2009, at 15:31, Yan Vugenfirer wrote:
> > 
> > > Signed-off-by: Yan Vugenfirer 
> > > 
> > > ---
> > > hw/virtio-pci.c |   14 --
> > > 1 files changed, 12 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> > > index f812ab7..a0a22c4 100644
> > > --- a/hw/virtio-pci.c
> > > +++ b/hw/virtio-pci.c
> > > @@ -364,8 +364,17 @@ static void virtio_map(PCIDevice *pci_dev, int 
> > > region_num,
> > > static void virtio_write_config(PCIDevice *pci_dev, uint32_t address,
> > > uint32_t val, int len)
> > > {
> > > +VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
> > > +
> > > +if (PCI_COMMAND == address) {
> > > +if (!(val & PCI_COMMAND_MASTER)) {
> > > +proxy->vdev->status &= ~VIRTIO_CONFIG_S_DRIVER_OK;
> > 
> > This part breaks PCI hotplug with Linux guests.
> > 
> > What happens is the following:
> Looks like something is broken even without this patch.
> 
> > 
> > (qemu) pci_add auto nic model=virtio,vlan=0
> >   - Virtio dev 1 -> write config (0)
> >   - Virtio dev 1 -> write config (0x3)
> >   - Virtio dev 1 -> set status explicitly to OK
> Why Linux doesn't enable bus mastering on this device?

I fixed this, and Rusty said he has applied the patch,
but it seems the patch got lost later:

http://lkml.org/lkml/2009/11/29/73

Alexander, could you please check whether applying
this patch fixes hotplug for you?
If yes I will queue it up for 2.6.34.


> > 
> > (qemu) pci_add auto storage file=/tmp/image.raw,if=virtio
> >   - Virtio dev 1 -> write config (0x3)
> Why Linux touches unrelated device's config space?
> 
> >   -> Unset DRIVER_OK bit
> >   -> network device becomes unresponsive 
> >   - Virtio dev 2 -> write config (0)
> >   - Virtio dev 2 -> write config (0x3)
> >   - Virtio dev 2 -> set status explicitly to OK
> > 
> > 
> > Please either revert this patch or provide a proper fix.
> > 
> > Alex
> 
> --
>   Gleb.




Re: [Qemu-devel] Re: EHCI support in QEMU

2010-03-02 Thread Kevin Wolf
Am 01.03.2010 22:17, schrieb Jan Kiszka:
> Niels de Vos wrote:
>> If someone is interested in this partially ported patch, I'm happy to
>> share, but it will at least need some attention to make it compile.
>> After that, lots of tests need to be done and probably quite some
>> bugfixes are required. I'm happy to assist, but do not have a lot of
>> time to spare on this hobby project. On the occasion that it is something
>> more solid and starting to do something, I will of course inform this list
>> again.
> 
> OK, to keep this heavy ball rolling, I would suggest posting your patch.
> Either it's already in a good shape to get it merged as experimental
> feature. Or I will pick it up in git tree, collect patches as they fly
> in, and will keep on pushing it upstream. I can't promise spending much
> time on hacking, but integration work, basic testing, and some more or
> less helpful comments should be feasible.

Yes, please post it. I won't promise anything either, but maybe I can
find some time to help a bit. Anyway, I'd love to see EHCI in qemu.

Kevin




RE: [Qemu-devel] Errors while building qemu

2010-03-02 Thread Taimoor Mirza

 

I've not passed any switch to enable or disable libcurl.

 

I just downloaded QEMU 0.12.2, ran "./configure" and then "make" which gives me 
errors

 

Also I've few questions

 

1) How can I enable or disable something?

2) How can I change config-host.h. Actually I want to configure gadget fs and 
for that I must have CONFIG_GADGETFS in config-host.h

 

Regards,

Taimoor
 
> Date: Fri, 19 Feb 2010 10:22:56 -0200
> From: lcapitul...@redhat.com
> To: roy...@gmail.com
> Subject: Re: [Qemu-devel] Errors while building qemu
> CC: mooni_mi...@hotmail.com; qemu-devel@nongnu.org
> 
> On Thu, 18 Feb 2010 23:53:47 +0800
> Roy Tam  wrote:
> 
> > 2010/2/18 Taimoor Mirza :
> > > Hi all,
> > >
> > > I've been getting following errors while building qemu 0.11.1 and 0.12.2 
> > > on
> > > my Redhat machine.
> > >
> > > block/curl.c:80: error: syntax error before "curl_socket_t"
> > > block/curl.c:82: warning: function declaration isn't a prototype
> > > block/curl.c: In function `curl_sock_cb':
> > > block/curl.c:84: error: `action' undeclared (first use in this function)
> > > block/curl.c:84: error: (Each undeclared identifier is reported only once
> > > block/curl.c:84: error: for each function it appears in.)
> > > block/curl.c:85: error: `CURL_POLL_IN' undeclared (first use in this
> > > function)
> > > block/curl.c:86: error: `fd' undeclared (first use in this function)
> > > block/curl.c:86: error: `s' undeclared (first use in this function)
> > > block/curl.c:88: error: `CURL_POLL_OUT' undeclared (first use in this
> > > function)
> > > block/curl.c:91: error: `CURL_POLL_INOUT' undeclared (first use in this
> > > function )
> > > block/curl.c:95: error: `CURL_POLL_REMOVE' undeclared (first use in this
> > > functio n)
> > > block/curl.c: In function `curl_multi_do':
> > > block/curl.c:212: warning: implicit declaration of function
> > > `curl_multi_socket_a ll'
> > > block/curl.c: In function `curl_open':
> > > block/curl.c:381: warning: implicit declaration of function
> > > `curl_multi_setopt'
> > > block/curl.c:381: error: `CURLMOPT_SOCKETDATA' undeclared (first use in 
> > > this
> > > fun ction)
> > > block/curl.c:382: error: `CURLMOPT_SOCKETFUNCTION' undeclared (first use 
> > > in
> > > this function)
> > > make: *** [block/curl.o] Error 1
> > >
> > > Can anyone tell me the reason?
> > >
> > 
> > Do you have libcurl devel library?
> > Try rebuild with "--disable-curl" switch.
> 
> Taimoor, please, let us know if you don't have libcurl installed,
> because in this case it's a bug (assuming that you didn't pass
> --enable-curl).
> 
> 
> 
  
_
Hotmail: Trusted email with Microsoft’s powerful SPAM protection.
https://signup.live.com/signup.aspx?id=60969

Re: [Qemu-devel] [PATCH] VirtIO: Fix QEMU crash during Windows PNP tests

2010-03-02 Thread Alexander Graf

On 02.03.2010, at 11:09, Michael S. Tsirkin wrote:

> On Mon, Mar 01, 2010 at 01:26:37PM +0200, Gleb Natapov wrote:
>> On Mon, Mar 01, 2010 at 12:14:43PM +0100, Alexander Graf wrote:
>>> 
>>> On 14.09.2009, at 15:31, Yan Vugenfirer wrote:
>>> 
 Signed-off-by: Yan Vugenfirer 
 
 ---
 hw/virtio-pci.c |   14 --
 1 files changed, 12 insertions(+), 2 deletions(-)
 
 diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
 index f812ab7..a0a22c4 100644
 --- a/hw/virtio-pci.c
 +++ b/hw/virtio-pci.c
 @@ -364,8 +364,17 @@ static void virtio_map(PCIDevice *pci_dev, int 
 region_num,
 static void virtio_write_config(PCIDevice *pci_dev, uint32_t address,
uint32_t val, int len)
 {
 +VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
 +
 +if (PCI_COMMAND == address) {
 +if (!(val & PCI_COMMAND_MASTER)) {
 +proxy->vdev->status &= ~VIRTIO_CONFIG_S_DRIVER_OK;
>>> 
>>> This part breaks PCI hotplug with Linux guests.
>>> 
>>> What happens is the following:
>> Looks like something is broken even without this patch.
>> 
>>> 
>>> (qemu) pci_add auto nic model=virtio,vlan=0
>>>  - Virtio dev 1 -> write config (0)
>>>  - Virtio dev 1 -> write config (0x3)
>>>  - Virtio dev 1 -> set status explicitly to OK
>> Why Linux doesn't enable bus mastering on this device?
> 
> I fixed this, and Rusty said he has applied the patch,
> but it seems the patch got lost later:
> 
> http://lkml.org/lkml/2009/11/29/73
> 
> Alexander, could you please check whether applying
> this patch fixes hotplug for you?
> If yes I will queue it up for 2.6.34.

So we're looking at a guest bug?

It's still a qemu regression then. This configuration used to work with 0.10. 
There should at least be a feature bit from the guest saying "I'm good with bus 
mastering", so we can enable the check only for those.


Alex



Re: [Qemu-devel] [PATCH] VirtIO: Fix QEMU crash during Windows PNP tests

2010-03-02 Thread Michael S. Tsirkin
On Tue, Mar 02, 2010 at 12:03:15PM +0100, Alexander Graf wrote:
> 
> On 02.03.2010, at 11:09, Michael S. Tsirkin wrote:
> 
> > On Mon, Mar 01, 2010 at 01:26:37PM +0200, Gleb Natapov wrote:
> >> On Mon, Mar 01, 2010 at 12:14:43PM +0100, Alexander Graf wrote:
> >>> 
> >>> On 14.09.2009, at 15:31, Yan Vugenfirer wrote:
> >>> 
>  Signed-off-by: Yan Vugenfirer 
>  
>  ---
>  hw/virtio-pci.c |   14 --
>  1 files changed, 12 insertions(+), 2 deletions(-)
>  
>  diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
>  index f812ab7..a0a22c4 100644
>  --- a/hw/virtio-pci.c
>  +++ b/hw/virtio-pci.c
>  @@ -364,8 +364,17 @@ static void virtio_map(PCIDevice *pci_dev, int 
>  region_num,
>  static void virtio_write_config(PCIDevice *pci_dev, uint32_t address,
> uint32_t val, int len)
>  {
>  +VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
>  +
>  +if (PCI_COMMAND == address) {
>  +if (!(val & PCI_COMMAND_MASTER)) {
>  +proxy->vdev->status &= ~VIRTIO_CONFIG_S_DRIVER_OK;
> >>> 
> >>> This part breaks PCI hotplug with Linux guests.
> >>> 
> >>> What happens is the following:
> >> Looks like something is broken even without this patch.
> >> 
> >>> 
> >>> (qemu) pci_add auto nic model=virtio,vlan=0
> >>>  - Virtio dev 1 -> write config (0)
> >>>  - Virtio dev 1 -> write config (0x3)
> >>>  - Virtio dev 1 -> set status explicitly to OK
> >> Why Linux doesn't enable bus mastering on this device?
> > 
> > I fixed this, and Rusty said he has applied the patch,
> > but it seems the patch got lost later:
> > 
> > http://lkml.org/lkml/2009/11/29/73
> > 
> > Alexander, could you please check whether applying
> > this patch fixes hotplug for you?
> > If yes I will queue it up for 2.6.34.
> 
> So we're looking at a guest bug?

Donnu. Does it work with the patch applied?

> It's still a qemu regression then. This configuration used to work
> with 0.10. There should at least be a feature bit from the guest
> saying "I'm good with bus mastering", so we can enable the check only
> for those.
> Alex




[Qemu-devel] [PATCH] qemu-img rebase: Add -f option

2010-03-02 Thread Kevin Wolf
Allow the user to specify the format of the image to rebase.

Signed-off-by: Kevin Wolf 
---
 qemu-img.c |   20 
 1 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 258dc62..7fc980a 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1081,24 +1081,28 @@ static int img_snapshot(int argc, char **argv)
 static int img_rebase(int argc, char **argv)
 {
 BlockDriverState *bs, *bs_old_backing, *bs_new_backing;
-BlockDriver *old_backing_drv, *new_backing_drv;
+BlockDriver *drv, *old_backing_drv, *new_backing_drv;
 char *filename;
-const char *out_basefmt, *out_baseimg;
+const char *fmt, *out_basefmt, *out_baseimg;
 int c, flags, ret;
 int unsafe = 0;
 
 /* Parse commandline parameters */
+fmt = NULL;
 out_baseimg = NULL;
 out_basefmt = NULL;
 
 for(;;) {
-c = getopt(argc, argv, "uhF:b:");
+c = getopt(argc, argv, "uhf:F:b:");
 if (c == -1)
 break;
 switch(c) {
 case 'h':
 help();
 return 0;
+case 'f':
+fmt = optarg;
+break;
 case 'F':
 out_basefmt = optarg;
 break;
@@ -1125,8 +1129,16 @@ static int img_rebase(int argc, char **argv)
 if (!bs)
 error("Not enough memory");
 
+drv = NULL;
+if (fmt) {
+drv = bdrv_find_format(fmt);
+if (drv == NULL) {
+error("Invalid format name: '%s'", fmt);
+}
+}
+
 flags = BRDV_O_FLAGS | BDRV_O_RDWR | (unsafe ? BDRV_O_NO_BACKING : 0);
-if (bdrv_open2(bs, filename, flags, NULL) < 0) {
+if (bdrv_open2(bs, filename, flags, drv) < 0) {
 error("Could not open '%s'", filename);
 }
 
-- 
1.6.6.1





Re: [Qemu-devel] [PATCH] VirtIO: Fix QEMU crash during Windows PNP tests

2010-03-02 Thread Alexander Graf

On 02.03.2010, at 12:05, Michael S. Tsirkin wrote:

> On Tue, Mar 02, 2010 at 12:03:15PM +0100, Alexander Graf wrote:
>> 
>> On 02.03.2010, at 11:09, Michael S. Tsirkin wrote:
>> 
>>> On Mon, Mar 01, 2010 at 01:26:37PM +0200, Gleb Natapov wrote:
 On Mon, Mar 01, 2010 at 12:14:43PM +0100, Alexander Graf wrote:
> 
> On 14.09.2009, at 15:31, Yan Vugenfirer wrote:
> 
>> Signed-off-by: Yan Vugenfirer 
>> 
>> ---
>> hw/virtio-pci.c |   14 --
>> 1 files changed, 12 insertions(+), 2 deletions(-)
>> 
>> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
>> index f812ab7..a0a22c4 100644
>> --- a/hw/virtio-pci.c
>> +++ b/hw/virtio-pci.c
>> @@ -364,8 +364,17 @@ static void virtio_map(PCIDevice *pci_dev, int 
>> region_num,
>> static void virtio_write_config(PCIDevice *pci_dev, uint32_t address,
>>   uint32_t val, int len)
>> {
>> +VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
>> +
>> +if (PCI_COMMAND == address) {
>> +if (!(val & PCI_COMMAND_MASTER)) {
>> +proxy->vdev->status &= ~VIRTIO_CONFIG_S_DRIVER_OK;
> 
> This part breaks PCI hotplug with Linux guests.
> 
> What happens is the following:
 Looks like something is broken even without this patch.
 
> 
> (qemu) pci_add auto nic model=virtio,vlan=0
> - Virtio dev 1 -> write config (0)
> - Virtio dev 1 -> write config (0x3)
> - Virtio dev 1 -> set status explicitly to OK
 Why Linux doesn't enable bus mastering on this device?
>>> 
>>> I fixed this, and Rusty said he has applied the patch,
>>> but it seems the patch got lost later:
>>> 
>>> http://lkml.org/lkml/2009/11/29/73
>>> 
>>> Alexander, could you please check whether applying
>>> this patch fixes hotplug for you?
>>> If yes I will queue it up for 2.6.34.
>> 
>> So we're looking at a guest bug?
> 
> Donnu. Does it work with the patch applied?

Yes, it works with the patch applied. It's still missing a capability though 
:-).


Alex



[Qemu-devel] Re: [PATCH 2/4] KVM: Rework VCPU state writeback API

2010-03-02 Thread Marcelo Tosatti
On Tue, Mar 02, 2010 at 09:00:04AM +0100, Jan Kiszka wrote:
> Marcelo Tosatti wrote:
> > On Mon, Mar 01, 2010 at 07:10:30PM +0100, Jan Kiszka wrote:
> >> This grand cleanup drops all reset and vmsave/load related
> >> synchronization points in favor of four(!) generic hooks:
> >>
> >> - cpu_synchronize_all_states in qemu_savevm_state_complete
> >>   (initial sync from kernel before vmsave)
> >> - cpu_synchronize_all_post_init in qemu_loadvm_state
> >>   (writeback after vmload)
> >> - cpu_synchronize_all_post_init in main after machine init
> >> - cpu_synchronize_all_post_reset in qemu_system_reset
> >>   (writeback after system reset)
> >>
> >> These writeback points + the existing one of VCPU exec after
> >> cpu_synchronize_state map on three levels of writeback:
> >>
> >> - KVM_PUT_RUNTIME_STATE (during runtime, other VCPUs continue to run)
> >> - KVM_PUT_RESET_STATE   (on synchronous system reset, all VCPUs stopped)
> >> - KVM_PUT_FULL_STATE(on init or vmload, all VCPUs stopped as well)
> >>
> >> This level is passed to the arch-specific VCPU state writing function
> >> that will decide which concrete substates need to be written. That way,
> >> no writer of load, save or reset functions that interact with in-kernel
> >> KVM states will ever have to worry about synchronization again. That
> >> also means that a lot of reasons for races, segfaults and deadlocks are
> >> eliminated.
> >>
> >> cpu_synchronize_state remains untouched, just as Anthony suggested. We
> >> continue to need it before reading or writing of VCPU states that are
> >> also tracked by in-kernel KVM subsystems.
> >>
> >> Consequently, this patch removes many cpu_synchronize_state calls that
> >> are now redundant, just like remaining explicit register syncs.
> >>
> >> Signed-off-by: Jan Kiszka 
> > 
> > Jan,
> > 
> > This patch breaks system reset of WinXP.32 install (more easily
> > reproducible without iothread enabled).
> > 
> > Screenshot attached.
> > 
> 
> Strange - no issues with qemu-kvm? Any special command line switch? /me
> goes scrounging for some installation XP32 CD in the meantime...

No issues with qemu-kvm. Could not spot anything obvious.

> 
> Jan
> 






[Qemu-devel] Re: [PATCH RFC 12/48] error: New error_printf() and error_vprintf()

2010-03-02 Thread Paolo Bonzini

On 03/02/2010 09:33 AM, Markus Armbruster wrote:

Paolo's error_raise() works for me, although I like error_report() a bit
better.


error_report is also fine, of course.


But we need two names, one for simple, direct error reporting (now
qemu_error()), and one for QMP-compatible error reporting (now
qemu_error_new()).

Call them error_report() and qerror_report()?  Or is that too similar?


Actually I like it.

Paolo




[Qemu-devel] [PATCH] qemu: printf related build fixes

2010-03-02 Thread Michael S. Tsirkin
Some versions of glibc (e.g. globc 2.5 that ships with rhel5.4)
make printf a macro. This makes using preprocessor within calls
to printf illegal. Move preprocessor use outside printf calls.

Signed-off-by: Michael S. Tsirkin 
---
 qemu-img.c |5 +++--
 readline.c |1 +
 vl.c   |   16 +++-
 3 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index cbba4fc..435a9c1 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -58,7 +58,8 @@ static void format_print(void *opaque, const char *name)
 /* Please keep in synch with qemu-img.texi */
 static void help(void)
 {
-printf("qemu-img version " QEMU_VERSION ", Copyright (c) 2004-2008 Fabrice 
Bellard\n"
+const char h[] = 
+  ("qemu-img version " QEMU_VERSION ", Copyright (c) 2004-2008 Fabrice 
Bellard\n"
"usage: qemu-img command [command options]\n"
"QEMU disk image utility\n"
"\n"
@@ -93,7 +94,7 @@ static void help(void)
"  '-d' deletes a snapshot\n"
"  '-l' lists all snapshots in the given image\n"
);
-printf("\nSupported formats:");
+printf("%s\nSupported formats:", h);
 bdrv_iterate_format(format_print, NULL);
 printf("\n");
 exit(1);
diff --git a/readline.c b/readline.c
index 7834af0..92f9cd1 100644
--- a/readline.c
+++ b/readline.c
@@ -28,6 +28,7 @@
 #define IS_ESC  1
 #define IS_CSI  2
 
+#undef printf
 #define printf do_not_use_printf
 
 void readline_show_prompt(ReadLineState *rs)
diff --git a/vl.c b/vl.c
index 4ef6a78..7804001 100644
--- a/vl.c
+++ b/vl.c
@@ -4080,7 +4080,8 @@ static void version(void)
 static void help(int exitcode)
 {
 version();
-printf("usage: %s [options] [disk_image]\n"
+const char h[] =
+  ("usage: %s [options] [disk_image]\n"
"\n"
"'disk_image' is a raw hard image image for IDE hard disk 0\n"
"\n"
@@ -4098,15 +4099,20 @@ static void help(int exitcode)
"ctrl-alttoggle mouse and keyboard grab\n"
"\n"
"When using -nographic, press 'ctrl-a h' to get some help.\n"
-   ,
-   "qemu",
-   DEFAULT_RAM_SIZE,
+   );
 #ifndef _WIN32
+printf(h, "qemu",
+   DEFAULT_RAM_SIZE,
DEFAULT_NETWORK_SCRIPT,
DEFAULT_NETWORK_DOWN_SCRIPT,
-#endif
DEFAULT_GDBSTUB_PORT,
"/tmp/qemu.log");
+#else
+printf(h, "qemu",
+   DEFAULT_RAM_SIZE,
+   DEFAULT_GDBSTUB_PORT,
+   "/tmp/qemu.log");
+#endif
 exit(exitcode);
 }
 
-- 
1.7.0.18.g0d53a5




[Qemu-devel] Re: [PATCHv2 10/12] tap: add vhost/vhostfd options

2010-03-02 Thread Anthony Liguori

On 02/28/2010 04:39 PM, Paul Brook wrote:

I'm sympathetic to your arguments though.  As qemu is today, the above
is definitely the right thing to do.  But ram is always ram and ram
always has a fixed (albeit non-linear) mapping within a guest.
 

I think this assumption is unsafe. There are machines where RAM mappings can
change. It's not uncommon for a chip select (i.e. physical memory address
region) to be switchable to several different sources, one of which may be
RAM.  I'm pretty sure this functionality is present (but not actually
implemented) on some of the current qemu targets.
   


But I presume this is more about switching a dim to point at a different 
region in memory.  It's a rare event similar to memory hot plug.


Either way, if there are platforms where we don't treat ram with the new 
ram api, that's okay.



I agree that changing RAM mappings under an active DMA is a fairly suspect
thing to do. However I think we need to avoid cache mappings between separate
DMA transactions i.e. when the guest can know that no DMA will occur, and
safely remap things.
   


One thing I like about having a new ram api is it gives us a stronger 
interface than what we have today.  Today, we don't have a strong 
guarantee that mappings won't be changed during a DMA transaction.


With a new api, cpu_physical_memory_map() changes semantics.  It only 
returns pointers for static ram mappings.  Everything else is bounced 
which guarantees that an address can't change during DMA.



I'm also of the opinion that virtio devices should behave the same as any
other device. i.e. if you put a virtio-net-pci device on a PCI bus behind an
IOMMU, then it should see the same address space as any other PCI device in
that location.  Apart from anything else, failure to do this breaks nested
virtualization.  While qemu doesn't currently implement an IOMMU, the DMA
interfaces have been designed to allow it.
   


Yes, I've been working on that.  virtio is a bit more complicated than a 
normal PCI device because it can be on top of two busses.  It needs an 
additional layer of abstraction to deal with this.



void cpu_ram_add(target_phys_addr_t start, ram_addr_t size);
 

We need to support aliased memory regions. For example the ARM RealView boards
expose the first 256M RAM at both address 0x0 and 0x7000. It's also common
for systems to create aliases by ignoring certain address bits. e.g. each sim
slot is allocated a fixed 256M region. Populating that slot with a 128M stick
will cause the contents to be aliased in both the top and bottom halves of
that region.
   


Okay, I'd prefer to add an explicit aliasing API.  That gives us more 
information to work with.


Regards,

Anthony Liguori


Paul
   






[Qemu-devel] Re: [PATCHv2 10/12] tap: add vhost/vhostfd options

2010-03-02 Thread Paul Brook
> > I think this assumption is unsafe. There are machines where RAM mappings
> > can change. It's not uncommon for a chip select (i.e. physical memory
> > address region) to be switchable to several different sources, one of
> > which may be RAM.  I'm pretty sure this functionality is present (but not
> > actually implemented) on some of the current qemu targets.
> 
> But I presume this is more about switching a dim to point at a different
> region in memory.  It's a rare event similar to memory hot plug.

Approximately that, yes. One use is to start with ROM at address zero, then 
switch to RAM once you've initialised the DRAM controller (using a small 
internal SRAM as a trampoline). 
 
> Either way, if there are platforms where we don't treat ram with the new
> ram api, that's okay.
> 
> > I agree that changing RAM mappings under an active DMA is a fairly
> > suspect thing to do. However I think we need to avoid cache mappings
> > between separate DMA transactions i.e. when the guest can know that no
> > DMA will occur, and safely remap things.
> 
> One thing I like about having a new ram api is it gives us a stronger
> interface than what we have today.  Today, we don't have a strong
> guarantee that mappings won't be changed during a DMA transaction.
> 
> With a new api, cpu_physical_memory_map() changes semantics.  It only
> returns pointers for static ram mappings.  Everything else is bounced
> which guarantees that an address can't change during DMA.

Doesn't this mean that only the initial RAM is directly DMA-able?

While memory hotplug(and unplug) may be an infrequent event, having the 
majority of ram be hotplug seems much more likely.  Giving each guest a small 
base allocation, then hotplug the rest as required/paid for seems an entirely 
reasonable setup.  I use VMs which are normally hosted on a multi-core machine 
with buckets of ram, but may be migrated to a single CPU host with limited ram 
in an emergency.

Paul




[Qemu-devel] Re: [PATCHv2 10/12] tap: add vhost/vhostfd options

2010-03-02 Thread Anthony Liguori

On 03/02/2010 08:33 AM, Paul Brook wrote:

I think this assumption is unsafe. There are machines where RAM mappings
can change. It's not uncommon for a chip select (i.e. physical memory
address region) to be switchable to several different sources, one of
which may be RAM.  I'm pretty sure this functionality is present (but not
actually implemented) on some of the current qemu targets.
   

But I presume this is more about switching a dim to point at a different
region in memory.  It's a rare event similar to memory hot plug.
 

Approximately that, yes. One use is to start with ROM at address zero, then
switch to RAM once you've initialised the DRAM controller (using a small
internal SRAM as a trampoline).

   

Either way, if there are platforms where we don't treat ram with the new
ram api, that's okay.

 

I agree that changing RAM mappings under an active DMA is a fairly
suspect thing to do. However I think we need to avoid cache mappings
between separate DMA transactions i.e. when the guest can know that no
DMA will occur, and safely remap things.
   

One thing I like about having a new ram api is it gives us a stronger
interface than what we have today.  Today, we don't have a strong
guarantee that mappings won't be changed during a DMA transaction.

With a new api, cpu_physical_memory_map() changes semantics.  It only
returns pointers for static ram mappings.  Everything else is bounced
which guarantees that an address can't change during DMA.
 

Doesn't this mean that only the initial RAM is directly DMA-able?

While memory hotplug(and unplug) may be an infrequent event, having the
majority of ram be hotplug seems much more likely.


Hotplug works fine for direct DMA'ing.  map/unmap would maintain a 
reference count on the registered RAM region and hot unplug would not be 
allowed until that reference dropped to zero.  For something like 
virtio, it means that the driver has to be unloaded in the guest before 
you hot unplug the region of memory if it happens to be using that 
region of memory for the ring storage.


The key difference is that these regions are created and destroyed 
rarely and in such a way that the destruction is visible to the guest.


If you compare that to IO memory, we currently flip IO memory's mappings 
dynamically without the guest really being aware (such as the VGA 
optimization).  An API like this wouldn't work for IO memory today 
without some serious thinking about how to model this sort of thing.


Regards,

Anthony Liguori





[Qemu-devel] Re: [PATCHv2 10/12] tap: add vhost/vhostfd options

2010-03-02 Thread Paul Brook
> >> With a new api, cpu_physical_memory_map() changes semantics.  It only
> >> returns pointers for static ram mappings.  Everything else is bounced
> >> which guarantees that an address can't change during DMA.
> >
> > Doesn't this mean that only the initial RAM is directly DMA-able?
> >
> > While memory hotplug(and unplug) may be an infrequent event, having the
> > majority of ram be hotplug seems much more likely.
> 
> Hotplug works fine for direct DMA'ing.  map/unmap would maintain a
> reference count on the registered RAM region and hot unplug would not be
> allowed until that reference dropped to zero.  For something like
> virtio, it means that the driver has to be unloaded in the guest before
> you hot unplug the region of memory if it happens to be using that
> region of memory for the ring storage.
> 
> The key difference is that these regions are created and destroyed
> rarely and in such a way that the destruction is visible to the guest.

So you're making ram unmap an asynchronous process, and requiring that the 
address space not be reused until that umap has completed?

Paul




Re: [Qemu-devel] can i get the QEMU source code for ARM host platform?

2010-03-02 Thread Paul Brook
> I want to port QEMU on the ARM11 platform. I think, many developers try to
> this work.

Should already work.

Paul




[Qemu-devel] Re: [PATCH] qemu: printf related build fixes

2010-03-02 Thread Michael S. Tsirkin
On Tue, Mar 02, 2010 at 03:25:40PM +0200, Michael S. Tsirkin wrote:
> Some versions of glibc (e.g. globc 2.5 that ships with rhel5.4)
> make printf a macro. This makes using preprocessor within calls
> to printf illegal. Move preprocessor use outside printf calls.
> 
> Signed-off-by: Michael S. Tsirkin 

Looks like this was solved differently meanwhile.
Pls disregard.

-- 
MST




[Qemu-devel] Re: [PATCHv2 10/12] tap: add vhost/vhostfd options

2010-03-02 Thread Anthony Liguori

On 03/02/2010 08:55 AM, Paul Brook wrote:

With a new api, cpu_physical_memory_map() changes semantics.  It only
returns pointers for static ram mappings.  Everything else is bounced
which guarantees that an address can't change during DMA.
 

Doesn't this mean that only the initial RAM is directly DMA-able?

While memory hotplug(and unplug) may be an infrequent event, having the
majority of ram be hotplug seems much more likely.
   

Hotplug works fine for direct DMA'ing.  map/unmap would maintain a
reference count on the registered RAM region and hot unplug would not be
allowed until that reference dropped to zero.  For something like
virtio, it means that the driver has to be unloaded in the guest before
you hot unplug the region of memory if it happens to be using that
region of memory for the ring storage.

The key difference is that these regions are created and destroyed
rarely and in such a way that the destruction is visible to the guest.
 

So you're making ram unmap an asynchronous process, and requiring that the
address space not be reused until that umap has completed?
   


It technically already would be.  If you've got a pending DMA 
transaction and you try to hot unplug badness will happen.  This is 
something that is certainly exploitable.


Regards,

Anthony Liguori


Paul
   






[Qemu-devel] Re: [PATCHv2 10/12] tap: add vhost/vhostfd options

2010-03-02 Thread Paul Brook
> >> The key difference is that these regions are created and destroyed
> >> rarely and in such a way that the destruction is visible to the guest.
> >
> > So you're making ram unmap an asynchronous process, and requiring that
> > the address space not be reused until that umap has completed?
> 
> It technically already would be.  If you've got a pending DMA
> transaction and you try to hot unplug badness will happen.  This is
> something that is certainly exploitable.

Hmm, I guess we probably want to make this work with all mappings then. DMA to 
a ram backed PCI BAR (e.g. video ram) is certainly feasible.
Technically it's not the unmap that causes badness, it's freeing the 
underlying ram.

For these reasons I'm tempted to push the refcounting down to the ram 
allocation level. This has a couple of nice properties.

Firstly we don't care about dynamic allocation any more. We just say that 
mapping changes may not effect active DMA transactions. If virtio chooses to 
define that the vring DMA transaction starts when the device is enabled and 
ends when disabled, that's fine by me.  This probably requires revisiting the 
memory barrier issues - barriers are pointless if you don't guarantee cache 
coherence (i.e. no bounce buffers).

Secondly, ram deallocation is not guest visible. The guest visible parts 
(memory unmapping) can happen immediately, and we avoid a whole set of 
unplug/replug race conditions. We may want to delay the completion of a 
monitor hotplug command until the actual deallocation occurs, but that's a 
largely separate issue.

Paul




[Qemu-devel] Re: [PATCHv2 10/12] tap: add vhost/vhostfd options

2010-03-02 Thread Michael S. Tsirkin
On Tue, Mar 02, 2010 at 03:53:30PM +, Paul Brook wrote:
> > >> The key difference is that these regions are created and destroyed
> > >> rarely and in such a way that the destruction is visible to the guest.
> > >
> > > So you're making ram unmap an asynchronous process, and requiring that
> > > the address space not be reused until that umap has completed?
> > 
> > It technically already would be.  If you've got a pending DMA
> > transaction and you try to hot unplug badness will happen.  This is
> > something that is certainly exploitable.
> 
> Hmm, I guess we probably want to make this work with all mappings then. DMA 
> to 
> a ram backed PCI BAR (e.g. video ram) is certainly feasible.

This used to be possible with PCI/PCI-X.  But as far as I know, with PCI
Express, devices can not access each other's BARs.

> Technically it's not the unmap that causes badness, it's freeing the 
> underlying ram.
> 
> For these reasons I'm tempted to push the refcounting down to the ram 
> allocation level. This has a couple of nice properties.
> 
> Firstly we don't care about dynamic allocation any more. We just say that 
> mapping changes may not effect active DMA transactions. If virtio chooses to 
> define that the vring DMA transaction starts when the device is enabled and 
> ends when disabled, that's fine by me.  This probably requires revisiting the 
> memory barrier issues - barriers are pointless if you don't guarantee cache 
> coherence (i.e. no bounce buffers).
> 
> Secondly, ram deallocation is not guest visible. The guest visible parts 
> (memory unmapping) can happen immediately, and we avoid a whole set of 
> unplug/replug race conditions. We may want to delay the completion of a 
> monitor hotplug command until the actual deallocation occurs, but that's a 
> largely separate issue.
> 
> Paul




[Qemu-devel] Re: [PATCHv2 10/12] tap: add vhost/vhostfd options

2010-03-02 Thread Anthony Liguori

On 03/02/2010 09:53 AM, Paul Brook wrote:

The key difference is that these regions are created and destroyed
rarely and in such a way that the destruction is visible to the guest.
 

So you're making ram unmap an asynchronous process, and requiring that
the address space not be reused until that umap has completed?
   

It technically already would be.  If you've got a pending DMA
transaction and you try to hot unplug badness will happen.  This is
something that is certainly exploitable.
 

Hmm, I guess we probably want to make this work with all mappings then. DMA to
a ram backed PCI BAR (e.g. video ram) is certainly feasible.
Technically it's not the unmap that causes badness, it's freeing the
underlying ram.
   


Let's avoid confusing terminology.  We have RAM mappings and then we 
have PCI BARs that are mapped as IO_MEM_RAM.


PCI BARs mapped as IO_MEM_RAM are allocated by the device and live for 
the duration of the device.  If you did something that changed the BAR's 
mapping from IO_MEM_RAM to an actual IO memory type, then you'd continue 
to DMA to the allocated device memory instead of doing MMIO operations.[1]


That's completely accurate and safe.  If you did this to bare metal, I 
expect you'd get very similar results.


This is different from DMA'ing to a RAM region and then removing the RAM 
region while the IO is in flight.  In this case, the mapping disappears 
and you potentially have the guest writing to an invalid host pointer.


[1] I don't think it's useful to support DMA'ing to arbitrary IO_MEM_RAM 
areas.  Instead, I think we should always bounce to this memory.  The 
benefit is that we avoid the complications resulting from PCI hot unplug 
and reference counting.



For these reasons I'm tempted to push the refcounting down to the ram
allocation level. This has a couple of nice properties.

Firstly we don't care about dynamic allocation any more. We just say that
mapping changes may not effect active DMA transactions.


Only if we think it's necessary to support native DMA to arbitrary 
IO_MEM_RAM.  I contend this is never a normal or performance sensitive 
case and it's not worth supporting.



  If virtio chooses to
define that the vring DMA transaction starts when the device is enabled and
ends when disabled, that's fine by me.  This probably requires revisiting the
memory barrier issues - barriers are pointless if you don't guarantee cache
coherence (i.e. no bounce buffers).

Secondly, ram deallocation is not guest visible. The guest visible parts
(memory unmapping) can happen immediately, and we avoid a whole set of
unplug/replug race conditions. We may want to delay the completion of a
monitor hotplug command until the actual deallocation occurs, but that's a
largely separate issue.
   


You can do the same thing and always bounce IO_MEM_RAM IO regions.  It's 
just a question of whether we think it's worth the effort to do native 
DMA to this type of memory.  I personally don't think it is at least in 
the beginning.


Regards,

Anthony Liguori


Paul
   






Re: [Qemu-devel] Re: [PATCHv2 10/12] tap: add vhost/vhostfd options

2010-03-02 Thread Marcelo Tosatti
On Sun, Feb 28, 2010 at 02:57:56PM -0600, Anthony Liguori wrote:
> On 02/28/2010 11:19 AM, Michael S. Tsirkin wrote:
> >>Both have  security implications so I think it's important that they
> >>be addressed.   Otherwise, I'm pretty happy with how things are.
> >Care suggesting some solutions?
> 
> The obvious thing to do would be to use the memory notifier in vhost
> to keep track of whenever something remaps the ring's memory region
> and if that happens, issue an ioctl to vhost to change the location
> of the ring.  Also, you would need to merge the vhost slot
> management code with the KVM slot management code.

There are no security implications as long as vhost uses the qemu
process mappings.

But your point is valid.

> I'm sympathetic to your arguments though.  As qemu is today, the
> above is definitely the right thing to do.  But ram is always ram
> and ram always has a fixed (albeit non-linear) mapping within a
> guest.  We can probably be smarter in qemu.
> 
> There are areas of MMIO/ROM address space that *sometimes* end up
> behaving like ram, but that's a different case.  The one other case
> to consider is ram hot add/remove in which case, ram may be removed
> or added (but it's location will never change during its lifetime).
> 
> Here's what I'll propose, and I'd really like to hear what Paul
> think about it before we start down this path.
> 
> I think we should add a new API that's:
> 
> void cpu_ram_add(target_phys_addr_t start, ram_addr_t size);
> 
> This API would do two things.  It would call qemu_ram_alloc() and
> cpu_register_physical_memory() just as code does today.  It would
> also add this region into a new table.
> 
> There would be:
> 
> void *cpu_ram_map(target_phys_addr_t start, ram_addr_t *size);
> void cpu_ram_unmap(void *mem);
> 
> These calls would use this new table to lookup ram addresses.  These
> mappings are valid as long as the guest is executed.  Within the
> table, each region would have a reference count.  When it comes time
> to do hot add/remove, we would wait to remove a region until the
> reference count went to zero to avoid unmapping during DMA.
>
> cpu_ram_add() never gets called with overlapping regions.  We'll
> modify cpu_register_physical_memory() to ensure that a ram mapping
> is never changed after initial registration.

What is the difference between your proposal and
cpu_physical_memory_map?

What i'd like to see is binding between cpu_physical_memory_map and qdev 
devices, so that you can use different host memory mappings for device
context and for CPU context (and provide the possibility for, say, map 
a certain memory region as read-only).

> vhost no longer needs to bother keeping the dynamic table up to date
> so it removes all of the slot management code from vhost.  KVM still
> needs the code to handle rom/ram mappings but we can take care of
> that next.  virtio-net's userspace code can do the same thing as
> vhost and only map the ring once which should be a big performance
> improvement.

Pinning the host virtual addresses as you propose reduces flexibility.
See above about different mappings for DMA/CPU contexes.




[Qemu-devel] Re: [PATCH 2/4] KVM: Rework VCPU state writeback API

2010-03-02 Thread Jan Kiszka
Marcelo Tosatti wrote:
> On Tue, Mar 02, 2010 at 09:00:04AM +0100, Jan Kiszka wrote:
>> Marcelo Tosatti wrote:
>>> On Mon, Mar 01, 2010 at 07:10:30PM +0100, Jan Kiszka wrote:
 This grand cleanup drops all reset and vmsave/load related
 synchronization points in favor of four(!) generic hooks:

 - cpu_synchronize_all_states in qemu_savevm_state_complete
   (initial sync from kernel before vmsave)
 - cpu_synchronize_all_post_init in qemu_loadvm_state
   (writeback after vmload)
 - cpu_synchronize_all_post_init in main after machine init
 - cpu_synchronize_all_post_reset in qemu_system_reset
   (writeback after system reset)

 These writeback points + the existing one of VCPU exec after
 cpu_synchronize_state map on three levels of writeback:

 - KVM_PUT_RUNTIME_STATE (during runtime, other VCPUs continue to run)
 - KVM_PUT_RESET_STATE   (on synchronous system reset, all VCPUs stopped)
 - KVM_PUT_FULL_STATE(on init or vmload, all VCPUs stopped as well)

 This level is passed to the arch-specific VCPU state writing function
 that will decide which concrete substates need to be written. That way,
 no writer of load, save or reset functions that interact with in-kernel
 KVM states will ever have to worry about synchronization again. That
 also means that a lot of reasons for races, segfaults and deadlocks are
 eliminated.

 cpu_synchronize_state remains untouched, just as Anthony suggested. We
 continue to need it before reading or writing of VCPU states that are
 also tracked by in-kernel KVM subsystems.

 Consequently, this patch removes many cpu_synchronize_state calls that
 are now redundant, just like remaining explicit register syncs.

 Signed-off-by: Jan Kiszka 
>>> Jan,
>>>
>>> This patch breaks system reset of WinXP.32 install (more easily
>>> reproducible without iothread enabled).
>>>
>>> Screenshot attached.
>>>
>> Strange - no issues with qemu-kvm? Any special command line switch? /me
>> goes scrounging for some installation XP32 CD in the meantime...
> 
> No issues with qemu-kvm. Could not spot anything obvious.
> 

And, of course, my WinXP installation did not trigger any reset issue,
even in non-iothreaded mode. :(

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux




Re: [Qemu-devel] Re: [PATCHv2 10/12] tap: add vhost/vhostfd options

2010-03-02 Thread Marcelo Tosatti
On Tue, Mar 02, 2010 at 10:12:05AM -0600, Anthony Liguori wrote:
> On 03/02/2010 09:53 AM, Paul Brook wrote:
> The key difference is that these regions are created and destroyed
> rarely and in such a way that the destruction is visible to the guest.
> >>>So you're making ram unmap an asynchronous process, and requiring that
> >>>the address space not be reused until that umap has completed?
> >>It technically already would be.  If you've got a pending DMA
> >>transaction and you try to hot unplug badness will happen.  This is
> >>something that is certainly exploitable.
> >Hmm, I guess we probably want to make this work with all mappings then. DMA 
> >to
> >a ram backed PCI BAR (e.g. video ram) is certainly feasible.
> >Technically it's not the unmap that causes badness, it's freeing the
> >underlying ram.
> 
> Let's avoid confusing terminology.  We have RAM mappings and then we
> have PCI BARs that are mapped as IO_MEM_RAM.
> 
> PCI BARs mapped as IO_MEM_RAM are allocated by the device and live
> for the duration of the device.  If you did something that changed
> the BAR's mapping from IO_MEM_RAM to an actual IO memory type, then
> you'd continue to DMA to the allocated device memory instead of
> doing MMIO operations.[1]
> 
> That's completely accurate and safe.  If you did this to bare metal,
> I expect you'd get very similar results.
> 
> This is different from DMA'ing to a RAM region and then removing the
> RAM region while the IO is in flight.  In this case, the mapping
> disappears and you potentially have the guest writing to an invalid
> host pointer.
> 
> [1] I don't think it's useful to support DMA'ing to arbitrary
> IO_MEM_RAM areas.  Instead, I think we should always bounce to this
> memory.  The benefit is that we avoid the complications resulting
> from PCI hot unplug and reference counting.

Agree. Thus the suggestion to tie cpu_physical_memory_map to qdev
infrastructure.





Re: [Qemu-devel] Re: [PATCHv2 10/12] tap: add vhost/vhostfd options

2010-03-02 Thread Anthony Liguori

On 03/02/2010 10:12 AM, Marcelo Tosatti wrote:

On Sun, Feb 28, 2010 at 02:57:56PM -0600, Anthony Liguori wrote:
   

On 02/28/2010 11:19 AM, Michael S. Tsirkin wrote:
 

Both have  security implications so I think it's important that they
be addressed.   Otherwise, I'm pretty happy with how things are.
 

Care suggesting some solutions?
   

The obvious thing to do would be to use the memory notifier in vhost
to keep track of whenever something remaps the ring's memory region
and if that happens, issue an ioctl to vhost to change the location
of the ring.  Also, you would need to merge the vhost slot
management code with the KVM slot management code.
 

There are no security implications as long as vhost uses the qemu
process mappings.
   


There potentially are within a guest.  If a guest can trigger a qemu bug 
that results in qemu writing to a different location than what the guest 
told it to write, a malicious software may use this to escalate it's 
privileges within a guest.



cpu_ram_add() never gets called with overlapping regions.  We'll
modify cpu_register_physical_memory() to ensure that a ram mapping
is never changed after initial registration.
 

What is the difference between your proposal and
cpu_physical_memory_map?
   


cpu_physical_memory_map() has the following semantics:

- it always returns a transient mapping
- it may (transparently) bounce
- it may fail to bounce, caller must deal

The new function I'm proposing has the following semantics:

- it always returns a persistent mapping
- it never bounces
- it will only fail if the mapping isn't ram

A caller can use the new function to implement an optimization to force 
the device to only work with real ram.  IOW, this is something we can 
use in virtio, but very little else.  cpu_physical_memory_map can be 
used in more circumstances.



What i'd like to see is binding between cpu_physical_memory_map and qdev
devices, so that you can use different host memory mappings for device
context and for CPU context (and provide the possibility for, say, map
a certain memory region as read-only).
   


We really want per-bus mappings.  At the lowest level, we'll have 
sysbus_memory_map() but we'll also have pci_memory_map(), 
virtio_memory_map(), etc.


Nothing should ever call cpu_physical_memory_map() directly.

Regards,

Anthony Liguori




[Qemu-devel] [PATCH RFC] vhost: ring: verify ring is not being moved

2010-03-02 Thread Michael S. Tsirkin
abort if it is

Signed-off-by: Michael S. Tsirkin 
---

So the following is a simple solution for unstable
ring mappings security issue: simply detect this and stop.

Will repost series with this later after some testing,
but this is an RFC to get early feedback if any.

 hw/vhost.c |   52 +---
 hw/vhost.h |3 +++
 2 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/hw/vhost.c b/hw/vhost.c
index 3b3a109..b9e115e 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -256,6 +256,33 @@ static inline void vhost_dev_log_resize(struct vhost_dev* 
dev, uint64_t size)
 dev->log_size = size;
 }
 
+static int vhost_verify_ring_mappings(struct vhost_dev *dev,
+  uint64_t start_addr,
+  uint64_t size)
+{
+int i;
+for (i = 0; i < dev->nvqs; ++i) {
+struct vhost_virtqueue *vq = dev->vqs + i;
+target_phys_addr_t l;
+void *p;
+
+if (!ranges_overlap(start_addr, size, vq->ring_phys, vq->ring_size))
+continue;
+l = vq->ring_size;
+p = cpu_physical_memory_map(vq->ring_phys, &l, 1);
+if (!p || l != vq->ring_size) {
+fprintf(stderr, "Unable to map ring buffer for ring %d\n", i);
+return -ENOMEM;
+}
+if (p != vq->ring) {
+fprintf(stderr, "Ring buffer relocated for ring %d\n", i);
+return -EBUSY;
+}
+cpu_physical_memory_unmap(p, l, 0, 0);
+}
+return 0;
+}
+
 static void vhost_client_set_memory(CPUPhysMemoryClient *client,
 target_phys_addr_t start_addr,
 ram_addr_t size,
@@ -284,6 +311,12 @@ static void vhost_client_set_memory(CPUPhysMemoryClient 
*client,
 if (!dev->started) {
 return;
 }
+
+if (dev->started) {
+r = vhost_verify_ring_mappings(dev, start_addr, size);
+assert(r >= 0);
+}
+
 if (!dev->log_enabled) {
 r = ioctl(dev->control, VHOST_SET_MEM_TABLE, dev->mem);
 assert(r >= 0);
@@ -442,6 +475,14 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
 goto fail_alloc_used;
 }
 
+vq->ring_size = s = l = virtio_queue_get_ring_size(vdev, idx);
+vq->ring_phys = a = virtio_queue_get_ring(vdev, idx);
+vq->ring = cpu_physical_memory_map(a, &l, 1);
+if (!vq->ring || l != s) {
+r = -ENOMEM;
+goto fail_alloc_ring;
+}
+
 r = vhost_virtqueue_set_addr(dev, vq, idx, dev->log_enabled);
 if (r < 0) {
 r = -errno;
@@ -485,6 +526,9 @@ fail_host_notifier:
 vdev->binding->guest_notifier(vdev->binding_opaque, idx, false);
 fail_guest_notifier:
 fail_alloc:
+cpu_physical_memory_unmap(vq->ring, virtio_queue_get_ring_size(vdev, idx),
+  0, 0);
+fail_alloc_ring:
 cpu_physical_memory_unmap(vq->used, virtio_queue_get_used_size(vdev, idx),
   0, 0);
 fail_alloc_used:
@@ -526,12 +570,14 @@ static void vhost_virtqueue_cleanup(struct vhost_dev *dev,
 }
 virtio_queue_set_last_avail_idx(vdev, idx, state.num);
 assert (r >= 0);
+cpu_physical_memory_unmap(vq->ring, virtio_queue_get_ring_size(vdev, idx),
+  0, virtio_queue_get_ring_size(vdev, idx));
 cpu_physical_memory_unmap(vq->used, virtio_queue_get_used_size(vdev, idx),
   0, 0);
 cpu_physical_memory_unmap(vq->avail, virtio_queue_get_avail_size(vdev, 
idx),
   0, 0);
 cpu_physical_memory_unmap(vq->desc, virtio_queue_get_desc_size(vdev, idx),
   0, 0);
 }
 
 int vhost_dev_init(struct vhost_dev *hdev, int devfd)
diff --git a/hw/vhost.h b/hw/vhost.h
index 48b52c7..86dd834 100644
--- a/hw/vhost.h
+++ b/hw/vhost.h
@@ -14,6 +14,9 @@ struct vhost_virtqueue {
 int num;
 unsigned long long used_phys;
 unsigned used_size;
+void *ring;
+unsigned long long ring_phys;
+unsigned ring_size;
 };
 
 typedef unsigned long vhost_log_chunk_t;
-- 
1.7.0.18.g0d53a5




[Qemu-devel] Re: [PATCH RFC] vhost: ring: verify ring is not being moved

2010-03-02 Thread Anthony Liguori

On 03/02/2010 10:54 AM, Michael S. Tsirkin wrote:

abort if it is

Signed-off-by: Michael S. Tsirkin
---

So the following is a simple solution for unstable
ring mappings security issue: simply detect this and stop.

Will repost series with this later after some testing,
but this is an RFC to get early feedback if any.
   


It's certainly a reasonable compromise.

Regards,

Anthony Liguori


  hw/vhost.c |   52 +---
  hw/vhost.h |3 +++
  2 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/hw/vhost.c b/hw/vhost.c
index 3b3a109..b9e115e 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -256,6 +256,33 @@ static inline void vhost_dev_log_resize(struct vhost_dev* 
dev, uint64_t size)
  dev->log_size = size;
  }

+static int vhost_verify_ring_mappings(struct vhost_dev *dev,
+  uint64_t start_addr,
+  uint64_t size)
+{
+int i;
+for (i = 0; i<  dev->nvqs; ++i) {
+struct vhost_virtqueue *vq = dev->vqs + i;
+target_phys_addr_t l;
+void *p;
+
+if (!ranges_overlap(start_addr, size, vq->ring_phys, vq->ring_size))
+continue;
+l = vq->ring_size;
+p = cpu_physical_memory_map(vq->ring_phys,&l, 1);
+if (!p || l != vq->ring_size) {
+fprintf(stderr, "Unable to map ring buffer for ring %d\n", i);
+return -ENOMEM;
+}
+if (p != vq->ring) {
+fprintf(stderr, "Ring buffer relocated for ring %d\n", i);
+return -EBUSY;
+}
+cpu_physical_memory_unmap(p, l, 0, 0);
+}
+return 0;
+}
+
  static void vhost_client_set_memory(CPUPhysMemoryClient *client,
  target_phys_addr_t start_addr,
  ram_addr_t size,
@@ -284,6 +311,12 @@ static void vhost_client_set_memory(CPUPhysMemoryClient 
*client,
  if (!dev->started) {
  return;
  }
+
+if (dev->started) {
+r = vhost_verify_ring_mappings(dev, start_addr, size);
+assert(r>= 0);
+}
+
  if (!dev->log_enabled) {
  r = ioctl(dev->control, VHOST_SET_MEM_TABLE, dev->mem);
  assert(r>= 0);
@@ -442,6 +475,14 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
  goto fail_alloc_used;
  }

+vq->ring_size = s = l = virtio_queue_get_ring_size(vdev, idx);
+vq->ring_phys = a = virtio_queue_get_ring(vdev, idx);
+vq->ring = cpu_physical_memory_map(a,&l, 1);
+if (!vq->ring || l != s) {
+r = -ENOMEM;
+goto fail_alloc_ring;
+}
+
  r = vhost_virtqueue_set_addr(dev, vq, idx, dev->log_enabled);
  if (r<  0) {
  r = -errno;
@@ -485,6 +526,9 @@ fail_host_notifier:
  vdev->binding->guest_notifier(vdev->binding_opaque, idx, false);
  fail_guest_notifier:
  fail_alloc:
+cpu_physical_memory_unmap(vq->ring, virtio_queue_get_ring_size(vdev, idx),
+  0, 0);
+fail_alloc_ring:
  cpu_physical_memory_unmap(vq->used, virtio_queue_get_used_size(vdev, idx),
0, 0);
  fail_alloc_used:
@@ -526,12 +570,14 @@ static void vhost_virtqueue_cleanup(struct vhost_dev *dev,
  }
  virtio_queue_set_last_avail_idx(vdev, idx, state.num);
  assert (r>= 0);
+cpu_physical_memory_unmap(vq->ring, virtio_queue_get_ring_size(vdev, idx),
+  0, virtio_queue_get_ring_size(vdev, idx));
  cpu_physical_memory_unmap(vq->used, virtio_queue_get_used_size(vdev, idx),
0, 0);
  cpu_physical_memory_unmap(vq->avail, virtio_queue_get_avail_size(vdev, 
idx),
0, 0);
  cpu_physical_memory_unmap(vq->desc, virtio_queue_get_desc_size(vdev, idx),
0, 0);
  }

  int vhost_dev_init(struct vhost_dev *hdev, int devfd)
diff --git a/hw/vhost.h b/hw/vhost.h
index 48b52c7..86dd834 100644
--- a/hw/vhost.h
+++ b/hw/vhost.h
@@ -14,6 +14,9 @@ struct vhost_virtqueue {
  int num;
  unsigned long long used_phys;
  unsigned used_size;
+void *ring;
+unsigned long long ring_phys;
+unsigned ring_size;
  };

  typedef unsigned long vhost_log_chunk_t;
   






Re: [Qemu-devel] Re: [PATCHv2 10/12] tap: add vhost/vhostfd options

2010-03-02 Thread Michael S. Tsirkin
On Tue, Mar 02, 2010 at 10:56:48AM -0600, Anthony Liguori wrote:
> On 03/02/2010 10:12 AM, Marcelo Tosatti wrote:
>> On Sun, Feb 28, 2010 at 02:57:56PM -0600, Anthony Liguori wrote:
>>
>>> On 02/28/2010 11:19 AM, Michael S. Tsirkin wrote:
>>>  
> Both have  security implications so I think it's important that they
> be addressed.   Otherwise, I'm pretty happy with how things are.
>  
 Care suggesting some solutions?

>>> The obvious thing to do would be to use the memory notifier in vhost
>>> to keep track of whenever something remaps the ring's memory region
>>> and if that happens, issue an ioctl to vhost to change the location
>>> of the ring.  Also, you would need to merge the vhost slot
>>> management code with the KVM slot management code.
>>>  
>> There are no security implications as long as vhost uses the qemu
>> process mappings.
>>
>
> There potentially are within a guest.  If a guest can trigger a qemu bug  
> that results in qemu writing to a different location than what the guest  
> told it to write, a malicious software may use this to escalate it's  
> privileges within a guest.

If malicious software has access to hardware that does DMA,
game is likely over :)

>>> cpu_ram_add() never gets called with overlapping regions.  We'll
>>> modify cpu_register_physical_memory() to ensure that a ram mapping
>>> is never changed after initial registration.
>>>  
>> What is the difference between your proposal and
>> cpu_physical_memory_map?
>>
>
> cpu_physical_memory_map() has the following semantics:
>
> - it always returns a transient mapping
> - it may (transparently) bounce
> - it may fail to bounce, caller must deal
>
> The new function I'm proposing has the following semantics:
>
> - it always returns a persistent mapping
> - it never bounces
> - it will only fail if the mapping isn't ram
>
> A caller can use the new function to implement an optimization to force  
> the device to only work with real ram.  IOW, this is something we can  
> use in virtio, but very little else.  cpu_physical_memory_map can be  
> used in more circumstances.
>
>> What i'd like to see is binding between cpu_physical_memory_map and qdev
>> devices, so that you can use different host memory mappings for device
>> context and for CPU context (and provide the possibility for, say, map
>> a certain memory region as read-only).
>>
>
> We really want per-bus mappings.  At the lowest level, we'll have  
> sysbus_memory_map() but we'll also have pci_memory_map(),  
> virtio_memory_map(), etc.
>
> Nothing should ever call cpu_physical_memory_map() directly.
>
> Regards,
>
> Anthony Liguori




[Qemu-devel] Re: [PATCHv2 02/12] kvm: add API to set ioeventfd

2010-03-02 Thread Michael S. Tsirkin
On Thu, Feb 25, 2010 at 01:19:30PM -0600, Anthony Liguori wrote:
> On 02/25/2010 12:28 PM, Michael S. Tsirkin wrote:
>> Comment on kvm usage: rather than require users to do if (kvm_enabled())
>> and/or ifdefs, this patch adds an API that, internally, is defined to
>> stub function on non-kvm build, and checks kvm_enabled for non-kvm
>> run.
>>
>> While rest of qemu code still uses if (kvm_enabled()), I think this
>> approach is cleaner, and we should convert rest of code to it
>> long term.
>>
>
> I'm not opposed to that.
>
>> Signed-off-by: Michael S. Tsirkin
>> ---
>>
>> Avi, Marcelo, pls review/ack.
>>
>>   kvm-all.c |   22 ++
>>   kvm.h |   16 
>>   2 files changed, 38 insertions(+), 0 deletions(-)
>>
>> diff --git a/kvm-all.c b/kvm-all.c
>> index 1a02076..9742791 100644
>> --- a/kvm-all.c
>> +++ b/kvm-all.c
>> @@ -1138,3 +1138,25 @@ int kvm_set_signal_mask(CPUState *env, const sigset_t 
>> *sigset)
>>
>>   return r;
>>   }
>> +
>> +#ifdef KVM_IOEVENTFD
>> +int kvm_set_ioeventfd(uint16_t addr, uint16_t data, int fd, bool assigned)
>>
>
> I think this API could use some love.  You're using a very limited set  
> of things that ioeventfd can do and you're multiplexing creation and  
> destruction in a single call.
>
> I think:
>
> kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t data);
> kvm_unset_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t data);
>
> Would be better.  Alternatively, an API that matched ioeventfd exactly:
>
> kvm_set_ioeventfd(int fd, uint64_t addr, int size, uint64_t data, int  
> flags);
> kvm_unset_ioeventfd(...);
>
> Could work too.
>
> Regards,
>
> Anthony Liguori
>

So I renamed to kvm_set_ioeventfd_pio_word, but I still left assign
boolean in place because both implementation and usage take up
less code this way.

It's just an internal function, so no biggie to change it later ...

-- 
MST




Re: [Qemu-devel] Re: [PATCHv2 10/12] tap: add vhost/vhostfd options

2010-03-02 Thread Marcelo Tosatti
On Tue, Mar 02, 2010 at 10:56:48AM -0600, Anthony Liguori wrote:
> On 03/02/2010 10:12 AM, Marcelo Tosatti wrote:
> >On Sun, Feb 28, 2010 at 02:57:56PM -0600, Anthony Liguori wrote:
> >>On 02/28/2010 11:19 AM, Michael S. Tsirkin wrote:
> Both have  security implications so I think it's important that they
> be addressed.   Otherwise, I'm pretty happy with how things are.
> >>>Care suggesting some solutions?
> >>The obvious thing to do would be to use the memory notifier in vhost
> >>to keep track of whenever something remaps the ring's memory region
> >>and if that happens, issue an ioctl to vhost to change the location
> >>of the ring.  Also, you would need to merge the vhost slot
> >>management code with the KVM slot management code.
> >There are no security implications as long as vhost uses the qemu
> >process mappings.
> 
> There potentially are within a guest.  If a guest can trigger a qemu
> bug that results in qemu writing to a different location than what
> the guest told it to write, a malicious software may use this to
> escalate it's privileges within a guest.
> 
> >>cpu_ram_add() never gets called with overlapping regions.  We'll
> >>modify cpu_register_physical_memory() to ensure that a ram mapping
> >>is never changed after initial registration.
> >What is the difference between your proposal and
> >cpu_physical_memory_map?
> 
> cpu_physical_memory_map() has the following semantics:


> 
> - it always returns a transient mapping
> - it may (transparently) bounce
> - it may fail to bounce, caller must deal
> 
> The new function I'm proposing has the following semantics:

What exactly are the purposes of the new function?

> - it always returns a persistent mapping

For hotplug? What exactly you mean persistent?

> - it never bounces
> - it will only fail if the mapping isn't ram
> 
> A caller can use the new function to implement an optimization to
> force the device to only work with real ram.

To bypass the address translation in exec.c? 

>   IOW, this is something we can use in virtio, but very little else.
> cpu_physical_memory_map can be used in more circumstances.

Does not make much sense to me. The qdev <-> memory map mapping seems
more important. Following your security enhancement drive, you can for
example check whether the region can actually be mapped by the device
and deny otherwise, or do whatever host-side memory protection tricks 
you'd like.

And "cpu_ram_map" seems like the memory is accessed through cpu context, 
while it is really always device context.

> >What i'd like to see is binding between cpu_physical_memory_map and qdev
> >devices, so that you can use different host memory mappings for device
> >context and for CPU context (and provide the possibility for, say, map
> >a certain memory region as read-only).
> 
> We really want per-bus mappings.  At the lowest level, we'll have
> sysbus_memory_map() but we'll also have pci_memory_map(),
> virtio_memory_map(), etc.
>
> Nothing should ever call cpu_physical_memory_map() directly.

Yep.

> 
> Regards,
> 
> Anthony Liguori
> 




Re: [Qemu-devel] Re: [PATCHv2 10/12] tap: add vhost/vhostfd options

2010-03-02 Thread Anthony Liguori

On 03/02/2010 12:00 PM, Marcelo Tosatti wrote:

- it always returns a transient mapping
- it may (transparently) bounce
- it may fail to bounce, caller must deal

The new function I'm proposing has the following semantics:
 

What exactly are the purposes of the new function?
   


We need an API that can be used to obtain a persistent mapping of a 
given guest physical address.  We don't have that today.  We can 
potentially change cpu_physical_memory_map() to also accommodate this 
use case but that's a second step.



- it always returns a persistent mapping
 

For hotplug? What exactly you mean persistent?
   


Hotplug cannot happen as long as a persistent mapping exists for an 
address within that region.  This is okay.  You cannot have an active 
device driver DMA'ing to a DIMM and then hot unplug it.  The guest is 
responsible for making sure this doesn't happen.



- it never bounces
- it will only fail if the mapping isn't ram

A caller can use the new function to implement an optimization to
force the device to only work with real ram.
 

To bypass the address translation in exec.c?
   


No, the ultimate goal is to convert virtio ring accesses from:

static inline uint32_t vring_desc_len(target_phys_addr_t desc_pa, int i)
{
target_phys_addr_t pa;
pa = desc_pa + sizeof(VRingDesc) * i + offsetof(VRingDesc, len);
return ldl_phys(pa);
}

len = vring_desc_len(vring.desc_pa, i)

To:

len = ldl_w(vring->desc[i].len);

When host == arch, ldl_w is a nop.  Otherwise, it's just a byte swap.  
ldl_phys() today turns into cpu_physical_memory_read() which is very slow.


To support this, we must enforce that when a guest passes us a physical 
address, we can safely obtain a persistent mapping to it.  This is true 
for any ram address.  It's not true for MMIO memory.  We have no way to 
do this with cpu_physical_memory_map().



   IOW, this is something we can use in virtio, but very little else.
cpu_physical_memory_map can be used in more circumstances.
 

Does not make much sense to me. The qdev<->  memory map mapping seems
more important. Following your security enhancement drive, you can for
example check whether the region can actually be mapped by the device
and deny otherwise, or do whatever host-side memory protection tricks
you'd like.
   


It's two independent things.  Part of what makes virtio so complicated 
to convert to proper bus accessors is it's use of 
ldl_phys/stl_phys/etc.  No other device use those functions.  If we 
reduce virtio to just use a map() function, it simplifies the bus 
accessor conversion.



And "cpu_ram_map" seems like the memory is accessed through cpu context,
while it is really always device context.
   


Yes, but that's a separate effort.  In fact, see 
http://wiki.qemu.org/Features/RamAPI vs. 
http://wiki.qemu.org/Features/PCIMemoryAPI


Regards,

Anthony Liguori




[Qemu-devel] [PATCH 11/20] eepro100: Use symbolic names for bits in EEPROM id

2010-03-02 Thread Stefan Weil
V2 - Use UPPER_CASE for enum values

Signed-off-by: Stefan Weil 
---
 hw/eepro100.c |   17 -
 1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index c2ff569..2a871b8 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -272,6 +272,21 @@ typedef enum {
 EEPROM_SMBUS_ADDR = 0x90,
 } EEPROMOffset;
 
+/* Bit values for EEPROM ID word. */
+typedef enum {
+EEPROM_ID_MDM = BIT(0), /* Modem */
+EEPROM_ID_STB = BIT(1), /* Standby Enable */
+EEPROM_ID_WMR = BIT(2), /* ??? */
+EEPROM_ID_WOL = BIT(5), /* Wake on LAN */
+EEPROM_ID_DPD = BIT(6), /* Deep Power Down */
+EEPROM_ID_ALT = BIT(7), /* */
+/* BITS(10, 8) device revision */
+EEPROM_ID_BD = BIT(11), /* boot disable */
+EEPROM_ID_ID = BIT(13), /* id bit */
+/* BITS(15, 14) signature */
+EEPROM_ID_VALID = BIT(14),  /* signature for valid eeprom */
+} eeprom_id_bit;
+
 /* Default values for MDI (PHY) registers */
 static const uint16_t eepro100_mdi_default[] = {
 /* MDI Registers 0 - 6, 7 */
@@ -643,7 +658,7 @@ static void nic_selective_reset(EEPRO100State * s)
 uint16_t *eeprom_contents = eeprom93xx_data(s->eeprom);
 //~ eeprom93xx_reset(s->eeprom);
 memcpy(eeprom_contents, s->conf.macaddr.a, 6);
-eeprom_contents[EEPROM_ID] = 0x4000;
+eeprom_contents[EEPROM_ID] = EEPROM_ID_VALID;
 if (s->device == i82557B || s->device == i82557C)
 eeprom_contents[5] = 0x0100;
 eeprom_contents[EEPROM_PHY_ID] = 1;
-- 
1.7.0





[Qemu-devel] [PATCH 03/20] eepro100: Fix PXE boot

2010-03-02 Thread Stefan Weil
The phy handling was wrong for PXE, GPXE boot:
GPXE's eepro100 driver did not detect a valid link.

This is fixed here.

V2 - Use UPPER_CASE for enum values

Signed-off-by: Stefan Weil 
---
 hw/eepro100.c |   20 ++--
 1 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index 124bc49..f6764cc 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -20,7 +20,7 @@
  * along with this program; if not, see .
  *
  * Tested features (i82559):
- *  PXE boot (i386) no valid link
+ *  PXE boot (i386) ok
  *  Linux networking (i386) ok
  *
  * Untested:
@@ -33,10 +33,6 @@
  * Open Source Software Developer Manual
  */
 
-#if defined(TARGET_I386)
-# warning "PXE boot still not working!"
-#endif
-
 #include  /* offsetof */
 #include 
 #include "hw.h"
@@ -243,6 +239,17 @@ typedef struct {
 bool has_extended_tcb_support;
 } EEPRO100State;
 
+/* Word indices in EEPROM. */
+typedef enum {
+EEPROM_CNFG_MDIX  = 0x03,
+EEPROM_ID = 0x05,
+EEPROM_PHY_ID = 0x06,
+EEPROM_VENDOR_ID  = 0x0c,
+EEPROM_CONFIG_ASF = 0x0d,
+EEPROM_DEVICE_ID  = 0x23,
+EEPROM_SMBUS_ADDR = 0x90,
+} EEPROMOffset;
+
 /* Default values for MDI (PHY) registers */
 static const uint16_t eepro100_mdi_default[] = {
 /* MDI Registers 0 - 6, 7 */
@@ -632,9 +639,10 @@ static void nic_selective_reset(EEPRO100State * s)
 uint16_t *eeprom_contents = eeprom93xx_data(s->eeprom);
 //~ eeprom93xx_reset(s->eeprom);
 memcpy(eeprom_contents, s->conf.macaddr.a, 6);
-eeprom_contents[0xa] = 0x4000;
+eeprom_contents[EEPROM_ID] = 0x4000;
 if (s->device == i82557B || s->device == i82557C)
 eeprom_contents[5] = 0x0100;
+eeprom_contents[EEPROM_PHY_ID] = 1;
 uint16_t sum = 0;
 for (i = 0; i < EEPROM_SIZE - 1; i++) {
 sum += eeprom_contents[i];
-- 
1.7.0





[Qemu-devel] [PATCHv3 00/12] vhost-net: upstream integration

2010-03-02 Thread Michael S. Tsirkin
Here's a patchset with vhost support for upstream qemu,
rabed to latest bits.

Note that irqchip/MSI is no longer required for vhost, but you should
not expect performance gains from vhost unless in-kernel irqchip is
enabled (which is not in upstream qemu now), and unless guest enables
MSI.  A follow-up patchset against qemu-kvm will add irqchip support.

Only virtio-pci is currently supported: I'm interested in supporting
syborg/s390 as well, and tried to make APIs generic to make this
possible.

Also missing is packet socket backend.

Cc'd, you did review of these internally, I would be thankful
for review/ack upstream.

Changes from v2:
  Addressed style comments
  Detect mapping changes and abort
  Unmap ring on cleanup

Changes from v1:
  Addressed style comments
  Migration fixes.
  Gracefully fail with non-tap backends.

Michael S. Tsirkin (12):
  tap: add interface to get device fd
  kvm: add API to set ioeventfd
  notifier: event notifier implementation
  virtio: add notifier support
  virtio: add APIs for queue fields
  virtio: add set_status callback
  virtio: move typedef to qemu-common
  virtio-pci: fill in notifier support
  vhost: vhost net support
  tap: add vhost/vhostfd options
  tap: add API to retrieve vhost net header
  virtio-net: vhost net support

 Makefile.target  |3 +
 configure|   36 +++
 hw/notifier.c|   62 +
 hw/notifier.h|   16 ++
 hw/s390-virtio-bus.c |4 +-
 hw/syborg_virtio.c   |2 +-
 hw/vhost.c   |  706 ++
 hw/vhost.h   |   48 
 hw/vhost_net.c   |  198 ++
 hw/vhost_net.h   |   19 ++
 hw/virtio-net.c  |   71 +-
 hw/virtio-pci.c  |   67 +-
 hw/virtio.c  |   81 ++-
 hw/virtio.h  |   28 ++-
 kvm-all.c|   22 ++
 kvm.h|   16 ++
 net.c|8 +
 net/tap.c|   43 +++
 net/tap.h|5 +
 qemu-common.h|2 +
 qemu-options.hx  |4 +-
 21 files changed, 1431 insertions(+), 10 deletions(-)
 create mode 100644 hw/notifier.c
 create mode 100644 hw/notifier.h
 create mode 100644 hw/vhost.c
 create mode 100644 hw/vhost.h
 create mode 100644 hw/vhost_net.c
 create mode 100644 hw/vhost_net.h




[Qemu-devel] [PATCHv3 01/12] tap: add interface to get device fd

2010-03-02 Thread Michael S. Tsirkin
Will be used by vhost to attach/detach to backend.

Signed-off-by: Michael S. Tsirkin 
---
 net/tap.c |7 +++
 net/tap.h |2 ++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 7a7320c..fc59fd4 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -269,6 +269,13 @@ static void tap_poll(VLANClientState *nc, bool enable)
 tap_write_poll(s, enable);
 }
 
+int tap_get_fd(VLANClientState *nc)
+{
+TAPState *s = DO_UPCAST(TAPState, nc, nc);
+assert(nc->info->type == NET_CLIENT_TYPE_TAP);
+return s->fd;
+}
+
 /* fd support */
 
 static NetClientInfo net_tap_info = {
diff --git a/net/tap.h b/net/tap.h
index 538a562..a244b28 100644
--- a/net/tap.h
+++ b/net/tap.h
@@ -48,4 +48,6 @@ int tap_probe_vnet_hdr(int fd);
 int tap_probe_has_ufo(int fd);
 void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int 
ufo);
 
+int tap_get_fd(VLANClientState *vc);
+
 #endif /* QEMU_NET_TAP_H */
-- 
1.7.0.18.g0d53a5





[Qemu-devel] [PATCHv3 02/12] kvm: add API to set ioeventfd

2010-03-02 Thread Michael S. Tsirkin
Comment on kvm usage: rather than require users to do if (kvm_enabled())
and/or ifdefs, this patch adds an API that, internally, is defined to
stub function on non-kvm build, and checks kvm_enabled for non-kvm
run.

While rest of qemu code still uses if (kvm_enabled()), I think this
approach is cleaner, and we should convert rest of code to it
long term.

Signed-off-by: Michael S. Tsirkin 
---
 kvm-all.c |   22 ++
 kvm.h |   16 
 2 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 1a02076..f54af71 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1138,3 +1138,25 @@ int kvm_set_signal_mask(CPUState *env, const sigset_t 
*sigset)
 
 return r;
 }
+
+#ifdef KVM_IOEVENTFD
+int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, bool 
assign)
+{
+struct kvm_ioeventfd kick = {
+.datamatch = val,
+.addr = addr,
+.len = 2,
+.flags = KVM_IOEVENTFD_FLAG_DATAMATCH | KVM_IOEVENTFD_FLAG_PIO,
+.fd = fd,
+};
+int r;
+if (!kvm_enabled())
+return -ENOSYS;
+if (!assign)
+kick.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN;
+r = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &kick);
+if (r < 0)
+return r;
+return 0;
+}
+#endif
diff --git a/kvm.h b/kvm.h
index a74dfcb..45d087b 100644
--- a/kvm.h
+++ b/kvm.h
@@ -14,10 +14,16 @@
 #ifndef QEMU_KVM_H
 #define QEMU_KVM_H
 
+#include 
+#include 
 #include "config.h"
 #include "qemu-queue.h"
 
 #ifdef CONFIG_KVM
+#include 
+#endif
+
+#ifdef CONFIG_KVM
 extern int kvm_allowed;
 
 #define kvm_enabled() (kvm_allowed)
@@ -135,4 +141,14 @@ static inline void cpu_synchronize_state(CPUState *env)
 }
 }
 
+#if defined(KVM_IOEVENTFD) && defined(CONFIG_KVM)
+int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, uint16_t val, bool 
assign);
+#else
+static inline
+int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, uint16_t val, bool assign)
+{
+return -ENOSYS;
+}
+#endif
+
 #endif
-- 
1.7.0.18.g0d53a5





[Qemu-devel] [PATCHv3 03/12] notifier: event notifier implementation

2010-03-02 Thread Michael S. Tsirkin
event notifiers are slightly generalized eventfd descriptors. Current
implementation depends on eventfd because vhost is the only user, and
vhost depends on eventfd anyway, but a stub is provided for non-eventfd
case.

We'll be able to further generalize this when another user comes along
and we see how to best do this.

Signed-off-by: Michael S. Tsirkin 
---
 Makefile.target |1 +
 hw/notifier.c   |   62 +++
 hw/notifier.h   |   16 ++
 qemu-common.h   |1 +
 4 files changed, 80 insertions(+), 0 deletions(-)
 create mode 100644 hw/notifier.c
 create mode 100644 hw/notifier.h

diff --git a/Makefile.target b/Makefile.target
index 320f807..1f5aa9d 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -172,6 +172,7 @@ obj-y = vl.o async.o monitor.o pci.o pci_host.o pcie_host.o 
machine.o gdbstub.o
 # virtio has to be here due to weird dependency between PCI and virtio-net.
 # need to fix this properly
 obj-y += virtio-blk.o virtio-balloon.o virtio-net.o virtio-pci.o 
virtio-serial-bus.o
+obj-y += notifier.o
 obj-y += rwhandler.o
 obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
diff --git a/hw/notifier.c b/hw/notifier.c
new file mode 100644
index 000..c6db3c3
--- /dev/null
+++ b/hw/notifier.c
@@ -0,0 +1,62 @@
+/*
+ * event notifier support
+ *
+ * Copyright Red Hat, Inc. 2010
+ *
+ * Authors:
+ *  Michael S. Tsirkin 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "hw.h"
+#include "notifier.h"
+#ifdef CONFIG_EVENTFD
+#include 
+#endif
+
+int event_notifier_init(EventNotifier *e, int active)
+{
+#ifdef CONFIG_EVENTFD
+   int fd = eventfd(!!active, EFD_NONBLOCK | EFD_CLOEXEC);
+   if (fd < 0)
+   return -errno;
+   e->fd = fd;
+   return 0;
+#else
+   return -ENOSYS;
+#endif
+}
+
+void event_notifier_cleanup(EventNotifier *e)
+{
+   close(e->fd);
+}
+
+int event_notifier_get_fd(EventNotifier *e)
+{
+   return e->fd;
+}
+
+int event_notifier_test_and_clear(EventNotifier *e)
+{
+   uint64_t value;
+   int r = read(e->fd, &value, sizeof(value));
+   return r == sizeof(value);
+}
+
+int event_notifier_test(EventNotifier *e)
+{
+   uint64_t value;
+   int r = read(e->fd, &value, sizeof(value));
+   if (r == sizeof(value)) {
+   /* restore previous value. */
+   int s = write(e->fd, &value, sizeof(value));
+   /* never blocks because we use EFD_SEMAPHORE.
+* If we didn't we'd get EAGAIN on overflow
+* and we'd have to write code to ignore it. */
+   assert(s == sizeof(value));
+   }
+   return r == sizeof(value);
+}
diff --git a/hw/notifier.h b/hw/notifier.h
new file mode 100644
index 000..24117ea
--- /dev/null
+++ b/hw/notifier.h
@@ -0,0 +1,16 @@
+#ifndef QEMU_EVENT_NOTIFIER_H
+#define QEMU_EVENT_NOTIFIER_H
+
+#include "qemu-common.h"
+
+struct EventNotifier {
+   int fd;
+};
+
+int event_notifier_init(EventNotifier *, int active);
+void event_notifier_cleanup(EventNotifier *);
+int event_notifier_get_fd(EventNotifier *);
+int event_notifier_test_and_clear(EventNotifier *);
+int event_notifier_test(EventNotifier *);
+
+#endif
diff --git a/qemu-common.h b/qemu-common.h
index 805be1a..f12a8f5 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -227,6 +227,7 @@ typedef struct uWireSlave uWireSlave;
 typedef struct I2SCodec I2SCodec;
 typedef struct DeviceState DeviceState;
 typedef struct SSIBus SSIBus;
+typedef struct EventNotifier EventNotifier;
 
 typedef uint64_t pcibus_t;
 
-- 
1.7.0.18.g0d53a5





[Qemu-devel] [PATCHv3 04/12] virtio: add notifier support

2010-03-02 Thread Michael S. Tsirkin
Add binding API to set host/guest notifiers.
Will be used by vhost.

Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio.c |5 -
 hw/virtio.h |3 +++
 2 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index 7c020a3..1f5e7be 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -73,6 +73,7 @@ struct VirtQueue
 int inuse;
 uint16_t vector;
 void (*handle_output)(VirtIODevice *vdev, VirtQueue *vq);
+VirtIODevice *vdev;
 };
 
 /* virt queue functions */
@@ -714,8 +715,10 @@ VirtIODevice *virtio_common_init(const char *name, 
uint16_t device_id,
 vdev->queue_sel = 0;
 vdev->config_vector = VIRTIO_NO_VECTOR;
 vdev->vq = qemu_mallocz(sizeof(VirtQueue) * VIRTIO_PCI_QUEUE_MAX);
-for(i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++)
+for(i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
 vdev->vq[i].vector = VIRTIO_NO_VECTOR;
+vdev->vq[i].vdev = vdev;
+}
 
 vdev->name = name;
 vdev->config_len = config_size;
diff --git a/hw/virtio.h b/hw/virtio.h
index 3baa2a3..af87889 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -19,6 +19,7 @@
 #include "qdev.h"
 #include "sysemu.h"
 #include "block_int.h"
+#include "notifier.h"
 
 /* from Linux's linux/virtio_config.h */
 
@@ -89,6 +90,8 @@ typedef struct {
 int (*load_config)(void * opaque, QEMUFile *f);
 int (*load_queue)(void * opaque, int n, QEMUFile *f);
 unsigned (*get_features)(void * opaque);
+int (*guest_notifier)(void * opaque, int n, bool assigned);
+int (*host_notifier)(void * opaque, int n, bool assigned);
 } VirtIOBindings;
 
 #define VIRTIO_PCI_QUEUE_MAX 64
-- 
1.7.0.18.g0d53a5





[Qemu-devel] [PATCHv3 05/12] virtio: add APIs for queue fields

2010-03-02 Thread Michael S. Tsirkin
vhost needs physical addresses for ring and other queue fields,
so add APIs for these.

Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio.c |   76 +++
 hw/virtio.h |   15 +++-
 2 files changed, 90 insertions(+), 1 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index 1f5e7be..e5787fa 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -74,6 +74,8 @@ struct VirtQueue
 uint16_t vector;
 void (*handle_output)(VirtIODevice *vdev, VirtQueue *vq);
 VirtIODevice *vdev;
+EventNotifier guest_notifier;
+EventNotifier host_notifier;
 };
 
 /* virt queue functions */
@@ -593,6 +595,12 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int 
queue_size,
 return &vdev->vq[i];
 }
 
+void virtio_irq(VirtQueue *vq)
+{
+vq->vdev->isr |= 0x01;
+virtio_notify_vector(vq->vdev, vq->vector);
+}
+
 void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
 {
 /* Always notify when queue is empty (when feature acknowledge) */
@@ -736,3 +744,71 @@ void virtio_bind_device(VirtIODevice *vdev, const 
VirtIOBindings *binding,
 vdev->binding = binding;
 vdev->binding_opaque = opaque;
 }
+
+target_phys_addr_t virtio_queue_get_desc(VirtIODevice *vdev, int n)
+{
+return vdev->vq[n].vring.desc;
+}
+
+target_phys_addr_t virtio_queue_get_avail(VirtIODevice *vdev, int n)
+{
+return vdev->vq[n].vring.avail;
+}
+
+target_phys_addr_t virtio_queue_get_used(VirtIODevice *vdev, int n)
+{
+return vdev->vq[n].vring.used;
+}
+
+target_phys_addr_t virtio_queue_get_ring(VirtIODevice *vdev, int n)
+{
+return vdev->vq[n].vring.desc;
+}
+
+target_phys_addr_t virtio_queue_get_desc_size(VirtIODevice *vdev, int n)
+{
+return sizeof(VRingDesc) * vdev->vq[n].vring.num;
+}
+
+target_phys_addr_t virtio_queue_get_avail_size(VirtIODevice *vdev, int n)
+{
+return offsetof(VRingAvail, ring) +
+sizeof(u_int64_t) * vdev->vq[n].vring.num;
+}
+
+target_phys_addr_t virtio_queue_get_used_size(VirtIODevice *vdev, int n)
+{
+return offsetof(VRingUsed, ring) +
+sizeof(VRingUsedElem) * vdev->vq[n].vring.num;
+}
+
+
+target_phys_addr_t virtio_queue_get_ring_size(VirtIODevice *vdev, int n)
+{
+return vdev->vq[n].vring.used - vdev->vq[n].vring.desc +
+   virtio_queue_get_used_size(vdev, n);
+}
+
+uint16_t virtio_queue_last_avail_idx(VirtIODevice *vdev, int n)
+{
+return vdev->vq[n].last_avail_idx;
+}
+
+void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint16_t idx)
+{
+vdev->vq[n].last_avail_idx = idx;
+}
+
+VirtQueue *virtio_queue(VirtIODevice *vdev, int n)
+{
+return vdev->vq + n;
+}
+
+EventNotifier *virtio_queue_guest_notifier(VirtQueue *vq)
+{
+return &vq->guest_notifier;
+}
+EventNotifier *virtio_queue_host_notifier(VirtQueue *vq)
+{
+return &vq->host_notifier;
+}
diff --git a/hw/virtio.h b/hw/virtio.h
index af87889..26cf5fd 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -184,5 +184,18 @@ void virtio_net_exit(VirtIODevice *vdev);
DEFINE_PROP_BIT("indirect_desc", _state, _field, \
VIRTIO_RING_F_INDIRECT_DESC, true)
 
-
+target_phys_addr_t virtio_queue_get_desc(VirtIODevice *vdev, int n);
+target_phys_addr_t virtio_queue_get_avail(VirtIODevice *vdev, int n);
+target_phys_addr_t virtio_queue_get_used(VirtIODevice *vdev, int n);
+target_phys_addr_t virtio_queue_get_ring(VirtIODevice *vdev, int n);
+target_phys_addr_t virtio_queue_get_desc_size(VirtIODevice *vdev, int n);
+target_phys_addr_t virtio_queue_get_avail_size(VirtIODevice *vdev, int n);
+target_phys_addr_t virtio_queue_get_used_size(VirtIODevice *vdev, int n);
+target_phys_addr_t virtio_queue_get_ring_size(VirtIODevice *vdev, int n);
+uint16_t virtio_queue_last_avail_idx(VirtIODevice *vdev, int n);
+void virtio_queue_set_last_avail_idx(VirtIODevice *vdev, int n, uint16_t idx);
+VirtQueue *virtio_queue(VirtIODevice *vdev, int n);
+EventNotifier *virtio_queue_guest_notifier(VirtQueue *vq);
+EventNotifier *virtio_queue_host_notifier(VirtQueue *vq);
+void virtio_irq(VirtQueue *vq);
 #endif
-- 
1.7.0.18.g0d53a5





[Qemu-devel] [PATCHv3 06/12] virtio: add set_status callback

2010-03-02 Thread Michael S. Tsirkin
vhost net backend needs to be notified when
frontend status changes. Add a callback,
similar to set_features.

Signed-off-by: Michael S. Tsirkin 
---
 hw/s390-virtio-bus.c |4 +++-
 hw/syborg_virtio.c   |2 +-
 hw/virtio-pci.c  |5 +++--
 hw/virtio.h  |9 +
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/hw/s390-virtio-bus.c b/hw/s390-virtio-bus.c
index fa0a74f..8e05cfc 100644
--- a/hw/s390-virtio-bus.c
+++ b/hw/s390-virtio-bus.c
@@ -241,8 +241,10 @@ void s390_virtio_device_update_status(VirtIOS390Device 
*dev)
 {
 VirtIODevice *vdev = dev->vdev;
 uint32_t features;
+uint8_t status;
 
-vdev->status = ldub_phys(dev->dev_offs + VIRTIO_DEV_OFFS_STATUS);
+status = ldub_phys(dev->dev_offs + VIRTIO_DEV_OFFS_STATUS);
+virtio_set_status(vdev, status);
 
 /* Update guest supported feature bitmap */
 
diff --git a/hw/syborg_virtio.c b/hw/syborg_virtio.c
index 65239a0..71f271c 100644
--- a/hw/syborg_virtio.c
+++ b/hw/syborg_virtio.c
@@ -149,7 +149,7 @@ static void syborg_virtio_writel(void *opaque, 
target_phys_addr_t offset,
 virtio_queue_notify(vdev, value);
 break;
 case SYBORG_VIRTIO_STATUS:
-vdev->status = value & 0xFF;
+virtio_set_status(vdev, status);
 if (vdev->status == 0)
 virtio_reset(vdev);
 break;
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index bcd40f7..fd9759a 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -206,7 +206,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
 virtio_queue_notify(vdev, val);
 break;
 case VIRTIO_PCI_STATUS:
-vdev->status = val & 0xFF;
+virtio_set_status(vdev, val & 0xFF);
 if (vdev->status == 0) {
 virtio_reset(proxy->vdev);
 msix_unuse_all_vectors(&proxy->pci_dev);
@@ -377,7 +377,8 @@ static void virtio_write_config(PCIDevice *pci_dev, 
uint32_t address,
 
 if (PCI_COMMAND == address) {
 if (!(val & PCI_COMMAND_MASTER)) {
-proxy->vdev->status &= ~VIRTIO_CONFIG_S_DRIVER_OK;
+virtio_set_status(proxy->vdev,
+  proxy->vdev->status & 
~VIRTIO_CONFIG_S_DRIVER_OK);
 }
 }
 
diff --git a/hw/virtio.h b/hw/virtio.h
index 26cf5fd..58b06bf 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -115,12 +115,21 @@ struct VirtIODevice
 void (*get_config)(VirtIODevice *vdev, uint8_t *config);
 void (*set_config)(VirtIODevice *vdev, const uint8_t *config);
 void (*reset)(VirtIODevice *vdev);
+void (*set_status)(VirtIODevice *vdev, uint8_t val);
 VirtQueue *vq;
 const VirtIOBindings *binding;
 void *binding_opaque;
 uint16_t device_id;
 };
 
+static inline void virtio_set_status(VirtIODevice *vdev, uint8_t val)
+{
+if (vdev->set_status) {
+vdev->set_status(vdev, val);
+}
+vdev->status = val;
+}
+
 VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
 void (*handle_output)(VirtIODevice *,
   VirtQueue *));
-- 
1.7.0.18.g0d53a5





[Qemu-devel] [PATCHv3 07/12] virtio: move typedef to qemu-common

2010-03-02 Thread Michael S. Tsirkin
make it possible to use type without header include

Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio.h   |1 -
 qemu-common.h |1 +
 2 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/virtio.h b/hw/virtio.h
index 58b06bf..6f2fab0 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -69,7 +69,6 @@ static inline target_phys_addr_t 
vring_align(target_phys_addr_t addr,
 }
 
 typedef struct VirtQueue VirtQueue;
-typedef struct VirtIODevice VirtIODevice;
 
 #define VIRTQUEUE_MAX_SIZE 1024
 
diff --git a/qemu-common.h b/qemu-common.h
index f12a8f5..90ca3b8 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -228,6 +228,7 @@ typedef struct I2SCodec I2SCodec;
 typedef struct DeviceState DeviceState;
 typedef struct SSIBus SSIBus;
 typedef struct EventNotifier EventNotifier;
+typedef struct VirtIODevice VirtIODevice;
 
 typedef uint64_t pcibus_t;
 
-- 
1.7.0.18.g0d53a5





[Qemu-devel] [PATCHv3 08/12] virtio-pci: fill in notifier support

2010-03-02 Thread Michael S. Tsirkin
Support host/guest notifiers in virtio-pci.
The last one only with kvm, that's okay
because vhost relies on kvm anyway.

Note on kvm usage: kvm ioeventfd API
is implemented on non-kvm systems as well,
this is the reason we don't need if (kvm_enabled())
around it.

Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio-pci.c |   62 +++
 1 files changed, 62 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index fd9759a..596f056 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -24,6 +24,7 @@
 #include "net.h"
 #include "block_int.h"
 #include "loader.h"
+#include "kvm.h"
 
 /* from Linux's linux/virtio_pci.h */
 
@@ -392,6 +393,65 @@ static unsigned virtio_pci_get_features(void *opaque)
 return proxy->host_features;
 }
 
+static void virtio_pci_guest_notifier_read(void *opaque)
+{
+VirtQueue *vq = opaque;
+EventNotifier *n = virtio_queue_guest_notifier(vq);
+if (event_notifier_test_and_clear(n)) {
+virtio_irq(vq);
+}
+}
+
+static int virtio_pci_guest_notifier(void *opaque, int n, bool assign)
+{
+VirtIOPCIProxy *proxy = opaque;
+VirtQueue *vq = virtio_queue(proxy->vdev, n);
+EventNotifier *notifier = virtio_queue_guest_notifier(vq);
+
+if (assign) {
+int r = event_notifier_init(notifier, 0);
+   if (r < 0)
+   return r;
+qemu_set_fd_handler(event_notifier_get_fd(notifier),
+virtio_pci_guest_notifier_read, NULL, vq);
+} else {
+qemu_set_fd_handler(event_notifier_get_fd(notifier),
+NULL, NULL, NULL);
+event_notifier_cleanup(notifier);
+}
+
+return 0;
+}
+
+static int virtio_pci_host_notifier(void *opaque, int n, bool assign)
+{
+VirtIOPCIProxy *proxy = opaque;
+VirtQueue *vq = virtio_queue(proxy->vdev, n);
+EventNotifier *notifier = virtio_queue_host_notifier(vq);
+int r;
+if (assign) {
+r = event_notifier_init(notifier, 1);
+if (r < 0) {
+return r;
+}
+r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier),
+   proxy->addr + VIRTIO_PCI_QUEUE_NOTIFY,
+   n, assign);
+if (r < 0) {
+event_notifier_cleanup(notifier);
+}
+} else {
+r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier),
+   proxy->addr + VIRTIO_PCI_QUEUE_NOTIFY,
+   n, assign);
+if (r < 0) {
+return r;
+}
+event_notifier_cleanup(notifier);
+}
+return r;
+}
+
 static const VirtIOBindings virtio_pci_bindings = {
 .notify = virtio_pci_notify,
 .save_config = virtio_pci_save_config,
@@ -399,6 +459,8 @@ static const VirtIOBindings virtio_pci_bindings = {
 .save_queue = virtio_pci_save_queue,
 .load_queue = virtio_pci_load_queue,
 .get_features = virtio_pci_get_features,
+.host_notifier = virtio_pci_host_notifier,
+.guest_notifier = virtio_pci_guest_notifier,
 };
 
 static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev,
-- 
1.7.0.18.g0d53a5





[Qemu-devel] [PATCHv3 09/12] vhost: vhost net support

2010-03-02 Thread Michael S. Tsirkin
This adds vhost net device support in qemu. Will be tied to tap device
and virtio by following patches.  Raw backend is currently missing,
will be worked on/submitted separately.

Signed-off-by: Michael S. Tsirkin 
---
 Makefile.target |2 +
 configure   |   36 +++
 hw/vhost.c  |  706 +++
 hw/vhost.h  |   48 
 hw/vhost_net.c  |  198 
 hw/vhost_net.h  |   19 ++
 6 files changed, 1009 insertions(+), 0 deletions(-)
 create mode 100644 hw/vhost.c
 create mode 100644 hw/vhost.h
 create mode 100644 hw/vhost_net.c
 create mode 100644 hw/vhost_net.h

diff --git a/Makefile.target b/Makefile.target
index 1f5aa9d..f833735 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -173,6 +173,8 @@ obj-y = vl.o async.o monitor.o pci.o pci_host.o pcie_host.o 
machine.o gdbstub.o
 # need to fix this properly
 obj-y += virtio-blk.o virtio-balloon.o virtio-net.o virtio-pci.o 
virtio-serial-bus.o
 obj-y += notifier.o
+obj-y += vhost_net.o
+obj-$(CONFIG_VHOST_NET) += vhost.o
 obj-y += rwhandler.o
 obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
diff --git a/configure b/configure
index 8eb5f5b..66f0335 100755
--- a/configure
+++ b/configure
@@ -87,6 +87,7 @@ libs_softmmu=""
 libs_tools=""
 audio_pt_int=""
 audio_win_int=""
+audio_win_int=""
 
 # parse CC options first
 for opt do
@@ -263,6 +264,7 @@ vnc_tls=""
 vnc_sasl=""
 xen=""
 linux_aio=""
+vhost_net=""
 
 gprof="no"
 debug_tcg="no"
@@ -651,6 +653,10 @@ for opt do
   ;;
   --enable-docs) docs="yes"
   ;;
+  --disable-vhost-net) vhost_net="no"
+  ;;
+  --enable-vhost-net) vhost_net="yes"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1498,6 +1504,32 @@ EOF
 fi
 
 ##
+# test for vhost net
+
+if test "$vhost_net" != "no"; then
+if test "$kvm" != "no"; then
+cat > $TMPC <
+int main(void) { return 0; }
+EOF
+if compile_prog "$kvm_cflags" "" ; then
+vhost_net=yes
+else
+if "$vhost_net" == "yes" ; then
+feature_not_found "vhost-net"
+fi
+vhost_net=no
+fi
+else
+if "$vhost_net" == "yes" ; then
+echo -e "NOTE: vhost-net feature requires KVM (--enable-kvm)."
+feature_not_found "vhost-net"
+fi
+vhost_net=no
+fi
+fi
+
+##
 # pthread probe
 PTHREADLIBS_LIST="-lpthread -lpthreadGC2"
 
@@ -1968,6 +2000,7 @@ echo "fdt support   $fdt"
 echo "preadv support$preadv"
 echo "fdatasync $fdatasync"
 echo "uuid support  $uuid"
+echo "vhost-net support $vhost_net"
 
 if test $sdl_too_old = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -2492,6 +2525,9 @@ case "$target_arch2" in
   if test "$kvm_para" = "yes"; then
 echo "CONFIG_KVM_PARA=y" >> $config_target_mak
   fi
+  if test $vhost_net = "yes" ; then
+echo "CONFIG_VHOST_NET=y" >> $config_target_mak
+  fi
 fi
 esac
 echo "TARGET_PHYS_ADDR_BITS=$target_phys_bits" >> $config_target_mak
diff --git a/hw/vhost.c b/hw/vhost.c
new file mode 100644
index 000..6e6c026
--- /dev/null
+++ b/hw/vhost.c
@@ -0,0 +1,706 @@
+/*
+ * vhost support
+ *
+ * Copyright Red Hat, Inc. 2010
+ *
+ * Authors:
+ *  Michael S. Tsirkin 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include 
+#include 
+#include 
+#include "vhost.h"
+#include "hw/hw.h"
+/* For range_get_last */
+#include "pci.h"
+
+static void vhost_dev_sync_region(struct vhost_dev *dev,
+  uint64_t mfirst, uint64_t mlast,
+  uint64_t rfirst, uint64_t rlast)
+{
+uint64_t start = MAX(mfirst, rfirst);
+uint64_t end = MIN(mlast, rlast);
+vhost_log_chunk_t *from = dev->log + start / VHOST_LOG_CHUNK;
+vhost_log_chunk_t *to = dev->log + end / VHOST_LOG_CHUNK + 1;
+uint64_t addr = (start / VHOST_LOG_CHUNK) * VHOST_LOG_CHUNK;
+
+assert(end / VHOST_LOG_CHUNK < dev->log_size);
+assert(start / VHOST_LOG_CHUNK < dev->log_size);
+if (end < start) {
+return;
+}
+for (;from < to; ++from) {
+vhost_log_chunk_t log;
+int bit;
+/* We first check with non-atomic: much cheaper,
+ * and we expect non-dirty to be the common case. */
+if (!*from) {
+continue;
+}
+/* Data must be read atomically. We don't really
+ * need the barrier semantics of __sync
+ * builtins, but it's easier to use them than
+ * roll our own. */
+log = __sync_fetch_and_and(from, 0);
+while ((bit = sizeof(log) > sizeof(int) ?
+ffsll(log) : ffs(log))) {
+bit -= 1;
+cpu_physical_memory_

[Qemu-devel] [PATCHv3 10/12] tap: add vhost/vhostfd options

2010-03-02 Thread Michael S. Tsirkin
This adds vhost binary option to tap, to enable vhost net accelerator.
Default is off for now, we'll be able to make default on long term
when we know it's stable.

vhostfd option can be used by management, to pass in the fd. Assigning
vhostfd implies vhost=on.

Signed-off-by: Michael S. Tsirkin 
---
 net.c   |8 
 net/tap.c   |   29 +
 qemu-options.hx |4 +++-
 3 files changed, 40 insertions(+), 1 deletions(-)

diff --git a/net.c b/net.c
index a1bf49f..d1e23f1 100644
--- a/net.c
+++ b/net.c
@@ -973,6 +973,14 @@ static const struct {
 .name = "vnet_hdr",
 .type = QEMU_OPT_BOOL,
 .help = "enable the IFF_VNET_HDR flag on the tap interface"
+}, {
+.name = "vhost",
+.type = QEMU_OPT_BOOL,
+.help = "enable vhost-net network accelerator",
+}, {
+.name = "vhostfd",
+.type = QEMU_OPT_STRING,
+.help = "file descriptor of an already opened vhost net 
device",
 },
 #endif /* _WIN32 */
 { /* end of list */ }
diff --git a/net/tap.c b/net/tap.c
index fc59fd4..19c4fa2 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -41,6 +41,8 @@
 
 #include "net/tap-linux.h"
 
+#include "hw/vhost_net.h"
+
 /* Maximum GSO packet size (64k) plus plenty of room for
  * the ethernet and virtio_net headers
  */
@@ -57,6 +59,7 @@ typedef struct TAPState {
 unsigned int has_vnet_hdr : 1;
 unsigned int using_vnet_hdr : 1;
 unsigned int has_ufo: 1;
+VHostNetState *vhost_net;
 } TAPState;
 
 static int launch_script(const char *setup_script, const char *ifname, int fd);
@@ -252,6 +255,10 @@ static void tap_cleanup(VLANClientState *nc)
 {
 TAPState *s = DO_UPCAST(TAPState, nc, nc);
 
+if (s->vhost_net) {
+vhost_net_cleanup(s->vhost_net);
+}
+
 qemu_purge_queued_packets(nc);
 
 if (s->down_script[0])
@@ -307,6 +314,7 @@ static TAPState *net_tap_fd_init(VLANState *vlan,
 s->has_ufo = tap_probe_has_ufo(s->fd);
 tap_set_offload(&s->nc, 0, 0, 0, 0, 0);
 tap_read_poll(s, 1);
+s->vhost_net = NULL;
 return s;
 }
 
@@ -456,5 +464,26 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char 
*name, VLANState *vlan
 }
 }
 
+if (qemu_opt_get_bool(opts, "vhost", !!qemu_opt_get(opts, "vhostfd"))) {
+int vhostfd, r;
+if (qemu_opt_get(opts, "vhostfd")) {
+r = net_handle_fd_param(mon, qemu_opt_get(opts, "vhostfd"));
+if (r == -1) {
+return -1;
+}
+vhostfd = r;
+} else {
+vhostfd = -1;
+}
+s->vhost_net = vhost_net_init(&s->nc, vhostfd);
+if (!s->vhost_net) {
+qemu_error("vhost-net requested but could not be initialized\n");
+return -1;
+}
+} else if (qemu_opt_get(opts, "vhostfd")) {
+qemu_error("vhostfd= is not valid without vhost\n");
+return -1;
+}
+
 return 0;
 }
diff --git a/qemu-options.hx b/qemu-options.hx
index 7daa246..9dc81b4 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -879,7 +879,7 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
 "-net tap[,vlan=n][,name=str],ifname=name\n"
 "connect the host TAP network interface to VLAN 'n'\n"
 #else
-"-net 
tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,sndbuf=nbytes][,vnet_hdr=on|off]\n"
+"-net 
tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,sndbuf=nbytes][,vnet_hdr=on|off][,vhost=on|off][,vhostfd=h]\n"
 "connect the host TAP network interface to VLAN 'n' and 
use the\n"
 "network scripts 'file' (default=" DEFAULT_NETWORK_SCRIPT 
")\n"
 "and 'dfile' (default=" DEFAULT_NETWORK_DOWN_SCRIPT ")\n"
@@ -889,6 +889,8 @@ DEF("net", HAS_ARG, QEMU_OPTION_net,
 "default of 'sndbuf=1048576' can be disabled using 
'sndbuf=0')\n"
 "use vnet_hdr=off to avoid enabling the IFF_VNET_HDR tap 
flag\n"
 "use vnet_hdr=on to make the lack of IFF_VNET_HDR support 
an error condition\n"
+"use vhost=on to enable experimental in kernel 
accelerator\n"
+"use 'vhostfd=h' to connect to an already opened vhost net 
device\n"
 #endif
 "-net 
socket[,vlan=n][,name=str][,fd=h][,listen=[host]:port][,connect=host:port]\n"
 "connect the vlan 'n' to another VLAN using a socket 
connection\n"
-- 
1.7.0.18.g0d53a5





[Qemu-devel] [PATCHv3 12/12] virtio-net: vhost net support

2010-03-02 Thread Michael S. Tsirkin
This connects virtio-net to vhost net backend.
The code is structured in a way analogous to what we have with vnet
header capability in tap.

We start/stop backend on driver start/stop as
well as on save and vm start (for migration).

Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio-net.c |   71 +-
 1 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 5c0093e..9ddd58c 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -17,6 +17,7 @@
 #include "net/tap.h"
 #include "qemu-timer.h"
 #include "virtio-net.h"
+#include "vhost_net.h"
 
 #define VIRTIO_NET_VM_VERSION11
 
@@ -47,6 +48,8 @@ typedef struct VirtIONet
 uint8_t nomulti;
 uint8_t nouni;
 uint8_t nobcast;
+uint8_t vhost_started;
+VMChangeStateEntry *vmstate;
 struct {
 int in_use;
 int first_multi;
@@ -114,6 +117,10 @@ static void virtio_net_reset(VirtIODevice *vdev)
 n->nomulti = 0;
 n->nouni = 0;
 n->nobcast = 0;
+if (n->vhost_started) {
+vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), vdev);
+n->vhost_started = 0;
+}
 
 /* Flush any MAC and VLAN filter table state */
 n->mac_table.in_use = 0;
@@ -172,7 +179,14 @@ static uint32_t virtio_net_get_features(VirtIODevice 
*vdev, uint32_t features)
 features &= ~(0x1 << VIRTIO_NET_F_HOST_UFO);
 }
 
-return features;
+if (!n->nic->nc.peer ||
+n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
+return features;
+}
+if (!tap_get_vhost_net(n->nic->nc.peer)) {
+return features;
+}
+return vhost_net_get_features(tap_get_vhost_net(n->nic->nc.peer), 
features);
 }
 
 static uint32_t virtio_net_bad_features(VirtIODevice *vdev)
@@ -698,6 +712,12 @@ static void virtio_net_save(QEMUFile *f, void *opaque)
 {
 VirtIONet *n = opaque;
 
+if (n->vhost_started) {
+/* TODO: should we really stop the backend?
+ * If we don't, it might keep writing to memory. */
+vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), &n->vdev);
+n->vhost_started = 0;
+}
 virtio_save(&n->vdev, f);
 
 qemu_put_buffer(f, n->mac, ETH_ALEN);
@@ -810,7 +830,6 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int 
version_id)
 qemu_mod_timer(n->tx_timer,
qemu_get_clock(vm_clock) + TX_TIMER_INTERVAL);
 }
-
 return 0;
 }
 
@@ -830,6 +849,47 @@ static NetClientInfo net_virtio_info = {
 .link_status_changed = virtio_net_set_link_status,
 };
 
+static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
+{
+VirtIONet *n = to_virtio_net(vdev);
+if (!n->nic->nc.peer) {
+return;
+}
+if (n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
+return;
+}
+
+if (!tap_get_vhost_net(n->nic->nc.peer)) {
+return;
+}
+if (!!n->vhost_started == !!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+return;
+}
+if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+int r = vhost_net_start(tap_get_vhost_net(n->nic->nc.peer), vdev);
+if (r < 0) {
+fprintf(stderr, "unable to start vhost net: %d: "
+"falling back on userspace virtio\n", -r);
+} else {
+n->vhost_started = 1;
+}
+} else {
+vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), vdev);
+n->vhost_started = 0;
+}
+}
+
+static void virtio_net_vmstate_change(void *opaque, int running, int reason)
+{
+VirtIONet *n = opaque;
+if (!running) {
+return;
+}
+/* This is called when vm is started, it will start vhost backend if
+ * appropriate e.g. after migration. */
+virtio_net_set_status(&n->vdev, n->vdev.status);
+}
+
 VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf)
 {
 VirtIONet *n;
@@ -845,6 +905,7 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf 
*conf)
 n->vdev.set_features = virtio_net_set_features;
 n->vdev.bad_features = virtio_net_bad_features;
 n->vdev.reset = virtio_net_reset;
+n->vdev.set_status = virtio_net_set_status;
 n->rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
 n->tx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_tx);
 n->ctrl_vq = virtio_add_queue(&n->vdev, 64, virtio_net_handle_ctrl);
@@ -867,6 +928,7 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf 
*conf)
 
 register_savevm("virtio-net", virtio_net_id++, VIRTIO_NET_VM_VERSION,
 virtio_net_save, virtio_net_load, n);
+n->vmstate = qemu_add_vm_change_state_handler(virtio_net_vmstate_change, 
n);
 
 return &n->vdev;
 }
@@ -874,6 +936,11 @@ VirtIODevice *virtio_net_init(DeviceState *dev, NICConf 
*conf)
 void virtio_net_exit(VirtIODevice *vdev)
 {
 VirtIONet *n = DO_UPCAST(VirtIONet, vdev, vdev);
+qemu_del_vm_change_state_handler(n->vmstate);
+
+if (n->vhost_started) {
+  

[Qemu-devel] [PATCHv3 11/12] tap: add API to retrieve vhost net header

2010-03-02 Thread Michael S. Tsirkin
will be used by virtio-net for vhost net support

Signed-off-by: Michael S. Tsirkin 
---
 net/tap.c |7 +++
 net/tap.h |3 +++
 2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 19c4fa2..35c05d7 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -487,3 +487,10 @@ int net_init_tap(QemuOpts *opts, Monitor *mon, const char 
*name, VLANState *vlan
 
 return 0;
 }
+
+VHostNetState *tap_get_vhost_net(VLANClientState *nc)
+{
+TAPState *s = DO_UPCAST(TAPState, nc, nc);
+assert(nc->info->type == NET_CLIENT_TYPE_TAP);
+return s->vhost_net;
+}
diff --git a/net/tap.h b/net/tap.h
index a244b28..b8cec83 100644
--- a/net/tap.h
+++ b/net/tap.h
@@ -50,4 +50,7 @@ void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, 
int ecn, int ufo);
 
 int tap_get_fd(VLANClientState *vc);
 
+struct vhost_net;
+struct vhost_net *tap_get_vhost_net(VLANClientState *vc);
+
 #endif /* QEMU_NET_TAP_H */
-- 
1.7.0.18.g0d53a5





[Qemu-devel] Loading RAM from /dev/shm

2010-03-02 Thread Piotr 'deletek' Kupisiewicz
Hi all,
As far as qemu makes use of /dev/shm - we can access RAM of any VM running on 
the system.
This gives some nice possibility of copying that RAM to some physical location 
(disk) and restore it later (something like snapshot, but we can make /dev/shm 
working in diff way and lot more).
Question is how to restore RAM from such file (dump of /dev/shm) ?

It would be really nice for implementing some kind of failover on your great 
(REALLY !) product !

Thanks for help,

Piotr Kupisiewicz
http://ekg2.org/




[Qemu-devel] Re: [PATCH 03/20] eepro100: Fix PXE boot

2010-03-02 Thread Michael S. Tsirkin
On Tue, Mar 02, 2010 at 07:43:36PM +0100, Stefan Weil wrote:
> The phy handling was wrong for PXE, GPXE boot:
> GPXE's eepro100 driver did not detect a valid link.
> 
> This is fixed here.
> 
> V2 - Use UPPER_CASE for enum values
> 
> Signed-off-by: Stefan Weil 

Please repost the whole series, using prefix PATCHv3
and Cc me.

Thanks!




[Qemu-devel] [PATCHv3 01/20] eepro100: Fix compiler errors from debug messages

2010-03-02 Thread Stefan Weil
When debug output was enabled (by defining DEBUG_EEPRO100),
some debug messages resulted in a compiler error.

This is fixed here.

Signed-off-by: Stefan Weil 
---
 hw/eepro100.c |   13 +++--
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index b33dbb8..6580ca8 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -56,7 +56,9 @@
 #define KiB 1024
 
 /* Debug EEPRO100 card. */
-//~ #define DEBUG_EEPRO100
+#if 0
+# define DEBUG_EEPRO100
+#endif
 
 #ifdef DEBUG_EEPRO100
 #define logout(fmt, ...) fprintf(stderr, "EE100\t%-24s" fmt, __func__, ## 
__VA_ARGS__)
@@ -874,9 +876,8 @@ static void action_command(EEPRO100State *s)
 cpu_physical_memory_read(s->cb_address, (uint8_t *)&s->tx, 
sizeof(s->tx));
 uint16_t status = le16_to_cpu(s->tx.status);
 uint16_t command = le16_to_cpu(s->tx.command);
-logout
-("val=0x%02x (cu start), status=0x%04x, command=0x%04x, 
link=0x%08x\n",
- val, status, command, s->tx.link);
+logout("val=(cu start), status=0x%04x, command=0x%04x, link=0x%08x\n",
+   status, command, s->tx.link);
 bool bit_el = ((command & 0x8000) != 0);
 bool bit_s = ((command & 0x4000) != 0);
 bool bit_i = ((command & 0x2000) != 0);
@@ -891,7 +892,7 @@ static void action_command(EEPRO100State *s)
 break;
 case CmdIASetup:
 cpu_physical_memory_read(s->cb_address + 8, &s->conf.macaddr.a[0], 
6);
-TRACE(OTHER, logout("macaddr: %s\n", nic_dump(&s->macaddr[0], 6)));
+TRACE(OTHER, logout("macaddr: %s\n", 
nic_dump(&s->conf.macaddr.a[0], 6)));
 break;
 case CmdConfigure:
 cpu_physical_memory_read(s->cb_address + 8, &s->configuration[0],
@@ -1875,7 +1876,7 @@ static int nic_init(PCIDevice *pci_dev, uint32_t device)
pci_mmio_map);
 
 qemu_macaddr_default_if_unset(&s->conf.macaddr);
-logout("macaddr: %s\n", nic_dump(&s->macaddr[0], 6));
+logout("macaddr: %s\n", nic_dump(&s->conf.macaddr.a[0], 6));
 assert(s->region[1] == 0);
 
 nic_reset(s);
-- 
1.7.0





[Qemu-devel] [PATCHv3 03/20] eepro100: Fix PXE boot

2010-03-02 Thread Stefan Weil
The phy handling was wrong for PXE, GPXE boot:
GPXE's eepro100 driver did not detect a valid link.

This is fixed here.

V2 - Use UPPER_CASE for enum values

Signed-off-by: Stefan Weil 
---
 hw/eepro100.c |   20 ++--
 1 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index 124bc49..f6764cc 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -20,7 +20,7 @@
  * along with this program; if not, see .
  *
  * Tested features (i82559):
- *  PXE boot (i386) no valid link
+ *  PXE boot (i386) ok
  *  Linux networking (i386) ok
  *
  * Untested:
@@ -33,10 +33,6 @@
  * Open Source Software Developer Manual
  */
 
-#if defined(TARGET_I386)
-# warning "PXE boot still not working!"
-#endif
-
 #include  /* offsetof */
 #include 
 #include "hw.h"
@@ -243,6 +239,17 @@ typedef struct {
 bool has_extended_tcb_support;
 } EEPRO100State;
 
+/* Word indices in EEPROM. */
+typedef enum {
+EEPROM_CNFG_MDIX  = 0x03,
+EEPROM_ID = 0x05,
+EEPROM_PHY_ID = 0x06,
+EEPROM_VENDOR_ID  = 0x0c,
+EEPROM_CONFIG_ASF = 0x0d,
+EEPROM_DEVICE_ID  = 0x23,
+EEPROM_SMBUS_ADDR = 0x90,
+} EEPROMOffset;
+
 /* Default values for MDI (PHY) registers */
 static const uint16_t eepro100_mdi_default[] = {
 /* MDI Registers 0 - 6, 7 */
@@ -632,9 +639,10 @@ static void nic_selective_reset(EEPRO100State * s)
 uint16_t *eeprom_contents = eeprom93xx_data(s->eeprom);
 //~ eeprom93xx_reset(s->eeprom);
 memcpy(eeprom_contents, s->conf.macaddr.a, 6);
-eeprom_contents[0xa] = 0x4000;
+eeprom_contents[EEPROM_ID] = 0x4000;
 if (s->device == i82557B || s->device == i82557C)
 eeprom_contents[5] = 0x0100;
+eeprom_contents[EEPROM_PHY_ID] = 1;
 uint16_t sum = 0;
 for (i = 0; i < EEPROM_SIZE - 1; i++) {
 sum += eeprom_contents[i];
-- 
1.7.0





[Qemu-devel] [PATCHv3 02/20] eepro100: Add missing SCB register names

2010-03-02 Thread Stefan Weil
Some system control block registers were addressed
using their offset value. Use symbolic names now
and clean the documentation.

Signed-off-by: Stefan Weil 
---
 hw/eepro100.c |   16 ++--
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index 6580ca8..124bc49 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -125,16 +125,20 @@
 /* Offsets to the various registers.
All accesses need not be longword aligned. */
 enum speedo_offsets {
-SCBStatus = 0,
+SCBStatus = 0,  /* Status Word. */
 SCBAck = 1,
 SCBCmd = 2, /* Rx/Command Unit command and status. */
 SCBIntmask = 3,
 SCBPointer = 4, /* General purpose pointer. */
 SCBPort = 8,/* Misc. commands and operands.  */
-SCBflash = 12, SCBeeprom = 14,  /* EEPROM and flash memory control. */
+SCBflash = 12,  /* Flash memory control. */
+SCBeeprom = 14, /* EEPROM control. */
 SCBCtrlMDI = 16,/* MDI interface control. */
 SCBEarlyRx = 20,/* Early receive byte count. */
-SCBFlow = 24,
+SCBFlow = 24,   /* Flow Control. */
+SCBpmdr = 27,   /* Power Management Driver. */
+SCBgctrl = 28,  /* General Control. */
+SCBgstat = 29,  /* General Status. */
 };
 
 /* A speedo3 transmit buffer descriptor with two buffers... */
@@ -1333,11 +1337,11 @@ static uint8_t eepro100_read1(EEPRO100State * s, 
uint32_t addr)
 case SCBeeprom:
 val = eepro100_read_eeprom(s);
 break;
-case 0x1b: /* PMDR (power management driver register) */
+case SCBpmdr:   /* Power Management Driver Register */
 val = 0;
 TRACE(OTHER, logout("addr=%s val=0x%02x\n", regname(addr), val));
 break;
-case 0x1d: /* general status register */
+case SCBgstat:  /* General Status Register */
 /* 100 Mbps full duplex, valid link */
 val = 0x07;
 TRACE(OTHER, logout("addr=General Status val=%02x\n", val));
@@ -1431,7 +1435,7 @@ static void eepro100_write1(EEPRO100State * s, uint32_t 
addr, uint8_t val)
 case SCBFlow:   /* does not exist on 82557 */
 case SCBFlow + 1:
 case SCBFlow + 2:
-case SCBFlow + 3:
+case SCBpmdr:   /* does not exist on 82557 */
 TRACE(OTHER, logout("addr=%s val=0x%02x\n", regname(addr), val));
 break;
 case SCBeeprom:
-- 
1.7.0





[Qemu-devel] [PATCHv3 05/20] eepro100: Add all supported devices to pci.c

2010-03-02 Thread Stefan Weil
All eepro100 devices work with drivers which
only use basic features.

They were tested with gpxe boot.

Signed-off-by: Stefan Weil 
---
 hw/pci.c |   18 ++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index eb2043e..1ba3f92 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1436,9 +1436,18 @@ void do_pci_info(Monitor *mon, QObject **ret_data)
 
 static const char * const pci_nic_models[] = {
 "ne2k_pci",
+"i82550",
 "i82551",
+"i82557a",
 "i82557b",
+"i82557c",
+"i82558a",
+"i82558b",
+"i82559a",
+"i82559b",
+"i82559c",
 "i82559er",
+"i82562",
 "rtl8139",
 "e1000",
 "pcnet",
@@ -1448,9 +1457,18 @@ static const char * const pci_nic_models[] = {
 
 static const char * const pci_nic_names[] = {
 "ne2k_pci",
+"i82550",
 "i82551",
+"i82557a",
 "i82557b",
+"i82557c",
+"i82558a",
+"i82558b",
+"i82559a",
+"i82559b",
+"i82559c",
 "i82559er",
+"i82562",
 "rtl8139",
 "e1000",
 "pcnet",
-- 
1.7.0





[Qemu-devel] [PATCHv3 07/20] eepro100: Update copyright notice

2010-03-02 Thread Stefan Weil
Signed-off-by: Stefan Weil 
---
 hw/eepro100.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index 8ee4087..6c11209 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -1,15 +1,15 @@
 /*
  * QEMU i8255x (PRO100) emulation
  *
- * Copyright (c) 2006-2007 Stefan Weil
+ * Copyright (C) 2006-2010 Stefan Weil
  *
  * Portions of the code are copies from grub / etherboot eepro100.c
  * and linux e100.c.
  *
- * This program is free software; you can redistribute it and/or modify
+ * This program is free software: you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) version 3 or any later version.
  *
  * This program is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
@@ -17,7 +17,7 @@
  * GNU General Public License for more details.
  *
  * You should have received a copy of the GNU General Public License
- * along with this program; if not, see .
+ * along with this program.  If not, see .
  *
  * Tested features (i82559):
  *  PXE boot (i386) ok
-- 
1.7.0





[Qemu-devel] [PATCHv3 06/20] eepro100: Add TODO list

2010-03-02 Thread Stefan Weil
Signed-off-by: Stefan Weil 
---
 hw/eepro100.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index c072e90..8ee4087 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -31,6 +31,14 @@
  *
  * Intel 8255x 10/100 Mbps Ethernet Controller Family
  * Open Source Software Developer Manual
+ *
+ * TODO:
+ *  * PHY emulation should be separated from nic emulation.
+ *Most nic emulations could share the same phy code.
+ *  * i82550 is untested. It is programmed like the i82559.
+ *  * i82562 is untested. It is programmed like the i82559.
+ *  * Power management (i82558 and later) is not implemented.
+ *  * Wake-on-LAN is not implemented.
  */
 
 #include  /* offsetof */
-- 
1.7.0





[Qemu-devel] [PATCHv3 08/20] eepro100: Add device descriptions

2010-03-02 Thread Stefan Weil
Add descriptions for all devices.
These descriptions are shown when users call
qemu -device ?

Signed-off-by: Stefan Weil 
---
 hw/eepro100.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index 6c11209..d5d3abc 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -1980,6 +1980,7 @@ static int pci_i82562_init(PCIDevice *pci_dev)
 static PCIDeviceInfo eepro100_info[] = {
 {
 .qdev.name = "i82550",
+.qdev.desc = "Intel i82550 Ethernet",
 .qdev.size = sizeof(EEPRO100State),
 .init  = pci_i82550_init,
 .exit  = pci_nic_uninit,
@@ -1990,6 +1991,7 @@ static PCIDeviceInfo eepro100_info[] = {
 },
 },{
 .qdev.name = "i82551",
+.qdev.desc = "Intel i82551 Ethernet",
 .qdev.size = sizeof(EEPRO100State),
 .init  = pci_i82551_init,
 .exit  = pci_nic_uninit,
@@ -2000,6 +2002,7 @@ static PCIDeviceInfo eepro100_info[] = {
 },
 },{
 .qdev.name = "i82557a",
+.qdev.desc = "Intel i82557A Ethernet",
 .qdev.size = sizeof(EEPRO100State),
 .init  = pci_i82557a_init,
 .exit  = pci_nic_uninit,
@@ -2010,6 +2013,7 @@ static PCIDeviceInfo eepro100_info[] = {
 },
 },{
 .qdev.name = "i82557b",
+.qdev.desc = "Intel i82557B Ethernet",
 .qdev.size = sizeof(EEPRO100State),
 .init  = pci_i82557b_init,
 .exit  = pci_nic_uninit,
@@ -2020,6 +2024,7 @@ static PCIDeviceInfo eepro100_info[] = {
 },
 },{
 .qdev.name = "i82557c",
+.qdev.desc = "Intel i82557C Ethernet",
 .qdev.size = sizeof(EEPRO100State),
 .init  = pci_i82557c_init,
 .exit  = pci_nic_uninit,
@@ -2030,6 +2035,7 @@ static PCIDeviceInfo eepro100_info[] = {
 },
 },{
 .qdev.name = "i82558a",
+.qdev.desc = "Intel i82558A Ethernet",
 .qdev.size = sizeof(EEPRO100State),
 .init  = pci_i82558a_init,
 .exit  = pci_nic_uninit,
@@ -2040,6 +2046,7 @@ static PCIDeviceInfo eepro100_info[] = {
 },
 },{
 .qdev.name = "i82558b",
+.qdev.desc = "Intel i82558B Ethernet",
 .qdev.size = sizeof(EEPRO100State),
 .init  = pci_i82558b_init,
 .exit  = pci_nic_uninit,
@@ -2050,6 +2057,7 @@ static PCIDeviceInfo eepro100_info[] = {
 },
 },{
 .qdev.name = "i82559a",
+.qdev.desc = "Intel i82559A Ethernet",
 .qdev.size = sizeof(EEPRO100State),
 .init  = pci_i82559a_init,
 .exit  = pci_nic_uninit,
@@ -2060,6 +2068,7 @@ static PCIDeviceInfo eepro100_info[] = {
 },
 },{
 .qdev.name = "i82559b",
+.qdev.desc = "Intel i82559B Ethernet",
 .qdev.size = sizeof(EEPRO100State),
 .init  = pci_i82559b_init,
 .exit  = pci_nic_uninit,
@@ -2070,6 +2079,7 @@ static PCIDeviceInfo eepro100_info[] = {
 },
 },{
 .qdev.name = "i82559c",
+.qdev.desc = "Intel i82559C Ethernet",
 .qdev.size = sizeof(EEPRO100State),
 .init  = pci_i82559c_init,
 .exit  = pci_nic_uninit,
@@ -2080,6 +2090,7 @@ static PCIDeviceInfo eepro100_info[] = {
 },
 },{
 .qdev.name = "i82559er",
+.qdev.desc = "Intel i82559ER Ethernet",
 .qdev.size = sizeof(EEPRO100State),
 .init  = pci_i82559er_init,
 .exit  = pci_nic_uninit,
@@ -2090,6 +2101,7 @@ static PCIDeviceInfo eepro100_info[] = {
 },
 },{
 .qdev.name = "i82562",
+.qdev.desc = "Intel i82562 Ethernet",
 .qdev.size = sizeof(EEPRO100State),
 .init  = pci_i82562_init,
 .exit  = pci_nic_uninit,
-- 
1.7.0





[Qemu-devel] [PATCHv3 10/20] eepro100: Remove old unused code

2010-03-02 Thread Stefan Weil
This code is no longer needed.

Signed-off-by: Stefan Weil 
---
 hw/eepro100.c |   18 --
 1 files changed, 0 insertions(+), 18 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index a58c640..c2ff569 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -469,24 +469,6 @@ static void pci_reset(EEPRO100State * s)
 PCI_CONFIG_8(PCI_LATENCY_TIMER, 0x20);   // latency timer = 32 clocks
 /* PCI Header Type */
 /* BIST (built-in self test) */
-#if defined(TARGET_I386)
-// !!! workaround for buggy bios
-//~ #define PCI_BASE_ADDRESS_MEM_PREFETCH 0
-#endif
-#if 0
-/* PCI Base Address Registers */
-/* CSR Memory Mapped Base Address */
-PCI_CONFIG_32(PCI_BASE_ADDRESS_0,
-  PCI_BASE_ADDRESS_SPACE_MEMORY |
-  PCI_BASE_ADDRESS_MEM_PREFETCH);
-/* CSR I/O Mapped Base Address */
-PCI_CONFIG_32(PCI_BASE_ADDRESS_1, PCI_BASE_ADDRESS_SPACE_IO);
-#if 0
-/* Flash Memory Mapped Base Address */
-PCI_CONFIG_32(PCI_BASE_ADDRESS_2,
-  0xfffe | PCI_BASE_ADDRESS_SPACE_MEMORY);
-#endif
-#endif
 /* Expansion ROM Base Address (depends on boot disable!!!) */
 /* TODO: not needed, set when BAR is registered */
 PCI_CONFIG_32(PCI_ROM_ADDRESS, PCI_BASE_ADDRESS_SPACE_MEMORY);
-- 
1.7.0





[Qemu-devel] [PATCHv3 14/20] eepro100: Fix CU Start command

2010-03-02 Thread Stefan Weil
CU Start is allowed when the CU is in the idle or suspended state.

Signed-off-by: Stefan Weil 
---
 hw/eepro100.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index 40d8db5..a9bf7a0 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -976,17 +976,17 @@ static void action_command(EEPRO100State *s)
 
 static void eepro100_cu_command(EEPRO100State * s, uint8_t val)
 {
+cu_state_t cu_state;
 switch (val) {
 case CU_NOP:
 /* No operation. */
 break;
 case CU_START:
-if (get_cu_state(s) != cu_idle) {
-/* Intel documentation says that CU must be idle for the CU
- * start command. Intel driver for Linux also starts the CU
- * from suspended state. */
-logout("CU state is %u, should be %u\n", get_cu_state(s), cu_idle);
-//~ assert(!"wrong CU state");
+cu_state = get_cu_state(s);
+if (cu_state != cu_idle && cu_state != cu_suspended) {
+/* Intel documentation says that CU must be idle or suspended
+ * for the CU start command. */
+logout("unexpected CU state is %u\n", cu_state);
 }
 set_cu_state(s, cu_active);
 s->cu_offset = s->pointer;
-- 
1.7.0





[Qemu-devel] [PATCHv3 13/20] eepro100: Support RNR interrupt

2010-03-02 Thread Stefan Weil
The RNR interrupt is triggered under these conditions:

* the RU is not ready to receive a frame due to missing resources
* the RU is ready and a RU abort command was requested

Signed-off-by: Stefan Weil 
---
 hw/eepro100.c |   13 ++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index 7eaa876..40d8db5 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -120,7 +120,7 @@
 #define  RU_NOP 0x
 #define  RX_START   0x0001
 #define  RX_RESUME  0x0002
-#define  RX_ABORT   0x0004
+#define  RU_ABORT   0x0004
 #define  RX_ADDR_LOAD   0x0006
 #define  RX_RESUMENR0x0007
 #define INT_MASK0x0100
@@ -426,13 +426,11 @@ static void eepro100_fr_interrupt(EEPRO100State * s)
 eepro100_interrupt(s, 0x40);
 }
 
-#if 0
 static void eepro100_rnr_interrupt(EEPRO100State * s)
 {
 /* RU is not ready. */
 eepro100_interrupt(s, 0x10);
 }
-#endif
 
 static void eepro100_mdi_interrupt(EEPRO100State * s)
 {
@@ -1065,6 +1063,13 @@ static void eepro100_ru_command(EEPRO100State * s, 
uint8_t val)
 }
 set_ru_state(s, ru_ready);
 break;
+case RU_ABORT:
+/* RU abort. */
+if (get_ru_state(s) == ru_ready) {
+eepro100_rnr_interrupt(s);
+}
+set_ru_state(s, ru_idle);
+break;
 case RX_ADDR_LOAD:
 /* Load RU base. */
 TRACE(OTHER, logout("val=0x%02x (RU base address)\n", val));
@@ -1747,6 +1752,8 @@ static ssize_t nic_receive(VLANClientState *nc, const 
uint8_t * buf, size_t size
 if (get_ru_state(s) != ru_ready) {
 /* No resources available. */
 logout("no resources, state=%u\n", get_ru_state(s));
+/* TODO: RNR interrupt only at first failed frame? */
+eepro100_rnr_interrupt(s);
 s->statistics.rx_resource_errors++;
 //~ assert(!"no resources");
 return -1;
-- 
1.7.0





[Qemu-devel] [PATCHv3 11/20] eepro100: Use symbolic names for bits in EEPROM id

2010-03-02 Thread Stefan Weil
V2 - Use UPPER_CASE for enum values

Signed-off-by: Stefan Weil 
---
 hw/eepro100.c |   17 -
 1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index c2ff569..2a871b8 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -272,6 +272,21 @@ typedef enum {
 EEPROM_SMBUS_ADDR = 0x90,
 } EEPROMOffset;
 
+/* Bit values for EEPROM ID word. */
+typedef enum {
+EEPROM_ID_MDM = BIT(0), /* Modem */
+EEPROM_ID_STB = BIT(1), /* Standby Enable */
+EEPROM_ID_WMR = BIT(2), /* ??? */
+EEPROM_ID_WOL = BIT(5), /* Wake on LAN */
+EEPROM_ID_DPD = BIT(6), /* Deep Power Down */
+EEPROM_ID_ALT = BIT(7), /* */
+/* BITS(10, 8) device revision */
+EEPROM_ID_BD = BIT(11), /* boot disable */
+EEPROM_ID_ID = BIT(13), /* id bit */
+/* BITS(15, 14) signature */
+EEPROM_ID_VALID = BIT(14),  /* signature for valid eeprom */
+} eeprom_id_bit;
+
 /* Default values for MDI (PHY) registers */
 static const uint16_t eepro100_mdi_default[] = {
 /* MDI Registers 0 - 6, 7 */
@@ -643,7 +658,7 @@ static void nic_selective_reset(EEPRO100State * s)
 uint16_t *eeprom_contents = eeprom93xx_data(s->eeprom);
 //~ eeprom93xx_reset(s->eeprom);
 memcpy(eeprom_contents, s->conf.macaddr.a, 6);
-eeprom_contents[EEPROM_ID] = 0x4000;
+eeprom_contents[EEPROM_ID] = EEPROM_ID_VALID;
 if (s->device == i82557B || s->device == i82557C)
 eeprom_contents[5] = 0x0100;
 eeprom_contents[EEPROM_PHY_ID] = 1;
-- 
1.7.0





[Qemu-devel] [PATCHv3 12/20] eepro100: Replace variable name to fix a compiler warning

2010-03-02 Thread Stefan Weil
When compiling with -Wshadow, gcc gives a warning
which is fixed by renaming stat -> status.

Signed-off-by: Stefan Weil 
---
 hw/eepro100.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index 2a871b8..7eaa876 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -392,14 +392,14 @@ static void eepro100_acknowledge(EEPRO100State * s)
 }
 }
 
-static void eepro100_interrupt(EEPRO100State * s, uint8_t stat)
+static void eepro100_interrupt(EEPRO100State * s, uint8_t status)
 {
 uint8_t mask = ~s->mem[SCBIntmask];
-s->mem[SCBAck] |= stat;
-stat = s->scb_stat = s->mem[SCBAck];
-stat &= (mask | 0x0f);
-//~ stat &= (~s->mem[SCBIntmask] | 0x0xf);
-if (stat && (mask & 0x01)) {
+s->mem[SCBAck] |= status;
+status = s->scb_stat = s->mem[SCBAck];
+status &= (mask | 0x0f);
+//~ status &= (~s->mem[SCBIntmask] | 0x0xf);
+if (status && (mask & 0x01)) {
 /* SCB mask and SCB Bit M do not disable interrupt. */
 enable_interrupt(s);
 } else if (s->int_stat) {
-- 
1.7.0





[Qemu-devel] [PATCHv3 09/20] eepro100: Use symbolic names and BIT macros in binary operations

2010-03-02 Thread Stefan Weil
Instead of magic numbers like 0x8000, symbolic names are used
for the SCB command and status bits.

There are too many configuration bits to use symbolic names
there, too. Using the BIT macro is a little help when comparing
code and documentation.

For the same reason, some other constants were replaced by
the BITS macro.

Signed-off-by: Stefan Weil 
---
 hw/eepro100.c |   52 +---
 1 files changed, 33 insertions(+), 19 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index d5d3abc..a58c640 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -172,6 +172,20 @@ typedef struct {
 char packet[MAX_ETH_FRAME_SIZE + 4];
 } eepro100_rx_t;
 
+typedef enum {
+COMMAND_EL = BIT(15),
+COMMAND_S = BIT(14),
+COMMAND_I = BIT(13),
+COMMAND_NC = BIT(4),
+COMMAND_SF = BIT(3),
+COMMAND_CMD = BITS(2, 0),
+} scb_command_bit;
+
+typedef enum {
+STATUS_C = BIT(15),
+STATUS_OK = BIT(13),
+} scb_status_bit;
+
 typedef struct {
 uint32_t tx_good_frames, tx_max_collisions, tx_late_collisions,
 tx_underruns, tx_lost_crs, tx_deferred, tx_single_collisions,
@@ -753,22 +767,22 @@ enum commands {
 
 static cu_state_t get_cu_state(EEPRO100State * s)
 {
-return ((s->mem[SCBStatus] >> 6) & 0x03);
+return ((s->mem[SCBStatus] & BITS(7, 6)) >> 6);
 }
 
 static void set_cu_state(EEPRO100State * s, cu_state_t state)
 {
-s->mem[SCBStatus] = (s->mem[SCBStatus] & 0x3f) + (state << 6);
+s->mem[SCBStatus] = (s->mem[SCBStatus] & ~BITS(7, 6)) + (state << 6);
 }
 
 static ru_state_t get_ru_state(EEPRO100State * s)
 {
-return ((s->mem[SCBStatus] >> 2) & 0x0f);
+return ((s->mem[SCBStatus] & BITS(5, 2)) >> 2);
 }
 
 static void set_ru_state(EEPRO100State * s, ru_state_t state)
 {
-s->mem[SCBStatus] = (s->mem[SCBStatus] & 0xc3) + (state << 2);
+s->mem[SCBStatus] = (s->mem[SCBStatus] & ~BITS(5, 2)) + (state << 2);
 }
 
 static void dump_statistics(EEPRO100State * s)
@@ -898,13 +912,13 @@ static void action_command(EEPRO100State *s)
 uint16_t command = le16_to_cpu(s->tx.command);
 logout("val=(cu start), status=0x%04x, command=0x%04x, link=0x%08x\n",
status, command, s->tx.link);
-bool bit_el = ((command & 0x8000) != 0);
-bool bit_s = ((command & 0x4000) != 0);
-bool bit_i = ((command & 0x2000) != 0);
-bool bit_nc = ((command & 0x0010) != 0);
+bool bit_el = ((command & COMMAND_EL) != 0);
+bool bit_s = ((command & COMMAND_S) != 0);
+bool bit_i = ((command & COMMAND_I) != 0);
+bool bit_nc = ((command & COMMAND_NC) != 0);
 bool success = true;
-//~ bool bit_sf = ((command & 0x0008) != 0);
-uint16_t cmd = command & 0x0007;
+//~ bool bit_sf = ((command & COMMAND_SF) != 0);
+uint16_t cmd = command & COMMAND_CMD;
 s->cu_offset = le32_to_cpu(s->tx.link);
 switch (cmd) {
 case CmdNOp:
@@ -941,7 +955,7 @@ static void action_command(EEPRO100State *s)
 break;
 }
 /* Write new status. */
-stw_phys(s->cb_address, status | 0x8000 | (success ? 0x2000 : 0));
+stw_phys(s->cb_address, status | STATUS_C | (success ? STATUS_OK : 0));
 if (bit_i) {
 /* CU completed action. */
 eepro100_cx_interrupt(s);
@@ -1684,13 +1698,13 @@ static ssize_t nic_receive(VLANClientState *nc, const 
uint8_t * buf, size_t size
 /* CSMA is disabled. */
 logout("%p received while CSMA is disabled\n", s);
 return -1;
-} else if (size < 64 && (s->configuration[7] & 1)) {
+} else if (size < 64 && (s->configuration[7] & BIT(0))) {
 /* Short frame and configuration byte 7/0 (discard short receive) set:
  * Short frame is discarded */
 logout("%p received short frame (%zu byte)\n", s, size);
 s->statistics.rx_short_frame_errors++;
 //~ return -1;
-} else if ((size > MAX_ETH_FRAME_SIZE + 4) && !(s->configuration[18] & 8)) 
{
+} else if ((size > MAX_ETH_FRAME_SIZE + 4) && !(s->configuration[18] & 
BIT(3))) {
 /* Long frame and configuration byte 18/3 (long receive ok) not set:
  * Long frames are discarded. */
 logout("%p received long frame (%zu byte), ignored\n", s, size);
@@ -1713,7 +1727,7 @@ static ssize_t nic_receive(VLANClientState *nc, const 
uint8_t * buf, size_t size
   assert(mcast_idx < 64);
   if (s->mult[mcast_idx >> 3] & (1 << (mcast_idx & 7))) {
 /* Multicast frame is allowed in hash table. */
-  } else if (s->configuration[15] & 1) {
+  } else if (s->configuration[15] & BIT(0)) {
   /* Promiscuous: receive all. */
   rfd_status |= 0x0004;
   } else {
@@ -1723,7 +1737,7 @@ static ssize_t nic_receive(VLANClientState *nc, const 
uint8_t * buf, size_t size
 }
 /* TODO: Next not for promiscuous mode? */
 rfd_status |= 0x0002;
-} else 

[Qemu-devel] [PATCHv3 16/20] eepro100: Use tx.status

2010-03-02 Thread Stefan Weil
There is no need for a local variable "status".
Using tx.status makes it clearer which status
is addressed.

Signed-off-by: Stefan Weil 
---
 hw/eepro100.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index cf4e0ac..f5aa306 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -903,10 +903,10 @@ static void action_command(EEPRO100State *s)
 for (;;) {
 s->cb_address = s->cu_base + s->cu_offset;
 cpu_physical_memory_read(s->cb_address, (uint8_t *)&s->tx, 
sizeof(s->tx));
-uint16_t status = le16_to_cpu(s->tx.status);
 uint16_t command = le16_to_cpu(s->tx.command);
+s->tx.status = le16_to_cpu(s->tx.status);
 logout("val=(cu start), status=0x%04x, command=0x%04x, link=0x%08x\n",
-   status, command, s->tx.link);
+   s->tx.status, command, s->tx.link);
 bool bit_el = ((command & COMMAND_EL) != 0);
 bool bit_s = ((command & COMMAND_S) != 0);
 bool bit_i = ((command & COMMAND_I) != 0);
@@ -950,7 +950,7 @@ static void action_command(EEPRO100State *s)
 break;
 }
 /* Write new status. */
-stw_phys(s->cb_address, status | STATUS_C | (success ? STATUS_OK : 0));
+stw_phys(s->cb_address, s->tx.status | STATUS_C | (success ? STATUS_OK 
: 0));
 if (bit_i) {
 /* CU completed action. */
 eepro100_cx_interrupt(s);
-- 
1.7.0





[Qemu-devel] [PATCHv3 15/20] eepro100: Prettify code (no functional changes)

2010-03-02 Thread Stefan Weil
* Fix indentation.

Signed-off-by: Stefan Weil 
---
 hw/eepro100.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index a9bf7a0..cf4e0ac 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -188,11 +188,11 @@ typedef enum {
 
 typedef struct {
 uint32_t tx_good_frames, tx_max_collisions, tx_late_collisions,
-tx_underruns, tx_lost_crs, tx_deferred, tx_single_collisions,
-tx_multiple_collisions, tx_total_collisions;
+ tx_underruns, tx_lost_crs, tx_deferred, tx_single_collisions,
+ tx_multiple_collisions, tx_total_collisions;
 uint32_t rx_good_frames, rx_crc_errors, rx_alignment_errors,
-rx_resource_errors, rx_overrun_errors, rx_cdt_errors,
-rx_short_frame_errors;
+ rx_resource_errors, rx_overrun_errors, rx_cdt_errors,
+ rx_short_frame_errors;
 uint32_t fc_xmt_pause, fc_rcv_pause, fc_rcv_unsupported;
 uint16_t xmt_tco_frames, rcv_tco_frames;
 /* TODO: i82559 has six reserved statistics but a total of 24 dwords. */
-- 
1.7.0





[Qemu-devel] [PATCHv3 17/20] eepro100: New function for reading command block

2010-03-02 Thread Stefan Weil
Move code which reads the command block to the
new function read_cb. The patch also fixes some
endianess issues related to the command block
and moves declarations of local variables to
the beginning of the block.

Signed-off-by: Stefan Weil 
---
 hw/eepro100.c |   42 --
 1 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index f5aa306..e10ce62 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -798,6 +798,16 @@ static void dump_statistics(EEPRO100State * s)
 //~ missing("CU dump statistical counters");
 }
 
+static void read_cb(EEPRO100State *s)
+{
+cpu_physical_memory_read(s->cb_address, (uint8_t *) &s->tx, sizeof(s->tx));
+s->tx.status = le16_to_cpu(s->tx.status);
+s->tx.command = le16_to_cpu(s->tx.command);
+s->tx.link = le32_to_cpu(s->tx.link);
+s->tx.tbd_array_addr = le32_to_cpu(s->tx.tbd_array_addr);
+s->tx.tcb_bytes = le16_to_cpu(s->tx.tcb_bytes);
+}
+
 static void tx_command(EEPRO100State *s)
 {
 uint32_t tbd_array = le32_to_cpu(s->tx.tbd_array_addr);
@@ -901,21 +911,25 @@ static void set_multicast_list(EEPRO100State *s)
 static void action_command(EEPRO100State *s)
 {
 for (;;) {
-s->cb_address = s->cu_base + s->cu_offset;
-cpu_physical_memory_read(s->cb_address, (uint8_t *)&s->tx, 
sizeof(s->tx));
-uint16_t command = le16_to_cpu(s->tx.command);
-s->tx.status = le16_to_cpu(s->tx.status);
-logout("val=(cu start), status=0x%04x, command=0x%04x, link=0x%08x\n",
-   s->tx.status, command, s->tx.link);
-bool bit_el = ((command & COMMAND_EL) != 0);
-bool bit_s = ((command & COMMAND_S) != 0);
-bool bit_i = ((command & COMMAND_I) != 0);
-bool bit_nc = ((command & COMMAND_NC) != 0);
+bool bit_el;
+bool bit_s;
+bool bit_i;
+bool bit_nc;
 bool success = true;
-//~ bool bit_sf = ((command & COMMAND_SF) != 0);
-uint16_t cmd = command & COMMAND_CMD;
-s->cu_offset = le32_to_cpu(s->tx.link);
-switch (cmd) {
+s->cb_address = s->cu_base + s->cu_offset;
+read_cb(s);
+bit_el = ((s->tx.command & COMMAND_EL) != 0);
+bit_s = ((s->tx.command & COMMAND_S) != 0);
+bit_i = ((s->tx.command & COMMAND_I) != 0);
+bit_nc = ((s->tx.command & COMMAND_NC) != 0);
+#if 0
+bool bit_sf = ((s->tx.command & COMMAND_SF) != 0);
+#endif
+s->cu_offset = s->tx.link;
+TRACE(OTHER,
+  logout("val=(cu start), status=0x%04x, command=0x%04x, 
link=0x%08x\n",
+ s->tx.status, s->tx.command, s->tx.link));
+switch (s->tx.command & COMMAND_CMD) {
 case CmdNOp:
 /* Do nothing. */
 break;
-- 
1.7.0





[Qemu-devel] [PATCHv3 18/20] eepro100: Add diagnose command

2010-03-02 Thread Stefan Weil
Real hardware would run an internal self-test.
The emulation just returns a passed status.

Original patch was from Reimar Döffinger, thanks.

Signed-off-by: Stefan Weil 
---
 hw/eepro100.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index e10ce62..0f07b70 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -958,6 +958,11 @@ static void action_command(EEPRO100State *s)
 /* Starting with offset 8, the command contains
  * 64 dwords microcode which we just ignore here. */
 break;
+case CmdDiagnose:
+TRACE(OTHER, logout("diagnose\n"));
+/* Make sure error flag is not set. */
+s->tx.status = 0;
+break;
 default:
 missing("undefined command");
 success = false;
-- 
1.7.0





[Qemu-devel] [PATCHv3 20/20] eepro100: Keep includes sorted

2010-03-02 Thread Stefan Weil
I always try to keep standard includes sorted
and add a comment why they are there (so they
can be removed when they are no longer needed).

Signed-off-by: Stefan Weil 
---
 hw/eepro100.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index 43a9060..45ab497 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -41,8 +41,8 @@
  *  * Wake-on-LAN is not implemented.
  */
 
+#include /* bool */
 #include  /* offsetof */
-#include 
 #include "hw.h"
 #include "pci.h"
 #include "net.h"
-- 
1.7.0





[Qemu-devel] [PATCHv3 19/20] eepro100: Remove C++ comments

2010-03-02 Thread Stefan Weil
C++ comments are unwanted, so this is fixed here.

* Replace C++ comments by C comments.
* Put code which was deactivated by a C++ comment in #if 0...#endif.

Signed-off-by: Stefan Weil 
---
 hw/eepro100.c |  185 +++--
 1 files changed, 126 insertions(+), 59 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index 0f07b70..43a9060 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -154,11 +154,13 @@ typedef struct {
 uint16_t tcb_bytes; /* transmit command block byte count (in lower 
14 bits */
 uint8_t tx_threshold;   /* transmit threshold */
 uint8_t tbd_count;  /* TBD number */
-//~ /* This constitutes two "TBD" entries: hdr and data */
-//~ uint32_t tx_buf_addr0;  /* void *, header of frame to be transmitted.  
*/
-//~ int32_t  tx_buf_size0;  /* Length of Tx hdr. */
-//~ uint32_t tx_buf_addr1;  /* void *, data to be transmitted.  */
-//~ int32_t  tx_buf_size1;  /* Length of Tx data. */
+#if 0
+/* This constitutes two "TBD" entries: hdr and data */
+uint32_t tx_buf_addr0;  /* void *, header of frame to be transmitted.  */
+int32_t  tx_buf_size0;  /* Length of Tx hdr. */
+uint32_t tx_buf_addr1;  /* void *, data to be transmitted.  */
+int32_t  tx_buf_size1;  /* Length of Tx data. */
+#endif
 } eepro100_tx_t;
 
 /* Receive frame descriptor. */
@@ -398,7 +400,9 @@ static void eepro100_interrupt(EEPRO100State * s, uint8_t 
status)
 s->mem[SCBAck] |= status;
 status = s->scb_stat = s->mem[SCBAck];
 status &= (mask | 0x0f);
-//~ status &= (~s->mem[SCBIntmask] | 0x0xf);
+#if 0
+status &= (~s->mem[SCBIntmask] | 0x0xf);
+#endif
 if (status && (mask & 0x01)) {
 /* SCB mask and SCB Bit M do not disable interrupt. */
 enable_interrupt(s);
@@ -477,9 +481,11 @@ static void pci_reset(EEPRO100State * s)
 pci_config_set_class(pci_conf, PCI_CLASS_NETWORK_ETHERNET);
 /* PCI Cache Line Size */
 /* check cache line size!!! */
-//~ PCI_CONFIG_8(0x0c, 0x00);
+#if 0
+PCI_CONFIG_8(0x0c, 0x00);
+#endif
 /* PCI Latency Timer */
-PCI_CONFIG_8(PCI_LATENCY_TIMER, 0x20);   // latency timer = 32 clocks
+PCI_CONFIG_8(PCI_LATENCY_TIMER, 0x20);   /* latency timer = 32 clocks */
 /* PCI Header Type */
 /* BIST (built-in self test) */
 /* Expansion ROM Base Address (depends on boot disable!!!) */
@@ -492,7 +498,7 @@ static void pci_reset(EEPRO100State * s)
 /* Interrupt Line */
 /* Interrupt Pin */
 /* TODO: RST# value should be 0 */
-PCI_CONFIG_8(PCI_INTERRUPT_PIN, 1);  // interrupt pin 0
+PCI_CONFIG_8(PCI_INTERRUPT_PIN, 1);  /* interrupt pin 0 */
 /* Minimum Grant */
 PCI_CONFIG_8(PCI_MIN_GNT, 0x08);
 /* Maximum Latency */
@@ -500,20 +506,20 @@ static void pci_reset(EEPRO100State * s)
 
 switch (device) {
 case i82550:
-// TODO: check device id.
+/* TODO: check device id. */
 pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_INTEL_82551IT);
 /* Revision ID: 0x0c, 0x0d, 0x0e. */
 PCI_CONFIG_8(PCI_REVISION_ID, 0x0e);
-// TODO: check size of statistical counters.
+/* TODO: check size of statistical counters. */
 s->stats_size = 80;
-// TODO: check extended tcb support.
+/* TODO: check extended tcb support. */
 s->has_extended_tcb_support = 1;
 break;
 case i82551:
 pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_INTEL_82551IT);
 /* Revision ID: 0x0f, 0x10. */
 PCI_CONFIG_8(PCI_REVISION_ID, 0x0f);
-// TODO: check size of statistical counters.
+/* TODO: check size of statistical counters. */
 s->stats_size = 80;
 s->has_extended_tcb_support = 1;
 break;
@@ -572,7 +578,7 @@ static void pci_reset(EEPRO100State * s)
 PCI_CONFIG_16(PCI_STATUS, PCI_STATUS_DEVSEL_MEDIUM |
   PCI_STATUS_FAST_BACK | PCI_STATUS_CAP_LIST);
 PCI_CONFIG_8(PCI_REVISION_ID, 0x08);
-// TODO: Windows wants revision id 0x0c.
+/* TODO: Windows wants revision id 0x0c. */
 PCI_CONFIG_8(PCI_REVISION_ID, 0x0c);
 #if EEPROM_SIZE > 0
 PCI_CONFIG_16(PCI_SUBSYSTEM_VENDOR_ID, 0x8086);
@@ -590,7 +596,7 @@ static void pci_reset(EEPRO100State * s)
 s->has_extended_tcb_support = 1;
 break;
 case i82562:
-// TODO: check device id.
+/* TODO: check device id. */
 pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_INTEL_82551IT);
 /* TODO: wrong revision id. */
 PCI_CONFIG_8(PCI_REVISION_ID, 0x0e);
@@ -637,14 +643,16 @@ static void pci_reset(EEPRO100State * s)
 
 #if EEPROM_SIZE > 0
 if (device == i82557C || device == i82558B || device == i82559C) {
-// TODO: get vendor id from EEPROM for i82557C or later.
-// TODO: get device id from EEPROM for i82557C or later.
-// TODO: status bit 4 can be disabled by EEPROM fo

Re: [Qemu-devel] tcg/arm fixes and improvements

2010-03-02 Thread andrzej zaborowski
On 1 March 2010 22:33, Aurelien Jarno  wrote:
> This patch series fix a bug in div2/divu2 ops, implement setcond
> and setcond2 ops and improve brcond/setcond by allowing immediate
> constants.

Thanks, pushed the four changes + added a missing break after
setcond2.  I'm not totally sure but I think setcond can be shortened
to two instructions for each case or some cases, the COND_NE case
could look like:

subs dest, t1, t2
movne dest, #1

and the lesser/greater cases can use the carry flag and "adc".

Cheers




Re: [Qemu-devel] Re: [PATCHv2 10/12] tap: add vhost/vhostfd options

2010-03-02 Thread Paul Brook
> The new function I'm proposing has the following semantics:
> 
> - it always returns a persistent mapping
> - it never bounces
> - it will only fail if the mapping isn't ram

So you're assuming that virtio rings are in ram that is not hot-pluggable or 
remapable, and the whole region is contiguous?
That sounds like it's likely to come back and bite you. The guest has no idea 
which areas of ram happen to be contiguous on the host.

Paul




Re: [Qemu-devel] Intel PXA270 System-on-chip emulation in QEMU

2010-03-02 Thread andrzej zaborowski
Hi Taimoor,

On 1 March 2010 09:01, Taimoor Mirza  wrote:
> I want to know about Intel PXA270 System-on-chip emulation in QEMU. Does
> this emulation includes "USB client controller" emulation?

If you mean usb "slave" or "device" support, no, it's not supported by
Qemu's PXA270 emulator.  There is a framework for this used by another
System-on-chip in qemu, so it can be implemented for PXA270 too.

Cheers




Re: [Qemu-devel] [PATCH] Fix curses return key when using -k

2010-03-02 Thread andrzej zaborowski
On 28 February 2010 15:35, Samuel Thibault  wrote:
> Hello,
>
> There is a small incoherency in curses_keys.h, which makes it fail to
> emit \n when using e.g. -k fr: curses2keysym transforms \r and 0x157
> into \n, but name2keysym binds \r with Return, not \n.  The patch below
> fixes that.

Thanks, applied both patches.

Cheers




[Qemu-devel] VGA doesn't work on isapc target

2010-03-02 Thread Agarwal, Lomesh
I have downloaded qemu 0.12.3 and successfully compiled it on Ubuntu 9.10
I can boot Linux disk image from qemu download site.
Now I want to boot a compiled Linux image for 486 on a 486 target. For this 
purpose I use the following command -
qemu -M isapc -kernel /usr/src/linux-source-2.6.31/arch/x86/boot/bzImage
I don't see anything on the target console. While debugging I found that if I 
enable PCI (by passing 1 as last parameter to pc_init1 in hw/pc.c file), it 
works fine.
I tried using std vga in place of cirrus vga but that doesn't work either.
Any help or pointer to debug?




[Qemu-devel] Re: [PATCH] Added Cortex_a8_v3 version, and added a shift instruction for NEON.

2010-03-02 Thread George Karavaev

Here's the patch after suggested changes.

---
target-arm/cpu.h   |1 +
target-arm/helper.c|3 +++
target-arm/translate.c |   16 +---
3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 4a1c53f..8751402 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -392,6 +392,7 @@ void cpu_arm_set_cp_io(CPUARMState *env, int cpnum,
#define ARM_CPUID_ARM1136_R2  0x4107b362
#define ARM_CPUID_ARM11MPCORE 0x410fb022
#define ARM_CPUID_CORTEXA80x410fc080
+#define ARM_CPUID_CORTEXA8_V3 0x410fc083
#define ARM_CPUID_CORTEXA90x410fc090
#define ARM_CPUID_CORTEXM30x410fc231
#define ARM_CPUID_ANY 0x
diff --git a/target-arm/helper.c b/target-arm/helper.c
index 6f40084..ed0b476 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -87,6 +87,7 @@ static void cpu_reset_model_id(CPUARMState *env, 
uint32_t id)

env->cp15.c0_cachetype = 0x1dd20d2;
break;
case ARM_CPUID_CORTEXA8:
+case ARM_CPUID_CORTEXA8_V3:
set_feature(env, ARM_FEATURE_V6);
set_feature(env, ARM_FEATURE_V6K);
set_feature(env, ARM_FEATURE_V7);
@@ -314,6 +315,7 @@ static const struct arm_cpu_t arm_cpu_names[] = {
{ ARM_CPUID_ARM11MPCORE, "arm11mpcore"},
{ ARM_CPUID_CORTEXM3, "cortex-m3"},
{ ARM_CPUID_CORTEXA8, "cortex-a8"},
+{ ARM_CPUID_CORTEXA8_V3, "cortex-a8-v3"},
{ ARM_CPUID_CORTEXA9, "cortex-a9"},
{ ARM_CPUID_TI925T, "ti925t" },
{ ARM_CPUID_PXA250, "pxa250" },
@@ -1633,6 +1635,7 @@ uint32_t HELPER(get_cp15)(CPUState *env, uint32_t 
insn)

case ARM_CPUID_ARM11MPCORE:
return 1;
case ARM_CPUID_CORTEXA8:
+case ARM_CPUID_CORTEXA8_V3:
return 2;
case ARM_CPUID_CORTEXA9:
return 0;
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 9607aae..5474408 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -4656,9 +4656,19 @@ static int disas_neon_data_insn(CPUState * env, 
DisasContext *s, uint32_t insn)

/* Accumulate.  */
neon_load_reg64(cpu_V0, rd + pass);
tcg_gen_add_i64(cpu_V0, cpu_V0, cpu_V1);
-} else if (op == 4 || (op == 5 && u)) {
-/* Insert */
-cpu_abort(env, "VS[LR]I.64 not implemented");
+} else if (op == 4) {
+/* Insert - VSRI */
+cpu_abort(env, "VSRI.64 not implemented");
+} else if (op == 5 && u) {
+/* VSLI */
+tmp64 = tcg_temp_new_i64();
+tcg_gen_movi_i64(tmp64, -1);
+gen_helper_neon_shl_u64(tmp64, tmp64, cpu_V1);
+tcg_gen_and_i64(cpu_V0, cpu_V0, tmp64);
+neon_load_reg64(cpu_V1, rd + pass);
+tcg_gen_andc_i64(cpu_V1, cpu_V1, tmp64);
+tcg_temp_free_i64(tmp64);
+tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
}
neon_store_reg64(cpu_V0, rd + pass);
} else { /* size < 3 */
--
1.6.4.4




[Qemu-devel] Re: [PATCH 2/4] KVM: Rework VCPU state writeback API

2010-03-02 Thread Marcelo Tosatti
On Tue, Mar 02, 2010 at 05:31:09PM +0100, Jan Kiszka wrote:
> Marcelo Tosatti wrote:
> > On Tue, Mar 02, 2010 at 09:00:04AM +0100, Jan Kiszka wrote:
> >> Marcelo Tosatti wrote:
> >>> On Mon, Mar 01, 2010 at 07:10:30PM +0100, Jan Kiszka wrote:
>  This grand cleanup drops all reset and vmsave/load related
>  synchronization points in favor of four(!) generic hooks:
> 
>  - cpu_synchronize_all_states in qemu_savevm_state_complete
>    (initial sync from kernel before vmsave)
>  - cpu_synchronize_all_post_init in qemu_loadvm_state
>    (writeback after vmload)
>  - cpu_synchronize_all_post_init in main after machine init
>  - cpu_synchronize_all_post_reset in qemu_system_reset
>    (writeback after system reset)
> 
>  These writeback points + the existing one of VCPU exec after
>  cpu_synchronize_state map on three levels of writeback:
> 
>  - KVM_PUT_RUNTIME_STATE (during runtime, other VCPUs continue to run)
>  - KVM_PUT_RESET_STATE   (on synchronous system reset, all VCPUs stopped)
>  - KVM_PUT_FULL_STATE(on init or vmload, all VCPUs stopped as well)
> 
>  This level is passed to the arch-specific VCPU state writing function
>  that will decide which concrete substates need to be written. That way,
>  no writer of load, save or reset functions that interact with in-kernel
>  KVM states will ever have to worry about synchronization again. That
>  also means that a lot of reasons for races, segfaults and deadlocks are
>  eliminated.
> 
>  cpu_synchronize_state remains untouched, just as Anthony suggested. We
>  continue to need it before reading or writing of VCPU states that are
>  also tracked by in-kernel KVM subsystems.
> 
>  Consequently, this patch removes many cpu_synchronize_state calls that
>  are now redundant, just like remaining explicit register syncs.
> 
>  Signed-off-by: Jan Kiszka 
> >>> Jan,
> >>>
> >>> This patch breaks system reset of WinXP.32 install (more easily
> >>> reproducible without iothread enabled).
> >>>
> >>> Screenshot attached.
> >>>
> >> Strange - no issues with qemu-kvm? Any special command line switch? /me
> >> goes scrounging for some installation XP32 CD in the meantime...
> > 
> > No issues with qemu-kvm. Could not spot anything obvious.
> > 
> 
> And, of course, my WinXP installation did not trigger any reset issue,
> even in non-iothreaded mode. :(

Try kvm-autotest. I could not reproduce easily in hand run either.

This is not it, but still looks wrong:

In real mode emulation, kvm_get_sregs returns TR segment values of VMX
fields, while KVM caches them in vmx_vcpu->rmode.

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 14873b9..898173a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1822,13 +1822,23 @@ static u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, 
int seg)
 static void vmx_get_segment(struct kvm_vcpu *vcpu,
struct kvm_segment *var, int seg)
 {
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
struct kvm_vmx_segment_field *sf = &kvm_vmx_segment_fields[seg];
u32 ar;
 
+   if (vmx->rmode.vm86_active && seg == VCPU_SREG_TR) {
+   var->base = vmx->rmode.tr.base;
+   var->limit = vmx->rmode.tr.limit;
+   var->selector = vmx->rmode.tr.selector;
+   ar = vmx->rmode.tr.ar;
+   goto ar;
+   }
+
var->base = vmcs_readl(sf->base);
var->limit = vmcs_read32(sf->limit);
var->selector = vmcs_read16(sf->selector);
ar = vmcs_read32(sf->ar_bytes);
+ar:
if ((ar & AR_UNUSABLE_MASK) && !emulate_invalid_guest_state)
ar = 0;
var->type = ar & 15;




[Qemu-devel] [PATCH] qemu-img: Fix error "Unknown option 'size'" when using host_device format

2010-03-02 Thread Bitti
qemu-kvm-0.11.0

When I run:
sudo qemu-img convert ubuntu-kvm/disk0.qcow2 -O host_device /dev/vg01/vm_test

I get this:
Unknown option 'size'

Ubuntu Bug:
https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/460542

Signed-off-by: Bitti 
---
--- block/raw-posix.c.orig  2009-09-23 10:30:02.0 +0300
+++ block/raw-posix.c   2009-10-25 18:22:08.780047000 +0200
@@ -1172,6 +1172,15 @@
 return ret;
 }

+static QEMUOptionParameter host_device_create_options[] = {
+{
+.name = BLOCK_OPT_SIZE,
+.type = OPT_SIZE,
+.help = "Virtual disk size"
+},
+{ NULL }
+};
+
 static BlockDriver bdrv_host_device = {
 .format_name   = "host_device",
 .instance_size = sizeof(BDRVRawState),
@@ -1197,6 +1206,7 @@
 .bdrv_aio_ioctl = hdev_aio_ioctl,
 #endif
 #endif
+.create_options = host_device_create_options,
 };

 #ifdef __linux__
--- block/raw-posix.c.orig	2009-09-23 10:30:02.0 +0300
+++ block/raw-posix.c	2009-10-25 18:22:08.780047000 +0200
@@ -1172,6 +1172,15 @@
 return ret;
 }
 
+static QEMUOptionParameter host_device_create_options[] = {
+{
+.name = BLOCK_OPT_SIZE,
+.type = OPT_SIZE,
+.help = "Virtual disk size"
+},
+{ NULL }
+};
+
 static BlockDriver bdrv_host_device = {
 .format_name	= "host_device",
 .instance_size	= sizeof(BDRVRawState),
@@ -1197,6 +1206,7 @@
 .bdrv_aio_ioctl = hdev_aio_ioctl,
 #endif
 #endif
+.create_options = host_device_create_options,
 };
 
 #ifdef __linux__


[Qemu-devel] Possible invalid emulation of rex.W-prefixed far jump

2010-03-02 Thread Brad Spengler
Hi all,

I'm writing to report a possible bug in the qemu emulation of 
rex.W-prefixed far jumps.  It affects far jumps of this type with both 
rip-relative and absolute addresses.

The yasm syntax for these instructions:
jmp far qword [addr]
jmp far qword [addr wrt rip]

and the resulting disassembly:
8:   48 ff 2c 25 00 00 00 00 rex.W ljmpq  *0x0  c: R_X86_64_32  .text+0x17
10:  48 ff 2d 00 00 00 00rex.W ljmpq  *0x0(%rip)# 0x17

qemu triggers a gpf with error 0xfffc (presumably this is 0x masked 
to 0xfffc by the & on new_cs from 
target-i386/op_helper.c:helper_ljmp_protected())

It's suspected that qemu is treating the far address as 16:32 instead of 
16:64 as it should, since the far address as laid out in memory is:
12 34 56 78 ff ff ff ff 10 00

The far address is intended to be a Linux kernel address, so the upper 
32bits are 0x.

If qemu is treating the 16:64 layout as 16:32, you can see why new_cs 
would have the value of 0x.  The code only fails on qemu -- it works 
as expected on real systems.

I'm not familiar with your code-base so I have no patch for the issue, 
but I thought I'd fire off a mail as I imagine it's a simple 
oversight and easy fix for someone familiar with the code.

Thanks for your help, and please keep me on CC for replies

-Brad


signature.asc
Description: Digital signature