date:20160515

Re: [Qemu-devel] Regression with windows 7 VMs and VGA CVE-2016-3712 fix (2.6.0 and 2.5.1.1)

2016-05-15 Thread Stefan Weil

On 15.05.2016 at 01:13, Thomas Lamprecht wrote:
> Hi all,
>
> I recently ran into problems when trying to install some Windows VMs
> after an update to QEMU 2.5.1.1: the VM shows Windows loading
> files for the installation, then the "Starting Windows" screen appears,
> and there it hangs and never continues.
>
> Changing the "-vga" option to cirrus solves this; the installation can
> proceed and finish. When switching back to std (or qxl, vmware), the
> installed VM also hangs on the "Starting Windows" screen, while QEMU
> shows some, but no excessive, load.
>
> This phenomenon also appears with QEMU 2.6.0, but not with 2.6.0-rc4. A
> git bisect identifies fd3c136b3e1482cd0ec7285d6bc2a3e6a62c38d7 (vga: make
> sure vga register setup for vbe stays intact (CVE-2016-3712)) as the
> culprit for this regression. As it is a fix for a DoS, simply reverting
> it is not an option, I guess.
> The (short) bisect log is:
>
> git bisect start
> # bad: [bfc766d38e1fae5767d43845c15c79ac8fa6d6af] Update version for v2.6.0 release
> git bisect bad bfc766d38e1fae5767d43845c15c79ac8fa6d6af
> # good: [975eb6a547f809608ccb08c221552f11af25] Update version for v2.6.0-rc4 release
> git bisect good 975eb6a547f809608ccb08c221552f11af25
> # good: [2068192dcccd8a80dddfcc8df6164cf9c26e0fc4] vga: update vga register setup on vbe changes
> git bisect good 2068192dcccd8a80dddfcc8df6164cf9c26e0fc4
> # bad: [53db932604dfa7bb9241d132e0173894cf54261c] Merge remote-tracking branch 'remotes/kraxel/tags/pull-vga-20160509-1' into staging
> git bisect bad 53db932604dfa7bb9241d132e0173894cf54261c
>
> I could reproduce this with QEMU 2.5.1 and QEMU 2.6 on a Debian derivative
> (Proxmox VE) with a 4.4 kernel, and also with QEMU 2.6 on an Arch Linux
> system with a 4.5 kernel, so it should not be host-distro dependent. Both
> machines have Intel x86_64 processors.
> The problem should be reproducible with the said versions, or with a build
> from git that includes the above-mentioned commit (fd3c136), by starting a
> VM with a Windows 7 ISO, e.g.:
>
> Hanging installation:
> ./x86_64-softmmu/qemu-system-x86_64 -boot d -cdrom win7.iso -m 1024
>
> Working installation:
> ./x86_64-softmmu/qemu-system-x86_64 -boot d -cdrom win7.iso -m 1024 -vga cirrus
>
> It may be noteworthy that Windows 10 works. I have not had time to get
> other Windows versions and test them; I'll do that as soon as possible.
> Various Linux systems also seem to work fine; at least I have not run
> into an issue there yet.
>
> I also tried testing with SeaBIOS and OVMF, as initially I had no idea
> what broke; both lead to the same result: without the CVE-2016-3712 fix
> they both work, with it they do not.
> Further, enabling or disabling KVM does not make any difference.
>
> If I can take any further step, e.g. open a bug report elsewhere
> or help with testing, I'd be glad to do so.
>
> best regards,
> Thomas

Hi Thomas,

thanks for the bug report.

I added Gerd to the address list, so I'm sure your report will be noticed.

Bugs can be reported at Launchpad (see
http://wiki.qemu.org/Contribute/ReportABug).
Maybe your report could be posted there, too, so people looking for
known problems will find it at the well-known location.

Cheers
Stefan





Re: [Qemu-devel] [PATCH 00/10] RFCv3: vhost-user: simple reconnection support

2016-05-15 Thread Michael S. Tsirkin

On Fri, May 13, 2016 at 08:30:36PM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Tue, May 10, 2016 at 6:28 PM, Marc-André Lureau  wrote:
> > Hi
> >
> > - Original Message -
> >> On Tue, May 10, 2016 at 06:03:50PM +0200, marcandre.lur...@redhat.com wrote:
> >> > From: Marc-André Lureau 
> >> >
> >> > Hi,
> >> >
> >> > In a previous series "RFCv2: vhost-user: shutdown and reconnection", I
> >> > proposed adding a new slave request to handle graceful shutdown for
> >> > both QEMU configurations (server or client), while keeping the guest
> >> > running with link-down status.
> >>
> >> OK, so I would say patches 1-4 are bug fixes; it looks like they
> >> could even be Cc'd to stable?
> >
> > 4 is used by 5 and 10.
> > 2-3 are only for testing.
> >
> > 4-8 are nice to have, as they avoid obvious problems/crashes when handling
> > the disconnected state and add basic reconnection checks.
> >
> > 9 was already considered for stable by Eric in a previous series.
> >
> > 10 would be good to have if 1 is accepted, to check that the minimum works
> > as expected.
> >
> 
> FYI, I have a follow-up series (~20 patches,
> https://github.com/elmarco/qemu/tree/vhost-user-reconnect) doing
> mostly cleanups and extra checks for disconnection at run time. In
> particular, it should avoid some obvious crashes/asserts, and it
> prevents QEMU from running as long as the initial vhost_user_start()
> hasn't succeeded (so that the initial flags are set). I would like to
> know how to proceed with the follow-up: should I resend the whole series,
> or should we review/merge this RFC first (even though it is known to be
> incomplete in many disconnect cases that the follow-up fixes)?
> 
> thanks

I think a gradual merge is better.

> -- 
> Marc-André Lureau

[Qemu-devel] [Bug 1581936] [NEW] Frozen Windows 7 VMs with VGA CVE-2016-3712 fix (2.6.0 and 2.5.1.1)

2016-05-15 Thread Thomas Lamprecht

Public bug reported:

Hi,

As already posted on the QEMU devel list [1], I stumbled upon a problem
with QEMU versions 2.5.1.1 and 2.6.0.

The VM shows Windows loading files for the installation, then the
"Starting Windows" screen appears, and there it hangs and never continues.

Changing the "-vga" option to cirrus solves this; the installation can
proceed and finish. When switching back to std (or qxl, vmware), the
installed VM also hangs on the "Starting Windows" screen, while QEMU
shows some, but no excessive, load.

This phenomenon also appears with QEMU 2.6.0, but not with 2.6.0-rc4. A
git bisect identifies fd3c136b3e1482cd0ec7285d6bc2a3e6a62c38d7 (vga: make
sure vga register setup for vbe stays intact (CVE-2016-3712)) as the
culprit for this regression. As it is a fix for a DoS, simply reverting
it is not an option, I guess.

The bisect log is:

git bisect start
# bad: [bfc766d38e1fae5767d43845c15c79ac8fa6d6af] Update version for v2.6.0 release
git bisect bad bfc766d38e1fae5767d43845c15c79ac8fa6d6af
# good: [975eb6a547f809608ccb08c221552f11af25] Update version for v2.6.0-rc4 release
git bisect good 975eb6a547f809608ccb08c221552f11af25
# good: [2068192dcccd8a80dddfcc8df6164cf9c26e0fc4] vga: update vga register setup on vbe changes
git bisect good 2068192dcccd8a80dddfcc8df6164cf9c26e0fc4
# bad: [53db932604dfa7bb9241d132e0173894cf54261c] Merge remote-tracking branch 'remotes/kraxel/tags/pull-vga-20160509-1' into staging
git bisect bad 53db932604dfa7bb9241d132e0173894cf54261c
# bad: [fd3c136b3e1482cd0ec7285d6bc2a3e6a62c38d7] vga: make sure vga register setup for vbe stays intact (CVE-2016-3712).
git bisect bad fd3c136b3e1482cd0ec7285d6bc2a3e6a62c38d7
# first bad commit: [fd3c136b3e1482cd0ec7285d6bc2a3e6a62c38d7] vga: make sure vga register setup for vbe stays intact (CVE-2016-3712).


I could reproduce this with QEMU 2.5.1 and QEMU 2.6 on a Debian derivative
(Proxmox VE) with a 4.4 kernel, and also with QEMU 2.6 on an Arch Linux
system with a 4.5 kernel, so it should not be host-distro dependent. Both
machines have Intel x86_64 processors.
The problem should be reproducible with the said versions, or with a build
from git that includes the above-mentioned commit (fd3c136), by starting a
VM with a Windows 7 ISO, e.g.:

Freezing installation (as -vga defaults to std, I marked it as optional):
./x86_64-softmmu/qemu-system-x86_64 -boot d -cdrom win7.iso -m 1024 [-vga (std|qxl|vmware)]

Working installation:
./x86_64-softmmu/qemu-system-x86_64 -boot d -cdrom win7.iso -m 1024 -vga cirrus

If someone already has an installed Windows 7 VM, this behaviour should
also be observable when trying to start it with the new QEMU versions.

It may be noteworthy that Windows 10 works. I have not had time to get
other Windows versions and test them; I'll do that as soon as possible.
Various Linux systems also seem to work fine; at least I have not run
into an issue there yet.

I also tried testing with SeaBIOS and OVMF as firmware, as initially I
had no idea what broke; both lead to the same result: without the
CVE-2016-3712 fix they both work, with it they do not.
Further, enabling or disabling KVM does not make any difference.


[1] http://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg02416.html

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1581936

Title:
  Frozen Windows 7 VMs with VGA CVE-2016-3712 fix (2.6.0 and 2.5.1.1)

Status in QEMU:
  New


Re: [Qemu-devel] Regression with windows 7 VMs and VGA CVE-2016-3712 fix (2.6.0 and 2.5.1.1)

2016-05-15 Thread Thomas Lamprecht

On 15.05.2016 11:28, Stefan Weil wrote:
> 
> Hi Thomas,
> 
> thanks for the bug report.
> 
> I added Gerd to the address list, so I'm sure your report will be noticed.
> 
> Bugs can be reported at Launchpad (see
> http://wiki.qemu.org/Contribute/ReportABug).
> Maybe your report could be posted there, too, so people looking for
> known problems will find it at the well-known location.
> 
> Cheers
> Stefan
> 

Hi Stefan,

thanks for the response and the directions; I opened bug #1581936:
https://bugs.launchpad.net/bugs/1581936

Oh, and I noticed that I omitted part of the git bisect log in my previous
message. I corrected that in the bug report; here is the full log as well:

git bisect start
# bad: [bfc766d38e1fae5767d43845c15c79ac8fa6d6af] Update version for v2.6.0 release
git bisect bad bfc766d38e1fae5767d43845c15c79ac8fa6d6af
# good: [975eb6a547f809608ccb08c221552f11af25] Update version for v2.6.0-rc4 release
git bisect good 975eb6a547f809608ccb08c221552f11af25
# good: [2068192dcccd8a80dddfcc8df6164cf9c26e0fc4] vga: update vga register setup on vbe changes
git bisect good 2068192dcccd8a80dddfcc8df6164cf9c26e0fc4
# bad: [53db932604dfa7bb9241d132e0173894cf54261c] Merge remote-tracking branch 'remotes/kraxel/tags/pull-vga-20160509-1' into staging
git bisect bad 53db932604dfa7bb9241d132e0173894cf54261c
# bad: [fd3c136b3e1482cd0ec7285d6bc2a3e6a62c38d7] vga: make sure vga register setup for vbe stays intact (CVE-2016-3712).
git bisect bad fd3c136b3e1482cd0ec7285d6bc2a3e6a62c38d7
# first bad commit: [fd3c136b3e1482cd0ec7285d6bc2a3e6a62c38d7] vga: make sure vga register setup for vbe stays intact (CVE-2016-3712).

best regards,
Thomas

Re: [Qemu-devel] [PATCH v5 06/18] atomics: add atomic_read_acquire and atomic_set_release

2016-05-15 Thread Pranith Kumar

Hi Emilio,

On Fri, May 13, 2016 at 11:34 PM, Emilio G. Cota  wrote:
> When __atomic is not available, we use full memory barriers instead
> of smp/wmb, since acquire/release barriers apply to all memory
> operations and not just to loads/stores, respectively.
>

If it is not too late, can we rename these to
atomic_load_acquire()/atomic_store_release(), like in the Linux kernel?
Looks good either way.

Reviewed-by: Pranith Kumar 

-- 
Pranith
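A minimal sketch of the acquire/release pairing discussed above, assuming GCC or Clang (which predefine `__ATOMIC_ACQUIRE` and provide `__atomic_load_n`/`__atomic_store_n`). The macro bodies, `publish()` and `try_consume()` are illustrative only, not QEMU's actual definitions; the fallback branch shows why the commit message reaches for a full barrier instead of a read-only or write-only one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helpers in the spirit of the patch under review. */
#if defined(__ATOMIC_ACQUIRE)
/* Compiler has __atomic builtins: use real acquire/release ordering. */
#define atomic_read_acquire(ptr)     __atomic_load_n((ptr), __ATOMIC_ACQUIRE)
#define atomic_set_release(ptr, val) __atomic_store_n((ptr), (val), __ATOMIC_RELEASE)
#else
/*
 * Fallback: a full barrier. Acquire must order ALL later memory
 * operations (a read barrier only orders loads), and release must order
 * ALL earlier ones (a write barrier only orders stores).
 */
#define atomic_read_acquire(ptr) \
    ({ __typeof__(*(ptr)) v_ = *(volatile __typeof__(*(ptr)) *)(ptr); \
       __sync_synchronize(); v_; })
#define atomic_set_release(ptr, val) \
    do { __sync_synchronize(); \
         *(volatile __typeof__(*(ptr)) *)(ptr) = (val); } while (0)
#endif

static int payload;   /* plain data handed from producer to consumer */
static bool ready;    /* flag: written with release, read with acquire */

/* Producer side: store the data, then publish it with a release store. */
void publish(int value)
{
    payload = value;
    atomic_set_release(&ready, true);
}

/* Consumer side: returns true and the data once the flag is observed. */
bool try_consume(int *out)
{
    if (!atomic_read_acquire(&ready)) {
        return false;
    }
    *out = payload;   /* ordered after the acquire load of `ready` */
    return true;
}
```

In real use, `publish()` would run on one thread and `try_consume()` on another: a consumer that observes `ready == true` through the acquire load is guaranteed to also observe the payload stored before the release.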

[Qemu-devel] [Bug 1581936] Re: Frozen Windows 7 VMs with VGA CVE-2016-3712 fix (2.6.0 and 2.5.1.1)

2016-05-15 Thread pranith

** Changed in: qemu
   Status: New => Confirmed


To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1581936/+subscriptions

Re: [Qemu-devel] [PATCH RFC 0/3] qdev: order devices by priority before creating them

2016-05-15 Thread Marcel Apfelbaum


On 05/11/2016 10:51 AM, Markus Armbruster wrote:

Marcel Apfelbaum  writes:


On 05/10/2016 11:28 AM, Markus Armbruster wrote:

Marcel Apfelbaum  writes:


This series aims to allow more devices to be used with '-device'
by sorting the devices based on a predefined creation order flag
before creating them.

Devices like IOMMU need to be created before others, so they can leverage
the DeviceCreationPriority flag introduced by the first patch to DeviceClass.

The second patch sorts the devices by their DeviceCreationPriority
before creating them.

Finally, the last patch demonstrates how it can be used to ensure
the creation of host-bridges before the pci-bridges and pci-bridges before
the others.

I preferred to combine all the priorities into a single enum
to better manage the creation order.

This is an RFC because I only wanted to know if this seems like the right way to go.
Comments are appreciated,
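A rough sketch of the sorting step the cover letter describes. The enum values, `DeviceOpt` and `sort_device_opts()` are hypothetical names, not the RFC's code; the ordering (host bridge, then IOMMU, then PCI bridges, then everything else) follows the priorities discussed later in this thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical creation priorities: lower value = created earlier. */
typedef enum {
    PRIO_HOST_BRIDGE = 0,
    PRIO_IOMMU,
    PRIO_PCI_BRIDGE,
    PRIO_DEFAULT,
} DeviceCreationPriority;

typedef struct {
    const char *driver;              /* value of -device <driver>,... */
    DeviceCreationPriority prio;     /* would come from the device class */
    int cmdline_index;               /* position on the command line */
} DeviceOpt;

static int compare_device_opts(const void *a, const void *b)
{
    const DeviceOpt *x = a;
    const DeviceOpt *y = b;

    if (x->prio != y->prio) {
        return x->prio < y->prio ? -1 : 1;
    }
    /* qsort() is not stable, so break ties explicitly to keep the
     * user's left-to-right order within one priority class. */
    return (x->cmdline_index > y->cmdline_index) -
           (x->cmdline_index < y->cmdline_index);
}

/* Reorders the parsed -device options before any device is created. */
void sort_device_opts(DeviceOpt *opts, size_t n)
{
    qsort(opts, n, sizeof(*opts), compare_device_opts);
}
```

Because `qsort()` is not a stable sort, the comparator falls back to the command-line index; without it, devices of the same priority could be created in an arbitrary order, making the left-to-right behaviour even harder to predict.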




Hi Markus,
Thanks for looking into this.



Can you explain why requiring the user to specify -device in a sane
order isn't good enough?



Point taken; the truth is I didn't like the 'order' restriction in the
first place.

If device creation depends on the id of some other device (e.g. we
need the bus id to plug a device into it), for IOMMU devices it gets a
little tricky. You can add the IOMMU device before other PCI devices,
but it will not work (because of some internal implementation detail).
This is why we added -machine pc,iommu=on. I suppose we have other
examples as well. This is not user-friendly, IMO.

To solve the specific IOMMU problem, we could check that no PCI
devices have been created yet, but I am not sure that is a better
approach, and it is strictly tied to this device.

The goal is to be able to add more devices with -device, and I thought
this kind of creation in steps may help.



Hi Markus,



In my opinion, there are two sane ways to do command line options.

One is to make order relevant, and process them strictly left to right.

The other is to do the right thing regardless of order.  This requires
some kind of dependency tracking if there are any.
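Markus's second option needs real dependency tracking rather than fixed priorities. A minimal sketch of what that could look like, using Kahn's topological sort over a hypothetical adjacency matrix (`dep`, `order_by_dependencies()` and `MAX_DEVS` are assumptions for illustration, not QEMU code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVS 16

/*
 * dep[i][j] == true means device i must be created after device j
 * (e.g. a NIC depends on the bridge whose bus it plugs into).
 * Among ready devices the lowest command-line index wins, so unrelated
 * devices keep their left-to-right order. Returns how many devices were
 * placed in order[]; fewer than n means the dependencies form a cycle.
 */
size_t order_by_dependencies(size_t n, bool dep[MAX_DEVS][MAX_DEVS],
                             size_t order[MAX_DEVS])
{
    size_t pending[MAX_DEVS];            /* unmet dependencies per device */
    bool done[MAX_DEVS] = { false };
    size_t placed = 0;

    for (size_t i = 0; i < n; i++) {
        pending[i] = 0;
        for (size_t j = 0; j < n; j++) {
            pending[i] += dep[i][j];
        }
    }

    while (placed < n) {
        size_t next = n;
        for (size_t i = 0; i < n; i++) {
            if (!done[i] && pending[i] == 0) {
                next = i;                /* lowest ready index */
                break;
            }
        }
        if (next == n) {
            break;                       /* cycle: nothing is ready */
        }
        done[next] = true;
        order[placed++] = next;
        for (size_t i = 0; i < n; i++) {
            if (dep[i][next]) {
                pending[i]--;            /* one dependency satisfied */
            }
        }
    }
    return placed;
}
```

Preferring the lowest ready index keeps the user's order wherever no dependency forces a change, and a short result signals a cycle that would have to be reported as an error.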



I personally prefer this way; however, I confess I do not aim to solve this
globally. My scope is making the stuff I work with more user-friendly.


QEMU, of course, does neither: we process them in left to right unless
we don't, and users juggle them until the errors go away.



And this is why I thought: we have some cases where the order is important
and some where it is not. Adding one more case to the "order not important"
set looks like a small win.


I'm afraid this patch adds to "unless we don't" without covering much
ground towards "do the right thing regardless of order".  Static
priorities are a rather crude approximation of dependencies.  Is it the
best we can do for user now?



As stated above, I am perfectly aware that the static priorities angle does not
solve all the problems, but it helps and may be a good step in the right direction.

Static priorities can make sense: the host bridge needs to be created before
the other PCI bridges, which in turn should be created before the devices themselves.

The IOMMU can also leverage static priorities: it needs to be created after the
host bridge, but before anything else.

I am sure we can think of other cases where they will help too. To answer your
question: no, it is not the best we can do, but maybe it is worth it.

Thanks,
Marcel

Re: [Qemu-devel] [PATCH v5 01/11] fix some coding style problems

2016-05-15 Thread Marcel Apfelbaum


On 05/06/2016 07:20 AM, Cao jin wrote:

This patch:
1. Adds newlines so that code blocks are better separated.
2. Adds more comments for msi_init.
3. Fixes an indentation in vmxnet3.c.
4. ioh3420 & xio3130_downstream: puts the PCI Express capability init
functions together, making them more readable.

cc: Dmitry Fleytman 
cc: Jason Wang 
cc: Michael S. Tsirkin 
cc: Markus Armbruster 
cc: Marcel Apfelbaum 

Signed-off-by: Cao jin 
---
  hw/net/vmxnet3.c   |  2 +-
  hw/pci-bridge/ioh3420.c|  7 ++-
  hw/pci-bridge/pci_bridge_dev.c |  4 
  hw/pci-bridge/xio3130_downstream.c |  6 +-
  hw/pci-bridge/xio3130_upstream.c   |  3 +++
  hw/pci/msi.c   | 19 +++
  6 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 093a71e..7a38e47 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -348,7 +348,7 @@ typedef struct {
  /* Interrupt management */

  /*
- *This function returns sign whether interrupt line is in asserted state
+ * This function returns sign whether interrupt line is in asserted state
   * This depends on the type of interrupt used. For INTX interrupt line will
   * be asserted until explicit deassertion, for MSI(X) interrupt line will
   * be deasserted automatically due to notification semantics of the MSI(X)
diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
index 0937fa3..b4a7806 100644
--- a/hw/pci-bridge/ioh3420.c
+++ b/hw/pci-bridge/ioh3420.c
@@ -106,12 +106,14 @@ static int ioh3420_initfn(PCIDevice *d)
  if (rc < 0) {
  goto err_bridge;
  }
+
  rc = msi_init(d, IOH_EP_MSI_OFFSET, IOH_EP_MSI_NR_VECTOR,
IOH_EP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
IOH_EP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT);
  if (rc < 0) {
  goto err_bridge;
  }
+
  rc = pcie_cap_init(d, IOH_EP_EXP_OFFSET, PCI_EXP_TYPE_ROOT_PORT, p->port);
  if (rc < 0) {
  goto err_msi;
@@ -120,18 +122,21 @@ static int ioh3420_initfn(PCIDevice *d)
  pcie_cap_arifwd_init(d);
  pcie_cap_deverr_init(d);
  pcie_cap_slot_init(d, s->slot);
+pcie_cap_root_init(d);
+
  pcie_chassis_create(s->chassis);
  rc = pcie_chassis_add_slot(s);
  if (rc < 0) {
  goto err_pcie_cap;
  }
-pcie_cap_root_init(d);
+
  rc = pcie_aer_init(d, IOH_EP_AER_OFFSET, PCI_ERR_SIZEOF);
  if (rc < 0) {
  goto err;
  }
  pcie_aer_root_init(d);
  ioh3420_aer_vector_update(d);
+
  return 0;

  err:
diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
index 862a236..32f4daa 100644
--- a/hw/pci-bridge/pci_bridge_dev.c
+++ b/hw/pci-bridge/pci_bridge_dev.c
@@ -67,10 +67,12 @@ static int pci_bridge_dev_initfn(PCIDevice *dev)
  /* MSI is not applicable without SHPC */
  bridge_dev->flags &= ~(1 << PCI_BRIDGE_DEV_F_MSI_REQ);
  }
+
  err = slotid_cap_init(dev, 0, bridge_dev->chassis_nr, 0);
  if (err) {
  goto slotid_error;
  }
+
  if ((bridge_dev->flags & (1 << PCI_BRIDGE_DEV_F_MSI_REQ)) &&
  msi_nonbroken) {
  err = msi_init(dev, 0, 1, true, true);
@@ -78,6 +80,7 @@ static int pci_bridge_dev_initfn(PCIDevice *dev)
  goto msi_error;
  }
  }
+
  if (shpc_present(dev)) {
  /* TODO: spec recommends using 64 bit prefetcheable BAR.
   * Check whether that works well. */
@@ -85,6 +88,7 @@ static int pci_bridge_dev_initfn(PCIDevice *dev)
   PCI_BASE_ADDRESS_MEM_TYPE_64, &bridge_dev->bar);
  }
  return 0;
+
  msi_error:
  slotid_cap_cleanup(dev);
  slotid_error:
diff --git a/hw/pci-bridge/xio3130_downstream.c b/hw/pci-bridge/xio3130_downstream.c
index cf1ee63..e6d653d 100644
--- a/hw/pci-bridge/xio3130_downstream.c
+++ b/hw/pci-bridge/xio3130_downstream.c
@@ -70,11 +70,13 @@ static int xio3130_downstream_initfn(PCIDevice *d)
  if (rc < 0) {
  goto err_bridge;
  }
+
  rc = pci_bridge_ssvid_init(d, XIO3130_SSVID_OFFSET,
 XIO3130_SSVID_SVID, XIO3130_SSVID_SSID);
  if (rc < 0) {
  goto err_bridge;
  }
+
  rc = pcie_cap_init(d, XIO3130_EXP_OFFSET, PCI_EXP_TYPE_DOWNSTREAM,
 p->port);
  if (rc < 0) {
@@ -83,12 +85,14 @@ static int xio3130_downstream_initfn(PCIDevice *d)
  pcie_cap_flr_init(d);
  pcie_cap_deverr_init(d);
  pcie_cap_slot_init(d, s->slot);
+pcie_cap_arifwd_init(d);
+
  pcie_chassis_create(s->chassis);
  rc = pcie_chassis_add_slot(s);
  if (rc < 0) {
  goto err_pcie_cap;
  }
-pcie_cap_arifwd_init(d);
+
  rc = pcie_aer_init(d, XIO3130_AER_OFFSET, PCI_ERR_SIZEOF);
  if (rc < 0) {
  goto err;
diff --git a/hw/pci-bridge/xio3130_upstream.c b/hw/pci-bridge/xio3130_upstream.c
index 164ef58..d976844 100644
--- a/hw/pci-bridge/xio3130_upstream.c
+++ b/hw/pci-bridge/xio3130_upstream.

Re: [Qemu-devel] [PATCH v5 03/11] megasas: Fix

2016-05-15 Thread Marcel Apfelbaum


On 05/06/2016 08:43 AM, Cao jin wrote:

Sorry, I forgot to cc some maintainers.

On 05/06/2016 12:20 PM, Cao jin wrote:

msi_init returns a non-zero value on both failure and success.

cc: Hannes Reinecke 
cc: Paolo Bonzini 
Signed-off-by: Cao jin 
---
  hw/scsi/megasas.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index a63a581..56fb645 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -2348,7 +2348,7 @@ static void megasas_scsi_realize(PCIDevice *dev, Error **errp)
"megasas-queue", 0x4);

  if (megasas_use_msi(s) &&
-msi_init(dev, 0x50, 1, true, false)) {
+msi_init(dev, 0x50, 1, true, false) < 0) {
  s->flags &= ~MEGASAS_MASK_USE_MSI;
  }
  if (megasas_use_msix(s) &&






Reviewed-by: Marcel Apfelbaum 

Thanks,
Marcel
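To illustrate why the original check was wrong, here is a small sketch with a hypothetical `fake_msi_init()` that follows the convention stated in the commit message: non-zero on success (assumed here to be the capability offset) and a negative errno on failure. Testing the result for non-zero treats success as failure; testing for `< 0`, as the patch does, does not.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for msi_init(); not QEMU's implementation. */
static int fake_msi_init(bool supported, int offset)
{
    if (!supported) {
        return -ENOTSUP;    /* failure: negative errno */
    }
    return offset;          /* success: assumed capability offset, e.g. 0x50 */
}

/* Old check: any non-zero result, including success, disables MSI. */
bool msi_disabled_buggy(bool supported)
{
    return fake_msi_init(supported, 0x50) != 0;
}

/* Patched check: only a negative result means MSI setup failed. */
bool msi_disabled_fixed(bool supported)
{
    return fake_msi_init(supported, 0x50) < 0;
}
```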

[Qemu-devel] [Bug 1581936] Re: Frozen Windows 7 VMs with VGA CVE-2016-3712 fix (2.6.0 and 2.5.1.1)

2016-05-15 Thread Florian Strankowski

I can confirm this behaviour. Tested on 3 different machines: all
Windows 7 VMs are broken by the latest "patch". I also tested
Windows XP and Windows 10; both work flawlessly with VGA.


Re: [Qemu-devel] [PATCH v5 10/11] pci core: assert ENOSPC when add capability

2016-05-15 Thread Marcel Apfelbaum


On 05/06/2016 07:20 AM, Cao jin wrote:

Running out of PCI config space (ENOSPC) is a programming error, so assert on it for debugging.

cc: Michael S. Tsirkin 
cc: Marcel Apfelbaum 
cc: Markus Armbruster 
Signed-off-by: Cao jin 
---
  hw/pci/pci.c | 6 ++
  1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index f0f41dc..fc8b377 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2151,10 +2151,8 @@ int pci_add_capability2(PCIDevice *pdev, uint8_t cap_id,

  if (!offset) {
  offset = pci_find_space(pdev, size);
-if (!offset) {
-error_setg(errp, "out of PCI config space");
-return -ENOSPC;
-}
+/* out of PCI config space should be programming error */


'is', not 'should be'


+assert(offset);
  } else {
  /* Verify that capabilities don't overlap.  Note: device assignment
   * depends on this check to verify that the device is not broken.



Reviewed-by: Marcel Apfelbaum 

Thanks,
Marcel
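
The reasoning behind the patch is that the capability layout is chosen by the device model itself, so running out of config space can only mean a bug in that model. A minimal standalone sketch of the split between "search" and "assert" follows. It is not QEMU's pci_find_space(): find_free_space(), add_capability() and the used[] byte map are hypothetical, and it assumes capabilities start at 0x40 and are dword aligned, as the PCI spec requires for capability pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CONFIG_SPACE_SIZE 256
#define CAP_AREA_START    0x40   /* first byte after the standard header */

/* Hypothetical map of config-space bytes already claimed (1 = used). */
static uint8_t used[CONFIG_SPACE_SIZE];

/* Return the first dword-aligned offset with `size` free bytes, or 0 if
 * there is none (0 is never a valid capability offset). */
static uint8_t find_free_space(unsigned size)
{
    for (unsigned offset = CAP_AREA_START; offset + size <= CONFIG_SPACE_SIZE;
         offset += 4) {
        unsigned i = 0;
        while (i < size && !used[offset + i]) {
            i++;
        }
        if (i == size) {
            return (uint8_t)offset;
        }
    }
    return 0;
}

/* Claim space for a capability. Only the device model decides the sizes,
 * so failing to find room is treated as a bug, not a runtime error. */
static uint8_t add_capability(unsigned size)
{
    uint8_t offset = find_free_space(size);
    assert(offset);   /* out of PCI config space is a programming error */
    memset(&used[offset], 1, size);
    return offset;
}
```

On an empty map, add_capability(14) returns 0x40 and a following add_capability(8) returns 0x50, the next dword boundary after the first capability ends.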

Re: [Qemu-devel] [PATCH v5 09/11] pci bridge dev: change msi property type

2016-05-15 Thread Marcel Apfelbaum


On 05/06/2016 07:20 AM, Cao jin wrote:

 From bit to enum OnOffAuto.

cc: Michael S. Tsirkin 
cc: Markus Armbruster 
cc: Marcel Apfelbaum 
Signed-off-by: Cao jin 
---

Actually, I am not quite sure this device needs this change; RFC.



Well, it already has the 'msi' property, so we may want to make it a standard
'OnOffAuto'. One problem I can see is the change of semantics. Until now,
msi=on meant 'auto'. From now on it means 'force msi=on, fail otherwise'.
If I got this right, old machines having msi=on will fail to start on
platforms with msibroken=true, right?

Maybe we should preserve the semantics for old machines? (This patch does not
actually affect the semantics, but patch 11/11 should; otherwise, why change
it to OnOffAuto?)

Thanks,
Marcel
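
The semantic change discussed above can be written as a small decision function. This is a hypothetical sketch, not QEMU code: OnOffAutoSetting, MsiOutcome and msi_outcome() are illustrative names, and the legacy_on_means_auto flag stands in for whatever compat mechanism old machine types might use to keep treating msi=on as 'auto'.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    SETTING_AUTO,
    SETTING_ON,
    SETTING_OFF,
} OnOffAutoSetting;

typedef enum {
    MSI_DISABLED,      /* device runs with INTx only */
    MSI_ENABLED,       /* MSI capability was set up */
    DEVICE_INIT_FAILS, /* realize() must report an error */
} MsiOutcome;

/* Decide what happens to a device, given the user's msi setting and
 * whether MSI can actually be enabled on this platform. */
static MsiOutcome msi_outcome(OnOffAutoSetting setting, bool msi_works,
                              bool legacy_on_means_auto)
{
    if (setting == SETTING_OFF) {
        return MSI_DISABLED;
    }
    if (msi_works) {
        return MSI_ENABLED;
    }
    /* MSI is unavailable: only a strictly interpreted "on" turns that
     * into a hard failure. */
    if (setting == SETTING_ON && !legacy_on_means_auto) {
        return DEVICE_INIT_FAILS;
    }
    return MSI_DISABLED;   /* silent fallback to INTx */
}
```

The legacy_on_means_auto case is the one in question: with it set, an old machine with msi=on on a platform without working MSI keeps starting with INTx instead of failing.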



  hw/pci-bridge/pci_bridge_dev.c | 14 --
  1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
index 32f4daa..9e31f0e 100644
--- a/hw/pci-bridge/pci_bridge_dev.c
+++ b/hw/pci-bridge/pci_bridge_dev.c
@@ -41,9 +41,10 @@ struct PCIBridgeDev {

  MemoryRegion bar;
  uint8_t chassis_nr;
-#define PCI_BRIDGE_DEV_F_MSI_REQ 0
-#define PCI_BRIDGE_DEV_F_SHPC_REQ 1
+#define PCI_BRIDGE_DEV_F_SHPC_REQ 0
  uint32_t flags;
+
+OnOffAuto msi;
  };
  typedef struct PCIBridgeDev PCIBridgeDev;

@@ -65,7 +66,7 @@ static int pci_bridge_dev_initfn(PCIDevice *dev)
  }
  } else {
  /* MSI is not applicable without SHPC */
-bridge_dev->flags &= ~(1 << PCI_BRIDGE_DEV_F_MSI_REQ);
+bridge_dev->msi = ON_OFF_AUTO_OFF;
  }

  err = slotid_cap_init(dev, 0, bridge_dev->chassis_nr, 0);
@@ -73,7 +74,8 @@ static int pci_bridge_dev_initfn(PCIDevice *dev)
  goto slotid_error;
  }

-if ((bridge_dev->flags & (1 << PCI_BRIDGE_DEV_F_MSI_REQ)) &&
+if ((bridge_dev->msi == ON_OFF_AUTO_AUTO ||
+bridge_dev->msi == ON_OFF_AUTO_ON) &&
  msi_nonbroken) {
  err = msi_init(dev, 0, 1, true, true);
  if (err < 0) {
@@ -146,8 +148,8 @@ static Property pci_bridge_dev_properties[] = {
  /* Note: 0 is not a legal chassis number. */
  DEFINE_PROP_UINT8(PCI_BRIDGE_DEV_PROP_CHASSIS_NR, PCIBridgeDev, 
chassis_nr,
0),
-DEFINE_PROP_BIT(PCI_BRIDGE_DEV_PROP_MSI, PCIBridgeDev, flags,
-PCI_BRIDGE_DEV_F_MSI_REQ, true),
+DEFINE_PROP_ON_OFF_AUTO(PCI_BRIDGE_DEV_PROP_MSI, PCIBridgeDev, msi,
+ON_OFF_AUTO_AUTO),
  DEFINE_PROP_BIT(PCI_BRIDGE_DEV_PROP_SHPC, PCIBridgeDev, flags,
  PCI_BRIDGE_DEV_F_SHPC_REQ, true),
  DEFINE_PROP_END_OF_LIST(),

Re: [Qemu-devel] [PATCH v5 11/11] pci: Convert msi_init() to Error and fix callers to check it

2016-05-15 Thread Marcel Apfelbaum


On 05/06/2016 07:20 AM, Cao jin wrote:

msi_init() reports errors with error_report(), which is wrong
when it's used in realize().

Fix by converting it to Error.

Fix its callers to handle failure instead of ignoring it.

For callers that don't handle the failure, the following can happen:
the user wants MSI on, but doesn't get it because msi_init() fails
silently.

cc: Gerd Hoffmann 
cc: John Snow 
cc: Dmitry Fleytman 
cc: Jason Wang 
cc: Michael S. Tsirkin 
cc: Hannes Reinecke 
cc: Paolo Bonzini 
cc: Alex Williamson 
cc: Markus Armbruster 
cc: Marcel Apfelbaum 

Signed-off-by: Cao jin 
---
The affected devices are modified as follows:
1. intel hd audio: move msi_init() earlier, which saves having to free the
   MemoryRegion when it fails.
2. ich9 ahci: move msi_init() earlier, which saves having to free the
   resources allocated by ahci_realize(). It doesn't have an msi property,
   so an msi_init() failure leads to a silent fallback to INTx; just free
   the error object.
3. vmxnet3: move msi_init() earlier and remove the unnecessary
   vmxnet3_init_msi(). It doesn't have an msi property, so an msi_init()
   failure leads to a silent fallback to INTx; just free the error object.
4. ioh3420/xio3130_downstream/xio3130_upstream: these are PCIe components,
   so MSI or MSI-X is forced; catch the error and report it right there.
5. pci_bridge_dev: msi_init()'s failure is fatal there; keep that behaviour.
6. megasas_scsi: move msi_init() earlier, which saves having to free the
   MemoryRegion when it fails.
7. mptsas: move msi_init() earlier, which saves having to free the
   MemoryRegion when it fails.
8. pvscsi: it doesn't have an msi property, so an msi_init() failure leads
   to a silent fallback to INTx.
9. usb-xhci: move msi_init() earlier, which saves having to free the
   MemoryRegion when it fails.
10. vfio-pci: keep the previous behaviour, and just catch & report error.
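
The ordering described in the list above (initialise MSI before allocating other resources, then either propagate or free the error depending on the msi property) can be sketched without QEMU. Everything here is a hypothetical stand-in: Error, error_setg_simple(), fake_msi_init() and fake_realize() only imitate the shape of QEMU's Error **errp convention and are not its API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct Error {
    char msg[64];
} Error;

typedef enum { MSI_AUTO, MSI_ON, MSI_OFF } MsiSetting;

typedef struct {
    bool msi_enabled;
    bool mmio_allocated;
} FakeDevice;

/* Allocate an Error and hand ownership to the caller through *errp. */
static void error_setg_simple(Error **errp, const char *msg)
{
    Error *err = calloc(1, sizeof(*err));
    assert(err);
    snprintf(err->msg, sizeof(err->msg), "%s", msg);
    *errp = err;
}

/* Stand-in for msi_init(): fails when the platform has no usable MSI. */
static void fake_msi_init(bool msi_nonbroken, Error **errp)
{
    if (!msi_nonbroken) {
        error_setg_simple(errp, "MSI not available on this platform");
    }
}

/* Returns NULL on success, or an Error that the caller must free. */
static Error *fake_realize(FakeDevice *d, MsiSetting msi, bool msi_nonbroken)
{
    Error *err = NULL;

    d->msi_enabled = false;
    d->mmio_allocated = false;

    if (msi != MSI_OFF) {
        /* Done first, so a failure needs no cleanup of later resources. */
        fake_msi_init(msi_nonbroken, &err);
        if (err && msi == MSI_ON) {
            return err;            /* user asked for MSI: propagate */
        }
        if (err) {
            free(err);             /* auto: fall back to INTx silently */
        } else {
            d->msi_enabled = true;
        }
    }

    d->mmio_allocated = true;      /* e.g. where the MMIO region is set up */
    return NULL;
}
```

Because the early return happens before anything else is allocated, the msi=on failure path has nothing to undo, which is the point of moving msi_init() earlier.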

  hw/audio/intel-hda.c   | 18 +
  hw/ide/ich.c   | 15 +-
  hw/net/vmxnet3.c   | 41 +++---
  hw/pci-bridge/ioh3420.c|  4 +++-
  hw/pci-bridge/pci_bridge_dev.c |  7 ---
  hw/pci-bridge/xio3130_downstream.c |  4 +++-
  hw/pci-bridge/xio3130_upstream.c   |  4 +++-
  hw/pci/msi.c   |  9 +
  hw/scsi/megasas.c  | 18 +
  hw/scsi/mptsas.c   | 20 ++-
  hw/scsi/vmw_pvscsi.c   |  6 +-
  hw/usb/hcd-xhci.c  | 18 +
  hw/vfio/pci.c  |  6 --
  include/hw/pci/msi.h   |  3 ++-
  14 files changed, 112 insertions(+), 61 deletions(-)

diff --git a/hw/audio/intel-hda.c b/hw/audio/intel-hda.c
index 61362e5..0a46358 100644
--- a/hw/audio/intel-hda.c
+++ b/hw/audio/intel-hda.c
@@ -1131,6 +1131,7 @@ static void intel_hda_realize(PCIDevice *pci, Error 
**errp)
  {
  IntelHDAState *d = INTEL_HDA(pci);
  uint8_t *conf = d->pci.config;
+Error *err = NULL;

  d->name = object_get_typename(OBJECT(d));

@@ -1139,13 +1140,22 @@ static void intel_hda_realize(PCIDevice *pci, Error 
**errp)
  /* HDCTL off 0x40 bit 0 selects signaling mode (1-HDA, 0 - Ac97) 18.1.19 
*/
  conf[0x40] = 0x01;

+if (d->msi != ON_OFF_AUTO_OFF) {
+msi_init(&d->pci, d->old_msi_addr ? 0x50 : 0x60, 1,
+ true, false, &err);
+if (err && d->msi == ON_OFF_AUTO_ON) {
+/* If user set msi=on, then device creation fail */
+error_propagate(errp, err);
+return;


The semantics have now changed: old machines with msi=on on platforms with
msi_nonbroken=false will now fail, right? Is this acceptable, or do we need a
compat way to keep the semantics for old machines?


+} else if (err && d->msi == ON_OFF_AUTO_AUTO) {
+/* If user doesn`t set it on, switch to non-msi variant silently */
+error_free(err);
+}
+}
+
  memory_region_init_io(&d->mmio, OBJECT(d), &intel_hda_mmio_ops, d,
"intel-hda", 0x4000);
  pci_register_bar(&d->pci, 0, 0, &d->mmio);
-if (d->msi == ON_OFF_AUTO_AUTO ||
-d->msi == ON_OFF_AUTO_ON) {
-msi_init(&d->pci, d->old_msi_addr ? 0x50 : 0x60, 1, true, false);
-}

  hda_codec_bus_init(DEVICE(pci), &d->codecs, sizeof(d->codecs),
 intel_hda_response, intel_hda_xfer);
diff --git a/hw/ide/ich.c b/hw/ide/ich.c
index 0a13334..aec8262 100644
--- a/hw/ide/ich.c
+++ b/hw/ide/ich.c
@@ -111,6 +111,16 @@ static void pci_ich9_ahci_realize(PCIDevice *dev, Error 
**errp)
  int sata_cap_offset;
  uint8_t *sata_cap;
  d = ICH_AHCI(dev);
+Error *err = NULL;
+
+/* Although the AHCI 1.3 specification states that the first capability
+ * should be PMCAP, the Intel ICH9 data sheet specifies that the ICH9
+ * AHCI device puts the MSI capability first, pointing to 0x80. */
+msi_init(dev, ICH9_MSI_CAP_OFFSET

[Qemu-devel] [PATCH v5 02/16] pci: Introduce define for PM capability version 1.1

2016-05-15 Thread Leonid Bloch

From: Dmitry Fleytman 

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 include/hw/pci/pci_regs.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/hw/pci/pci_regs.h b/include/hw/pci/pci_regs.h
index ba8cbe9..7a83142 100644
--- a/include/hw/pci/pci_regs.h
+++ b/include/hw/pci/pci_regs.h
@@ -1 +1,3 @@
 #include "standard-headers/linux/pci_regs.h"
+
+#define  PCI_PM_CAP_VER_1_1 0x0002  /* PCI PM spec ver. 1.1 */
-- 
2.5.5

[Qemu-devel] [PATCH v5 01/16] msix: make msix_clr_pending() visible for clients

2016-05-15 Thread Leonid Bloch

From: Dmitry Fleytman 

This function will be used by e1000e device code.
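
msix_clr_pending() is the counterpart of msix_set_pending() in the diff below: each vector owns one bit in the Pending Bit Array, located by a byte pointer and a mask. The sketch below models that with a plain byte array. The helper names and the fixed 64-vector size are assumptions for illustration, and the byte/bit split (vector / 8, 1 << (vector % 8)) reflects the usual PBA bit layout rather than code copied from hw/pci/msix.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_VECTORS 64

/* One pending bit per vector, packed eight to a byte. */
static uint8_t pba[(NUM_VECTORS + 7) / 8];

static uint8_t *pending_byte(unsigned vector)
{
    return &pba[vector / 8];
}

static uint8_t pending_mask(unsigned vector)
{
    return (uint8_t)(1u << (vector % 8));
}

static void set_pending(unsigned vector)
{
    assert(vector < NUM_VECTORS);
    *pending_byte(vector) |= pending_mask(vector);
}

static void clr_pending(unsigned vector)
{
    assert(vector < NUM_VECTORS);
    *pending_byte(vector) &= (uint8_t)~pending_mask(vector);
}

static bool is_pending(unsigned vector)
{
    return (*pending_byte(vector) & pending_mask(vector)) != 0;
}
```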

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 hw/pci/msix.c | 2 +-
 include/hw/pci/msix.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index b75f0e9..0ec1cb1 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -72,7 +72,7 @@ void msix_set_pending(PCIDevice *dev, unsigned int vector)
 *msix_pending_byte(dev, vector) |= msix_pending_mask(vector);
 }
 
-static void msix_clr_pending(PCIDevice *dev, int vector)
+void msix_clr_pending(PCIDevice *dev, int vector)
 {
 *msix_pending_byte(dev, vector) &= ~msix_pending_mask(vector);
 }
diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h
index 72e5f93..048a29d 100644
--- a/include/hw/pci/msix.h
+++ b/include/hw/pci/msix.h
@@ -29,6 +29,7 @@ int msix_present(PCIDevice *dev);
 
 bool msix_is_masked(PCIDevice *dev, unsigned vector);
 void msix_set_pending(PCIDevice *dev, unsigned vector);
+void msix_clr_pending(PCIDevice *dev, int vector);
 
 int msix_vector_use(PCIDevice *dev, unsigned vector);
 void msix_vector_unuse(PCIDevice *dev, unsigned vector);
-- 
2.5.5

[Qemu-devel] [PATCH v5 00/16] Introduce Intel 82574 GbE Controller Emulation (e1000e)

2016-05-15 Thread Leonid Bloch

Hello All,

This is v5 of e1000e series.

For convenience, the same patches are available at:
https://github.com/daynix/qemu-e1000e/tree/e1000e-submit-v5

Best regards,
Dmitry.

Changes since v4:

1. Rebased to the latest master (2.6.0+)

Changes since v3:

1. Various code fixes as suggested by Jason and Michael
2. Rebased to the latest master

Changes since v2:

1. Interrupt storm on latest Linux kernels fixed
2. Device unit test added
3. Introduced code sharing between e1000 and e1000e
4. Various code fixes as suggested by Jason
5. Rebased to the latest master

Changes since v1:

1. PCI_PM_CAP_VER_1_1 is defined now in include/hw/pci/pci_regs.h and
   not in include/standard-headers/linux/pci_regs.h.
2. Changes in naming and extra comments in hw/pci/pcie.c and in
   include/hw/pci/pcie.h.
3. Defining pci_dsn_ver and pci_dsn_cap static const variables in
   hw/pci/pcie.c, instead of PCI_DSN_VER and PCI_DSN_CAP symbolic
   constants in include/hw/pci/pcie_regs.h.
4. Changing the vmxnet3_device_serial_num function in hw/net/vmxnet3.c
   to avoid the cast when it is called.
5. Avoiding a preceding underscore in all the e1000e-related names.
6. Minor style changes.

===

Hello All,

This series contains the final code of the e1000e device emulation that we
have developed. Please review, and consider accepting these patches into
the upstream QEMU repository.

Code stability was verified by various traffic tests using Fedora 22
Linux and Windows Server 2012R2 guests. Also, the Microsoft Hardware
Certification Kit (HCK) tests were run on a Windows Server 2012R2 guest.

There was a discussion on the possibility of code sharing between the
e1000e and the existing e1000 devices. We have reviewed the final code
for parts that may be shared between this device and the currently
available e1000 emulation. The device specifications are very different,
and almost no registers or functions were left unchanged from e1000.
The ring descriptor structures changed as well, with the introduction of
extended and PS descriptors, as well as additional bits.

Additional differences stem from the fact that the e1000e device reuses
the network packet abstractions introduced by the vmxnet3 device, while
e1000 has its own code for packet handling. BTW, it may be worth reusing
those abstractions in e1000 as well. (Following these changes, the
vmxnet3 device was successfully tested for possible regressions.)

There are a few minor parts that may be shared, e.g. the default
register handlers and the ring management functions. The total amount
of shared lines would be about 100-150, so we're not sure it is worth
the effort and the risk of breaking e1000, which is a good, old, and
stable device.

Currently, the e1000e code is standalone with respect to e1000.

Please share your thoughts.

Thanks in advance,
Dmitry.

Changes since RFCv2:

1. Device functionality verified using the Microsoft Hardware Certification Test Kit (HCK)
2. Introduced a number of performance improvements
3. The code was cleaned, and rebased to the latest master
4. Patches verified with checkpatch.pl

===

Changes since RFCv1:

1. Added support for all the device features:
  - Interrupt moderation.
  - RSS.
  - Multiqueue.
2. Simulated exact PCI/PCIe configuration space layout.
3. Made fixes needed to pass Microsoft's HW certification tests (HCK).

This series is still an RFC, because the following tasks are not done yet:

1. See which code can be shared between this device and the existing e1000 device.
2. Rebase patches to the latest master (current base is v2.3.0).

Please share your thoughts,
Thanks, Dmitry.

===

Hello qemu-devel,

This patch series is an RFC for the new networking device emulation
we're developing for QEMU.

This new device emulates the Intel 82574 GbE Controller and works
with unmodified Intel e1000e drivers from the Linux/Windows kernels.

The status of the current series is "Functional Device Ready, work
on Extended Features in Progress".

More precisely, these patches represent a functional device, which
is recognized by the standard Intel drivers, and is able to transfer
TX/RX packets with CSO/TSO offloads, according to the spec.

Extended features not supported yet (work in progress):
  1. TX/RX Interrupt moderation mechanisms
  2. RSS
  3. Full-featured multi-queue (use of multiqueued network backend)

Also, there will be some code refactoring and performance
optimization efforts.

This series was tested on Linux (Fedora 22) and Windows (2012R2)
guests, using Iperf, with TX/RX and TCP/UDP streams, and various
packet sizes.

More thorough testing, including data streams with different MTU
sizes and Microsoft Certification (HLK) tests, is pending development
of the missing features.

See commit messages (esp. "net: Introduce e1000e device emulation")
for more information about the development approaches and the
architecture options chosen for this device.

This series is based up

[Qemu-devel] [PATCH v5 03/16] pcie: Add support for PCIe CAP v1

2016-05-15 Thread Leonid Bloch

From: Dmitry Fleytman 

Added support for PCIe CAP v1, while reusing some of the existing v2
infrastructure.
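
Both pcie_cap_init() and the new pcie_cap_v1_init() go through pcie_cap_v1_fill(), which packs the device/port type and the capability version into the 16-bit PCI Express Capabilities register (at offset 2 of the capability). A standalone sketch of that packing follows. The constants mirror the Linux pci_regs.h values for PCI_EXP_FLAGS_VERS (bits 3:0), PCI_EXP_FLAGS_TYPE (bits 7:4) and a few port types; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EXP_FLAGS_VERS        0x000f  /* capability version, bits 3:0 */
#define EXP_FLAGS_TYPE        0x00f0  /* device/port type, bits 7:4 */
#define EXP_FLAGS_TYPE_SHIFT  4

#define EXP_TYPE_ENDPOINT     0x0
#define EXP_TYPE_ROOT_PORT    0x4
#define EXP_TYPE_RC_END       0x9

/* Build the capabilities register; the interrupt message number field
 * (bits 13:9) is left at 0, matching the comment in the patch. */
static uint16_t exp_flags(uint8_t type, uint8_t version)
{
    return (uint16_t)(((type << EXP_FLAGS_TYPE_SHIFT) & EXP_FLAGS_TYPE) |
                      (version & EXP_FLAGS_VERS));
}

/* Store a 16-bit value little-endian, as PCI config space expects. */
static void set_word_le(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}
```

Apart from the version nibble, the v1 and v2 paths in the patch differ in the capability size they reserve (PCI_EXP_VER1_SIZEOF vs PCI_EXP_VER2_SIZEOF) and in the v2-specific registers filled afterwards.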

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 hw/pci/pcie.c  | 84 --
 include/hw/pci/pcie.h  |  4 +++
 include/hw/pci/pcie_regs.h |  5 +--
 3 files changed, 73 insertions(+), 20 deletions(-)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 728386a..24cfc3b 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -43,26 +43,15 @@
 /***
  * pci express capability helper functions
  */
-int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port)
-{
-int pos;
-uint8_t *exp_cap;
-
-assert(pci_is_express(dev));
-
-pos = pci_add_capability(dev, PCI_CAP_ID_EXP, offset,
- PCI_EXP_VER2_SIZEOF);
-if (pos < 0) {
-return pos;
-}
-dev->exp.exp_cap = pos;
-exp_cap = dev->config + pos;
 
+static void
+pcie_cap_v1_fill(uint8_t *exp_cap, uint8_t port, uint8_t type, uint8_t version)
+{
 /* capability register
-   interrupt message number defaults to 0 */
+interrupt message number defaults to 0 */
 pci_set_word(exp_cap + PCI_EXP_FLAGS,
  ((type << PCI_EXP_FLAGS_TYPE_SHIFT) & PCI_EXP_FLAGS_TYPE) |
- PCI_EXP_FLAGS_VER2);
+ version);
 
 /* device capability register
  * table 7-12:
@@ -81,7 +70,27 @@ int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t 
type, uint8_t port)
 
 pci_set_word(exp_cap + PCI_EXP_LNKSTA,
  PCI_EXP_LNK_MLW_1 | PCI_EXP_LNK_LS_25 |PCI_EXP_LNKSTA_DLLLA);
+}
+
+int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port)
+{
+/* PCIe cap v2 init */
+int pos;
+uint8_t *exp_cap;
+
+assert(pci_is_express(dev));
+
+pos = pci_add_capability(dev, PCI_CAP_ID_EXP, offset, PCI_EXP_VER2_SIZEOF);
+if (pos < 0) {
+return pos;
+}
+dev->exp.exp_cap = pos;
+exp_cap = dev->config + pos;
+
+/* Filling values common with v1 */
+pcie_cap_v1_fill(exp_cap, port, type, PCI_EXP_FLAGS_VER2);
 
+/* Filling v2 specific values */
 pci_set_long(exp_cap + PCI_EXP_DEVCAP2,
  PCI_EXP_DEVCAP2_EFF | PCI_EXP_DEVCAP2_EETLPP);
 
@@ -89,7 +98,29 @@ int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t 
type, uint8_t port)
 return pos;
 }
 
-int pcie_endpoint_cap_init(PCIDevice *dev, uint8_t offset)
+int pcie_cap_v1_init(PCIDevice *dev, uint8_t offset, uint8_t type,
+ uint8_t port)
+{
+/* PCIe cap v1 init */
+int pos;
+uint8_t *exp_cap;
+
+assert(pci_is_express(dev));
+
+pos = pci_add_capability(dev, PCI_CAP_ID_EXP, offset, PCI_EXP_VER1_SIZEOF);
+if (pos < 0) {
+return pos;
+}
+dev->exp.exp_cap = pos;
+exp_cap = dev->config + pos;
+
+pcie_cap_v1_fill(exp_cap, port, type, PCI_EXP_FLAGS_VER1);
+
+return pos;
+}
+
+static int
+pcie_endpoint_cap_common_init(PCIDevice *dev, uint8_t offset, uint8_t cap_size)
 {
 uint8_t type = PCI_EXP_TYPE_ENDPOINT;
 
@@ -102,7 +133,19 @@ int pcie_endpoint_cap_init(PCIDevice *dev, uint8_t offset)
 type = PCI_EXP_TYPE_RC_END;
 }
 
-return pcie_cap_init(dev, offset, type, 0);
+return (cap_size == PCI_EXP_VER1_SIZEOF)
+? pcie_cap_v1_init(dev, offset, type, 0)
+: pcie_cap_init(dev, offset, type, 0);
+}
+
+int pcie_endpoint_cap_init(PCIDevice *dev, uint8_t offset)
+{
+return pcie_endpoint_cap_common_init(dev, offset, PCI_EXP_VER2_SIZEOF);
+}
+
+int pcie_endpoint_cap_v1_init(PCIDevice *dev, uint8_t offset)
+{
+return pcie_endpoint_cap_common_init(dev, offset, PCI_EXP_VER1_SIZEOF);
 }
 
 void pcie_cap_exit(PCIDevice *dev)
@@ -110,6 +153,11 @@ void pcie_cap_exit(PCIDevice *dev)
 pci_del_capability(dev, PCI_CAP_ID_EXP, PCI_EXP_VER2_SIZEOF);
 }
 
+void pcie_cap_v1_exit(PCIDevice *dev)
+{
+pci_del_capability(dev, PCI_CAP_ID_EXP, PCI_EXP_VER1_SIZEOF);
+}
+
 uint8_t pcie_cap_get_type(const PCIDevice *dev)
 {
 uint32_t pos = dev->exp.exp_cap;
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index b48a7a2..cbbf0c5 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -80,8 +80,12 @@ struct PCIExpressDevice {
 
 /* PCI express capability helper functions */
 int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port);
+int pcie_cap_v1_init(PCIDevice *dev, uint8_t offset,
+ uint8_t type, uint8_t port);
 int pcie_endpoint_cap_init(PCIDevice *dev, uint8_t offset);
 void pcie_cap_exit(PCIDevice *dev);
+int pcie_endpoint_cap_v1_init(PCIDevice *dev, uint8_t offset);
+void pcie_cap_v1_exit(PCIDevice *dev);
 uint8_t pcie_cap_get_type(const PCIDevice *dev);
 void pcie_cap_flags_set_vector(PCIDevice *dev, uint8_t vector);
 uint8_t pcie_cap_flags_get_vector(PCIDevice *dev);
diff --git a/include/hw/pci/pcie_regs.h b/include

[Qemu-devel] [PATCH v5 06/16] net: Introduce Toeplitz hash calculator

2016-05-15 Thread Leonid Bloch

From: Dmitry Fleytman 

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 include/net/checksum.h | 45 +
 1 file changed, 45 insertions(+)

diff --git a/include/net/checksum.h b/include/net/checksum.h
index 7de1acb..dd8b4f6 100644
--- a/include/net/checksum.h
+++ b/include/net/checksum.h
@@ -18,6 +18,7 @@
 #ifndef QEMU_NET_CHECKSUM_H
 #define QEMU_NET_CHECKSUM_H
 
+#include "qemu/bswap.h"
 struct iovec;
 
 uint32_t net_checksum_add_cont(int len, uint8_t *buf, int seq);
@@ -50,4 +51,48 @@ uint32_t net_checksum_add_iov(const struct iovec *iov,
   const unsigned int iov_cnt,
   uint32_t iov_off, uint32_t size);
 
+typedef struct toeplitz_key_st {
+uint32_t leftmost_32_bits;
+uint8_t *next_byte;
+} net_toeplitz_key;
+
+static inline
+void net_toeplitz_key_init(net_toeplitz_key *key, uint8_t *key_bytes)
+{
+key->leftmost_32_bits = be32_to_cpu(*(uint32_t *)key_bytes);
+key->next_byte = key_bytes + sizeof(uint32_t);
+}
+
+static inline
+void net_toeplitz_add(uint32_t *result,
+  uint8_t *input,
+  uint32_t len,
+  net_toeplitz_key *key)
+{
+register uint32_t accumulator = *result;
+register uint32_t leftmost_32_bits = key->leftmost_32_bits;
+register uint32_t byte;
+
+for (byte = 0; byte < len; byte++) {
+register uint8_t input_byte = input[byte];
+register uint8_t key_byte = *(key->next_byte++);
+register uint8_t bit;
+
+for (bit = 0; bit < 8; bit++) {
+if (input_byte & (1 << 7)) {
+accumulator ^= leftmost_32_bits;
+}
+
+leftmost_32_bits =
+(leftmost_32_bits << 1) | ((key_byte & (1 << 7)) >> 7);
+
+input_byte <<= 1;
+key_byte <<= 1;
+}
+}
+
+key->leftmost_32_bits = leftmost_32_bits;
+*result = accumulator;
+}
+
 #endif /* QEMU_NET_CHECKSUM_H */
-- 
2.5.5
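
net_toeplitz_add() above walks the input one bit at a time: whenever an input bit is 1 it XORs the current 32-bit key window into the result, and after every bit it slides the window one key bit to the left. Below is a self-contained sketch of the same bit-serial Toeplitz hash over a whole buffer; toeplitz_hash() is a hypothetical name, and the test uses the 40-byte sample key and IPv4/TCP tuple from Microsoft's RSS hash verification table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-serial Toeplitz hash. The key must be at least len + 4 bytes long,
 * so that a full 32-bit window is available for every input bit. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *input, size_t len)
{
    uint32_t result = 0;
    /* Leftmost 32 bits of the key, read big-endian. */
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | key[3];
    size_t next_bit = 32;   /* index of the next key bit to shift in */

    assert(key_len >= len + 4);

    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            if (input[i] & (1u << bit)) {
                result ^= window;
            }
            /* Slide the window left by one key bit. */
            uint32_t in = 0;
            if (next_bit < key_len * 8) {
                in = (key[next_bit / 8] >> (7 - next_bit % 8)) & 1u;
            }
            window = (window << 1) | in;
            next_bit++;
        }
    }
    return result;
}
```

A table-driven version would be faster; the per-bit loop is kept here because it maps directly onto the structure of the patch's net_toeplitz_add().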

[Qemu-devel] [PATCH v5 10/16] rtl8139: Move more TCP definitions to common header

2016-05-15 Thread Leonid Bloch

From: Dmitry Fleytman 

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 hw/net/rtl8139.c  | 5 -
 include/net/eth.h | 8 
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index 1e5ec14..562c1fd 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -1867,11 +1867,6 @@ static int rtl8139_transmit_one(RTL8139State *s, int 
descriptor)
 return 1;
 }
 
-/* structures and macros for task offloading */
-#define TCP_HEADER_DATA_OFFSET(tcp) (((be16_to_cpu(tcp->th_offset_flags) >> 
12)&0xf) << 2)
-#define TCP_FLAGS_ONLY(flags) ((flags)&0x3f)
-#define TCP_HEADER_FLAGS(tcp) TCP_FLAGS_ONLY(be16_to_cpu(tcp->th_offset_flags))
-
 #define TCP_HEADER_CLEAR_FLAGS(tcp, off) ((tcp)->th_offset_flags &= 
cpu_to_be16(~TCP_FLAGS_ONLY(off)))
 
 /* produces ones' complement sum of data */
diff --git a/include/net/eth.h b/include/net/eth.h
index 18d0be3..5a32259 100644
--- a/include/net/eth.h
+++ b/include/net/eth.h
@@ -67,6 +67,14 @@ typedef struct tcp_header {
 uint16_t th_urp;/* urgent pointer */
 } tcp_header;
 
+#define TCP_FLAGS_ONLY(flags) ((flags) & 0x3f)
+
+#define TCP_HEADER_FLAGS(tcp) \
+TCP_FLAGS_ONLY(be16_to_cpu((tcp)->th_offset_flags))
+
+#define TCP_HEADER_DATA_OFFSET(tcp) \
+(((be16_to_cpu((tcp)->th_offset_flags) >> 12) & 0xf) << 2)
+
 typedef struct udp_header {
 uint16_t uh_sport; /* source port */
 uint16_t uh_dport; /* destination port */
-- 
2.5.5
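
TCP_HEADER_DATA_OFFSET() and TCP_HEADER_FLAGS() both decode the big-endian 16-bit word at byte 12 of the TCP header, whose top 4 bits give the header length in 32-bit words and whose low bits hold the flags. The sketch below performs the same decoding on raw header bytes without QEMU's be16_to_cpu(); the helper names are hypothetical, and the 0x3f mask matches TCP_FLAGS_ONLY() (FIN through URG).

```c
#include <assert.h>
#include <stdint.h>

/* The data-offset/flags word sits at byte 12 of the TCP header. */
#define TCP_OFFSET_FLAGS_POS 12

static uint16_t tcp_offset_flags(const uint8_t *tcp)
{
    /* Network byte order: high byte first. */
    return (uint16_t)((tcp[TCP_OFFSET_FLAGS_POS] << 8) |
                      tcp[TCP_OFFSET_FLAGS_POS + 1]);
}

/* Header length in bytes: the top 4 bits count 32-bit words. */
static unsigned tcp_header_len(const uint8_t *tcp)
{
    return ((tcp_offset_flags(tcp) >> 12) & 0xf) << 2;
}

/* Keep only FIN, SYN, RST, PSH, ACK and URG (the low six bits). */
static unsigned tcp_flags(const uint8_t *tcp)
{
    return tcp_offset_flags(tcp) & 0x3f;
}
```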

[Qemu-devel] [PATCH v5 08/16] vmxnet3: Use common MAC address tracing macros

2016-05-15 Thread Leonid Bloch

From: Dmitry Fleytman 

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 hw/net/vmxnet3.c  | 8 
 hw/net/vmxnet_debug.h | 3 ---
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 0a4db4d..26f6f90 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -474,7 +474,7 @@ static void vmxnet3_set_variable_mac(VMXNET3State *s, 
uint32_t h, uint32_t l)
 s->conf.macaddr.a[4] = VMXNET3_GET_BYTE(h, 0);
 s->conf.macaddr.a[5] = VMXNET3_GET_BYTE(h, 1);
 
-VMW_CFPRN("Variable MAC: " VMXNET_MF, VMXNET_MA(s->conf.macaddr.a));
+VMW_CFPRN("Variable MAC: " MAC_FMT, MAC_ARG(s->conf.macaddr.a));
 
 qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 }
@@ -1219,7 +1219,7 @@ static void vmxnet3_reset_interrupt_states(VMXNET3State 
*s)
 static void vmxnet3_reset_mac(VMXNET3State *s)
 {
 memcpy(&s->conf.macaddr.a, &s->perm_mac.a, sizeof(s->perm_mac.a));
-VMW_CFPRN("MAC address set to: " VMXNET_MF, VMXNET_MA(s->conf.macaddr.a));
+VMW_CFPRN("MAC address set to: " MAC_FMT, MAC_ARG(s->conf.macaddr.a));
 }
 
 static void vmxnet3_deactivate_device(VMXNET3State *s)
@@ -1301,7 +1301,7 @@ static void vmxnet3_update_mcast_filters(VMXNET3State *s)
 cpu_physical_memory_read(mcast_list_pa, s->mcast_list, list_bytes);
 VMW_CFPRN("Current multicast list len is %d:", s->mcast_list_len);
 for (i = 0; i < s->mcast_list_len; i++) {
-VMW_CFPRN("\t" VMXNET_MF, VMXNET_MA(s->mcast_list[i].a));
+VMW_CFPRN("\t" MAC_FMT, MAC_ARG(s->mcast_list[i].a));
 }
 }
 }
@@ -2102,7 +2102,7 @@ static void vmxnet3_net_init(VMXNET3State *s)
 
 s->link_status_and_speed = VMXNET3_LINK_SPEED | VMXNET3_LINK_STATUS_UP;
 
-VMW_CFPRN("Permanent MAC: " VMXNET_MF, VMXNET_MA(s->perm_mac.a));
+VMW_CFPRN("Permanent MAC: " MAC_FMT, MAC_ARG(s->perm_mac.a));
 
 s->nic = qemu_new_nic(&net_vmxnet3_info, &s->conf,
   object_get_typename(OBJECT(s)),
diff --git a/hw/net/vmxnet_debug.h b/hw/net/vmxnet_debug.h
index 96495db..5aab00b 100644
--- a/hw/net/vmxnet_debug.h
+++ b/hw/net/vmxnet_debug.h
@@ -142,7 +142,4 @@
 } \
 } while (0)
 
-#define VMXNET_MF   "%02X:%02X:%02X:%02X:%02X:%02X"
-#define VMXNET_MA(a)(a)[0], (a)[1], (a)[2], (a)[3], (a)[4], (a)[5]
-
 #endif /* _QEMU_VMXNET3_DEBUG_H  */
-- 
2.5.5
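
The patch replaces vmxnet3's private VMXNET_MF/VMXNET_MA pair with common MAC_FMT/MAC_ARG macros whose definitions are not shown here. Assuming they follow the same pattern as the removed ones (a format string plus a macro expanding to six arguments), the idea looks like this; EXAMPLE_MAC_FMT, EXAMPLE_MAC_ARG and format_mac() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* A printf format string and a matching six-argument expansion. */
#define EXAMPLE_MAC_FMT    "%02X:%02X:%02X:%02X:%02X:%02X"
#define EXAMPLE_MAC_ARG(a) (a)[0], (a)[1], (a)[2], (a)[3], (a)[4], (a)[5]

/* Format a MAC address into buf; 18 bytes hold the text plus the NUL. */
static void format_mac(char *buf, size_t len, const uint8_t mac[6])
{
    snprintf(buf, len, EXAMPLE_MAC_FMT, EXAMPLE_MAC_ARG(mac));
}
```

Because the format is a string literal, it concatenates with surrounding literals at compile time, which is how calls such as VMW_CFPRN("Permanent MAC: " MAC_FMT, ...) are put together.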

[Qemu-devel] [PATCH v5 04/16] pcie: Introduce function for DSN capability creation

2016-05-15 Thread Leonid Bloch

From: Dmitry Fleytman 

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 hw/pci/pcie.c | 10 ++
 include/hw/pci/pcie.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 24cfc3b..9599fde 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -695,3 +695,13 @@ void pcie_ari_init(PCIDevice *dev, uint16_t offset, 
uint16_t nextfn)
 offset, PCI_ARI_SIZEOF);
 pci_set_long(dev->config + offset + PCI_ARI_CAP, (nextfn & 0xff) << 8);
 }
+
+void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t ser_num)
+{
+static const int pci_dsn_ver = 1;
+static const int pci_dsn_cap = 4;
+
+pcie_add_capability(dev, PCI_EXT_CAP_ID_DSN, pci_dsn_ver, offset,
+PCI_EXT_CAP_DSN_SIZEOF);
+pci_set_quad(dev->config + offset + pci_dsn_cap, ser_num);
+}
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index cbbf0c5..056d25e 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -119,6 +119,7 @@ void pcie_add_capability(PCIDevice *dev,
  uint16_t offset, uint16_t size);
 
 void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn);
+void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t ser_num);
 
 extern const VMStateDescription vmstate_pcie_device;
 
-- 
2.5.5
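
pcie_dev_ser_num_init() adds a Device Serial Number extended capability: a 32-bit extended capability header followed, at offset 4 (pci_dsn_cap), by the 64-bit serial number. The sketch below lays out the same structure in a plain buffer. It assumes the standard extended-capability header layout (ID in bits 15:0, version in bits 19:16, next pointer in bits 31:20), a DSN capability ID of 0x0003, and little-endian storage as performed by QEMU's pci_set_quad(); the helper names and the 0x100 offset used in the test are illustrative, and the next pointer is simply left at 0.

```c
#include <assert.h>
#include <stdint.h>

#define EXT_CAP_ID_DSN   0x0003
#define DSN_VERSION      1
#define DSN_SERIAL_OFF   4       /* serial follows the 4-byte header */
#define CONFIG_SIZE      4096    /* PCIe extended config space */

/* Store the low `bytes` bytes of v little-endian. */
static void put_le(uint8_t *p, uint64_t v, int bytes)
{
    for (int i = 0; i < bytes; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Extended capability header: ID [15:0], version [19:16], next [31:20]. */
static uint32_t ext_cap_header(uint16_t id, uint8_t ver, uint16_t next)
{
    return (uint32_t)id | ((uint32_t)(ver & 0xf) << 16) |
           ((uint32_t)(next & 0xfff) << 20);
}

/* Write a DSN capability at `offset`, with no capability after it. */
static void write_dsn(uint8_t *config, uint16_t offset, uint64_t serial)
{
    put_le(config + offset, ext_cap_header(EXT_CAP_ID_DSN, DSN_VERSION, 0), 4);
    put_le(config + offset + DSN_SERIAL_OFF, serial, 8);
}
```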

[Qemu-devel] [PATCH v5 12/16] vmxnet3: Use pci_dma_* API instead of cpu_physical_memory_*

2016-05-15 Thread Leonid Bloch

From: Dmitry Fleytman 

This makes the device and the network packet
abstractions ready for an IOMMU.

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 hw/net/net_tx_pkt.c | 16 +++-
 hw/net/net_tx_pkt.h |  5 +++--
 hw/net/vmxnet3.c| 51 ++-
 3 files changed, 44 insertions(+), 28 deletions(-)

diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index ad2258c..dbcbe23 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -20,6 +20,7 @@
 #include "net/checksum.h"
 #include "net/tap.h"
 #include "net/net.h"
+#include "hw/pci/pci.h"
 
 enum {
 NET_TX_PKT_VHDR_FRAG = 0,
@@ -30,6 +31,8 @@ enum {
 
 /* TX packet private context */
 struct NetTxPkt {
+PCIDevice *pci_dev;
+
 struct virtio_net_hdr virt_hdr;
 bool has_virt_hdr;
 
@@ -54,11 +57,13 @@ struct NetTxPkt {
 bool is_loopback;
 };
 
-void net_tx_pkt_init(struct NetTxPkt **pkt, uint32_t max_frags,
-bool has_virt_hdr)
+void net_tx_pkt_init(struct NetTxPkt **pkt, PCIDevice *pci_dev,
+uint32_t max_frags, bool has_virt_hdr)
 {
 struct NetTxPkt *p = g_malloc0(sizeof *p);
 
+p->pci_dev = pci_dev;
+
 p->vec = g_malloc((sizeof *p->vec) *
 (max_frags + NET_TX_PKT_PL_START_FRAG));
 
@@ -383,7 +388,8 @@ bool net_tx_pkt_add_raw_fragment(struct NetTxPkt *pkt, 
hwaddr pa,
 ventry = &pkt->raw[pkt->raw_frags];
 mapped_len = len;
 
-ventry->iov_base = cpu_physical_memory_map(pa, &mapped_len, false);
+ventry->iov_base = pci_dma_map(pkt->pci_dev, pa,
+   &mapped_len, DMA_DIRECTION_TO_DEVICE);
 
 if ((ventry->iov_base != NULL) && (len == mapped_len)) {
 ventry->iov_len = mapped_len;
@@ -444,8 +450,8 @@ void net_tx_pkt_reset(struct NetTxPkt *pkt)
 assert(pkt->raw);
 for (i = 0; i < pkt->raw_frags; i++) {
 assert(pkt->raw[i].iov_base);
-cpu_physical_memory_unmap(pkt->raw[i].iov_base, pkt->raw[i].iov_len,
-  false, pkt->raw[i].iov_len);
+pci_dma_unmap(pkt->pci_dev, pkt->raw[i].iov_base, pkt->raw[i].iov_len,
+  DMA_DIRECTION_TO_DEVICE, 0);
 }
 pkt->raw_frags = 0;
 
diff --git a/hw/net/net_tx_pkt.h b/hw/net/net_tx_pkt.h
index e49772d..07b9a20 100644
--- a/hw/net/net_tx_pkt.h
+++ b/hw/net/net_tx_pkt.h
@@ -31,11 +31,12 @@ struct NetTxPkt;
  * Init function for tx packet functionality
  *
  * @pkt:packet pointer
+ * @pci_dev:PCI device processing this packet
  * @max_frags:  max tx ip fragments
  * @has_virt_hdr:   device uses virtio header.
  */
-void net_tx_pkt_init(struct NetTxPkt **pkt, uint32_t max_frags,
-bool has_virt_hdr);
+void net_tx_pkt_init(struct NetTxPkt **pkt, PCIDevice *pci_dev,
+uint32_t max_frags, bool has_virt_hdr);
 
 /**
  * Clean all tx packet resources.
diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 378a2eb..bbd5447 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -802,7 +802,9 @@ vmxnet3_pop_rxc_descr(VMXNET3State *s, int qidx, uint32_t 
*descr_gen)
 hwaddr daddr =
 vmxnet3_ring_curr_cell_pa(&s->rxq_descr[qidx].comp_ring);
 
-cpu_physical_memory_read(daddr, &rxcd, sizeof(struct Vmxnet3_RxCompDesc));
+pci_dma_read(PCI_DEVICE(s), daddr,
+ &rxcd, sizeof(struct Vmxnet3_RxCompDesc));
+
 ring_gen = vmxnet3_ring_curr_gen(&s->rxq_descr[qidx].comp_ring);
 
 if (rxcd.gen != ring_gen) {
@@ -1023,10 +1025,11 @@ nocsum:
 }
 
 static void
-vmxnet3_physical_memory_writev(const struct iovec *iov,
-   size_t start_iov_off,
-   hwaddr target_addr,
-   size_t bytes_to_copy)
+vmxnet3_pci_dma_writev(PCIDevice *pci_dev,
+   const struct iovec *iov,
+   size_t start_iov_off,
+   hwaddr target_addr,
+   size_t bytes_to_copy)
 {
 size_t curr_off = 0;
 size_t copied = 0;
@@ -1036,9 +1039,9 @@ vmxnet3_physical_memory_writev(const struct iovec *iov,
 size_t chunk_len =
 MIN((curr_off + iov->iov_len) - start_iov_off, bytes_to_copy);
 
-cpu_physical_memory_write(target_addr + copied,
-  iov->iov_base + start_iov_off - curr_off,
-  chunk_len);
+pci_dma_write(pci_dev, target_addr + copied,
+  iov->iov_base + start_iov_off - curr_off,
+  chunk_len);
 
 copied += chunk_len;
 start_iov_off += chunk_len;
@@ -1088,15 +1091,15 @@ vmxnet3_indicate_packet(VMXNET3State *s)
 }
 
 chunk_size = MIN(bytes_left, rxd.len);
-vmxnet3_physical_memory_writev(data, bytes_copied,
-   le64_to_cpu(rxd.addr), chunk_size);
+vmxnet3_pci_dma_writev(PCI_DEVICE(s), data, bytes_copied,
+   le64_to_cpu(r
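
vmxnet3_pci_dma_writev() in this patch copies bytes_to_copy bytes out of an iovec, starting start_iov_off bytes into it, to consecutive guest addresses; the patch changes only the final write call from cpu_physical_memory_write() to pci_dma_write(). A standalone sketch of that offset arithmetic follows, with a flat byte array standing in for guest memory and a hypothetical dma_write() in place of pci_dma_write(); struct iovec comes from POSIX <sys/uio.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

static uint8_t guest_mem[256];   /* stand-in for guest physical memory */

/* Hypothetical replacement for pci_dma_write(): copy into guest_mem. */
static void dma_write(uint64_t addr, const void *buf, size_t len)
{
    assert(addr + len <= sizeof(guest_mem));
    memcpy(guest_mem + addr, buf, len);
}

/* Copy `bytes` bytes, starting `skip` bytes into the iovec array, to
 * consecutive addresses beginning at `target`. Returns bytes copied. */
static size_t dma_writev(const struct iovec *iov, size_t iovcnt,
                         size_t skip, uint64_t target, size_t bytes)
{
    size_t copied = 0;

    for (size_t i = 0; i < iovcnt && copied < bytes; i++) {
        if (skip >= iov[i].iov_len) {
            skip -= iov[i].iov_len;   /* whole element precedes the start */
            continue;
        }
        size_t avail = iov[i].iov_len - skip;
        size_t chunk = avail < bytes - copied ? avail : bytes - copied;

        dma_write(target + copied, (const uint8_t *)iov[i].iov_base + skip,
                  chunk);
        copied += chunk;
        skip = 0;                     /* later elements start at byte 0 */
    }
    return copied;
}
```

With two 5-byte elements "hello" and "world", skipping 3 bytes and copying 4 writes "lowo" to the target address, crossing the element boundary in one call.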

[Qemu-devel] [PATCH v5 05/16] vmxnet3: Use generic function for DSN capability definition

2016-05-15 Thread Leonid Bloch

From: Dmitry Fleytman 

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 hw/net/vmxnet3.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 093a71e..0a4db4d 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -2255,9 +2255,9 @@ static const MemoryRegionOps b1_ops = {
 },
 };
 
-static uint8_t *vmxnet3_device_serial_num(VMXNET3State *s)
+static uint64_t vmxnet3_device_serial_num(VMXNET3State *s)
 {
-static uint64_t dsn_payload;
+uint64_t dsn_payload;
 uint8_t *dsnp = (uint8_t *)&dsn_payload;
 
 dsnp[0] = 0xfe;
@@ -2268,7 +2268,7 @@ static uint8_t *vmxnet3_device_serial_num(VMXNET3State *s)
 dsnp[5] = s->conf.macaddr.a[1];
 dsnp[6] = s->conf.macaddr.a[2];
 dsnp[7] = 0xff;
-return dsnp;
+return dsn_payload;
 }
 
 static void vmxnet3_pci_realize(PCIDevice *pci_dev, Error **errp)
@@ -2313,10 +2313,8 @@ static void vmxnet3_pci_realize(PCIDevice *pci_dev, Error **errp)
 pcie_endpoint_cap_init(pci_dev, VMXNET3_EXP_EP_OFFSET);
 }
 
-pcie_add_capability(pci_dev, PCI_EXT_CAP_ID_DSN, 0x1,
-VMXNET3_DSN_OFFSET, PCI_EXT_CAP_DSN_SIZEOF);
-memcpy(pci_dev->config + VMXNET3_DSN_OFFSET + 4,
-   vmxnet3_device_serial_num(s), sizeof(uint64_t));
+pcie_dev_ser_num_init(pci_dev, VMXNET3_DSN_OFFSET,
+  vmxnet3_device_serial_num(s));
 }
 
 register_savevm(dev, "vmxnet3-msix", -1, 1,
-- 
2.5.5

[Qemu-devel] [PATCH v5 09/16] net_pkt: Name vmxnet3 packet abstractions more generic

2016-05-15 Thread Leonid Bloch

From: Dmitry Fleytman 

This patch drops the "vmx" prefix from the packet abstraction names
to emphasize that they are generic and not tied to any
specific network device.

These abstractions will be reused by the e1000e emulation
introduced in the following patches, so their names need to be generalized.

This patch (except for the renamed files, adjusted comments and changes in MAINTAINERS)
was produced by:

git grep -lz 'vmxnet_tx_pkt' | xargs -0 perl -i'' -pE "s/vmxnet_tx_pkt/net_tx_pkt/g"
git grep -lz 'vmxnet_rx_pkt' | xargs -0 perl -i'' -pE "s/vmxnet_rx_pkt/net_rx_pkt/g"
git grep -lz 'VmxnetTxPkt' | xargs -0 perl -i'' -pE "s/VmxnetTxPkt/NetTxPkt/g"
git grep -lz 'VMXNET_TX_PKT' | xargs -0 perl -i'' -pE "s/VMXNET_TX_PKT/NET_TX_PKT/g"
git grep -lz 'VmxnetRxPkt' | xargs -0 perl -i'' -pE "s/VmxnetRxPkt/NetRxPkt/g"
git grep -lz 'VMXNET_RX_PKT' | xargs -0 perl -i'' -pE "s/VMXNET_RX_PKT/NET_RX_PKT/g"
sed -ie 's/VMXNET_/NET_/g' hw/net/vmxnet_rx_pkt.c
sed -ie 's/VMXNET_/NET_/g' hw/net/vmxnet_tx_pkt.c

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 MAINTAINERS|   8 +
 hw/net/Makefile.objs   |   2 +-
 hw/net/net_rx_pkt.c| 187 
 hw/net/net_rx_pkt.h| 174 +++
 hw/net/net_tx_pkt.c| 581 +
 hw/net/net_tx_pkt.h| 146 +
 hw/net/vmxnet3.c   |  88 
 hw/net/vmxnet_rx_pkt.c | 187 
 hw/net/vmxnet_rx_pkt.h | 174 ---
 hw/net/vmxnet_tx_pkt.c | 581 -
 hw/net/vmxnet_tx_pkt.h | 146 -
 tests/Makefile |   4 +-
 12 files changed, 1143 insertions(+), 1135 deletions(-)
 create mode 100644 hw/net/net_rx_pkt.c
 create mode 100644 hw/net/net_rx_pkt.h
 create mode 100644 hw/net/net_tx_pkt.c
 create mode 100644 hw/net/net_tx_pkt.h
 delete mode 100644 hw/net/vmxnet_rx_pkt.c
 delete mode 100644 hw/net/vmxnet_rx_pkt.h
 delete mode 100644 hw/net/vmxnet_tx_pkt.c
 delete mode 100644 hw/net/vmxnet_tx_pkt.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 81e7fac..dc5e536 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -953,6 +953,14 @@ S: Maintained
 F: hw/*/xilinx_*
 F: include/hw/xilinx.h
 
+Network packet abstractions
+M: Dmitry Fleytman 
+S: Maintained
+F: include/net/eth.h
+F: net/eth.c
+F: hw/net/net_rx_pkt*
+F: hw/net/net_tx_pkt*
+
 Vmware
 M: Dmitry Fleytman 
 S: Maintained
diff --git a/hw/net/Makefile.objs b/hw/net/Makefile.objs
index 64d0449..527d264 100644
--- a/hw/net/Makefile.objs
+++ b/hw/net/Makefile.objs
@@ -8,7 +8,7 @@ common-obj-$(CONFIG_PCNET_PCI) += pcnet-pci.o
 common-obj-$(CONFIG_PCNET_COMMON) += pcnet.o
 common-obj-$(CONFIG_E1000_PCI) += e1000.o
 common-obj-$(CONFIG_RTL8139_PCI) += rtl8139.o
-common-obj-$(CONFIG_VMXNET3_PCI) += vmxnet_tx_pkt.o vmxnet_rx_pkt.o
+common-obj-$(CONFIG_VMXNET3_PCI) += net_tx_pkt.o net_rx_pkt.o
 common-obj-$(CONFIG_VMXNET3_PCI) += vmxnet3.o
 
 common-obj-$(CONFIG_SMC91C111) += smc91c111.o
diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c
new file mode 100644
index 0000000..8a4f29f
--- /dev/null
+++ b/hw/net/net_rx_pkt.c
@@ -0,0 +1,187 @@
+/*
+ * QEMU RX packets abstractions
+ *
+ * Copyright (c) 2012 Ravello Systems LTD (http://ravellosystems.com)
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ * Dmitry Fleytman 
+ * Tamir Shomer 
+ * Yan Vugenfirer 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "net_rx_pkt.h"
+#include "net/eth.h"
+#include "qemu-common.h"
+#include "qemu/iov.h"
+#include "net/checksum.h"
+#include "net/tap.h"
+
+/*
+ * RX packet may contain up to 2 fragments - rebuilt eth header
+ * in case of VLAN tag stripping
+ * and payload received from QEMU - in any case
+ */
+#define NET_MAX_RX_PACKET_FRAGMENTS (2)
+
+struct NetRxPkt {
+struct virtio_net_hdr virt_hdr;
+uint8_t ehdr_buf[ETH_MAX_L2_HDR_LEN];
+struct iovec vec[NET_MAX_RX_PACKET_FRAGMENTS];
+uint16_t vec_len;
+uint32_t tot_len;
+uint16_t tci;
+bool vlan_stripped;
+bool has_virt_hdr;
+eth_pkt_types_e packet_type;
+
+/* Analysis results */
+bool isip4;
+bool isip6;
+bool isudp;
+bool istcp;
+};
+
+void net_rx_pkt_init(struct NetRxPkt **pkt, bool has_virt_hdr)
+{
+struct NetRxPkt *p = g_malloc0(sizeof *p);
+p->has_virt_hdr = has_virt_hdr;
+*pkt = p;
+}
+
+void net_rx_pkt_uninit(struct NetRxPkt *pkt)
+{
+g_free(pkt);
+}
+
+struct virtio_net_hdr *net_rx_pkt_get_vhdr(struct NetRxPkt *pkt)
+{
+assert(pkt);
+return &pkt->virt_hdr;
+}
+
+void net_rx_pkt_attach_data(struct NetRxPkt *pkt, const void *data,
+   size_t len, bool strip_vlan)
+{
+uint16_t tci = 0;
+uint16_t ploff;
+assert(pkt);
+pkt->vlan_stripped = false;
+
+if (strip_vlan) {
+pkt->vlan_stripped = eth_strip_vlan(data, pkt->ehdr_buf, &plo

[Qemu-devel] [PATCH v5 07/16] net: Add macros for MAC address tracing

2016-05-15 Thread Leonid Bloch

From: Dmitry Fleytman 

These macros will be used by future commits introducing
e1000e device emulation and by vmxnet3 tracing code.

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 include/net/net.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index 73e4c46..129d46b 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -9,6 +9,11 @@
 #include "migration/vmstate.h"
 #include "qapi-types.h"
 
+#define MAC_FMT "%02X:%02X:%02X:%02X:%02X:%02X"
+#define MAC_ARG(x) ((uint8_t *)(x))[0], ((uint8_t *)(x))[1], \
+   ((uint8_t *)(x))[2], ((uint8_t *)(x))[3], \
+   ((uint8_t *)(x))[4], ((uint8_t *)(x))[5]
+
 #define MAX_QUEUE_NUM 1024
 
 /* Maximum GSO packet size (64k) plus plenty of room for
-- 
2.5.5

[Qemu-devel] [PATCH v5 11/16] net_pkt: Extend packet abstraction as required by e1000e functionality

2016-05-15 Thread Leonid Bloch

From: Dmitry Fleytman 

This patch extends the TX/RX packet abstractions with features that will
be used by the e1000e device implementation.

Changes are:

  1. Support iovec lists for RX buffers
  2. Deeper RX packets parsing
  3. Loopback option for TX packets
  4. Extended VLAN headers handling
  5. RSS processing for RX packets

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 hw/net/net_rx_pkt.c| 473 +
 hw/net/net_rx_pkt.h| 193 +++-
 hw/net/net_tx_pkt.c| 204 +
 hw/net/net_tx_pkt.h|  60 ++-
 include/net/checksum.h |   4 +-
 include/net/eth.h  | 153 +++-
 net/checksum.c |   7 +-
 net/eth.c  | 410 +-
 trace-events   |  40 +
 9 files changed, 1336 insertions(+), 208 deletions(-)

diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c
index 8a4f29f..1019b50 100644
--- a/hw/net/net_rx_pkt.c
+++ b/hw/net/net_rx_pkt.c
@@ -16,24 +16,16 @@
  */
 
 #include "qemu/osdep.h"
+#include "trace.h"
 #include "net_rx_pkt.h"
-#include "net/eth.h"
-#include "qemu-common.h"
-#include "qemu/iov.h"
 #include "net/checksum.h"
 #include "net/tap.h"
 
-/*
- * RX packet may contain up to 2 fragments - rebuilt eth header
- * in case of VLAN tag stripping
- * and payload received from QEMU - in any case
- */
-#define NET_MAX_RX_PACKET_FRAGMENTS (2)
-
 struct NetRxPkt {
 struct virtio_net_hdr virt_hdr;
-uint8_t ehdr_buf[ETH_MAX_L2_HDR_LEN];
-struct iovec vec[NET_MAX_RX_PACKET_FRAGMENTS];
+uint8_t ehdr_buf[sizeof(struct eth_header)];
+struct iovec *vec;
+uint16_t vec_len_total;
 uint16_t vec_len;
 uint32_t tot_len;
 uint16_t tci;
@@ -46,17 +38,31 @@ struct NetRxPkt {
 bool isip6;
 bool isudp;
 bool istcp;
+
+size_t l3hdr_off;
+size_t l4hdr_off;
+size_t l5hdr_off;
+
+eth_ip6_hdr_info ip6hdr_info;
+eth_ip4_hdr_info ip4hdr_info;
+eth_l4_hdr_info  l4hdr_info;
 };
 
 void net_rx_pkt_init(struct NetRxPkt **pkt, bool has_virt_hdr)
 {
 struct NetRxPkt *p = g_malloc0(sizeof *p);
 p->has_virt_hdr = has_virt_hdr;
+p->vec = NULL;
+p->vec_len_total = 0;
 *pkt = p;
 }
 
 void net_rx_pkt_uninit(struct NetRxPkt *pkt)
 {
+if (pkt->vec_len_total != 0) {
+g_free(pkt->vec);
+}
+
 g_free(pkt);
 }
 
@@ -66,33 +72,88 @@ struct virtio_net_hdr *net_rx_pkt_get_vhdr(struct NetRxPkt *pkt)
 return &pkt->virt_hdr;
 }
 
-void net_rx_pkt_attach_data(struct NetRxPkt *pkt, const void *data,
-   size_t len, bool strip_vlan)
+static inline void
+net_rx_pkt_iovec_realloc(struct NetRxPkt *pkt,
+int new_iov_len)
+{
+if (pkt->vec_len_total < new_iov_len) {
+g_free(pkt->vec);
+pkt->vec = g_malloc(sizeof(*pkt->vec) * new_iov_len);
+pkt->vec_len_total = new_iov_len;
+}
+}
+
+static void
+net_rx_pkt_pull_data(struct NetRxPkt *pkt,
+const struct iovec *iov, int iovcnt,
+size_t ploff)
+{
+if (pkt->vlan_stripped) {
+net_rx_pkt_iovec_realloc(pkt, iovcnt + 1);
+
+pkt->vec[0].iov_base = pkt->ehdr_buf;
+pkt->vec[0].iov_len = sizeof(pkt->ehdr_buf);
+
+pkt->tot_len =
+iov_size(iov, iovcnt) - ploff + sizeof(struct eth_header);
+
+pkt->vec_len = iov_copy(pkt->vec + 1, pkt->vec_len_total - 1,
+iov, iovcnt, ploff, pkt->tot_len);
+} else {
+net_rx_pkt_iovec_realloc(pkt, iovcnt);
+
+pkt->tot_len = iov_size(iov, iovcnt) - ploff;
+pkt->vec_len = iov_copy(pkt->vec, pkt->vec_len_total,
+iov, iovcnt, ploff, pkt->tot_len);
+}
+
+eth_get_protocols(pkt->vec, pkt->vec_len, &pkt->isip4, &pkt->isip6,
+  &pkt->isudp, &pkt->istcp,
+  &pkt->l3hdr_off, &pkt->l4hdr_off, &pkt->l5hdr_off,
+  &pkt->ip6hdr_info, &pkt->ip4hdr_info, &pkt->l4hdr_info);
+
+trace_net_rx_pkt_parsed(pkt->isip4, pkt->isip6, pkt->isudp, pkt->istcp,
+pkt->l3hdr_off, pkt->l4hdr_off, pkt->l5hdr_off);
+}
+
+void net_rx_pkt_attach_iovec(struct NetRxPkt *pkt,
+const struct iovec *iov, int iovcnt,
+size_t iovoff, bool strip_vlan)
 {
 uint16_t tci = 0;
-uint16_t ploff;
+uint16_t ploff = iovoff;
 assert(pkt);
 pkt->vlan_stripped = false;
 
 if (strip_vlan) {
-pkt->vlan_stripped = eth_strip_vlan(data, pkt->ehdr_buf, &ploff, &tci);
+pkt->vlan_stripped = eth_strip_vlan(iov, iovcnt, iovoff, pkt->ehdr_buf,
+&ploff, &tci);
 }
 
-if (pkt->vlan_stripped) {
-pkt->vec[0].iov_base = pkt->ehdr_buf;
-pkt->vec[0].iov_len = ploff - sizeof(struct vlan_header);
-pkt->vec[1].iov_base = (u

[Qemu-devel] [PATCH v5 16/16] e1000e: Introduce qtest for e1000e device

2016-05-15 Thread Leonid Bloch

From: Dmitry Fleytman 

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 tests/Makefile  |   3 +
 tests/e1000e-test.c | 480 
 2 files changed, 483 insertions(+)
 create mode 100644 tests/e1000e-test.c

diff --git a/tests/Makefile b/tests/Makefile
index 5283e31..829349d 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -142,6 +142,8 @@ gcov-files-virtio-y += $(gcov-files-virtioserial-y)
 
 check-qtest-pci-y += tests/e1000-test$(EXESUF)
 gcov-files-pci-y += hw/net/e1000.c
+check-qtest-pci-y += tests/e1000e-test$(EXESUF)
+gcov-files-pci-y += hw/net/e1000e.c hw/net/e1000e_core.c
 check-qtest-pci-y += tests/rtl8139-test$(EXESUF)
 gcov-files-pci-y += hw/net/rtl8139.c
 check-qtest-pci-y += tests/pcnet-test$(EXESUF)
@@ -550,6 +552,7 @@ tests/i440fx-test$(EXESUF): tests/i440fx-test.o $(libqos-pc-obj-y)
 tests/q35-test$(EXESUF): tests/q35-test.o $(libqos-pc-obj-y)
 tests/fw_cfg-test$(EXESUF): tests/fw_cfg-test.o $(libqos-pc-obj-y)
 tests/e1000-test$(EXESUF): tests/e1000-test.o
+tests/e1000e-test$(EXESUF): tests/e1000e-test.o $(libqos-pc-obj-y)
 tests/rtl8139-test$(EXESUF): tests/rtl8139-test.o $(libqos-pc-obj-y)
 tests/pcnet-test$(EXESUF): tests/pcnet-test.o
 tests/eepro100-test$(EXESUF): tests/eepro100-test.o
diff --git a/tests/e1000e-test.c b/tests/e1000e-test.c
new file mode 100644
index 0000000..d6e6311
--- /dev/null
+++ b/tests/e1000e-test.c
@@ -0,0 +1,480 @@
+ /*
+ * QTest testcase for e1000e NIC
+ *
+ * Copyright (c) 2015 Ravello Systems LTD (http://ravellosystems.com)
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ * Dmitry Fleytman 
+ * Leonid Bloch 
+ * Yan Vugenfirer 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+
+#include "qemu/osdep.h"
+#include 
+#include "libqtest.h"
+#include "qemu-common.h"
+#include "libqos/pci-pc.h"
+#include "qemu/sockets.h"
+#include "qemu/iov.h"
+#include "qemu/bitops.h"
+#include "libqos/malloc.h"
+#include "libqos/malloc-pc.h"
+#include "libqos/malloc-generic.h"
+
+#define E1000E_IMS  (0x00d0)
+
+#define E1000E_STATUS   (0x0008)
+#define E1000E_STATUS_LU BIT(1)
+#define E1000E_STATUS_ASDV1000 BIT(9)
+
+#define E1000E_CTRL (0x0000)
+#define E1000E_CTRL_RESET BIT(26)
+
+#define E1000E_RCTL (0x0100)
+#define E1000E_RCTL_EN  BIT(1)
+#define E1000E_RCTL_UPE BIT(3)
+#define E1000E_RCTL_MPE BIT(4)
+
+#define E1000E_RFCTL (0x5008)
+#define E1000E_RFCTL_EXTEN  BIT(15)
+
+#define E1000E_TCTL (0x0400)
+#define E1000E_TCTL_EN  BIT(1)
+
+#define E1000E_CTRL_EXT (0x0018)
+#define E1000E_CTRL_EXT_DRV_LOAD BIT(28)
+#define E1000E_CTRL_EXT_TXLSFLOW BIT(22)
+
+#define E1000E_RX0_MSG_ID   (0)
+#define E1000E_TX0_MSG_ID   (1)
+#define E1000E_OTHER_MSG_ID (2)
+
+#define E1000E_IVAR (0x00E4)
+#define E1000E_IVAR_TEST_CFG ((E1000E_RX0_MSG_ID << 0)    | BIT(3)  | \
+                              (E1000E_TX0_MSG_ID << 8)    | BIT(11) | \
+                              (E1000E_OTHER_MSG_ID << 16) | BIT(19) | \
+                              BIT(31))
+
+#define E1000E_RING_LEN (0x1000)
+#define E1000E_TXD_LEN  (16)
+#define E1000E_RXD_LEN  (16)
+
+#define E1000E_TDBAL (0x3800)
+#define E1000E_TDBAH (0x3804)
+#define E1000E_TDLEN (0x3808)
+#define E1000E_TDH   (0x3810)
+#define E1000E_TDT   (0x3818)
+
+#define E1000E_RDBAL (0x2800)
+#define E1000E_RDBAH (0x2804)
+#define E1000E_RDLEN (0x2808)
+#define E1000E_RDH   (0x2810)
+#define E1000E_RDT   (0x2818)
+
+typedef struct {
+QPCIDevice *pci_dev;
+void *mac_regs;
+
+uint64_t tx_ring;
+uint64_t rx_ring;
+} e1000e_device;
+
+static int test_sockets[2];
+static QGuestAllocator *test_alloc;
+static QPCIBus *test_bus;
+
+static void e1000e_pci_foreach_callback(QPCIDevice *dev, int devfn, void *data)
+{
+*(QPCIDevice **) data = dev;
+}
+
+static QPCIDevice *e1000e_device_find(QPCIBus *bus)
+{
+static const int e1000e_vendor_id = 0x8086;
+static const int e1000e_dev_id = 0x10D3;
+
+QPCIDevice *e1000e_dev = NULL;
+
+qpci_device_foreach(bus, e1000e_vendor_id, e1000e_dev_id,
+e1000e_pci_foreach_callback, &e1000e_dev);
+
+g_assert_nonnull(e1000e_dev);
+
+return e1000e_dev;
+}
+
+static void e1000e_macreg_wr

[Qemu-devel] [PATCH v5 13/16] e1000_regs: Add definitions for Intel 82574-specific bits

2016-05-15 Thread Leonid Bloch

From: Dmitry Fleytman 

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 hw/net/e1000_regs.h | 345 +++-
 1 file changed, 342 insertions(+), 3 deletions(-)

diff --git a/hw/net/e1000_regs.h b/hw/net/e1000_regs.h
index 1c40244..d62b3fa 100644
--- a/hw/net/e1000_regs.h
+++ b/hw/net/e1000_regs.h
@@ -85,6 +85,7 @@
 #define E1000_DEV_ID_82573E  0x108B
 #define E1000_DEV_ID_82573E_IAMT 0x108C
 #define E1000_DEV_ID_82573L  0x109A
+#define E1000_DEV_ID_82574L  0x10D3
 #define E1000_DEV_ID_82546GB_QUAD_COPPER_KSP3 0x10B5
 #define E1000_DEV_ID_80003ES2LAN_COPPER_DPT 0x1096
 #define E1000_DEV_ID_80003ES2LAN_SERDES_DPT 0x1098
@@ -104,6 +105,7 @@
 #define E1000_PHY_ID2_82544x 0xC30
 #define E1000_PHY_ID2_8254xx_DEFAULT 0xC20 /* 82540x, 82545x, and 82546x */
 #define E1000_PHY_ID2_82573x 0xCC0
+#define E1000_PHY_ID2_82574x 0xCB1
 
 /* Register Set. (82543, 82544)
  *
@@ -135,8 +137,11 @@
 #define E1000_ITR  0x000C4  /* Interrupt Throttling Rate - RW */
 #define E1000_ICS  0x000C8  /* Interrupt Cause Set - WO */
 #define E1000_IMS  0x000D0  /* Interrupt Mask Set - RW */
+#define E1000_EIAC 0x000DC  /* Ext. Interrupt Auto Clear - RW */
 #define E1000_IMC  0x000D8  /* Interrupt Mask Clear - WO */
 #define E1000_IAM  0x000E0  /* Interrupt Acknowledge Auto Mask */
+#define E1000_IVAR 0x000E4  /* Interrupt Vector Allocation Register - RW */
+#define E1000_EITR 0x000E8  /* Extended Interrupt Throttling Rate - RW */
 #define E1000_RCTL 0x00100  /* RX Control - RW */
 #define E1000_RDTR1 0x02820  /* RX Delay Timer (1) - RW */
 #define E1000_RDBAL1   0x02900  /* RX Descriptor Base Address Low (1) - RW */
@@ -145,6 +150,7 @@
 #define E1000_RDH1 0x02910  /* RX Descriptor Head (1) - RW */
 #define E1000_RDT1 0x02918  /* RX Descriptor Tail (1) - RW */
 #define E1000_FCTTV 0x00170  /* Flow Control Transmit Timer Value - RW */
+#define E1000_FCRTV 0x05F40  /* Flow Control Refresh Timer Value - RW */
 #define E1000_TXCW 0x00178  /* TX Configuration Word - RW */
 #define E1000_RXCW 0x00180  /* RX Configuration Word - RO */
 #define E1000_TCTL 0x00400  /* TX Control - RW */
@@ -161,6 +167,10 @@
 #define E1000_PBM  0x10000  /* Packet Buffer Memory - RW */
 #define E1000_PBS  0x01008  /* Packet Buffer Size - RW */
 #define E1000_EEMNGCTL 0x01010  /* MNG EEprom Control */
+#define E1000_EEMNGDATA 0x01014 /* MNG EEPROM Read/Write data */
+#define E1000_FLMNGCTL 0x01018 /* MNG Flash Control */
+#define E1000_FLMNGDATA 0x0101C /* MNG FLASH Read data */
+#define E1000_FLMNGCNT 0x01020 /* MNG FLASH Read Counter */
 #define E1000_FLASH_UPDATES 1000
 #define E1000_EEARBC   0x01024  /* EEPROM Auto Read Bus Control */
 #define E1000_FLASHT   0x01028  /* FLASH Timer Register */
@@ -169,9 +179,12 @@
 #define E1000_FLSWDATA 0x01034  /* FLASH data register */
 #define E1000_FLSWCNT  0x01038  /* FLASH Access Counter */
 #define E1000_FLOP 0x0103C  /* FLASH Opcode Register */
+#define E1000_FLOL 0x01050  /* FEEP Auto Load */
 #define E1000_ERT  0x02008  /* Early Rx Threshold - RW */
 #define E1000_FCRTL 0x02160  /* Flow Control Receive Threshold Low - RW */
+#define E1000_FCRTL_A  0x00168  /* Alias to FCRTL */
 #define E1000_FCRTH 0x02168  /* Flow Control Receive Threshold High - RW */
+#define E1000_FCRTH_A  0x00160  /* Alias to FCRTH */
 #define E1000_PSRCTL   0x02170  /* Packet Split Receive Control - RW */
 #define E1000_RDBAL 0x02800  /* RX Descriptor Base Address Low - RW */
 #define E1000_RDBAH 0x02804  /* RX Descriptor Base Address High - RW */
@@ -179,11 +192,17 @@
 #define E1000_RDH  0x02810  /* RX Descriptor Head - RW */
 #define E1000_RDT  0x02818  /* RX Descriptor Tail - RW */
 #define E1000_RDTR 0x02820  /* RX Delay Timer - RW */
+#define E1000_RDTR_A   0x00108  /* Alias to RDTR */
 #define E1000_RDBAL0   E1000_RDBAL /* RX Desc Base Address Low (0) - RW */
+#define E1000_RDBAL0_A 0x00110 /* Alias to RDBAL0 */
 #define E1000_RDBAH0   E1000_RDBAH /* RX Desc Base Address High (0) - RW */
+#define E1000_RDBAH0_A 0x00114 /* Alias to RDBAH0 */
 #define E1000_RDLEN0   E1000_RDLEN /* RX Desc Length (0) - RW */
+#define E1000_RDLEN0_A 0x00118 /* Alias to RDLEN0 */
 #define E1000_RDH0 E1000_RDH   /* RX Desc Head (0) - RW */
+#define E1000_RDH0_A   0x00120 /* Alias to RDH0 */
 #define E1000_RDT0 E1000_RDT   /* RX Desc Tail (0) - RW */
+#define E1000_RDT0_A   0x00128 /* Alias to RDT0 */
 #define E1000_RDTR0 E1000_RDTR  /* RX Delay Timer (0) - RW */
 #define E1000_RXDCTL   0x02828  /* RX Descriptor Control queue 0 - RW */
 #define E1000_RXDCTL1  0x02928  /* RX Descriptor Control queue 1 - RW */
@@ -192,22 +211,33 @@
 #define E1000_RAID 0x02C08  /* Receive Ack Interrupt Delay - RW */
 #define E1000_TXDMAC   0x03000  /* TX DMA Control - RW */
 #define E1000_KABGTXD  0x03004  /* AFE Band G

[Qemu-devel] [PATCH v5 14/16] e1000: Move out code that will be reused in e1000e

2016-05-15 Thread Leonid Bloch

From: Dmitry Fleytman 

Code that will be shared is moved to separate files.

Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
---
 MAINTAINERS|   5 +
 hw/net/Makefile.objs   |   2 +-
 hw/net/e1000.c | 411 +++--
 hw/net/e1000x_common.c | 267 
 hw/net/e1000x_common.h | 213 +
 trace-events   |  13 ++
 6 files changed, 591 insertions(+), 320 deletions(-)
 create mode 100644 hw/net/e1000x_common.c
 create mode 100644 hw/net/e1000x_common.h

diff --git a/MAINTAINERS b/MAINTAINERS
index dc5e536..e379f38 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -980,6 +980,11 @@ F: hw/acpi/nvdimm.c
 F: hw/mem/nvdimm.c
 F: include/hw/mem/nvdimm.h
 
+e1000x
+M: Dmitry Fleytman 
+S: Maintained
+F: hw/net/e1000x*
+
 Subsystems
 --
 Audio
diff --git a/hw/net/Makefile.objs b/hw/net/Makefile.objs
index 527d264..bc69948 100644
--- a/hw/net/Makefile.objs
+++ b/hw/net/Makefile.objs
@@ -6,7 +6,7 @@ common-obj-$(CONFIG_NE2000_PCI) += ne2000.o
 common-obj-$(CONFIG_EEPRO100_PCI) += eepro100.o
 common-obj-$(CONFIG_PCNET_PCI) += pcnet-pci.o
 common-obj-$(CONFIG_PCNET_COMMON) += pcnet.o
-common-obj-$(CONFIG_E1000_PCI) += e1000.o
+common-obj-$(CONFIG_E1000_PCI) += e1000.o e1000x_common.o
 common-obj-$(CONFIG_RTL8139_PCI) += rtl8139.o
 common-obj-$(CONFIG_VMXNET3_PCI) += net_tx_pkt.o net_rx_pkt.o
 common-obj-$(CONFIG_VMXNET3_PCI) += vmxnet3.o
diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 8e79b55..36e3dbe 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -36,7 +36,7 @@
 #include "qemu/iov.h"
 #include "qemu/range.h"
 
-#include "e1000_regs.h"
+#include "e1000x_common.h"
 
 static const uint8_t bcast[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
 
@@ -64,11 +64,6 @@ static int debugflags = DBGBIT(TXERR) | DBGBIT(GENERAL);
 #define PNPMMIO_SIZE  0x20000
 #define MIN_BUF_SIZE  60 /* Min. octets in an ethernet frame sans FCS */
 
-/* this is the size past which hardware will drop packets when setting LPE=0 */
-#define MAXIMUM_ETHERNET_VLAN_SIZE 1522
-/* this is the size past which hardware will drop packets when setting LPE=1 */
-#define MAXIMUM_ETHERNET_LPE_SIZE 16384
-
 #define MAXIMUM_ETHERNET_HDR_LEN (14+4)
 
 /*
@@ -102,22 +97,9 @@ typedef struct E1000State_st {
 unsigned char vlan[4];
 unsigned char data[0x10000];
 uint16_t size;
-unsigned char sum_needed;
 unsigned char vlan_needed;
-uint8_t ipcss;
-uint8_t ipcso;
-uint16_t ipcse;
-uint8_t tucss;
-uint8_t tucso;
-uint16_t tucse;
-uint8_t hdr_len;
-uint16_t mss;
-uint32_t paylen;
+e1000x_txd_props props;
 uint16_t tso_frames;
-char tse;
-int8_t ip;
-int8_t tcp;
-char cptse; // current packet tse bit
 } tx;
 
 struct {
@@ -162,52 +144,19 @@ typedef struct E1000BaseClass {
 #define E1000_DEVICE_GET_CLASS(obj) \
 OBJECT_GET_CLASS(E1000BaseClass, (obj), TYPE_E1000_BASE)
 
-#define defreg(x)x = (E1000_##x>>2)
-enum {
-defreg(CTRL),defreg(EECD),defreg(EERD),defreg(GPRC),
-defreg(GPTC),defreg(ICR), defreg(ICS), defreg(IMC),
-defreg(IMS), defreg(LEDCTL),  defreg(MANC),defreg(MDIC),
-defreg(MPC), defreg(PBA), defreg(RCTL),defreg(RDBAH),
-defreg(RDBAL),   defreg(RDH), defreg(RDLEN),   defreg(RDT),
-defreg(STATUS),  defreg(SWSM),defreg(TCTL),defreg(TDBAH),
-defreg(TDBAL),   defreg(TDH), defreg(TDLEN),   defreg(TDT),
-defreg(TORH),defreg(TORL),defreg(TOTH),defreg(TOTL),
-defreg(TPR), defreg(TPT), defreg(TXDCTL),  defreg(WUFC),
-defreg(RA),  defreg(MTA), defreg(CRCERRS), defreg(VFTA),
-defreg(VET), defreg(RDTR),defreg(RADV),defreg(TADV),
-defreg(ITR), defreg(FCRUC),   defreg(TDFH),defreg(TDFT),
-defreg(TDFHS),   defreg(TDFTS),   defreg(TDFPC),   defreg(RDFH),
-defreg(RDFT),defreg(RDFHS),   defreg(RDFTS),   defreg(RDFPC),
-defreg(IPAV),defreg(WUC), defreg(WUS), defreg(AIT),
-defreg(IP6AT),   defreg(IP4AT),   defreg(FFLT),defreg(FFMT),
-defreg(FFVT),defreg(WUPM),defreg(PBM), defreg(SCC),
-defreg(ECOL),defreg(MCC), defreg(LATECOL), defreg(COLC),
-defreg(DC),  defreg(TNCRS),   defreg(SEC), defreg(CEXTERR),
-defreg(RLEC),defreg(XONRXC),  defreg(XONTXC),  defreg(XOFFRXC),
-defreg(XOFFTXC), defreg(RFC), defreg(RJC), defreg(RNBC),
-defreg(TSCTFC),  defreg(MGTPRC),  defreg(MGTPDC),  defreg(MGTPTC),
-defreg(RUC), defreg(ROC), defreg(GORCL),   defreg(GORCH),
-defreg(GOTCL),   defreg(GOTCH),   defreg(BPRC),defreg(MPRC),
-defreg(TSCTC),   defreg(PRC64),   defreg(PRC127),  defreg(PRC255),
-defreg(PRC511),  defreg(PRC1023), defreg(PRC1522), defreg(PTC64),
-defreg(PTC127),  defreg(PTC255),  defreg(PTC511),

[Qemu-devel] [PATCH 2/2] hw/net/opencores_eth: Allocating Large sized arrays to heap

2016-05-15 Thread Max Filippov

From: Zhou Jie 

open_eth_start_xmit uses approximately 65536 bytes of stack.
Move large arrays to the heap to reduce stack usage.

Reduce the size of the buffer allocated on the stack to 0x600 bytes, which
is the maximum frame length when the HUGEN bit is not set in MODER, and only
allocate a buffer on the heap when that is too small. Thus the heap is not
used in the typical use case.

Signed-off-by: Zhou Jie 
Signed-off-by: Max Filippov 
---
 hw/net/opencores_eth.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/hw/net/opencores_eth.c b/hw/net/opencores_eth.c
index c269992..484d113 100644
--- a/hw/net/opencores_eth.c
+++ b/hw/net/opencores_eth.c
@@ -482,7 +482,8 @@ static NetClientInfo net_open_eth_info = {
 
 static void open_eth_start_xmit(OpenEthState *s, desc *tx)
 {
-uint8_t buf[65536];
+uint8_t *buf = NULL;
+uint8_t buffer[0x600];
 unsigned len = GET_FIELD(tx->len_flags, TXD_LEN);
 unsigned tx_len = len;
 
@@ -497,6 +498,11 @@ static void open_eth_start_xmit(OpenEthState *s, desc *tx)
 
 trace_open_eth_start_xmit(tx->buf_ptr, len, tx_len);
 
+if (tx_len > sizeof(buffer)) {
+buf = g_new(uint8_t, tx_len);
+} else {
+buf = buffer;
+}
 if (len > tx_len) {
 len = tx_len;
 }
@@ -505,6 +511,9 @@ static void open_eth_start_xmit(OpenEthState *s, desc *tx)
 memset(buf + len, 0, tx_len - len);
 }
 qemu_send_packet(qemu_get_queue(s->nic), buf, tx_len);
+if (tx_len > sizeof(buffer)) {
+g_free(buf);
+}
 
 if (tx->len_flags & TXD_WR) {
 s->tx_desc = 0;
-- 
2.1.4

[Qemu-devel] [PATCH 0/2] hw/net/opencores_eth cleanups

2016-05-15 Thread Max Filippov

Hello,

this series cleans up MII register/bit usage in opencores_eth and reduces
the stack size used by open_eth_start_xmit from 64K to less than 2K.

Max Filippov (1):
  hw/net/opencores_eth: use mii.h

Zhou Jie (1):
  hw/net/opencores_eth: Allocating Large sized arrays to heap

 hw/net/opencores_eth.c | 44 ++--
 1 file changed, 26 insertions(+), 18 deletions(-)

-- 
2.1.4

[Qemu-devel] [PATCH 1/2] hw/net/opencores_eth: use mii.h

2016-05-15 Thread Max Filippov

Drop local definitions of MII registers and use constants from mii.h for
registers and register bits. No functional changes.

Signed-off-by: Max Filippov 
---
 hw/net/opencores_eth.c | 33 -
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/hw/net/opencores_eth.c b/hw/net/opencores_eth.c
index c6094fb..c269992 100644
--- a/hw/net/opencores_eth.c
+++ b/hw/net/opencores_eth.c
@@ -33,6 +33,7 @@
 
 #include "qemu/osdep.h"
 #include "hw/hw.h"
+#include "hw/net/mii.h"
 #include "hw/sysbus.h"
 #include "net/net.h"
 #include "sysemu/sysemu.h"
@@ -55,12 +56,6 @@
 
 /* PHY MII registers */
 enum {
-MII_BMCR,
-MII_BMSR,
-MII_PHYIDR1,
-MII_PHYIDR2,
-MII_ANAR,
-MII_ANLPAR,
 MII_REG_MAX = 16,
 };
 
@@ -72,10 +67,11 @@ typedef struct Mii {
 static void mii_set_link(Mii *s, bool link_ok)
 {
 if (link_ok) {
-s->regs[MII_BMSR] |= 0x4;
-s->regs[MII_ANLPAR] |= 0x01e1;
+s->regs[MII_BMSR] |= MII_BMSR_LINK_ST;
+s->regs[MII_ANLPAR] |= MII_ANLPAR_TXFD | MII_ANLPAR_TX |
+MII_ANLPAR_10FD | MII_ANLPAR_10 | MII_ANLPAR_CSMACD;
 } else {
-s->regs[MII_BMSR] &= ~0x4;
+s->regs[MII_BMSR] &= ~MII_BMSR_LINK_ST;
 s->regs[MII_ANLPAR] &= 0x01ff;
 }
 s->link_ok = link_ok;
@@ -84,11 +80,14 @@ static void mii_set_link(Mii *s, bool link_ok)
 static void mii_reset(Mii *s)
 {
 memset(s->regs, 0, sizeof(s->regs));
-s->regs[MII_BMCR] = 0x1000;
-s->regs[MII_BMSR] = 0x7868; /* no ext regs */
-s->regs[MII_PHYIDR1] = 0x2000;
-s->regs[MII_PHYIDR2] = 0x5c90;
-s->regs[MII_ANAR] = 0x01e1;
+s->regs[MII_BMCR] = MII_BMCR_AUTOEN;
+s->regs[MII_BMSR] = MII_BMSR_100TX_FD | MII_BMSR_100TX_HD |
+MII_BMSR_10T_FD | MII_BMSR_10T_HD | MII_BMSR_MFPS |
+MII_BMSR_AN_COMP | MII_BMSR_AUTONEG;
+s->regs[MII_PHYID1] = 0x2000;
+s->regs[MII_PHYID2] = 0x5c90;
+s->regs[MII_ANAR] = MII_ANAR_TXFD | MII_ANAR_TX |
+MII_ANAR_10FD | MII_ANAR_10 | MII_ANAR_CSMACD;
 mii_set_link(s, s->link_ok);
 }
 
@@ -98,7 +97,7 @@ static void mii_ro(Mii *s, uint16_t v)
 
 static void mii_write_bmcr(Mii *s, uint16_t v)
 {
-if (v & 0x8000) {
+if (v & MII_BMCR_RESET) {
 mii_reset(s);
 } else {
 s->regs[MII_BMCR] = v;
@@ -110,8 +109,8 @@ static void mii_write_host(Mii *s, unsigned idx, uint16_t v)
 static void (*reg_write[MII_REG_MAX])(Mii *s, uint16_t v) = {
 [MII_BMCR] = mii_write_bmcr,
 [MII_BMSR] = mii_ro,
-[MII_PHYIDR1] = mii_ro,
-[MII_PHYIDR2] = mii_ro,
+[MII_PHYID1] = mii_ro,
+[MII_PHYID2] = mii_ro,
 };
 
 if (idx < MII_REG_MAX) {
-- 
2.1.4

[Qemu-devel] Regression in QEMU 2.6.0 - Pentium III SSE fails

2016-05-15 Thread Stefan Weil

Hi Richard,

I got this error report from a Windows user of QEMU:

QEMU 2.6.0 RC4 can no longer boot Slackware or Zenwalk.
Boot fails with "Panic: Attempted to kill init."

RC3 did not work either (screenshot attached).
My command line is:
qemu-system-i386w.exe -m 512 -cpu pentium3 -vga cirrus -soundhw sb16 -net nic,model=ne2k_pci -net user -hda c.img -boot c
I tried Slackware 9.0 and Zenwalk 2.6.

The last working version of QEMU was 2.5.0

The error is not Windows-related; it can also be reproduced on Linux
when running the latest QEMU with TCG. In my tests, I used a simplified
command line:

qemu-system-i386 -m 512 -net nic,model=ne2k_pci -net none -hda ~/c.img -snapshot

Slackware uses a 2.4 kernel. This kernel fails while measuring
checksumming speed with pIII_sse:

md: linear personality registered as nr 1
md: raid0 personality registered as nr 2
md: raid1 personality registered as nr 3
md: raid5 personality registered as nr 4
raid5: measuring checksumming speed
   8regs :   384.400 MB/sec
   32regs:   259.200 MB/sec
invalid operand: 
CPU:0
EIP:0010:[]Not tainted
EFLAGS: 0246
eax: c15d8000   ebx:    ecx:    edx: c15d5000
esi: 8005003b   edi: 0004   ebp:    esp: c15bdf50
ds: 0018   es: 0018   ss: 0018 
Process swapper (pid: 1, stackpage=c15bd000)
Stack:       

         

    0206 c0241c6c 1000 c15d4000 c15d7000 c15d4000
c15d4000
Call Trace:[] [] [] []
[]
  [] []

Code: 0f ae f8 0f 10 04 24 0f 10 4c 24 10 0f 10 54 24 20 0f 10 5c
 <0>Kernel panic: Attempted to kill init!

It should have been like this:

md: linear personality registered as nr 1
md: raid0 personality registered as nr 2
md: raid1 personality registered as nr 3
md: raid5 personality registered as nr 4
raid5: measuring checksumming speed
   8regs :   372.800 MB/sec
   32regs:   241.200 MB/sec
   pIII_sse  :   540.400 MB/sec
   pII_mmx   :   176.400 MB/sec
   p5_mmx:   205.600 MB/sec
raid5: using function: pIII_sse (540.400 MB/sec)
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...

I think that the problem can be reproduced with other guests
running a 2.4 kernel, too.

Bisecting was difficult, but I could narrow down the range of
commits which caused the regression:

64dbaff09bb768dbbb13142862554f18ab642866 is good.
f4f1110e4b34797ddfa87bb28f9518b9256778be is bad.

Regards
Stefan





[Qemu-devel] [Bug 1581976] [NEW] man qemu contains a bug in description of "-virtfs" command line argument

2016-05-15 Thread VsyachePuz

Public bug reported:

The description of the command line argument looks like this:

 -virtfs fsdriver[,path=path],mount_tag=mount_tag[,security_model=security_model][,writeout=writeout][,readonly][,socket=socket|sock_fd=sock_fd]


Note that there is no "id" attribute in the list of parameters.

Later in the man page, however, the "id" attribute is documented as if it
were present:

   id=id
   Specifies identifier for this device

I think it was copied from the section above (about "-fsdev") without
review.

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1581976

Title:
  man qemu contains a bug in description of "-virtfs" command line
  argument

Status in QEMU:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1581976/+subscriptions

[Qemu-devel] [Bug 1581976] Re: man qemu contains a bug in description of "-virtfs" command line argument

2016-05-15 Thread VsyachePuz

** Description changed:

- The description of command line argument looks like this:
+ The description of command line argument 
+ 
https://github.com/qemu/qemu/blob/63d3145aadbecaa7e8be1e74b5d6b5cbbeb4e153/qemu-options.hx#L796-L799
+ looks like this:
  
-  -virtfs
-
fsdriver[,path=path],mount_tag=mount_tag[,security_model=security_model][,writeout=writeout][,readonly][,socket=socket|sock_fd=sock_fd]
- 
+  -virtfs
+    
fsdriver[,path=path],mount_tag=mount_tag[,security_model=security_model][,writeout=writeout][,readonly][,socket=socket|sock_fd=sock_fd]
  
  note, that there is no "id" attribute in the list of parameters.
  
  later on the man there the "id" attribute is documented, as it were
  present:
  
-id=id
-Specifies identifier for this device
+    id=id
+    Specifies identifier for this device
  
  i think that it was copied from above section (about "-fsdev") without
  reviewing.


[Qemu-devel] "tcg: Clean up direct block chaining safety checks" breaks target-xtensa mmu test

2016-05-15 Thread Max Filippov

Hi Sergey,

I've noticed that commit 5b053a4a28278 (tcg: Clean up direct block
chaining safety checks) has broken the target-xtensa test cross_page_tb
from tests/tcg/xtensa/test_mmu.S. The test runs a TB that spans two
adjacent pages, then unmaps the second page and runs it again. It
expects an instruction fetch exception on the second run, but with
that commit applied it doesn't get one. Reverting the commit fixes the
test. Any suggestions?

-- 
Thanks.
-- Max

[Qemu-devel] [PATCH V2 2/4] pci: reserve 64 bit MMIO range for PCI hotplug

2016-05-15 Thread Marcel Apfelbaum

Using the firmware-assigned MMIO ranges for the 64-bit PCI window
leaves no space for hot-plugging PCI devices above 4G.

PC machines can use the whole CPU-addressable range after
the space reserved for memory hotplug.
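A minimal sketch of the window computation described above, written as a standalone C function rather than QEMU code. It assumes that, when no reserved memory-hotplug area exists, the window starts at 4 GiB plus the RAM mapped above 4 GiB, rounded up to 1 GiB, and that it ends at the 40-bit physical limit used in the patch. The end is treated as exclusive, matching the `pci->w64.end - 1` used in build_dsdt(). `SketchRange` and `sketch_w64_hotplug_window()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_1G (1ULL << 30)
#define SKETCH_4G (1ULL << 32)

/* Hypothetical range [begin, end) with an exclusive end. */
typedef struct {
    uint64_t begin;
    uint64_t end;
} SketchRange;

/* Round x up to the next multiple of 1 GiB. */
static uint64_t sketch_round_up_1g(uint64_t x)
{
    return (x + SKETCH_1G - 1) & ~(SKETCH_1G - 1);
}

/*
 * res_mem_end:       end of the memory-hotplug area (0 if there is none).
 * above_4g_mem_size: guest RAM mapped above 4 GiB.
 */
static SketchRange sketch_w64_hotplug_window(uint64_t res_mem_end,
                                             uint64_t above_4g_mem_size)
{
    SketchRange r;

    r.begin = res_mem_end;
    if (!r.begin) {
        /* Assumption: start right after the RAM above 4 GiB. */
        r.begin = sketch_round_up_1g(SKETCH_4G + above_4g_mem_size);
    }
    r.end = 1ULL << 40; /* 40 bits physical, as in the patch */
    assert(r.begin < r.end);
    return r;
}
```

With 2 GiB of RAM above 4 GiB and no hotplug area, the window in this sketch starts at 6 GiB; a non-zero reserved-memory end is used as-is, since it is already rounded to 1 GiB.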

Signed-off-by: Marcel Apfelbaum 
---
 hw/pci/pci.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index bb605ef..44dd949 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -41,6 +41,7 @@
 #include "hw/hotplug.h"
 #include "hw/boards.h"
 #include "qemu/cutils.h"
+#include "hw/i386/pc.h"
 
 //#define DEBUG_PCI
 #ifdef DEBUG_PCI
@@ -2467,8 +2468,19 @@ static void pci_dev_get_w64(PCIBus *b, PCIDevice *dev, void *opaque)
 
 void pci_bus_get_w64_range(PCIBus *bus, Range *range)
 {
-range->begin = range->end = 0;
-pci_for_each_device_under_bus(bus, pci_dev_get_w64, range);
+Object *machine = qdev_get_machine();
+if (object_dynamic_cast(machine, TYPE_PC_MACHINE)) {
+PCMachineState *pcms = PC_MACHINE(machine);
+range->begin = pc_machine_get_reserved_memory_end(pcms);
+if (!range->begin) {
+range->begin = ROUND_UP(0x100000000ULL + pcms->above_4g_mem_size,
+1ULL << 30);
+}
+range->end = 1ULL << 40; /* 40 bits physical */
+} else {
+range->begin = range->end = 0;
+pci_for_each_device_under_bus(bus, pci_dev_get_w64, range);
+}
 }
 
 static bool pcie_has_upstream_port(PCIDevice *dev)
-- 
2.4.3

[Qemu-devel] [PATCH V2 0/4] pci: better support for 64-bit MMIO allocation

2016-05-15 Thread Marcel Apfelbaum

Hi,

The first two patches allocate the (max_reserved_ram - max_addr_cpu_addressable)
range for PCI hotplug (on PC machines) instead of the previous 64-bit PCI window,
which included only the ranges allocated by the firmware.

The next two patches fix 64-bit CRS computations.

v1 -> v2:
 - resolved some styling issues (Laszlo)
 - rebase on latest master (Laszlo)

Thank you,
Marcel

Marcel Apfelbaum (4):
  hw/pc: extract reserved memory end computation to a standalone function
  pci: reserve 64 bit MMIO range for PCI hotplug
  acpi: refactor pxb crs computation
  hw/apci: handle 64-bit MMIO regions correctly

 hw/i386/acpi-build.c | 127 ---
 hw/i386/pc.c |  29 
 hw/pci/pci.c |  16 ++-
 include/hw/i386/pc.h |   1 +
 4 files changed, 127 insertions(+), 46 deletions(-)

-- 
2.4.3

[Qemu-devel] [PATCH V2 4/4] hw/apci: handle 64-bit MMIO regions correctly

2016-05-15 Thread Marcel Apfelbaum

In build_crs(), the calculation and merging of the ranges already happens
in 64-bit, but the entry boundaries are silently truncated to 32-bit in the
call to aml_dword_memory(). Fix it by handling the 64-bit MMIO ranges 
separately.
This fixes 64-bit BARs behind PXBs.
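The classification this patch adds to build_crs() can be sketched as a small predicate: a range stays in the 32-bit (DWord) list only when both its limit and its length fit in 32 bits; everything else goes to the 64-bit (QWord) list. `sketch_fits_dword()` is a hypothetical helper, not part of the patch. Note that the length test also moves a range covering the full 4 GiB into the 64-bit list, because its length is 2^32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * true if the inclusive range [base, limit] can be described by a 32-bit
 * (DWord) memory descriptor, mirroring the condition used in build_crs().
 */
static bool sketch_fits_dword(uint64_t base, uint64_t limit)
{
    assert(base <= limit);
    uint64_t length = limit - base + 1;
    return limit <= UINT32_MAX && length <= UINT32_MAX;
}
```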

Signed-off-by: Marcel Apfelbaum 
---
 hw/i386/acpi-build.c | 53 +++-
 1 file changed, 44 insertions(+), 9 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 78f25ef..aaf4a34 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -739,18 +739,22 @@ static void crs_range_free(gpointer data)
 typedef struct CrsRangeSet {
 GPtrArray *io_ranges;
 GPtrArray *mem_ranges;
+GPtrArray *mem_64bit_ranges;
  } CrsRangeSet;
 
 static void crs_range_set_init(CrsRangeSet *range_set)
 {
 range_set->io_ranges = g_ptr_array_new_with_free_func(crs_range_free);
 range_set->mem_ranges = g_ptr_array_new_with_free_func(crs_range_free);
+range_set->mem_64bit_ranges =
+g_ptr_array_new_with_free_func(crs_range_free);
 }
 
 static void crs_range_set_free(CrsRangeSet *range_set)
 {
 g_ptr_array_free(range_set->io_ranges, true);
 g_ptr_array_free(range_set->mem_ranges, true);
+g_ptr_array_free(range_set->mem_64bit_ranges, true);
 }
 
 static gint crs_range_compare(gconstpointer a, gconstpointer b)
@@ -908,8 +912,14 @@ static Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set)
  * that do not support multiple root buses
  */
 if (range_base && range_base <= range_limit) {
-crs_range_insert(temp_range_set.mem_ranges,
- range_base, range_limit);
+uint64_t length = range_limit - range_base + 1;
+if (range_limit <= UINT32_MAX && length <= UINT32_MAX) {
+crs_range_insert(temp_range_set.mem_ranges,
+ range_base, range_limit);
+} else {
+crs_range_insert(temp_range_set.mem_64bit_ranges,
+ range_base, range_limit);
+}
 }
 
 range_base =
@@ -922,8 +932,14 @@ static Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set)
  * that do not support multiple root buses
  */
 if (range_base && range_base <= range_limit) {
-crs_range_insert(temp_range_set.mem_ranges,
- range_base, range_limit);
+uint64_t length = range_limit - range_base + 1;
+if (range_limit <= UINT32_MAX && length <= UINT32_MAX) {
+crs_range_insert(temp_range_set.mem_ranges,
+ range_base, range_limit);
+} else {
+crs_range_insert(temp_range_set.mem_64bit_ranges,
+ range_base, range_limit);
+}
 }
 }
 }
@@ -951,6 +967,19 @@ static Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set)
 crs_range_insert(range_set->mem_ranges, entry->base, entry->limit);
 }
 
+crs_range_merge(temp_range_set.mem_64bit_ranges);
+for (i = 0; i < temp_range_set.mem_64bit_ranges->len; i++) {
+entry = g_ptr_array_index(temp_range_set.mem_64bit_ranges, i);
+aml_append(crs,
+   aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
+AML_MAX_FIXED, AML_NON_CACHEABLE,
+AML_READ_WRITE,
+0, entry->base, entry->limit, 0,
+entry->limit - entry->base + 1));
+crs_range_insert(range_set->mem_64bit_ranges,
+ entry->base, entry->limit);
+}
+
 crs_range_set_free(&temp_range_set);
 
 aml_append(crs,
@@ -2182,11 +2211,17 @@ build_dsdt(GArray *table_data, GArray *linker,
 }
 
 if (pci->w64.begin) {
-aml_append(crs,
-aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED, AML_MAX_FIXED,
- AML_CACHEABLE, AML_READ_WRITE,
- 0, pci->w64.begin, pci->w64.end - 1, 0,
- pci->w64.end - pci->w64.begin));
+crs_replace_with_free_ranges(crs_range_set.mem_64bit_ranges,
+ pci->w64.begin, pci->w64.end - 1);
+for (i = 0; i < crs_range_set.mem_64bit_ranges->len; i++) {
+entry = g_ptr_array_index(crs_range_set.mem_64bit_ranges, i);
+aml_append(crs,
+   aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
+AML_MAX_FIXED,
+AML_CACHEABLE, AML_READ_WRITE,
+0, entry->base, entry->limit,
+0, entry->limit - ent

[Qemu-devel] [PATCH V2 1/4] hw/pc: extract reserved memory end computation to a standalone function

2016-05-15 Thread Marcel Apfelbaum

This code will be reused when calculating 64-bit MMIO hotplug ranges.
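As a sketch of what the extracted helper computes, here is a standalone C version that operates on a hypothetical struct standing in for the PCMachineState/PCMachineClass fields the patch reads (has_reserved_memory, broken_reserved_end, the hotplug base and the hotplug region size). It returns 0 when there is no reserved area, matching how the caller treats a zero result. It is an illustration, not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the machine fields used by the helper. */
typedef struct {
    bool has_reserved_memory;
    bool broken_reserved_end;
    uint64_t hotplug_base;  /* pcms->hotplug_memory.base */
    uint64_t hotplug_size;  /* memory_region_size(&pcms->hotplug_memory.mr) */
} SketchPCMachine;

/* Round x up to a multiple of the power-of-two 'align'. */
static uint64_t sketch_round_up(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* End of the reserved (memory-hotplug) area rounded up to 1 GiB, or 0. */
static uint64_t sketch_reserved_memory_end(const SketchPCMachine *m)
{
    uint64_t end = 0;

    if (m->has_reserved_memory && m->hotplug_base) {
        end = m->hotplug_base;
        if (!m->broken_reserved_end) {
            end += m->hotplug_size;
        }
        end = sketch_round_up(end, 1ULL << 30);
    }
    return end;
}
```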

Signed-off-by: Marcel Apfelbaum 
---
 hw/i386/pc.c | 29 +
 include/hw/i386/pc.h |  1 +
 2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 99437e0..a7791e3 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1280,6 +1280,7 @@ void pc_memory_init(PCMachineState *pcms,
 FWCfgState *fw_cfg;
 MachineState *machine = MACHINE(pcms);
 PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+uint64_t res_mem_end;
 
 assert(machine->ram_size == pcms->below_4g_mem_size +
 pcms->above_4g_mem_size);
@@ -1375,15 +1376,10 @@ void pc_memory_init(PCMachineState *pcms,
 
 rom_set_fw(fw_cfg);
 
-if (pcmc->has_reserved_memory && pcms->hotplug_memory.base) {
+res_mem_end = pc_machine_get_reserved_memory_end(pcms);
+if (res_mem_end) {
 uint64_t *val = g_malloc(sizeof(*val));
-PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
-uint64_t res_mem_end = pcms->hotplug_memory.base;
-
-if (!pcmc->broken_reserved_end) {
-res_mem_end += memory_region_size(&pcms->hotplug_memory.mr);
-}
-*val = cpu_to_le64(ROUND_UP(res_mem_end, 0x1ULL << 30));
+*val = cpu_to_le64(res_mem_end);
 fw_cfg_add_file(fw_cfg, "etc/reserved-memory-end", val, sizeof(*val));
 }
 
@@ -1853,6 +1849,23 @@ bool pc_machine_is_smm_enabled(PCMachineState *pcms)
 return false;
 }
 
+uint64_t pc_machine_get_reserved_memory_end(PCMachineState *pcms)
+{
+PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+uint64_t res_mem_end = 0;
+
+if (pcmc->has_reserved_memory && pcms->hotplug_memory.base) {
+res_mem_end = pcms->hotplug_memory.base;
+
+if (!pcmc->broken_reserved_end) {
+res_mem_end += memory_region_size(&pcms->hotplug_memory.mr);
+}
+res_mem_end = ROUND_UP(res_mem_end, 0x1ULL << 30);
+}
+
+return res_mem_end;
+}
+
 static void pc_machine_get_smm(Object *obj, Visitor *v, const char *name,
void *opaque, Error **errp)
 {
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 96f0b66..7c25814 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -223,6 +223,7 @@ void i8042_setup_a20_line(ISADevice *dev, qemu_irq *a20_out);
 extern int fd_bootchk;
 
 bool pc_machine_is_smm_enabled(PCMachineState *pcms);
+uint64_t pc_machine_get_reserved_memory_end(PCMachineState *pcms);
 void pc_register_ferr_irq(qemu_irq irq);
 void pc_acpi_smi_interrupt(void *opaque, int irq, int level);
 
-- 
2.4.3

[Qemu-devel] [PATCH V2 3/4] acpi: refactor pxb crs computation

2016-05-15 Thread Marcel Apfelbaum

Instead of always passing both IO and MEM ranges when
computing CRS ranges, define a new CrsRangeSet structure
that includes them both.

This is done before introducing a third type of range,
64-bit MEM, so it will be easier to pass them all around.

Signed-off-by: Marcel Apfelbaum 
---
 hw/i386/acpi-build.c | 82 
 1 file changed, 51 insertions(+), 31 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 279f0d7..78f25ef 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -736,6 +736,23 @@ static void crs_range_free(gpointer data)
 g_free(entry);
 }
 
+typedef struct CrsRangeSet {
+GPtrArray *io_ranges;
+GPtrArray *mem_ranges;
+ } CrsRangeSet;
+
+static void crs_range_set_init(CrsRangeSet *range_set)
+{
+range_set->io_ranges = g_ptr_array_new_with_free_func(crs_range_free);
+range_set->mem_ranges = g_ptr_array_new_with_free_func(crs_range_free);
+}
+
+static void crs_range_set_free(CrsRangeSet *range_set)
+{
+g_ptr_array_free(range_set->io_ranges, true);
+g_ptr_array_free(range_set->mem_ranges, true);
+}
+
 static gint crs_range_compare(gconstpointer a, gconstpointer b)
 {
  CrsRangeEntry *entry_a = *(CrsRangeEntry **)a;
@@ -820,18 +837,17 @@ static void crs_range_merge(GPtrArray *range)
 g_ptr_array_free(tmp, true);
 }
 
-static Aml *build_crs(PCIHostState *host,
-  GPtrArray *io_ranges, GPtrArray *mem_ranges)
+static Aml *build_crs(PCIHostState *host, CrsRangeSet *range_set)
 {
 Aml *crs = aml_resource_template();
-GPtrArray *host_io_ranges = g_ptr_array_new_with_free_func(crs_range_free);
-GPtrArray *host_mem_ranges = g_ptr_array_new_with_free_func(crs_range_free);
+CrsRangeSet temp_range_set;
 CrsRangeEntry *entry;
 uint8_t max_bus = pci_bus_num(host->bus);
 uint8_t type;
 int devfn;
 int i;
 
+crs_range_set_init(&temp_range_set);
 for (devfn = 0; devfn < ARRAY_SIZE(host->bus->devices); devfn++) {
 uint64_t range_base, range_limit;
 PCIDevice *dev = host->bus->devices[devfn];
@@ -855,9 +871,11 @@ static Aml *build_crs(PCIHostState *host,
 }
 
 if (r->type & PCI_BASE_ADDRESS_SPACE_IO) {
-crs_range_insert(host_io_ranges, range_base, range_limit);
+crs_range_insert(temp_range_set.io_ranges,
+ range_base, range_limit);
 } else { /* "memory" */
-crs_range_insert(host_mem_ranges, range_base, range_limit);
+crs_range_insert(temp_range_set.mem_ranges,
+ range_base, range_limit);
 }
 }
 
@@ -876,7 +894,8 @@ static Aml *build_crs(PCIHostState *host,
  * that do not support multiple root buses
  */
 if (range_base && range_base <= range_limit) {
-crs_range_insert(host_io_ranges, range_base, range_limit);
+crs_range_insert(temp_range_set.io_ranges,
+ range_base, range_limit);
 }
 
 range_base =
@@ -889,7 +908,8 @@ static Aml *build_crs(PCIHostState *host,
  * that do not support multiple root buses
  */
 if (range_base && range_base <= range_limit) {
-crs_range_insert(host_mem_ranges, range_base, range_limit);
+crs_range_insert(temp_range_set.mem_ranges,
+ range_base, range_limit);
 }
 
 range_base =
@@ -902,35 +922,36 @@ static Aml *build_crs(PCIHostState *host,
  * that do not support multiple root buses
  */
 if (range_base && range_base <= range_limit) {
-crs_range_insert(host_mem_ranges, range_base, range_limit);
+crs_range_insert(temp_range_set.mem_ranges,
+ range_base, range_limit);
 }
 }
 }
 
-crs_range_merge(host_io_ranges);
-for (i = 0; i < host_io_ranges->len; i++) {
-entry = g_ptr_array_index(host_io_ranges, i);
+crs_range_merge(temp_range_set.io_ranges);
+for (i = 0; i < temp_range_set.io_ranges->len; i++) {
+entry = g_ptr_array_index(temp_range_set.io_ranges, i);
 aml_append(crs,
aml_word_io(AML_MIN_FIXED, AML_MAX_FIXED,
AML_POS_DECODE, AML_ENTIRE_RANGE,
0, entry->base, entry->limit, 0,
entry->limit - entry->base + 1));
-crs_range_insert(io_ranges, entry->base, entry->limit);
+crs_range_insert(range_set->io_ranges, entry->base, entry->limit);
 }
-g_ptr_array_free(host_io_ranges, true);
 
-crs_range_merge(host_mem_ranges);
-for (i = 0; i < host_mem_ranges->len; i++) {
-entry = g_ptr_array_index(host_mem_ranges, i);
+crs_range_merge(temp_range_set.mem_ranges);
+

Re: [Qemu-devel] "tcg: Clean up direct block chaining safety checks" breaks target-xtensa mmu test

2016-05-15 Thread Sergey Fedorov

On 15/05/16 21:58, Max Filippov wrote:
> Hi Sergey,
>
> I've noticed that the commit 5b053a4a28278 (tcg: Clean up direct block
> chaining safety checks) has broken target-xtensa test cross_page_tb
> from the tests/tcg/xtensa/test_mmu.S. The test runs a TB that spans two
> adjacent pages, then unmaps the second page and runs it again. It
> expects an instruction fetch exception on the second run, but with the
> said commit doesn't get it. Reverting that commit fixes the test.
> Any suggestions?
>

Hi Max,

That's too strange. How do I run the test?

Kind regards,
Sergey

Re: [Qemu-devel] "tcg: Clean up direct block chaining safety checks" breaks target-xtensa mmu test

2016-05-15 Thread Max Filippov

On Sun, May 15, 2016 at 10:38:46PM +0300, Sergey Fedorov wrote:
> On 15/05/16 21:58, Max Filippov wrote:
> > I've noticed that the commit 5b053a4a28278 (tcg: Clean up direct block
> > chaining safety checks) has broken target-xtensa test cross_page_tb
> > from the tests/tcg/xtensa/test_mmu.S. The test runs a TB that spans two
> > adjacent pages, then unmaps the second page and runs it again. It
> > expects an instruction fetch exception on the second run, but with the
> > said commit doesn't get it. Reverting that commit fixes the test.
> > Any suggestions?
> 
> That's too strange. How do I run the test?

I've minimized the test case; the source and the binary are available
here:
  http://jcmvbkbc.spb.ru/~jcmvbkbc/tmp/201605152245/

You can run it as
  qemu-system-xtensa -M sim -cpu dc232b -nographic -semihosting -kernel ./test_mmu.tst

-- 
Thanks.
-- Max

Re: [Qemu-devel] "tcg: Clean up direct block chaining safety checks" breaks target-xtensa mmu test

2016-05-15 Thread Sergey Fedorov

On 15/05/16 22:53, Max Filippov wrote:
> On Sun, May 15, 2016 at 10:38:46PM +0300, Sergey Fedorov wrote:
>> On 15/05/16 21:58, Max Filippov wrote:
>>> I've noticed that the commit 5b053a4a28278 (tcg: Clean up direct block
>>> chaining safety checks) has broken target-xtensa test cross_page_tb
>>> from the tests/tcg/xtensa/test_mmu.S. The test runs a TB that spans two
>>> adjacent pages, then unmaps the second page and runs it again. It
>>> expects an instruction fetch exception on the second run, but with the
>>> said commit doesn't get it. Reverting that commit fixes the test.
>>> Any suggestions?
>> That's too strange. How do I run the test?
> I've minimized the test case, the source and the binary are available
> here:
>   http://jcmvbkbc.spb.ru/~jcmvbkbc/tmp/201605152245/
>
> You can run it as
>   qemu-system-xtensa -M sim -cpu dc232b -nographic -semihosting -kernel ./test_mmu.tst
>

Thank you for this. I'll try it tomorrow and figure out what's going wrong.

Kind regards,
Sergey

Re: [Qemu-devel] "tcg: Clean up direct block chaining safety checks" breaks target-xtensa mmu test

2016-05-15 Thread Sergey Fedorov

On 15/05/16 22:56, Sergey Fedorov wrote:
> On 15/05/16 22:53, Max Filippov wrote:
>> On Sun, May 15, 2016 at 10:38:46PM +0300, Sergey Fedorov wrote:
>>> On 15/05/16 21:58, Max Filippov wrote:
>>>> I've noticed that the commit 5b053a4a28278 (tcg: Clean up direct block
>>>> chaining safety checks) has broken target-xtensa test cross_page_tb
>>>> from the tests/tcg/xtensa/test_mmu.S. The test runs a TB that spans two
>>>> adjacent pages, then unmaps the second page and runs it again. It
>>>> expects an instruction fetch exception on the second run, but with the
>>>> said commit doesn't get it. Reverting that commit fixes the test.
>>>> Any suggestions?
>>> That's too strange. How do I run the test?
>> I've minimized the test case, the source and the binary are available
>> here:
>>   http://jcmvbkbc.spb.ru/~jcmvbkbc/tmp/201605152245/
>>
>> You can run it as
>>   qemu-system-xtensa -M sim -cpu dc232b -nographic -semihosting -kernel ./test_mmu.tst
>>
> Thank you for this. I'll try it tomorrow and figure out what's going wrong.

I couldn't sleep without first trying the test :) Now I understand
what went wrong. I mixed up 'next_tb' and 'tb' in this
piece of code:

/* see if we can patch the calling TB. When the TB
   spans two pages, we cannot safely do a direct
   jump. */
if (next_tb != 0 && tb->page_addr[1] == -1
    && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
    tb_add_jump((TranslationBlock *)(next_tb & ~TB_EXIT_MASK),
                next_tb & TB_EXIT_MASK, tb);
}

So I removed the 'tb->page_addr[1] == -1' check, thinking it was for the
last executed TB. But actually it checks the next TB, even though there is
also a variable called 'next_tb'. Indeed, we cannot safely jump directly
*to* a TB spanning two pages in system emulation, because we don't take
care of direct jumps when the address mapping changes. However, we can do
this in user emulation, because there is only static address translation
and TBs always get invalidated properly.
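To make the distinction concrete, here is a simplified C model of the chaining decision, written from the explanation above rather than taken from QEMU. `SketchTB` and `sketch_can_chain()` are hypothetical; `have_last_tb` stands for `next_tb != 0` (the previously executed TB), `next` stands for `tb` (the TB about to run), and the page-spanning check is applied only in system emulation, as described.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified TB: page_addr[1] == -1 means the TB's code
 * lies within a single guest page, as in the check quoted above. */
typedef struct {
    int64_t page_addr[2];
} SketchTB;

/*
 * May the previously executed TB be patched to jump straight to 'next'?
 *   have_last_tb     - stands for next_tb != 0 in the quoted code
 *   next             - stands for tb, the TB about to be executed
 *   system_emulation - false for user-mode emulation
 *   nochain_logging  - stands for qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)
 */
static bool sketch_can_chain(bool have_last_tb, const SketchTB *next,
                             bool system_emulation, bool nochain_logging)
{
    assert(next != NULL);
    if (!have_last_tb || nochain_logging) {
        return false;
    }
    /* The page check concerns the jump *target*: in system emulation its
     * second page can be remapped without the direct jump being reset. */
    if (system_emulation && next->page_addr[1] != -1) {
        return false;
    }
    return true;
}
```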

I'll prepare a patch and fix this tomorrow then.

Nice test and nice catch!

Thanks,
Sergey

Re: [Qemu-devel] [PATCH qemu v16 14/19] vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2)

2016-05-15 Thread Alexey Kardashevskiy


On 05/14/2016 08:25 AM, Alex Williamson wrote:

On Wed,  4 May 2016 16:52:26 +1000
Alexey Kardashevskiy  wrote:


This makes use of the new "memory registering" feature. The idea is
to give userspace the ability to notify the host kernel about pages
which are going to be used for DMA. With this information, the host
kernel can pin them all once per user process and do locked pages
accounting once, instead of spending time doing that in real time with
possible failures which cannot be handled nicely in some cases.

This adds a prereg memory listener which listens on address_space_memory
and notifies a VFIO container about memory which needs to be
pinned/unpinned. VFIO MMIO regions (i.e. "skip dump" regions) are skipped.

As there is no per-IOMMU-type release() callback anymore, this stores
the IOMMU type in the container so vfio_listener_release() can determine
if it needs to unregister @prereg_listener.

The feature is only enabled for SPAPR IOMMU v2. The host kernel changes
are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this does
not call it when v2 is detected and enabled.

This requires guest RAM blocks to be host page size aligned; however,
this is not new, as KVM already requires memory slots to be host page
size aligned.
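A minimal sketch of the two per-section operations this implies: rejecting sections that are not host-page aligned (the v15 "banned unaligned sections" change) and translating a guest physical address inside a section into its backing userspace address, which is the job a helper such as vfio_prereg_gpa_to_ua() has to do. `SketchSection` and the functions below are hypothetical and are not the patch's code; the host page size is passed in rather than queried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical RAM section: guest physical range and its host backing. */
typedef struct {
    uint64_t gpa;      /* guest physical start address */
    uint64_t size;     /* length in bytes */
    uint64_t host_ua;  /* host userspace address backing 'gpa' */
} SketchSection;

/* Usable for preregistration only if start, size and host address are
 * all multiples of the host page size. */
static bool sketch_section_aligned(const SketchSection *s,
                                   uint64_t host_page_size)
{
    /* The mask trick needs a non-zero power-of-two page size. */
    assert(host_page_size && !(host_page_size & (host_page_size - 1)));
    uint64_t mask = host_page_size - 1;
    return !(s->gpa & mask) && !(s->size & mask) && !(s->host_ua & mask);
}

/* Translate a guest physical address inside 's' to a userspace address. */
static uint64_t sketch_gpa_to_ua(const SketchSection *s, uint64_t gpa)
{
    assert(gpa >= s->gpa && gpa - s->gpa < s->size);
    return s->host_ua + (gpa - s->gpa);
}
```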

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v16:
* switched to 64bit math everywhere as there is no chance to see
region_add on RAM blocks even remotely close to 1<<64bytes.

v15:
* banned unaligned sections
* added a vfio_prereg_gpa_to_ua() helper

v14:
* s/free_container_exit/listener_release_exit/g
* added "if memory_region_is_iommu()" to vfio_prereg_listener_skipped_section
---
 hw/vfio/Makefile.objs |   1 +
 hw/vfio/common.c  |  38 +---
 hw/vfio/prereg.c  | 137 ++
 include/hw/vfio/vfio-common.h |   4 ++
 trace-events  |   2 +
 5 files changed, 172 insertions(+), 10 deletions(-)
 create mode 100644 hw/vfio/prereg.c

diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index ceddbb8..5800e0e 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -4,4 +4,5 @@ obj-$(CONFIG_PCI) += pci.o pci-quirks.o
 obj-$(CONFIG_SOFTMMU) += platform.o
 obj-$(CONFIG_SOFTMMU) += calxeda-xgmac.o
 obj-$(CONFIG_SOFTMMU) += amd-xgbe.o
+obj-$(CONFIG_SOFTMMU) += prereg.o
 endif
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 2050040..496eb82 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -501,6 +501,9 @@ static const MemoryListener vfio_memory_listener = {
 static void vfio_listener_release(VFIOContainer *container)
 {
 memory_listener_unregister(&container->listener);
+if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
+memory_listener_unregister(&container->prereg_listener);
+}
 }

 int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
@@ -808,8 +811,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
 goto free_container_exit;
 }

-ret = ioctl(fd, VFIO_SET_IOMMU,
-v2 ? VFIO_TYPE1v2_IOMMU : VFIO_TYPE1_IOMMU);
+container->iommu_type = v2 ? VFIO_TYPE1v2_IOMMU : VFIO_TYPE1_IOMMU;
+ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_type);
 if (ret) {
 error_report("vfio: failed to set iommu for container: %m");
 ret = -errno;
@@ -834,8 +837,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
 if ((ret == 0) && (info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
 container->iova_pgsizes = info.iova_pgsizes;
 }
-} else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
+} else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU) ||
+   ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU)) {
 struct vfio_iommu_spapr_tce_info info;
+bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU);

 ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
 if (ret) {
@@ -843,7 +848,9 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
 ret = -errno;
 goto free_container_exit;
 }
-ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU);
+container->iommu_type =
+v2 ? VFIO_SPAPR_TCE_v2_IOMMU : VFIO_SPAPR_TCE_IOMMU;
+ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_type);
 if (ret) {
 error_report("vfio: failed to set iommu for container: %m");
 ret = -errno;
@@ -855,11 +862,22 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as)
  * when container fd is closed so we do not call it explicitly
  * in this file.
  */
-ret = ioctl(fd, VFIO_IOMMU_ENABLE);
-if (ret) {
-error_report("vfio: failed to enable container: %m");
-ret = -errno;
-goto free_container_exit;
+if (!v2)

Re: [Qemu-devel] [RFC PATCH v3 3/3] VFIO Type1 IOMMU change: to support with iommu and without iommu

2016-05-15 Thread Jike Song

On 05/13/2016 11:48 PM, Neo Jia wrote:
> On Fri, May 13, 2016 at 05:46:17PM +0800, Jike Song wrote:
>> On 05/13/2016 04:12 AM, Neo Jia wrote:
>>> On Thu, May 12, 2016 at 01:05:52PM -0600, Alex Williamson wrote:

>>>> If you're trying to equate the scale of what we need to track vs what
>>>> type1 currently tracks, they're significantly different.  Possible
>>>> things we need to track include the pfn, the iova, and possibly a
>>>> reference count or some sort of pinned page map.  In the pin-all model
>>>> we can assume that every page is pinned on map and unpinned on unmap,
>>>> so a reference count or map is unnecessary.  We can also assume that we
>>>> can always regenerate the pfn with get_user_pages() from the vaddr, so
>>>> we don't need to track that.
>>>
>>> Hi Alex,
>>>
>>> Thanks for pointing this out, we will not track those in our next rev and
>>> get_user_pages will be used from the vaddr as you suggested to handle the
>>> single VM with both passthru + mediated device case.
>>>
>>
>> Just a gut feeling:
>>
>> Calling GUP every time for a particular vaddr means locking mm->mmap_sem
>> every time for a particular process. If the VM has dozens of vCPUs, which
>> is not rare, the semaphore is likely to become the bottleneck.
> 
> Hi Jike,
> 
> We do need to hold the lock of mm->mmap_sem for the VMM/QEMU process, but I
> don't quite follow the reasoning with "dozens of vcpus". One situation that I
> can think of is other threads competing for mmap_sem of the VMM/QEMU process
> within the KVM kernel code, such as hva_to_pfn; after a quick search, it
> seems to be used mostly by the ioctl "KVM_ASSIGN_PCI_DEVICE".
>

I meant that when the guest writes a gfn to the GPU MMU, which could happen
on any vCPU, a vmexit happens and mmap_sem is required. But I now realize
that this is also the case even if we store the pfn in the rbtree.
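For the pin-all model Alex describes, the per-mapping state can be as small as (iova, vaddr, size), with the pfn regenerated from vaddr when needed (in the kernel, via get_user_pages()). The sketch below is a userspace C illustration of that lookup only, using a sorted array and binary search in place of the rbtree mentioned above; the struct and function names are hypothetical, and it does not model pinning or mmap_sem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-mapping record: no pfn, no per-page refcount. */
typedef struct {
    uint64_t iova;   /* start of the DMA mapping */
    uint64_t vaddr;  /* userspace address it maps */
    uint64_t size;   /* length in bytes */
} SketchDmaMapping;

/*
 * Look up 'iova' in mappings sorted by iova and non-overlapping.
 * On success, store the matching userspace address in *vaddr.
 */
static bool sketch_iova_to_vaddr(const SketchDmaMapping *maps, size_t n,
                                 uint64_t iova, uint64_t *vaddr)
{
    size_t lo = 0, hi = n;

    assert(vaddr != NULL && (maps != NULL || n == 0));
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const SketchDmaMapping *m = &maps[mid];

        if (iova < m->iova) {
            hi = mid;
        } else if (iova - m->iova >= m->size) {
            lo = mid + 1;
        } else {
            *vaddr = m->vaddr + (iova - m->iova);
            return true;
        }
    }
    return false;
}
```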

> We will definitely conduct performance analysis with large configuration on
> servers with E5-2697 v4. :-)

My homage :)

--
Thanks,
Jike

Re: [Qemu-devel] [PATCH qemu v16 18/19] vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2)

2016-05-15 Thread Alexey Kardashevskiy


On 05/14/2016 08:26 AM, Alex Williamson wrote:

On Wed,  4 May 2016 16:52:30 +1000
Alexey Kardashevskiy  wrote:


The new VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window management.
This adds the ability for the common VFIO code to dynamically allocate/remove
DMA windows in the host kernel when a new VFIO container is added/removed.

This adds a VFIO_IOMMU_SPAPR_TCE_CREATE ioctl call to vfio_listener_region_add
and adds the just-created IOMMU to the host IOMMU list; the opposite
action is taken in vfio_listener_region_del.

When creating a new window, this uses a heuristic to decide on the number
of TCE table levels.
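A sketch of the level heuristic as stated in the comment in vfio_listener_region_add() (0..64 pages = 1 level, 65..4096 = 2, 4097..262144 = 3, more = 4), where "pages" is the number of host pages needed to hold one 8-byte TCE per IOMMU page of the window. Each extra level multiplies capacity by 64. In the code as quoted, `pow2ceil(pages) - 1` has the form 2^k - 1, whose ctz64() is always 0, so that expression appears to yield one level for every window size; the sketch therefore encodes the comment's thresholds directly. `sketch_tce_levels()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Levels per the patch comment: pages 0..64 -> 1, 65..4096 -> 2,
 * 4097..262144 -> 3, more -> 4. Each level multiplies capacity by 64.
 */
static unsigned sketch_tce_levels(uint64_t window_size, unsigned page_shift,
                                  uint64_t host_page_size)
{
    assert(host_page_size >= sizeof(uint64_t) && page_shift < 64);
    uint64_t entries = window_size >> page_shift;  /* one TCE per IOMMU page */
    uint64_t pages = entries * sizeof(uint64_t) / host_page_size;
    uint64_t capacity = 64;
    unsigned levels = 1;

    if (pages < 1) {
        pages = 1;
    }
    while (pages > capacity && levels < 4) {
        capacity *= 64;
        levels++;
    }
    return levels;
}
```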

This should cause no guest visible change in behavior.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v16:
* used memory_region_iommu_get_page_sizes() in vfio_listener_region_add()
* enforced no intersections between windows

v14:
* new to the series
---
 hw/vfio/common.c | 133 ++-
 trace-events |   2 +
 2 files changed, 125 insertions(+), 10 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 03daf88..bd2dee8 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -240,6 +240,18 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
 return -errno;
 }

+static bool range_contains(hwaddr start, hwaddr end, hwaddr addr)
+{
+return start <= addr && addr <= end;
+}


a) If you want a "range_foo" function then put it in range.h
b) I suspect there are already range.h functions that can do this.


+
+static bool vfio_host_win_intersects(VFIOHostDMAWindow *hostwin,
+ hwaddr min_iova, hwaddr max_iova)
+{
+return range_contains(hostwin->min_iova, hostwin->min_iova, min_iova) ||
+range_contains(min_iova, max_iova, hostwin->min_iova);
+}


How is this different than ranges_overlap()?
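Whatever helper ends up being used, the test it needs is the standard closed-interval overlap check sketched below (hypothetical name, inclusive bounds like min_iova/max_iova). As quoted, the first range_contains() call passes hostwin->min_iova as both bounds, so it only matches when min_iova equals hostwin->min_iova; a new window starting strictly inside an existing one would not be detected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive ranges [a_min, a_max] and [b_min, b_max] overlap iff each
 * one starts no later than the other one ends. */
static bool sketch_windows_intersect(uint64_t a_min, uint64_t a_max,
                                     uint64_t b_min, uint64_t b_max)
{
    assert(a_min <= a_max && b_min <= b_max);
    return a_min <= b_max && b_min <= a_max;
}
```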

+
 static VFIOHostDMAWindow *vfio_host_win_lookup(VFIOContainer *container,
   hwaddr min_iova, hwaddr max_iova)
 {
@@ -279,6 +291,14 @@ static int vfio_host_win_add(VFIOContainer *container,
 return 0;
 }

+static void vfio_host_win_del(VFIOContainer *container, hwaddr min_iova)
+{
+VFIOHostDMAWindow *hostwin = vfio_host_win_lookup(container, min_iova, 1);
+
+g_assert(hostwin);


Handle the error please.


Will this be enough?

if (!hostwin) {
    error_report("%s: Cannot delete missing window at %"HWADDR_PRIx,
                 __func__, min_iova);
    return;
}





+QLIST_REMOVE(hostwin, hostwin_next);
+}
+
 static bool vfio_listener_skipped_section(MemoryRegionSection *section)
 {
 return (!memory_region_is_ram(section->mr) &&
@@ -392,6 +412,69 @@ static void vfio_listener_region_add(MemoryListener *listener,
 }
 end = int128_get64(int128_sub(llend, int128_one()));

+if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
+VFIOHostDMAWindow *hostwin;
+unsigned pagesizes = memory_region_iommu_get_page_sizes(section->mr);
+unsigned pagesize = (hwaddr)1 << ctz64(pagesizes);
+unsigned entries, pages;
+struct vfio_iommu_spapr_tce_create create = { .argsz = sizeof(create) };
+
+trace_vfio_listener_region_add_iommu(iova, end);
+/*
+ * FIXME: For VFIO iommu types which have KVM acceleration to
+ * avoid bouncing all map/unmaps through qemu this way, this
+ * would be the right place to wire that up (tell the KVM
+ * device emulation the VFIO iommu handles to use).
+ */
+create.window_size = int128_get64(section->size);
+create.page_shift = ctz64(pagesize);
+/*
+ * SPAPR host supports multilevel TCE tables, there is some
+ * heuristic to decide how many levels we want for our table:
+ * 0..64 = 1; 65..4096 = 2; 4097..262144 = 3; 262145.. = 4
+ */
+entries = create.window_size >> create.page_shift;
+pages = MAX((entries * sizeof(uint64_t)) / getpagesize(), 1);
+pages = MAX(pow2ceil(pages) - 1, 1); /* Round up */
+create.levels = ctz64(pages) / 6 + 1;
+
+/* For now intersections are not allowed, we may relax this later */
+QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+if (vfio_host_win_intersects(hostwin,
+section->offset_within_address_space,
+section->offset_within_address_space +
+create.window_size - 1)) {
+goto fail;
+}
+}
+
+ret = ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_CREATE, &create);
+if (ret) {
+error_report("Failed to create a window, ret = %d (%m)", ret);
+goto fail;
+}
+
+if (create.start_addr != section->offset_within_address_space) {
+struct vfio_iommu_spapr_tce_remove remove = {
+.argsz = sizeof(remove),
+.start_addr = create.start_addr
+};
+erro
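The following standalone sketch implements the level heuristic exactly as the comment in the hunk states it: 1 level up to 64 pages, 2 up to 4096, 3 up to 262144, and 4 beyond. The sketch reads those numbers as pages of TCE table at 8 bytes per TCE, which matches how `pages` is derived in the hunk. The helper names are hypothetical. As quoted, the patch takes ctz64() of `pow2ceil(pages) - 1`, clamped to at least 1. That value always has its lowest bit set, so the expression seems to evaluate to 0, and create.levels would then always be 1. The sketch uses a ceiling log2 instead, with one level per 6 bits.

```c
#include <stdint.h>

/* Smallest k such that (1 << k) >= n; returns 0 for n <= 1. */
static unsigned ceil_log2_u64(uint64_t n)
{
    unsigned k = 0;
    while (k < 64 && ((uint64_t)1 << k) < n) {
        k++;
    }
    return k;
}

/* Pages of TCE table for a window: 8 bytes per TCE, at least one page. */
static uint64_t tce_table_pages(uint64_t window_size, unsigned page_shift,
                                uint64_t host_page_size)
{
    uint64_t entries = window_size >> page_shift;
    uint64_t pages = entries * sizeof(uint64_t) / host_page_size;
    return pages ? pages : 1;
}

/* 0..64 -> 1, 65..4096 -> 2, 4097..262144 -> 3, larger -> 4 levels. */
static unsigned tce_levels_for_pages(uint64_t pages)
{
    unsigned levels = (ceil_log2_u64(pages) + 5) / 6; /* ceil(log2 / 6) */
    if (levels < 1) {
        levels = 1;
    }
    if (levels > 4) {
        levels = 4;
    }
    return levels;
}
```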

Re: [Qemu-devel] [PATCH 1/2] e1000: Fixing interrupts pace.

2016-05-15 Thread Sameeh Jubran

As mentioned in the patch:
"According to the SPEC - intel PCI/PCI-X Family of Gigabit
Ethernet Controllers Software Developer's Manual, section
13.4.18 - the Ethernet controller guarantees a maximum
observable interrupt rate of 7813 interrupts/sec. If there is
no upper bound this could lead to an interrupt storm by e1000
(when mit_delay < 500) causing interrupts to fire at a very high
pace."
This means that on real hardware, even when mit_delay == 0 (i.e. the timer
is not used), the Ethernet controller guarantees a maximum observable
interrupt rate of 7813 interrupts/sec. Unfortunately, that isn't the case in
the emulated device: its interrupt rate can exceed that of real hardware,
which could lead to an interrupt storm. Setting mit_delay to 500 guarantees a
maximum interrupt rate of 7813 interrupts/sec.

Regards,
Sameeh

On Wed, May 4, 2016 at 2:34 PM, Shmulik Ladkani <
shmulik.ladk...@ravellosystems.com> wrote:

> Hi Sameeh,
>
> On Thu, 17 Mar 2016 09:37:57 +0200, sam...@daynix.com wrote:
> > @@ -357,6 +357,14 @@ set_interrupt_cause(E1000State *s, int index, uint32_t val)
> >  }
> >  mit_update_delay(&mit_delay, s->mac_reg[ITR]);
> >
> > +/*
> > + * According to e1000 SPEC, the Ethernet controller guarantees
> > + * a maximum observable interrupt rate of 7813 interrupts/sec.
> > + * Thus if mit_delay < 500 then the delay should be set to the
> > + * minimum delay possible which is 500.
> > + */
> > +mit_delay = (mit_delay < 500) ? 500 : mit_delay;
> > +
> >  if (mit_delay) {
> >  s->mit_timer_on = 1;
> >  timer_mod(s->mit_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
>
> Sorry for late response.
>
> Formerly, 'mit_delay' could possibly be 0 (as being not updated by
> any of the mit_update_delay calls), thus 'mit_timer' wouldn't be
> armed.
>
> The new logic forces mit_delay to be set to 500, even if it was 0
> ("unset").
>
> Which approach is correct:
> - Either the 'if (mit_delay)' is now superfluous,
> - Or, do we need to keep the "unset" semantics (i.e. mit_delay==0 means
>   don't use the timer)
>
> Regards,
> Shmulik
>

-- 
Respectfully,
*Sameeh Jubran*
*Mobile: +972 054-2509642*

*LinkedIn*
*Junior Software Engineer @ Daynix*

Re: [Qemu-devel] [V10 1/4] hw/i386: Introduce AMD IOMMU

2016-05-15 Thread David Kiarie

On Sun, May 15, 2016 at 10:29 PM, Jan Kiszka  wrote:
> On 2016-05-09 14:15, David Kiarie wrote:
>> +

Thanks for the review and testing!

>> +/* go to the next lower level */
>> +pte_addr = pte & AMDVI_DEV_PT_ROOT_MASK;
>> +/* add offset and load pte */
>> +pte_addr += ((addr >> (3 + 9 * level)) & 0x1FF) << 3;
>> +pte = ldq_phys(&address_space_memory, pte_addr);
>> +level = get_pte_translation_mode(pte);
>> +}
>> +/* get access permissions from pte */
>
> That comment is only addressing the last assignment of the followings.
>
>> +ret->iova = addr & AMDVI_PAGE_MASK_4K;
>> +ret->translated_addr = (pte & AMDVI_DEV_PT_ROOT_MASK) &
>> +AMDVI_PAGE_MASK_4K;
>> +ret->addr_mask = ~AMDVI_PAGE_MASK_4K;
>
> This does not take huge pages (2M, 1G, ...) into account. Jailhouse
> creates them, and its Linux guest goes mad. You need to use the correct
> page size here, analogously to intel_iommu.c.

Yes, this was meant to work with normal pages only. Until recently,
intel_iommu supported only 4K pages, so I figured I could work with 4K
pages as well. Anyway, I will fix this.
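This minimal sketch shows how the page size could follow from the level at which the walk stops. It assumes the leaf's size is fixed by its level (4K at level 1, 2M at level 2, 1G at level 3) and ignores any other large-page encodings the AMD IOMMU specification may define. The shift matches the `3 + 9 * level` used in the quoted walk. The helper names are hypothetical and are not the patch's API.

```c
#include <stdint.h>

/* Bit position of the IOVA field indexed at `level` (level 1 = 4K pages). */
static unsigned amdvi_level_shift(unsigned level)
{
    return 3 + 9 * level; /* 12, 21, 30 for levels 1, 2, 3 */
}

/* 9-bit table index for `addr` at `level`, as in the quoted page walk. */
static uint64_t amdvi_pte_index(uint64_t addr, unsigned level)
{
    return (addr >> amdvi_level_shift(level)) & 0x1FF;
}

/* addr_mask for a leaf found at `level`: 0xFFF, 0x1FFFFF, 0x3FFFFFFF, ... */
static uint64_t amdvi_leaf_addr_mask(unsigned level)
{
    return ((uint64_t)1 << amdvi_level_shift(level)) - 1;
}
```

With such a mask, iova and translated_addr would be aligned with `& ~mask` and addr_mask set to `mask`, instead of the fixed 4K values, analogous to what intel_iommu.c does for its large pages.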

>
>> +ret->perm = amdvi_get_perms(pte);
>> +return;
>> +}
>> +
>> +no_remap:
>> +ret->iova = addr & AMDVI_PAGE_MASK_4K;
>> +ret->translated_addr = addr & AMDVI_PAGE_MASK_4K;
>> +ret->addr_mask = ~AMDVI_PAGE_MASK_4K;
>> +ret->perm = amdvi_get_perms(pte);
>> +
>> +}
>> +
>> +/* TODO : Mark addresses as Accessed and Dirty */
>> +static void amdvi_do_translate(AMDVIAddressSpace *as, hwaddr addr,
>> +   bool is_write, IOMMUTLBEntry *ret)
>> +{
>> +AMDVIState *s = as->iommu_state;
>> +uint16_t devid = PCI_DEVID(as->bus_num, as->devfn);
>> +AMDVIIOTLBEntry *iotlb_entry = amdvi_iotlb_lookup(s, addr, as->devfn);
>> +uint64_t entry[4];
>> +
>> +if (iotlb_entry) {
>> +AMDVI_DPRINTF(CACHE, "hit  iotlb devid: %02x:%02x.%x gpa 0x%"PRIx64
>> +  " hpa 0x%"PRIx64, PCI_BUS_NUM(devid), PCI_SLOT(devid),
>> +  PCI_FUNC(devid), addr, iotlb_entry->translated_addr);
>> +ret->iova = addr & AMDVI_PAGE_MASK_4K;
>> +ret->translated_addr = iotlb_entry->translated_addr;
>> +ret->addr_mask = ~AMDVI_PAGE_MASK_4K;
>> +ret->perm = iotlb_entry->perms;
>> +return;
>> +}
>> +
>> +/* devices with V = 0 are not translated */
>> +if (!amdvi_get_dte(s, devid, entry)) {
>> +goto out;
>> +}
>> +
>> +amdvi_page_walk(as, entry, ret,
>> +is_write ? AMDVI_PERM_WRITE : AMDVI_PERM_READ, addr);
>> +
>> +amdvi_update_iotlb(s, as->devfn, addr, ret->translated_addr,
>> +   ret->perm, entry[1] & AMDVI_DEV_DOMID_ID_MASK);
>> +return;
>> +
>> +out:
>> +ret->iova = addr & AMDVI_PAGE_MASK_4K;
>> +ret->translated_addr = addr & AMDVI_PAGE_MASK_4K;
>> +ret->addr_mask = ~AMDVI_PAGE_MASK_4K;
>> +ret->perm = IOMMU_RW;
>> +}
>> +
>> +static inline bool amdvi_is_interrupt_addr(hwaddr addr)
>> +{
>> +return addr >= AMDVI_INT_ADDR_FIRST && addr <= AMDVI_INT_ADDR_LAST;
>> +}
>> +
>> +static IOMMUTLBEntry amdvi_translate(MemoryRegion *iommu, hwaddr addr,
>> + bool is_write)
>> +{
>> +AMDVI_DPRINTF(GENERAL, "");
>
> Not a very helpful instrumentation, I would say.

It was helpful in the initial stages of development but is not very helpful
now. I could get rid of such instrumentation.

>
>> +
>> +AMDVIAddressSpace *as = container_of(iommu, AMDVIAddressSpace, iommu);
>> +AMDVIState *s = as->iommu_state;
>> +IOMMUTLBEntry ret = {
>> +.target_as = &address_space_memory,
>> +.iova = addr,
>> +.translated_addr = 0,
>> +.addr_mask = ~(hwaddr)0,
>> +.perm = IOMMU_NONE
>> +};
>> +
>> +if (!s->enabled || amdvi_is_interrupt_addr(addr)) {
>> +/* AMDVI disabled - corresponds to iommu=off not
>> + * failure to provide any parameter
>> + */
>> +ret.iova = addr & AMDVI_PAGE_MASK_4K;
>> +ret.translated_addr = addr & AMDVI_PAGE_MASK_4K;
>> +ret.addr_mask = ~AMDVI_PAGE_MASK_4K;
>> +ret.perm = IOMMU_RW;
>> +return ret;
>> +}
>> +
>> +amdvi_do_translate(as, addr, is_write, &ret);
>> +AMDVI_DPRINTF(MMU, "devid: %02x:%02x.%x gpa 0x%"PRIx64 " hpa 0x%"PRIx64,
>> +  as->bus_num, PCI_SLOT(as->devfn), PCI_FUNC(as->devfn), addr,
>> +  ret.translated_addr);
>
> Tracing permission here in addition would be good.
>
> Jan
>

Re: [Qemu-devel] [PATCH qemu v16 19/19] spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)

2016-05-15 Thread Alexey Kardashevskiy


On 05/13/2016 06:41 PM, Bharata B Rao wrote:

On Wed, May 4, 2016 at 12:22 PM, Alexey Kardashevskiy  wrote:

This adds support for the Dynamic DMA Windows (DDW) option defined by
the SPAPR specification, which allows having additional DMA window(s).

The "ddw" property is enabled by default on a PHB, but for compatibility
the pseries-2.5 machine (TODO: update version) and older disable it.
This also creates a single DMA window for the older machines to
maintain backward migration.

This implements DDW for PHBs with emulated and VFIO devices. Host kernel
support is required. The advertised IOMMU page sizes are 4K and 64K;
16M pages are supported but not advertised by default. To enable them,
the user has to specify the "pgsz" property for the PHB and enable huge
pages for RAM.

Existing Linux guests try to create one additional huge DMA window
with 64K or 16MB pages and map the entire guest RAM to it. If this succeeds,
the guest switches to dma_direct_ops and never calls TCE hypercalls
(H_PUT_TCE,...) again. This lets VFIO devices use the entire RAM
without wasting time on map/unmap later. This adds a "dma64_win_addr"
property, which is a bus address for the 64bit window and is set by default
to 0x800... as this is what modern POWER8 hardware uses, and this
allows having emulated and VFIO devices on the same bus.
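The following small sketch shows why a huge-page window that covers all of guest RAM is attractive. It computes the TCE count for such a window, assuming 8-byte TCEs (as in the `entries * sizeof(uint64_t)` computation in the VFIO patch earlier in this archive) and a hypothetical 16 GiB guest. The helper names are illustrative only.

```c
#include <stdint.h>

/* Number of TCEs for a DMA window of window_bytes with 2^page_shift pages. */
static uint64_t ddw_tce_entries(uint64_t window_bytes, unsigned page_shift)
{
    return window_bytes >> page_shift;
}

/* Bytes of a flat TCE table for that window (one 64-bit TCE per page). */
static uint64_t ddw_tce_table_bytes(uint64_t window_bytes, unsigned page_shift)
{
    return ddw_tce_entries(window_bytes, page_shift) * sizeof(uint64_t);
}
```

For 16 GiB (2^34 bytes), 4K pages need 4,194,304 TCEs (32 MiB of table), 64K pages need 262,144 (2 MiB), and 16M pages need only 1,024 (8 KiB).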

This adds 4 RTAS handlers:
* ibm,query-pe-dma-window
* ibm,create-pe-dma-window
* ibm,remove-pe-dma-window
* ibm,reset-pe-dma-window
These are registered from a type_init() callback.

These RTAS handlers are implemented in a separate file to avoid polluting
spapr_iommu.c with PCI.

This changes sPAPRPHBState::dma_liobn to an array to allow 2 LIOBNs.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v16:
* s/dma_liobn/dma_liobn[SPAPR_PCI_DMA_MAX_WINDOWS]/
* s/SPAPR_PCI_LIOBN()/dma_liobn[]/

v15:
* moved page mask filtering to PHB realize(), use "-mempath" to know
if there are huge pages
* fixed error reporting in RTAS handlers
* max window size now accounts for hotpluggable memory boundaries
---
 hw/ppc/Makefile.objs|   1 +
 hw/ppc/spapr.c  |   5 +
 hw/ppc/spapr_pci.c  |  75 +---
 hw/ppc/spapr_rtas_ddw.c | 292 
 include/hw/pci-host/spapr.h |   8 +-
 include/hw/ppc/spapr.h  |  16 ++-
 trace-events|   4 +
 7 files changed, 381 insertions(+), 20 deletions(-)
 create mode 100644 hw/ppc/spapr_rtas_ddw.c

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index c1ffc77..986b36f 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -7,6 +7,7 @@ obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
 ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
 obj-y += spapr_pci_vfio.o
 endif
+obj-$(CONFIG_PSERIES) += spapr_rtas_ddw.o
 # PowerPC 4xx boards
 obj-y += ppc405_boards.o ppc4xx_devs.o ppc405_uc.o ppc440_bamboo.o
 obj-y += ppc4xx_pci.o
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index b69995e..0206609 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2365,6 +2365,11 @@ DEFINE_SPAPR_MACHINE(2_6, "2.6", true);
 .driver   = "spapr-vlan", \
 .property = "use-rx-buffer-pools", \
 .value= "off", \
+}, \
+{\
+.driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,\
+.property = "ddw",\
+.value= stringify(off),\
 },

 static void spapr_machine_2_5_instance_options(MachineState *machine)
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 51e7d56..aa414f2 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -35,6 +35,7 @@
 #include "hw/ppc/spapr.h"
 #include "hw/pci-host/spapr.h"
 #include "exec/address-spaces.h"
+#include "exec/ram_addr.h"
 #include 
 #include "trace.h"
 #include "qemu/error-report.h"
@@ -44,6 +45,7 @@
 #include "hw/pci/pci_bus.h"
 #include "hw/ppc/spapr_drc.h"
 #include "sysemu/device_tree.h"
+#include "sysemu/hostmem.h"

 #include "hw/vfio/vfio.h"

@@ -1305,11 +1307,14 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
 PCIBus *bus;
 uint64_t msi_window_size = 4096;
 sPAPRTCETable *tcet;
+const unsigned windows_supported =
+sphb->ddw_enabled ? SPAPR_PCI_DMA_MAX_WINDOWS : 1;

 if (sphb->index != (uint32_t)-1) {
 hwaddr windows_base;

-if ((sphb->buid != (uint64_t)-1) || (sphb->dma_liobn != (uint32_t)-1)
+if ((sphb->buid != (uint64_t)-1) || (sphb->dma_liobn[0] != (uint32_t)-1)
+|| ((sphb->dma_liobn[1] != (uint32_t)-1) && (windows_supported > 1))
 || (sphb->mem_win_addr != (hwaddr)-1)
 || (sphb->io_win_addr != (hwaddr)-1)) {
 error_setg(errp, "Either \"index\" or other parameters must"
@@ -1324,7 +1329,9 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
 }

 sphb->buid = SPAPR_PCI_BASE_BUID + sphb->index;
-sphb->dma_liobn = SPAPR_PCI_LIOBN(sphb->index, 0);
+for (i = 0; i < windows_supported; ++i) {
+sphb->d

Re: [Qemu-devel] qcow2 resize with snapshots

2016-05-15 Thread zhangzhiming

Hi, I read some of the source code following your tips, and I have come to some conclusions:

1. Old versions of the QCOW2 image do not store the total size of a snapshot, so we
can't add the function to the old version of QEMU, and QCOW2 resize with snapshots
will be limited to v3 images, otherwise it would cause confusion. So that is right.
2. I read the source code of bdrv_truncate on master, and the function
“bdrv_dirty_bitmap_truncate” already does this, so does that mean I don't need to
consider resizing the bitmap when resizing QCOW2?
3. There is a function named “qmp_marshal_block_resize”; this function blocks I/O
requests while resizing an image, and it seems that it calls a callback function to
notify the guest, so we don't need to set BLOCK_OP_TYPE_RESIZE blockers. Is that
correct?
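The steps discussed in this thread (a snapshot size that only v3 images record, resize blockers, dirty-bitmap truncation, resize callbacks) can be modelled as a toy sketch of applying a snapshot whose recorded size differs from the current one. Everything here is hypothetical. The structs and the function are not QEMU's, and each field only stands in for the corresponding real mechanism.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of an open image; none of these types exist in QEMU. */
typedef struct {
    int qcow2_version;            /* 2 or 3 */
    uint64_t size;                /* current virtual disk size */
    bool resize_blocked;          /* stands in for a BLOCK_OP_TYPE_RESIZE blocker */
    uint64_t dirty_bitmap_size;   /* stands in for an attached dirty bitmap */
    uint64_t guest_visible_size;  /* what the guest device has been told */
} ToyImage;

/* Toy snapshot: v2 snapshots carry no disk size. */
typedef struct {
    bool has_disk_size;
    uint64_t disk_size;
} ToySnapshot;

/* Returns 0 on success, -1 if the snapshot's size cannot be applied. */
static int toy_apply_snapshot_size(ToyImage *img, const ToySnapshot *sn)
{
    if (img->qcow2_version < 3 || !sn->has_disk_size) {
        return -1;                 /* size unknown: refuse rather than guess */
    }
    if (sn->disk_size == img->size) {
        return 0;                  /* nothing to resize */
    }
    if (img->resize_blocked) {
        return -1;                 /* an operation in progress forbids resizing */
    }
    img->size = sn->disk_size;
    img->dirty_bitmap_size = sn->disk_size;   /* like truncating dirty bitmaps */
    img->guest_visible_size = sn->disk_size;  /* like invoking resize callbacks */
    return 0;
}
```

The sketch runs every check before touching any state. A refused snapshot load therefore leaves the size, the bitmaps and the guest's view consistent with each other.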

zhangzhiming
zhangzhimin...@meituan.com

> On May 7, 2016, at 11:13 AM, zhangzhiming  wrote:
> 
> Sorry, I forgot to CC qemu-bl...@nongnu.org.
> 
> zhangzhiming
> zhangzhimin...@meituan.com 
> 
> 
> 
>> On May 7, 2016, at 10:47 AM, zhangzhiming wrote:
>> 
>> Thank you for your reply, and I am glad to join the development of QEMU.
>> I will try my best to finish this new feature.
>> 
>> have a good day!
>> 
>> zhangzhiming
>> zhangzhimin...@meituan.com 
>> 
>> 
>> 
>>> On May 3, 2016, at 4:44 PM, Kevin Wolf  wrote:
>>> 
>>> [ Cc: qemu-block ]
>>> 
>>> On 29.04.2016 at 10:59, zhangzm wrote:
>>>> Hi, I want to implement resizing of qcow2 images that have
>>>> snapshots.
>>>>
>>>> Each qcow2 snapshot will have a separate total size, and when a
>>>> snapshot is applied, the image can be shrunk; the total size of the
>>>> image will change after applying a snapshot with a different size.
>>>>
>>>> Currently, there is a disk_size value in struct QcowSnapshot. I only
>>>> need to change the size of the current active image layer when
>>>> applying a snapshot with a different size, and I/O requests will be
>>>> limited to the range of the active layer.
>>> 
>>> Yes, I think today the qcow2 format provides everything that is needed
>>> to implement this. You need to make sure that we have a v3 image so that
>>> the virtual disk size is actually stored in the snapshot (this field did
>>> not exist in v2 images yet).
>>> 
>>> What you need to consider is that loading a snapshot becomes similar to
>>> resizing an image then and you need to do the same things for it. For
>>> example, we need to figure out what to do with associated dirty bitmaps
>>> (adapt them to the new size like in bdrv_truncate()?), call resize
>>> callbacks so that the guest devices actually see the changed size and
>>> possibly also consider the BLOCK_OP_TYPE_RESIZE blockers to prevent a
>>> size change while the image is in use.
>>> 
>>>> And I want my code merged into QEMU master.
>>> 
>>> The wiki has a few tips on how to submit patches for qemu:
>>> http://wiki.qemu.org/Contribute/SubmitAPatch
>>> 
>>> For a patch (or patch series) like this, you will want to CC at least
>>> qemu-block and qemu-devel, plus possibly a few individual people.
>>> 
>>> Kevin
>>> 
>> 
>> 
>

Re: [Qemu-devel] [RFC PATCH v3 3/3] VFIO Type1 IOMMU change: to support with iommu and without iommu

2016-05-15 Thread Jike Song

On 05/13/2016 11:50 PM, Neo Jia wrote:
> On Fri, May 13, 2016 at 05:23:44PM +0800, Jike Song wrote:
>> On 05/13/2016 04:31 PM, Neo Jia wrote:
>>> On Fri, May 13, 2016 at 07:45:14AM +, Tian, Kevin wrote:

>>>> We use the page tracking framework, which was recently added to KVM,
>>>> to mark RAM pages as read-only so that write accesses are intercepted
>>>> by the device model.
>>>
>>> Yes, I am aware of that patchset from Guangrong. So far the interfaces all
>>> require struct kvm *, copied from https://lkml.org/lkml/2015/11/30/644
>>>
>>> - kvm_page_track_add_page(): add the page to the tracking pool after
>>>   that later specified access on that page will be tracked
>>>
>>> - kvm_page_track_remove_page(): remove the page from the tracking pool,
>>>   the specified access on the page is not tracked after the last user is
>>>   gone
>>>
>>> void kvm_page_track_add_page(struct kvm *kvm, gfn_t gfn,
>>> enum kvm_page_track_mode mode);
>>> void kvm_page_track_remove_page(struct kvm *kvm, gfn_t gfn,
>>>enum kvm_page_track_mode mode);
>>>
>>> Really curious how you are going to get access to the struct kvm *kvm, or
>>> are you relying on userfaultfd to track the write faults only as part of the
>>> QEMU userfault thread?
>>>
>>
>> Hi Neo,
>>
>> For a vGPU used as a device for a KVM guest, there will be interfaces
>> wrapped or implemented in the KVM layer, as a counterpart to the
>> interfaces for Xen. That is where the KVM-related code is supposed to be.
> 
> Hi Jike,
> 
> Has this been discussed anywhere on the mailing list already? Sorry if I
> missed such a conversation.
>

Hi Neo,

Not exactly, but we can discuss it if necessary :)

The Intel vGPU device model, which is part of the i915 driver, has to be able to
emulate a vGPU for *both* XenGT and KVMGT guests. That means there must be
a bridge somewhere, dispatching to Xen-specific or KVM-specific logic accordingly.
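One common way to build such a bridge is a table of function pointers chosen once per guest. In the minimal sketch below, every name is hypothetical. The stub backend only records calls. It stands in for a real backend such as one built on KVM's kvm_page_track_add_page(), and the table hides that backend's need for a struct kvm * behind an opaque handle.

```c
#include <stdint.h>

typedef uint64_t gfn_t; /* guest frame number */

/* Hypothetical per-hypervisor operations used by the device model. */
struct hypervisor_ops {
    const char *name;
    int (*write_protect)(void *handle, gfn_t gfn);
    int (*unprotect)(void *handle, gfn_t gfn);
};

struct vgpu {
    const struct hypervisor_ops *ops; /* picked once: Xen or KVM backend */
    void *handle;                     /* opaque backend state, e.g. a VM handle */
};

/* Device-model code never names the hypervisor; it dispatches through ops. */
static int vgpu_track_page(struct vgpu *v, gfn_t gfn)
{
    return v->ops->write_protect(v->handle, gfn);
}

static int vgpu_untrack_page(struct vgpu *v, gfn_t gfn)
{
    return v->ops->unprotect(v->handle, gfn);
}

/* Stub backend for illustration: counts write-protected pages. */
struct stub_state {
    gfn_t last_gfn;
    int protected_pages;
};

static int stub_write_protect(void *handle, gfn_t gfn)
{
    struct stub_state *s = handle;
    s->last_gfn = gfn;
    s->protected_pages++;
    return 0;
}

static int stub_unprotect(void *handle, gfn_t gfn)
{
    struct stub_state *s = handle;
    s->last_gfn = gfn;
    s->protected_pages--;
    return 0;
}

static const struct hypervisor_ops stub_ops = {
    .name = "stub",
    .write_protect = stub_write_protect,
    .unprotect = stub_unprotect,
};
```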


--
Thanks,
Jike
