Re: [Qemu-devel] [PATCH v3 2/5] Extract some reusable vGIC code

2015-07-14 Thread Pavel Fedin
 Hi!

> /local/augere/LINARO/QEMU/qemu/hw/intc/arm_gic_kvm.c:565:5: error:
> passing argument 2 of 'qdev_init_gpio_in' from incompatible pointer type
> [-Werror]
>  qdev_init_gpio_in(dev, kvm_arm_gic_set_irq, i);
> 
> also
> 
> /local/augere/LINARO/QEMU/qemu/hw/intc/arm_gic_kvm.c:90:13: error:
> 'kvm_arm_gicv2_set_irq' defined but not used [-Werror=unused-function]
>  static void kvm_arm_gicv2_set_irq(void *opaque, int irq, int level)

 Oops, I accidentally missed this during the rebase and conflict fixing.
Actually it's a big bug, and the code would not even work. :( Thank you for
spotting this. The issue is not present in my master branch; I am fixing it
and will respin today.
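
 For reference, the v4 series later in this digest resolves this by keeping a
thin wrapper with the (void *, int, int) signature that qdev_init_gpio_in()
expects for its handler argument; a minimal sketch of that shape:

static void kvm_arm_gicv2_set_irq(void *opaque, int irq, int level)
{
    GICState *s = (GICState *)opaque;

    /* the reusable helper now takes the IRQ count instead of the GIC state */
    kvm_arm_gic_set_irq(s->num_irq, irq, level);
}

/* ... and realize passes the wrapper, not the helper: */
/* qdev_init_gpio_in(dev, kvm_arm_gicv2_set_irq, i);   */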

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia





Re: [Qemu-devel] [PATCH qemu 1/5] vfio: Switch from TARGET_PAGE_MASK to qemu_real_host_page_mask

2015-07-14 Thread Alexey Kardashevskiy

On 07/14/2015 06:32 AM, Peter Crosthwaite wrote:

On Sun, Jul 12, 2015 at 11:15 PM, David Gibson wrote:

On Fri, Jul 10, 2015 at 08:43:44PM +1000, Alexey Kardashevskiy wrote:

These commits started switching from TARGET_PAGE_MASK (hardcoded as 4K) to
the real host page size:
4e51361d7 "cpu-all: complete "real" host page size API" and
f7ceed190 "vfio: cpu: Use "real" page size API"

This patch finishes the transition by:
- %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
- %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
- removing bitfield length for offsets in VFIOQuirk::data as
qemu_real_host_page_mask is not a macro


This does not make much sense to me.  f7ceed190 moved to
REAL_HOST_PAGE_SIZE because it's back end stuff that really depends
only on the host page size.

Here you're applying a blanket change to vfio code, in particular to
the DMA handling code, and for DMA both the host and target page size
can be relevant, depending on the details of the IOMMU implementation.



So the multi-arch work (for which f7ceed190 preps) does have a problem
that needs something like this. TARGET_PAGE_MASK and TARGET_PAGE_ALIGN
do need to go away from common code, or this has to be promoted to
cpu-specific code. Consequently, the page size is fixed to 4K for
multi-arch, and this is not a good long-term limitation. Is the IOMMU
page size really tied to the CPU implementation? In practice this is
going to be the case, but the IOMMU and CPU should be decouplable.

If vfio needs to respect the page alignment of a particular (or all?)
CPUs/IOMMUs, then can we virtualise this as data rather than a macroified
constant?

uint64_t  page_align = 0;

CPU_FOREACH(cpu, ...) {
 CPUClass *cc = CPU_GET_CLASS(cpu);

 page_align = MAX(page_align, cc->page_size);
}

/* This is a little more made up ... */
IOMMU_FOREACH(iommu, ...) {
 page_align = MAX(page_align, iommu->page_size);
}


This assumes that the IOMMU has a constant page size; is this always true?
Can't it be a (contiguous?) set of different-sized chunks, for example one
per memory DIMM?



--
Alexey



Re: [Qemu-devel] vm internal snapshot deletes only delete first disk's snapshots

2015-07-14 Thread Marcus
Ok, I seem to have fixed it. I can submit a patch, but some input would be
appreciated. In savevm.c, we have two functions:

 void hmp_delvm(Monitor *mon, const QDict *qdict), which my libvirt is using

static int del_existing_snapshots(Monitor *mon, const char *name)


These look to have the same origins. My guess was that perhaps the hmp
version was first, and the second was a copy for QMP, or perhaps vice
versa. At any rate, I changed hmp_delvm to call del_existing_snapshots:

void hmp_delvm(Monitor *mon, const QDict *qdict)
{
    const char *name = qdict_get_str(qdict, "name");

    del_existing_snapshots(mon, name);
}

And the issue went away. My only concern with this is that hmp_delvm does a
slightly different thing: it first calls find_vmstate_bs(), which seems to
check whether the block devices support snapshots, while del_existing_snapshots
does this slightly differently. If I can get feedback on that I'd appreciate it.
Also, I'll look around for patch submission guidelines, but if someone can
point me in the right direction I'd appreciate that as well.

Thanks



On Mon, Jul 13, 2015 at 11:29 PM, Marcus  wrote:

> Ok, this is weird. It looks like it dies on the first disk, which is the
> only one that actually gets its snapshot removed. Perhaps it is processing
> the first disk a second time?
>
> # virsh qemu-monitor-command i-2-38-VM --hmp 'delvm 1'
>
> Error while deleting snapshot on device 'virtio-disk0': Can't find the
> snapshot
>
> On Mon, Jul 13, 2015 at 10:28 PM, Marcus  wrote:
>
>> Hi all,
>>
>> I've recently been toying with VM snapshots, and have run into an
>> issue. Given a VM with multiple disks, it seems a snapshot-create followed
>> by a snapshot-delete will only remove the qcow2 snapshot for the first disk
>> (or perhaps just the disk that contains the memory), not all of the disk
>> snapshots it created. Is this something people are aware of?
>>
>> In searching around, I found a bug report where snapshot-creates
>> would fail due to the qcow2 snapshot IDs being inconsistent. That looks
>> like it is patched for 2.4 qemu (
>> http://lists.nongnu.org/archive/html/qemu-devel/2015-03/msg04963.html).
>> This bug would trigger that one by leaving IDs around that are inconsistent
>> between member disks, but it is not the same issue.
>>
>> At first I was snooping around the libvirt lists looking for an
>> answer, but found in the code that they're simply calling the qemu monitor
>> to handle snapshot deletes on active VMs.
>>
>> # virsh snapshot-create 7
>>
>> Domain snapshot 1436792720 created
>>
>>
>> # virsh snapshot-list 7
>>
>>  Name         Creation Time              State
>> -------------------------------------------------------
>>  1436792720   2015-07-13 06:05:20 -0700  running
>>
>>
>> # virsh domblklist 7
>>
>> Target     Source
>> ------------------------------------------------------------------------------------------
>> vda        /mnt/2a270ef3-f389-37a4-942f-380bed9f70aa/e4d6e885-1382-40bc-890b-ad9c8b51a7a5
>> vdb        /mnt/2a270ef3-f389-37a4-942f-380bed9f70aa/7033e4c6-5f59-4325-b7e0-ae191e12e86c
>>
>>
>> # qemu-img snapshot -l
>> /mnt/2a270ef3-f389-37a4-942f-380bed9f70aa/e4d6e885-1382-40bc-890b-ad9c8b51a7a5
>>
>> Snapshot list:
>>
>> ID  TAG         VM SIZE  DATE                 VM CLOCK
>> 1   1436792720  173M     2015-07-13 06:05:20  00:01:10.938
>>
>>
>> # qemu-img snapshot -l
>> /mnt/2a270ef3-f389-37a4-942f-380bed9f70aa/7033e4c6-5f59-4325-b7e0-ae191e12e86c
>>
>> Snapshot list:
>>
>> ID  TAG         VM SIZE  DATE                 VM CLOCK
>> 1   1436792720  0        2015-07-13 06:05:20  00:01:10.938
>>
>>
>> # virsh snapshot-delete 7 1436792720
>>
>> Domain snapshot 1436792720 deleted
>>
>>
>> # qemu-img snapshot -l
>> /mnt/2a270ef3-f389-37a4-942f-380bed9f70aa/e4d6e885-1382-40bc-890b-ad9c8b51a7a5
>>
>> # qemu-img snapshot -l
>> /mnt/2a270ef3-f389-37a4-942f-380bed9f70aa/7033e4c6-5f59-4325-b7e0-ae191e12e86c
>>
>> Snapshot list:
>>
>> ID  TAG         VM SIZE  DATE                 VM CLOCK
>> 1   1436792720  0        2015-07-13 06:05:20  00:01:10.938
>>
>
>


[Qemu-devel] (no subject)

2015-07-14 Thread Pankaj Gupta
Subject: [PATCH 0/2 v2] virtio-rng: Avoid unnecessary timer trigger to bump up
quota value

A timer was added in virtio-rng to rate-limit the entropy. It used to trigger
at regular intervals to bump up the quota value. The quota and timer values
are used to ensure that a single guest does not use up all the entropy from
the host. This resulted in the timer triggering even when the quota was not
exhausted at all, causing extra processing.

This series has two patches:

patch1 : Bump up quota value only when guest requests entropy.
patch2 : Serve pending request if any after timer bumps up quota.

Changes from v1:
Amit Shah : 
Serve pending request if any after timer bumps up quota.
Add testing details.

I tested this with '/dev/urandom' on the host side. Below are the details:

* Quota+timer specified on command line, 
  Ran in Guest 'dd if=/dev/hwrng of=/dev/null'
  
  - Quota (4096 bytes), Time slice (1000 ms)

48152 bytes (49 KB) copied, 12.0005 s, 4.1 kB/s
48640 bytes (49 KB) copied, 12.0014 s, 4.1 kB/s

  - Quota (8192), Time slice (1000 ms)
65536 bytes  (66 KB) copied, 8.00088 s, 8.2 kB/s
146944 bytes (147 KB) copied, 18.0021 s, 8.2 kB/s

  - No quota/timer specified on command line, takes default values.
Quota (INT64_MAX), Time slice (65536 ms)
8050688 bytes (8.1 MB) copied, 93.1198 s, 86.5 kB/s
35568128 bytes (36 MB) copied, 408.823 s, 87.0 kB/s
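
For reference, the 4096-byte / 1000 ms case above corresponds roughly to a
command line like the following (illustrative only; the backend object id and
the exact device variant are assumptions):

  qemu-system-x86_64 ... \
    -object rng-random,id=rng0,filename=/dev/urandom \
    -device virtio-rng-pci,rng=rng0,max-bytes=4096,period=1000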


 hw/virtio/virtio-rng.c |   51 +
 include/hw/virtio/virtio-rng.h |1 
 2 files changed, 33 insertions(+), 19 deletions(-)



[Qemu-devel] [PATCH 1/2 v2] virtio-rng: Bump up quota value only when guest requests entropy

2015-07-14 Thread Pankaj Gupta
This patch triggers the timer only when the guest requests entropy. As soon
as the first request for entropy comes from the guest, we set the timer.
The timer bumps up the quota value when it fires.

Signed-off-by: Pankaj Gupta 
---
 hw/virtio/virtio-rng.c | 15 ---
 include/hw/virtio/virtio-rng.h |  1 +
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/hw/virtio/virtio-rng.c b/hw/virtio/virtio-rng.c
index 22b1d87..8774a0c 100644
--- a/hw/virtio/virtio-rng.c
+++ b/hw/virtio/virtio-rng.c
@@ -78,6 +78,12 @@ static void virtio_rng_process(VirtIORNG *vrng)
 return;
 }
 
+if (vrng->activate_timer) {
+timer_mod(vrng->rate_limit_timer,
+   qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + vrng->conf.period_ms);
+vrng->activate_timer = false;
+}
+
 if (vrng->quota_remaining < 0) {
 quota = 0;
 } else {
@@ -139,8 +145,7 @@ static void check_rate_limit(void *opaque)
 
 vrng->quota_remaining = vrng->conf.max_bytes;
 virtio_rng_process(vrng);
-timer_mod(vrng->rate_limit_timer,
-   qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + vrng->conf.period_ms);
+vrng->activate_timer = true;
 }
 
 static void virtio_rng_device_realize(DeviceState *dev, Error **errp)
@@ -196,13 +201,9 @@ static void virtio_rng_device_realize(DeviceState *dev, Error **errp)
 
 vrng->vq = virtio_add_queue(vdev, 8, handle_input);
 vrng->quota_remaining = vrng->conf.max_bytes;
-
 vrng->rate_limit_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
check_rate_limit, vrng);
-
-timer_mod(vrng->rate_limit_timer,
-   qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + vrng->conf.period_ms);
-
+vrng->activate_timer = true;
 register_savevm(dev, "virtio-rng", -1, 1, virtio_rng_save,
 virtio_rng_load, vrng);
 }
diff --git a/include/hw/virtio/virtio-rng.h b/include/hw/virtio/virtio-rng.h
index 0316488..3f07de7 100644
--- a/include/hw/virtio/virtio-rng.h
+++ b/include/hw/virtio/virtio-rng.h
@@ -44,6 +44,7 @@ typedef struct VirtIORNG {
  */
 QEMUTimer *rate_limit_timer;
 int64_t quota_remaining;
+bool activate_timer;
 } VirtIORNG;
 
 #endif
-- 
1.9.3




[Qemu-devel] [PATCH 2/2 v2] virtio-rng: Serve pending request if any after timer bumps up quota.

2015-07-14 Thread Pankaj Gupta
We arm the timer when we get the first request from the guest.
Even if the guest pulls all the data, we will serve the guest
only when the timer bumps up a new quota. When the timer expires
we check whether we have a pending request from the guest; if so,
we serve it and rearm the timer, otherwise we don't do anything.

This patch also moves the 'request size' logic out to a
'check_request' function so that it can be re-used.

Signed-off-by: Pankaj Gupta 
---
 hw/virtio/virtio-rng.c | 36 
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/hw/virtio/virtio-rng.c b/hw/virtio/virtio-rng.c
index 8774a0c..dca5064 100644
--- a/hw/virtio/virtio-rng.c
+++ b/hw/virtio/virtio-rng.c
@@ -37,6 +37,24 @@ static size_t get_request_size(VirtQueue *vq, unsigned quota)
 return in;
 }
 
+static size_t check_request(VirtIORNG *vrng)
+{
+size_t size;
+unsigned quota;
+
+if (vrng->quota_remaining < 0) {
+quota = 0;
+} else {
+quota = MIN((uint64_t)vrng->quota_remaining, (uint64_t)UINT32_MAX);
+}
+size = get_request_size(vrng->vq, quota);
+
+trace_virtio_rng_request(vrng, size, quota);
+
+size = MIN(vrng->quota_remaining, size);
+return size;
+}
+
 static void virtio_rng_process(VirtIORNG *vrng);
 
 /* Send data from a char device over to the guest */
@@ -72,7 +90,6 @@ static void chr_read(void *opaque, const void *buf, size_t size)
 static void virtio_rng_process(VirtIORNG *vrng)
 {
 size_t size;
-unsigned quota;
 
 if (!is_guest_ready(vrng)) {
 return;
@@ -83,17 +100,8 @@ static void virtio_rng_process(VirtIORNG *vrng)
qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + vrng->conf.period_ms);
vrng->activate_timer = false;
 }
+size = check_request(vrng);
 
-if (vrng->quota_remaining < 0) {
-quota = 0;
-} else {
-quota = MIN((uint64_t)vrng->quota_remaining, (uint64_t)UINT32_MAX);
-}
-size = get_request_size(vrng->vq, quota);
-
-trace_virtio_rng_request(vrng, size, quota);
-
-size = MIN(vrng->quota_remaining, size);
 if (size) {
 rng_backend_request_entropy(vrng->rng, size, chr_read, vrng);
 }
@@ -142,9 +150,13 @@ static int virtio_rng_load(QEMUFile *f, void *opaque, int version_id)
 static void check_rate_limit(void *opaque)
 {
 VirtIORNG *vrng = opaque;
+size_t size;
 
 vrng->quota_remaining = vrng->conf.max_bytes;
-virtio_rng_process(vrng);
+size = check_request(vrng);
+if (size > 0) {
+virtio_rng_process(vrng);
+}
 vrng->activate_timer = true;
 }
 
-- 
1.9.3




[Qemu-devel] [PATCH 0/2 v2] virtio-rng: Avoid unnecessary timer trigger to bump up quota value

2015-07-14 Thread Pankaj Gupta
A timer was added in virtio-rng to rate-limit the entropy. It used to trigger
at regular intervals to bump up the quota value. The quota and timer values
are used to ensure that a single guest does not use up all the entropy from
the host. This resulted in the timer triggering even when the quota was not
exhausted at all, causing extra processing.

This series has two patches:

patch1 : Bump up quota value only when guest requests entropy.
patch2 : Serve pending request if any after timer bumps up quota.

Changes from v1:
Amit Shah : 
Serve pending request if any after timer bumps up quota.
Add testing details.

I tested this with '/dev/urandom' on the host side. Below are the details:

* Quota+timer specified on command line, 
  Ran in Guest 'dd if=/dev/hwrng of=/dev/null'
  
  - Quota (4096 bytes), Time slice (1000 ms)

48152 bytes (49 KB) copied, 12.0005 s, 4.1 kB/s
48640 bytes (49 KB) copied, 12.0014 s, 4.1 kB/s

  - Quota (8192), Time slice (1000 ms)
65536 bytes  (66 KB) copied, 8.00088 s, 8.2 kB/s
146944 bytes (147 KB) copied, 18.0021 s, 8.2 kB/s

  - No quota/timer specified on command line, takes default values.
Quota (INT64_MAX), Time slice (65536 ms)
8050688 bytes (8.1 MB) copied, 93.1198 s, 86.5 kB/s
35568128 bytes (36 MB) copied, 408.823 s, 87.0 kB/s


 hw/virtio/virtio-rng.c |   51 +
 include/hw/virtio/virtio-rng.h |1 
 2 files changed, 33 insertions(+), 19 deletions(-)



Re: [Qemu-devel] [PATCH v3 14/16] acpi: Add a way for devices to add ACPI tables

2015-07-14 Thread Igor Mammedov
On Mon, 13 Jul 2015 16:55:29 -0500
Corey Minyard  wrote:

> On 07/09/2015 03:25 AM, Igor Mammedov wrote:
> > On Wed, 8 Jul 2015 22:39:24 +0200
> > Paolo Bonzini  wrote:
> >
> >>
> >> On 08/07/2015 21:26, Igor Mammedov wrote:
>  This was suggested by Michael, so I think you should read the reviews
>  of earlier versions first.
> >>> That is basically the same as hooks in v1 only the other way around
> >>> with all drawbacks attached.
> >>>
> >>> Just dropping this universal way to scatter ACPI code all over QEMU
> >>> and calling ipmi_encode_one_acpi() "[15/16] ipmi: Add ACPI table entries"
> >>> directly from build_ssdt() would simplify series quite a bit.
> >> On the other hand, it would be much harder to make IPMI optional unless
> >> you want to add a bunch of ugly stubs.  Same for SMBIOS.  Especially
> > it's not the case if one uses QOM properties to fetch info.
> > it's a bit verbose but works well without need for ugly stubs.
> >
> >> when you can have the same ACPI and SMBIOS tables in both x86 and ARM
> >> virtual machines, it becomes a little more the device business to
> >> provide ACPI and SMBIOS information.
> > As a generic way for a device to supply information to ACPI,
> > I was thinking about introducing an interface that a device might implement,
> > for example if it should have _CRS in its ACPI description.
> > Then the ACPI core could query for devices that implement the interface
> > and build AML code based on the resource values (i.e. base address,
> > size, irq #, etc.) it gets from the device.
> 
> I'm just getting back from vacation and catching up.
> 
> I would agree that implementing an interface is a more consistent
> way to do this than the way it's done now.  If it's ok with everyone,
> I'll work on implementing this.
Implementing the interface shouldn't block this series.
I'm OK with this series ACPI-wise if you drop 14/16 and do as suggested
above, plus address the comment on 15/16.
It would require minimal changes and get us an interesting device
to play with without significant delays.
I would do the same on the SMBIOS side as well.

Implementing the interface could be done on top, along with conversion
of the current code that fetches resource info from devices manually.

> 
> I'm not sure that resource values are enough to do this, though.
> IPMI has some custom fields (_IFT, _SRV, _STR) and lots of optional
> fields (_ADR, _UID, _STA), how would you do these fields?
The field names are part of the ACPI spec;
maybe the interface should have methods for returning these fields.

_ADR - for example, for PCI devices this should be the address of the device on the bus
_SRV - return the packed revision number
_IFT - return the interface number
_STR, _UID, _STA - I'd leave these to the board/firmware to decide how to
name/enumerate/detect presence (i.e. in acpi-build.c)
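
As an illustration only (the type and method names below are hypothetical,
not an existing QEMU API), such an interface could look roughly like:

#define TYPE_ACPI_DEV_INFO "acpi-dev-info"     /* hypothetical name */

typedef struct AcpiDevInfoClass {
    InterfaceClass parent_class;

    uint16_t (*get_ift)(Object *obj);   /* _IFT: interface type       */
    uint16_t (*get_srv)(Object *obj);   /* _SRV: packed spec revision */
    /* _STR/_UID/_STA stay with the board/firmware (acpi-build.c)     */
} AcpiDevInfoClass;

The ACPI core (acpi-build.c) would then iterate over devices implementing
the interface and emit the AML for each one.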

> 
> -corey
> 
> >
> >> So overall I'm happy with the way it was done here.  I and Michael
> >> disagreed on several details, but not enough to fight over it...
> > I won't fight over it either, but it's my opinion how it should be
> > done.
> >
> >> Paolo
> >>
> >>> I'd do the same for SMBIOS entries as well, i.e. drop similar
> >>> universal way for devices to supply SMBIOS info (it's not their
> >>> business) and just call ipmi_encode_one_smbios()  to add ipmi specific
> >>> entry if device is present. Again simplifying series even more.
> >>>
> >>> that roughly would save us 300 lines of not really necessary code
> >>> and keep things consistent with a way we are managing ssdt now.
> 




Re: [Qemu-devel] [PATCH for-2.4 1/2] core: reset handler for bus-less devices

2015-07-14 Thread Cornelia Huck
On Mon, 13 Jul 2015 18:06:45 +0200
Andreas Färber  wrote:

> Found it. So the problem *is* different from what I understood! It's not
> directly attached to /machine by s390x code, but rather instantiated via
> -device by the user.
> 
> Peter C. suggested doing it in realize, which affects all devices.
> 
> The solution would be to instead either do the reset registration in
> qdev-monitor.c, where it's specific to devices that do not have a bus
> and, being on /machine/peripheral or /machine/periph-anon, are not managed
> by a parent, or to add a further check here in realized. Right now I can only
> think of a hot-plug flag...? Not sure about unrealizing in the
> qdev-monitor case, but I think we can ignore that in this case?

This doesn't sound like something we'll want to do during hard freeze,
though.

I'll just fall back to the original patch that registered a reset so
this works for now, and we can think of a more generic solution later.




[Qemu-devel] [PATCH for-2.4 04/12] usbnet: Drop usbnet_can_receive

2015-07-14 Thread Fam Zheng
usbnet_receive already drops the packet if rndis_state is not
RNDIS_DATA_INITIALIZED, and queues the packet if the in buffer is not
available. The only difference is the s->dev.config check, but that is
similar to rndis_state.

Drop usbnet_can_receive and move these checks to usbnet_receive, so that
we don't need to explicitly flush the queue when s->dev.config changes
value.

Signed-off-by: Fam Zheng 
---
 hw/usb/dev-network.c | 20 
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/hw/usb/dev-network.c b/hw/usb/dev-network.c
index 5eeb4c6..7800cee 100644
--- a/hw/usb/dev-network.c
+++ b/hw/usb/dev-network.c
@@ -1268,6 +1268,10 @@ static ssize_t usbnet_receive(NetClientState *nc, const uint8_t *buf, size_t siz
 uint8_t *in_buf = s->in_buf;
 size_t total_size = size;
 
+if (!s->dev.config) {
+return -1;
+}
+
 if (is_rndis(s)) {
 if (s->rndis_state != RNDIS_DATA_INITIALIZED) {
 return -1;
@@ -1309,21 +1313,6 @@ static ssize_t usbnet_receive(NetClientState *nc, const uint8_t *buf, size_t siz
 return size;
 }
 
-static int usbnet_can_receive(NetClientState *nc)
-{
-USBNetState *s = qemu_get_nic_opaque(nc);
-
-if (!s->dev.config) {
-return 0;
-}
-
-if (is_rndis(s) && s->rndis_state != RNDIS_DATA_INITIALIZED) {
-return 1;
-}
-
-return !s->in_len;
-}
-
 static void usbnet_cleanup(NetClientState *nc)
 {
 USBNetState *s = qemu_get_nic_opaque(nc);
@@ -1343,7 +1332,6 @@ static void usb_net_handle_destroy(USBDevice *dev)
 static NetClientInfo net_usbnet_info = {
 .type = NET_CLIENT_OPTIONS_KIND_NIC,
 .size = sizeof(NICState),
-.can_receive = usbnet_can_receive,
 .receive = usbnet_receive,
 .cleanup = usbnet_cleanup,
 };
-- 
2.4.3




[Qemu-devel] [PATCH for-2.4 00/12] hw/net: Fix .can_receive() for NICs

2015-07-14 Thread Fam Zheng
Since a90a742 "tap: Drop tap_can_send", all NICs that return false from
.can_receive() are required to explicitly flush the incoming queue when its
status changes back to true; otherwise the backend will stop processing
more rx packets.

The purpose of this callback is to tell the peer backend (tap, socket, etc.)
"hold on until the guest consumes old data because my buffer is not ready".
More often than not, NICs also do this when the driver has deactivated the
card or disabled rx, causing packets to be unnecessarily queued where they
should actually be dropped.

This series adds the missing qemu_flush_queued_packets calls for all NICs, and
drops the unnecessary conditions in .can_receive(), so that NICs now (a minimal
sketch of the pattern follows the list):

  - return false from .can_receive when guest buffers are busy.
  - call qemu_flush_queued_packets when buffers are available again.
  - return -1 from .receive when rx is not enabled.
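
A minimal sketch of this pattern, using a hypothetical FooState device (the
device type and its helpers are illustrative, not any specific NIC in this
series):

typedef struct FooState FooState;           /* hypothetical device state */
static bool foo_buffer_busy(FooState *s);   /* assumed helper */
static bool foo_rx_enabled(FooState *s);    /* assumed helper */

static int foo_can_receive(NetClientState *nc)
{
    FooState *s = qemu_get_nic_opaque(nc);

    /* Back-pressure the peer only while guest buffers are busy;
     * the peer will not retry until we flush the queue. */
    return !foo_buffer_busy(s);
}

static ssize_t foo_receive(NetClientState *nc, const uint8_t *buf, size_t size)
{
    FooState *s = qemu_get_nic_opaque(nc);

    if (!foo_rx_enabled(s)) {
        return -1;              /* rx disabled: drop, don't queue */
    }
    /* ... copy the packet into guest buffers ... */
    return size;
}

/* Wherever the guest frees a buffer (register write, ring update, ...): */
static void foo_buffer_consumed(FooState *s)
{
    if (foo_can_receive(qemu_get_queue(s->nic))) {
        qemu_flush_queued_packets(qemu_get_queue(s->nic));
    }
}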

e1000, ne2000, rocker and vmxnet3 are not included because they're fixed by
other patches on the list and applied to Stefan's tree.

virtio-net is covered by another thread:

https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07377.html

All other NICs are okay, as they already do the flush on the state transition
points.

Please review.

Fam Zheng (12):
  xgmac: Drop packets with eth_can_rx is false.
  pcnet: Drop pcnet_can_receive
  eepro100: Drop nic_can_receive
  usbnet: Drop usbnet_can_receive
  etsec: Move etsec_can_receive into etsec_receive
  etsec: Flush queue when rx buffer is consumed
  mcf_fec: Drop mcf_fec_can_receive
  milkymist-minimac2: Flush queued packets when link comes up
  mipsnet: Flush queued packets when receiving is enabled
  stellaris_enet: Flush queued packets when read done
  dp8393x: Flush packets when link comes up
  axienet: Flush queued packets when rx is done

 hw/net/dp8393x.c|  8 
 hw/net/eepro100.c   | 11 ---
 hw/net/fsl_etsec/etsec.c| 20 ++--
 hw/net/fsl_etsec/etsec.h|  4 +++-
 hw/net/fsl_etsec/rings.c| 15 +--
 hw/net/lance.c  |  1 -
 hw/net/mcf_fec.c|  9 +
 hw/net/milkymist-minimac2.c | 12 ++--
 hw/net/mipsnet.c|  9 +++--
 hw/net/pcnet-pci.c  |  1 -
 hw/net/pcnet.c  |  9 -
 hw/net/pcnet.h  |  1 -
 hw/net/stellaris_enet.c | 14 +-
 hw/net/xgmac.c  |  8 
 hw/net/xilinx_axienet.c | 22 ++
 hw/usb/dev-network.c| 20 
 16 files changed, 79 insertions(+), 85 deletions(-)

-- 
2.4.3




[Qemu-devel] [PATCH for-2.4 05/12] etsec: Move etsec_can_receive into etsec_receive

2015-07-14 Thread Fam Zheng
When etsec_receive returns 0, the peer would queue the packet as if
.can_receive returned false. Drop etsec_can_receive and let etsec_receive
carry the semantics.

Signed-off-by: Fam Zheng 
---
 hw/net/fsl_etsec/etsec.c | 11 +--
 hw/net/fsl_etsec/etsec.h |  2 +-
 hw/net/fsl_etsec/rings.c | 14 --
 3 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/hw/net/fsl_etsec/etsec.c b/hw/net/fsl_etsec/etsec.c
index c57365f..f5170ae 100644
--- a/hw/net/fsl_etsec/etsec.c
+++ b/hw/net/fsl_etsec/etsec.c
@@ -338,13 +338,6 @@ static void etsec_reset(DeviceState *d)
 MII_SR_100X_FD_CAPS | MII_SR_100T4_CAPS;
 }
 
-static int etsec_can_receive(NetClientState *nc)
-{
-eTSEC *etsec = qemu_get_nic_opaque(nc);
-
-return etsec->rx_buffer_len == 0;
-}
-
 static ssize_t etsec_receive(NetClientState *nc,
  const uint8_t  *buf,
  size_t  size)
@@ -355,8 +348,7 @@ static ssize_t etsec_receive(NetClientState *nc,
 fprintf(stderr, "%s receive size:%d\n", etsec->nic->nc.name, size);
 qemu_hexdump(buf, stderr, "", size);
 #endif
-etsec_rx_ring_write(etsec, buf, size);
-return size;
+return etsec_rx_ring_write(etsec, buf, size);
 }
 
 
@@ -370,7 +362,6 @@ static void etsec_set_link_status(NetClientState *nc)
 static NetClientInfo net_etsec_info = {
 .type = NET_CLIENT_OPTIONS_KIND_NIC,
 .size = sizeof(NICState),
-.can_receive = etsec_can_receive,
 .receive = etsec_receive,
 .link_status_changed = etsec_set_link_status,
 };
diff --git a/hw/net/fsl_etsec/etsec.h b/hw/net/fsl_etsec/etsec.h
index 78d2c57..fc41773 100644
--- a/hw/net/fsl_etsec/etsec.h
+++ b/hw/net/fsl_etsec/etsec.h
@@ -162,7 +162,7 @@ DeviceState *etsec_create(hwaddrbase,
 
 void etsec_walk_tx_ring(eTSEC *etsec, int ring_nbr);
 void etsec_walk_rx_ring(eTSEC *etsec, int ring_nbr);
-void etsec_rx_ring_write(eTSEC *etsec, const uint8_t *buf, size_t size);
+ssize_t etsec_rx_ring_write(eTSEC *etsec, const uint8_t *buf, size_t size);
 
 void etsec_write_miim(eTSEC  *etsec,
   eTSEC_Register *reg,
diff --git a/hw/net/fsl_etsec/rings.c b/hw/net/fsl_etsec/rings.c
index d4a494f..a11280b 100644
--- a/hw/net/fsl_etsec/rings.c
+++ b/hw/net/fsl_etsec/rings.c
@@ -481,40 +481,42 @@ static void rx_init_frame(eTSEC *etsec, const uint8_t *buf, size_t size)
etsec->rx_buffer_len, etsec->rx_padding);
 }
 
-void etsec_rx_ring_write(eTSEC *etsec, const uint8_t *buf, size_t size)
+ssize_t etsec_rx_ring_write(eTSEC *etsec, const uint8_t *buf, size_t size)
 {
 int ring_nbr = 0;   /* Always use ring0 (no filer) */
 
 if (etsec->rx_buffer_len != 0) {
 RING_DEBUG("%s: We can't receive now,"
" a buffer is already in the pipe\n", __func__);
-return;
+return 0;
 }
 
 if (etsec->regs[RSTAT].value & 1 << (23 - ring_nbr)) {
 RING_DEBUG("%s: The ring is halted\n", __func__);
-return;
+return -1;
 }
 
 if (etsec->regs[DMACTRL].value & DMACTRL_GRS) {
 RING_DEBUG("%s: Graceful receive stop\n", __func__);
-return;
+return -1;
 }
 
 if (!(etsec->regs[MACCFG1].value & MACCFG1_RX_EN)) {
 RING_DEBUG("%s: MAC Receive not enabled\n", __func__);
-return;
+return -1;
 }
 
 if ((etsec->regs[RCTRL].value & RCTRL_RSF) && (size < 60)) {
 /* CRC is not in the packet yet, so short frame is below 60 bytes */
 RING_DEBUG("%s: Drop short frame\n", __func__);
-return;
+return -1;
 }
 
 rx_init_frame(etsec, buf, size);
 
 etsec_walk_rx_ring(etsec, ring_nbr);
+
+return size;
 }
 
 void etsec_walk_rx_ring(eTSEC *etsec, int ring_nbr)
-- 
2.4.3




[Qemu-devel] [PATCH for-2.4 01/12] xgmac: Drop packets with eth_can_rx is false.

2015-07-14 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 hw/net/xgmac.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/net/xgmac.c b/hw/net/xgmac.c
index b068f3a..15fb681 100644
--- a/hw/net/xgmac.c
+++ b/hw/net/xgmac.c
@@ -312,10 +312,8 @@ static const MemoryRegionOps enet_mem_ops = {
 .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static int eth_can_rx(NetClientState *nc)
+static int eth_can_rx(XgmacState *s)
 {
-XgmacState *s = qemu_get_nic_opaque(nc);
-
 /* RX enabled?  */
 return s->regs[DMA_CONTROL] & DMA_CONTROL_SR;
 }
@@ -329,6 +327,9 @@ static ssize_t eth_rx(NetClientState *nc, const uint8_t *buf, size_t size)
 struct desc bd;
 ssize_t ret;
 
+if (!eth_can_rx(s)) {
+return -1;
+}
 unicast = ~buf[0] & 0x1;
 broadcast = memcmp(buf, sa_bcast, 6) == 0;
 multicast = !unicast && !broadcast;
@@ -371,7 +372,6 @@ out:
 static NetClientInfo net_xgmac_enet_info = {
 .type = NET_CLIENT_OPTIONS_KIND_NIC,
 .size = sizeof(NICState),
-.can_receive = eth_can_rx,
 .receive = eth_rx,
 };
 
-- 
2.4.3




[Qemu-devel] [PATCH v4 0/5] vGICv3 support

2015-07-14 Thread Pavel Fedin
This series introduces support for GICv3 by KVM. Software emulation is
currently not supported.

Differences from v3:
- Fixed stupid build breakage in patch 0002
- Rebased on top of current master, patch 0003 adjusted according to
  kvm_irqchip_create() changes
- Added assertion against uninitialized kernel_irqchip_type
- Removed kernel_irqchip_type initialization from models which do not
  use KVM vGIC

Differences from v2:
- Removed some unrelated and unnecessary changes from the virt machine
  that accidentally slipped in; some of them caused qemu to crash on ARM32.
- Fixed build for ARM32; vGICv3 code requires definitions which are
  present only in ARM64 kernel

Differences from v1:
- Base class included, taken from the series by Shlomo Pongratz:
  http://lists.nongnu.org/archive/html/qemu-devel/2015-06/msg01512.html
  The code is refactored as little as possible in order to simplify
  further addition of software emulation:
  - Minor fixes in code style and comments, according to old reviews
  - Removed REV_V3 definition because it's currently not used, and it does
not add any meaning to number 3.
  - Removed reserved regions for MBI and ITS (except for 'virt' machine
memory map). These should go to separate classes when implemented.
- Improved commit messages
- vGIC patches restructured
- Use 'gicversion' option instead of virt-v3 machine

Pavel Fedin (4):
  Extract some reusable vGIC code
  Introduce irqchip type specification for KVM
  Initial implementation of vGICv3
  Add gicversion option to virt machine

Shlomo Pongratz (1):
  Implement GIC-500 base class

 hw/arm/vexpress.c  |   1 +
 hw/arm/virt.c  | 138 
 hw/intc/Makefile.objs  |   4 +
 hw/intc/arm_gic_kvm.c  |  84 +++
 hw/intc/arm_gicv3_common.c | 216 +
 hw/intc/arm_gicv3_kvm.c| 192 +
 hw/intc/gicv3_internal.h   | 156 +++
 hw/intc/vgic_common.h  |  43 
 include/hw/arm/fdt.h   |   2 +-
 include/hw/arm/virt.h  |   6 +-
 include/hw/boards.h|   1 +
 include/hw/intc/arm_gicv3_common.h | 113 +++
 include/sysemu/kvm.h   |   3 +-
 kvm-all.c  |   2 +-
 stubs/kvm.c|   2 +-
 target-arm/kvm.c   |  13 ++-
 16 files changed, 903 insertions(+), 73 deletions(-)
 create mode 100644 hw/intc/arm_gicv3_common.c
 create mode 100644 hw/intc/arm_gicv3_kvm.c
 create mode 100644 hw/intc/gicv3_internal.h
 create mode 100644 hw/intc/vgic_common.h
 create mode 100644 include/hw/intc/arm_gicv3_common.h

-- 
2.4.4




[Qemu-devel] [PATCH for-2.4 07/12] mcf_fec: Drop mcf_fec_can_receive

2015-07-14 Thread Fam Zheng
The semantics of .can_receive require us to flush the queue explicitly
when s->rx_enabled becomes true after it has returned 0, but queuing the
packet is not meaningful since the guest hasn't activated the card.
Let's just drop the packet in this case.

Signed-off-by: Fam Zheng 
---
 hw/net/mcf_fec.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/hw/net/mcf_fec.c b/hw/net/mcf_fec.c
index 0255612..0b17c1f 100644
--- a/hw/net/mcf_fec.c
+++ b/hw/net/mcf_fec.c
@@ -351,12 +351,6 @@ static void mcf_fec_write(void *opaque, hwaddr addr,
 mcf_fec_update(s);
 }
 
-static int mcf_fec_can_receive(NetClientState *nc)
-{
-mcf_fec_state *s = qemu_get_nic_opaque(nc);
-return s->rx_enabled;
-}
-
static ssize_t mcf_fec_receive(NetClientState *nc, const uint8_t *buf, size_t size)
 {
 mcf_fec_state *s = qemu_get_nic_opaque(nc);
@@ -370,7 +364,7 @@ static ssize_t mcf_fec_receive(NetClientState *nc, const uint8_t *buf, size_t si
 
 DPRINTF("do_rx len %d\n", size);
 if (!s->rx_enabled) {
-fprintf(stderr, "mcf_fec_receive: Unexpected packet\n");
+return -1;
 }
 /* 4 bytes for the CRC.  */
 size += 4;
@@ -442,7 +436,6 @@ static const MemoryRegionOps mcf_fec_ops = {
 static NetClientInfo net_mcf_fec_info = {
 .type = NET_CLIENT_OPTIONS_KIND_NIC,
 .size = sizeof(NICState),
-.can_receive = mcf_fec_can_receive,
 .receive = mcf_fec_receive,
 };
 
-- 
2.4.3




[Qemu-devel] [PATCH for-2.4 11/12] dp8393x: Flush packets when link comes up

2015-07-14 Thread Fam Zheng
The semantics of the .can_receive callback changed: once it returns 0, the
backend will not try sending again until the queue is explicitly flushed.
Change the device to meet that.

dp8393x_can_receive checks the SONIC_CR_RXEN bit in the SONIC_CR register and
the SONIC_ISR_RBE bit in the SONIC_ISR register, so try flushing the queue
when either bit is updated.

Signed-off-by: Fam Zheng 
---
 hw/net/dp8393x.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
index cd889bc..451ff72 100644
--- a/hw/net/dp8393x.c
+++ b/hw/net/dp8393x.c
@@ -327,9 +327,14 @@ static void dp8393x_do_stop_timer(dp8393xState *s)
 dp8393x_update_wt_regs(s);
 }
 
+static int dp8393x_can_receive(NetClientState *nc);
+
 static void dp8393x_do_receiver_enable(dp8393xState *s)
 {
 s->regs[SONIC_CR] &= ~SONIC_CR_RXDIS;
+if (dp8393x_can_receive(s->nic->ncs)) {
+qemu_flush_queued_packets(qemu_get_queue(s->nic));
+}
 }
 
 static void dp8393x_do_receiver_disable(dp8393xState *s)
@@ -569,6 +574,9 @@ static void dp8393x_write(void *opaque, hwaddr addr, uint64_t data,
 dp8393x_do_read_rra(s);
 }
 dp8393x_update_irq(s);
+if (dp8393x_can_receive(s->nic->ncs)) {
+qemu_flush_queued_packets(qemu_get_queue(s->nic));
+}
 break;
 /* Ignore least significant bit */
 case SONIC_RSA:
-- 
2.4.3




[Qemu-devel] [PATCH for-2.4 02/12] pcnet: Drop pcnet_can_receive

2015-07-14 Thread Fam Zheng
pcnet_receive already checks the conditions and drops packets if they are
false. Due to the new semantics since 6e99c63 ("net/socket: Drop
net_socket_can_send"), having .can_receive return 0 requires us to
explicitly flush the queued packets when the conditions become true, but
queuing the packets when the guest driver is not ready doesn't make much
sense.

Signed-off-by: Fam Zheng 
---
 hw/net/lance.c | 1 -
 hw/net/pcnet-pci.c | 1 -
 hw/net/pcnet.c | 9 -
 hw/net/pcnet.h | 1 -
 4 files changed, 12 deletions(-)

diff --git a/hw/net/lance.c b/hw/net/lance.c
index 4baa016..780b39d 100644
--- a/hw/net/lance.c
+++ b/hw/net/lance.c
@@ -94,7 +94,6 @@ static const MemoryRegionOps lance_mem_ops = {
 static NetClientInfo net_lance_info = {
 .type = NET_CLIENT_OPTIONS_KIND_NIC,
 .size = sizeof(NICState),
-.can_receive = pcnet_can_receive,
 .receive = pcnet_receive,
 .link_status_changed = pcnet_set_link_status,
 };
diff --git a/hw/net/pcnet-pci.c b/hw/net/pcnet-pci.c
index 8305d1b..b4d60b8 100644
--- a/hw/net/pcnet-pci.c
+++ b/hw/net/pcnet-pci.c
@@ -273,7 +273,6 @@ static void pci_pcnet_uninit(PCIDevice *dev)
 static NetClientInfo net_pci_pcnet_info = {
 .type = NET_CLIENT_OPTIONS_KIND_NIC,
 .size = sizeof(NICState),
-.can_receive = pcnet_can_receive,
 .receive = pcnet_receive,
 .link_status_changed = pcnet_set_link_status,
 };
diff --git a/hw/net/pcnet.c b/hw/net/pcnet.c
index 68b9981..3437376 100644
--- a/hw/net/pcnet.c
+++ b/hw/net/pcnet.c
@@ -995,15 +995,6 @@ static int pcnet_tdte_poll(PCNetState *s)
 return !!(CSR_CXST(s) & 0x8000);
 }
 
-int pcnet_can_receive(NetClientState *nc)
-{
-PCNetState *s = qemu_get_nic_opaque(nc);
-if (CSR_STOP(s) || CSR_SPND(s))
-return 0;
-
-return sizeof(s->buffer)-16;
-}
-
 #define MIN_BUF_SIZE 60
 
 ssize_t pcnet_receive(NetClientState *nc, const uint8_t *buf, size_t size_)
diff --git a/hw/net/pcnet.h b/hw/net/pcnet.h
index 79c4c84..dec8de8 100644
--- a/hw/net/pcnet.h
+++ b/hw/net/pcnet.h
@@ -60,7 +60,6 @@ uint32_t pcnet_ioport_readw(void *opaque, uint32_t addr);
 void pcnet_ioport_writel(void *opaque, uint32_t addr, uint32_t val);
 uint32_t pcnet_ioport_readl(void *opaque, uint32_t addr);
 uint32_t pcnet_bcr_readw(PCNetState *s, uint32_t rap);
-int pcnet_can_receive(NetClientState *nc);
 ssize_t pcnet_receive(NetClientState *nc, const uint8_t *buf, size_t size_);
 void pcnet_set_link_status(NetClientState *nc);
 void pcnet_common_init(DeviceState *dev, PCNetState *s, NetClientInfo *info);
-- 
2.4.3




[Qemu-devel] [PATCH v4 1/5] Implement GIC-500 base class

2015-07-14 Thread Pavel Fedin
From: Shlomo Pongratz 

This class is to be used by both software and KVM implementations of GICv3

Signed-off-by: Shlomo Pongratz 
Signed-off-by: Pavel Fedin 
---
 hw/intc/Makefile.objs  |   1 +
 hw/intc/arm_gicv3_common.c | 216 +
 hw/intc/gicv3_internal.h   | 156 +++
 include/hw/intc/arm_gicv3_common.h | 113 +++
 4 files changed, 486 insertions(+)
 create mode 100644 hw/intc/arm_gicv3_common.c
 create mode 100644 hw/intc/gicv3_internal.h
 create mode 100644 include/hw/intc/arm_gicv3_common.h

diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 092d8a8..1317e5a 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -12,6 +12,7 @@ common-obj-$(CONFIG_IOAPIC) += ioapic_common.o
 common-obj-$(CONFIG_ARM_GIC) += arm_gic_common.o
 common-obj-$(CONFIG_ARM_GIC) += arm_gic.o
 common-obj-$(CONFIG_ARM_GIC) += arm_gicv2m.o
+common-obj-$(CONFIG_ARM_GIC) += arm_gicv3_common.o
 common-obj-$(CONFIG_OPENPIC) += openpic.o
 
 obj-$(CONFIG_APIC) += apic.o apic_common.o
diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
new file mode 100644
index 000..86af129
--- /dev/null
+++ b/hw/intc/arm_gicv3_common.c
@@ -0,0 +1,216 @@
+/*
+ * ARM GICv3 support - common bits of emulated and KVM kernel model
+ *
+ * Copyright (c) 2012 Linaro Limited
+ * Copyright (c) 2015 Huawei.
+ * Written by Peter Maydell
+ * Extended to 64 cores by Shlomo Pongratz
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#include "gicv3_internal.h"
+
+static void gicv3_pre_save(void *opaque)
+{
+GICv3State *s = (GICv3State *)opaque;
+ARMGICv3CommonClass *c = ARM_GICV3_COMMON_GET_CLASS(s);
+
+if (c->pre_save) {
+c->pre_save(s);
+}
+}
+
+static int gicv3_post_load(void *opaque, int version_id)
+{
+GICv3State *s = (GICv3State *)opaque;
+ARMGICv3CommonClass *c = ARM_GICV3_COMMON_GET_CLASS(s);
+
+if (c->post_load) {
+c->post_load(s);
+}
+return 0;
+}
+
+static const VMStateDescription vmstate_gicv3_irq_state = {
+.name = "arm_gicv3_irq_state",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_UINT64(pending, gicv3_irq_state),
+VMSTATE_UINT64(active, gicv3_irq_state),
+VMSTATE_UINT64(level, gicv3_irq_state),
+VMSTATE_UINT64(group, gicv3_irq_state),
+VMSTATE_BOOL(edge_trigger, gicv3_irq_state),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static const VMStateDescription vmstate_gicv3_sgi_state = {
+.name = "arm_gicv3_sgi_state",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_UINT64_ARRAY(pending, gicv3_sgi_state, GICV3_NCPU),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static const VMStateDescription vmstate_gicv3 = {
+.name = "arm_gicv3",
+.version_id = 7,
+.minimum_version_id = 7,
+.pre_save = gicv3_pre_save,
+.post_load = gicv3_post_load,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(ctlr, GICv3State),
+VMSTATE_UINT32_ARRAY(cpu_ctlr, GICv3State, GICV3_NCPU),
+VMSTATE_STRUCT_ARRAY(irq_state, GICv3State, GICV3_MAXIRQ, 1,
+ vmstate_gicv3_irq_state, gicv3_irq_state),
+VMSTATE_UINT64_ARRAY(irq_target, GICv3State, GICV3_MAXIRQ),
+VMSTATE_UINT8_2DARRAY(priority1, GICv3State, GICV3_INTERNAL,
+  GICV3_NCPU),
+VMSTATE_UINT8_ARRAY(priority2, GICv3State,
+GICV3_MAXIRQ - GICV3_INTERNAL),
+VMSTATE_UINT16_2DARRAY(last_active, GICv3State, GICV3_MAXIRQ,
+   GICV3_NCPU),
+VMSTATE_STRUCT_ARRAY(sgi_state, GICv3State, GICV3_NR_SGIS, 1,
+ vmstate_gicv3_sgi_state, gicv3_sgi_state),
+VMSTATE_UINT16_ARRAY(priority_mask, GICv3State, GICV3_NCPU),
+VMSTATE_UINT16_ARRAY(running_irq, GICv3State, GICV3_NCPU),
+VMSTATE_UINT16_ARRAY(running_priority, GICv3State, GICV3_NCPU),
+VMSTATE_UINT16_ARRAY(current_pending, GICv3State, GICV3_NCPU),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static void arm_gicv3_common_realize(DeviceState *dev, Error **errp)
+{
+GICv3State *s = ARM_GICV3_COMMON(dev);
+int num_irq = s->num_irq;
+
+if (s->num_cpu > GICV3_NCPU) {
+error_setg(errp,

[Qemu-devel] [PATCH for-2.4 09/12] mipsnet: Flush queued packets when receiving is enabled

2015-07-14 Thread Fam Zheng
Drop .can_receive and move the semantics to mipsnet_receive, by
returning 0.

After 0 is returned, we must flush the queue explicitly to restart it:
Call qemu_flush_queued_packets when s->busy or s->rx_count is being
updated.

Signed-off-by: Fam Zheng 
---
 hw/net/mipsnet.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/net/mipsnet.c b/hw/net/mipsnet.c
index c813e0c..f261011 100644
--- a/hw/net/mipsnet.c
+++ b/hw/net/mipsnet.c
@@ -80,7 +80,7 @@ static ssize_t mipsnet_receive(NetClientState *nc, const uint8_t *buf, size_t si
 
 trace_mipsnet_receive(size);
 if (!mipsnet_can_receive(nc))
-return -1;
+return 0;
 
 s->busy = 1;
 
@@ -134,6 +134,9 @@ static uint64_t mipsnet_ioport_read(void *opaque, hwaddr addr,
 if (s->rx_count) {
 s->rx_count--;
 ret = s->rx_buffer[s->rx_read++];
+if (mipsnet_can_receive(s->nic->ncs)) {
+qemu_flush_queued_packets(qemu_get_queue(s->nic));
+}
 }
 break;
 /* Reads as zero. */
@@ -170,6 +173,9 @@ static void mipsnet_ioport_write(void *opaque, hwaddr addr,
 }
 s->busy = !!s->intctl;
 mipsnet_update_irq(s);
+if (mipsnet_can_receive(s->nic->ncs)) {
+qemu_flush_queued_packets(qemu_get_queue(s->nic));
+}
 break;
 case MIPSNET_TX_DATA_BUFFER:
 s->tx_buffer[s->tx_written++] = val;
@@ -214,7 +220,6 @@ static const VMStateDescription vmstate_mipsnet = {
 static NetClientInfo net_mipsnet_info = {
 .type = NET_CLIENT_OPTIONS_KIND_NIC,
 .size = sizeof(NICState),
-.can_receive = mipsnet_can_receive,
 .receive = mipsnet_receive,
 };
 
-- 
2.4.3




[Qemu-devel] [PATCH v4 3/5] Introduce irqchip type specification for KVM

2015-07-14 Thread Pavel Fedin
This patch introduces a kernel_irqchip_type member in the Machine class,
which is passed to kvm_arch_irqchip_create. It allows machine models to
specify the correct GIC type during KVM capability verification. The
variable is defined as an int in order to be architecture-agnostic for
potential future uses by other architectures.

Signed-off-by: Pavel Fedin 
---
 hw/arm/vexpress.c|  1 +
 hw/arm/virt.c|  3 +++
 include/hw/boards.h  |  1 +
 include/sysemu/kvm.h |  3 ++-
 kvm-all.c|  2 +-
 stubs/kvm.c  |  2 +-
 target-arm/kvm.c | 13 +++--
 7 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
index da21788..0675a00 100644
--- a/hw/arm/vexpress.c
+++ b/hw/arm/vexpress.c
@@ -556,6 +556,7 @@ static void vexpress_common_init(MachineState *machine)
 const hwaddr *map = daughterboard->motherboard_map;
 int i;
 
+machine->kernel_irqchip_type = KVM_DEV_TYPE_ARM_VGIC_V2;
 daughterboard->init(vms, machine->ram_size, machine->cpu_model, pic);
 
 /*
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 4846892..2e7d858 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -945,6 +945,9 @@ static void virt_instance_init(Object *obj)
 "Set on/off to enable/disable the ARM "
 "Security Extensions (TrustZone)",
 NULL);
+
+/* Default GIC type is v2 */
+vms->parent.kernel_irqchip_type = KVM_DEV_TYPE_ARM_VGIC_V2;
 }
 
 static void virt_class_init(ObjectClass *oc, void *data)
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 2aec9cb..37eb767 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -127,6 +127,7 @@ struct MachineState {
 char *accel;
 bool kernel_irqchip_allowed;
 bool kernel_irqchip_required;
+int kernel_irqchip_type;
 int kvm_shadow_mem;
 char *dtb;
 char *dumpdtb;
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 983e99e..8f4d485 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -434,6 +434,7 @@ void kvm_init_irq_routing(KVMState *s);
 /**
  * kvm_arch_irqchip_create:
  * @KVMState: The KVMState pointer
+ * @type: irqchip type, architecture-specific
  *
  * Allow architectures to create an in-kernel irq chip themselves.
  *
@@ -441,7 +442,7 @@ void kvm_init_irq_routing(KVMState *s);
  *0: irq chip was not created
  *  > 0: irq chip was created
  */
-int kvm_arch_irqchip_create(KVMState *s);
+int kvm_arch_irqchip_create(KVMState *s, int type);
 
 /**
  * kvm_set_one_reg - set a register value in KVM via KVM_SET_ONE_REG ioctl
diff --git a/kvm-all.c b/kvm-all.c
index 06e06f2..8df938d 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1395,7 +1395,7 @@ static void kvm_irqchip_create(MachineState *machine, KVMState *s)
 
 /* First probe and see if there's a arch-specific hook to create the
  * in-kernel irqchip for us */
-ret = kvm_arch_irqchip_create(s);
+ret = kvm_arch_irqchip_create(s, machine->kernel_irqchip_type);
 if (ret == 0) {
 ret = kvm_vm_ioctl(s, KVM_CREATE_IRQCHIP);
 }
diff --git a/stubs/kvm.c b/stubs/kvm.c
index e7c60b6..a8505ff 100644
--- a/stubs/kvm.c
+++ b/stubs/kvm.c
@@ -1,7 +1,7 @@
 #include "qemu-common.h"
 #include "sysemu/kvm.h"
 
-int kvm_arch_irqchip_create(KVMState *s)
+int kvm_arch_irqchip_create(KVMState *s, int type)
 {
 return 0;
 }
diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index 548bfd7..3992c1f 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -579,19 +579,28 @@ void kvm_arch_init_irq_routing(KVMState *s)
 {
 }
 
-int kvm_arch_irqchip_create(KVMState *s)
+int kvm_arch_irqchip_create(KVMState *s, int type)
 {
 int ret;
 
+/* Failure here means forgotten initialization of
+ * machine->kernel_irqchip_type in model code
+ */
+assert(type != 0);
+
 /* If we can create the VGIC using the newer device control API, we
  * let the device do this when it initializes itself, otherwise we
  * fall back to the old API */
 
-ret = kvm_create_device(s, KVM_DEV_TYPE_ARM_VGIC_V2, true);
+ret = kvm_create_device(s, type, true);
 if (ret == 0) {
 return 1;
 }
 
+/* Fallback will create VGIC v2 */
+if (type != KVM_DEV_TYPE_ARM_VGIC_V2) {
+return ret;
+}
 return 0;
 }
 
-- 
2.4.4




[Qemu-devel] [PATCH for-2.4 03/12] eepro100: Drop nic_can_receive

2015-07-14 Thread Fam Zheng
nic_receive already checks the conditions and drops packets if they are
false. Due to the new semantics since 6e99c63 ("net/socket: Drop
net_socket_can_send"), having .can_receive return 0 requires us to
explicitly flush the queued packets when the conditions become true, but
queuing the packets when the guest driver is not ready doesn't make much
sense.

Signed-off-by: Fam Zheng 
---
 hw/net/eepro100.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
index c374c1a..60333b7 100644
--- a/hw/net/eepro100.c
+++ b/hw/net/eepro100.c
@@ -1617,16 +1617,6 @@ static const MemoryRegionOps eepro100_ops = {
 .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static int nic_can_receive(NetClientState *nc)
-{
-EEPRO100State *s = qemu_get_nic_opaque(nc);
-TRACE(RXTX, logout("%p\n", s));
-return get_ru_state(s) == ru_ready;
-#if 0
-return !eepro100_buffer_full(s);
-#endif
-}
-
static ssize_t nic_receive(NetClientState *nc, const uint8_t * buf, size_t size)
 {
 /* TODO:
@@ -1844,7 +1834,6 @@ static void pci_nic_uninit(PCIDevice *pci_dev)
 static NetClientInfo net_eepro100_info = {
 .type = NET_CLIENT_OPTIONS_KIND_NIC,
 .size = sizeof(NICState),
-.can_receive = nic_can_receive,
 .receive = nic_receive,
 };
 
-- 
2.4.3




[Qemu-devel] [PATCH v4 2/5] Extract some reusable vGIC code

2015-07-14 Thread Pavel Fedin
These functions are also useful for the vGICv3 implementation. Make them
accessible from within other modules.

Actually kvm_dist_get() and kvm_dist_put() could also be made reusable, but
they would require two extra parameters (s->dev_fd and s->num_cpu) as well as
lots of typecasts of 's' to DeviceState * and back to GICState *. This makes
the code very ugly, so I decided to stop at this point. I also tried an
approach of making a base class for all possible GICs, but it would contain
only three variables (dev_fd, cpu_num and irq_num), and accessing them through
the rest of the code would again be tedious (either ugly casts or a qemu-style
separate object pointer). So I disliked it too.

Signed-off-by: Pavel Fedin 
---
 hw/intc/arm_gic_kvm.c | 84 ---
 hw/intc/vgic_common.h | 43 ++
 2 files changed, 82 insertions(+), 45 deletions(-)
 create mode 100644 hw/intc/vgic_common.h

diff --git a/hw/intc/arm_gic_kvm.c b/hw/intc/arm_gic_kvm.c
index f56bff1..9b6d702 100644
--- a/hw/intc/arm_gic_kvm.c
+++ b/hw/intc/arm_gic_kvm.c
@@ -23,6 +23,7 @@
 #include "sysemu/kvm.h"
 #include "kvm_arm.h"
 #include "gic_internal.h"
+#include "vgic_common.h"
 
 //#define DEBUG_GIC_KVM
 
@@ -52,7 +53,7 @@ typedef struct KVMARMGICClass {
 void (*parent_reset)(DeviceState *dev);
 } KVMARMGICClass;
 
-static void kvm_arm_gic_set_irq(void *opaque, int irq, int level)
+void kvm_arm_gic_set_irq(uint32_t num_irq, int irq, int level)
 {
 /* Meaning of the 'irq' parameter:
  *  [0..N-1] : external interrupts
@@ -63,10 +64,9 @@ static void kvm_arm_gic_set_irq(void *opaque, int irq, int level)
  * has separate fields in the irq number for type,
  * CPU number and interrupt number.
  */
-GICState *s = (GICState *)opaque;
 int kvm_irq, irqtype, cpu;
 
-if (irq < (s->num_irq - GIC_INTERNAL)) {
+if (irq < (num_irq - GIC_INTERNAL)) {
 /* External interrupt. The kernel numbers these like the GIC
  * hardware, with external interrupt IDs starting after the
  * internal ones.
@@ -77,7 +77,7 @@ static void kvm_arm_gic_set_irq(void *opaque, int irq, int level)
 } else {
 /* Internal interrupt: decode into (cpu, interrupt id) */
 irqtype = KVM_ARM_IRQ_TYPE_PPI;
-irq -= (s->num_irq - GIC_INTERNAL);
+irq -= (num_irq - GIC_INTERNAL);
 cpu = irq / GIC_INTERNAL;
 irq %= GIC_INTERNAL;
 }
@@ -87,12 +87,19 @@ static void kvm_arm_gic_set_irq(void *opaque, int irq, int level)
 kvm_set_irq(kvm_state, kvm_irq, !!level);
 }
 
+static void kvm_arm_gicv2_set_irq(void *opaque, int irq, int level)
+{
+GICState *s = (GICState *)opaque;
+
+kvm_arm_gic_set_irq(s->num_irq, irq, level);
+}
+
 static bool kvm_arm_gic_can_save_restore(GICState *s)
 {
 return s->dev_fd >= 0;
 }
 
-static bool kvm_gic_supports_attr(GICState *s, int group, int attrnum)
+bool kvm_gic_supports_attr(int dev_fd, int group, int attrnum)
 {
 struct kvm_device_attr attr = {
 .group = group,
@@ -100,14 +107,14 @@ static bool kvm_gic_supports_attr(GICState *s, int group, int attrnum)
 .flags = 0,
 };
 
-if (s->dev_fd == -1) {
+if (dev_fd == -1) {
 return false;
 }
 
-return kvm_device_ioctl(s->dev_fd, KVM_HAS_DEVICE_ATTR, &attr) == 0;
+return kvm_device_ioctl(dev_fd, KVM_HAS_DEVICE_ATTR, &attr) == 0;
 }
 
-static void kvm_gic_access(GICState *s, int group, int offset,
+void kvm_gic_access(int dev_fd, int group, int offset,
int cpu, uint32_t *val, bool write)
 {
 struct kvm_device_attr attr;
@@ -130,7 +137,7 @@ static void kvm_gic_access(GICState *s, int group, int offset,
 type = KVM_GET_DEVICE_ATTR;
 }
 
-err = kvm_device_ioctl(s->dev_fd, type, &attr);
+err = kvm_device_ioctl(dev_fd, type, &attr);
 if (err < 0) {
 fprintf(stderr, "KVM_{SET/GET}_DEVICE_ATTR failed: %s\n",
 strerror(-err));
@@ -138,20 +145,6 @@ static void kvm_gic_access(GICState *s, int group, int offset,
 }
 }
 
-static void kvm_gicd_access(GICState *s, int offset, int cpu,
-uint32_t *val, bool write)
-{
-kvm_gic_access(s, KVM_DEV_ARM_VGIC_GRP_DIST_REGS,
-   offset, cpu, val, write);
-}
-
-static void kvm_gicc_access(GICState *s, int offset, int cpu,
-uint32_t *val, bool write)
-{
-kvm_gic_access(s, KVM_DEV_ARM_VGIC_GRP_CPU_REGS,
-   offset, cpu, val, write);
-}
-
 #define for_each_irq_reg(_ctr, _max_irq, _field_width) \
 for (_ctr = 0; _ctr < ((_max_irq) / (32 / (_field_width))); _ctr++)
 
@@ -291,7 +284,7 @@ static void kvm_dist_get(GICState *s, uint32_t offset, int width,
 irq = i * regsz;
 cpu = 0;
 while ((cpu < s->num_cpu && irq < GIC_INTERNAL) || cpu == 0) {
-kvm_gicd_access(s, offset, cpu, &reg, false);
+kvm_gicd_access(s->dev_fd, offset, cp

[Qemu-devel] [PATCH for-2.4 06/12] etsec: Flush queue when rx buffer is consumed

2015-07-14 Thread Fam Zheng
The BH will be scheduled when etsec->rx_buffer_len becomes 0, which is the
condition under which packets are queued.

Signed-off-by: Fam Zheng 
---
 hw/net/fsl_etsec/etsec.c | 9 +
 hw/net/fsl_etsec/etsec.h | 2 ++
 hw/net/fsl_etsec/rings.c | 1 +
 3 files changed, 12 insertions(+)

diff --git a/hw/net/fsl_etsec/etsec.c b/hw/net/fsl_etsec/etsec.c
index f5170ae..fcde9b4 100644
--- a/hw/net/fsl_etsec/etsec.c
+++ b/hw/net/fsl_etsec/etsec.c
@@ -366,6 +366,13 @@ static NetClientInfo net_etsec_info = {
 .link_status_changed = etsec_set_link_status,
 };
 
+static void etsec_flush_rx_queue(void *opaque)
+{
+eTSEC *etsec = opaque;
+
+qemu_flush_queued_packets(qemu_get_queue(etsec->nic));
+}
+
 static void etsec_realize(DeviceState *dev, Error **errp)
 {
 eTSEC*etsec = ETSEC_COMMON(dev);
@@ -378,6 +385,8 @@ static void etsec_realize(DeviceState *dev, Error **errp)
 etsec->bh = qemu_bh_new(etsec_timer_hit, etsec);
 etsec->ptimer = ptimer_init(etsec->bh);
 ptimer_set_freq(etsec->ptimer, 100);
+
+etsec->flush_bh = qemu_bh_new(etsec_flush_rx_queue, etsec);
 }
 
 static void etsec_instance_init(Object *obj)
diff --git a/hw/net/fsl_etsec/etsec.h b/hw/net/fsl_etsec/etsec.h
index fc41773..05bb7f8 100644
--- a/hw/net/fsl_etsec/etsec.h
+++ b/hw/net/fsl_etsec/etsec.h
@@ -144,6 +144,8 @@ typedef struct eTSEC {
 QEMUBH *bh;
 struct ptimer_state *ptimer;
 
+QEMUBH *flush_bh;
+
 } eTSEC;
 
 #define TYPE_ETSEC_COMMON "eTSEC"
diff --git a/hw/net/fsl_etsec/rings.c b/hw/net/fsl_etsec/rings.c
index a11280b..7aae93e 100644
--- a/hw/net/fsl_etsec/rings.c
+++ b/hw/net/fsl_etsec/rings.c
@@ -646,6 +646,7 @@ void etsec_walk_rx_ring(eTSEC *etsec, int ring_nbr)
 } else {
 etsec->rx_buffer_len = 0;
 etsec->rx_buffer = NULL;
+qemu_bh_schedule(etsec->flush_bh);
 }
 
 RING_DEBUG("eTSEC End of ring_write: remaining_data:%zu\n", 
remaining_data);
-- 
2.4.3




[Qemu-devel] [PATCH v4 4/5] Initial implementation of vGICv3

2015-07-14 Thread Pavel Fedin
Get/put routines are missing, so live migration is not possible.

Signed-off-by: Pavel Fedin 
---
 hw/intc/Makefile.objs   |   3 +
 hw/intc/arm_gicv3_kvm.c | 192 
 2 files changed, 195 insertions(+)
 create mode 100644 hw/intc/arm_gicv3_kvm.c

diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 1317e5a..e2525a8 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -17,6 +17,9 @@ common-obj-$(CONFIG_OPENPIC) += openpic.o
 
 obj-$(CONFIG_APIC) += apic.o apic_common.o
 obj-$(CONFIG_ARM_GIC_KVM) += arm_gic_kvm.o
+ifeq ($(ARCH), aarch64) # Only 64-bit KVM can use these
+obj-$(CONFIG_ARM_GIC_KVM) += arm_gicv3_kvm.o
+endif
 obj-$(CONFIG_STELLARIS) += armv7m_nvic.o
 obj-$(CONFIG_EXYNOS4) += exynos4210_gic.o exynos4210_combiner.o
 obj-$(CONFIG_GRLIB) += grlib_irqmp.o
diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
new file mode 100644
index 000..4bc8f25
--- /dev/null
+++ b/hw/intc/arm_gicv3_kvm.c
@@ -0,0 +1,192 @@
+/*
+ * ARM Generic Interrupt Controller using KVM in-kernel support
+ *
+ * Copyright (c) 2015 Samsung Electronics Co., Ltd.
+ * Written by Pavel Fedin
+ * Based on vGICv2 code by Peter Maydell
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#include "hw/sysbus.h"
+#include "sysemu/kvm.h"
+#include "kvm_arm.h"
+#include "gicv3_internal.h"
+#include "vgic_common.h"
+
+#ifdef DEBUG_GICV3_KVM
+static const int debug_gicv3_kvm = 1;
+#else
+static const int debug_gicv3_kvm = 0;
+#endif
+
+#define DPRINTF(fmt, ...) do { \
+if (debug_gicv3_kvm) { \
+printf("kvm_gicv3: " fmt , ## __VA_ARGS__); \
+} \
+} while (0)
+
+#define TYPE_KVM_ARM_GICV3 "kvm-arm-gicv3"
+#define KVM_ARM_GICV3(obj) \
+ OBJECT_CHECK(GICv3State, (obj), TYPE_KVM_ARM_GICV3)
+#define KVM_ARM_GICV3_CLASS(klass) \
+ OBJECT_CLASS_CHECK(KVMARMGICv3Class, (klass), TYPE_KVM_ARM_GICV3)
+#define KVM_ARM_GICV3_GET_CLASS(obj) \
+ OBJECT_GET_CLASS(KVMARMGICv3Class, (obj), TYPE_KVM_ARM_GICV3)
+
+typedef struct KVMARMGICv3Class {
+ARMGICv3CommonClass parent_class;
+DeviceRealize parent_realize;
+void (*parent_reset)(DeviceState *dev);
+} KVMARMGICv3Class;
+
+static void kvm_arm_gicv3_set_irq(void *opaque, int irq, int level)
+{
+GICv3State *s = (GICv3State *)opaque;
+
+kvm_arm_gic_set_irq(s->num_irq, irq, level);
+}
+
+static void kvm_arm_gicv3_put(GICv3State *s)
+{
+/* TODO */
+DPRINTF("Cannot put kernel gic state, no kernel interface\n");
+}
+
+static void kvm_arm_gicv3_get(GICv3State *s)
+{
+/* TODO */
+DPRINTF("Cannot get kernel gic state, no kernel interface\n");
+}
+
+static void kvm_arm_gicv3_reset(DeviceState *dev)
+{
+GICv3State *s = ARM_GICV3_COMMON(dev);
+KVMARMGICv3Class *kgc = KVM_ARM_GICV3_GET_CLASS(s);
+
+DPRINTF("Reset\n");
+
+kgc->parent_reset(dev);
+kvm_arm_gicv3_put(s);
+}
+
+static void kvm_arm_gicv3_realize(DeviceState *dev, Error **errp)
+{
+int i;
+GICv3State *s = KVM_ARM_GICV3(dev);
+SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
+KVMARMGICv3Class *kgc = KVM_ARM_GICV3_GET_CLASS(s);
+Error *local_err = NULL;
+int ret;
+
+DPRINTF("kvm_arm_gicv3_realize\n");
+
+kgc->parent_realize(dev, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
+i = s->num_irq - GICV3_INTERNAL;
+/* For the GIC, also expose incoming GPIO lines for PPIs for each CPU.
+ * GPIO array layout is thus:
+ *  [0..N-1] SPIs
+ *  [N..N+31] PPIs for CPU 0
+ *  [N+32..N+63] PPIs for CPU 1
+ *   ...
+ */
+i += (GICV3_INTERNAL * s->num_cpu);
+qdev_init_gpio_in(dev, kvm_arm_gicv3_set_irq, i);
+/* We never use our outbound IRQ lines but provide them so that
+ * we maintain the same interface as the non-KVM GIC.
+ */
+for (i = 0; i < s->num_cpu; i++) {
+sysbus_init_irq(sbd, &s->parent_irq[i]);
+}
+for (i = 0; i < s->num_cpu; i++) {
+sysbus_init_irq(sbd, &s->parent_fiq[i]);
+}
+
+/* Try to create the device via the device control API */
+s->dev_fd = -1;
+ret = kvm_create_device(kvm_state, KVM_DEV_TYPE_ARM_VGIC_V3, false);
+if (ret >= 0) {
+s->dev_fd = ret;
+} else if (ret != -ENODEV && ret != -ENOTSUP) {
+error_setg_errno(errp, -ret, "error creating in-kernel VGIC");
+return;
+}
+

[Qemu-devel] [PATCH for-2.4 10/12] stellaris_enet: Flush queued packets when read done

2015-07-14 Thread Fam Zheng
If s->np reaches 31, the queue will be disabled by the peer when it sees
stellaris_enet_can_receive() return false, until we explicitly flush
it, which notifies the peer. Do this when the guest is done reading all
existing data.

Move the semantics to stellaris_enet_receive, by returning 0 when the
buffer is full, so that new packets will be queued.  In
stellaris_enet_read, flush and restart the queue when the guest has done
reading.

Signed-off-by: Fam Zheng 
---
 hw/net/stellaris_enet.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/hw/net/stellaris_enet.c b/hw/net/stellaris_enet.c
index 278a654..21a4773 100644
--- a/hw/net/stellaris_enet.c
+++ b/hw/net/stellaris_enet.c
@@ -228,8 +228,7 @@ static ssize_t stellaris_enet_receive(NetClientState *nc, 
const uint8_t *buf, si
 if ((s->rctl & SE_RCTL_RXEN) == 0)
 return -1;
 if (s->np >= 31) {
-DPRINTF("Packet dropped\n");
-return -1;
+return 0;
 }
 
 DPRINTF("Received packet len=%zu\n", size);
@@ -260,13 +259,8 @@ static ssize_t stellaris_enet_receive(NetClientState *nc, 
const uint8_t *buf, si
 return size;
 }
 
-static int stellaris_enet_can_receive(NetClientState *nc)
+static int stellaris_enet_can_receive(stellaris_enet_state *s)
 {
-stellaris_enet_state *s = qemu_get_nic_opaque(nc);
-
-if ((s->rctl & SE_RCTL_RXEN) == 0)
-return 1;
-
 return (s->np < 31);
 }
 
@@ -307,6 +301,9 @@ static uint64_t stellaris_enet_read(void *opaque, hwaddr 
offset,
 s->next_packet = 0;
 s->np--;
 DPRINTF("RX done np=%d\n", s->np);
+if (!s->np && stellaris_enet_can_receive(s)) {
+qemu_flush_queued_packets(qemu_get_queue(s->nic));
+}
 }
 return val;
 }
@@ -454,7 +451,6 @@ static void stellaris_enet_reset(stellaris_enet_state *s)
 static NetClientInfo net_stellaris_enet_info = {
 .type = NET_CLIENT_OPTIONS_KIND_NIC,
 .size = sizeof(NICState),
-.can_receive = stellaris_enet_can_receive,
 .receive = stellaris_enet_receive,
 };
 
-- 
2.4.3




[Qemu-devel] [PATCH for-2.4 12/12] axienet: Flush queued packets when rx is done

2015-07-14 Thread Fam Zheng
eth_can_rx checks s->rxsize and returns false if it is non-zero. Because
of the .can_receive semantics change, this will leave the incoming queue
disabled by the peer until it is explicitly flushed. So we should flush it
when s->rxsize becomes zero.

Do it by adding a BH that calls qemu_flush_queued_packets after
decrementing s->rxsize in axienet_eth_rx_notify. A BH is necessary to
avoid a too deep recursive call stack.

The other conditions, "!axienet_rx_resetting(s) &&
axienet_rx_enabled(s)", are OK because enet_write already calls
qemu_flush_queued_packets when the register bits are changed.

Signed-off-by: Fam Zheng 
---
 hw/net/xilinx_axienet.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c
index 9205770..bbc0ea8 100644
--- a/hw/net/xilinx_axienet.c
+++ b/hw/net/xilinx_axienet.c
@@ -28,6 +28,7 @@
 #include "net/checksum.h"
 
 #include "hw/stream.h"
+#include "qemu/main-loop.h"
 
 #define DPHY(x)
 
@@ -401,6 +402,8 @@ struct XilinxAXIEnet {
 
 uint8_t rxapp[CONTROL_PAYLOAD_SIZE];
 uint32_t rxappsize;
+
+QEMUBH *flush_bh;
 };
 
 static void axienet_rx_reset(XilinxAXIEnet *s)
@@ -681,8 +684,15 @@ static int enet_match_addr(const uint8_t *buf, uint32_t 
f0, uint32_t f1)
 return match;
 }
 
+static void xilinx_flush_cb(void *opaque)
+{
+XilinxAXIEnet *s = opaque;
+qemu_flush_queued_packets(qemu_get_queue(s->nic));
+}
+
 static void axienet_eth_rx_notify(void *opaque)
 {
+bool flush = false;
 XilinxAXIEnet *s = XILINX_AXI_ENET(opaque);
 
 while (s->rxappsize && stream_can_push(s->tx_control_dev,
@@ -701,9 +711,13 @@ static void axienet_eth_rx_notify(void *opaque)
 s->rxpos += ret;
 if (!s->rxsize) {
 s->regs[R_IS] |= IS_RX_COMPLETE;
+flush = true;
 }
 }
 enet_update_irq(s);
+if (flush) {
+qemu_bh_schedule(s->flush_bh);
+}
 }
 
 static ssize_t eth_rx(NetClientState *nc, const uint8_t *buf, size_t size)
@@ -967,6 +981,7 @@ static void xilinx_enet_realize(DeviceState *dev, Error 
**errp)
 s->TEMAC.parent = s;
 
 s->rxmem = g_malloc(s->c_rxmem);
+s->flush_bh = qemu_bh_new(xilinx_flush_cb, s);
 return;
 
 xilinx_enet_realize_fail:
@@ -975,6 +990,12 @@ xilinx_enet_realize_fail:
 }
 }
 
+static void xilinx_enet_unrealize(DeviceState *dev, Error **errp)
+{
+XilinxAXIEnet *s = XILINX_AXI_ENET(dev);
+qemu_bh_delete(s->flush_bh);
+}
+
 static void xilinx_enet_init(Object *obj)
 {
 XilinxAXIEnet *s = XILINX_AXI_ENET(obj);
@@ -1020,6 +1041,7 @@ static void xilinx_enet_class_init(ObjectClass *klass, 
void *data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 
 dc->realize = xilinx_enet_realize;
+dc->unrealize = xilinx_enet_unrealize;
 dc->props = xilinx_enet_properties;
 dc->reset = xilinx_axienet_reset;
 }
-- 
2.4.3




Re: [Qemu-devel] Qemu-devel Digest, Vol 148, Issue 574

2015-07-14 Thread Ouyang, Changchun


> -Original Message-
> From: qemu-devel-bounces+changchun.ouyang=intel@nongnu.org
> [mailto:qemu-devel-bounces+changchun.ouyang=intel@nongnu.org]
> On Behalf Of qemu-devel-requ...@nongnu.org
> Sent: Monday, July 13, 2015 11:28 PM
> To: qemu-devel@nongnu.org
> Subject: Qemu-devel Digest, Vol 148, Issue 574
> 
> Send Qemu-devel mailing list submissions to
>   qemu-devel@nongnu.org
> 
> To subscribe or unsubscribe via the World Wide Web, visit
>   https://lists.nongnu.org/mailman/listinfo/qemu-devel
> or, via email, send a message with subject or body 'help' to
>   qemu-devel-requ...@nongnu.org
> 
> You can reach the person managing the list at
>   qemu-devel-ow...@nongnu.org
> 
> When replying, please edit your Subject line so it is more specific than "Re:
> Contents of Qemu-devel digest..."
> 
> 
> Today's Topics:
> 
>1. Re: [PATCH v4 1/5] cpu: Provide vcpu throttling interface
>   (Paolo Bonzini)
>2. Re: [PATCH] net: vhost-user reconnect (Michael S. Tsirkin)
>3. [RFC PATCH qemu v2 0/5] vfio: SPAPR IOMMU v2 (memory
>   preregistration support) (Alexey Kardashevskiy)
> 
> 
> --
> 
> Message: 1
> Date: Mon, 13 Jul 2015 17:14:32 +0200
> From: Paolo Bonzini 
> To: jjhe...@linux.vnet.ibm.com, afaer...@suse.de,
>   amit.s...@redhat.com,   dgilb...@redhat.com,
> borntrae...@de.ibm.com,
>   quint...@redhat.com,qemu-devel@nongnu.org
> Subject: Re: [Qemu-devel] [PATCH v4 1/5] cpu: Provide vcpu throttling
>   interface
> Message-ID: <55a3d5d8.7070...@redhat.com>
> Content-Type: text/plain; charset=windows-1252
> 
> 
> 
> On 13/07/2015 16:43, Jason J. Herne wrote:
> >>>
> >>> +CPU_FOREACH(cpu) {
> >>> +async_run_on_cpu(cpu, cpu_throttle_thread, NULL);
> >>> +}
> >>> +
> >>> +timer_mod(throttle_timer,
> >>> qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL_RT) +
> >>> +   CPU_THROTTLE_TIMESLICE); }
> >>
> >> This could cause callbacks to pile up I think.  David, do you have
> >> any idea how to fix it?
> >
> > I'm not sure how callbacks can pile up here. If the vcpus are running
> > then their thread's will execute the callbacks. If they are not
> > running then the use of QEMU_CLOCK_VIRTUAL_RT will prevent the
> > callbacks from stacking because the timer is not running, right?
> 
> Couldn't the iothread starve the VCPUs?  They need to take the iothread lock
> in order to process the callbacks.
> 
> Paolo
> 
> 
> 
> --
> 
> Message: 2
> Date: Mon, 13 Jul 2015 18:18:51 +0300
> From: "Michael S. Tsirkin" 
> To: "Matus Fabian -X (matfabia - Pantheon Technologies SRO at Cisco)"
>   
> Cc: "pbonz...@redhat.com" ,
>   "jasow...@redhat.com"
>   ,  "qemu-devel@nongnu.org"
>   ,"stefa...@redhat.com"
> ,
>   "Damjan Marion \(damarion\)" 
> Subject: Re: [Qemu-devel] [PATCH] net: vhost-user reconnect
> Message-ID: <20150713181721-mutt-send-email-...@redhat.com>
> Content-Type: text/plain; charset=us-ascii
> 
> On Mon, Jul 13, 2015 at 02:44:37PM +, Matus Fabian -X (matfabia -
> Pantheon Technologies SRO at Cisco) wrote:
> > If userspace switch restarts it will reconnect to unix socket but QEMU
> > will not send any vhost-user information and that basically means that
> > userspace switch restart requires restart of VM.
> > Fix detects that userspace switch is disconnected and notify VM that
> > link status is down. After user space switch is reconnected QEMU send
> > vhost-user information to the userspace switch and notify VM that link is
> up.
> >
> > Signed-off-by: Matus Fabian 
> 
> This only works most of the time.  Reconnect can't be implemented without
> guest changes.
> See http://mid.gmane.org/1434945048-27958-1-git-send-email-
> muk...@igel.co.jp
> 

What's the status of these two patch sets?
Will they be applied in 2.4, or in 2.5?

Do we just need to pick one of the two to apply, or do both need to be applied
to provide good enough reconnection functionality?

Thanks
Changchun


> 
> > ---
> >  hw/net/vhost_net.c | 126
> +
> >  hw/net/virtio-net.c|  62 
> >  hw/virtio/vhost.c  | 137
> +
> >  hw/virtio/virtio.c |  24 
> >  include/hw/virtio/vhost.h  |   8 +++
> >  include/hw/virtio/virtio.h |   1 +
> >  include/net/vhost_net.h|   7 +++
> >  include/sysemu/char.h  |   3 +
> >  net/vhost-user.c   |  24 
> >  qemu-char.c| 101 +
> >  10 files changed, 470 insertions(+), 23 deletions(-)
> >
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index
> > 9bd360b..16a31d7 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -167,6 +167,7 @@ struct vhost_net *vhost_net_init(VhostNetOptions
> *options)
> >  if (r < 0) {
> >  goto fail;
> >  }
> > +net->dev.fea

[Qemu-devel] [PATCH for-2.4 08/12] milkymist-minimac2: Flush queued packets when link comes up

2015-07-14 Thread Fam Zheng
Drop .can_receive and move the semantics into minimac2_rx, by returning
0.

That is, once minimac2_rx returns 0, incoming packets will be queued
until the queue is explicitly flushed. We do this when s->regs[R_STATE0]
or s->regs[R_STATE1] is changed in minimac2_write.

Signed-off-by: Fam Zheng 
---
 hw/net/milkymist-minimac2.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/net/milkymist-minimac2.c b/hw/net/milkymist-minimac2.c
index f06afaa..cd38a06 100644
--- a/hw/net/milkymist-minimac2.c
+++ b/hw/net/milkymist-minimac2.c
@@ -303,8 +303,7 @@ static ssize_t minimac2_rx(NetClientState *nc, const 
uint8_t *buf, size_t size)
 r_state = R_STATE1;
 rx_buf = s->rx1_buf;
 } else {
-trace_milkymist_minimac2_drop_rx_frame(buf);
-return size;
+return 0;
 }
 
 /* assemble frame */
@@ -354,6 +353,7 @@ minimac2_read(void *opaque, hwaddr addr, unsigned size)
 return r;
 }
 
+static int minimac2_can_rx(MilkymistMinimac2State *s);
 static void
 minimac2_write(void *opaque, hwaddr addr, uint64_t value,
unsigned size)
@@ -387,6 +387,9 @@ minimac2_write(void *opaque, hwaddr addr, uint64_t value,
 case R_STATE1:
 s->regs[addr] = value;
 update_rx_interrupt(s);
+if (minimac2_can_rx(s)) {
+qemu_flush_queued_packets(qemu_get_queue(s->nic));
+}
 break;
 case R_SETUP:
 case R_COUNT0:
@@ -411,10 +414,8 @@ static const MemoryRegionOps minimac2_ops = {
 .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
-static int minimac2_can_rx(NetClientState *nc)
+static int minimac2_can_rx(MilkymistMinimac2State *s)
 {
-MilkymistMinimac2State *s = qemu_get_nic_opaque(nc);
-
 if (s->regs[R_STATE0] == STATE_LOADED) {
 return 1;
 }
@@ -445,7 +446,6 @@ static void milkymist_minimac2_reset(DeviceState *d)
 static NetClientInfo net_milkymist_minimac2_info = {
 .type = NET_CLIENT_OPTIONS_KIND_NIC,
 .size = sizeof(NICState),
-.can_receive = minimac2_can_rx,
 .receive = minimac2_rx,
 };
 
-- 
2.4.3




[Qemu-devel] [PATCH v4 5/5] Add gicversion option to virt machine

2015-07-14 Thread Pavel Fedin
Set kernel_irqchip_type according to the value of the option and pass it
around where necessary. Instantiate devices and fdt nodes according
to the choice.

max_cpus for the virt machine is increased to 64. The GICv2 compatibility
check happens inside arm_gic_common_realize().

Signed-off-by: Pavel Fedin 
---
 hw/arm/virt.c | 135 ++
 include/hw/arm/fdt.h  |   2 +-
 include/hw/arm/virt.h |   6 ++-
 3 files changed, 120 insertions(+), 23 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 2e7d858..df69dc8 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -48,6 +48,8 @@
 #include "hw/arm/sysbus-fdt.h"
 #include "hw/platform-bus.h"
 #include "hw/arm/fdt.h"
+#include "linux/kvm.h" /* For KVM_DEV_TYPE_ARM_VGIC_V{2|3} */
+#include "qapi/visitor.h"
 
 /* Number of external interrupt lines to configure the GIC with */
 #define NUM_IRQS 256
@@ -67,6 +69,7 @@ typedef struct VirtBoardInfo {
 uint32_t clock_phandle;
 uint32_t gic_phandle;
 uint32_t v2m_phandle;
+const char *class_name;
 } VirtBoardInfo;
 
 typedef struct {
@@ -80,6 +83,7 @@ typedef struct {
 } VirtMachineState;
 
 #define TYPE_VIRT_MACHINE   "virt"
+#define TYPE_VIRTV3_MACHINE   "virt-v3"
 #define VIRT_MACHINE(obj) \
 OBJECT_CHECK(VirtMachineState, (obj), TYPE_VIRT_MACHINE)
 #define VIRT_MACHINE_GET_CLASS(obj) \
@@ -106,7 +110,12 @@ static const MemMapEntry a15memmap[] = {
 /* GIC distributor and CPU interfaces sit inside the CPU peripheral space 
*/
 [VIRT_GIC_DIST] =   { 0x0800, 0x0001 },
 [VIRT_GIC_CPU] ={ 0x0801, 0x0001 },
-[VIRT_GIC_V2M] ={ 0x0802, 0x1000 },
+[VIRT_GIC_V2M] ={ 0x0802, 0x0001 },
+/* On v3 VIRT_GIC_DIST_MBI and VIRT_ITS_CONTROL take place of
+ * VIRT_GIC_CPU and VIRT_GIC_V2M respectively
+ */
+[VIRT_ITS_TRANSLATION] ={ 0x0803, 0x0001 },
+[VIRT_LPI] ={ 0x0804, 0x0080 },
 [VIRT_UART] =   { 0x0900, 0x1000 },
 [VIRT_RTC] ={ 0x0901, 0x1000 },
 [VIRT_FW_CFG] = { 0x0902, 0x000a },
@@ -256,10 +265,13 @@ static void fdt_add_timer_nodes(const VirtBoardInfo *vbi)
  * they are edge-triggered.
  */
 ARMCPU *armcpu;
+uint32_t max;
 uint32_t irqflags = GIC_FDT_IRQ_FLAGS_EDGE_LO_HI;
 
+/* Argument is 32 bit but 8 bits are reserved for flags */
+max = (vbi->smp_cpus >= 24) ? 24 : vbi->smp_cpus;
 irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START,
- GIC_FDT_IRQ_PPI_CPU_WIDTH, (1 << vbi->smp_cpus) - 1);
+ GIC_FDT_IRQ_PPI_CPU_WIDTH, (1 << max) - 1);
 
 qemu_fdt_add_subnode(vbi->fdt, "/timer");
 
@@ -283,6 +295,18 @@ static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi)
 {
 int cpu;
 
+/*
+ * From Documentation/devicetree/bindings/arm/cpus.txt
+ *  On ARM v8 64-bit systems value should be set to 2,
+ *  that corresponds to the MPIDR_EL1 register size.
+ *  If MPIDR_EL1[63:32] value is equal to 0 on all CPUs
+ *  in the system, #address-cells can be set to 1, since
+ *  MPIDR_EL1[63:32] bits are not used for CPUs
+ *  identification.
+ *
+ *  Now GIC500 doesn't support affinities 2 & 3 so currently
+ *  #address-cells can stay 1 until future GIC
+ */
 qemu_fdt_add_subnode(vbi->fdt, "/cpus");
 qemu_fdt_setprop_cell(vbi->fdt, "/cpus", "#address-cells", 0x1);
 qemu_fdt_setprop_cell(vbi->fdt, "/cpus", "#size-cells", 0x0);
@@ -319,25 +343,36 @@ static void fdt_add_v2m_gic_node(VirtBoardInfo *vbi)
 qemu_fdt_setprop_cell(vbi->fdt, "/intc/v2m", "phandle", vbi->v2m_phandle);
 }
 
-static void fdt_add_gic_node(VirtBoardInfo *vbi)
+static void fdt_add_gic_node(VirtBoardInfo *vbi, int type)
 {
 vbi->gic_phandle = qemu_fdt_alloc_phandle(vbi->fdt);
 qemu_fdt_setprop_cell(vbi->fdt, "/", "interrupt-parent", vbi->gic_phandle);
 
 qemu_fdt_add_subnode(vbi->fdt, "/intc");
-/* 'cortex-a15-gic' means 'GIC v2' */
-qemu_fdt_setprop_string(vbi->fdt, "/intc", "compatible",
-"arm,cortex-a15-gic");
 qemu_fdt_setprop_cell(vbi->fdt, "/intc", "#interrupt-cells", 3);
 qemu_fdt_setprop(vbi->fdt, "/intc", "interrupt-controller", NULL, 0);
-qemu_fdt_setprop_sized_cells(vbi->fdt, "/intc", "reg",
- 2, vbi->memmap[VIRT_GIC_DIST].base,
- 2, vbi->memmap[VIRT_GIC_DIST].size,
- 2, vbi->memmap[VIRT_GIC_CPU].base,
- 2, vbi->memmap[VIRT_GIC_CPU].size);
 qemu_fdt_setprop_cell(vbi->fdt, "/intc", "#address-cells", 0x2);
 qemu_fdt_setprop_cell(vbi->fdt, "/intc", "#size-cells", 0x2);
 qemu_fdt_setprop(vbi->fdt, "/intc", "ranges", NULL, 0);
+if (type == KVM_DEV_TYPE_ARM_VGIC_V3) {
+qemu_fdt_setprop_string(vbi->fdt, "/intc", "com

[Qemu-devel] Patches

2015-07-14 Thread Peter Crosthwaite
Hi Stefan, Peter,

Is patches down? I don't see any patches for the last 3 days and there
are definitely new things on the list:

$ ./patches fetch http://vmsplice.net/~patches/patches.json
Fetched info on 689 patch series
Fetching mboxes...
$ ./patches list | more
Message-id: 1436556159-3002-1-git-send-email...@weilnetz.de
From: Stefan Weil 
Date: 2015-07-10
Tags: for-2.4
   [1/1] tci: Fix regression with INDEX_op_qemu_st_i32, INDEX_op_qemu_st_i64

Message-id: 1436555332-19076-1-git-send-email-dgilb...@redhat.com
From: Dr. David Alan Gilbert (git) 
Date: 2015-07-10
   [1/1] RDMA: Fix error exits (for 2.4)

Message-id: 
8b772bcd39dc96607e9e477398f1303bf1b402ca.1436551380.git.alistair.fran...@xilinx.com
From: Alistair Francis 
Date: 2015-07-10
   [1/1] xlnx-zynqmp: Connect the four OCM banks

Regards,
Peter



[Qemu-devel] [PATCH v2] pci_add_capability: remove duplicate comments

2015-07-14 Thread Chen Hanxiao
Signed-off-by: Chen Hanxiao 
---
 hw/pci/pci.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 442f822..a017614 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2101,12 +2101,10 @@ static void pci_del_option_rom(PCIDevice *pdev)
 }
 
 /*
- * if !offset
- * Reserve space and add capability to the linked list in pci config space
- *
  * if offset = 0,
  * Find and reserve space and add capability to the linked list
- * in pci config space */
+ * in pci config space
+ */
 int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
uint8_t offset, uint8_t size)
 {
-- 
2.1.0




Re: [Qemu-devel] [PATCH v3 05/15] backup: Extract dirty bitmap handling as a separate function

2015-07-14 Thread Stefan Hajnoczi
On Fri, Jul 10, 2015 at 11:46:42AM +0800, Fam Zheng wrote:
> This will be reused by the coming new transactional completion code.
> 
> Signed-off-by: Fam Zheng 
> ---
>  block/backup.c | 26 --
>  1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index 965654d..6e24384 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -210,6 +210,21 @@ static void backup_iostatus_reset(BlockJob *job)
>  
>  bdrv_iostatus_reset(s->target);
>  }
> +static void backup_handle_dirty_bitmap(BackupBlockJob *job, int ret)

"Dirty bitmap" is a general term.  This block job uses dirty bitmaps
besides the sync_bitmap.  I suggest making the name clearer:

backup_cleanup_sync_bitmap()




Re: [Qemu-devel] [PATCH] ide: coverity touchups

2015-07-14 Thread Paolo Bonzini


On 13/07/2015 21:41, John Snow wrote:
>>> >> s->ports should never exceed 32, but coverity doesn't know that.
>>> >> ncq_tfs->sector_count should also never exceed 64K.
>> > 
>> > Personally I tend to mark that kind of thing as a false
>> > positive in the coverity UI and move on...
>> > 
>> > -- PMM
>> > 
> Either way; Paolo pinged me about the NCQ one so I figured I'd just do it.

Yeah, neither is particularly optimal.  Every now and then (a couple
years, say) you do have to re-evaluate false positives, so it's better
to fix them if possible.  On the other hand the code is uglier.

Let's ignore these in Coverity---with a triaging comment there about why
they are false positives.

Paolo



[Qemu-devel] User space vs kernel space instructions distribution.

2015-07-14 Thread Shlomo Pongratz
Hi,

I'm running aarch64 QEMU and I'm counting the number of instructions that
"belong" to user space vs kernel space. My measurements show that 99
percent of the instructions are in kernel space.
I've used both the address of the instructions and the EL just to be sure.
I also added an option to disable block chaining just to make sure that all
the instructions in every TB are counted.
When examining some of the kernel's instructions against the objdump of the
kernel, I've noticed that most of them are in the interrupts/timers area.

Does this make sense?
Has anyone else encountered this phenomenon?

Best regards,

S.P.


Re: [Qemu-devel] [PATCH for-2.4 00/12] hw/net: Fix .can_receive() for NICs

2015-07-14 Thread Wen Congyang
On 07/14/2015 03:53 PM, Fam Zheng wrote:
> Since a90a742 "tap: Drop tap_can_send", all NICs that return false from
> .can_receive() are required to explicitly flush the incoming queue when its
> status changes back to true, otherwise the backend will stop
> processing more rx packets.
> 
> The purpose of this callback is to tell the peer backend (tap, socket, etc.)
> "hold on until the guest consumes old data because my buffer is not ready". More
> often than not NICs also do this when the driver has deactivated the card or
> disabled rx, causing the packets to be unnecessarily queued, where they should
> actually be dropped.
> 
> This series adds such missing qemu_flush_queued_packets calls for all NICs, and
> drops such unnecessary conditions in .can_receive(), so that NICs now:
> 
>   - returns false from .can_receive when guest buffers are busy.
>   - calls qemu_flush_queued_packets when buffers are available again.
>   - returns -1 from .receive when rx is not enabled.
> 
> e1000, ne2000, rocker and vmxnet3 are not included because they're fixed by
> other patches on the list and applied to Stefan's tree.
> 
> virtio-net is covered by another thread:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07377.html

When will the virtio-net patch be accepted? Without this patch, virtio-net + tap
can not work.

Thanks
Wen Congyang

> 
> All other NICs are okay, as they already do the flush on the state transition
> points.
> 
> Please review.
> 
> Fam Zheng (12):
>   xgmac: Drop packets with eth_can_rx is false.
>   pcnet: Drop pcnet_can_receive
>   eepro100: Drop nic_can_receive
>   usbnet: Drop usbnet_can_receive
>   etsec: Move etsec_can_receive into etsec_receive
>   etsec: Flush queue when rx buffer is consumed
>   mcf_fec: Drop mcf_fec_can_receive
>   milkymist-minimac2: Flush queued packets when link comes up
>   mipsnet: Flush queued packets when receiving is enabled
>   stellaris_enet: Flush queued packets when read done
>   dp8393x: Flush packets when link comes up
>   axienet: Flush queued packets when rx is done
> 
>  hw/net/dp8393x.c|  8 
>  hw/net/eepro100.c   | 11 ---
>  hw/net/fsl_etsec/etsec.c| 20 ++--
>  hw/net/fsl_etsec/etsec.h|  4 +++-
>  hw/net/fsl_etsec/rings.c| 15 +--
>  hw/net/lance.c  |  1 -
>  hw/net/mcf_fec.c|  9 +
>  hw/net/milkymist-minimac2.c | 12 ++--
>  hw/net/mipsnet.c|  9 +++--
>  hw/net/pcnet-pci.c  |  1 -
>  hw/net/pcnet.c  |  9 -
>  hw/net/pcnet.h  |  1 -
>  hw/net/stellaris_enet.c | 14 +-
>  hw/net/xgmac.c  |  8 
>  hw/net/xilinx_axienet.c | 22 ++
>  hw/usb/dev-network.c| 20 
>  16 files changed, 79 insertions(+), 85 deletions(-)
> 
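The three rules listed in the cover letter amount to only a little code per
NIC. Below is a minimal sketch for a hypothetical "mynic" device; the state
fields and helper names are made up for illustration and are not taken from
any patch in this series:

#include "net/net.h"

/* Hypothetical device state, for illustration only. */
typedef struct MyNICState {
    NICState *nic;
    bool rx_enabled;
    bool rx_buffer_free;
} MyNICState;

static int mynic_can_receive(NetClientState *nc)
{
    MyNICState *s = qemu_get_nic_opaque(nc);

    /* Report "busy" only while guest buffers really are busy. */
    return s->rx_buffer_free;
}

static ssize_t mynic_receive(NetClientState *nc, const uint8_t *buf, size_t size)
{
    MyNICState *s = qemu_get_nic_opaque(nc);

    if (!s->rx_enabled) {
        return -1;      /* rx disabled: drop the packet */
    }
    if (!s->rx_buffer_free) {
        return 0;       /* buffers busy: let the peer queue the packet */
    }
    /* ... copy the packet into guest buffers here ... */
    return size;
}

/* Called from wherever the guest frees the rx buffer or re-enables rx. */
static void mynic_rx_resources_available(MyNICState *s)
{
    s->rx_buffer_free = true;
    qemu_flush_queued_packets(qemu_get_queue(s->nic));
}

The important part is that the flush is issued from the same place that clears
the "busy" condition, so the peer's queue is restarted instead of staying
blocked.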




Re: [Qemu-devel] [PATCH v3 06/15] blockjob: Add .txn_commit and .txn_abort transaction actions

2015-07-14 Thread Stefan Hajnoczi
On Fri, Jul 10, 2015 at 11:46:43AM +0800, Fam Zheng wrote:
> They will be called if the job is part of a transaction, after all jobs in a
> transaction are completed or cancelled, before calling job->cb().
> 
> Signed-off-by: Fam Zheng 
> ---
>  include/block/blockjob.h | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/include/block/blockjob.h b/include/block/blockjob.h
> index dd9d5e6..a7b7f66 100644
> --- a/include/block/blockjob.h
> +++ b/include/block/blockjob.h
> @@ -50,6 +50,18 @@ typedef struct BlockJobDriver {
>   * manually.
>   */
>  void (*complete)(BlockJob *job, Error **errp);
> +
> +/**
> + * Optional callback for job types that can be in a transaction. Called
> + * when the transaction succeeds.
> + */
> +void (*txn_commit)(BlockJob *job);
> +
> +/**
> + * Optional callback for job types that can be in a transaction. Call 
> when
> + * the transaction fails.
> + */
> +void (*txn_abort)(BlockJob *job);
>  } BlockJobDriver;

The semantics of transactions aren't fully documented.  My understanding
is that you want:
1. either .txn_commit() or .txn_abort() to be called
2. after all jobs complete (due to success, error, or cancellation).

These two points are important for understanding these interfaces.
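A purely illustrative sketch of how a job driver might fill in these hooks;
the names and callback bodies below are assumptions, not code from this
series:

#include "block/blockjob.h"

static void example_txn_commit(BlockJob *job)
{
    /* The whole transaction succeeded: discard any state that was only
     * kept around in case the transaction had to be retried. */
}

static void example_txn_abort(BlockJob *job)
{
    /* Some job in the transaction failed or was cancelled: restore the
     * saved state so a later run can redo the work. */
}

static const BlockJobDriver example_driver = {
    .instance_size = sizeof(BlockJob),
    .job_type      = BLOCK_JOB_TYPE_BACKUP,
    .txn_commit    = example_txn_commit,
    .txn_abort     = example_txn_abort,
};

Read together with Stefan's two points: exactly one of the two hooks runs, and
only after every job in the transaction has completed.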




Re: [Qemu-devel] [v9][PATCH 08/10] xen, gfx passthrough: register a isa bridge

2015-07-14 Thread Chen, Tiejun

diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index 4356af4..703148e 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -51,4 +51,5 @@ void xen_register_framebuffer(struct MemoryRegion *mr);
  #  define HVM_MAX_VCPUS 32
  #endif

+extern void igd_passthrough_isa_bridge_create(PCIBus *bus, uint16_t 
gpu_dev_id);
  #endif /* QEMU_HW_XEN_H */


You are right that I was confused on my previous comment, but it is true
that it shouldn't be part of this patch: it should be part of patch


Right.


7/10. Also the declaration should probably not be in the xen.h header.

Michael, where do you think should go?


Donnu - add a header?



What about the include/hw/i386/pc.h file?

Thanks
Tiejun



Re: [Qemu-devel] User space vs kernel space instructions distribution.

2015-07-14 Thread Peter Maydell
On 14 July 2015 at 09:32, Shlomo Pongratz  wrote:
> Hi,
>
> I'm running aarm64 QEMU and I'm counting the number of instructions which
> "belong" to user space vs kernel space. My measurements shows that 99
> percent of instructions are in kernel space.
> I've used both the address of the instructions and the EL just to be sure. I
> also added an option to disable block chaining just to make sure that all
> the instructions in every TB  is counted.
> When examining some kernel's instructions against the objdump of the kernel
> I've noticed that most of them are in interrupts/timers area.
>
> Does this make sense?
> Did someone also encountered this phenomenon?

Depends entirely on your workload, obviously. If the system only
boots then most instructions will be in kernel space. If the system
is only sitting idle then it'll just be executing the kernel space
idle loop. If you're measuring solely the section of time where
a userspace program is doing real work with the CPU and you're
still seeing a 99% figure then the obvious conclusion would be that
your measurement approach is wrong...

If your measurement instrumentation is intrusive and is significantly
slowing down QEMU then you'll naturally find that the guest spends
more time in timer interrupt handling, because the timer interrupts
come in in real time, and you've just effectively reduced the speed
of your CPU, so it can get less useful work done between timer
interrupts.

-- PMM



Re: [Qemu-devel] [v9][PATCH 08/10] xen, gfx passthrough: register a isa bridge

2015-07-14 Thread Michael S. Tsirkin
On Tue, Jul 14, 2015 at 04:42:44PM +0800, Chen, Tiejun wrote:
> >>>diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
> >>>index 4356af4..703148e 100644
> >>>--- a/include/hw/xen/xen.h
> >>>+++ b/include/hw/xen/xen.h
> >>>@@ -51,4 +51,5 @@ void xen_register_framebuffer(struct MemoryRegion *mr);
> >>>  #  define HVM_MAX_VCPUS 32
> >>>  #endif
> >>>
> >>>+extern void igd_passthrough_isa_bridge_create(PCIBus *bus, uint16_t 
> >>>gpu_dev_id);
> >>>  #endif /* QEMU_HW_XEN_H */
> >>
> >>You are right that I was confused on my previous comment, but it is true
> >>that it shouldn't be part of this patch: it should be part of patch
> 
> Right.
> 
> >>7/10. Also the declaration should probably not be in the xen.h header.
> >>
> >>Michael, where do you think should go?
> >
> >Donnu - add a header?
> >
> 
> What about the include/hw/i386/pc.h file?
> 
> Thanks
> Tiejun

Somewhat ugly, but OK I guess.



[Qemu-devel] [PATCH 0/2] vmxnet3: Fix RX TCP/UDP checksum of partially summed packets

2015-07-14 Thread Shmulik Ladkani
In the case of a csum-offloaded packet, the vmxnet3 implementation always
passes an RxCompDesc with the "Checksum calculated and found correct"
notification to the OS. This emulates the observed ESXi behavior.

Therefore, if a packet has the NEEDS_CSUM bit set, we must calculate and
place a fully computed checksum into the tcp/udp header. Otherwise, the
OS driver will receive a checksum-correct indication but with the actual
tcp/udp checksum field holding just the pseudo-header csum value.

Fix this by converting partially summed packets to be fully checksummed.

Shmulik Ladkani:
  net/vmxnet3: Refactor 'vmxnet_rx_pkt_attach_data'

Dana Rubin:
  net/vmxnet3: Fix RX TCP/UDP checksum on partially summed packets

 hw/net/vmxnet3.c   | 59 ++
 hw/net/vmxnet_rx_pkt.c | 12 +++---
 hw/net/vmxnet_rx_pkt.h | 11 ++
 3 files changed, 79 insertions(+), 3 deletions(-)

-- 
1.9.1




[Qemu-devel] [PATCH 1/2] net/vmxnet3: Refactor 'vmxnet_rx_pkt_attach_data'

2015-07-14 Thread Shmulik Ladkani
Separate RX packet protocol parsing out of 'vmxnet_rx_pkt_attach_data'.

Signed-off-by: Shmulik Ladkani 
---
 hw/net/vmxnet3.c   |  1 +
 hw/net/vmxnet_rx_pkt.c | 12 +---
 hw/net/vmxnet_rx_pkt.h | 11 +++
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 706e060..dd22a0a 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -1897,6 +1897,7 @@ vmxnet3_receive(NetClientState *nc, const uint8_t *buf, 
size_t size)
 get_eth_packet_type(PKT_GET_ETH_HDR(buf)));
 
 if (vmxnet3_rx_filter_may_indicate(s, buf, size)) {
+vmxnet_rx_pkt_set_protocols(s->rx_pkt, buf, size);
 vmxnet_rx_pkt_attach_data(s->rx_pkt, buf, size, s->rx_vlan_stripping);
 bytes_indicated = vmxnet3_indicate_packet(s) ? size : -1;
 if (bytes_indicated < size) {
diff --git a/hw/net/vmxnet_rx_pkt.c b/hw/net/vmxnet_rx_pkt.c
index acbca6a..aa54629 100644
--- a/hw/net/vmxnet_rx_pkt.c
+++ b/hw/net/vmxnet_rx_pkt.c
@@ -92,9 +92,6 @@ void vmxnet_rx_pkt_attach_data(struct VmxnetRxPkt *pkt, const 
void *data,
 }
 
 pkt->tci = tci;
-
-eth_get_protocols(data, len, &pkt->isip4, &pkt->isip6,
-&pkt->isudp, &pkt->istcp);
 }
 
 void vmxnet_rx_pkt_dump(struct VmxnetRxPkt *pkt)
@@ -131,6 +128,15 @@ size_t vmxnet_rx_pkt_get_total_len(struct VmxnetRxPkt *pkt)
 return pkt->tot_len;
 }
 
+void vmxnet_rx_pkt_set_protocols(struct VmxnetRxPkt *pkt, const void *data,
+ size_t len)
+{
+assert(pkt);
+
+eth_get_protocols(data, len, &pkt->isip4, &pkt->isip6,
+&pkt->isudp, &pkt->istcp);
+}
+
 void vmxnet_rx_pkt_get_protocols(struct VmxnetRxPkt *pkt,
  bool *isip4, bool *isip6,
  bool *isudp, bool *istcp)
diff --git a/hw/net/vmxnet_rx_pkt.h b/hw/net/vmxnet_rx_pkt.h
index 5f8352a..a425846 100644
--- a/hw/net/vmxnet_rx_pkt.h
+++ b/hw/net/vmxnet_rx_pkt.h
@@ -55,6 +55,17 @@ void vmxnet_rx_pkt_init(struct VmxnetRxPkt **pkt, bool 
has_virt_hdr);
 size_t vmxnet_rx_pkt_get_total_len(struct VmxnetRxPkt *pkt);
 
 /**
+ * parse and set packet analysis results
+ *
+ * @pkt:packet
+ * @data:   pointer to the data buffer to be parsed
+ * @len:data length
+ *
+ */
+void vmxnet_rx_pkt_set_protocols(struct VmxnetRxPkt *pkt, const void *data,
+ size_t len);
+
+/**
  * fetches packet analysis results
  *
  * @pkt:packet
-- 
1.9.1




[Qemu-devel] [PATCH 2/2] net/vmxnet3: Fix RX TCP/UDP checksum on partially summed packets

2015-07-14 Thread Shmulik Ladkani
From: Dana Rubin 

Convert partially summed packets to be fully checksummed.

In case csum offloaded packet, vmxnet3 implementation always passes an
RxCompDesc with the "Checksum calculated and found correct" notification
to the OS. This emulates the observed ESXi behavior.

Therefore, if packet has the NEEDS_CSUM bit set, we must calculate and
place a fully computed checksum into the tcp/udp header. Otherwise, the
OS driver will receive a checksum-correct indication but with the actual
tcp/udp checksum field having just the pseudo header csum value.

If host OS performs forwarding, it will forward an incorrectly
checksummed packet.

Signed-off-by: Dana Rubin 
Signed-off-by: Shmulik Ladkani 
---
 hw/net/vmxnet3.c | 58 
 1 file changed, 58 insertions(+)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index dd22a0a..59b06b8 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -885,6 +885,63 @@ vmxnet3_get_next_rx_descr(VMXNET3State *s, bool is_head,
 }
 }
 
+/* In case packet was csum offloaded (either NEEDS_CSUM or DATA_VALID),
+ * the implementation always passes an RxCompDesc with a "Checksum
+ * calculated and found correct" to the OS (cnc=0 and tuc=1, see
+ * vmxnet3_rx_update_descr). This emulates the observed ESXi behavior.
+ *
+ * Therefore, if packet has the NEEDS_CSUM set, we must calculate
+ * and place a fully computed checksum into the tcp/udp header.
+ * Otherwise, the OS driver will receive a checksum-correct indication
+ * (CHECKSUM_UNNECESSARY), but with the actual tcp/udp checksum field
+ * having just the pseudo header csum value.
+ *
+ * While this is not a problem if packet is destined for local delivery,
+ * in the case the host OS performs forwarding, it will forward an
+ * incorrectly checksummed packet.
+ */
+static void vmxnet3_rx_need_csum_calculate(struct VmxnetRxPkt *pkt,
+   const void *pkt_data,
+   size_t pkt_len)
+{
+struct virtio_net_hdr *vhdr;
+bool isip4, isip6, istcp, isudp;
+uint8_t *data;
+int len;
+
+if (!vmxnet_rx_pkt_has_virt_hdr(pkt)) {
+return;
+}
+
+vhdr = vmxnet_rx_pkt_get_vhdr(pkt);
+if (!VMXNET_FLAG_IS_SET(vhdr->flags, VIRTIO_NET_HDR_F_NEEDS_CSUM)) {
+return;
+}
+
+vmxnet_rx_pkt_get_protocols(pkt, &isip4, &isip6, &isudp, &istcp);
+if (!(isip4 || isip6) || !(istcp || isudp)) {
+return;
+}
+
+vmxnet3_dump_virt_hdr(vhdr);
+
+/* Validate packet len: csum_start + scum_offset + length of csum field */
+if (pkt_len < (vhdr->csum_start + vhdr->csum_offset + 2)) {
+VMW_PKPRN("packet len:%d < csum_start(%d) + csum_offset(%d) + 2, "
+  "cannot calculate checksum",
+  len, vhdr->csum_start, vhdr->csum_offset);
+return;
+}
+
+data = (uint8_t *)pkt_data + vhdr->csum_start;
+len = pkt_len - vhdr->csum_start;
+/* Put the checksum obtained into the packet */
+stw_be_p(data + vhdr->csum_offset, net_raw_checksum(data, len));
+
+vhdr->flags &= ~VIRTIO_NET_HDR_F_NEEDS_CSUM;
+vhdr->flags |= VIRTIO_NET_HDR_F_DATA_VALID;
+}
+
 static void vmxnet3_rx_update_descr(struct VmxnetRxPkt *pkt,
 struct Vmxnet3_RxCompDesc *rxcd)
 {
@@ -1898,6 +1955,7 @@ vmxnet3_receive(NetClientState *nc, const uint8_t *buf, 
size_t size)
 
 if (vmxnet3_rx_filter_may_indicate(s, buf, size)) {
 vmxnet_rx_pkt_set_protocols(s->rx_pkt, buf, size);
+vmxnet3_rx_need_csum_calculate(s->rx_pkt, buf, size);
 vmxnet_rx_pkt_attach_data(s->rx_pkt, buf, size, s->rx_vlan_stripping);
 bytes_indicated = vmxnet3_indicate_packet(s) ? size : -1;
 if (bytes_indicated < size) {
-- 
1.9.1




[Qemu-devel] [Bug 1474263] [NEW] "Image format was not specified" warning should be suppressed for the vvfat (and probably nbd) driver

2015-07-14 Thread felix
Public bug reported:

Running

qemu -drive file.driver=vvfat,file.dir=.

displays

WARNING: Image format was not specified for 'json:{"dir": ".", "driver": 
"vvfat"}' and probing guessed raw.
 Automatically detecting the format is dangerous for raw images, write 
operations on block 0 will be restricted.
 Specify the 'raw' format explicitly to remove the restrictions.

However, since "images" provided by the vvfat driver are always raw (and
the first sector isn't writeable in any case), this warning is
superfluous and should not be displayed.

A similar warning is displayed for NBD devices; I suspect it should be
also disabled for similar reasons, but I'm not sure if serving non-raw
images is actually a violation of the protocol. qemu-nbd translates them
to raw images, for what it's worth, even if it may be suppressed with -f
raw.

Noticed on 2.3.0; the code that causes this behaviour is still
apparently present in today's git master
(f3a1b5068cea303a55e2a21a97e66d057eaae638). Speaking of versions: you
may want to update the copyright notice that qemu -version displays.

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1474263

Title:
  "Image format was not specified" warning should be suppressed for the
  vvfat (and probably nbd) driver

Status in QEMU:
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1474263/+subscriptions



Re: [Qemu-devel] [PATCH v7 31/42] Page request: Process incoming page request

2015-07-14 Thread Juan Quintela
"Dr. David Alan Gilbert (git)"  wrote:
> From: "Dr. David Alan Gilbert" 
>
> On receiving MIG_RPCOMM_REQ_PAGES look up the address and
> queue the page.
>
> Signed-off-by: Dr. David Alan Gilbert 


>  migrate_fd_cleanup_src_rp(s);
>  
> +/* This queue generally should be empty - but in the case of a failed
> + * migration might have some droppings in.
> + */
> +struct MigrationSrcPageRequest *mspr, *next_mspr;
> +QSIMPLEQ_FOREACH_SAFE(mspr, &s->src_page_requests, next_req, next_mspr) {
> +QSIMPLEQ_REMOVE_HEAD(&s->src_page_requests, next_req);

How nice of QSIMPLEQ.  To remove elements you don't use mspr

> +g_free(mspr);
> +}
> +
>  if (s->file) {
>  trace_migrate_fd_cleanup();
>  qemu_mutex_unlock_iothread();
> @@ -713,6 +729,8 @@ MigrationState *migrate_init(const MigrationParams 
> *params)
>  s->state = MIGRATION_STATUS_SETUP;
>  trace_migrate_set_state(MIGRATION_STATUS_SETUP);
>  
> +QSIMPLEQ_INIT(&s->src_page_requests);
> +
>  s->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>  return s;
>  }
> @@ -976,7 +994,25 @@ static void source_return_path_bad(MigrationState *s)
>  static void migrate_handle_rp_req_pages(MigrationState *ms, const char* 
> rbname,
> ram_addr_t start, ram_addr_t len)
>  {
> +long our_host_ps = getpagesize();
> +
>  trace_migrate_handle_rp_req_pages(rbname, start, len);
> +
> +/*
> + * Since we currently insist on matching page sizes, just sanity check
> + * we're being asked for whole host pages.
> + */
> +if (start & (our_host_ps-1) ||
> +   (len & (our_host_ps-1))) {


I don't know if creating a macro is a good idea?
#define HOST_ALIGN_CHECK(addr)  (addr & (getpagesize()-1))

???

Don't we have a macro for this in QEMU?

> index f7d957e..da3e9ea 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -924,6 +924,69 @@ static int ram_save_compressed_page(QEMUFile *f, 
> RAMBlock *block,
>  }
>  
>  /**
> + * Queue the pages for transmission, e.g. a request from postcopy destination
> + *   ms: MigrationStatus in which the queue is held
> + *   rbname: The RAMBlock the request is for - may be NULL (to mean reuse 
> last)
> + *   start: Offset from the start of the RAMBlock
> + *   len: Length (in bytes) to send
> + *   Return: 0 on success
> + */
> +int ram_save_queue_pages(MigrationState *ms, const char *rbname,
> + ram_addr_t start, ram_addr_t len)
> +{
> +RAMBlock *ramblock;
> +
> +rcu_read_lock();
> +if (!rbname) {
> +/* Reuse last RAMBlock */
> +ramblock = ms->last_req_rb;
> +
> +if (!ramblock) {
> +/*
> + * Shouldn't happen, we can't reuse the last RAMBlock if
> + * it's the 1st request.
> + */
> +error_report("ram_save_queue_pages no previous block");
> +goto err;
> +}
> +} else {
> +ramblock = ram_find_block(rbname);
> +
> +if (!ramblock) {
> +/* We shouldn't be asked for a non-existent RAMBlock */
> +error_report("ram_save_queue_pages no block '%s'", rbname);
> +goto err;
> +}

   Here?

> +}
> +trace_ram_save_queue_pages(ramblock->idstr, start, len);
> +if (start+len > ramblock->used_length) {
> +error_report("%s request overrun start=%zx len=%zx blocklen=%zx",
> + __func__, start, len, ramblock->used_length);
> +goto err;
> +}
> +
> +struct MigrationSrcPageRequest *new_entry =
> +g_malloc0(sizeof(struct MigrationSrcPageRequest));
> +new_entry->rb = ramblock;
> +new_entry->offset = start;
> +new_entry->len = len;
> +ms->last_req_rb = ramblock;

Can we move this line to the else?

> +
> +qemu_mutex_lock(&ms->src_page_req_mutex);
> +memory_region_ref(ramblock->mr);

I haven't looked further into the patch series yet, but I can't see a
memory_region_unref in this patch. Don't we need it?

> +QSIMPLEQ_INSERT_TAIL(&ms->src_page_requests, new_entry, next_req);
> +qemu_mutex_unlock(&ms->src_page_req_mutex);
> +rcu_read_unlock();

Of everything that we have inside the rcu_read_lock(), is there
anything other than the memory_region_ref() that needs RCU?

Would it not be possible to take the memory reference before acquiring the
mutex?

Once here, do we care about calling malloc with the RCU read lock held? Or
could we just call malloc at the beginning of the function and free it on the
err path if it is not needed?

Thanks, Juan.
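For what it's worth, one possible shape for the helper Juan floats above, as a
standalone sketch; the macro name is hypothetical and, as far as I know, not
an existing QEMU define:

#include <unistd.h>
#include <stdint.h>
#include <stdbool.h>

/* Parenthesized so it is safe on arbitrary expressions. */
#define HOST_PAGE_ALIGNED(addr) \
    ((((uintptr_t)(addr)) & ((uintptr_t)getpagesize() - 1)) == 0)

/* Whole-host-page sanity check along the lines of the one in the patch. */
static bool req_pages_sane(uint64_t start, uint64_t len)
{
    return len != 0 && HOST_PAGE_ALIGNED(start) && HOST_PAGE_ALIGNED(len);
}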



[Qemu-devel] [PATCH] hid: clarify hid_keyboard_process_keycode

2015-07-14 Thread Paolo Bonzini
Coverity thinks the fallthroughs are smelly.  They are correct, but
everything else in this function is like "wut?".

Refer explicitly to bits 8 and 9 of hs->kbd.modifiers instead of
shifting right first and using (1 << 7).  Document what the scancode
is when hid_code is 0xe0.  And add plenty of comments.

Signed-off-by: Paolo Bonzini 
---
 hw/input/hid.c | 32 
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/hw/input/hid.c b/hw/input/hid.c
index 6841cb8..21ebd9e 100644
--- a/hw/input/hid.c
+++ b/hw/input/hid.c
@@ -239,7 +239,7 @@ static void hid_keyboard_event(DeviceState *dev, 
QemuConsole *src,
 
 static void hid_keyboard_process_keycode(HIDState *hs)
 {
-uint8_t hid_code, key;
+uint8_t hid_code, index, key;
 int i, keycode, slot;
 
 if (hs->n == 0) {
@@ -249,7 +249,8 @@ static void hid_keyboard_process_keycode(HIDState *hs)
 keycode = hs->kbd.keycodes[slot];
 
 key = keycode & 0x7f;
-hid_code = hid_usage_keys[key | ((hs->kbd.modifiers >> 1) & (1 << 7))];
+index = key | ((hs->kbd.modifiers & (1 << 8)) >> 1);
+hid_code = hid_usage_keys[index];
 hs->kbd.modifiers &= ~(1 << 8);
 
 switch (hid_code) {
@@ -257,18 +258,41 @@ static void hid_keyboard_process_keycode(HIDState *hs)
 return;
 
 case 0xe0:
+assert(key == 0x1d);
 if (hs->kbd.modifiers & (1 << 9)) {
-hs->kbd.modifiers ^= 3 << 8;
+/* The hid_codes for the 0xe1/0x1d scancode sequence are 0xe9/0xe0.
+ * Here we're processing the second hid_code.  By dropping bit 9
+ * and setting bit 8, the scancode after 0x1d will access the
+ * second half of the table.
+ */
+hs->kbd.modifiers ^= (1 << 8) | (1 << 9);
 return;
 }
+/* fall through to process Ctrl_L */
 case 0xe1 ... 0xe7:
+/* Ctrl_L/Ctrl_R, Shift_L/Shift_R, Alt_L/Alt_R, Win_L/Win_R.
+ * Handle releases here, or fall through to process presses.
+ */
 if (keycode & (1 << 7)) {
 hs->kbd.modifiers &= ~(1 << (hid_code & 0x0f));
 return;
 }
-case 0xe8 ... 0xef:
+/* fall through */
+case 0xe8 ... 0xe9:
+/* USB modifiers are just 1 byte long.  Bits 8 and 9 of
+ * hs->kbd.modifiers implement a state machine that detects the
+ * 0xe0 and 0xe1/0x1d sequences.  These bits do not follow the
+ * usual rules where bit 7 marks released keys; they are cleared
+ * elsewhere in the function as the state machine dictates.
+ */
 hs->kbd.modifiers |= 1 << (hid_code & 0x0f);
 return;
+
+case 0xea ... 0xef:
+abort();
+
+default:
+break;
 }
 
 if (keycode & (1 << 7)) {
-- 
2.4.3




Re: [Qemu-devel] [PATCH 2/2] net/vmxnet3: Fix RX TCP/UDP checksum on partially summed packets

2015-07-14 Thread Dmitry Fleytman

> On Jul 14, 2015, at 11:55, Shmulik Ladkani 
>  wrote:
> 
> From: Dana Rubin 
> 
> Convert partially summed packets to be fully checksummed.
> 
> In case csum offloaded packet, vmxnet3 implementation always passes an
> RxCompDesc with the "Checksum calculated and found correct" notification
> to the OS. This emulates the observed ESXi behavior.
> 
> Therefore, if packet has the NEEDS_CSUM bit set, we must calculate and
> place a fully computed checksum into the tcp/udp header. Otherwise, the
> OS driver will receive a checksum-correct indication but with the actual
> tcp/udp checksum field having just the pseudo header csum value.
> 
> If host OS performs forwarding, it will forward an incorrectly
> checksummed packet.

Ack.

> 
> Signed-off-by: Dana Rubin 
> Signed-off-by: Shmulik Ladkani 
> ---
> hw/net/vmxnet3.c | 58 
> 1 file changed, 58 insertions(+)
> 
> diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
> index dd22a0a..59b06b8 100644
> --- a/hw/net/vmxnet3.c
> +++ b/hw/net/vmxnet3.c
> @@ -885,6 +885,63 @@ vmxnet3_get_next_rx_descr(VMXNET3State *s, bool is_head,
> }
> }
> 
> +/* In case packet was csum offloaded (either NEEDS_CSUM or DATA_VALID),
> + * the implementation always passes an RxCompDesc with a "Checksum
> + * calculated and found correct" to the OS (cnc=0 and tuc=1, see
> + * vmxnet3_rx_update_descr). This emulates the observed ESXi behavior.
> + *
> + * Therefore, if packet has the NEEDS_CSUM set, we must calculate
> + * and place a fully computed checksum into the tcp/udp header.
> + * Otherwise, the OS driver will receive a checksum-correct indication
> + * (CHECKSUM_UNNECESSARY), but with the actual tcp/udp checksum field
> + * having just the pseudo header csum value.
> + *
> + * While this is not a problem if packet is destined for local delivery,
> + * in the case the host OS performs forwarding, it will forward an
> + * incorrectly checksummed packet.
> + */
> +static void vmxnet3_rx_need_csum_calculate(struct VmxnetRxPkt *pkt,
> +   const void *pkt_data,
> +   size_t pkt_len)
> +{
> +struct virtio_net_hdr *vhdr;
> +bool isip4, isip6, istcp, isudp;
> +uint8_t *data;
> +int len;
> +
> +if (!vmxnet_rx_pkt_has_virt_hdr(pkt)) {
> +return;
> +}
> +
> +vhdr = vmxnet_rx_pkt_get_vhdr(pkt);
> +if (!VMXNET_FLAG_IS_SET(vhdr->flags, VIRTIO_NET_HDR_F_NEEDS_CSUM)) {
> +return;
> +}
> +
> +vmxnet_rx_pkt_get_protocols(pkt, &isip4, &isip6, &isudp, &istcp);
> +if (!(isip4 || isip6) || !(istcp || isudp)) {
> +return;
> +}
> +
> +vmxnet3_dump_virt_hdr(vhdr);
> +
> +/* Validate packet len: csum_start + scum_offset + length of csum field 
> */
> +if (pkt_len < (vhdr->csum_start + vhdr->csum_offset + 2)) {
> +VMW_PKPRN("packet len:%d < csum_start(%d) + csum_offset(%d) + 2, "
> +  "cannot calculate checksum",
> +  len, vhdr->csum_start, vhdr->csum_offset);
> +return;
> +}
> +
> +data = (uint8_t *)pkt_data + vhdr->csum_start;
> +len = pkt_len - vhdr->csum_start;
> +/* Put the checksum obtained into the packet */
> +stw_be_p(data + vhdr->csum_offset, net_raw_checksum(data, len));
> +
> +vhdr->flags &= ~VIRTIO_NET_HDR_F_NEEDS_CSUM;
> +vhdr->flags |= VIRTIO_NET_HDR_F_DATA_VALID;
> +}
> +
> static void vmxnet3_rx_update_descr(struct VmxnetRxPkt *pkt,
> struct Vmxnet3_RxCompDesc *rxcd)
> {
> @@ -1898,6 +1955,7 @@ vmxnet3_receive(NetClientState *nc, const uint8_t *buf, 
> size_t size)
> 
> if (vmxnet3_rx_filter_may_indicate(s, buf, size)) {
> vmxnet_rx_pkt_set_protocols(s->rx_pkt, buf, size);
> +vmxnet3_rx_need_csum_calculate(s->rx_pkt, buf, size);
> vmxnet_rx_pkt_attach_data(s->rx_pkt, buf, size, s->rx_vlan_stripping);
> bytes_indicated = vmxnet3_indicate_packet(s) ? size : -1;
> if (bytes_indicated < size) {
> -- 
> 1.9.1
> 



Re: [Qemu-devel] [PATCH 1/2] net/vmxnet3: Refactor 'vmxnet_rx_pkt_attach_data'

2015-07-14 Thread Dmitry Fleytman


> On Jul 14, 2015, at 11:55, Shmulik Ladkani 
>  wrote:
> 
> Separate RX packet protocol parsing out of 'vmxnet_rx_pkt_attach_data'.
> 

Ack

> Signed-off-by: Shmulik Ladkani 
> ---
> hw/net/vmxnet3.c   |  1 +
> hw/net/vmxnet_rx_pkt.c | 12 +---
> hw/net/vmxnet_rx_pkt.h | 11 +++
> 3 files changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
> index 706e060..dd22a0a 100644
> --- a/hw/net/vmxnet3.c
> +++ b/hw/net/vmxnet3.c
> @@ -1897,6 +1897,7 @@ vmxnet3_receive(NetClientState *nc, const uint8_t *buf, 
> size_t size)
> get_eth_packet_type(PKT_GET_ETH_HDR(buf)));
> 
> if (vmxnet3_rx_filter_may_indicate(s, buf, size)) {
> +vmxnet_rx_pkt_set_protocols(s->rx_pkt, buf, size);
> vmxnet_rx_pkt_attach_data(s->rx_pkt, buf, size, s->rx_vlan_stripping);
> bytes_indicated = vmxnet3_indicate_packet(s) ? size : -1;
> if (bytes_indicated < size) {
> diff --git a/hw/net/vmxnet_rx_pkt.c b/hw/net/vmxnet_rx_pkt.c
> index acbca6a..aa54629 100644
> --- a/hw/net/vmxnet_rx_pkt.c
> +++ b/hw/net/vmxnet_rx_pkt.c
> @@ -92,9 +92,6 @@ void vmxnet_rx_pkt_attach_data(struct VmxnetRxPkt *pkt, 
> const void *data,
> }
> 
> pkt->tci = tci;
> -
> -eth_get_protocols(data, len, &pkt->isip4, &pkt->isip6,
> -&pkt->isudp, &pkt->istcp);
> }
> 
> void vmxnet_rx_pkt_dump(struct VmxnetRxPkt *pkt)
> @@ -131,6 +128,15 @@ size_t vmxnet_rx_pkt_get_total_len(struct VmxnetRxPkt 
> *pkt)
> return pkt->tot_len;
> }
> 
> +void vmxnet_rx_pkt_set_protocols(struct VmxnetRxPkt *pkt, const void *data,
> + size_t len)
> +{
> +assert(pkt);
> +
> +eth_get_protocols(data, len, &pkt->isip4, &pkt->isip6,
> +&pkt->isudp, &pkt->istcp);
> +}
> +
> void vmxnet_rx_pkt_get_protocols(struct VmxnetRxPkt *pkt,
>  bool *isip4, bool *isip6,
>  bool *isudp, bool *istcp)
> diff --git a/hw/net/vmxnet_rx_pkt.h b/hw/net/vmxnet_rx_pkt.h
> index 5f8352a..a425846 100644
> --- a/hw/net/vmxnet_rx_pkt.h
> +++ b/hw/net/vmxnet_rx_pkt.h
> @@ -55,6 +55,17 @@ void vmxnet_rx_pkt_init(struct VmxnetRxPkt **pkt, bool 
> has_virt_hdr);
> size_t vmxnet_rx_pkt_get_total_len(struct VmxnetRxPkt *pkt);
> 
> /**
> + * parse and set packet analysis results
> + *
> + * @pkt:packet
> + * @data:   pointer to the data buffer to be parsed
> + * @len:data length
> + *
> + */
> +void vmxnet_rx_pkt_set_protocols(struct VmxnetRxPkt *pkt, const void *data,
> + size_t len);
> +
> +/**
>  * fetches packet analysis results
>  *
>  * @pkt:packet
> -- 
> 1.9.1
> 



Re: [Qemu-devel] [PATCH v3 06/15] blockjob: Add .txn_commit and .txn_abort transaction actions

2015-07-14 Thread Fam Zheng
On Tue, 07/14 09:35, Stefan Hajnoczi wrote:
> On Fri, Jul 10, 2015 at 11:46:43AM +0800, Fam Zheng wrote:
> > They will be called if the job is part of a transaction, after all jobs in a
> > transaction are completed or cancelled, before calling job->cb().
> > 
> > Signed-off-by: Fam Zheng 
> > ---
> >  include/block/blockjob.h | 12 
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/include/block/blockjob.h b/include/block/blockjob.h
> > index dd9d5e6..a7b7f66 100644
> > --- a/include/block/blockjob.h
> > +++ b/include/block/blockjob.h
> > @@ -50,6 +50,18 @@ typedef struct BlockJobDriver {
> >   * manually.
> >   */
> >  void (*complete)(BlockJob *job, Error **errp);
> > +
> > +/**
> > + * Optional callback for job types that can be in a transaction. Called
> > + * when the transaction succeeds.
> > + */
> > +void (*txn_commit)(BlockJob *job);
> > +
> > +/**
> > + * Optional callback for job types that can be in a transaction. Call 
> > when
> > + * the transaction fails.
> > + */
> > +void (*txn_abort)(BlockJob *job);
> >  } BlockJobDriver;
> 
> The semantics of transactions aren't fully documented.  My understanding
> is that you want:
> 1. either .txn_commit() or .txn_abort() to be called
> 2. after all jobs complete (due to success, error, or cancellation).
> 
> These two points are important for understanding these interfaces.

Yes, I'll add documentation for these.

Fam




Re: [Qemu-devel] [PATCH for-2.4 05/12] etsec: Move etsec_can_receive into etsec_receive

2015-07-14 Thread Jason Wang


On 07/14/2015 03:53 PM, Fam Zheng wrote:
> When etsec_receive returns 0, the peer would queue the packet as if
> .can_receive had returned false. Drop etsec_can_receive and let etsec_receive
> carry the semantics.
>
> Signed-off-by: Fam Zheng 
> ---
>  hw/net/fsl_etsec/etsec.c | 11 +--
>  hw/net/fsl_etsec/etsec.h |  2 +-
>  hw/net/fsl_etsec/rings.c | 14 --
>  3 files changed, 10 insertions(+), 17 deletions(-)
>
> diff --git a/hw/net/fsl_etsec/etsec.c b/hw/net/fsl_etsec/etsec.c
> index c57365f..f5170ae 100644
> --- a/hw/net/fsl_etsec/etsec.c
> +++ b/hw/net/fsl_etsec/etsec.c
> @@ -338,13 +338,6 @@ static void etsec_reset(DeviceState *d)
>  MII_SR_100X_FD_CAPS | MII_SR_100T4_CAPS;
>  }
>  
> -static int etsec_can_receive(NetClientState *nc)
> -{
> -eTSEC *etsec = qemu_get_nic_opaque(nc);
> -
> -return etsec->rx_buffer_len == 0;
> -}
> -
>  static ssize_t etsec_receive(NetClientState *nc,
>   const uint8_t  *buf,
>   size_t  size)
> @@ -355,8 +348,7 @@ static ssize_t etsec_receive(NetClientState *nc,
>  fprintf(stderr, "%s receive size:%d\n", etsec->nic->nc.name, size);
>  qemu_hexdump(buf, stderr, "", size);
>  #endif
> -etsec_rx_ring_write(etsec, buf, size);
> -return size;
> +return etsec_rx_ring_write(etsec, buf, size);
>  }
>  
>  
> @@ -370,7 +362,6 @@ static void etsec_set_link_status(NetClientState *nc)
>  static NetClientInfo net_etsec_info = {
>  .type = NET_CLIENT_OPTIONS_KIND_NIC,
>  .size = sizeof(NICState),
> -.can_receive = etsec_can_receive,
>  .receive = etsec_receive,
>  .link_status_changed = etsec_set_link_status,
>  };
> diff --git a/hw/net/fsl_etsec/etsec.h b/hw/net/fsl_etsec/etsec.h
> index 78d2c57..fc41773 100644
> --- a/hw/net/fsl_etsec/etsec.h
> +++ b/hw/net/fsl_etsec/etsec.h
> @@ -162,7 +162,7 @@ DeviceState *etsec_create(hwaddrbase,
>  
>  void etsec_walk_tx_ring(eTSEC *etsec, int ring_nbr);
>  void etsec_walk_rx_ring(eTSEC *etsec, int ring_nbr);
> -void etsec_rx_ring_write(eTSEC *etsec, const uint8_t *buf, size_t size);
> +ssize_t etsec_rx_ring_write(eTSEC *etsec, const uint8_t *buf, size_t size);
>  
>  void etsec_write_miim(eTSEC  *etsec,
>eTSEC_Register *reg,
> diff --git a/hw/net/fsl_etsec/rings.c b/hw/net/fsl_etsec/rings.c
> index d4a494f..a11280b 100644
> --- a/hw/net/fsl_etsec/rings.c
> +++ b/hw/net/fsl_etsec/rings.c
> @@ -481,40 +481,42 @@ static void rx_init_frame(eTSEC *etsec, const uint8_t 
> *buf, size_t size)
> etsec->rx_buffer_len, etsec->rx_padding);
>  }
>  
> -void etsec_rx_ring_write(eTSEC *etsec, const uint8_t *buf, size_t size)
> +ssize_t etsec_rx_ring_write(eTSEC *etsec, const uint8_t *buf, size_t size)
>  {
>  int ring_nbr = 0;   /* Always use ring0 (no filer) */
>  
>  if (etsec->rx_buffer_len != 0) {
>  RING_DEBUG("%s: We can't receive now,"
> " a buffer is already in the pipe\n", __func__);
> -return;
> +return 0;

Is the queue flushed when the rx buffer becomes available again?

>  }
>  
>  if (etsec->regs[RSTAT].value & 1 << (23 - ring_nbr)) {
>  RING_DEBUG("%s: The ring is halted\n", __func__);
> -return;
> +return -1;
>  }
>  
>  if (etsec->regs[DMACTRL].value & DMACTRL_GRS) {
>  RING_DEBUG("%s: Graceful receive stop\n", __func__);
> -return;
> +return -1;
>  }
>  
>  if (!(etsec->regs[MACCFG1].value & MACCFG1_RX_EN)) {
>  RING_DEBUG("%s: MAC Receive not enabled\n", __func__);
> -return;
> +return -1;
>  }
>  
>  if ((etsec->regs[RCTRL].value & RCTRL_RSF) && (size < 60)) {
>  /* CRC is not in the packet yet, so short frame is below 60 bytes */
>  RING_DEBUG("%s: Drop short frame\n", __func__);
> -return;
> +return -1;
>  }
>  
>  rx_init_frame(etsec, buf, size);
>  
>  etsec_walk_rx_ring(etsec, ring_nbr);
> +
> +return size;
>  }
>  
>  void etsec_walk_rx_ring(eTSEC *etsec, int ring_nbr)




Re: [Qemu-devel] [PATCH for-2.4 06/12] etsec: Flush queue when rx buffer is consumed

2015-07-14 Thread Jason Wang


On 07/14/2015 03:53 PM, Fam Zheng wrote:
> The BH will be scheduled when etsec->rx_buffer_len becomes 0, which
> is the condition for queuing.
>
> Signed-off-by: Fam Zheng 

I suggest to squash this into patch 05/12 to unbreak bisection.

And can we do this without a bh? Otherwise, we may need to stop and
restart the bh during vm stop and start?

> ---
>  hw/net/fsl_etsec/etsec.c | 9 +
>  hw/net/fsl_etsec/etsec.h | 2 ++
>  hw/net/fsl_etsec/rings.c | 1 +
>  3 files changed, 12 insertions(+)
>
> diff --git a/hw/net/fsl_etsec/etsec.c b/hw/net/fsl_etsec/etsec.c
> index f5170ae..fcde9b4 100644
> --- a/hw/net/fsl_etsec/etsec.c
> +++ b/hw/net/fsl_etsec/etsec.c
> @@ -366,6 +366,13 @@ static NetClientInfo net_etsec_info = {
>  .link_status_changed = etsec_set_link_status,
>  };
>  
> +static void etsec_flush_rx_queue(void *opaque)
> +{
> +eTSEC *etsec = opaque;
> +
> +qemu_flush_queued_packets(qemu_get_queue(etsec->nic));
> +}
> +
>  static void etsec_realize(DeviceState *dev, Error **errp)
>  {
>  eTSEC*etsec = ETSEC_COMMON(dev);
> @@ -378,6 +385,8 @@ static void etsec_realize(DeviceState *dev, Error **errp)
>  etsec->bh = qemu_bh_new(etsec_timer_hit, etsec);
>  etsec->ptimer = ptimer_init(etsec->bh);
>  ptimer_set_freq(etsec->ptimer, 100);
> +
> +etsec->flush_bh = qemu_bh_new(etsec_flush_rx_queue, etsec);
>  }
>  
>  static void etsec_instance_init(Object *obj)
> diff --git a/hw/net/fsl_etsec/etsec.h b/hw/net/fsl_etsec/etsec.h
> index fc41773..05bb7f8 100644
> --- a/hw/net/fsl_etsec/etsec.h
> +++ b/hw/net/fsl_etsec/etsec.h
> @@ -144,6 +144,8 @@ typedef struct eTSEC {
>  QEMUBH *bh;
>  struct ptimer_state *ptimer;
>  
> +QEMUBH *flush_bh;
> +
>  } eTSEC;
>  
>  #define TYPE_ETSEC_COMMON "eTSEC"
> diff --git a/hw/net/fsl_etsec/rings.c b/hw/net/fsl_etsec/rings.c
> index a11280b..7aae93e 100644
> --- a/hw/net/fsl_etsec/rings.c
> +++ b/hw/net/fsl_etsec/rings.c
> @@ -646,6 +646,7 @@ void etsec_walk_rx_ring(eTSEC *etsec, int ring_nbr)
>  } else {
>  etsec->rx_buffer_len = 0;
>  etsec->rx_buffer = NULL;
> +qemu_bh_schedule(etsec->flush_bh);
>  }
>  
>  RING_DEBUG("eTSEC End of ring_write: remaining_data:%zu\n", 
> remaining_data);




[Qemu-devel] [PATCH v4 2/2] QOM: object_property_add() performance improvement

2015-07-14 Thread Pavel Fedin
Avoid the repetitive lookup of every property in the array, starting from 0,
by adding one more property which caches the last used index. Every time the
array is expanded the starting index is picked up from this cache.

The property is a uint32_t and its name is the name of the array plus '#'
('name#'). It has a getter function so that it can be inspected from the
monitor.

Another optimization avoids reallocating 'full_name' on every iteration.
Instead, the name_no_array buffer is extended and reused.

Signed-off-by: Pavel Fedin 
Reviewed-by: Peter Crosthwaite 
---
 qom/object.c | 51 ---
 1 file changed, 40 insertions(+), 11 deletions(-)

diff --git a/qom/object.c b/qom/object.c
index ba63777..5820df2 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -10,6 +10,8 @@
  * See the COPYING file in the top-level directory.
  */
 
+#include 
+
 #include "qom/object.h"
 #include "qom/object_interfaces.h"
 #include "qemu-common.h"
@@ -862,6 +864,14 @@ object_property_add_single(Object *obj, const char *name, 
const char *type,
 return prop;
 }
 
+static void
+property_get_uint32_opaque(Object *obj, Visitor *v, void *opaque,
+   const char *name, Error **errp)
+{
+uint32_t value = (uintptr_t)opaque;
+visit_type_uint32(v, &value, name, errp);
+}
+
 ObjectProperty *
 object_property_add(Object *obj, const char *name, const char *type,
 ObjectPropertyAccessor *get,
@@ -871,27 +881,46 @@ object_property_add(Object *obj, const char *name, const 
char *type,
 {
 size_t name_len = strlen(name);
 char *name_no_array;
-ObjectProperty *ret;
-int i;
+ObjectProperty *ret, *count;
+uintptr_t i;
 
 if (name_len < 3 || memcmp(&name[name_len - 3], "[*]", 4)) {
 return object_property_add_single(obj, name, type,
   get, set, release, opaque, errp);
 }
 
-name_no_array = g_strdup(name);
-
-name_no_array[name_len - 3] = '\0';
-for (i = 0; ; ++i) {
-char *full_name = g_strdup_printf("%s[%d]", name_no_array, i);
-
-ret = object_property_add(obj, full_name, type, get, set,
-  release, opaque, NULL);
-g_free(full_name);
+/* 20 characters for maximum possible uintptr_t (64-bit) */
+name_no_array = g_malloc(name_len + 20);
+name_len -= 3;
+memcpy(name_no_array, name, name_len);
+
+name_no_array[name_len] = '#';
+name_no_array[name_len + 1] = '\0';
+count = object_property_find(obj, name_no_array, NULL);
+if (count == NULL) {
+/* This is very similar to object_property_add_uint32_ptr(), but:
+ * - Returns pointer
+ * - Will not recurse here, avoiding one condition check
+ * - Allows us to use 'opaque' pointer itself as a storage, because
+ *   we want to store only a single integer which should never be
+ *   modified from outside.
+ */
+count = object_property_add_single(obj, name_no_array, "uint32",
+   property_get_uint32_opaque, NULL,
+   NULL, NULL, &error_abort);
+}
+
+for (i = (uintptr_t)count->opaque; ; ++i) {
+g_sprintf(&name_no_array[name_len], "[%zu]", i);
+
+ret = object_property_add_single(obj, name_no_array, type, get, set,
+ release, opaque, NULL);
 if (ret) {
 break;
 }
 }
+
+count->opaque = (void *)i;
 g_free(name_no_array);
 return ret;
 }
-- 
1.9.5.msysgit.0




[Qemu-devel] [PATCH v4 1/2] QOM: Introduce object_property_add_single()

2015-07-14 Thread Pavel Fedin
Refactoring of object_property_add() before the performance optimization of
the array expansion code.

Signed-off-by: Pavel Fedin 
Reviewed-by: Peter Crosthwaite 
---
 qom/object.c | 67 
 1 file changed, 40 insertions(+), 27 deletions(-)

diff --git a/qom/object.c b/qom/object.c
index eea8edf..ba63777 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -830,35 +830,14 @@ void object_unref(Object *obj)
 }
 }
 
-ObjectProperty *
-object_property_add(Object *obj, const char *name, const char *type,
-ObjectPropertyAccessor *get,
-ObjectPropertyAccessor *set,
-ObjectPropertyRelease *release,
-void *opaque, Error **errp)
+static ObjectProperty *
+object_property_add_single(Object *obj, const char *name, const char *type,
+   ObjectPropertyAccessor *get,
+   ObjectPropertyAccessor *set,
+   ObjectPropertyRelease *release,
+   void *opaque, Error **errp)
 {
 ObjectProperty *prop;
-size_t name_len = strlen(name);
-
-if (name_len >= 3 && !memcmp(name + name_len - 3, "[*]", 4)) {
-int i;
-ObjectProperty *ret;
-char *name_no_array = g_strdup(name);
-
-name_no_array[name_len - 3] = '\0';
-for (i = 0; ; ++i) {
-char *full_name = g_strdup_printf("%s[%d]", name_no_array, i);
-
-ret = object_property_add(obj, full_name, type, get, set,
-  release, opaque, NULL);
-g_free(full_name);
-if (ret) {
-break;
-}
-}
-g_free(name_no_array);
-return ret;
-}
 
 QTAILQ_FOREACH(prop, &obj->properties, node) {
 if (strcmp(prop->name, name) == 0) {
@@ -883,6 +862,40 @@ object_property_add(Object *obj, const char *name, const 
char *type,
 return prop;
 }
 
+ObjectProperty *
+object_property_add(Object *obj, const char *name, const char *type,
+ObjectPropertyAccessor *get,
+ObjectPropertyAccessor *set,
+ObjectPropertyRelease *release,
+void *opaque, Error **errp)
+{
+size_t name_len = strlen(name);
+char *name_no_array;
+ObjectProperty *ret;
+int i;
+
+if (name_len < 3 || memcmp(&name[name_len - 3], "[*]", 4)) {
+return object_property_add_single(obj, name, type,
+  get, set, release, opaque, errp);
+}
+
+name_no_array = g_strdup(name);
+
+name_no_array[name_len - 3] = '\0';
+for (i = 0; ; ++i) {
+char *full_name = g_strdup_printf("%s[%d]", name_no_array, i);
+
+ret = object_property_add(obj, full_name, type, get, set,
+  release, opaque, NULL);
+g_free(full_name);
+if (ret) {
+break;
+}
+}
+g_free(name_no_array);
+return ret;
+}
+
 ObjectProperty *object_property_find(Object *obj, const char *name,
  Error **errp)
 {
-- 
1.9.5.msysgit.0




[Qemu-devel] [PATCH v4 0/2] QOM: object_property_add() performance improvement

2015-07-14 Thread Pavel Fedin
The function originally behaves very badly when adding properties with a "[*]"
suffix. Normally these are used for numbering IRQ pins. In order to find the
correct starting number the function started from zero and checked for
duplicates. This takes an incredibly long time with a large number of CPUs,
because the number of IRQ pins on some architectures (like ARM GICv3) gets
multiplied by the number of CPUs.

The solution is to add one more property which caches the last used index so
that the duplication check is not repeated thousands of times. Every time the
array is expanded the starting index is picked up from this cache.

The modification decreases qemu startup time with 32 CPUs by a factor of 2
(~10 sec vs ~20 sec).

Pavel Fedin (2):
  QOM: Introduce object_property_add_single()
  QOM: object_property_add() performance improvement

 qom/object.c | 96 +++-
 1 file changed, 69 insertions(+), 27 deletions(-)

-- 
1.9.5.msysgit.0
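
To illustrate the mechanism concretely (the property name, the get/set/release
callbacks and the opaque below are placeholders, not taken from the series):

/* First expansion of an "[*]" array property: the code scans from the cached
 * index (initially 0), creates "irq[0]" and, as a side effect, a read-only
 * uint32 property "irq#" that remembers where to continue next time. */
object_property_add(obj, "irq[*]", "link<irq>",
                    get, set, release, opaque, NULL);

/* Later expansions consult "irq#" instead of probing "irq[0]", "irq[1]", ...
 * from scratch, so each new element costs a constant number of lookups
 * rather than a rescan from zero. */
object_property_add(obj, "irq[*]", "link<irq>",
                    get, set, release, opaque, NULL);   /* creates "irq[1]" */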




Re: [Qemu-devel] [PATCH v7 32/42] Page request: Consume pages off the post-copy queue

2015-07-14 Thread Juan Quintela
"Dr. David Alan Gilbert (git)"  wrote:
> From: "Dr. David Alan Gilbert" 
>
> When transmitting RAM pages, consume pages that have been queued by
> MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning.
>
> Note:
>   a) After a queued page the linear walk carries on from after the
> unqueued page; there is a reasonable chance that the destination
> was about to ask for other closeby pages anyway.
>
>   b) We have to be careful of any assumptions that the page walking
> code makes, in particular it does some short cuts on its first linear
> walk that break as soon as we do a queued page.
>
>   c) We have to be careful to not break up host-page size chunks, since
> this makes it harder to place the pages on the destination.
>
> Signed-off-by: Dr. David Alan Gilbert 


> +static bool last_was_from_queue;

Are we using this variable later in the series?

>  static void migration_bitmap_sync_range(ram_addr_t start, ram_addr_t length)
>  {
>  migration_dirty_pages +=
> @@ -923,6 +933,41 @@ static int ram_save_compressed_page(QEMUFile *f, 
> RAMBlock *block,
>  return pages;
>  }
>  
> +/*
> + * Unqueue a page from the queue fed by postcopy page requests
> + *
> + * Returns:  The RAMBlock* to transmit from (or NULL if the queue is 
> empty)
> + *  ms:  MigrationState in
> + *  offset:  the byte offset within the RAMBlock for the start of the 
> page
> + * ram_addr_abs: global offset in the dirty/sent bitmaps
> + */
> +static RAMBlock *ram_save_unqueue_page(MigrationState *ms, ram_addr_t 
> *offset,
> +   ram_addr_t *ram_addr_abs)
> +{
> +RAMBlock *result = NULL;
> +qemu_mutex_lock(&ms->src_page_req_mutex);
> +if (!QSIMPLEQ_EMPTY(&ms->src_page_requests)) {
> +struct MigrationSrcPageRequest *entry =
> +QSIMPLEQ_FIRST(&ms->src_page_requests);
> +result = entry->rb;
> +*offset = entry->offset;
> +*ram_addr_abs = (entry->offset + entry->rb->offset) & 
> TARGET_PAGE_MASK;
> +
> +if (entry->len > TARGET_PAGE_SIZE) {
> +entry->len -= TARGET_PAGE_SIZE;
> +entry->offset += TARGET_PAGE_SIZE;
> +} else {
> +memory_region_unref(result->mr);

Here it is the unref, but I still don't understand why we don't need to
undo that on the error case on previous patch.

> +QSIMPLEQ_REMOVE_HEAD(&ms->src_page_requests, next_req);
> +g_free(entry);
> +}
> +}
> +qemu_mutex_unlock(&ms->src_page_req_mutex);
> +
> +return result;
> +}
> +
> +
>  /**
>   * Queue the pages for transmission, e.g. a request from postcopy destination
>   *   ms: MigrationStatus in which the queue is held
> @@ -987,6 +1032,58 @@ err:
>  

> @@ -997,65 +1094,102 @@ err:
>   * @f: QEMUFile where to send the data
>   * @last_stage: if we are at the completion stage
>   * @bytes_transferred: increase it with the number of transferred bytes
> + *
> + * On systems where host-page-size > target-page-size it will send all the
> + * pages in a host page that are dirty.
>   */
>  
>  static int ram_find_and_save_block(QEMUFile *f, bool last_stage,
> uint64_t *bytes_transferred)
>  {
> +MigrationState *ms = migrate_get_current();
>  RAMBlock *block = last_seen_block;
> +RAMBlock *tmpblock;
>  ram_addr_t offset = last_offset;
> +ram_addr_t tmpoffset;
>  bool complete_round = false;
>  int pages = 0;
> -MemoryRegion *mr;
>  ram_addr_t dirty_ram_abs; /* Address of the start of the dirty page in
>   ram_addr_t space */
>  
> -if (!block)
> +if (!block) {
>  block = QLIST_FIRST_RCU(&ram_list.blocks);
> +last_was_from_queue = false;
> +}
>  
> -while (true) {
> -mr = block->mr;
> -offset = migration_bitmap_find_and_reset_dirty(mr, offset,
> -   &dirty_ram_abs);
> -if (complete_round && block == last_seen_block &&
> -offset >= last_offset) {
> -break;
> -}
> -if (offset >= block->used_length) {
> -offset = 0;
> -block = QLIST_NEXT_RCU(block, next);
> -if (!block) {
> -block = QLIST_FIRST_RCU(&ram_list.blocks);
> -complete_round = true;
> -ram_bulk_stage = false;
> -if (migrate_use_xbzrle()) {
> -/* If xbzrle is on, stop using the data compression at 
> this
> - * point. In theory, xbzrle can do better than 
> compression.
> - */
> -flush_compressed_data(f);
> -compression_switch = false;
> -}
> +while (true) { /* Until we send a block or run out of stuff to send */
> +tmpblock = ram_save_unqueue_page(ms, &tmpoffset, &dirty_ram_abs);

This function was ugly.  You al

[Qemu-devel] [PATCH for-2.4] virtio-net: Flush incoming queues when DRIVER_OK is being set

2015-07-14 Thread Fam Zheng
This patch fixes network hang after "stop" then "cont", while network
packets keep arriving.

Tested both manually (tap, host pinging guest) and with Jason's qtest
series (plus his "[PATCH 2.4] socket: pass correct size in
net_socket_send()" fix).

As virtio_net_set_status is called when guest driver is setting status
byte and when vm state is changing, it is a good opportunity to flush
queued packets.

This is necessary because during vm stop the backend (e.g. tap) would
stop rx processing after .can_receive returns false, until the queue is
explicitly flushed or purged.

The other interesting condition in .can_receive, virtio_queue_ready(),
is handled by virtio_net_handle_rx() when guest kicks; the 3rd condition
is invalid queue index which doesn't need flushing.

Signed-off-by: Fam Zheng 
---
 hw/net/virtio-net.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index d728233..7c178c6 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -162,8 +162,13 @@ static void virtio_net_set_status(struct VirtIODevice 
*vdev, uint8_t status)
 virtio_net_vhost_status(n, status);
 
 for (i = 0; i < n->max_queues; i++) {
+NetClientState *ncs = qemu_get_subqueue(n->nic, i);
 q = &n->vqs[i];
 
+if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+qemu_flush_queued_packets(ncs);
+}
+
 if ((!n->multiqueue && i != 0) || i >= n->curr_queues) {
 queue_status = 0;
 } else {
-- 
2.4.3




Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file

2015-07-14 Thread Richard W.M. Jones
On Mon, Jul 13, 2015 at 04:09:37PM -0400, Gabriel L. Somlo wrote:
> 3. I'm currently only handling x86 and I/O ports. I could drop the
>fw_cfg_dmi_whitelist and just check the signature, using mmio where
>appropriate, but I don't have a handy-dandy set of VMs for those
>architectures on which I could test. Wondering if that's something
>we should have before I officially try to submit this to the kernel,
>or whether it could wait for a second iteration.

  $ virt-builder --arch armv7l fedora-22
or:
  $ virt-builder --arch aarch64 fedora-22
then:
  $ virt-builder --get-kernel fedora-22.img

and then boot it using the right qemu command, probably something
like:

  $ qemu-system-arm \
  -M virt,accel=tcg \
  -cpu cortex-a15 \
  -kernel vmlinuz-4.0.4-301.fc22.armv7hl+lpae \
  -initrd initramfs-4.0.4-301.fc22.armv7hl+lpae.img \
  -append "console=ttyAMA0 root=/dev/vda3 ro" \
  -drive file=fedora-22.img,if=none,id=hd \
  -device virtio-blk-device,drive=hd \
  -serial stdio

The root password is printed in virt-builder output.

> /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> static inline void fw_cfg_read_blob(uint16_t select,
>void *buf, loff_t pos, size_t count)
> {
>   mutex_lock(&fw_cfg_dev_lock);
>   outw(select, FW_CFG_PORT_CTL);
>   while (pos-- > 0)
>   inb(FW_CFG_PORT_DATA);
>   insb(FW_CFG_PORT_DATA, buf, count);
>   mutex_unlock(&fw_cfg_dev_lock);
> }

How slow is this?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW



Re: [Qemu-devel] [PATCH for-2.4 06/12] etsec: Flush queue when rx buffer is consumed

2015-07-14 Thread Fam Zheng
On Tue, 07/14 17:33, Jason Wang wrote:
> 
> 
> On 07/14/2015 03:53 PM, Fam Zheng wrote:
> > The BH will be scheduled when etsec->rx_buffer_len becomes 0, which
> > is the condition for queuing.
> >
> > Signed-off-by: Fam Zheng 
> 
> I suggest to squash this into patch 05/12 to unbreak bisection.

05/12 didn't change the logic, it just moved the semantics from returning 0 in
.can_receive() to .receive(). So I think it's OK to split to make reviewing
easier.

> 
> And can we do this without a bh? Otherwise, we may need to stop and
> restart the bh during vm stop and start?

A bh doesn't hurt across vm stop and restart (we just get a superfluous flush);
otherwise the call stack could go very deep with a long queue, something like:

etsec_receive
  etsec_rx_ring_write
etsec_walk_rx_ring
  etsec_flush_rx_queue
qemu_flush_queued_packets
  ...
etsec_receive
  etsec_rx_ring_write
etsec_walk_rx_ring
  etsec_flush_rx_queue
qemu_flush_queued_packets
  /* loop */
...

> 
> > ---
> >  hw/net/fsl_etsec/etsec.c | 9 +
> >  hw/net/fsl_etsec/etsec.h | 2 ++
> >  hw/net/fsl_etsec/rings.c | 1 +
> >  3 files changed, 12 insertions(+)
> >
> > diff --git a/hw/net/fsl_etsec/etsec.c b/hw/net/fsl_etsec/etsec.c
> > index f5170ae..fcde9b4 100644
> > --- a/hw/net/fsl_etsec/etsec.c
> > +++ b/hw/net/fsl_etsec/etsec.c
> > @@ -366,6 +366,13 @@ static NetClientInfo net_etsec_info = {
> >  .link_status_changed = etsec_set_link_status,
> >  };
> >  
> > +static void etsec_flush_rx_queue(void *opaque)
> > +{
> > +eTSEC *etsec = opaque;
> > +
> > +qemu_flush_queued_packets(qemu_get_queue(etsec->nic));
> > +}
> > +
> >  static void etsec_realize(DeviceState *dev, Error **errp)
> >  {
> >  eTSEC*etsec = ETSEC_COMMON(dev);
> > @@ -378,6 +385,8 @@ static void etsec_realize(DeviceState *dev, Error 
> > **errp)
> >  etsec->bh = qemu_bh_new(etsec_timer_hit, etsec);
> >  etsec->ptimer = ptimer_init(etsec->bh);
> >  ptimer_set_freq(etsec->ptimer, 100);
> > +
> > +etsec->flush_bh = qemu_bh_new(etsec_flush_rx_queue, etsec);
> >  }
> >  
> >  static void etsec_instance_init(Object *obj)
> > diff --git a/hw/net/fsl_etsec/etsec.h b/hw/net/fsl_etsec/etsec.h
> > index fc41773..05bb7f8 100644
> > --- a/hw/net/fsl_etsec/etsec.h
> > +++ b/hw/net/fsl_etsec/etsec.h
> > @@ -144,6 +144,8 @@ typedef struct eTSEC {
> >  QEMUBH *bh;
> >  struct ptimer_state *ptimer;
> >  
> > +QEMUBH *flush_bh;
> > +
> >  } eTSEC;
> >  
> >  #define TYPE_ETSEC_COMMON "eTSEC"
> > diff --git a/hw/net/fsl_etsec/rings.c b/hw/net/fsl_etsec/rings.c
> > index a11280b..7aae93e 100644
> > --- a/hw/net/fsl_etsec/rings.c
> > +++ b/hw/net/fsl_etsec/rings.c
> > @@ -646,6 +646,7 @@ void etsec_walk_rx_ring(eTSEC *etsec, int ring_nbr)
> >  } else {
> >  etsec->rx_buffer_len = 0;
> >  etsec->rx_buffer = NULL;
> > +qemu_bh_schedule(etsec->flush_bh);
> >  }
> >  
> >  RING_DEBUG("eTSEC End of ring_write: remaining_data:%zu\n", 
> > remaining_data);
> 



Re: [Qemu-devel] [PATCH for-2.4 05/12] etsec: Move etsec_can_receive into etsec_receive

2015-07-14 Thread Fam Zheng
On Tue, 07/14 17:30, Jason Wang wrote:
> 
> 
> On 07/14/2015 03:53 PM, Fam Zheng wrote:
> > When etsec_receive returns 0, the peer will queue the packet as if
> > .can_receive returned false. Drop etsec_can_receive and let etsec_receive
> > carry the semantics.
> >
> > Signed-off-by: Fam Zheng 
> > ---
> >  hw/net/fsl_etsec/etsec.c | 11 +--
> >  hw/net/fsl_etsec/etsec.h |  2 +-
> >  hw/net/fsl_etsec/rings.c | 14 --
> >  3 files changed, 10 insertions(+), 17 deletions(-)
> >
> > diff --git a/hw/net/fsl_etsec/etsec.c b/hw/net/fsl_etsec/etsec.c
> > index c57365f..f5170ae 100644
> > --- a/hw/net/fsl_etsec/etsec.c
> > +++ b/hw/net/fsl_etsec/etsec.c
> > @@ -338,13 +338,6 @@ static void etsec_reset(DeviceState *d)
> >  MII_SR_100X_FD_CAPS | MII_SR_100T4_CAPS;
> >  }
> >  
> > -static int etsec_can_receive(NetClientState *nc)
> > -{
> > -eTSEC *etsec = qemu_get_nic_opaque(nc);
> > -
> > -return etsec->rx_buffer_len == 0;
> > -}
> > -
> >  static ssize_t etsec_receive(NetClientState *nc,
> >   const uint8_t  *buf,
> >   size_t  size)
> > @@ -355,8 +348,7 @@ static ssize_t etsec_receive(NetClientState *nc,
> >  fprintf(stderr, "%s receive size:%d\n", etsec->nic->nc.name, size);
> >  qemu_hexdump(buf, stderr, "", size);
> >  #endif
> > -etsec_rx_ring_write(etsec, buf, size);
> > -return size;
> > +return etsec_rx_ring_write(etsec, buf, size);
> >  }
> >  
> >  
> > @@ -370,7 +362,6 @@ static void etsec_set_link_status(NetClientState *nc)
> >  static NetClientInfo net_etsec_info = {
> >  .type = NET_CLIENT_OPTIONS_KIND_NIC,
> >  .size = sizeof(NICState),
> > -.can_receive = etsec_can_receive,
> >  .receive = etsec_receive,
> >  .link_status_changed = etsec_set_link_status,
> >  };
> > diff --git a/hw/net/fsl_etsec/etsec.h b/hw/net/fsl_etsec/etsec.h
> > index 78d2c57..fc41773 100644
> > --- a/hw/net/fsl_etsec/etsec.h
> > +++ b/hw/net/fsl_etsec/etsec.h
> > @@ -162,7 +162,7 @@ DeviceState *etsec_create(hwaddrbase,
> >  
> >  void etsec_walk_tx_ring(eTSEC *etsec, int ring_nbr);
> >  void etsec_walk_rx_ring(eTSEC *etsec, int ring_nbr);
> > -void etsec_rx_ring_write(eTSEC *etsec, const uint8_t *buf, size_t size);
> > +ssize_t etsec_rx_ring_write(eTSEC *etsec, const uint8_t *buf, size_t size);
> >  
> >  void etsec_write_miim(eTSEC  *etsec,
> >eTSEC_Register *reg,
> > diff --git a/hw/net/fsl_etsec/rings.c b/hw/net/fsl_etsec/rings.c
> > index d4a494f..a11280b 100644
> > --- a/hw/net/fsl_etsec/rings.c
> > +++ b/hw/net/fsl_etsec/rings.c
> > @@ -481,40 +481,42 @@ static void rx_init_frame(eTSEC *etsec, const uint8_t 
> > *buf, size_t size)
> > etsec->rx_buffer_len, etsec->rx_padding);
> >  }
> >  
> > -void etsec_rx_ring_write(eTSEC *etsec, const uint8_t *buf, size_t size)
> > +ssize_t etsec_rx_ring_write(eTSEC *etsec, const uint8_t *buf, size_t size)
> >  {
> >  int ring_nbr = 0;   /* Always use ring0 (no filer) */
> >  
> >  if (etsec->rx_buffer_len != 0) {
> >  RING_DEBUG("%s: We can't receive now,"
> > " a buffer is already in the pipe\n", __func__);
> > -return;
> > +return 0;
> 
> Is the queue flushed when the rx buffer is available again?

Previously not, and it's fixed in patch 6.

Fam



Re: [Qemu-devel] Patches

2015-07-14 Thread Peter Maydell
On 14 July 2015 at 09:12, Peter Crosthwaite
 wrote:
> Hi Stefan, Peter,
>
> Is patches down? I don't see any patches for the last 3 days and there
> are definitely new things on the list:
>
> $ ./patches fetch http://vmsplice.net/~patches/patches.json
> Fetched info on 689 patch series
> Fetching mboxes...
> $ ./patches list | more
> Message-id: 1436556159-3002-1-git-send-email...@weilnetz.de
> From: Stefan Weil 
> Date: 2015-07-10
> Tags: for-2.4
>[1/1] tci: Fix regression with INDEX_op_qemu_st_i32, INDEX_op_qemu_st_i64

The other patches instance also seems to have got wedged
(http://wiki.qemu.org/patches/patches.json): that
2015-07-10 patch is the last one it knows about too.

-- PMM



Re: [Qemu-devel] [PATCH v3 09/15] blockjob: Move BlockJobDeferToMainLoopData into BlockJob

2015-07-14 Thread Stefan Hajnoczi
On Fri, Jul 10, 2015 at 11:46:46AM +0800, Fam Zheng wrote:
> diff --git a/blockjob.c b/blockjob.c
> index 11b48f5..e057dd5 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -92,6 +92,10 @@ void block_job_completed(BlockJob *job, int ret)
>  assert(!job->completed);
>  job->completed = true;
>  job->ret = ret;
> +if (job->defer_to_main_loop_data.bh) {
> +qemu_bh_delete(job->defer_to_main_loop_data.bh);
> +job->defer_to_main_loop_data.bh = NULL;
> +}
>  job->cb(job->opaque, ret);
>  block_job_release(bs);
>  }

This doesn't make sense to me:

1. This function is called from a defer_to_main_loop BH so
   job->defer_to_main_loop_data.bh should already be running here.

   In fact, we've already called qemu_bh_delete() in
   block_job_defer_to_main_loop_bh().  This call is pointless but you
   could change it to an assertion and assign bh = NULL in
   block_job_defer_to_main_loop_bh().

2. If there was a BH scheduled, why is it safe to silently delete it?
   Wouldn't the caller rely on the BH code executing to clean up, for
   example?

If you drop this if statement, is it necessary to move
BlockJobDeferToMainLoopData into BlockJob at all?  Maybe you can just
drop this patch.

> @@ -344,44 +348,36 @@ BlockErrorAction block_job_error_action(BlockJob *job, 
> BlockDriverState *bs,
>  return action;
>  }
>  
> -typedef struct {
> -BlockJob *job;
> -QEMUBH *bh;
> -AioContext *aio_context;
> -BlockJobDeferToMainLoopFn *fn;
> -void *opaque;
> -} BlockJobDeferToMainLoopData;
> -
>  static void block_job_defer_to_main_loop_bh(void *opaque)
>  {
> -BlockJobDeferToMainLoopData *data = opaque;
> +BlockJob *job = opaque;
> +/* Copy the struct in case job get released in data.fn() */
> +BlockJobDeferToMainLoopData data = job->defer_to_main_loop_data;

A comment is warranted since this access is only safe due to the
following:

The job may still be executing in another thread (the AioContext hasn't
been acquired by us yet), but job->defer_to_main_loop_data isn't
modified by the other thread after the qemu_bh_schedule() call.

Additionally, the atomic_xchg memory barriers in qemu_bh_schedule() and
aio_bh_poll() ensure that this thread sees the latest
job->defer_to_main_loop_data.

Access to other job fields would probably not be safe here!


pgpEBlOuPeyiB.pgp
Description: PGP signature


Re: [Qemu-devel] [PATCH v7 33/42] postcopy_ram.c: place_page and helpers

2015-07-14 Thread Juan Quintela
"Dr. David Alan Gilbert (git)"  wrote:
> From: "Dr. David Alan Gilbert" 
>
> postcopy_place_page (etc) provide a way for postcopy to place a page
> into guests memory atomically (using the copy ioctl on the ufd).
>
> Signed-off-by: Dr. David Alan Gilbert 

> --- a/include/migration/postcopy-ram.h
> +++ b/include/migration/postcopy-ram.h
> @@ -69,4 +69,20 @@ void postcopy_discard_send_range(MigrationState *ms, 
> PostcopyDiscardState *pds,
>  void postcopy_discard_send_finish(MigrationState *ms,
>PostcopyDiscardState *pds);
>  
> +/*
> + * Place a page (from) at (host) efficiently
> + *There are restrictions on how 'from' must be mapped, in general best
> + *to use other postcopy_ routines to allocate.
> + * returns 0 on success
> + */
> +int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
> +bool all_zero);
> +
> +/*
> + * Allocate a page of memory that can be mapped at a later point in time
> + * using postcopy_place_page
> + * Returns: Pointer to allocated page
> + */
> +void *postcopy_get_tmp_page(MigrationIncomingState *mis);
> +

I don't think that this makes sense, but wouldn't it have been a good idea
to ask for the address that we want as a hint?  That could help with
fragmentation, no?

> +/*
> + * Place a host page (from) at (host) atomically
> + * all_zero: Hint that the page being placed is 0 throughout
> + * returns 0 on success
> + */
> +int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
> +bool all_zero)

postcopy_place_page() and postcopy_place_zero_page()?  They just share a
trace point :p


> +{
> +if (!all_zero) {
> +struct uffdio_copy copy_struct;
> +
> +copy_struct.dst = (uint64_t)(uintptr_t)host;
> +copy_struct.src = (uint64_t)(uintptr_t)from;
> +copy_struct.len = getpagesize();
> +copy_struct.mode = 0;
> +
> +/* copy also acks to the kernel waking the stalled thread up
> + * TODO: We can inhibit that ack and only do it if it was requested
> + * which would be slightly cheaper, but we'd have to be careful
> + * of the order of updating our page state.
> + */
> +if (ioctl(mis->userfault_fd, UFFDIO_COPY, &copy_struct)) {
> +int e = errno;
> +error_report("%s: %s copy host: %p from: %p",
> + __func__, strerror(e), host, from);
> +
> +return -e;
> +}
> +} else {
> +struct uffdio_zeropage zero_struct;
> +
> +zero_struct.range.start = (uint64_t)(uintptr_t)host;
> +zero_struct.range.len = getpagesize();
> +zero_struct.mode = 0;
> +
> +if (ioctl(mis->userfault_fd, UFFDIO_ZEROPAGE, &zero_struct)) {
> +int e = errno;
> +error_report("%s: %s zero host: %p from: %p",
> + __func__, strerror(e), host, from);
> +
> +return -e;
> +}
> +}
> +
> +trace_postcopy_place_page(host, all_zero);
> +return 0;
> +}

I really think that the userfault code should be in a linux specific
file, but that can be done late, so I will not insist O:-)

Later, Juan.
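
For clarity, the split being suggested could look roughly like the following
(directly rearranged from the quoted hunk, untested, so treat it as a sketch
only):

static int postcopy_place_zero_page(MigrationIncomingState *mis, void *host)
{
    struct uffdio_zeropage zero_struct = {
        .range.start = (uint64_t)(uintptr_t)host,
        .range.len   = getpagesize(),
        .mode        = 0,
    };

    if (ioctl(mis->userfault_fd, UFFDIO_ZEROPAGE, &zero_struct)) {
        int e = errno;
        error_report("%s: %s zero host: %p", __func__, strerror(e), host);
        return -e;
    }
    trace_postcopy_place_page(host, true);
    return 0;
}

static int postcopy_place_page(MigrationIncomingState *mis, void *host,
                               void *from)
{
    struct uffdio_copy copy_struct = {
        .dst  = (uint64_t)(uintptr_t)host,
        .src  = (uint64_t)(uintptr_t)from,
        .len  = getpagesize(),
        .mode = 0,
    };

    if (ioctl(mis->userfault_fd, UFFDIO_COPY, &copy_struct)) {
        int e = errno;
        error_report("%s: %s copy host: %p from: %p",
                     __func__, strerror(e), host, from);
        return -e;
    }
    trace_postcopy_place_page(host, false);
    return 0;
}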



[Qemu-devel] [PATCH] target-mips: fix resource leak reported by Coverity

2015-07-14 Thread Leon Alrae
UHI assert and link operations call lock_user_string() twice to obtain two
strings pointed by gpr[4] and gpr[5]. If the second lock_user_string()
fails, then the first one won't get freed. Fix this by introducing another
macro responsible for obtaining two strings and handling allocation
failure.

Signed-off-by: Leon Alrae 
---
 target-mips/mips-semi.c | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/target-mips/mips-semi.c b/target-mips/mips-semi.c
index 1162c76..5050940 100644
--- a/target-mips/mips-semi.c
+++ b/target-mips/mips-semi.c
@@ -220,6 +220,23 @@ static int copy_argn_to_target(CPUMIPSState *env, int 
arg_num,
 }   \
 } while (0)
 
+#define GET_TARGET_STRINGS_2(p, addr, p2, addr2)\
+do {\
+p = lock_user_string(addr); \
+if (!p) {   \
+gpr[2] = -1;\
+gpr[3] = EFAULT;\
+goto uhi_done;  \
+}   \
+p2 = lock_user_string(addr2);   \
+if (!p2) {  \
+unlock_user(p, addr, 0);\
+gpr[2] = -1;\
+gpr[3] = EFAULT;\
+goto uhi_done;  \
+}   \
+} while (0)
+
 #define FREE_TARGET_STRING(p, gpr)  \
 do {\
 unlock_user(p, gpr, 0); \
@@ -322,8 +339,7 @@ void helper_do_semihosting(CPUMIPSState *env)
 FREE_TARGET_STRING(p, gpr[4]);
 break;
 case UHI_assert:
-GET_TARGET_STRING(p, gpr[4]);
-GET_TARGET_STRING(p2, gpr[5]);
+GET_TARGET_STRINGS_2(p, gpr[4], p2, gpr[5]);
 printf("assertion '");
 printf("\"%s\"", p);
 printf("': file \"%s\", line %d\n", p2, (int)gpr[6]);
@@ -341,8 +357,7 @@ void helper_do_semihosting(CPUMIPSState *env)
 break;
 #ifndef _WIN32
 case UHI_link:
-GET_TARGET_STRING(p, gpr[4]);
-GET_TARGET_STRING(p2, gpr[5]);
+GET_TARGET_STRINGS_2(p, gpr[4], p2, gpr[5]);
 gpr[2] = link(p, p2);
 gpr[3] = errno_mips(errno);
 FREE_TARGET_STRING(p2, gpr[5]);
-- 
2.1.0




[Qemu-devel] [PATCH] target-mips: fix logically dead code reported by Coverity

2015-07-14 Thread Leon Alrae
Make use of CMPOP in floating-point compare instructions.

Signed-off-by: Leon Alrae 
---
 target-mips/translate.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index 7302857..4a1ffdb 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -9552,6 +9552,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode 
op1,
 gen_cmp_s(ctx, func-48, ft, fs, cc);
 opn = condnames[func-48];
 }
+optype = CMPOP;
 break;
 case OPC_ADD_D:
 check_cp1_registers(ctx, fs | ft | fd);
@@ -10036,6 +10037,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode 
op1,
 gen_cmp_d(ctx, func-48, ft, fs, cc);
 opn = condnames[func-48];
 }
+optype = CMPOP;
 break;
 case OPC_CVT_S_D:
 check_cp1_registers(ctx, fs);
@@ -10461,6 +10463,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode 
op1,
 gen_cmp_ps(ctx, func-48, ft, fs, cc);
 opn = condnames[func-48];
 }
+optype = CMPOP;
 break;
 default:
 MIPS_INVAL(opn);
-- 
2.1.0




Re: [Qemu-devel] [PATCH for-2.4] virtio-net: Flush incoming queues when DRIVER_OK is being set

2015-07-14 Thread Michael S. Tsirkin
On Tue, Jul 14, 2015 at 05:41:04PM +0800, Fam Zheng wrote:
> This patch fixes network hang after "stop" then "cont", while network
> packets keep arriving.
> 
> Tested both manually (tap, host pinging guest) and with Jason's qtest
> series (plus his "[PATCH 2.4] socket: pass correct size in
> net_socket_send()" fix).
> 
> As virtio_net_set_status is called when guest driver is setting status
> byte and when vm state is changing, it is a good opportunity to flush
> queued packets.
> 
> This is necessary because during vm stop the backend (e.g. tap) would
> stop rx processing after .can_receive returns false, until the queue is
> explicitly flushed or purged.
> 
> The other interesting condition in .can_receive, virtio_queue_ready(),
> is handled by virtio_net_handle_rx() when guest kicks; the 3rd condition
> is invalid queue index which doesn't need flushing.
> 
> Signed-off-by: Fam Zheng 
> ---
>  hw/net/virtio-net.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index d728233..7c178c6 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -162,8 +162,13 @@ static void virtio_net_set_status(struct VirtIODevice 
> *vdev, uint8_t status)
>  virtio_net_vhost_status(n, status);
>  
>  for (i = 0; i < n->max_queues; i++) {
> +NetClientState *ncs = qemu_get_subqueue(n->nic, i);
>  q = &n->vqs[i];
>  
> +if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> +qemu_flush_queued_packets(ncs);
> +}
> +
>  if ((!n->multiqueue && i != 0) || i >= n->curr_queues) {
>  queue_status = 0;
>  } else {

I think this should be limited to
virtio_net_started(n, queue_status) && !n->vhost_started

> -- 
> 2.4.3



[Qemu-devel] [PATCH] target-mips: correct DERET instruction

2015-07-14 Thread Leon Alrae
Fix Debug Mode flag clearing, and when DERET is placed between LL and SC
do not make SC fail.

Signed-off-by: Leon Alrae 
---
 target-mips/op_helper.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
index 2a9ddff..d461ad8 100644
--- a/target-mips/op_helper.c
+++ b/target-mips/op_helper.c
@@ -2154,10 +2154,9 @@ void helper_deret(CPUMIPSState *env)
 debug_pre_eret(env);
 set_pc(env, env->CP0_DEPC);
 
-env->hflags &= MIPS_HFLAG_DM;
+env->hflags &= ~MIPS_HFLAG_DM;
 compute_hflags(env);
 debug_post_eret(env);
-env->lladdr = 1;
 }
 #endif /* !CONFIG_USER_ONLY */
 
-- 
2.1.0




Re: [Qemu-devel] [PATCH for-2.4] virtio-net: Flush incoming queues when DRIVER_OK is being set

2015-07-14 Thread Fam Zheng
On Tue, 07/14 13:09, Michael S. Tsirkin wrote:
> On Tue, Jul 14, 2015 at 05:41:04PM +0800, Fam Zheng wrote:
> > This patch fixes network hang after "stop" then "cont", while network
> > packets keep arriving.
> > 
> > Tested both manually (tap, host pinging guest) and with Jason's qtest
> > series (plus his "[PATCH 2.4] socket: pass correct size in
> > net_socket_send()" fix).
> > 
> > As virtio_net_set_status is called when guest driver is setting status
> > byte and when vm state is changing, it is a good opportunity to flush
> > queued packets.
> > 
> > This is necessary because during vm stop the backend (e.g. tap) would
> > stop rx processing after .can_receive returns false, until the queue is
> > explicitly flushed or purged.
> > 
> > The other interesting condition in .can_receive, virtio_queue_ready(),
> > is handled by virtio_net_handle_rx() when guest kicks; the 3rd condition
> > is invalid queue index which doesn't need flushing.
> > 
> > Signed-off-by: Fam Zheng 
> > ---
> >  hw/net/virtio-net.c | 5 +
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index d728233..7c178c6 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -162,8 +162,13 @@ static void virtio_net_set_status(struct VirtIODevice 
> > *vdev, uint8_t status)
> >  virtio_net_vhost_status(n, status);
> >  
> >  for (i = 0; i < n->max_queues; i++) {
> > +NetClientState *ncs = qemu_get_subqueue(n->nic, i);
> >  q = &n->vqs[i];
> >  
> > +if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > +qemu_flush_queued_packets(ncs);
> > +}
> > +
> >  if ((!n->multiqueue && i != 0) || i >= n->curr_queues) {
> >  queue_status = 0;
> >  } else {
> 
> I think this should be limited to
>   virtio_net_started(n, queue_status) && !n->vhost_started

Yes, that looks better.

Fam
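
For reference, folding that restriction in might look roughly like this inside
the loop (queue_status computed before the flush; illustrative only, not a
tested patch):

for (i = 0; i < n->max_queues; i++) {
    NetClientState *ncs = qemu_get_subqueue(n->nic, i);
    q = &n->vqs[i];

    if ((!n->multiqueue && i != 0) || i >= n->curr_queues) {
        queue_status = 0;
    } else {
        queue_status = status;
    }

    /* only flush queues that the QEMU (non-vhost) datapath actually serves */
    if (virtio_net_started(n, queue_status) && !n->vhost_started) {
        qemu_flush_queued_packets(ncs);
    }

    /* ... remainder of the existing loop body unchanged ... */
}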



Re: [Qemu-devel] [PATCH v3 10/15] block: add block job transactions

2015-07-14 Thread Stefan Hajnoczi
On Fri, Jul 10, 2015 at 11:46:47AM +0800, Fam Zheng wrote:
> From: Stefan Hajnoczi 

Please take authorship, most of the code is yours.

> +static void block_job_completed_txn(BlockJobTxn *txn, BlockJob *job, int ret)
> +{
> +AioContext *ctx;
> +BlockJob *other_job, *next;
> +if (ret < 0 || block_job_is_cancelled(job)) {
> +if (!txn->aborting) {
> +/* We are the first failed job. Cancel other jobs. */
> +txn->aborting = true;
> +QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
> +if (other_job == job || other_job->completed) {

You didn't leave any clues as to why other_job->completed is safe to
access outside its AioContext.

Perhaps because it's only modified from the main loop and we are in the
main loop too?

Please either access it inside the AioContext or add a comment
explaining why it is safe.

> +continue;
> +}
> +ctx = bdrv_get_aio_context(other_job->bs);
> +aio_context_acquire(ctx);
> +block_job_cancel_sync(other_job);
> +assert(other_job->completed);
> +aio_context_release(ctx);
> +}
> +QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
> +if (other_job->driver->txn_abort) {
> +other_job->driver->txn_abort(other_job);
> +}
> +other_job->cb(other_job->opaque,
> +  other_job->ret ? : -ECANCELED);

Please don't add ECANCELED here since block_job_completed() doesn't do
that either.  We should be consistent: currently cb() needs to check
block_job_is_cancelled() to determine whether the job was cancelled.

Also, if a job finished but another job aborts, then we need to mark the
first job as cancelled (other_job->cancelled == true).

> +block_job_release(other_job->bs);
> +}

No AioContext acquire/release in this loop?  Remember the job's bs can
still be accessed from another thread while we're running.

> +} else {
> +/*
> + * We are cancelled by another job, who will handle everything.
> + */
> +return;
> +}
> +} else {
> +/*
> + * Successful completion, see if there is other running jobs in this

s/is/are/ (plural)

> + * txn. */
> +QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
> +if (!other_job->completed) {
> +return;
> +}
> +}
> +/* We are the last completed job, commit the transaction. */
> +QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
> +if (other_job->driver->txn_commit) {
> +other_job->driver->txn_commit(other_job);
> +}
> +assert(other_job->ret == 0);
> +other_job->cb(other_job->opaque, 0);
> +block_job_release(other_job->bs);
> +}

More other_job field accesses outside AioContext that need to be checked
and probably protected using acquire/release.
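
Concretely, the commit loop could mirror the cancel loop above, something like
this (a sketch of the requested pattern, untested):

QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
    AioContext *ctx = bdrv_get_aio_context(other_job->bs);

    aio_context_acquire(ctx);
    if (other_job->driver->txn_commit) {
        other_job->driver->txn_commit(other_job);
    }
    assert(other_job->ret == 0);
    other_job->cb(other_job->opaque, 0);
    block_job_release(other_job->bs);
    aio_context_release(ctx);
}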

> +/**
> + * block_job_txn_new:
> + *
> + * Allocate and return a new block job transaction.  Jobs can be added to the
> + * transaction using block_job_txn_add_job().
> + *
> + * All jobs in the transaction either complete successfully or fail/cancel 
> as a
> + * group.  Jobs wait for each other before completing.  Cancelling one job
> + * cancels all jobs in the transaction.
> + */
> +BlockJobTxn *block_job_txn_new(void);

The doc comment says "Allocate and return a new block job transaction"
but does not explain how/when the object is freed.  Please add a
sentence along the lines of: The transaction is automatically freed when
the last job completes or is cancelled.


pgpBwgkk9HkpK.pgp
Description: PGP signature


Re: [Qemu-devel] [PATCH] hw/arm/boot: Increase fdt alignment

2015-07-14 Thread Peter Maydell
On 13 July 2015 at 17:50, Alexander Graf  wrote:
> The Linux kernel on aarch64 creates a page table entry at early bootup
> that spans the 2MB range on memory spanning the fdt start address:
>
>   [ ALIGN_DOWN(fdt, 2MB) ... ALIGN_DOWN(fdt, 2MB) + 2MB ]
>
> This means that when our current 4k alignment happens to fall at the end
> of the aligned region, Linux tries to access memory that is not mapped.
>
> The easy fix is to instead increase the alignment to 2MB, making Linux's
> logic always succeed.
>
> We leave the existing 4k alignment for 32bit kernels to not cause any
> regressions due to space constraints.
>
> Reported-by: Andreas Schwab 
> Signed-off-by: Alexander Graf 



Applied to target-arm.next, thanks.

-- PMM
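
For readers following along, the arithmetic described in the commit message
boils down to something like the helper below; the macro names and where it
is applied in hw/arm/boot.c are illustrative, not the actual patch:

#include <stdbool.h>
#include <stdint.h>

#define FDT_ALIGN_AARCH32  0x1000ULL     /* keep the existing 4 KiB for 32-bit kernels */
#define FDT_ALIGN_AARCH64  0x200000ULL   /* 2 MiB, so the fdt never straddles the
                                            kernel's early 2 MiB mapping */

static uint64_t fdt_align_addr(uint64_t addr, bool is_aarch64)
{
    uint64_t align = is_aarch64 ? FDT_ALIGN_AARCH64 : FDT_ALIGN_AARCH32;

    /* round up to the next boundary so ALIGN_DOWN(fdt, 2MB) == fdt */
    return (addr + align - 1) & ~(align - 1);
}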



Re: [Qemu-devel] [PATCH v3 12/15] block/backup: support block job transactions

2015-07-14 Thread Stefan Hajnoczi
On Fri, Jul 10, 2015 at 11:46:49AM +0800, Fam Zheng wrote:
>  static BlockErrorAction backup_error_action(BackupBlockJob *job,
> @@ -444,7 +462,7 @@ static void coroutine_fn backup_run(void *opaque)
>  qemu_co_rwlock_wrlock(&job->flush_rwlock);
>  qemu_co_rwlock_unlock(&job->flush_rwlock);
>  
> -if (job->sync_bitmap) {
> +if (!job->common.txn && job->sync_bitmap) {
>  backup_handle_dirty_bitmap(job, ret);
>  }
>  hbitmap_free(job->bitmap);

It would be nice if the core blockjob code called commit/abort even when
there is no txn object.  That way we can avoid special-case code.

> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 7b2efb8..d5e33fd 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -736,6 +736,10 @@
>  #   default 'report' (no limitations, since this applies to
>  #   a different block device than @device).
>  #
> +# @transactional-cancel: #optional whether failure or cancellation of other
> +#block jobs with @transactional-cancel true causes 
> the
> +#whole group to cancel.
> +#
>  # Note that @on-source-error and @on-target-error only affect background I/O.
>  # If an error occurs during a guest write request, the device's rerror/werror
>  # actions will be used.
> @@ -747,7 +751,8 @@
>  'sync': 'MirrorSyncMode', '*mode': 'NewImageMode',
>  '*speed': 'int', '*bitmap': 'str',
>  '*on-source-error': 'BlockdevOnError',
> -'*on-target-error': 'BlockdevOnError' } }
> +'*on-target-error': 'BlockdevOnError',
> +'*transactional-cancel': 'bool' } }
>  
>  ##
>  # @BlockdevBackup
> @@ -771,6 +776,10 @@
>  #   default 'report' (no limitations, since this applies to
>  #   a different block device than @device).
>  #
> +# @transactional-cancel: #optional whether failure or cancellation of other
> +#block jobs with @transactional-cancel true causes 
> the
> +#whole group to cancel.
> +#
>  # Note that @on-source-error and @on-target-error only affect background I/O.
>  # If an error occurs during a guest write request, the device's rerror/werror
>  # actions will be used.
> @@ -782,7 +791,8 @@
>  'sync': 'MirrorSyncMode',
>  '*speed': 'int',
>  '*on-source-error': 'BlockdevOnError',
> -'*on-target-error': 'BlockdevOnError' } }
> +'*on-target-error': 'BlockdevOnError',
> +'*transactional-cancel': 'bool' } }

The doc comments are missing (Since: 2.5).  My fault, sorry!


pgp2YP33oHMoQ.pgp
Description: PGP signature


Re: [Qemu-devel] [PATCH v3 09/15] blockjob: Move BlockJobDeferToMainLoopData into BlockJob

2015-07-14 Thread Fam Zheng
On Tue, 07/14 11:03, Stefan Hajnoczi wrote:
> On Fri, Jul 10, 2015 at 11:46:46AM +0800, Fam Zheng wrote:
> > diff --git a/blockjob.c b/blockjob.c
> > index 11b48f5..e057dd5 100644
> > --- a/blockjob.c
> > +++ b/blockjob.c
> > @@ -92,6 +92,10 @@ void block_job_completed(BlockJob *job, int ret)
> >  assert(!job->completed);
> >  job->completed = true;
> >  job->ret = ret;
> > +if (job->defer_to_main_loop_data.bh) {
> > +qemu_bh_delete(job->defer_to_main_loop_data.bh);
> > +job->defer_to_main_loop_data.bh = NULL;
> > +}
> >  job->cb(job->opaque, ret);
> >  block_job_release(bs);
> >  }
> 
> This doesn't make sense to me:
> 
> 1. This function is called from a defer_to_main_loop BH so
>job->defer_to_main_loop_data.bh should already be running here.
> 
>In fact, we've already called qemu_bh_delete() in
>block_job_defer_to_main_loop_bh().  This call is pointless but you
>could change it to an assertion and assign bh = NULL in
>block_job_defer_to_main_loop_bh().
> 
> 2. If there was a BH scheduled, why is it safe to silently delete it?
>Wouldn't the caller rely on the BH code executing to clean up, for
>example?
> 
> If you drop this if statement, is it necessary to move
> BlockJobDeferToMainLoopData into BlockJob at all?  Maybe you can just
> drop this patch.

You're right, I agree this patch is superfluous and can be dropped.

Fam

> 
> > @@ -344,44 +348,36 @@ BlockErrorAction block_job_error_action(BlockJob 
> > *job, BlockDriverState *bs,
> >  return action;
> >  }
> >  
> > -typedef struct {
> > -BlockJob *job;
> > -QEMUBH *bh;
> > -AioContext *aio_context;
> > -BlockJobDeferToMainLoopFn *fn;
> > -void *opaque;
> > -} BlockJobDeferToMainLoopData;
> > -
> >  static void block_job_defer_to_main_loop_bh(void *opaque)
> >  {
> > -BlockJobDeferToMainLoopData *data = opaque;
> > +BlockJob *job = opaque;
> > +/* Copy the struct in case job get released in data.fn() */
> > +BlockJobDeferToMainLoopData data = job->defer_to_main_loop_data;
> 
> A comment is warranted since this access is only safe due to the
> following:
> 
> The job may still be executing in another thread (the AioContext hasn't
> been acquired by us yet), but job->defer_to_main_loop_data isn't
> modified by the other thread after the qemu_bh_schedule() call.
> 
> Additionally, the atomic_xchg memory barriers in qemu_bh_schedule() and
> aio_bh_poll() ensure that this thread sees the latest
> job->defer_to_main_loop_data.
> 
> Access to other job fields would probably not be safe here!



Re: [Qemu-devel] [PULL v3 05/22] cpu: Convert cpu_index into a bitmap

2015-07-14 Thread Bharata B Rao
On Thu, Jul 09, 2015 at 03:23:55PM +0200, Andreas Färber wrote:
> From: Bharata B Rao 
> 
> Currently CPUState::cpu_index is monotonically increasing and a newly
> created CPU always gets the next higher index. The next available
> index is calculated by counting the existing number of CPUs. This is
> fine as long as we only add CPUs, but there are architectures which
> are starting to support CPU removal, too. For an architecture like PowerPC
> which derives its CPU identifier (device tree ID) from cpu_index, the
> existing logic of generating cpu_index values causes problems.
> 
> With the currently proposed method of handling vCPU removal by parking
> the vCPU fd in QEMU
> (Ref: http://lists.gnu.org/archive/html/qemu-devel/2015-02/msg02604.html),
> generating cpu_index this way will not work for PowerPC.
> 
> This patch changes the way cpu_index is handed out by maintaining
> a bit map of the CPUs that tracks both addition and removal of CPUs.
> 
> The CPU bitmap allocation logic is part of cpu_exec_init(), which is
> called by instance_init routines of various CPU targets. Newly added
> cpu_exec_exit() API handles the deallocation part and this routine is
> called from generic CPU instance_finalize.
> 
> Note: This new CPU enumeration is for !CONFIG_USER_ONLY only.
> CONFIG_USER_ONLY continues to have the old enumeration logic.
> 
> Signed-off-by: Bharata B Rao 
> Reviewed-by: Eduardo Habkost 
> Reviewed-by: Igor Mammedov 
> Reviewed-by: David Gibson 
> Reviewed-by: Peter Crosthwaite 
> Acked-by: Paolo Bonzini 
> Signed-off-by: Peter Crosthwaite 
> [AF: max_cpus -> MAX_CPUMASK_BITS]
> Signed-off-by: Andreas Färber 
> ---
>  exec.c| 58 
> ++-
>  include/qom/cpu.h |  1 +
>  qom/cpu.c |  7 +++
>  3 files changed, 61 insertions(+), 5 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index ce5fadd..d817e5f 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -526,12 +526,57 @@ void tcg_cpu_address_space_init(CPUState *cpu, 
> AddressSpace *as)
>  }
>  #endif
> 
> +#ifndef CONFIG_USER_ONLY
> +static DECLARE_BITMAP(cpu_index_map, MAX_CPUMASK_BITS);
> +
> +static int cpu_get_free_index(Error **errp)
> +{
> +int cpu = find_first_zero_bit(cpu_index_map, MAX_CPUMASK_BITS);
> +
> +if (cpu >= MAX_CPUMASK_BITS) {
> +error_setg(errp, "Trying to use more CPUs than max of %d",
> +   MAX_CPUMASK_BITS);
> +return -1;
> +}

If this routine, and hence cpu_exec_init() (which is called from the realize
routine), doesn't error out when max_cpus is reached, archs supporting CPU
hotplug via device_add will find it difficult to fail the realization of a
CPU when hotplugging more than max_cpus is attempted.

An alternative is to explicitly check the returned cpu_index in the realize
call within each arch and fail if the cpu_index obtained is greater than
max_cpus. So for ppc, I could put such a check in
target-ppc/translate_init:ppc_cpu_realizefn(), but ppc_cpu_realizefn()
is a common routine for all targets under ppc, and some targets like
ppc64abi32-linux-user don't have visibility of max_cpus, which lives
in vl.c.

Any thoughts on the above problem ?

Also, is it possible to revisit the problem that use of max_cpus instead
of MAX_CPUMASK_BITS caused to xlnx-ep108 ?

Regards,
Bharata.
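
To make the alternative concrete, the per-target check described above would
be something along these lines (an illustrative fragment only; as noted,
max_cpus lives in vl.c / sysemu/sysemu.h, which is exactly why *-linux-user
targets cannot see it):

static void ppc_cpu_realizefn(DeviceState *dev, Error **errp)
{
    CPUState *cs = CPU(dev);

    if (cs->cpu_index >= max_cpus) {
        error_setg(errp, "Unable to realize CPU %d: only %d CPUs supported",
                   cs->cpu_index, max_cpus);
        return;
    }

    /* ... rest of the existing realize code ... */
}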




Re: [Qemu-devel] [PATCH for-2.4 08/12] milkymist-minimac2: Flush queued packets when link comes up

2015-07-14 Thread Michael Walle

Am 2015-07-14 09:53, schrieb Fam Zheng:

Drop .can_receive and move the semantics into minimac2_rx, by returning
0.

That is once minimac2_rx returns 0, incoming packets will be queued
until the queue is explicitly flushed. We do this when 
s->regs[R_STATE0]

or s->regs[R_STATE1] is changed in minimac2_write.

Signed-off-by: Fam Zheng 
---
 hw/net/milkymist-minimac2.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/net/milkymist-minimac2.c b/hw/net/milkymist-minimac2.c
index f06afaa..cd38a06 100644
--- a/hw/net/milkymist-minimac2.c
+++ b/hw/net/milkymist-minimac2.c
@@ -303,8 +303,7 @@ static ssize_t minimac2_rx(NetClientState *nc,
const uint8_t *buf, size_t size)
 r_state = R_STATE1;
 rx_buf = s->rx1_buf;
 } else {
-trace_milkymist_minimac2_drop_rx_frame(buf);
-return size;
+return 0;


trace removed?


 }

 /* assemble frame */
@@ -354,6 +353,7 @@ minimac2_read(void *opaque, hwaddr addr, unsigned 
size)

 return r;
 }

+static int minimac2_can_rx(MilkymistMinimac2State *s);
 static void
 minimac2_write(void *opaque, hwaddr addr, uint64_t value,
unsigned size)
@@ -387,6 +387,9 @@ minimac2_write(void *opaque, hwaddr addr, uint64_t 
value,

 case R_STATE1:
 s->regs[addr] = value;
 update_rx_interrupt(s);
+if (minimac2_can_rx(s)) {
+qemu_flush_queued_packets(qemu_get_queue(s->nic));
+}
 break;
 case R_SETUP:
 case R_COUNT0:
@@ -411,10 +414,8 @@ static const MemoryRegionOps minimac2_ops = {
 .endianness = DEVICE_NATIVE_ENDIAN,
 };

-static int minimac2_can_rx(NetClientState *nc)
+static int minimac2_can_rx(MilkymistMinimac2State *s)
 {
-MilkymistMinimac2State *s = qemu_get_nic_opaque(nc);
-
 if (s->regs[R_STATE0] == STATE_LOADED) {
 return 1;
 }


please move this above the minimac2_write call, so the forward 
declaration isn't necessary anymore.




-michael



Re: [Qemu-devel] [RFC 0/6] vSMMU initialization

2015-07-14 Thread Will Deacon
On Tue, Jul 14, 2015 at 03:21:03AM +0100, Varun Sethi wrote:
> Hi Will,

Hi Varun,

> > On Fri, Jun 12, 2015 at 03:20:04PM +0100, Baptiste Reynal wrote:
> > > The ARM SMMU has support for 2-stages address translations, allowing a
> > > virtual address to be translated at two levels:
> > > - Stage 1 translates a virtual address (VA) into an intermediate
> > > physical address (IPA)
> > > - Stage 2 translates an IPA into a physical address (PA)
> > >
> > > Will Deacon introduced a virtual SMMU interface for KVM, which gives a
> > > virtual machine the possibility to use an IOMMU with native drivers.
> > > While the VM will program the first stage of translation (stage 1),
> > > the interface will program the second (stage 2) on the physical SMMU.
> > 
> > Please note that I have no plans to merge the kernel-side of this at the
> > moment. It was merely an exploratory tool to see what a non-PV vSMMU
> > implementation might look like and certainly not intended to be used in
> > anger.
> How do you see context fault reporting working for the PV interface?

We could have an interrupt for the PV IOMMU and have the hypervisor
inject that, no?

> Currently the vSMMU interface does seem quite restrictive, and it may
> simplify things to have a PV IOMMU interface. But do you see this being
> true even in the case of SMMUv3?

I think SMMUv3 is *far* more amenable to the vSMMU approach, largely
because it moves many of the data structures into memory, but also because
it has support for things like ATS and PRI, so sharing page tables with
the CPU becomes a real possibility (and is something that doesn't work
well with a PV model).

> Just wondering if we can give the guest more control with respect to memory
> transaction attributes. Also, would it make sense to give the guest control
> of the fault handling attributes, i.e. the fault/terminate model?

I'd like to see the basics prototyped before we start trying to design
for these more advanced use-cases. I'm confident there are plenty of things
we haven't even considered at the moment.

Will



Re: [Qemu-devel] [PATCH for-2.4 08/12] milkymist-minimac2: Flush queued packets when link comes up

2015-07-14 Thread Fam Zheng
On Tue, 07/14 13:02, Michael Walle wrote:
> Am 2015-07-14 09:53, schrieb Fam Zheng:
> >Drop .can_receive and move the semantics into minimac2_rx, by returning
> >0.
> >
> >That is once minimac2_rx returns 0, incoming packets will be queued
> >until the queue is explicitly flushed. We do this when s->regs[R_STATE0]
> >or s->regs[R_STATE1] is changed in minimac2_write.
> >
> >Signed-off-by: Fam Zheng 
> >---
> > hw/net/milkymist-minimac2.c | 12 ++--
> > 1 file changed, 6 insertions(+), 6 deletions(-)
> >
> >diff --git a/hw/net/milkymist-minimac2.c b/hw/net/milkymist-minimac2.c
> >index f06afaa..cd38a06 100644
> >--- a/hw/net/milkymist-minimac2.c
> >+++ b/hw/net/milkymist-minimac2.c
> >@@ -303,8 +303,7 @@ static ssize_t minimac2_rx(NetClientState *nc,
> >const uint8_t *buf, size_t size)
> > r_state = R_STATE1;
> > rx_buf = s->rx1_buf;
> > } else {
> >-trace_milkymist_minimac2_drop_rx_frame(buf);
> >-return size;
> >+return 0;
> 
> trace removed?

Because the frame is no longer dropped as we return 0 - it is queued and will
be sent again when we do qemu_flush_queued_packets later. Should I rename the
trace point?

Fam

> 
> > }
> >
> > /* assemble frame */
> >@@ -354,6 +353,7 @@ minimac2_read(void *opaque, hwaddr addr, unsigned
> >size)
> > return r;
> > }
> >
> >+static int minimac2_can_rx(MilkymistMinimac2State *s);
> > static void
> > minimac2_write(void *opaque, hwaddr addr, uint64_t value,
> >unsigned size)
> >@@ -387,6 +387,9 @@ minimac2_write(void *opaque, hwaddr addr, uint64_t
> >value,
> > case R_STATE1:
> > s->regs[addr] = value;
> > update_rx_interrupt(s);
> >+if (minimac2_can_rx(s)) {
> >+qemu_flush_queued_packets(qemu_get_queue(s->nic));
> >+}
> > break;
> > case R_SETUP:
> > case R_COUNT0:
> >@@ -411,10 +414,8 @@ static const MemoryRegionOps minimac2_ops = {
> > .endianness = DEVICE_NATIVE_ENDIAN,
> > };
> >
> >-static int minimac2_can_rx(NetClientState *nc)
> >+static int minimac2_can_rx(MilkymistMinimac2State *s)
> > {
> >-MilkymistMinimac2State *s = qemu_get_nic_opaque(nc);
> >-
> > if (s->regs[R_STATE0] == STATE_LOADED) {
> > return 1;
> > }
> 
> please move this above the minimac2_write call, so the forward declaration
> isn't necessary anymore.
> 
> 
> 
> -michael
> 



[Qemu-devel] [PATCH] virtio-input: fix segfault in virtio_input_hid_properties

2015-07-14 Thread Lin Ma
Commit 5cce173 introduced a virtio-input segfault by leaving the property list
unterminated. This patch fixes it by adding the missing
DEFINE_PROP_END_OF_LIST() entry.

Signed-off-by: Lin Ma 
---
 hw/input/virtio-input-hid.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/input/virtio-input-hid.c b/hw/input/virtio-input-hid.c
index 616a815..4d85dad 100644
--- a/hw/input/virtio-input-hid.c
+++ b/hw/input/virtio-input-hid.c
@@ -308,6 +308,7 @@ static void virtio_input_hid_handle_status(VirtIOInput 
*vinput,
 static Property virtio_input_hid_properties[] = {
 DEFINE_PROP_STRING("display", VirtIOInputHID, display),
 DEFINE_PROP_UINT32("head", VirtIOInputHID, head, 0),
+DEFINE_PROP_END_OF_LIST(),
 };
 
 static void virtio_input_hid_class_init(ObjectClass *klass, void *data)
-- 
2.1.4




Re: [Qemu-devel] RFC: guest-side retrieval of fw_cfg file

2015-07-14 Thread Laszlo Ersek
On 07/13/15 22:09, Gabriel L. Somlo wrote:
> Hi,
> 
> A while ago I was pondering on the options available for retrieving
> a fw_cfg blob from the guest-side (now that we can insert fw_cfg
> files on the host-side command line, see commit 81b2b8106).
> 
> So over the last couple of weekends I cooked up the sysfs kernel module
> below, which lists all fw_cfg files under /sys/firmware/fw_cfg/.
> 
> I'm building it against the current Fedora (22) running kernel (after
> installing the kernel-devel rpm) using the following Makefile:
> 
> obj-m := fw_cfg.o
> KDIR := /lib/modules/$(shell uname -r)/build
> PWD := $(shell pwd)
> default:
>   $(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
> 
> I'm looking for early feedback before trying to push this into the
> kernel:
> 
> 1. Right now, only fw_cfg *files* are listed (not sure it's worth
>trying for the pre-FW_CFG_FILE_FIRST blobs, since AFAICT there's
>no good way of figuring out their size, not to mention even
>knowing which ones are present in the first place.)
> 
> 2. File names are listed in /sys/fs/fw_cfg/... with slashes replaced by
>exclamation marks, e.g.:
> 
> # ls /sys/firmware/fw_cfg/
> bootorder   etc!e820  etc!table-loader
> etc!acpi!rsdp   etc!smbios!smbios-anchor  genroms!kvmvapic.bin
> etc!acpi!tables etc!smbios!smbios-tables
> etc!boot-fail-wait  etc!system-states
> 
>That's done automatically by kobject_init_and_add(), and I'm hoping
>it's acceptable, since the alternative involves parsing file names
>and building some sort of hierarchy of ksets representing subfolders
>like "etc", "etc/smbios/", "etc/acpi/", "opt/whatever/", etc.

In general I hope Matt will respond to you; I have just one note
(although I have no horse in this race :)): I think it would be really
useful if you could somehow translate the fw_cfg "pathname components"
to a real directory structure. Utilities like "find", "dirname",
"basename" etc are generally great, and it would be nice to preserve
that flexibility.

Anyway, this is just an idea (because I like "find" so much :)); feel
free to ignore it.

Thanks!
Laszlo
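
For illustration, one way to build such a directory tree would be to create a
kobject per path component. The sketch below is hypothetical and, for brevity,
neither reuses directories that were already created nor cleans up on error:

/* Create /sys/firmware/fw_cfg/etc/acpi/... style directories for a path
 * such as "etc/acpi/tables".  "root" is the fw_cfg root kobject; "path"
 * is a writable copy of the fw_cfg file name.  Illustrative only. */
static struct kobject *fw_cfg_build_dirs(struct kobject *root, char *path)
{
    struct kobject *dir = root;

    /* one kobject per directory component; stop before the leaf name */
    while (path && strchr(path, '/')) {
        char *comp = strsep(&path, "/");

        dir = kobject_create_and_add(comp, dir);
        if (!dir)
            return NULL;
    }
    /* "dir" is now the parent under which the leaf entry can be added */
    return dir;
}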

> 3. I'm currently only handling x86 and I/O ports. I could drop the
>fw_cfg_dmi_whitelist and just check the signature, using mmio where
>appropriate, but I don't have a handy-dandy set of VMs for those
>architectures on which I could test. Wondering if that's something
>we should have before I officially try to submit this to the kernel,
>or whether it could wait for a second iteration.
> 
>Speaking of the kernel: My default plan is to subscribe to
>kernelnewb...@kernelnewbies.org and submit it there (this is after
>all baby's first kernel module :) but if there's a better route for
>pushing it upstream, please feel free to suggest it to me.
> 
> Thanks much,
> --Gabriel
> 
> PS. This still leaves me with the open question of how to do something
> similar on Windows, but I'm less bothered by the concept of compiling
> an in-house, ad-hoc, binary-from-hell program to simply read/write from
> the fw_cfg I/O ports on Windows :) Although in principle it'd be
> nice to have a standard, cross-platform way of doing things, likely
> involving the guest agent...
> 
> 
> /* fw_cfg.c
>  *
>  * Expose entries from QEMU's firmware configuration (fw_cfg) device in
>  * sysfs (read-only, under "/sys/firmware/fw_cfg/").
>  *
>  * NOTE: '/' chars in fw_cfg file names automatically converted to '!' by
>  *   the kobject_init_and_add() call.
>  */
> 
> #include 
> #include 
> #include 
> #include 
> 
> 
> /* fw_cfg i/o ports */
> #define FW_CFG_PORT_CTL  0x510
> #define FW_CFG_PORT_DATA 0x511
> 
> /* selector values for "well-known" fw_cfg entries */
> #define FW_CFG_SIGNATURE  0x00
> #define FW_CFG_FILE_DIR   0x19
> 
> /* size in bytes of fw_cfg signature */
> #define FW_CFG_SIG_SIZE 4
> 
> /* fw_cfg "file name" is up to 56 characters (including terminating nul) */
> #define FW_CFG_MAX_FILE_PATH 56
> 
> /* fw_cfg file directory entry type */
> struct fw_cfg_file {
>   uint32_t size;
>   uint16_t select;
>   uint16_t reserved;
>   char name[FW_CFG_MAX_FILE_PATH];
> };
> 
> 
> /* provide atomic read access to hardware fw_cfg device
>  * (critical section involves potentially lengthy i/o, using mutex) */
> static DEFINE_MUTEX(fw_cfg_dev_lock);
> 
> /* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> static inline void fw_cfg_read_blob(uint16_t select,
>void *buf, loff_t pos, size_t count)
> {
>   mutex_lock(&fw_cfg_dev_lock);
>   outw(select, FW_CFG_PORT_CTL);
>   while (pos-- > 0)
>   inb(FW_CFG_PORT_DATA);
>   insb(FW_CFG_PORT_DATA, buf, count);
>   mutex_unlock(&fw_cfg_dev_lock);
> }
> 
> 
> /* fw_cfg_sysfs_entry type */
> struct fw_cfg_sysfs_entry {
>   struct kobject kobj;
>   struct fw_cfg_file f;
>   struct list_head list;
> };
> 
> /* 
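
As a small usage illustration (not part of the posted module): the
FW_CFG_FILE_DIR blob begins with a big-endian 32-bit entry count followed by
the struct fw_cfg_file records, so a hypothetical helper built on
fw_cfg_read_blob() could look like:

/* Hypothetical helper: read the number of entries in the fw_cfg file
 * directory.  The directory blob starts with a big-endian u32 count,
 * followed by that many struct fw_cfg_file records. */
static u32 fw_cfg_file_dir_count(void)
{
    __be32 count;

    fw_cfg_read_blob(FW_CFG_FILE_DIR, &count, 0, sizeof(count));
    return be32_to_cpu(count);
}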

Re: [Qemu-devel] [Xen-devel] [PATCH RFC 1 6/8] xen/pt: Make xen_pt_unregister_device idempotent

2015-07-14 Thread Stefano Stabellini
On Thu, 2 Jul 2015, Konrad Rzeszutek Wilk wrote:
> > > @@ -858,15 +863,20 @@ static void xen_pt_unregister_device(PCIDevice *d)
> > > machine_irq, errno);
> > >  }
> > >  }
> > > +s->machine_irq = 0;
> > >  }
> > >  
> > >  /* delete all emulated config registers */
> > >  xen_pt_config_delete(s);
> > >  
> > > -memory_listener_unregister(&s->memory_listener);
> > > -memory_listener_unregister(&s->io_listener);
> > > -
> > > -xen_host_pci_device_put(&s->real_device);
> > > +if (s->listener_set) {
> > > +memory_listener_unregister(&s->memory_listener);
> > > +memory_listener_unregister(&s->io_listener);
> > > +s->listener_set = false;
> > 
> > If you call QTAILQ_INIT on memory_listener and io_listener, then you
> > simply check on QTAILQ_EMPTY and remove listener_set.
> 
> No dice. For that to work they need to be QTAILQ_HEAD while
> they are of type QTAILQ_ENTRY.
> 
> That is, the include/exec/memory.h:
> struct MemoryListener {
>  ...
>   AddressSpace *address_space_filter;
>   QTAILQ_ENTRY(MemoryListener) link;
> }
> 
> And the QTAILQ_EMPTY checks for '->tqh_first' while the
> QTAILQ_ENTRY only defines two pointers: tqe_next and tqe_prev.

Ah, that is true.  In that case maybe it is OK to introduce
listener_set, rather than messing up with QTAILQ definitions or having
checks like memory_listener->link->tqe_next ==
memory_listener->link->tqe_prev.
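
For reference, the distinction at issue, sketched after qemu/queue.h
(illustrative declarations only):

/* QTAILQ_EMPTY()/QTAILQ_INIT() operate on a list *head*; MemoryListener
 * only embeds an *entry*, so there is no tqh_first to test. */
QTAILQ_HEAD(, MemoryListener) listeners;   /* head: has tqh_first/tqh_last */

struct MemoryListener {
    /* ... */
    QTAILQ_ENTRY(MemoryListener) link;     /* entry: only tqe_next/tqe_prev */
};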



Re: [Qemu-devel] [PULL v3 05/22] cpu: Convert cpu_index into a bitmap

2015-07-14 Thread Igor Mammedov
On Tue, 14 Jul 2015 16:08:54 +0530
Bharata B Rao  wrote:

> On Thu, Jul 09, 2015 at 03:23:55PM +0200, Andreas Färber wrote:
> > From: Bharata B Rao 
> > 
> > Currently CPUState::cpu_index is monotonically increasing and a newly
> > created CPU always gets the next higher index. The next available
> > index is calculated by counting the existing number of CPUs. This is
> > fine as long as we only add CPUs, but there are architectures which
> > are starting to support CPU removal, too. For an architecture like PowerPC
> > which derives its CPU identifier (device tree ID) from cpu_index, the
> > existing logic of generating cpu_index values causes problems.
> > 
> > With the currently proposed method of handling vCPU removal by parking
> > the vCPU fd in QEMU
> > (Ref: http://lists.gnu.org/archive/html/qemu-devel/2015-02/msg02604.html),
> > generating cpu_index this way will not work for PowerPC.
> > 
> > This patch changes the way cpu_index is handed out by maintaining
> > a bit map of the CPUs that tracks both addition and removal of CPUs.
> > 
> > The CPU bitmap allocation logic is part of cpu_exec_init(), which is
> > called by instance_init routines of various CPU targets. Newly added
> > cpu_exec_exit() API handles the deallocation part and this routine is
> > called from generic CPU instance_finalize.
> > 
> > Note: This new CPU enumeration is for !CONFIG_USER_ONLY only.
> > CONFIG_USER_ONLY continues to have the old enumeration logic.
> > 
> > Signed-off-by: Bharata B Rao 
> > Reviewed-by: Eduardo Habkost 
> > Reviewed-by: Igor Mammedov 
> > Reviewed-by: David Gibson 
> > Reviewed-by: Peter Crosthwaite 
> > Acked-by: Paolo Bonzini 
> > Signed-off-by: Peter Crosthwaite 
> > [AF: max_cpus -> MAX_CPUMASK_BITS]
> > Signed-off-by: Andreas Färber 
> > ---
> >  exec.c| 58 
> > ++-
> >  include/qom/cpu.h |  1 +
> >  qom/cpu.c |  7 +++
> >  3 files changed, 61 insertions(+), 5 deletions(-)
> > 
> > diff --git a/exec.c b/exec.c
> > index ce5fadd..d817e5f 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -526,12 +526,57 @@ void tcg_cpu_address_space_init(CPUState *cpu, 
> > AddressSpace *as)
> >  }
> >  #endif
> > 
> > +#ifndef CONFIG_USER_ONLY
> > +static DECLARE_BITMAP(cpu_index_map, MAX_CPUMASK_BITS);
> > +
> > +static int cpu_get_free_index(Error **errp)
> > +{
> > +int cpu = find_first_zero_bit(cpu_index_map, MAX_CPUMASK_BITS);
> > +
> > +if (cpu >= MAX_CPUMASK_BITS) {
> > +error_setg(errp, "Trying to use more CPUs than max of %d",
> > +   MAX_CPUMASK_BITS);
> > +return -1;
> > +}
> 
> If this routine and hence cpu_exec_init() (which is called from realize
> routine) don't error out when max_cpus is reached, archs supporting CPU
> hotplug using device_add will find it difficult to fail the realization of
> CPU when hotplugging of more than max_cpus is attempted.
> 
> An alternative is to explicitly check for the returned cpu_index
> in realize call within each arch and fail if the cpu_index obtained
> is greater than max_cpus. So for ppc, I could put such a check in
> target-ppc/translate_init:ppc_cpu_realizefn(), but ppc_cpu_realizefn()
> is a common routine for all targets under ppc and some targets like
> ppc64abi32-linux-user don't have visibility to max_cpus which is
> in vl.c.
> 
> Any thoughts on the above problem ?
we already have MachineClass.max_cpus, which is the max
supported limit of the machine type.
Perhaps make max_cpus a property of MachineState
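
For illustration, that idea could look roughly like the sketch below;
current_machine_max_cpus() is an assumed accessor for the machine's CPU
limit, not an existing QEMU interface:

/* Sketch only: have the index allocator itself reject anything beyond the
 * machine's CPU limit, so adding a CPU past max_cpus fails at realize time. */
static int cpu_get_free_index(Error **errp)
{
    int cpu = find_first_zero_bit(cpu_index_map, MAX_CPUMASK_BITS);

    if (cpu >= current_machine_max_cpus()) {    /* assumed helper */
        error_setg(errp, "Trying to use more CPUs than machine limit of %d",
                   current_machine_max_cpus());
        return -1;
    }

    bitmap_set(cpu_index_map, cpu, 1);
    return cpu;
}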

> 
> Also, is it possible to revisit the problem that use of max_cpus instead
> of MAX_CPUMASK_BITS caused to xlnx-ep108 ?
> 
> Regards,
> Bharata.
> 
> 




[Qemu-devel] unable to auth to vnc with qemu 2.4.0-rc0

2015-07-14 Thread Vasiliy Tolstov
vncviewer can't auth to qemu vnc with plain auth:
Client sock=23 ws=0 auth=2 subauth=0
New client on socket 23
vnc_set_share_mode/23: undefined -> connecting
Write Plain: Pending output 0x7fc8eba5dbe0 size 1036 offset 12. Wait SSF 0
Wrote wire 0x7fc8eba5dbe0 12 -> 12
Read plain (nil) size 0 offset 0
Read wire 0x7fc8eaec3890 4096 -> 12
Client request protocol version 3.8
Telling client we support auth 2
Write Plain: Pending output 0x7fc8eba5dbe0 size 1036 offset 2. Wait SSF 0
Wrote wire 0x7fc8eba5dbe0 2 -> 2
Read plain 0x7fc8eaec3890 size 5120 offset 0
Read wire 0x7fc8eaec3890 4096 -> 1
Client requested auth 2
Start VNC auth
Write Plain: Pending output 0x7fc8eba5dbe0 size 1036 offset 16. Wait SSF 0
Wrote wire 0x7fc8eba5dbe0 16 -> 16
Read plain 0x7fc8eaec3890 size 5120 offset 0
Read wire 0x7fc8eaec3890 4096 -> 16
Client challenge response did not match
Write Plain: Pending output 0x7fc8eba5dbe0 size 1036 offset 30. Wait SSF 0
Wrote wire 0x7fc8eba5dbe0 30 -> 30
Closing down client sock: protocol error
vnc_set_share_mode/23: connecting -> disconnected


-- 
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru



[Qemu-devel] [PATCH] memory: fix refcount leak in memory_region_present

2015-07-14 Thread Paolo Bonzini
memory_region_present() leaks a reference to a MemoryRegion in the
case "mr == container".  While fixing it, avoid reference counting
altogether for memory_region_present(), by using RCU only.

Reported-by: Peter Maydell 
Signed-off-by: Paolo Bonzini 
---
 memory.c | 44 
 1 file changed, 28 insertions(+), 16 deletions(-)

diff --git a/memory.c b/memory.c
index 5a0cc66..0acebb1 100644
--- a/memory.c
+++ b/memory.c
@@ -1887,23 +1887,16 @@ static FlatRange *flatview_lookup(FlatView *view, 
AddrRange addr)
sizeof(FlatRange), cmp_flatrange_addr);
 }
 
-bool memory_region_present(MemoryRegion *container, hwaddr addr)
-{
-MemoryRegion *mr = memory_region_find(container, addr, 1).mr;
-if (!mr || (mr == container)) {
-return false;
-}
-memory_region_unref(mr);
-return true;
-}
-
 bool memory_region_is_mapped(MemoryRegion *mr)
 {
 return mr->container ? true : false;
 }
 
-MemoryRegionSection memory_region_find(MemoryRegion *mr,
-   hwaddr addr, uint64_t size)
+/* Same as memory_region_find, but it does not add a reference to the
+ * returned region.  It must be called from an RCU critical section.
+ */
+static MemoryRegionSection memory_region_find_rcu(MemoryRegion *mr,
+  hwaddr addr, uint64_t size)
 {
 MemoryRegionSection ret = { .mr = NULL };
 MemoryRegion *root;
@@ -1924,11 +1917,10 @@ MemoryRegionSection memory_region_find(MemoryRegion *mr,
 }
 range = addrrange_make(int128_make64(addr), int128_make64(size));
 
-rcu_read_lock();
 view = atomic_rcu_read(&as->current_map);
 fr = flatview_lookup(view, range);
 if (!fr) {
-goto out;
+return ret;
 }
 
 while (fr > view->ranges && addrrange_intersects(fr[-1].addr, range)) {
@@ -1944,12 +1936,32 @@ MemoryRegionSection memory_region_find(MemoryRegion *mr,
 ret.size = range.size;
 ret.offset_within_address_space = int128_get64(range.start);
 ret.readonly = fr->readonly;
-memory_region_ref(ret.mr);
-out:
+return ret;
+}
+
+MemoryRegionSection memory_region_find(MemoryRegion *mr,
+   hwaddr addr, uint64_t size)
+{
+MemoryRegionSection ret;
+rcu_read_lock();
+ret = memory_region_find_rcu(mr, addr, size);
+if (ret.mr) {
+memory_region_ref(ret.mr);
+}
 rcu_read_unlock();
 return ret;
 }
 
+bool memory_region_present(MemoryRegion *container, hwaddr addr)
+{
+MemoryRegion *mr;
+
+rcu_read_lock();
+mr = memory_region_find_rcu(container, addr, 1).mr;
+rcu_read_unlock();
+return mr && mr != container;
+}
+
 void address_space_sync_dirty_bitmap(AddressSpace *as)
 {
 FlatView *view;
-- 
2.4.3




Re: [Qemu-devel] unable to auth to vnc with qemu 2.4.0-rc0

2015-07-14 Thread Paolo Bonzini
On 14/07/2015 13:58, Vasiliy Tolstov wrote:
> vncviewer can't auto to qemu vnc with plain auth:
> Client sock=23 ws=0 auth=2 subauth=0
> New client on socket 23
> vnc_set_share_mode/23: undefined -> connecting
> Write Plain: Pending output 0x7fc8eba5dbe0 size 1036 offset 12. Wait SSF 0
> Wrote wire 0x7fc8eba5dbe0 12 -> 12
> Read plain (nil) size 0 offset 0
> Read wire 0x7fc8eaec3890 4096 -> 12
> Client request protocol version 3.8
> Telling client we support auth 2
> Write Plain: Pending output 0x7fc8eba5dbe0 size 1036 offset 2. Wait SSF 0
> Wrote wire 0x7fc8eba5dbe0 2 -> 2
> Read plain 0x7fc8eaec3890 size 5120 offset 0
> Read wire 0x7fc8eaec3890 4096 -> 1
> Client requested auth 2
> Start VNC auth
> Write Plain: Pending output 0x7fc8eba5dbe0 size 1036 offset 16. Wait SSF 0
> Wrote wire 0x7fc8eba5dbe0 16 -> 16
> Read plain 0x7fc8eaec3890 size 5120 offset 0
> Read wire 0x7fc8eaec3890 4096 -> 16
> Client challenge response did not match
> Write Plain: Pending output 0x7fc8eba5dbe0 size 1036 offset 30. Wait SSF 0
> Wrote wire 0x7fc8eba5dbe0 30 -> 30
> Closing down client sock: protocol error
> vnc_set_share_mode/23: connecting -> disconnected

Can you send the config.log and config-host.mak file from the build
directory?

Paolo



Re: [Qemu-devel] unable to auth to vnc with qemu 2.4.0-rc0

2015-07-14 Thread Vasiliy Tolstov
2015-07-14 15:01 GMT+03:00 Paolo Bonzini :
> Can you send the config.log and config-host.mak file from the build
> directory?


I can send only the jenkins build log =(, maybe it helps:
http://cdn.selfip.ru/public/qemu.log

-- 
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru



Re: [Qemu-devel] [PATCH] more check for replaced node

2015-07-14 Thread Stefan Hajnoczi
On Mon, Jul 13, 2015 at 10:41:53AM +0800, Wen Congyang wrote:
> On 07/02/2015 10:48 PM, Stefan Hajnoczi wrote:
> > On Wed, Jul 01, 2015 at 04:49:40PM +0800, Wen Congyang wrote:
> >> On 07/01/2015 04:39 PM, Stefan Hajnoczi wrote:
> >>> On Thu, Jun 25, 2015 at 02:55:10PM +0800, Wen Congyang wrote:
>  Signed-off-by: Wen Congyang 
>  ---
>   block.c   | 5 +++--
>   block/mirror.c| 3 ++-
>   blockdev.c| 2 +-
>   include/block/block.h | 3 ++-
>   4 files changed, 8 insertions(+), 5 deletions(-)
> >>>
> >>> This patch is missing a commit description.  What is the justification
> >>> for this change?
> >>
> >> Sorry, I forgot to add commit messages..
> >>
> >> Without this patch, the replace node can be any node, and it can be
> >> top BDS with BB, or another quorum's child. With this patch, the replace 
> >> node
> >> must be this quorum's child.
> > 
> > I think the point of the replace operation was to swap a quorum child
> > with a new drive.  It sounds like this patch will break that use case?
> > 
> > The idea was that a failed child needs to be replaced.  The user adds a
> > new -drive and then uses the mirror+replace to include it into the
> > quorum.  I think the new child is not be part of the quorum BDS graph
> > until replacement occurs.
> 
> bs/s->common.bs is quorum, and to_replace is the broken child.
> The new child is target_bs. With this patch, we just check if
> the broken child is part of the quorum BDS.
> 
> Do I misunderstand something?

Thanks for explaining.  That makes sense.

Please resend with a commit description.

Stefan




Re: [Qemu-devel] [RFC PATCH qemu v2 1/5] vfio: Switch from TARGET_PAGE_MASK to qemu_real_host_page_mask

2015-07-14 Thread Alexey Kardashevskiy

On 07/14/2015 04:58 PM, Alexey Kardashevskiy wrote:

On 07/14/2015 05:13 AM, Alex Williamson wrote:

On Tue, 2015-07-14 at 00:56 +1000, Alexey Kardashevskiy wrote:

These started switching from TARGET_PAGE_MASK (hardcoded as 4K) to
a real host page size:
4e51361d7 "cpu-all: complete "real" host page size API" and
f7ceed190 "vfio: cpu: Use "real" page size API"

This finished the transition by:
- %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/
- %s/TARGET_PAGE_ALIGN/REAL_HOST_PAGE_ALIGN/
- removing bitfield length for offsets in VFIOQuirk::data as
qemu_real_host_page_mask is not a macro


Can we assume that none of the changes to quirks have actually been
tested?


No, why? :) I tried it on one of NVIDIAs I got here -
VGA compatible controller: NVIDIA Corporation GM107GL [Quadro K2200] (rev a2)
The driver was from NVIDIA (not nouveau) and the test was "acos" (some
basic CUDA test).


Ufff. My bad. CUDA only works if I remove the quirks entirely.

Using the host page size helps to boot the guest with this Quadro (otherwise
it aborts in kvm_set_phys_mem()), but a simple CUDA test produces an EEH:


aik@aiktest-u14:~$ ./acos
[   47.488858] EEH: Frozen PHB#0-PE#1 detected
[   47.488861] EEH: PE location: vfio_vfio-pci::00:01.0, PHB location: 
vfio_vfio-pci::00:01.0


[   51.559681] Unable to handle kernel paging request for data at address 
0x

[   51.559873] Faulting instruction address: 0xd550d168
[   51.560038] Oops: Kernel access of bad area, sig: 11 [#1]


Does this mean "nack" to %s/TARGET_PAGE_MASK/qemu_real_host_page_mask/ in 
quirks?



--
Alexey



Re: [Qemu-devel] [PATCH v2] net: Flush queued packets when guest resumes

2015-07-14 Thread Stefan Hajnoczi
On Fri, Jul 10, 2015 at 05:03:22PM +0800, Fam Zheng wrote:
> On Tue, 07/07 12:19, Michael S. Tsirkin wrote:
> > On Tue, Jul 07, 2015 at 05:09:09PM +0800, Fam Zheng wrote:
> > > On Tue, 07/07 11:13, Michael S. Tsirkin wrote:
> > > > On Tue, Jul 07, 2015 at 09:21:07AM +0800, Fam Zheng wrote:
> > > > > Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and 
> > > > > friends,
> > > > > net queues need to be explicitly flushed after qemu_can_send_packet()
> > > > > returns false, because the netdev side will disable the polling of fd.
> > > > > 
> > > > > This fixes the case of "cont" after "stop" (or migration).
> > > > > 
> > > > > Signed-off-by: Fam Zheng 
> > > > 
> > > > Note virtio has its own handler which must be used to
> > > > flush packets - this one might run too early or too late.
> > > 
> > > Which handler do you mean? I don't think virtio-net handles resume now. 
> > > (If it
> > > does, we probably should drop it together with this change, since it's 
> > > needed
> > > by as all NICs.)
> > > 
> > > Fam
> > 
> > virtio_vmstate_change
> > 
> > It's all far from trivial. I suspect these whack-a-mole approach
> > spreading purge here and there will only create more bugs.
> > 
> > Why would we ever need to process network packets when
> > VM is not running? I don't see any point to it.
> > How about we simply stop the job processing network on
> > vm stop and restart on vm start?
> 
> I suppose it is too much for 2.4. I think this approach, adding
> qemu_flush_queued_packets(), is consistent with its existing usage (when a
> device is becoming active from inactive), like in e1000_write_config.
> 
> How about applying this and let's work on "stopping tap when VM not running"
> for 2.5?

Jason is happy with this and the virtio-net .can_receive() patch.

Michael: Any further comments?  Are you okay with this patch too?




[Qemu-devel] [PATCH RFC for-2.4] hw: set DIRTY_MEMORY_VGA on RAM that can be used for the framebuffer

2015-07-14 Thread Paolo Bonzini
The pxa2xx_lcd, omap_lcdc, pl110 and milkymist-vgafb devices use
framebuffer.c to render an image from a shared memory framebuffer.
With KVM, DIRTY_MEMORY_VGA always had to be enabled explicitly
on RAM memory regions that can be used for the framebuffer, and
the 2.4 changes to dirty bitmap handling made that mandatory for
TCG as well.

If the board does not set DIRTY_MEMORY_VGA, framebuffer.c detects this
(commit d55d420, framebuffer: check memory_region_is_logging,
2015-03-23) and always invalidates the screen; this is inefficient,
to the point that blocking X11 calls on slow network connections
can prevent the guest from making progress.  Fix this by setting
the flag on the boards that use the aforementioned devices.

Reported-by: Peter Maydell 
Signed-off-by: Paolo Bonzini 
---
This is a layering violation, but it's the simplest patch
that works.

I also have worked on Peter's suggestion that framebuffer.c could
manage DIRTY_MEMORY_VGA.  The (prototype) patch however is a
bit large, so perhaps it's better to have this in 2.4 instead.
Posting this one as RFC for now.

 hw/arm/integratorcp.c| 1 +
 hw/arm/omap1.c   | 2 ++
 hw/arm/pxa2xx.c  | 4 
 hw/arm/versatilepb.c | 1 +
 hw/display/framebuffer.c | 5 -
 hw/lm32/milkymist.c  | 1 +
 6 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/arm/integratorcp.c b/hw/arm/integratorcp.c
index 0fbbf99..30abb8c 100644
--- a/hw/arm/integratorcp.c
+++ b/hw/arm/integratorcp.c
@@ -569,6 +569,7 @@ static void integratorcp_init(MachineState *machine)
 
 memory_region_allocate_system_memory(ram, NULL, "integrator.ram",
  ram_size);
+memory_region_set_log(ram, true, DIRTY_MEMORY_VGA);
 /* ??? On a real system the first 1Mb is mapped as SSRAM or boot flash.  */
 /* ??? RAM should repeat to fill physical memory space.  */
 /* SDRAM at address zero*/
diff --git a/hw/arm/omap1.c b/hw/arm/omap1.c
index de2b289..164b9bf 100644
--- a/hw/arm/omap1.c
+++ b/hw/arm/omap1.c
@@ -3880,9 +3880,11 @@ struct omap_mpu_state_s *omap310_mpu_init(MemoryRegion 
*system_memory,
 /* Memory-mapped stuff */
 memory_region_allocate_system_memory(&s->emiff_ram, NULL, "omap1.dram",
  s->sdram_size);
+memory_region_set_log(&s->emiff_ram, true, DIRTY_MEMORY_VGA);
 memory_region_add_subregion(system_memory, OMAP_EMIFF_BASE, &s->emiff_ram);
 memory_region_init_ram(&s->imif_ram, NULL, "omap1.sram", s->sram_size,
&error_abort);
+memory_region_set_log(&s->imif_ram, true, DIRTY_MEMORY_VGA);
 vmstate_register_ram_global(&s->imif_ram);
 memory_region_add_subregion(system_memory, OMAP_IMIF_BASE, &s->imif_ram);
 
diff --git a/hw/arm/pxa2xx.c b/hw/arm/pxa2xx.c
index ec353f7..91401d7 100644
--- a/hw/arm/pxa2xx.c
+++ b/hw/arm/pxa2xx.c
@@ -2080,10 +2080,12 @@ PXA2xxState *pxa270_init(MemoryRegion *address_space,
 /* SDRAM & Internal Memory Storage */
 memory_region_init_ram(&s->sdram, NULL, "pxa270.sdram", sdram_size,
&error_abort);
+memory_region_set_log(&s->sdram, true, DIRTY_MEMORY_VGA);
 vmstate_register_ram_global(&s->sdram);
 memory_region_add_subregion(address_space, PXA2XX_SDRAM_BASE, &s->sdram);
 memory_region_init_ram(&s->internal, NULL, "pxa270.internal", 0x4,
&error_abort);
+memory_region_set_log(&s->internal, true, DIRTY_MEMORY_VGA);
 vmstate_register_ram_global(&s->internal);
 memory_region_add_subregion(address_space, PXA2XX_INTERNAL_BASE,
 &s->internal);
@@ -2214,10 +2216,12 @@ PXA2xxState *pxa255_init(MemoryRegion *address_space, 
unsigned int sdram_size)
 /* SDRAM & Internal Memory Storage */
 memory_region_init_ram(&s->sdram, NULL, "pxa255.sdram", sdram_size,
&error_abort);
+memory_region_set_log(&s->sdram, true, DIRTY_MEMORY_VGA);
 vmstate_register_ram_global(&s->sdram);
 memory_region_add_subregion(address_space, PXA2XX_SDRAM_BASE, &s->sdram);
 memory_region_init_ram(&s->internal, NULL, "pxa255.internal",
PXA2XX_INTERNAL_SIZE, &error_abort);
+memory_region_set_log(&s->internal, true, DIRTY_MEMORY_VGA);
 vmstate_register_ram_global(&s->internal);
 memory_region_add_subregion(address_space, PXA2XX_INTERNAL_BASE,
 &s->internal);
diff --git a/hw/arm/versatilepb.c b/hw/arm/versatilepb.c
index 6c69f4e..c7fbe4d 100644
--- a/hw/arm/versatilepb.c
+++ b/hw/arm/versatilepb.c
@@ -228,6 +228,7 @@ static void versatile_init(MachineState *machine, int 
board_id)
 
 memory_region_allocate_system_memory(ram, NULL, "versatile.ram",
  machine->ram_size);
+memory_region_set_log(ram, true, DIRTY_MEMORY_VGA);
 /* ??? RAM should repeat to fill physical memory space.  */
 /* 

[Qemu-devel] [RFC PATCH qemu v3 4/4] vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering)

2015-07-14 Thread Alexey Kardashevskiy
This makes use of the new "memory registering" feature. The idea is
to provide the userspace ability to notify the host kernel about pages
which are going to be used for DMA. Having this information, the host
kernel can pin them all once per user process, do locked pages
accounting (once) and not spend time doing that at run time, with
possible failures which cannot be handled nicely in some cases.

This adds a guest RAM memory listener which notifies a VFIO container
about memory which needs to be pinned/unpinned. VFIO MMIO regions
(i.e. "skip dump" regions) are skipped.

The feature is only enabled for SPAPR IOMMU v2. The host kernel changes
are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this does
not call it when v2 is detected and enabled.

This does not change the guest visible interface.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v3:
* new RAM listener skips BARs (i.e. "skip dump" regions)

v2:
* added another listener for RAM
---
 hw/vfio/common.c  | 99 +++
 include/hw/vfio/vfio-common.h |  1 +
 trace-events  |  2 +
 3 files changed, 94 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 6982b8f..d78a83f 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -406,6 +406,19 @@ static void vfio_listener_region_add(VFIOContainer 
*container,
 goto error_exit;
 }
 break;
+
+case VFIO_SPAPR_TCE_v2_IOMMU: {
+struct vfio_iommu_spapr_register_memory reg = {
+.argsz = sizeof(reg),
+.flags = 0,
+.vaddr = (uint64_t) vaddr,
+.size = end - iova
+};
+
+ret = ioctl(container->fd, VFIO_IOMMU_SPAPR_REGISTER_MEMORY, ®);
+trace_vfio_ram_register(reg.vaddr, reg.size, ret ? -errno : 0);
+break;
+}
 }
 
 return;
@@ -486,6 +499,26 @@ static void vfio_listener_region_del(VFIOContainer 
*container,
  "0x%"HWADDR_PRIx") = %d (%m)",
  container, iova, end - iova, ret);
 }
+
+switch (container->iommu_data.type) {
+case VFIO_SPAPR_TCE_v2_IOMMU:
+if (!memory_region_is_iommu(section->mr)) {
+void *vaddr = memory_region_get_ram_ptr(section->mr) +
+section->offset_within_region +
+(iova - section->offset_within_address_space);
+struct vfio_iommu_spapr_register_memory reg = {
+.argsz = sizeof(reg),
+.flags = 0,
+.vaddr = (uint64_t) vaddr,
+.size = end - iova
+};
+
+ret = ioctl(container->fd, VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY,
+®);
+trace_vfio_ram_unregister(reg.vaddr, reg.size, ret ? -errno : 0);
+}
+break;
+}
 }
 
 static void vfio_type1_iommu_listener_region_add(MemoryListener *listener,
@@ -550,8 +583,42 @@ static const MemoryListener vfio_spapr_iommu_listener = {
 .region_del = vfio_spapr_iommu_listener_region_del,
 };
 
+static void vfio_spapr_ram_listener_region_add(MemoryListener *listener,
+   MemoryRegionSection *section)
+{
+VFIOContainer *container = container_of(listener, VFIOContainer,
+iommu_data.spapr.ram_listener);
+
+if (memory_region_is_skip_dump(section->mr)) {
+return;
+}
+vfio_listener_region_add(container, qemu_real_host_page_mask, listener,
+ section);
+}
+
+static void vfio_spapr_ram_listener_region_del(MemoryListener *listener,
+   MemoryRegionSection *section)
+{
+VFIOContainer *container = container_of(listener, VFIOContainer,
+iommu_data.spapr.ram_listener);
+
+if (memory_region_is_skip_dump(section->mr)) {
+return;
+}
+vfio_listener_region_del(container, qemu_real_host_page_mask, listener,
+ section);
+}
+
+static const MemoryListener vfio_spapr_ram_listener = {
+.region_add = vfio_spapr_ram_listener_region_add,
+.region_del = vfio_spapr_ram_listener_region_del,
+};
+
 static void vfio_listener_release(VFIOContainer *container)
 {
+if (container->iommu_data.type == VFIO_SPAPR_TCE_v2_IOMMU) {
+memory_listener_unregister(&container->iommu_data.spapr.ram_listener);
+}
 memory_listener_unregister(&container->iommu_data.type1.listener);
 }
 
@@ -765,14 +832,18 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as)
 
 container->iommu_data.type1.initialized = true;
 
-} else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
+} else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU) ||
+   ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU)) {
+bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU);
+

[Qemu-devel] [RFC PATCH qemu v3 2/4] vfio: Use different page size for different IOMMU types

2015-07-14 Thread Alexey Kardashevskiy
The existing memory listener is called on RAM or PCI address space
which implies potentially different page size. Instead of guessing
what page size should be used, this replaces a single IOMMU memory
listener by two, one per supported IOMMU type; listener callbacks
call the existing helpers with a known page size.

For now, Type1 uses qemu_real_host_page_mask, sPAPR uses the page size
from IOMMU.

As memory_region_iommu_get_page_sizes() asserts on non-IOMMU regions,
this adds a memory_region_is_iommu() check to the SPAPR IOMMU listener to
skip non-IOMMU regions (such as the MSIX window), which duplicates
vfio_listener_skipped_section() a bit.

Signed-off-by: Alexey Kardashevskiy 
---
 hw/vfio/common.c | 95 
 1 file changed, 76 insertions(+), 19 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 85ee9b0..aad41e1 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -312,11 +312,11 @@ out:
 rcu_read_unlock();
 }
 
-static void vfio_listener_region_add(MemoryListener *listener,
+static void vfio_listener_region_add(VFIOContainer *container,
+ hwaddr page_mask,
+ MemoryListener *listener,
  MemoryRegionSection *section)
 {
-VFIOContainer *container = container_of(listener, VFIOContainer,
-iommu_data.type1.listener);
 hwaddr iova, end;
 Int128 llend;
 void *vaddr;
@@ -330,16 +330,16 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 return;
 }
 
-if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
- (section->offset_within_region & ~TARGET_PAGE_MASK))) {
+if (unlikely((section->offset_within_address_space & ~page_mask) !=
+ (section->offset_within_region & ~page_mask))) {
 error_report("%s received unaligned region", __func__);
 return;
 }
 
-iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+iova = ROUND_UP(section->offset_within_address_space, ~page_mask + 1);
 llend = int128_make64(section->offset_within_address_space);
 llend = int128_add(llend, section->size);
-llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
+llend = int128_and(llend, int128_exts64(page_mask));
 
 if (int128_ge(int128_make64(iova), llend)) {
 return;
@@ -416,11 +416,11 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 }
 }
 
-static void vfio_listener_region_del(MemoryListener *listener,
+static void vfio_listener_region_del(VFIOContainer *container,
+ hwaddr page_mask,
+ MemoryListener *listener,
  MemoryRegionSection *section)
 {
-VFIOContainer *container = container_of(listener, VFIOContainer,
-iommu_data.type1.listener);
 hwaddr iova, end;
 int ret;
 
@@ -432,8 +432,8 @@ static void vfio_listener_region_del(MemoryListener 
*listener,
 return;
 }
 
-if (unlikely((section->offset_within_address_space & ~TARGET_PAGE_MASK) !=
- (section->offset_within_region & ~TARGET_PAGE_MASK))) {
+if (unlikely((section->offset_within_address_space & ~page_mask) !=
+ (section->offset_within_region & ~page_mask))) {
 error_report("%s received unaligned region", __func__);
 return;
 }
@@ -459,9 +459,9 @@ static void vfio_listener_region_del(MemoryListener 
*listener,
  */
 }
 
-iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+iova = ROUND_UP(section->offset_within_address_space, ~page_mask + 1);
 end = (section->offset_within_address_space + int128_get64(section->size)) 
&
-  TARGET_PAGE_MASK;
+  page_mask;
 
 if (iova >= end) {
 return;
@@ -478,9 +478,66 @@ static void vfio_listener_region_del(MemoryListener 
*listener,
 }
 }
 
-static const MemoryListener vfio_memory_listener = {
-.region_add = vfio_listener_region_add,
-.region_del = vfio_listener_region_del,
+static void vfio_type1_iommu_listener_region_add(MemoryListener *listener,
+ MemoryRegionSection *section)
+{
+VFIOContainer *container = container_of(listener, VFIOContainer,
+iommu_data.type1.listener);
+
+vfio_listener_region_add(container, qemu_real_host_page_mask, listener,
+ section);
+}
+
+
+static void vfio_type1_iommu_listener_region_del(MemoryListener *listener,
+ MemoryRegionSection *section)
+{
+VFIOContainer *container = container_of(listener, VFIOContainer,
+iommu_data.type1.listener);
+
+vfio_listener_region_del(container

[Qemu-devel] [RFC PATCH qemu v3 3/4] vfio: Store IOMMU type in container

2015-07-14 Thread Alexey Kardashevskiy
So far we were managing not to have an IOMMU type stored anywhere but
since we are going to implement different behavior for different IOMMU
types in the same memory listener, we need to know IOMMU type after
initialization.

This adds an IOMMU type into VFIOContainer and initializes it. This
adds SPAPR IOMMU data into the iommu_data union; for now it only includes
the existing Type1 data struct. Since zero is not used for any type,
no additional initialization is necessary for VFIOContainer::type.

This reworks vfio_listener_region_add() in order to prepare it to
handle RAM regions on IOMMUs other than Type1/Type1v2.

This should cause no behavioral change.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v3:
* folded vfio_listener_region_add() change into this

v2:
* added VFIOContainer::iommu_data::spapr
---
 hw/vfio/common.c  | 55 ++-
 include/hw/vfio/vfio-common.h |  6 +
 2 files changed, 39 insertions(+), 22 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index aad41e1..6982b8f 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -393,26 +393,36 @@ static void vfio_listener_region_add(VFIOContainer 
*container,
 section->offset_within_region +
 (iova - section->offset_within_address_space);
 
-trace_vfio_listener_region_add_ram(iova, end - 1, vaddr);
+switch (container->iommu_data.type) {
+case VFIO_TYPE1_IOMMU:
+case VFIO_TYPE1v2_IOMMU:
+trace_vfio_listener_region_add_ram(iova, end - 1, vaddr);
 
-ret = vfio_dma_map(container, iova, end - iova, vaddr, section->readonly);
-if (ret) {
-error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
- "0x%"HWADDR_PRIx", %p) = %d (%m)",
- container, iova, end - iova, vaddr, ret);
+ret = vfio_dma_map(container, iova, end - iova, vaddr, 
section->readonly);
+if (ret) {
+error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+ "0x%"HWADDR_PRIx", %p) = %d (%m)",
+ container, iova, end - iova, vaddr, ret);
+goto error_exit;
+}
+break;
+}
+
+return;
+
+error_exit:
 
-/*
- * On the initfn path, store the first error in the container so we
- * can gracefully fail.  Runtime, there's not much we can do other
- * than throw a hardware error.
- */
-if (!container->iommu_data.type1.initialized) {
-if (!container->iommu_data.type1.error) {
-container->iommu_data.type1.error = ret;
-}
-} else {
-hw_error("vfio: DMA mapping failed, unable to continue");
+/*
+ * On the initfn path, store the first error in the container so we
+ * can gracefully fail.  Runtime, there's not much we can do other
+ * than throw a hardware error.
+ */
+if (!container->iommu_data.type1.initialized) {
+if (!container->iommu_data.type1.error) {
+container->iommu_data.type1.error = ret;
 }
+} else {
+hw_error("vfio: DMA mapping failed, unable to continue");
 }
 }
 
@@ -733,8 +743,8 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as)
 goto free_container_exit;
 }
 
-ret = ioctl(fd, VFIO_SET_IOMMU,
-v2 ? VFIO_TYPE1v2_IOMMU : VFIO_TYPE1_IOMMU);
+container->iommu_data.type = v2 ? VFIO_TYPE1v2_IOMMU : 
VFIO_TYPE1_IOMMU;
+ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_data.type);
 if (ret) {
 error_report("vfio: failed to set iommu for container: %m");
 ret = -errno;
@@ -762,7 +772,8 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as)
 ret = -errno;
 goto free_container_exit;
 }
-ret = ioctl(fd, VFIO_SET_IOMMU, VFIO_SPAPR_TCE_IOMMU);
+container->iommu_data.type = VFIO_SPAPR_TCE_IOMMU;
+ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_data.type);
 if (ret) {
 error_report("vfio: failed to set iommu for container: %m");
 ret = -errno;
@@ -781,10 +792,10 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as)
 goto free_container_exit;
 }
 
-container->iommu_data.type1.listener = vfio_spapr_iommu_listener;
+container->iommu_data.spapr.common.listener = 
vfio_spapr_iommu_listener;
 container->iommu_data.release = vfio_listener_release;
 
-memory_listener_register(&container->iommu_data.type1.listener,
+memory_listener_register(&container->iommu_data.spapr.common.listener,
  container->space->as);
 
 } else {
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 59a321d..135ea64 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -70,13 +70,19 @@ typedef struct VFIOType1

[Qemu-devel] [RFC PATCH qemu v3 1/4] memory: Add reporting of supported page sizes

2015-07-14 Thread Alexey Kardashevskiy
Every IOMMU has some granularity which MemoryRegionIOMMUOps::translate
uses when translating; however, this information is not available outside
the translate context for various checks.

This adds a get_page_sizes callback to MemoryRegionIOMMUOps and
a wrapper for it so IOMMU users (such as VFIO) can know the actual
page size(s) used by an IOMMU.

TARGET_PAGE_BITS shift is used as fallback.

Signed-off-by: Alexey Kardashevskiy 
---
 hw/ppc/spapr_iommu.c  |  8 
 include/exec/memory.h | 11 +++
 memory.c  |  9 +
 3 files changed, 28 insertions(+)

diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index f61504e..a2572c4 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -104,6 +104,13 @@ static IOMMUTLBEntry 
spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr,
 return ret;
 }
 
+static uint64_t spapr_tce_get_page_sizes(MemoryRegion *iommu)
+{
+sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
+
+return 1ULL << tcet->page_shift;
+}
+
 static int spapr_tce_table_post_load(void *opaque, int version_id)
 {
 sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
@@ -135,6 +142,7 @@ static const VMStateDescription vmstate_spapr_tce_table = {
 
 static MemoryRegionIOMMUOps spapr_iommu_ops = {
 .translate = spapr_tce_translate_iommu,
+.get_page_sizes = spapr_tce_get_page_sizes,
 };
 
 static int spapr_tce_table_realize(DeviceState *dev)
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 1394715..9ca74e3 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -152,6 +152,8 @@ typedef struct MemoryRegionIOMMUOps MemoryRegionIOMMUOps;
 struct MemoryRegionIOMMUOps {
 /* Return a TLB entry that contains a given address. */
 IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr, bool 
is_write);
+/* Returns supported page sizes */
+uint64_t (*get_page_sizes)(MemoryRegion *iommu);
 };
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
@@ -552,6 +554,15 @@ static inline bool memory_region_is_romd(MemoryRegion *mr)
 bool memory_region_is_iommu(MemoryRegion *mr);
 
 /**
+ * memory_region_iommu_get_page_sizes: get supported page sizes in an iommu
+ *
+ * Returns %bitmap of supported page sizes for an iommu.
+ *
+ * @mr: the memory region being queried
+ */
+uint64_t memory_region_iommu_get_page_sizes(MemoryRegion *mr);
+
+/**
  * memory_region_notify_iommu: notify a change in an IOMMU translation entry.
  *
  * @mr: the memory region that was changed
diff --git a/memory.c b/memory.c
index 5a0cc66..0732763 100644
--- a/memory.c
+++ b/memory.c
@@ -1413,6 +1413,15 @@ bool memory_region_is_iommu(MemoryRegion *mr)
 return mr->iommu_ops;
 }
 
+uint64_t memory_region_iommu_get_page_sizes(MemoryRegion *mr)
+{
+assert(memory_region_is_iommu(mr));
+if (mr->iommu_ops && mr->iommu_ops->get_page_sizes) {
+return mr->iommu_ops->get_page_sizes(mr);
+}
+return 1ULL << TARGET_PAGE_BITS;
+}
+
 void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n)
 {
 notifier_list_add(&mr->iommu_notify, n);
-- 
2.4.0.rc3.8.gfb3e7d5




[Qemu-devel] [RFC PATCH qemu v3 0/4] vfio: SPAPR IOMMU v2 (memory preregistration support)

2015-07-14 Thread Alexey Kardashevskiy
Yet another try, reworked the whole patchset.

Here are few patches to prepare an existing listener for handling memory
preregistration for SPAPR guests running on POWER8.

This used to be a part of DDW patchset but now is separated as requested.


Please comment. Thanks!

Changes:
v3:
* removed incorrect "vfio: Skip PCI BARs in memory listener"
* removed page size changes from quirks as they did not completely fix
the crashes happening on POWER8 (only total removal helps there)
* added "memory: Add reporting of supported page sizes"




Alexey Kardashevskiy (4):
  memory: Add reporting of supported page sizes
  vfio: Use different page size for different IOMMU types
  vfio: Store IOMMU type in container
  vfio: spapr: Add SPAPR IOMMU v2 support (DMA memory preregistering)

 hw/ppc/spapr_iommu.c  |   8 ++
 hw/vfio/common.c  | 245 ++
 include/exec/memory.h |  11 ++
 include/hw/vfio/vfio-common.h |   7 ++
 memory.c  |   9 ++
 trace-events  |   2 +
 6 files changed, 235 insertions(+), 47 deletions(-)

-- 
2.4.0.rc3.8.gfb3e7d5




Re: [Qemu-devel] [Qemu-block] [PATCH 1/3] blockjob: Introduce block_job_relax_cpu

2015-07-14 Thread Stefan Hajnoczi
On Fri, Jul 10, 2015 at 05:42:48AM +0200, Alexandre DERUMIER wrote:
> >>By the way, why did you choose 10 milliseconds?  That is quite long.
> >>
> >>If this function is called once per 10 ms disk I/O operations then we
> >>lose 50% utilization.  1 ms or less would be reasonable.
> 
> From my tests, 1ms is not enough; it still hangs in the guest or in qmp queries.
> 10ms gives me the optimal balance between bitmap scan speed and guest
> responsiveness.

Then I don't fully understand the bug.

Fam: can you explain why 1ms isn't enough?




Re: [Qemu-devel] [PATCH v7 34/42] Postcopy: Use helpers to map pages during migration

2015-07-14 Thread Juan Quintela
"Dr. David Alan Gilbert (git)"  wrote:
> From: "Dr. David Alan Gilbert" 
>
> In postcopy, the destination guest is running at the same time
> as it's receiving pages; as we receive new pages we must put
> them into the guests address space atomically to avoid a running
> CPU accessing a partially written page.
>
> Use the helpers in postcopy-ram.c to map these pages.
>
> qemu_get_buffer_less_copy is used to avoid a copy out of qemu_file
> in the case that postcopy is going to do a copy anyway.
>
> Signed-off-by: Dr. David Alan Gilbert 
> @@ -1742,7 +1752,6 @@ static inline void *host_from_stream_offset(QEMUFile *f,
>  error_report("Ack, bad migration stream!");
>  return NULL;
>  }
> -

Don't belong here O:-)

>  return memory_region_get_ram_ptr(block->mr) + offset;
>  }
>  
> @@ -1881,6 +1890,16 @@ static int ram_load(QEMUFile *f, void *opaque, int 
> version_id)
>  int flags = 0, ret = 0;
>  static uint64_t seq_iter;
>  int len = 0;
> +/*
> + * System is running in postcopy mode, page inserts to host memory must 
> be
> + * atomic
> + */
> +MigrationIncomingState *mis = migration_incoming_get_current();
> +bool postcopy_running = postcopy_state_get(mis) >=
> +POSTCOPY_INCOMING_LISTENING;
> +void *postcopy_host_page = NULL;
> +bool postcopy_place_needed = false;
> +bool matching_page_sizes = qemu_host_page_size == TARGET_PAGE_SIZE;
>  
>  seq_iter++;
>  
> @@ -1896,13 +1915,57 @@ static int ram_load(QEMUFile *f, void *opaque, int 
> version_id)
>  rcu_read_lock();
>  while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) {
>  ram_addr_t addr, total_ram_bytes;
> -void *host;
> +void *host = 0;
> +void *page_buffer = 0;
> +void *postcopy_place_source = 0;

NULL, NULL, NULL?

BTW, do we really need postcopy_place_source?  I think that just doing
s/postcopy_place_source/postcopy_host_page/ would do?

>  uint8_t ch;
> +bool all_zero = false;
>  
>  addr = qemu_get_be64(f);
>  flags = addr & ~TARGET_PAGE_MASK;
>  addr &= TARGET_PAGE_MASK;
>  
> +if (flags & (RAM_SAVE_FLAG_COMPRESS | RAM_SAVE_FLAG_PAGE |
> + RAM_SAVE_FLAG_XBZRLE)) {
> +host = host_from_stream_offset(f, mis, addr, flags);
> +if (!host) {
> +error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
> +ret = -EINVAL;
> +break;
> +}
> +if (!postcopy_running) {
> +page_buffer = host;
> +} else {
> +/*
> + * Postcopy requires that we place whole host pages 
> atomically.
> + * To make it atomic, the data is read into a temporary page
> + * that's moved into place later.
> + * The migration protocol uses,  possibly smaller, 
> target-pages
> + * however the source ensures it always sends all the 
> components
> + * of a host page in order.
> + */
> +if (!postcopy_host_page) {
> +postcopy_host_page = postcopy_get_tmp_page(mis);
> +}
> +page_buffer = postcopy_host_page +
> +  ((uintptr_t)host & ~qemu_host_page_mask);
> +/* If all TP are zero then we can optimise the place */
> +if (!((uintptr_t)host & ~qemu_host_page_mask)) {

I don't understand the test, the comment or both :-(

How you get from that test to "this is a page full of zeros" is a
mystery to me :p

Head hurts, would try to convince myself that the rest of changes are ok.
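
A condensed sketch of the "atomic placement" the quoted commit message
describes: the destination assembles a whole host page in a temporary buffer
and installs it into the guest address space in one step, so a running vCPU
never sees a half-written page. The series wraps this in its postcopy-ram.c
helpers; the standalone userfaultfd sketch below is illustrative only:

#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Install a fully assembled host page from tmp_page at guest_addr in a
 * single atomic step via UFFDIO_COPY.  Error handling is omitted. */
static int place_host_page(int uffd, void *guest_addr, void *tmp_page,
                           size_t host_page_size)
{
    struct uffdio_copy copy = {
        .dst = (uint64_t)(uintptr_t)guest_addr,
        .src = (uint64_t)(uintptr_t)tmp_page,
        .len = host_page_size,
        .mode = 0,
    };

    return ioctl(uffd, UFFDIO_COPY, &copy); /* installs the page atomically */
}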



Re: [Qemu-devel] [PATCH v7 35/42] Don't sync dirty bitmaps in postcopy

2015-07-14 Thread Juan Quintela
"Dr. David Alan Gilbert (git)"  wrote:
> From: "Dr. David Alan Gilbert" 
>
> Once we're in postcopy the source processors are stopped and memory
> shouldn't change any more, so there's no need to look at the dirty
> map.
>
> There are two notes to this:
>   1) If we do resync and a page had changed then the page would get
>  sent again, which the destination wouldn't allow (since it might
>  have also modified the page)
>   2) Before disabling this I'd seen very rare cases where a page had been
>  marked dirtied although the memory contents are apparently identical
>
> Signed-off-by: Dr. David Alan Gilbert 
> Reviewed-by: David Gibson 

Reviewed-by: Juan Quintela 

But, in what patch do we sync the migration bitmap after changing to postcopy?

> ---
>  migration/ram.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 01a0ab4..5cff4d6 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1643,7 +1643,9 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
>  {
>  rcu_read_lock();
>  
> -migration_bitmap_sync();
> +if (!migration_postcopy_phase(migrate_get_current())) {
> +migration_bitmap_sync();
> +}
>  
>  ram_control_before_iterate(f, RAM_CONTROL_FINISH);
>  
> @@ -1678,7 +1680,8 @@ static void ram_save_pending(QEMUFile *f, void *opaque, 
> uint64_t max_size,
>  
>  remaining_size = ram_save_remaining() * TARGET_PAGE_SIZE;
>  
> -if (remaining_size < max_size) {
> +if (!migration_postcopy_phase(migrate_get_current()) &&
> +remaining_size < max_size) {
>  qemu_mutex_lock_iothread();
>  rcu_read_lock();
>  migration_bitmap_sync();



Re: [Qemu-devel] [Qemu-block] [PATCH 0/3] mirror: Fix guest responsiveness during bitmap scan

2015-07-14 Thread Stefan Hajnoczi
On Mon, Jul 13, 2015 at 01:08:37PM +0800, Fam Zheng wrote:
> On Fri, 07/10 14:16, Alexandre DERUMIER wrote:
> > So, I think patch 3/3 is enough. (could be great to have it for qemu 2.4)
> 
> Stefan, are you happy with this series?

I'll take Patch 3 for QEMU 2.4.

This series should go via Jeff Cody (blockjobs maintainer) but he is/was
on vacation.

Patches 1 & 2 need more justification.  It seems like they might be
working around or masking a deeper problem.




Re: [Qemu-devel] [PATCH v2] net: Flush queued packets when guest resumes

2015-07-14 Thread Michael S. Tsirkin
On Tue, Jul 14, 2015 at 01:20:03PM +0100, Stefan Hajnoczi wrote:
> On Fri, Jul 10, 2015 at 05:03:22PM +0800, Fam Zheng wrote:
> > On Tue, 07/07 12:19, Michael S. Tsirkin wrote:
> > > On Tue, Jul 07, 2015 at 05:09:09PM +0800, Fam Zheng wrote:
> > > > On Tue, 07/07 11:13, Michael S. Tsirkin wrote:
> > > > > On Tue, Jul 07, 2015 at 09:21:07AM +0800, Fam Zheng wrote:
> > > > > > Since commit 6e99c63 "net/socket: Drop net_socket_can_send" and 
> > > > > > friends,
> > > > > > net queues need to be explicitly flushed after 
> > > > > > qemu_can_send_packet()
> > > > > > returns false, because the netdev side will disable the polling of 
> > > > > > fd.
> > > > > > 
> > > > > > This fixes the case of "cont" after "stop" (or migration).
> > > > > > 
> > > > > > Signed-off-by: Fam Zheng 
> > > > > 
> > > > > Note virtio has its own handler which must be used to
> > > > > flush packets - this one might run too early or too late.
> > > > 
> > > > Which handler do you mean? I don't think virtio-net handles resume now. 
> > > > (If it
> > > > does, we probably should drop it together with this change, since it's 
> > > > needed
> > > > by as all NICs.)
> > > > 
> > > > Fam
> > > 
> > > virtio_vmstate_change
> > > 
> > > It's all far from trivial. I suspect these whack-a-mole approach
> > > spreading purge here and there will only create more bugs.
> > > 
> > > Why would we ever need to process network packets when
> > > VM is not running? I don't see any point to it.
> > > How about we simply stop the job processing network on
> > > vm stop and restart on vm start?
> > 
> > I suppose it is too much for 2.4. I think this approach, adding
> > qemu_flush_queued_packets(), is consistent with its existing usage (when a
> > device is becoming active from inactive), like in e1000_write_config.
> > 
> > How about applying this and let's work on "stopping tap when VM not running"
> > for 2.5?
> 
> Jason is happy with this and the virtio-net .can_receive() patch.
> 
> Michael: Any further comments?  Are you okay with this patch too?

I think it doesn't help virtio - am I wrong?

-- 
MST



[Qemu-devel] [PATCH] vnc: fix vnc client authentication

2015-07-14 Thread Wolfgang Bumiller
Commit 800567a61 updated the code to the generic crypto API
and mixed up encrypt and decrypt functions in
procotol_client_auth_vnc.
(Used to be: deskey(key, EN0) which encrypts, and was
changed to qcrypto_cipher_decrypt in 800567a61.)
Changed it to qcrypto_cipher_encrypt now.

Signed-off-by: Wolfgang Bumiller 
---
 ui/vnc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ui/vnc.c b/ui/vnc.c
index 94e4f19..1483958 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -2551,7 +2551,7 @@ static int protocol_client_auth_vnc(VncState *vs, uint8_t 
*data, size_t len)
 goto reject;
 }
 
-if (qcrypto_cipher_decrypt(cipher,
+if (qcrypto_cipher_encrypt(cipher,
vs->challenge,
response,
VNC_AUTH_CHALLENGE_SIZE,
-- 
2.1.4
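
For context, a condensed sketch of the check this call is part of (ui/vnc.c
does the surrounding key setup and cleanup; variable names here are
illustrative): the server encrypts its own copy of the 16-byte challenge with
the password-derived DES key and compares the result with the bytes the
client sent back (client_response below):

/* Expected response computed by the server; reject on mismatch. */
uint8_t expected[VNC_AUTH_CHALLENGE_SIZE];
Error *err = NULL;

if (qcrypto_cipher_encrypt(cipher, vs->challenge, expected,
                           VNC_AUTH_CHALLENGE_SIZE, &err) < 0 ||
    memcmp(expected, client_response, VNC_AUTH_CHALLENGE_SIZE) != 0) {
    /* wrong password (or cipher error): reject the client */
}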





Re: [Qemu-devel] [PATCH v4 4/7] pc: fix QEMU crashing when more than ~50 memory hotplugged

2015-07-14 Thread Igor Mammedov
On Mon, 13 Jul 2015 23:14:37 +0300
"Michael S. Tsirkin"  wrote:

> On Mon, Jul 13, 2015 at 08:55:13PM +0200, Igor Mammedov wrote:
> > On Mon, 13 Jul 2015 09:55:18 +0300
> > "Michael S. Tsirkin"  wrote:
> > 
> > > On Fri, Jul 10, 2015 at 12:12:36PM +0200, Igor Mammedov wrote:
> > > > On Thu, 9 Jul 2015 16:46:43 +0300
> > > > "Michael S. Tsirkin"  wrote:
> > > > 
> > > > > On Thu, Jul 09, 2015 at 03:43:01PM +0200, Paolo Bonzini wrote:
> > > > > > 
> > > > > > 
> > > > > > On 09/07/2015 15:06, Michael S. Tsirkin wrote:
> > > > > > > > QEMU asserts in vhost due to hitting vhost backend limit
> > > > > > > > on number of supported memory regions.
> > > > > > > > 
> > > > > > > > Describe all hotplugged memory as one continuos range
> > > > > > > > to vhost with linear 1:1 HVA->GPA mapping in backend.
> > > > > > > > 
> > > > > > > > Signed-off-by: Igor Mammedov 
> > > > > > >
> > > > > > > Hmm - a bunch of work here to recombine MRs that memory
> > > > > > > listener interface breaks up.  In particular KVM could
> > > > > > > benefit from this too (on workloads that change the table a
> > > > > > > lot).  Can't we teach memory core to pass hva range as a
> > > > > > > single continuous range to memory listeners?
> > > > > > 
> > > > > > Memory listeners are based on memory regions, not HVA ranges.
> > > > > > 
> > > > > > Paolo
> > > > > 
> > > > > Many listeners care about HVA ranges. I know KVM and vhost do.
> > > > I'm not sure about KVM, it works just fine with fragmented memory
> > > > regions, the same will apply to vhost once module parameter to
> > > > increase limit is merged.
> > > > 
> > > > but changing generic memory listener interface to replace HVA mapped
> > > > regions with HVA container would lead to a case when listeners
> > > > won't see exact layout that they might need.
> > > 
> > > I don't think they care, really.
> > > 
> > > > In addition vhost itself will suffer from working with big HVA
> > > > since it allocates log depending on size of memory => bigger log.
> > > 
> > > Not really - it allocates the log depending on the PA range.
> > > Leaving unused holes doesn't reduce it's size.
> > if it would use HVA container instead then it will always allocate
> > log for max possible GPA, meaning that -m 1024,maxmem=1T will waste
> > a lot of memory and more so for bigger maxmem.
> > It's still possible to induce worst case by plugging pc-dimm at the end
> > of hotplug-memory area by specifying address for it explicitly.
> > That problem exists since memory hot-add was introduced, I've just
> > haven't noticed it back then.
> 
> There you are then. Depending on maxmem seems cleaner as it's more
> predictable.
> 
> > It's perfectly fine to allocate log by last GPA as far as
> > memory is nearly continuous but memory hot-add makes it possible to
> > have sparse layout with a huge gaps between guest mapped RAM
> > which makes current log handling inefficient.
> > 
> > I wonder how hard it would be to make log_size depend on present RAM
> > size rather than max present GPA so it wouldn't allocate excess 
> > memory for log.
> 
> We can simply map the unused parts of the log RESERVED.
meaning that the vhost listener should get RAM regions so it would know
which parts of the log it has to mmap(NORESERVE|DONTNEED)

it would also require a custom allocator for the log that could manage
punching/unpunching holes in the log depending on the RAM layout.
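
A minimal sketch of that hole-punching idea (hypothetical helper; offsets and
lengths must be host-page aligned, and this is only an illustration of the
approach being discussed):

#include <sys/mman.h>

/* Hand back to the kernel the part of an anonymous log mapping that covers
 * a GPA range with no RAM; the pages read back as zeroes if ever touched. */
static int vhost_log_punch_hole(void *log_base, size_t offset, size_t len)
{
    return madvise((char *)log_base + offset, len, MADV_DONTNEED);
}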

btw is it possible for the guest to force the vhost module to access a
NORESERVE area, and what would happen in that case?


> 
> That can be a natural continuation of these series, but
> I don't think it needs to block it.
> 
> > 
> > > 
> > > 
> > > > That's one of the reasons that in this patch HVA ranges in
> > > > memory map are compacted only for backend consumption,
> > > > QEMU's side of vhost uses exact map for internal purposes.
> > > > And the other reason is I don't know vhost enough to rewrite it
> > > > to use big HVA for everything.
> > > > 
> > > > > I guess we could create dummy MRs to fill in the holes left by
> > > > > memory hotplug?
> > > > it looks like nice thing from vhost pov but complicates other side,
> > > 
> > > What other side do you have in mind?
> > > 
> > > > hence I dislike an idea inventing dummy MRs for vhost's convenience.
> > the memory core, but let's see what Paolo thinks about it.
> > 
> > > > 
> > > > 
> > > > > vhost already has logic to recombine
> > > > > consecutive chunks created by the memory core.
> > > > which looks a bit complicated, and I was thinking about simplifying
> > > > it some time in the future.
> > > 
> 
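
A rough sketch of that "recombine consecutive chunks" idea, using simplified
structures rather than vhost's actual code: two sections can be folded into
one backend region when both their guest-physical and host-virtual ranges
are contiguous.

#include <stdbool.h>
#include <stdint.h>

struct mem_chunk {
    uint64_t gpa;   /* guest-physical start */
    uint64_t hva;   /* host-virtual start   */
    uint64_t size;
};

/* True when b starts exactly where a ends, in both GPA and HVA space. */
static bool chunks_mergeable(const struct mem_chunk *a,
                             const struct mem_chunk *b)
{
    return a->gpa + a->size == b->gpa &&
           a->hva + a->size == b->hva;
}

/* Caller has already checked chunks_mergeable(a, b). */
static void chunk_merge(struct mem_chunk *a, const struct mem_chunk *b)
{
    a->size += b->size;
}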




Re: [Qemu-devel] [PATCH v7 35/42] Don't sync dirty bitmaps in postcopy

2015-07-14 Thread Dr. David Alan Gilbert
* Juan Quintela (quint...@redhat.com) wrote:
> "Dr. David Alan Gilbert (git)"  wrote:
> > From: "Dr. David Alan Gilbert" 
> >
> > Once we're in postcopy the source processors are stopped and memory
> > shouldn't change any more, so there's no need to look at the dirty
> > map.
> >
> > There are two notes to this:
> >   1) If we did resync and a page had changed, then the page would get
> >  sent again, which the destination wouldn't allow (since it might
> >  have also modified the page)
> >   2) Before disabling this I'd seen very rare cases where a page had been
> >  marked dirty although the memory contents were apparently identical
> >
> > Signed-off-by: Dr. David Alan Gilbert 
> > Reviewed-by: David Gibson 
> 
> Reviewed-by: Juan Quintela 
> 
> But, in what patch do we sync the migration bitmap after changing to
> postcopy?

It's called one last time in ram_postcopy_send_discard_bitmap; see:
v7-0025-Postcopy-Maintain-sentmap-and-calculate-discard.patch

and that happens at the start of postcopy mode, when the CPUs are stopped and
won't be running on the source again.
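
A minimal sketch of that ordering (illustrative only, not the exact QEMU call
chain; the wrapper name is made up and the prototype stands in for the real
function discussed above):

void migration_bitmap_sync(void);

static void ram_postcopy_send_discard_bitmap_sketch(void)
{
    /* Source vCPUs are already stopped, so nothing can dirty guest memory;
     * this is the final sync of the dirty bitmap. */
    migration_bitmap_sync();
    /* ... walk the bitmap and send discard ranges to the destination;
     * later completion paths skip the sync, as the patch below shows ... */
}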

Dave

> 
> > ---
> >  migration/ram.c | 7 +--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 01a0ab4..5cff4d6 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -1643,7 +1643,9 @@ static int ram_save_complete(QEMUFile *f, void 
> > *opaque)
> >  {
> >  rcu_read_lock();
> >  
> > -migration_bitmap_sync();
> > +if (!migration_postcopy_phase(migrate_get_current())) {
> > +migration_bitmap_sync();
> > +}
> >  
> >  ram_control_before_iterate(f, RAM_CONTROL_FINISH);
> >  
> > @@ -1678,7 +1680,8 @@ static void ram_save_pending(QEMUFile *f, void 
> > *opaque, uint64_t max_size,
> >  
> >  remaining_size = ram_save_remaining() * TARGET_PAGE_SIZE;
> >  
> > -if (remaining_size < max_size) {
> > +if (!migration_postcopy_phase(migrate_get_current()) &&
> > +remaining_size < max_size) {
> >  qemu_mutex_lock_iothread();
> >  rcu_read_lock();
> >  migration_bitmap_sync();
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH v4 4/7] pc: fix QEMU crashing when more than ~50 memory hotplugged

2015-07-14 Thread Michael S. Tsirkin
On Tue, Jul 14, 2015 at 03:02:44PM +0200, Igor Mammedov wrote:
> On Mon, 13 Jul 2015 23:14:37 +0300
> "Michael S. Tsirkin"  wrote:
> 
> > On Mon, Jul 13, 2015 at 08:55:13PM +0200, Igor Mammedov wrote:
> > > On Mon, 13 Jul 2015 09:55:18 +0300
> > > "Michael S. Tsirkin"  wrote:
> > > 
> > > > On Fri, Jul 10, 2015 at 12:12:36PM +0200, Igor Mammedov wrote:
> > > > > On Thu, 9 Jul 2015 16:46:43 +0300
> > > > > "Michael S. Tsirkin"  wrote:
> > > > > 
> > > > > > On Thu, Jul 09, 2015 at 03:43:01PM +0200, Paolo Bonzini wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > On 09/07/2015 15:06, Michael S. Tsirkin wrote:
> > > > > > > > > QEMU asserts in vhost due to hitting the vhost backend limit
> > > > > > > > > on the number of supported memory regions.
> > > > > > > > > 
> > > > > > > > > Describe all hotplugged memory as one continuous range
> > > > > > > > > to vhost with linear 1:1 HVA->GPA mapping in the backend.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Igor Mammedov 
> > > > > > > >
> > > > > > > > Hmm - a bunch of work here to recombine MRs that memory
> > > > > > > > listener interface breaks up.  In particular KVM could
> > > > > > > > benefit from this too (on workloads that change the table a
> > > > > > > > lot).  Can't we teach memory core to pass hva range as a
> > > > > > > > single continuous range to memory listeners?
> > > > > > > 
> > > > > > > Memory listeners are based on memory regions, not HVA ranges.
> > > > > > > 
> > > > > > > Paolo
> > > > > > 
> > > > > > Many listeners care about HVA ranges. I know KVM and vhost do.
> > > > > I'm not sure about KVM; it works just fine with fragmented memory
> > > > > regions, and the same will apply to vhost once the module parameter
> > > > > to increase the limit is merged.
> > > > > 
> > > > > but changing the generic memory listener interface to replace
> > > > > HVA-mapped regions with an HVA container would lead to a case where
> > > > > listeners won't see the exact layout that they might need.
> > > > 
> > > > I don't think they care, really.
> > > > 
> > > > > In addition, vhost itself will suffer from working with a big HVA,
> > > > > since it allocates the log depending on the size of memory => bigger log.
> > > > 
> > > > Not really - it allocates the log depending on the PA range.
> > > > Leaving unused holes doesn't reduce its size.
> > > if it used an HVA container instead then it would always allocate the
> > > log for the max possible GPA, meaning that -m 1024,maxmem=1T would waste
> > > a lot of memory, and more so for bigger maxmem.
> > > It's still possible to induce the worst case by plugging a pc-dimm at the
> > > end of the hotplug-memory area by specifying its address explicitly.
> > > That problem has existed since memory hot-add was introduced; I just
> > > hadn't noticed it back then.
> > 
> > There you are then. Depending on maxmem seems cleaner as it's more
> > predictable.
> > 
> > > It's perfectly fine to allocate the log by the last GPA as long as
> > > memory is nearly contiguous, but memory hot-add makes it possible to
> > > have a sparse layout with huge gaps between guest-mapped RAM,
> > > which makes the current log handling inefficient.
> > > 
> > > I wonder how hard it would be to make log_size depend on the present RAM
> > > size rather than the maximum present GPA, so it wouldn't allocate excess
> > > memory for the log.
> > 
> > We can simply map the unused parts of the log RESERVED.
> meaning that the vhost listener should get RAM regions so it would know
> which parts of the log it has to mmap(NORESERVE|DONTNEED)
> 
> it would also require a custom allocator for the log that could manage
> punching/unpunching holes in the log depending on the RAM layout.

Yeah. Anyway, this isn't urgent, I think.

> btw, is it possible for the guest to force the vhost module to access a
> NORESERVE area, and what would happen in that case?

Sure.  I think you'll get EFAULT, and vhost will stop processing the ring then.

> 
> > 
> > That can be a natural continuation of this series, but
> > I don't think it needs to block it.
> > 
> > > 
> > > > 
> > > > 
> > > > > That's one of the reasons that in this patch HVA ranges in the
> > > > > memory map are compacted only for backend consumption, while
> > > > > QEMU's side of vhost uses the exact map for internal purposes.
> > > > > And the other reason is that I don't know vhost well enough to
> > > > > rewrite it to use a big HVA for everything.
> > > > > 
> > > > > > I guess we could create dummy MRs to fill in the holes left by
> > > > > > memory hotplug?
> > > > > it looks like a nice thing from vhost's PoV but complicates the other side,
> > > > 
> > > > What other side do you have in mind?
> > > > 
> > > > > hence I dislike the idea of inventing dummy MRs for vhost's convenience.
> > > the memory core, but let's see what Paolo thinks about it.
> > > 
> > > > > 
> > > > > 
> > > > > > vhost already has logic to recombine
> > > > > > consecutive chunks created by the memory core.
> > > > > which looks a bit complicated, and I was thinking about simplifying
> > > > > it some time in the future.
> > > > 
> > 
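
For reference, the approach described in the quoted commit message could be
sketched roughly as below: a simplified illustration, not the actual patch,
under the assumption that the hotplug area keeps a linear 1:1 HVA->GPA
layout (hva == hva_base + (gpa - gpa_base) for every region).

#include <stdint.h>

struct backend_region {
    uint64_t gpa;    /* guest-physical start */
    uint64_t size;
    uint64_t hva;    /* host-virtual start   */
};

/* Collapse n (>= 1) hotplugged regions into a single entry for the vhost
 * backend; valid only under the linear 1:1 mapping assumed above. */
static struct backend_region collapse_hotplug(const struct backend_region *d,
                                              int n)
{
    uint64_t lo = d[0].gpa, hi = d[0].gpa + d[0].size, hva = d[0].hva;

    for (int i = 1; i < n; i++) {
        if (d[i].gpa < lo) {
            lo = d[i].gpa;
            hva = d[i].hva;   /* the lowest GPA carries the base HVA */
        }
        if (d[i].gpa + d[i].size > hi) {
            hi = d[i].gpa + d[i].size;
        }
    }
    return (struct backend_region){ .gpa = lo, .size = hi - lo, .hva = hva };
}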


