Re: [Qemu-devel] [PATCH v2] file-posix: Incoporate max_segments in block limit

2017-03-08 Thread Paolo Bonzini


On 08/03/2017 08:01, Fam Zheng wrote:
> Linux exposes a separate limit, /sys/block/.../queue/max_segments, which
> in the worst case can be more restrictive than BLKSECTGET (as they are
> two different things). Similar to the BLKSECTGET story, guests don't see
> this limit and send big requests will get -EINVAL error on SG_IO.
> 
> Lean on the safer side to clamp max_transfer according to max_segments
> and page size, because in the end what host HBA gets is the mapped host
> pages rather than a guest buffer.
> 
> Signed-off-by: Fam Zheng 
> 
> ---
> 
> v2: Use /sys/dev/block/MAJOR:MINOR/queue/max_segments. [Paolo]
> ---
>  block/file-posix.c | 47 +++
>  1 file changed, 47 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 4de1abd..c4c0663 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -668,6 +668,48 @@ static int hdev_get_max_transfer_length(BlockDriverState 
> *bs, int fd)
>  #endif
>  }
>  
> +static int hdev_get_max_segments(const struct stat *st)
> +{
> +#ifdef CONFIG_LINUX
> +char buf[32];
> +const char *end;
> +char *sysfspath;
> +int ret;
> +int fd = -1;
> +long max_segments;
> +
> +sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
> +major(st->st_rdev), minor(st->st_rdev));
> +fd = open(sysfspath, O_RDONLY);
> +if (fd == -1) {
> +ret = -errno;
> +goto out;
> +}
> +do {
> +ret = read(fd, buf, sizeof(buf));
> +} while (ret == -1 && errno == EINTR);
> +if (ret < 0) {
> +ret = -errno;
> +goto out;
> +} else if (ret == 0) {
> +ret = -EIO;
> +goto out;
> +}
> +buf[ret] = 0;
> +/* The file is ended with '\n', pass 'end' to accept that. */
> +ret = qemu_strtol(buf, &end, 10, &max_segments);
> +if (ret == 0 && end && *end == '\n') {
> +ret = max_segments;
> +}
> +
> +out:
> +g_free(sysfspath);
> +return ret;
> +#else
> +return -ENOTSUP;
> +#endif
> +}
> +
>  static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
>  BDRVRawState *s = bs->opaque;
> @@ -679,6 +721,11 @@ static void raw_refresh_limits(BlockDriverState *bs, 
> Error **errp)
>  if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
>  bs->bl.max_transfer = pow2floor(ret);
>  }
> +ret = hdev_get_max_segments(&st);
> +if (ret > 0) {
> +bs->bl.max_transfer = MIN(bs->bl.max_transfer,
> +  ret * getpagesize());
> +}
>  }
>  }
>  
> 

Reviewed-by: Paolo Bonzini 

Thanks!



Re: [Qemu-devel] [PATCH V3] virtio: do not require IOMMU to be created in advance

2017-03-08 Thread Marcel Apfelbaum

On 03/08/2017 05:15 AM, Jason Wang wrote:



On 2017年03月08日 10:43, Peter Xu wrote:

On Tue, Mar 07, 2017 at 02:47:30PM +0200, Marcel Apfelbaum wrote:

On 03/07/2017 11:09 AM, Jason Wang wrote:

After commit 96a8821d2141 ("virtio: unbreak virtio-pci with IOMMU
after caching ring translations"), IOMMU was required to be created in
advance. This is because we can only get the correct dma_as after pci
IOMMU (e.g intel_iommu) was initialized. This is suboptimal and
inconvenient for user. This patch releases this by:

- introduce a bus_master_ready method for PCIDeviceClass and trigger
  this during pci_init_bus_master
- implement virtio-pci method and 1) reset the dma_as 2) re-register
  the memory listener to the new dma_as


Hi Jason,


Cc: Paolo Bonzini 
Signed-off-by: Jason Wang 
---
Changes from V2:
- delay pci_init_bus_master() after pci device is realized to make
  bus_master_ready a more generic method
---
hw/pci/pci.c   | 11 ---
hw/virtio/virtio-pci.c |  9 +
hw/virtio/virtio.c | 19 +++
include/hw/pci/pci.h   |  1 +
include/hw/virtio/virtio.h |  1 +
5 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 273f1e4..22e6ad9 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -83,6 +83,7 @@ static const VMStateDescription vmstate_pcibus = {
static void pci_init_bus_master(PCIDevice *pci_dev)
{
 AddressSpace *dma_as = pci_device_iommu_address_space(pci_dev);
+PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(pci_dev);

 memory_region_init_alias(&pci_dev->bus_master_enable_region,
  OBJECT(pci_dev), "bus master",
@@ -90,6 +91,9 @@ static void pci_init_bus_master(PCIDevice *pci_dev)
 memory_region_set_enabled(&pci_dev->bus_master_enable_region, false);
 address_space_init(&pci_dev->bus_master_as,
&pci_dev->bus_master_enable_region, pci_dev->name);
+if (pc->bus_master_ready) {
+pc->bus_master_ready(pci_dev);
+}
}

static void pcibus_machine_done(Notifier *notifier, void *data)
@@ -995,9 +999,6 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev, PCIBus *bus,
 pci_dev->devfn = devfn;
 pci_dev->requester_id_cache = pci_req_id_cache_get(pci_dev);

-if (qdev_hotplug) {
-pci_init_bus_master(pci_dev);
-}
 pstrcpy(pci_dev->name, sizeof(pci_dev->name), name);
 pci_dev->irq_state = 0;
 pci_config_alloc(pci_dev);
@@ -1995,6 +1996,10 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 }
 }

+if (qdev_hotplug) {
+pci_init_bus_master(pci_dev);
+}
+

How does it help if we move qdev_hotplug check outside "do_pci_register_device"?
I suppose you want the code to run after the "realize" function?


Yes.


If so, what happens if a "realize" function of another device needs the 
bus_master_as
(at hotplug time)?


I'm not sure this is really needed. What kind of device need to check hotplug 
during their realize? (Looks like we don't have such kind of device now).


I am sorry I was not clear enough:
If a device is added after the system is up (hotplug), we cannot depend on the 
"machine_done"
event to enable "bus master".
This is why we have
   if (qdev_hotplug)
pci_init_bus_master(pci_dev);

The code you proposed changes the order, so this call is done *after* realize.

My question was: What if any other device may require the bus_master_as
at realize time (and can be hot-plugged) ?
For example: hcd-ehci/hcd-ohci devices call pci_get_address_space()
and caches the bus_master_as.

It is possible the mentioned devices can't be hot-plugged or
are not relevant for PCIe.

I am only saying we should double check when making such a change.




My unmature understanding is that, we can just call
pci_device_iommu_address_space() if device realization really needs
the address space, rather than using bus_master_as.


A little issue is pci_device_iommu_address_space() can be wrong if it was 
called before OMMU was created.



At hotplug time this is irrelevant because the iommu has already been created.

Thanks,
Marcel


Thanks



An example is vfio-pci device. It is using
pci_device_iommu_address_space() in vfio_realize(). Though I _guess_
there may be other reasons behind, e.g., VFIOAddressSpace should be
1:1 mapped to bus master address space, so we may want to make sure
this address space will be the same when different devices are using
the same address space (while bus_master_as is one per device, even if
two devices have the same backend address space, there will be two
distinct bus_master_as).

IIUC, bus_master_as is only used to emulate the master bit in PCI
command register. In that case, that should only be there for the
guest operations, while imho device realization is "emulation code",
which can directly use pci_device_iommu_address_space(). Actually, if
it plays with bus_master_as even if it can, I guess it'll just fail
since that region has not yet be

Re: [Qemu-devel] [PATCH V3] virtio: do not require IOMMU to be created in advance

2017-03-08 Thread Marcel Apfelbaum

On 03/08/2017 04:43 AM, Peter Xu wrote:

On Tue, Mar 07, 2017 at 02:47:30PM +0200, Marcel Apfelbaum wrote:

On 03/07/2017 11:09 AM, Jason Wang wrote:

After commit 96a8821d2141 ("virtio: unbreak virtio-pci with IOMMU
after caching ring translations"), IOMMU was required to be created in
advance. This is because we can only get the correct dma_as after pci
IOMMU (e.g intel_iommu) was initialized. This is suboptimal and
inconvenient for user. This patch releases this by:

- introduce a bus_master_ready method for PCIDeviceClass and trigger
 this during pci_init_bus_master
- implement virtio-pci method and 1) reset the dma_as 2) re-register
 the memory listener to the new dma_as



Hi Jason,


Cc: Paolo Bonzini 
Signed-off-by: Jason Wang 
---
Changes from V2:
- delay pci_init_bus_master() after pci device is realized to make
 bus_master_ready a more generic method
---
hw/pci/pci.c   | 11 ---
hw/virtio/virtio-pci.c |  9 +
hw/virtio/virtio.c | 19 +++
include/hw/pci/pci.h   |  1 +
include/hw/virtio/virtio.h |  1 +
5 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 273f1e4..22e6ad9 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -83,6 +83,7 @@ static const VMStateDescription vmstate_pcibus = {
static void pci_init_bus_master(PCIDevice *pci_dev)
{
AddressSpace *dma_as = pci_device_iommu_address_space(pci_dev);
+PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(pci_dev);

memory_region_init_alias(&pci_dev->bus_master_enable_region,
 OBJECT(pci_dev), "bus master",
@@ -90,6 +91,9 @@ static void pci_init_bus_master(PCIDevice *pci_dev)
memory_region_set_enabled(&pci_dev->bus_master_enable_region, false);
address_space_init(&pci_dev->bus_master_as,
   &pci_dev->bus_master_enable_region, pci_dev->name);
+if (pc->bus_master_ready) {
+pc->bus_master_ready(pci_dev);
+}
}

static void pcibus_machine_done(Notifier *notifier, void *data)
@@ -995,9 +999,6 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev, PCIBus *bus,
pci_dev->devfn = devfn;
pci_dev->requester_id_cache = pci_req_id_cache_get(pci_dev);

-if (qdev_hotplug) {
-pci_init_bus_master(pci_dev);
-}
pstrcpy(pci_dev->name, sizeof(pci_dev->name), name);
pci_dev->irq_state = 0;
pci_config_alloc(pci_dev);
@@ -1995,6 +1996,10 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
}
}

+if (qdev_hotplug) {
+pci_init_bus_master(pci_dev);
+}
+


How does it help if we move qdev_hotplug check outside "do_pci_register_device"?
I suppose you want the code to run after the "realize" function?
If so, what happens if a "realize" function of another device needs the 
bus_master_as
(at hotplug time)?


My unmature understanding is that, we can just call
pci_device_iommu_address_space() if device realization really needs
the address space, rather than using bus_master_as.



Yes, each device that support IOMMU needs to call it instead of bus_master_as.
As if it really needs this information at realize time is debatable.
The vfio-pci is a special case and Alex Williamson explain why it is needed
at the "realize" time.



An example is vfio-pci device. It is using
pci_device_iommu_address_space() in vfio_realize(). Though I _guess_
there may be other reasons behind, e.g., VFIOAddressSpace should be
1:1 mapped to bus master address space, so we may want to make sure
this address space will be the same when different devices are using
the same address space (while bus_master_as is one per device, even if
two devices have the same backend address space, there will be two
distinct bus_master_as).



Understood


IIUC, bus_master_as is only used to emulate the master bit in PCI
command register. In that case, that should only be there for the
guest operations, while imho device realization is "emulation code",
which can directly use pci_device_iommu_address_space(). Actually, if
it plays with bus_master_as even if it can, I guess it'll just fail
since that region has not yet been enabled.

Please kindly correct me if I made a mistake...



You are right the device does not need the bus master address space before the
VM is up, this is we the init process of this region is delayed until machine 
done.


My question was why do you need to move pci_init_bus_master after device 
realization
at hotplug time and how would influence the other devices.


Thanks,
Marcel


Thanks,

-- peterx






[Qemu-devel] What's the next QEMU version after 2.9 ? (or: when is a good point in time to get rid of old interfaces)

2017-03-08 Thread Thomas Huth

 Hi everybody,

what will be the next version of QEMU after 2.9? Will we go for a 2.10
(as I've seen it mentioned a couple of times on the mailing list
already), or do we dare to switch to 3.0 instead?

I personally dislike two-digit minor version numbers like 2.10 since the
non-experienced users sometimes mix it up with 2.1 ... and there have
been a couple of new cool features in the past releases that would
justify a 3.0 now, too, I think.

But anyway, the more important thing that keeps me concerned is: Someone
 once told me that we should get rid of old parameters and interfaces
(like HMP commands) primarily only when we're changing to a new major
version number. As you all know, QEMU has a lot of legacy options, which
are likely rather confusing than helpful for the new users nowadays,
e.g. things like the "-net channel" option (which is fortunately even
hardly documented), but maybe also even the whole vlan/hub concept in
the net code, or legacy parameters like "-usbdevice". If we switch to
version 3.0, could we agree to remove at least some of them?

 Thomas



Re: [Qemu-devel] [PATCH] X86/HMP: Expose x86 model specific registers via human monitor

2017-03-08 Thread Julian Kirsch
Hi Eric,

thank you for your comments. My answers are inlined.

On 08.03.2017 03:36, Eric Blake wrote:

> 
> I'm just focusing on the QMP interface portion of this.
> 
> Is any of this information...
> 
>> This patch moves the logic of the rdmsr and wrmsr functions to helper.c and
>> replaces their current versions in misc_helper.c with stubs calling the new
>> functions. The ordering of MSRs now loosely follows the ordering used in the 
>> KVM
>> module. As qemu operates on cached values in the CPUX86State struct, the 
>> msr-set
>> command is implemented in a hackish way: In order to force qemu to flush the 
>> new
>> values to KVM a call to cpu_synchronize_post_init is triggered, eventually
>> ending up in calling kvm_put_msrs at KVM_PUT_FULL_STATE level. As MSR writes
>> could *still* be caught by the code logic in this function, the msr-set 
>> function
>> reads back the values written to determine whether the write was successful.
>> This way, we don't have to duplicate the logic used in kvm_put_msrs 
>> (has_msr_XXX)
>> to x86_cpu_wrmsr.
>> There are several things I would like to pooint out about this patch:
>>   - The "msr-list" command currently displays MSR values for all virtual 
>> cpus.
>> This is somewhat inconsistent with "info registers" just displaying the
>> value of the current default cpu. One might think about just displaying 
>> the
>> current value and offer access to other CPU's MSRs by means of switching
>> between CPUs using the "cpu" monitor command.
>>   - The new version of x86_cpu_rdmsr exposes MSRs that are arguably of
>> questionable help for any human / tool using the monitor. However I 
>> merely
>> felt a deep urge to reflect the full MSR list from kvm.c when writing the
>> code.
>>   - While the need for msr-list is evident (see commit msg), msr-set could be
>> used in more obscure cases. For instance, one might offer a way to access
>> and configure performance counter MSRs of the guest via the hmp. If this
>> feels too much like an inacceptable hack, I'll happily drop the msr-set
>> part.
> 
> ...useful above the --- as part of the commit message?
> 

I tried to explain the issue of qemu not flushing back the msrs to kvm in cpu.c
. The remainder is merely design questions that are imho not important for the
commit message.

> 
>> +++ b/qapi-schema.json
>> @@ -2365,6 +2365,55 @@
>>'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
>>  
>>  ##
>> +# @MsrInfo:
>> +#
>> +# Information about a MSR
>> +#
>> +# @cpu_idx: CPU index
>> +#
>> +# @msr_idx: MSR index
>> +#
>> +# @value: MSR value
>> +#
>> +# Since: 2.8.1
> 
> You've missed 2.8 by a long shot; you've even missed soft freeze for
> 2.9.  This should be 2.10.
> 

Ops. thanks for pointing this out, will update it. I sticked to the latest
version returned by "git tag".

>> +##
>> +{ 'struct': 'MsrInfo',
>> +  'data': {'cpu_idx': 'int', 'msr_idx': 'uint32', 'value': 'uint64'} }
> 
> Please spell new members with '-' rather than '_', as in 'cpu-idx' (or
> even spell it out as 'cpu-index') and 'msr-idx'.
> 

Again, thank you, will take care of it.

>> +
>> +##
>> +# @msr-set:
>> +#
>> +# Set model specific registers (MSRs) on x86
>> +#
>> +# @cpu_idx: CPU holding the MSR that should be written
>> +#
>> +# @msr_idx: MSR index to write
>> +#
>> +# @value: Value to write
>> +#
>> +# Returns: Nothing on success
> 
> Useless Returns: line.
> 

Removed.

Best,
Julian



[Qemu-devel] [PATCH v2] Add inst_dirty_pages_rate in 'info migrate'

2017-03-08 Thread Chao Fan
Auto-converge aims to accelerate migration by slowing down the
generation of dirty pages. But user doesn't know how to determine the
throttle value, so, a new item "inst-dirty-pages-rate" in "info migrate"
would be helpful for user's determination.

Signed-off-by: Chao Fan 

---
v2:
  Update the way to caculate the time.
  Add tag '(since 2.9)' in documentation of qapi-schema.json
---
 hmp.c |  4 
 include/migration/migration.h |  1 +
 include/qemu/bitmap.h | 17 
 migration/migration.c |  2 ++
 migration/ram.c   | 45 +++
 qapi-schema.json  |  4 
 6 files changed, 73 insertions(+)

diff --git a/hmp.c b/hmp.c
index 2bc4f06..c7892ea 100644
--- a/hmp.c
+++ b/hmp.c
@@ -219,6 +219,10 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "dirty pages rate: %" PRIu64 " pages\n",
info->ram->dirty_pages_rate);
 }
+if (info->ram->inst_dirty_pages_rate) {
+monitor_printf(mon, "inst dirty pages rate: %" PRIu64 " bytes/s\n",
+   info->ram->inst_dirty_pages_rate);
+}
 if (info->ram->postcopy_requests) {
 monitor_printf(mon, "postcopy request count: %" PRIu64 "\n",
info->ram->postcopy_requests);
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 1735d66..95f0453 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -164,6 +164,7 @@ struct MigrationState
 int64_t downtime;
 int64_t expected_downtime;
 int64_t dirty_pages_rate;
+int64_t inst_dirty_pages_rate;
 int64_t dirty_bytes_rate;
 bool enabled_capabilities[MIGRATION_CAPABILITY__MAX];
 int64_t xbzrle_cache_size;
diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 63ea2d0..dc99f9b 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -235,4 +235,21 @@ static inline unsigned long *bitmap_zero_extend(unsigned 
long *old,
 return new;
 }
 
+static inline unsigned long bitmap_weight(const unsigned long *src, long nbits)
+{
+unsigned long i, count = 0, nlong = nbits / BITS_PER_LONG;
+
+if (small_nbits(nbits)) {
+return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits));
+}
+for (i = 0; i < nlong; i++) {
+count += hweight_long(src[i]);
+}
+if (nbits % BITS_PER_LONG) {
+count += hweight_long(src[i] & BITMAP_LAST_WORD_MASK(nbits));
+}
+
+return count;
+}
+
 #endif /* BITMAP_H */
diff --git a/migration/migration.c b/migration/migration.c
index c6ae69d..18fc2ec 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -644,6 +644,7 @@ static void populate_ram_info(MigrationInfo *info, 
MigrationState *s)
 if (s->state != MIGRATION_STATUS_COMPLETED) {
 info->ram->remaining = ram_bytes_remaining();
 info->ram->dirty_pages_rate = s->dirty_pages_rate;
+info->ram->inst_dirty_pages_rate = s->inst_dirty_pages_rate;
 }
 }
 
@@ -1099,6 +1100,7 @@ MigrationState *migrate_init(const MigrationParams 
*params)
 s->downtime = 0;
 s->expected_downtime = 0;
 s->dirty_pages_rate = 0;
+s->inst_dirty_pages_rate = 0;
 s->dirty_bytes_rate = 0;
 s->setup_time = 0;
 s->dirty_sync_count = 0;
diff --git a/migration/ram.c b/migration/ram.c
index f289fcd..7b440fd 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -44,6 +44,7 @@
 #include "exec/ram_addr.h"
 #include "qemu/rcu_queue.h"
 #include "migration/colo.h"
+#include "hw/boards.h"
 
 static int dirty_rate_high_cnt;
 
@@ -590,6 +591,7 @@ static int64_t bytes_xfer_prev;
 static int64_t num_dirty_pages_period;
 static uint64_t xbzrle_cache_miss_prev;
 static uint64_t iterations_prev;
+static int64_t dirty_pages_time_prev;
 
 static void migration_bitmap_sync_init(void)
 {
@@ -598,6 +600,47 @@ static void migration_bitmap_sync_init(void)
 num_dirty_pages_period = 0;
 xbzrle_cache_miss_prev = 0;
 iterations_prev = 0;
+dirty_pages_time_prev = 0;
+}
+
+static void migration_inst_rate(void)
+{
+int64_t dirty_pages_time_now;
+if (!dirty_pages_time_prev) {
+dirty_pages_time_prev = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+}
+dirty_pages_time_now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+if (dirty_pages_time_now > dirty_pages_time_prev + 1000) {
+RAMBlock *block;
+MigrationState *s = migrate_get_current();
+int64_t inst_dirty_pages = 0;
+int64_t i;
+unsigned long *num;
+unsigned long len = 0;
+
+rcu_read_lock();
+DirtyMemoryBlocks *blocks = atomic_rcu_read(
+ &ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]);
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+if (len == 0) {
+len = block->offset;
+}
+len += block->used_length;
+}
+  

Re: [Qemu-devel] [PATCH v3] intel_iommu: check misordered init when realize

2017-03-08 Thread Paolo Bonzini


- Original Message -
> From: "Marcel Apfelbaum" 
> To: "Michael S. Tsirkin" , "Peter Xu" 
> Cc: qemu-devel@nongnu.org, "Paolo Bonzini" , "yi l liu" 
> , "Jintack Lim"
> , "Jason Wang" , "Alex 
> Williamson" 
> Sent: Thursday, March 2, 2017 7:24:44 AM
> Subject: Re: [PATCH v3] intel_iommu: check misordered init when realize
> 
> On 03/02/2017 07:13 AM, Michael S. Tsirkin wrote:
> > On Thu, Mar 02, 2017 at 11:32:18AM +0800, Peter Xu wrote:
> >> Intel vIOMMU devices are created with "-device" parameter, while here
> >> actually we need to make sure the dmar device be created before other
> >> PCI devices (like vfio-pci, virtio-pci ones) so that we know iommu_fn
> >> will be setup correctly before realizations of those PCI devices (it is
> >> legal that PCI device fetch these info during its realization). Now this
> >> ordering yet cannot be achieved elsewhere, and devices will be created
> >> in the order that user specified. We need to avoid that.
> >>
> >> This patch tries to detect this kind of misordering issue during init of
> >> VT-d device, then report to guest if misordering happened. In the
> >> future, we can provide something better to solve it, e.g., to support
> >> device init ordering, then we can live without this patch.
> >>
> >> Signed-off-by: Peter Xu 
> >
> > Unfortunately with virtio it's a regression, as it used to
> > work with iommu. So I'm afraid we need to look into supporting
> > arbitrary order right now :(
> >
> 
> Hi,
> 
> A fast way to do it is to initialize iommu with a new keyword, like
>  -iommu intel-iommu
> Or maybe use the existing "-object" somehow ?
>  From vl.c comments:
>"Initial object creation happens before all other
> QEMU data types are created "
> 
> Another idea is to add a third run on QEMU cmd line arguments,
> but this would affect the whole system.
> 
> However having an ordering system beats all other ideas.

The ordering should be explicit in the command line.  Since we do not have
cycles, it should be possible.

The best solution would have been to add something like

   -device intel-iommu,id=root-complex-iommu -global 
mch.iommu=root-complex-iommu

but it's too late for that.

Paolo



Re: [Qemu-devel] [PATCH v6 00/10] add blkdebug tests

2017-03-08 Thread Kevin Wolf
Am 08.03.2017 um 03:54 hat Eric Blake geschrieben:
> Available as a tag at:
> git fetch git://repo.or.cz/qemu/ericb.git nbd-blkdebug-v6
> 
> v5 was:
> https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg03042.html
> 
> Since then:
> - Rebase to master
> - Pull in Dan's patch that splits test 97 for qcow (patch 1)
> - Address comments from Kevin (includes creating new patch 2, 6)
> - Update patches to reflect that this is now 2.10 material, since it
> missed soft freeze and is not fixing a bug

I think if patches 1 and 2 pass review, we can take at least those into
2.9. Improving test cases is okay during the freeze (and obviously
fixing them is allowed anyway).

Kevin



Re: [Qemu-devel] [PATCH v3] intel_iommu: check misordered init when realize

2017-03-08 Thread Peter Xu
On Wed, Mar 08, 2017 at 03:35:50AM -0500, Paolo Bonzini wrote:

[...]

> The ordering should be explicit in the command line.  Since we do not have
> cycles, it should be possible.

Current QEMU should have no restriction on parameter ordering, right?
Or do we have any existing case that:

  "$QEMU -parameter1 -parameter2"

will work, while...

  "$QEMU -parameter2 -parameter1"

won't work?

Thanks,

> 
> The best solution would have been to add something like
> 
>-device intel-iommu,id=root-complex-iommu -global 
> mch.iommu=root-complex-iommu
> 
> but it's too late for that.

-- peterx



Re: [Qemu-devel] [PATCH V3] virtio: do not require IOMMU to be created in advance

2017-03-08 Thread Peter Xu
On Wed, Mar 08, 2017 at 10:17:09AM +0200, Marcel Apfelbaum wrote:

[...]

> 
> I am sorry I was not clear enough:
> If a device is added after the system is up (hotplug), we cannot depend on 
> the "machine_done"
> event to enable "bus master".
> This is why we have
>if (qdev_hotplug)
> pci_init_bus_master(pci_dev);
> 
> The code you proposed changes the order, so this call is done *after* realize.
> 
> My question was: What if any other device may require the bus_master_as
> at realize time (and can be hot-plugged) ?
> For example: hcd-ehci/hcd-ohci devices call pci_get_address_space()
> and caches the bus_master_as.

Oh, I didn't notice that there are other devices that used
bus_master_as during realization. If so... Would this really work even
without hot plug? Considering that bus_master_as won't be inited until
machine done phase?

-- peterx



Re: [Qemu-devel] [PATCH] virtio: destroy region cache during reset

2017-03-08 Thread Paolo Bonzini


- Original Message -
> From: "Jason Wang" 
> To: "Paolo Bonzini" , m...@redhat.com, 
> qemu-devel@nongnu.org
> Cc: pet...@redhat.com
> Sent: Wednesday, March 8, 2017 7:22:06 AM
> Subject: Re: [Qemu-devel] [PATCH] virtio: destroy region cache during reset
> 
> 
> 
> On 2017年03月08日 11:21, Jason Wang wrote:
> >
> > On 2017年03月07日 18:55, Paolo Bonzini wrote:
> >>
> >> On 07/03/2017 09:47, Jason Wang wrote:
> >>> We don't destroy region cache during reset which can make the maps
> >>> of previous driver leaked to a buggy or malicious driver that don't
> >>> set vring address before starting to use the device.
> >> I'm still not sure as to how this can happen.  Reset does clear
> >> desc/used/avail, which should then be checked before accessing the
> >> caches.
> >
> > But the code does not check them in fact? (E.g the attached qtest
> > patch can still pass check-qtest).
> >
> > Thanks
> 
> Ok, the reproducer seems wrong. And I think what you mean is something
> like the check done in virtio_queue_ready(). But looks like not all
> virtqueue check for this. One example is virtio_net_handle_ctrl(), and
> there may be even more. So you want to fix them all?

Why would virtio_net_handle_ctrl be called when desc == 0?  The checks
are all in common virtio code.

static void virtio_queue_notify_vq(VirtQueue *vq)
{
if (vq->vring.desc && vq->handle_output) {
VirtIODevice *vdev = vq->vdev;

if (unlikely(vdev->broken)) {
return;
}

trace_virtio_queue_notify(vdev, vq - vdev->vq, vq);
vq->handle_output(vdev, vq);
}
}


   1440,29   55%
Paolo



Re: [Qemu-devel] [PATCH v8 1/2] block/vxhs.c: Add support for a new block device type called "vxhs"

2017-03-08 Thread Daniel P. Berrange
On Tue, Mar 07, 2017 at 05:27:55PM -0800, ashish mittal wrote:
> Thanks! There is one more input I need some help with!
> 
> VxHS network library opens a fixed number of connection channels to a
> given host, and all the vdisks (that connect to the same host) share
> these connection channels.
> 
> Therefore, we need to open secure channels to a specific target host
> only once for the first vdisk that connects to that host. All the
> other vdisks that connect to the same target host will share the same
> set of secure communication channels.
> 
> I hope the above scheme is acceptable?

I don't think I'm in favour of such an approach, as it forces a single
QEMU process to use the same privileges for all disks it uses.

Consider an example where a QEMU process has two disks, one shared
readonly disk and one exclusive writable disk, both on the same host.

It is reasonable as an administrator to want to use different credentials
for each of these. ie, they might have a set of well known credentials to
authenticate to get access to the read-only disk, but have a different set
of strictly limited access credentials to get access to the writable disk

Trying to re-use the same connection for multiple cause prevents QEMU from
authenticating with different credentials per disk, so I don't think that
is a good approach - each disk should have totally independant state.

 > 
> If yes, then we have a couple of options to implement this:
> 
> (1) Accept the TLS credentials per vdisk using the previously
> discussed --object tls-creds-x509 mechanism. In this case, if more
> than one vdisk have the same host info, then we will use only the
> first one's creds to set up the secure connection, and ignore the
> others. vdisks that connect to different target hosts will use their
> individual tls-creds-x509 to set up the secure channels. This is, of
> course, not relevant for qemu-img/qemu-io as they can open only one
> vdisk at a time.
> 
> (2) Instead of having a per-vdisk --object tls-creds-x509, have a
> single such argument on the command line for vxhs block device code to
> consume - if that is possible! One way to achieve this could be the
> user/password authentication we discussed earlier, which we could use
> to pass the directory where cert/keys are kept.
> 
> (3) Use the instance UUID, when available, to lookup the cert files
> per instance (i.e. for qemu-kvm), and use the --object tls-creds-x509
> mechanism, when instance UUID is NULL (i.e. qemu-io, qemu-img etc).
> The cert/key files are anyway protected by file permissions in either
> case, so I guess there is no additional security provided by either
> method.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



Re: [Qemu-devel] [PATCH RFC 1/1] block: Handle NULL options correctly in raw_open

2017-03-08 Thread Kevin Wolf
Am 08.03.2017 um 03:15 hat Dong Jia Shi geschrieben:
> A normal call for raw_open should always pass in a non-NULL @options,
> but for some certain cases (e.g. trying to applying snapshot on a RBD
> image), they call raw_open with a NULL @options right after the calling
> for raw_close.
> 
> Let's take the NULL @options as a sign of trying to do raw_open again,
> and just simply return a success code.
> 
> Signed-off-by: Dong Jia Shi 

I think we rather need to fix bdrv_snapshot_goto() so that it doesn't
pass NULL, but the actual options that were given for the node (i.e.
bs->options).

Kevin



Re: [Qemu-devel] [PATCH v4] backup: allow target without .bdrv_get_info

2017-03-08 Thread Kevin Wolf
Am 28.02.2017 um 20:33 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Currently backup to nbd target is broken, as nbd doesn't have
> .bdrv_get_info realization.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

Thanks, applied to the block branch.

Kevin



Re: [Qemu-devel] [PATCH V3] virtio: do not require IOMMU to be created in advance

2017-03-08 Thread Peter Xu
On Wed, Mar 08, 2017 at 05:09:45PM +0800, Peter Xu wrote:
> On Wed, Mar 08, 2017 at 10:17:09AM +0200, Marcel Apfelbaum wrote:
> 
> [...]
> 
> > 
> > I am sorry I was not clear enough:
> > If a device is added after the system is up (hotplug), we cannot depend on 
> > the "machine_done"
> > event to enable "bus master".
> > This is why we have
> >if (qdev_hotplug)
> > pci_init_bus_master(pci_dev);
> > 
> > The code you proposed changes the order, so this call is done *after* 
> > realize.
> > 
> > My question was: What if any other device may require the bus_master_as
> > at realize time (and can be hot-plugged) ?
> > For example: hcd-ehci/hcd-ohci devices call pci_get_address_space()
> > and caches the bus_master_as.
> 
> Oh, I didn't notice that there are other devices that used
> bus_master_as during realization. If so... Would this really work even
> without hot plug? Considering that bus_master_as won't be inited until
> machine done phase?

Please ignore my question... I think the answer is that these devices
are only caching the pointer of bus_master_as. So it won't really use
the address space before machine_done.

If so, IMHO moving pci_init_bus_master() after device specific
realize() is okay as well then, right?

Thanks,

-- peterx



Re: [Qemu-devel] [PATCH] virtio: destroy region cache during reset

2017-03-08 Thread Cornelia Huck
On Wed, 8 Mar 2017 11:18:27 +0800
Jason Wang  wrote:

> On 2017年03月07日 18:16, Cornelia Huck wrote:
> > On Tue,  7 Mar 2017 16:47:58 +0800
> > Jason Wang  wrote:
> >
> >> We don't destroy region cache during reset which can make the maps
> >> of previous driver leaked to a buggy or malicious driver that don't
> >> set vring address before starting to use the device. Fix this by
> >> destroy the region cache during reset and validate it before trying to
> >> use them. While at it, also validate address_space_cache_init() during
> >> virtio_init_region_cache() to make sure we have a correct region
> >> cache.
> >>
> >> Signed-off-by: Jason Wang 
> >> ---
> >>   hw/virtio/virtio.c | 88 
> >> ++
> >>   1 file changed, 76 insertions(+), 12 deletions(-)

> >> @@ -190,6 +211,10 @@ static inline uint16_t vring_avail_flags(VirtQueue 
> >> *vq)
> >>   {
> >>   VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
> >>   hwaddr pa = offsetof(VRingAvail, flags);
> >> +if (!caches) {
> >> +virtio_error(vq->vdev, "Cannot map avail flags");
> > I'm not sure that virtio_error is the right thing here; ending up in
> > this function with !caches indicates an error in our logic.
> 
> Probably not, this can be triggered by buggy guest.

I would think that even a buggy guest should not be able to trigger
accesses to vring structures that have not yet been set up. What am I
missing?

> 
> > An assert
> > might be better (and I hope we can sort out all of those errors exposed
> > by the introduction of region caches for 2.9...)
> 
> I thought we should avoid assert as much as possible in this case. But 
> if you and maintainer want an assert, it's also fine.

My personal rule-of-thumb:
- If it is something that can be triggered by the guest, or it is
something that is easily recovered, set the device to broken.
- If it is something that indicates that we messed up our internal
logic, use an assert.

I think arriving here with !caches indicates the second case; but if
there is a way for a guest to trigger this, setting the device to
broken would certainly be better.

> 
> >
> >> +return 0;
> >> +}
> >>   return virtio_lduw_phys_cached(vq->vdev, &caches->avail, pa);
> >>   }
> >>

> >> @@ -1117,6 +1174,15 @@ static enum virtio_device_endian 
> >> virtio_current_cpu_endian(void)
> >>   }
> >>   }
> >>
> >> +static void virtio_virtqueue_reset_region_cache(struct VirtQueue *vq)
> >> +{
> >> +VRingMemoryRegionCaches *caches;
> >> +
> >> +caches = atomic_read(&vq->vring.caches);
> >> +atomic_set(&vq->vring.caches, NULL);
> >> +virtio_free_region_cache(caches);
> > Shouldn't this use rcu to free it? Unconditionally setting caches to
> > NULL feels wrong...
> 
> Right, will switch to use rcu.
> 
> >> +}
> >> +
> >>   void virtio_reset(void *opaque)
> >>   {
> >>   VirtIODevice *vdev = opaque;
> >> @@ -1157,6 +1223,7 @@ void virtio_reset(void *opaque)
> >>   vdev->vq[i].notification = true;
> >>   vdev->vq[i].vring.num = vdev->vq[i].vring.num_default;
> >>   vdev->vq[i].inuse = 0;
> >> +virtio_virtqueue_reset_region_cache(&vdev->vq[i]);
> > ...especially as you call it in a reset context here.
> >
> >>   }
> >>   }
> >>
> >> @@ -2451,13 +2518,10 @@ static void 
> >> virtio_device_free_virtqueues(VirtIODevice *vdev)
> >>   }
> >>
> >>   for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> >> -VRingMemoryRegionCaches *caches;
> >>   if (vdev->vq[i].vring.num == 0) {
> >>   break;
> >>   }
> >> -caches = atomic_read(&vdev->vq[i].vring.caches);
> >> -atomic_set(&vdev->vq[i].vring.caches, NULL);
> >> -virtio_free_region_cache(caches);
> >> +virtio_virtqueue_reset_region_cache(&vdev->vq[i]);
> > OTOH, immediate destruction may still be called for during device
> > finalization.
> >
> 
> Right but to avoid code duplication, use rcu unconditionally should be 
> no harm here.

Yes, this should be fine.




Re: [Qemu-devel] [PATCH v3] intel_iommu: check misordered init when realize

2017-03-08 Thread Gerd Hoffmann
> Current QEMU should have no restriction on parameter ordering, right?
> Or do we have any existing case that:
> 
>   "$QEMU -parameter1 -parameter2"
> 
> will work, while...
> 
>   "$QEMU -parameter2 -parameter1"
> 
> won't work?

Well, we have.  If you, for example, add a scsi disk, you have to add a
scsi host adapter first.

cheers,
  Gerd




Re: [Qemu-devel] [PATCH] virtio: destroy region cache during reset

2017-03-08 Thread Cornelia Huck
On Wed, 8 Mar 2017 14:22:06 +0800
Jason Wang  wrote:

> On 2017年03月08日 11:21, Jason Wang wrote:
> >
> > On 2017年03月07日 18:55, Paolo Bonzini wrote:
> >>
> >> On 07/03/2017 09:47, Jason Wang wrote:
> >>> We don't destroy region cache during reset which can make the maps
> >>> of previous driver leaked to a buggy or malicious driver that don't
> >>> set vring address before starting to use the device.
> >> I'm still not sure as to how this can happen.  Reset does clear
> >> desc/used/avail, which should then be checked before accessing the 
> >> caches.
> >
> > But the code does not check them in fact? (E.g the attached qtest 
> > patch can still pass check-qtest).
> >
> > Thanks 
> 
> Ok, the reproducer seems wrong. And I think what you mean is something 
> like the check done in virtio_queue_ready(). But looks like not all 
> virtqueue check for this. One example is virtio_net_handle_ctrl(), and 

Shouldn't the check for desc in virtio_queue_notify_vq() already take
care of that?

> there may be even more. So you want to fix them all?

Obviously not speaking for Paolo, but I think the virtio core should
have be audited for missing guards.




Re: [Qemu-devel] [PATCH RFC 1/1] block: Handle NULL options correctly in raw_open

2017-03-08 Thread Dong Jia Shi
* Kevin Wolf  [2017-03-08 10:13:46 +0100]:

> Am 08.03.2017 um 03:15 hat Dong Jia Shi geschrieben:
> > A normal call for raw_open should always pass in a non-NULL @options,
> > but for some certain cases (e.g. trying to applying snapshot on a RBD
> > image), they call raw_open with a NULL @options right after the calling
> > for raw_close.
> > 
> > Let's take the NULL @options as a sign of trying to do raw_open again,
> > and just simply return a success code.
> > 
> > Signed-off-by: Dong Jia Shi 
> 
> I think we rather need to fix bdrv_snapshot_goto() so that it doesn't
> pass NULL, but the actual options that were given for the node (i.e.
> bs->options).
I've tried that before the current try. bs->options does not have the
"file" key-value pair, so that leads to a fail too. Should we put "file"
in to the options manually? I noticed that it was removed from
bs->options during the calling of bdrv_open_inherit.

> 
> Kevin
> 

-- 
Dong Jia




Re: [Qemu-devel] [PATCH v4 5/5] xen: use libxendevicemodel when available

2017-03-08 Thread Paul Durrant


> -Original Message-
> From: Stefano Stabellini [mailto:sstabell...@kernel.org]
> Sent: 07 March 2017 19:14
> To: Paul Durrant 
> Cc: xen-de...@lists.xenproject.org; qemu-devel@nongnu.org; Stefano
> Stabellini 
> Subject: Re: [PATCH v4 5/5] xen: use libxendevicemodel when available
> 
> On Tue, 7 Mar 2017, Paul Durrant wrote:
> > This patch modifies the wrapper functions in xen_common.h to use the
> > new xendevicemodel interface if it is available along with compatibility
> > code to use the old libxenctrl interface if it is not.
> >
> > Signed-off-by: Paul Durrant 
> > Reviewed-by: Anthony Perard 
> 
> Reviewed-by: Stefano Stabellini 
> 

Great. Thanks, and thanks for finding the compat issues.

> However, QEMU is in soft-freeze, so I'll wait until the merge window
> open again before sending the pull request.
> 

Ok.

Cheers,

  Paul

> 
> > Cc: Stefano Stabellini 
> >
> > v4:
> > - Fix typo causing build failures on < 490
> > - Fix building on < 450
> > - Build-tested against 4.2.5 and 4.8.0
> >
> > v3:
> > - Switch from macros to static inlines.
> >
> > v2:
> > - Add a compat define for xenforeignmemory_close() since this is now
> >   used.
> > ---
> >  include/hw/xen/xen_common.h | 203
> +---
> >  xen-common.c|   8 ++
> >  2 files changed, 178 insertions(+), 33 deletions(-)
> >
> > diff --git a/include/hw/xen/xen_common.h
> b/include/hw/xen/xen_common.h
> > index 31cf25f..274accc 100644
> > --- a/include/hw/xen/xen_common.h
> > +++ b/include/hw/xen/xen_common.h
> > @@ -9,6 +9,7 @@
> >  #undef XC_WANT_COMPAT_EVTCHN_API
> >  #undef XC_WANT_COMPAT_GNTTAB_API
> >  #undef XC_WANT_COMPAT_MAP_FOREIGN_API
> > +#undef XC_WANT_COMPAT_DEVICEMODEL_API
> >
> >  #include 
> >  #include 
> > @@ -26,48 +27,183 @@ extern xc_interface *xen_xc;
> >   * We don't support Xen prior to 4.2.0.
> >   */
> >
> > +#if CONFIG_XEN_CTRL_INTERFACE_VERSION < 490
> > +
> > +typedef xc_interface xendevicemodel_handle;
> > +
> > +static inline xendevicemodel_handle *xendevicemodel_open(
> > +struct xentoollog_logger *logger, unsigned int open_flags)
> > +{
> > +return xen_xc;
> > +}
> > +
> > +#if CONFIG_XEN_CTRL_INTERFACE_VERSION >= 450
> > +
> > +static inline int xendevicemodel_create_ioreq_server(
> > +xendevicemodel_handle *dmod, domid_t domid, int handle_bufioreq,
> > +ioservid_t *id)
> > +{
> > +return xc_hvm_create_ioreq_server(dmod, domid, handle_bufioreq,
> > +  id);
> > +}
> > +
> > +static inline int xendevicemodel_get_ioreq_server_info(
> > +xendevicemodel_handle *dmod, domid_t domid, ioservid_t id,
> > +xen_pfn_t *ioreq_pfn, xen_pfn_t *bufioreq_pfn,
> > +evtchn_port_t *bufioreq_port)
> > +{
> > +return xc_hvm_get_ioreq_server_info(dmod, domid, id, ioreq_pfn,
> > +bufioreq_pfn, bufioreq_port);
> > +}
> > +
> > +static inline int xendevicemodel_map_io_range_to_ioreq_server(
> > +xendevicemodel_handle *dmod, domid_t domid, ioservid_t id, int
> is_mmio,
> > +uint64_t start, uint64_t end)
> > +{
> > +return xc_hvm_map_io_range_to_ioreq_server(dmod, domid, id,
> is_mmio,
> > +   start, end);
> > +}
> > +
> > +static inline int xendevicemodel_unmap_io_range_from_ioreq_server(
> > +xendevicemodel_handle *dmod, domid_t domid, ioservid_t id, int
> is_mmio,
> > +uint64_t start, uint64_t end)
> > +{
> > +return xc_hvm_unmap_io_range_from_ioreq_server(dmod, domid, id,
> is_mmio,
> > +   start, end);
> > +}
> > +
> > +static inline int xendevicemodel_map_pcidev_to_ioreq_server(
> > +xendevicemodel_handle *dmod, domid_t domid, ioservid_t id,
> > +uint16_t segment, uint8_t bus, uint8_t device, uint8_t function)
> > +{
> > +return xc_hvm_map_pcidev_to_ioreq_server(dmod, domid, id,
> segment,
> > + bus, device, function);
> > +}
> > +
> > +static inline int xendevicemodel_unmap_pcidev_from_ioreq_server(
> > +xendevicemodel_handle *dmod, domid_t domid, ioservid_t id,
> > +uint16_t segment, uint8_t bus, uint8_t device, uint8_t function)
> > +{
> > +return xc_hvm_unmap_pcidev_from_ioreq_server(dmod, domid, id,
> segment,
> > + bus, device, function);
> > +}
> > +
> > +static inline int xendevicemodel_destroy_ioreq_server(
> > +xendevicemodel_handle *dmod, domid_t domid, ioservid_t id)
> > +{
> > +return xc_hvm_destroy_ioreq_server(dmod, domid, id);
> > +}
> > +
> > +static inline int xendevicemodel_set_ioreq_server_state(
> > +xendevicemodel_handle *dmod, domid_t domid, ioservid_t id, int
> enabled)
> > +{
> > +return xc_hvm_set_ioreq_server_state(dmod, domid, id, enabled);
> > +}
> > +
> > +#endif /* CONFIG_XEN_CTRL_INTERFACE_VERSION >= 450 */
> > +
> > +static inline int xendevicemodel_set_pci_intx_level(
> > +xendevi

Re: [Qemu-devel] [PULL v4 00/24] block: Command line option -blockdev

2017-03-08 Thread Peter Maydell
On 7 March 2017 at 16:20, Markus Armbruster  wrote:
> Actually, the command line option is the least part of this series.
> Its bulk is about building infrastructure and getting errors out of
> the JSON parser.
>
> The design of the command line interface was discussed here:
> Subject: Non-flat command line option argument syntax
> Message-ID: <87bmukmlau@dusky.pond.sub.org>
> https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg00555.html
>
> v4: PATCH 03: One-liner to placate clang sanitizer
> v3: A few commit messages touched up, code unchanged
>
> The following changes since commit ff79d5e939c38677a575e3493eb9b4d36eb21865:
>
>   Merge remote-tracking branch 'remotes/xtensa/tags/20170306-xtensa' into 
> staging (2017-03-07 09:57:14 +)
>
> are available in the git repository at:
>
>   git://repo.or.cz/qemu/armbru.git tags/pull-block-2017-02-28-v4
>
> for you to fetch changes up to 0b2c1beea4358e40d1049b8ee019408ce96b37ce:
>
>   keyval: Support lists (2017-03-07 16:07:48 +0100)
>


Applied, thanks.

-- PMM



Re: [Qemu-devel] [PATCH] virtio: destroy region cache during reset

2017-03-08 Thread Jason Wang



On 2017年03月08日 17:10, Paolo Bonzini wrote:


- Original Message -

From: "Jason Wang" 
To: "Paolo Bonzini" , m...@redhat.com, 
qemu-devel@nongnu.org
Cc: pet...@redhat.com
Sent: Wednesday, March 8, 2017 7:22:06 AM
Subject: Re: [Qemu-devel] [PATCH] virtio: destroy region cache during reset



On 2017年03月08日 11:21, Jason Wang wrote:

On 2017年03月07日 18:55, Paolo Bonzini wrote:

On 07/03/2017 09:47, Jason Wang wrote:

We don't destroy region cache during reset which can make the maps
of previous driver leaked to a buggy or malicious driver that don't
set vring address before starting to use the device.

I'm still not sure as to how this can happen.  Reset does clear
desc/used/avail, which should then be checked before accessing the
caches.

But the code does not check them in fact? (E.g the attached qtest
patch can still pass check-qtest).

Thanks

Ok, the reproducer seems wrong. And I think what you mean is something
like the check done in virtio_queue_ready(). But looks like not all
virtqueue check for this. One example is virtio_net_handle_ctrl(), and
there may be even more. So you want to fix them all?

Why would virtio_net_handle_ctrl be called when desc == 0?  The checks
are all in common virtio code.

static void virtio_queue_notify_vq(VirtQueue *vq)
{
 if (vq->vring.desc && vq->handle_output) {
 VirtIODevice *vdev = vq->vdev;

 if (unlikely(vdev->broken)) {
 return;
 }

 trace_virtio_queue_notify(vdev, vq - vdev->vq, vq);
 vq->handle_output(vdev, vq);
 }
}


1440,29   55%
Paolo


Right, I miss this.

But I find two possible leaks by auditing the callers of virtqueue_pop():

virtio_input_send() and virtio_balloon_set_status() which will call 
virtio_balloon_receive_stats().


Thanks



Re: [Qemu-devel] [PATCH] virtio: destroy region cache during reset

2017-03-08 Thread Jason Wang



On 2017年03月08日 17:19, Cornelia Huck wrote:

On Wed, 8 Mar 2017 11:18:27 +0800
Jason Wang  wrote:


On 2017年03月07日 18:16, Cornelia Huck wrote:

On Tue,  7 Mar 2017 16:47:58 +0800
Jason Wang  wrote:


We don't destroy region cache during reset which can make the maps
of previous driver leaked to a buggy or malicious driver that don't
set vring address before starting to use the device. Fix this by
destroy the region cache during reset and validate it before trying to
use them. While at it, also validate address_space_cache_init() during
virtio_init_region_cache() to make sure we have a correct region
cache.

Signed-off-by: Jason Wang 
---
   hw/virtio/virtio.c | 88 
++
   1 file changed, 76 insertions(+), 12 deletions(-)
@@ -190,6 +211,10 @@ static inline uint16_t vring_avail_flags(VirtQueue *vq)
   {
   VRingMemoryRegionCaches *caches = atomic_rcu_read(&vq->vring.caches);
   hwaddr pa = offsetof(VRingAvail, flags);
+if (!caches) {
+virtio_error(vq->vdev, "Cannot map avail flags");

I'm not sure that virtio_error is the right thing here; ending up in
this function with !caches indicates an error in our logic.

Probably not, this can be triggered by buggy guest.

I would think that even a buggy guest should not be able to trigger
accesses to vring structures that have not yet been set up. What am I
missing?


I think this may happen if a driver start the device without setting the 
vring address. In this case, region caches cache the mapping of previous 
driver. But maybe I was wrong.





An assert
might be better (and I hope we can sort out all of those errors exposed
by the introduction of region caches for 2.9...)

I thought we should avoid assert as much as possible in this case. But
if you and maintainer want an assert, it's also fine.

My personal rule-of-thumb:
- If it is something that can be triggered by the guest, or it is
something that is easily recovered, set the device to broken.
- If it is something that indicates that we messed up our internal
logic, use an assert.

I think arriving here with !caches indicates the second case; but if
there is a way for a guest to trigger this, setting the device to
broken would certainly be better.


Yes, broken seems better.

Thanks



Re: [Qemu-devel] [PATCH] virtio: destroy region cache during reset

2017-03-08 Thread Jason Wang



On 2017年03月08日 17:30, Cornelia Huck wrote:

On Wed, 8 Mar 2017 14:22:06 +0800
Jason Wang  wrote:


On 2017年03月08日 11:21, Jason Wang wrote:

On 2017年03月07日 18:55, Paolo Bonzini wrote:

On 07/03/2017 09:47, Jason Wang wrote:

We don't destroy region cache during reset which can make the maps
of previous driver leaked to a buggy or malicious driver that don't
set vring address before starting to use the device.

I'm still not sure as to how this can happen.  Reset does clear
desc/used/avail, which should then be checked before accessing the
caches.

But the code does not check them in fact? (E.g the attached qtest
patch can still pass check-qtest).

Thanks

Ok, the reproducer seems wrong. And I think what you mean is something
like the check done in virtio_queue_ready(). But looks like not all
virtqueue check for this. One example is virtio_net_handle_ctrl(), and

Shouldn't the check for desc in virtio_queue_notify_vq() already take
care of that?


Yes, I miss this.




there may be even more. So you want to fix them all?

Obviously not speaking for Paolo, but I think the virtio core should
have be audited for missing guards.



Probably, otherwise we need a careful auditing of all callers of caches.

Thanks



Re: [Qemu-devel] RAMBlock's named ""

2017-03-08 Thread Dr. David Alan Gilbert
* Peter Maydell (peter.mayd...@linaro.org) wrote:
> On 7 March 2017 at 20:46, Dr. David Alan Gilbert  wrote:
> > The real fun is that there doesn't seem to be anything that stops
> > two blocks having the same name!
> 
> The memory region API says that the name is for debugging only,
> so the problem is in code which relies on them being unique :-)

I'll fix that comment!
However, the problem here is slightly different; I think the Memory regions
do have names - the problem in this case is that the RAMBlock's aren't
attached to any memory regions.

Dave

> 
> thanks
> -- PMM
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH V3] virtio: do not require IOMMU to be created in advance

2017-03-08 Thread Jason Wang



On 2017年03月08日 16:17, Marcel Apfelbaum wrote:

On 03/08/2017 05:15 AM, Jason Wang wrote:



On 2017年03月08日 10:43, Peter Xu wrote:

On Tue, Mar 07, 2017 at 02:47:30PM +0200, Marcel Apfelbaum wrote:

On 03/07/2017 11:09 AM, Jason Wang wrote:

After commit 96a8821d2141 ("virtio: unbreak virtio-pci with IOMMU
after caching ring translations"), IOMMU was required to be 
created in

advance. This is because we can only get the correct dma_as after pci
IOMMU (e.g intel_iommu) was initialized. This is suboptimal and
inconvenient for user. This patch releases this by:

- introduce a bus_master_ready method for PCIDeviceClass and trigger
  this during pci_init_bus_master
- implement virtio-pci method and 1) reset the dma_as 2) re-register
  the memory listener to the new dma_as


Hi Jason,


Cc: Paolo Bonzini 
Signed-off-by: Jason Wang 
---
Changes from V2:
- delay pci_init_bus_master() after pci device is realized to make
  bus_master_ready a more generic method
---
hw/pci/pci.c   | 11 ---
hw/virtio/virtio-pci.c |  9 +
hw/virtio/virtio.c | 19 +++
include/hw/pci/pci.h   |  1 +
include/hw/virtio/virtio.h |  1 +
5 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 273f1e4..22e6ad9 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -83,6 +83,7 @@ static const VMStateDescription vmstate_pcibus = {
static void pci_init_bus_master(PCIDevice *pci_dev)
{
 AddressSpace *dma_as = pci_device_iommu_address_space(pci_dev);
+PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(pci_dev);

memory_region_init_alias(&pci_dev->bus_master_enable_region,
  OBJECT(pci_dev), "bus master",
@@ -90,6 +91,9 @@ static void pci_init_bus_master(PCIDevice *pci_dev)
memory_region_set_enabled(&pci_dev->bus_master_enable_region, false);
 address_space_init(&pci_dev->bus_master_as,
&pci_dev->bus_master_enable_region, pci_dev->name);
+if (pc->bus_master_ready) {
+pc->bus_master_ready(pci_dev);
+}
}

static void pcibus_machine_done(Notifier *notifier, void *data)
@@ -995,9 +999,6 @@ static PCIDevice 
*do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,

 pci_dev->devfn = devfn;
 pci_dev->requester_id_cache = pci_req_id_cache_get(pci_dev);

-if (qdev_hotplug) {
-pci_init_bus_master(pci_dev);
-}
 pstrcpy(pci_dev->name, sizeof(pci_dev->name), name);
 pci_dev->irq_state = 0;
 pci_config_alloc(pci_dev);
@@ -1995,6 +1996,10 @@ static void pci_qdev_realize(DeviceState 
*qdev, Error **errp)

 }
 }

+if (qdev_hotplug) {
+pci_init_bus_master(pci_dev);
+}
+
How does it help if we move qdev_hotplug check outside 
"do_pci_register_device"?

I suppose you want the code to run after the "realize" function?


Yes.

If so, what happens if a "realize" function of another device needs 
the bus_master_as

(at hotplug time)?


I'm not sure this is really needed. What kind of device need to check 
hotplug during their realize? (Looks like we don't have such kind of 
device now).


I am sorry I was not clear enough:
If a device is added after the system is up (hotplug), we cannot 
depend on the "machine_done"

event to enable "bus master".
This is why we have
   if (qdev_hotplug)
pci_init_bus_master(pci_dev);

The code you proposed changes the order, so this call is done *after* 
realize.


My question was: What if any other device may require the bus_master_as
at realize time (and can be hot-plugged) ?
For example: hcd-ehci/hcd-ohci devices call pci_get_address_space()
and caches the bus_master_as.

It is possible the mentioned devices can't be hot-plugged or
are not relevant for PCIe.

I am only saying we should double check when making such a change.


Can't agree more.

For hcd-ehci/hcd-ohci, simply caching the as should be fine. But if they 
want to more things like virito, they should delay it to bus_master_ready.







My unmature understanding is that, we can just call
pci_device_iommu_address_space() if device realization really needs
the address space, rather than using bus_master_as.


A little issue is pci_device_iommu_address_space() can be wrong if it 
was called before OMMU was created.




At hotplug time this is irrelevant because the iommu has already been 
created.


Thanks,
Marcel


Yes.

Thanks



Re: [Qemu-devel] [Qemu-trivial] [PATCH for-2.9?] tests: Ignore more test executables

2017-03-08 Thread Laurent Vivier
and tests/test-arm-mptimer?

Laurent

On 07/03/2017 19:45, Eric Blake wrote:
> Ignore test executables when building in-tree:
> test-crypto-hmac introduced in commit 4fd460b
> test-aio-multithread introduced in commit 0c330a7
> 
> Signed-off-by: Eric Blake 
> ---
> 
> Doesn't affect the built binaries, but does make it harder to
> accidentally commit an unintended binary when doing 'git add .'.
> Therefore, it's up to the maintainer if this is 2.9 or 2.10
> material.
> 
>  tests/.gitignore | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/tests/.gitignore b/tests/.gitignore
> index dc37519..71521ac 100644
> --- a/tests/.gitignore
> +++ b/tests/.gitignore
> @@ -11,6 +11,7 @@ check-qom-proplist
>  qht-bench
>  rcutorture
>  test-aio
> +test-aio-multithread
>  test-base64
>  test-bitops
>  test-bitcnt
> @@ -24,6 +25,7 @@ test-crypto-afsplit
>  test-crypto-block
>  test-crypto-cipher
>  test-crypto-hash
> +test-crypto-hmac
>  test-crypto-ivgen
>  test-crypto-pbkdf
>  test-crypto-secret
> 




Re: [Qemu-devel] What's the next QEMU version after 2.9 ? (or: when is a good point in time to get rid of old interfaces)

2017-03-08 Thread Peter Maydell
On 8 March 2017 at 09:26, Thomas Huth  wrote:
> But anyway, the more important thing that keeps me concerned is: Someone
>  once told me that we should get rid of old parameters and interfaces
> (like HMP commands) primarily only when we're changing to a new major
> version number. As you all know, QEMU has a lot of legacy options, which
> are likely rather confusing than helpful for the new users nowadays,
> e.g. things like the "-net channel" option (which is fortunately even
> hardly documented), but maybe also even the whole vlan/hub concept in
> the net code, or legacy parameters like "-usbdevice". If we switch to
> version 3.0, could we agree to remove at least some of them?

I think if we are going to deprecate and remove options we need
a clear transition plan for doing so, which means at least one
release where options are "still works, but warn that they
are going away with pointer to documentation or similar info
about their replacement syntax", before actually dropping them.

(Similar applies if we want to deprecate and remove old and
unmaintained machine models.)

thanks
-- PMM



Re: [Qemu-devel] [PATCH v3] intel_iommu: check misordered init when realize

2017-03-08 Thread Jason Wang



On 2017年03月08日 16:35, Paolo Bonzini wrote:


- Original Message -

From: "Marcel Apfelbaum" 
To: "Michael S. Tsirkin" , "Peter Xu" 
Cc: qemu-devel@nongnu.org, "Paolo Bonzini" , "yi l liu" 
, "Jintack Lim"
, "Jason Wang" , "Alex Williamson" 

Sent: Thursday, March 2, 2017 7:24:44 AM
Subject: Re: [PATCH v3] intel_iommu: check misordered init when realize

On 03/02/2017 07:13 AM, Michael S. Tsirkin wrote:

On Thu, Mar 02, 2017 at 11:32:18AM +0800, Peter Xu wrote:

Intel vIOMMU devices are created with "-device" parameter, while here
actually we need to make sure the dmar device be created before other
PCI devices (like vfio-pci, virtio-pci ones) so that we know iommu_fn
will be setup correctly before realizations of those PCI devices (it is
legal that PCI device fetch these info during its realization). Now this
ordering yet cannot be achieved elsewhere, and devices will be created
in the order that user specified. We need to avoid that.

This patch tries to detect this kind of misordering issue during init of
VT-d device, then report to guest if misordering happened. In the
future, we can provide something better to solve it, e.g., to support
device init ordering, then we can live without this patch.

Signed-off-by: Peter Xu 

Unfortunately with virtio it's a regression, as it used to
work with iommu. So I'm afraid we need to look into supporting
arbitrary order right now :(


Hi,

A fast way to do it is to initialize iommu with a new keyword, like
  -iommu intel-iommu
Or maybe use the existing "-object" somehow ?
  From vl.c comments:
"Initial object creation happens before all other
 QEMU data types are created "

Another idea is to add a third run on QEMU cmd line arguments,
but this would affect the whole system.

However having an ordering system beats all other ideas.

The ordering should be explicit in the command line.  Since we do not have
cycles, it should be possible.

The best solution would have been to add something like

-device intel-iommu,id=root-complex-iommu -global 
mch.iommu=root-complex-iommu

but it's too late for that.

Paolo



Yes, it's probably too late other than having workarounds for 2.9. And 
to eliminate similar issues, we may initializing the pci devices by 
traversing the PCI tree from root complex in level order.


Thanks



Re: [Qemu-devel] [PATCH] virtio: destroy region cache during reset

2017-03-08 Thread Cornelia Huck
On Wed, 8 Mar 2017 17:51:22 +0800
Jason Wang  wrote:

> On 2017年03月08日 17:19, Cornelia Huck wrote:
> > On Wed, 8 Mar 2017 11:18:27 +0800
> > Jason Wang  wrote:
> >
> >> On 2017年03月07日 18:16, Cornelia Huck wrote:
> >>> On Tue,  7 Mar 2017 16:47:58 +0800
> >>> Jason Wang  wrote:
> >>>
>  We don't destroy region cache during reset which can make the maps
>  of previous driver leaked to a buggy or malicious driver that don't
>  set vring address before starting to use the device. Fix this by
>  destroy the region cache during reset and validate it before trying to
>  use them. While at it, also validate address_space_cache_init() during
>  virtio_init_region_cache() to make sure we have a correct region
>  cache.
> 
>  Signed-off-by: Jason Wang 
>  ---
> hw/virtio/virtio.c | 88 
>  ++
> 1 file changed, 76 insertions(+), 12 deletions(-)
>  @@ -190,6 +211,10 @@ static inline uint16_t vring_avail_flags(VirtQueue 
>  *vq)
> {
> VRingMemoryRegionCaches *caches = 
>  atomic_rcu_read(&vq->vring.caches);
> hwaddr pa = offsetof(VRingAvail, flags);
>  +if (!caches) {
>  +virtio_error(vq->vdev, "Cannot map avail flags");
> >>> I'm not sure that virtio_error is the right thing here; ending up in
> >>> this function with !caches indicates an error in our logic.
> >> Probably not, this can be triggered by buggy guest.
> > I would think that even a buggy guest should not be able to trigger
> > accesses to vring structures that have not yet been set up. What am I
> > missing?
> 
> I think this may happen if a driver start the device without setting the 
> vring address. In this case, region caches cache the mapping of previous 
> driver. But maybe I was wrong.

So the sequence would be:

- Driver #1 sets up the device, then abandons it without cleaning up
via reset
- Driver #2 uses the device without doing a reset or proper setup

?

I can't quite see why caches would be NULL in that case...

Having reset clean up the caches as well (IOW, the other part of your
patch) should make sure that we always have a consistent view of
descriptors and their caches, I think. The checks for desc != NULL and
friends should catch other driver buggyness.




Re: [Qemu-devel] [PATCH] virtio: destroy region cache during reset

2017-03-08 Thread Cornelia Huck
On Wed, 8 Mar 2017 17:53:23 +0800
Jason Wang  wrote:

> On 2017年03月08日 17:30, Cornelia Huck wrote:
> > On Wed, 8 Mar 2017 14:22:06 +0800
> > Jason Wang  wrote:

> >> there may be even more. So you want to fix them all?
> > Obviously not speaking for Paolo, but I think the virtio core should
> > have be audited for missing guards.
> >
> 
> Probably, otherwise we need a careful auditing of all callers of caches.

Yes, especially as all direct calls to the caches are localized in the
core anyway.




Re: [Qemu-devel] What's the next QEMU version after 2.9 ? (or: when is a good point in time to get rid of old interfaces)

2017-03-08 Thread Daniel P. Berrange
On Wed, Mar 08, 2017 at 09:26:00AM +0100, Thomas Huth wrote:
> 
>  Hi everybody,
> 
> what will be the next version of QEMU after 2.9? Will we go for a 2.10
> (as I've seen it mentioned a couple of times on the mailing list
> already), or do we dare to switch to 3.0 instead?
> 
> I personally dislike two-digit minor version numbers like 2.10 since the
> non-experienced users sometimes mix it up with 2.1 ... and there have
> been a couple of new cool features in the past releases that would
> justify a 3.0 now, too, I think.

If we need to debate whether to switch major version numbers, it is a sign
we should set out a clearer policy on version numbering :-)

AFAIK, QEMU's never documented semantics for changes in the major number
part of the version. I don't think the 1.0 or 2.0 releases were particularly
significant in what they introduced/changed, as opposed to many of the .x
releases. It was a fairly arbitrary decision to bump the major version.

libvirt suffered similar lack of clarity around when to bump major version
number as opposed to minor version. To address this we recently adopted the
rule[1] that major version number changes have no relation to features. The
major number is simply incremented at the start of each calendar year.

> But anyway, the more important thing that keeps me concerned is: Someone
>  once told me that we should get rid of old parameters and interfaces
> (like HMP commands) primarily only when we're changing to a new major
> version number. As you all know, QEMU has a lot of legacy options, which
> are likely rather confusing than helpful for the new users nowadays,
> e.g. things like the "-net channel" option (which is fortunately even
> hardly documented), but maybe also even the whole vlan/hub concept in
> the net code, or legacy parameters like "-usbdevice". If we switch to
> version 3.0, could we agree to remove at least some of them?

This would, by inference, say that increments of major part of the version
number indicate backwards incompatible changes in QEMU API.

That is certainly one option for a rule around major version number changes
but is not one QEMU seems to have traditionally followed. The release notes
show plenty of incompatible changes in arbitrary .x releases.

>From the POV of libvirt, I don't think saying that .0 releases have
incompatible changes is particularly useful in itself. What *is* useful
is to have a clear rule around a deprecation cycle. ie, a rule that says
a feature will be marked deprecated for 3 releases or 12 months, before it
is permitted to be removed/changed. If that were followed, there is no
need to batch up such changes in a .0 release IMHO.

Regards,
Daniel

[1] http://libvirt.org/downloads.html#numbering
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



Re: [Qemu-devel] RAMBlock's named ""

2017-03-08 Thread Igor Mammedov
On Tue, 7 Mar 2017 19:46:56 +
"Dr. David Alan Gilbert"  wrote:

> We seem to have a couple of weird cases where we end up with
> RAMBlocks with no name;  I think they'll badly confuse
> the migration code, but I don't quite understand how they're
> happening.
> 
> 1) device_del e1000e
> 2) -object memory-backend-file  without wiring it up
> 
> I added some debug into migration/ram.c ram_save_setup to
> dump the names it was seeing in it's FOREACH.
> 
> 1)
>   (from https://bugzilla.redhat.com/show_bug.cgi?id=1425273)
>   The simplest reproducer of this is:
> 
> ./qemu-system-x86_64 -nographic  -device e1000e,id=foo -m 1G -M pc,accel=kvm 
> my.img
> 
>   with a Linux image and after it's booted do a device_del foo
> 
>   migration at that point sees an empty RAMBlock that was the ROM
>   associated with the device.  This doesn't happen on an e1000 device,
>   so I've not figured out what the difference is.
> 
>   This gives a : Unknown ramblock "", cannot accept migration
>   on the destination (correctly).
> 
>   (This happens on 2.7.0 as well, so it's nothing new)
> 
> 2)
>   ./qemu-system-x86_64 -nographic -object 
> memory-backend-file,id=mem,size=512M,mem-path=/tmp
> 
>   Note I've not wired that memory to a NUMA node or a DIMM or anything, so
>   it's just hanging around.
>   Again that RAMBlock does exist and shows up in the migration code,
>   I think it'll try and migrate it.
it has to be registered with vmstate_register_ram() which
doesn't happen for non used hostmem object.
See:
  pc_dimm_memory_plug()
and
  memory_region_allocate_system_memory()

> The real fun is that there doesn't seem to be anything that stops
> two blocks having the same name!
code doesn't permit duplicate ids for -object created objects
but memory region api doesn't care about it as long as
memory_region name is unique child name within its parent object
children namespace.

you can do a check for empty / duplicate names at ram_block_add()
time and fail gracefully, but that probably will break things as
it hasn't been expected behavior before.


> 
> Dave
> 
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




[Qemu-devel] [PATCH] memory: reduce heap Rss size around 3M

2017-03-08 Thread Yang Zhong
Since cpu-memory and memory have same address space,one malloced
memory is enough. This patch will skip memory malloc for memory
address space,which will reduce around 3M physical memory in heap.

Signed-off-by: Yang Zhong 
---
 memory.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/memory.c b/memory.c
index 284894b..799ca4c 100644
--- a/memory.c
+++ b/memory.c
@@ -2422,8 +2422,10 @@ AddressSpace *address_space_init_shareable(MemoryRegion 
*root, const char *name)
 AddressSpace *as;
 
 QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
-if (root == as->root && as->malloced) {
-as->ref_count++;
+if (root == as->root) {
+if (as->malloced) {
+as->ref_count++;
+}
 return as;
 }
 }
-- 
1.9.1




[Qemu-devel] Host kernel panic when shutdown the pass-through QLogic HBA Card VM

2017-03-08 Thread Gaofeng (GaoFeng, Euler)
The pass-through VM make the host kernel panic when it shutdown.
I read the kernel code and find the reason of panic is :
BUG_ON(domain_type_is_vm_or_si(domain));
in domain_get_iommu function.

The normal logic does not come to this function, it should be return when 
__intel_map_single call function iommu_no_mapping.
I tried the qemu-2.8.0 upstream version, without this problem.
Is there have some patch fix this bug after qemu-2.6.0?

Thanks

Qemu version:2.6.0
Libvirt: 1.3.4
VM OS: rhel-server-7.1-x86_64
Host Kernel version: 3.10.0-327
PCI Device:
0002:82:00.0 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI 
Express HBA (rev 02)
0002:82:00.1 Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI 
Express HBA (rev 02)

0002:82:00.0 and 0002:82:00.1 in the same iommu_group.

VM config XML:


  redhat_7.1_64
  4194304
  4194304
  18
  
/machine
  
  
hvm

  
  



  
  

  
  
  destroy
  restart
  destroy
  
/usr/bin/qemu-kvm

  
  
  


  
  
  
  


  

  
  


  

  
  





  


  


  


Dump info:
[40473.205379] qla2xxx 0002:82:00.0: enabling device (0400 -> 0402)
[40473.205867] qla2xxx [0002:82:00.0]-001d: : Found an ISP2532 irq 73 iobase 
0xc9016c222000.
[40473.216686] [ cut here ]
[40473.223384] kernel BUG at drivers/iommu/intel-iommu.c:598!
[40473.230916] invalid opcode:  [#1] SMP
[40473.239620] collected_len = 625223, LOG_BUF_LEN_LOCAL = 1048576
[40473.316996] kbox: no notify die func register. no need to notify
[40473.325313] do nothing after die!
[40473.330993] Modules linked in: qla2xxx scsi_transport_fc scsi_tgt ext4 jbd2 
dev_connlimit(O) hotpatch(OE) bum(O) ip_set nfnetlink prio(O) nat(O) 
vport_vxlan(O) openvswitch(O) nf_defrag_ipv6 gre signo_catch(O) kboxdriver(O) 
pmcint(O) kbox(O) ipmi_devintf ipmi_si ipmi_msghandler iTCO_wdt 
iTCO_vendor_support kvm_intel(O) kvm(O) coretemp intel_rapl crc32_pclmul 
crc32c_intel ghash_clmulni_intel ixgbe(O) aesni_intel lrw gf128mul glue_helper 
ablk_helper cryptd pcspkr vxlan ip6_udp_tunnel udp_tunnel vfat fat igb ptp 
pps_core i2c_algo_bit dca sb_edac edac_core i2c_i801 i2c_core ses enclosure sg 
shpchp lpc_ich mfd_core mei_me mei wmi nf_conntrack_ipv4 nf_defrag_ipv4 
vhost_net(O) tun(O) vhost(O) macvtap macvlan vfio_pci irqbypass 
vfio_iommu_type1 vfio xt_sctp nf_conntrack_proto_sctp nf_nat_proto_sctp nf_nat
[40473.431127]  nf_conntrack sctp libcrc32c ip_tables ext3 mbcache jbd sd_mod 
crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common ahci libahci 
libata usb_storage megaraid_sas dm_mod [last unloaded: scsi_tgt]
[40473.463456] CPU: 180 PID: 3910 Comm: kworker/180:1 Tainted: G   OE  
 ---   3.10.0-327.44.58.19_7.x86_64 #1
[40473.484945] Hardware name: To be filled by O.E.M. 9032/IT91SMUB, BIOS 
BLXSV105 07/05/2016
[40473.498740] Workqueue: events work_for_cpu_fn
[40473.508815] task: 921f51fff300 ti: 921f50ed task.ti: 
921f50ed
[40473.522274] RIP: 0010:[]  [] 
domain_get_iommu+0x44/0x50
[40473.536839] RSP: 0018:921f50ed3bb0  EFLAGS: 00010202
[40473.548450] RAX:  RBX: 9a1f51b5f098 RCX: 
[40473.561971] RDX:  RSI: 921f50065dc0 RDI: 8a1f45257200
[40473.575460] RBP: 921f50ed3bf8 R08: 00019620 R09: c9000147d620
[40473.589074] R10: 8150e805 R11: ea287d401940 R12: 
[40473.602811] R13: 0a1f4adc6000 R14: 8a1f45257200 R15: 2000
[40473.616629] FS:  () GS:c90001464000() 
knlGS:
[40473.631594] CS:  0010 DS:  ES:  CR0: 80050033
[40473.644306] CR2: 7fb8b2fff7b0 CR3: 0295a000 CR4: 001407e0
[40473.658592] DR0:  DR1:  DR2: 
[40473.672952] DR3:  DR6: fffe0ff0 DR7: 0400
[40473.687339] Stack:
[40473.696624]  81512b18   
2000
[40473.711487]  921f4adc6000 2000 9a1f51b5f098 
921f4fdc01f8
[40473.726250]  0001 921f50ed3c38 81512d42 
302d5d302e30303a
[40473.741050]  921f4fdc 921f50ed3d30 921f50ed3d28 
0200
[40473.755827]  0800 921f50ed3cc0 a0770025 
0010
[40473.770564]  c79f7266 c79f7266 0001 

[40473.785250]  921f4fdc0368 921f4fdc0360 0282 
c79f7266
[40473.799919]  c79f7266 9a1f51b5f000 0800 
0200
[40473.814468]  9a1f51b5f000 921f4fdc 921f50ed3dc8 
a0775285
[40473.828917]  c9016c222000 921f7fff 7ffe 
04060008
[40473.843229]  921f0020 0100 921f0800 
c9000147a8c0

[Qemu-devel] can not enable replaying and miss object type

2017-03-08 Thread myas...@126.com
To record block operations the drive,I use parameters as shown in Figure 1, it 
succeed, but the whole process is very slow.

To replay block operations the drive,I use parameters as shown in Figure 2, 
after entering the startup options, the system stay black screen.

I can't solve this problem, I hope to get some advice.

Both block-record.img and block-replay.img are the same mirror increments.

When I add parameter  "-netdev user,id=net1 -device rtl8139,netdev=net1 -object 
filter-replay,id=replay,netdev=net1", I encountered an error as shown in Figure 
6.

I can't understand why the object type  is missing, I hope I can get help.

 Figure:
1.qemu-system-x86_64  -m 3G -drive file=block-record.img,if=none,id=img-direct 
-drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay -device 
ide-hd,drive=img-blkreplay   -icount shift=0,rr=record,rrfile=replay-block.bin 
-localtime  -monitor stdio  -vnc :1
2.qemu-system-x86_64  -m 3G -drive file=block-replay.img,if=none,id=img-direct 
-drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay -device 
ide-hd,drive=img-blkreplay   -icount shift=0,rr=replay,rrfile=replay-block.bin 
-localtime  -monitor stdio  -vnc :1
3.qemu-img create -f qcow2 -b ubuntu11.04.img  block-record.img
4.qemu-img create -f qcow2 -b ubuntu11.04.img block-replay.img
5.-netdev user,id=net1 -device rtl8139,netdev=net1 -object 
filter-replay,id=replay,netdev=net1
6.qemu-system-x86_64: -object filter-replay,id=replay,netdev=net1: invalid 
object type: filter-replay








Re: [Qemu-devel] RAMBlock's named ""

2017-03-08 Thread Dr. David Alan Gilbert
* Igor Mammedov (imamm...@redhat.com) wrote:
> On Tue, 7 Mar 2017 19:46:56 +
> "Dr. David Alan Gilbert"  wrote:
> 
> > We seem to have a couple of weird cases where we end up with
> > RAMBlocks with no name;  I think they'll badly confuse
> > the migration code, but I don't quite understand how they're
> > happening.
> > 
> > 1) device_del e1000e
> > 2) -object memory-backend-file  without wiring it up
> > 
> > I added some debug into migration/ram.c ram_save_setup to
> > dump the names it was seeing in it's FOREACH.
> > 
> > 1)
> >   (from https://bugzilla.redhat.com/show_bug.cgi?id=1425273)
> >   The simplest reproducer of this is:
> > 
> > ./qemu-system-x86_64 -nographic  -device e1000e,id=foo -m 1G -M 
> > pc,accel=kvm my.img
> > 
> >   with a Linux image and after it's booted do a device_del foo
> > 
> >   migration at that point sees an empty RAMBlock that was the ROM
> >   associated with the device.  This doesn't happen on an e1000 device,
> >   so I've not figured out what the difference is.
> > 
> >   This gives a : Unknown ramblock "", cannot accept migration
> >   on the destination (correctly).
> > 
> >   (This happens on 2.7.0 as well, so it's nothing new)
> > 
> > 2)
> >   ./qemu-system-x86_64 -nographic -object 
> > memory-backend-file,id=mem,size=512M,mem-path=/tmp
> > 
> >   Note I've not wired that memory to a NUMA node or a DIMM or anything, so
> >   it's just hanging around.
> >   Again that RAMBlock does exist and shows up in the migration code,
> >   I think it'll try and migrate it.
> it has to be registered with vmstate_register_ram() which
> doesn't happen for non used hostmem object.
> See:
>   pc_dimm_memory_plug()
> and
>   memory_region_allocate_system_memory()
> 
> > The real fun is that there doesn't seem to be anything that stops
> > two blocks having the same name!
> code doesn't permit duplicate ids for -object created objects
> but memory region api doesn't care about it as long as
> memory_region name is unique child name within its parent object
> children namespace.
> 
> you can do a check for empty / duplicate names at ram_block_add()
> time and fail gracefully, but that probably will break things as
> it hasn't been expected behavior before.

There's actually code to check for setting duplicate RAMBlock names;
what isn't caught is where two RAMBlocks have never had their names
set or where they've been unset.

I'm tempted to add a check at the start of migration; if one of these
blocks exists during a migration it'll probably succeed; two of them
however will cause chaos.

Dave

> 
> > 
> > Dave
> > 
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



[Qemu-devel] [PATCH for-2.10 2/8] ppc/xics: add an ics_eoi() handler to XICSFabric

2017-03-08 Thread Cédric Le Goater
This handler will be required by PowerPC machines using multiple ICS
objects, like this is the case for PowerNV. Also update the sPAPR
machine to use the new handler.

Signed-off-by: Cédric Le Goater 
---
 hw/intc/xics.c|  9 +++--
 hw/ppc/spapr.c| 11 +++
 include/hw/ppc/xics.h |  2 ++
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 209e1a75ecb9..e6fecd6e1a89 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -169,7 +169,7 @@ void ics_resend(ICSState *ics)
 }
 }
 
-static void ics_eoi(ICSState *ics, int nr)
+void ics_eoi(ICSState *ics, int nr)
 {
 ICSStateClass *k = ICS_BASE_GET_CLASS(ics);
 
@@ -268,7 +268,6 @@ void icp_eoi(ICPState *icp, uint32_t xirr)
 {
 XICSFabric *xi = icp->xics;
 XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(xi);
-ICSState *ics;
 uint32_t irq;
 
 /* Send EOI -> ICS */
@@ -276,10 +275,8 @@ void icp_eoi(ICPState *icp, uint32_t xirr)
 trace_xics_icp_eoi(icp->cs->cpu_index, xirr, icp->xirr);
 irq = xirr & XISR_MASK;
 
-ics = xic->ics_get(xi, irq);
-if (ics) {
-ics_eoi(ics, irq);
-}
+xic->ics_eoi(xi, irq);
+
 if (!XISR(icp)) {
 icp_resend(icp);
 }
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index c3bb99160545..043629cc5c54 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3024,6 +3024,16 @@ static void spapr_ics_resend(XICSFabric *dev)
 ics_resend(spapr->ics);
 }
 
+static void spapr_ics_eoi(XICSFabric *xi, int irq)
+{
+ICSState *ics;
+
+ics = spapr_ics_get(xi, irq);
+if (ics) {
+ics_eoi(ics, irq);
+}
+}
+
 static ICPState *spapr_icp_get(XICSFabric *xi, int server)
 {
 sPAPRMachineState *spapr = SPAPR_MACHINE(xi);
@@ -3094,6 +3104,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
void *data)
 vhc->get_patbe = spapr_get_patbe;
 xic->ics_get = spapr_ics_get;
 xic->ics_resend = spapr_ics_resend;
+xic->ics_eoi = spapr_ics_eoi;
 xic->icp_get = spapr_icp_get;
 ispc->print_info = spapr_pic_print_info;
 }
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 42bd24e975cb..00b003b2392d 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -155,6 +155,7 @@ typedef struct XICSFabricClass {
 InterfaceClass parent;
 ICSState *(*ics_get)(XICSFabric *xi, int irq);
 void (*ics_resend)(XICSFabric *xi);
+void (*ics_eoi)(XICSFabric *xi, int irq);
 ICPState *(*icp_get)(XICSFabric *xi, int server);
 } XICSFabricClass;
 
@@ -189,6 +190,7 @@ void icp_pic_print_info(ICPState *icp, Monitor *mon);
 void ics_pic_print_info(ICSState *ics, Monitor *mon);
 
 void ics_resend(ICSState *ics);
+void ics_eoi(ICSState *ics, int irq);
 void icp_resend(ICPState *ss);
 
 typedef struct sPAPRMachineState sPAPRMachineState;
-- 
2.7.4




[Qemu-devel] [PATCH for-2.10 0/8] ppc/pnv: interrupt controller (POWER8)

2017-03-08 Thread Cédric Le Goater
Hello,

Here is a series adding support for the interrupt controller as found
on a POWER8 system. POWER9 uses a different interrupt controller
called XIVE, still to be worked on.

The initial patches are small extensions to the recently added
XICSFabric framework. The PowerNV machine is then extended to hold the
Interrupt Source Control (ICS) and the Interrupt Control Presenter
objects of the system. After that, we use a memory region to model the
Interrupt Management area which contains the ICPs registers of a
POWER8 PowerNV system.

Last is a PSI (Processor Service Interface) bridge model to handle the
external interrupt, a minimal model for the OCC (on-chip Controller)
and support for the LPC controller on POWER8+ cpus.

To test, grab a kernel and a rootfs image here :

  
https://openpower.xyz/job/openpower-op-build/distro=ubuntu,target=palmetto/lastSuccessfulBuild/artifact/images/zImage.epapr
  
https://openpower.xyz/job/openpower-op-build/distro=ubuntu,target=palmetto/lastSuccessfulBuild/artifact/images/rootfs.cpio.xz

The full patchset is available here :

   https://github.com/legoater/qemu/commits/powernv-ipmi-2.9

Thanks,

C.

Benjamin Herrenschmidt (3):
  ppc/pnv: Add cut down PSI bridge model and hookup external interrupt
  ppc/pnv: Add OCC model stub with interrupt support
  ppc/pnv: Add support for POWER8+ LPC Controller

Cédric Le Goater (5):
  ppc/xics: add a xics_get_cpu_index_by_pir() helper
  ppc/xics: add an ics_eoi() handler to XICSFabric
  ppc/pnv: create the ICP and ICS objects under the machine
  ppc/pnv: add memory regions for the ICP registers
  ppc/pnv: map the ICP memory regions

 hw/intc/xics.c |  20 +-
 hw/ppc/Makefile.objs   |   2 +-
 hw/ppc/pnv.c   | 224 -
 hw/ppc/pnv_core.c  | 158 +++-
 hw/ppc/pnv_lpc.c   |  47 +++-
 hw/ppc/pnv_occ.c   | 136 +++
 hw/ppc/pnv_psi.c   | 583 +
 hw/ppc/ppc.c   |  16 ++
 hw/ppc/spapr.c |  11 +
 include/hw/ppc/pnv.h   |  34 +++
 include/hw/ppc/pnv_core.h  |   1 +
 include/hw/ppc/pnv_lpc.h   |   9 +
 include/hw/ppc/pnv_occ.h   |  38 +++
 include/hw/ppc/pnv_psi.h   |  61 +
 include/hw/ppc/pnv_xscom.h |   6 +
 include/hw/ppc/xics.h  |   7 +
 target/ppc/cpu.h   |  10 +
 17 files changed, 1343 insertions(+), 20 deletions(-)
 create mode 100644 hw/ppc/pnv_occ.c
 create mode 100644 hw/ppc/pnv_psi.c
 create mode 100644 include/hw/ppc/pnv_occ.h
 create mode 100644 include/hw/ppc/pnv_psi.h

-- 
2.7.4




[Qemu-devel] [PATCH for-2.10 7/8] ppc/pnv: Add OCC model stub with interrupt support

2017-03-08 Thread Cédric Le Goater
From: Benjamin Herrenschmidt 

The OCC is an on-chip microcontroller based on a ppc405 core used
for various power management tasks. It comes with a pile of additional
hardware sitting on the PIB (aka XSCOM bus). At this point we don't
emulate it (nor plan to do so). However there is one facility which
is provided by the surrounding hardware that we do need, which is the
interrupt generation facility. OPAL uses it to send itself interrupts
under some circumstances and there are other uses around the corner.

So this implement just enough to support this.

Signed-off-by: Benjamin Herrenschmidt 
[clg: - updated for qemu-2.9
  - changed the XSCOM interface to fit new model
  - QOMified the model ]
Signed-off-by: Cédric Le Goater 
Reviewed-by: David Gibson 
---
 hw/ppc/Makefile.objs   |   2 +-
 hw/ppc/pnv.c   |  10 
 hw/ppc/pnv_occ.c   | 136 +
 include/hw/ppc/pnv.h   |   2 +
 include/hw/ppc/pnv_occ.h   |  38 +
 include/hw/ppc/pnv_xscom.h |   3 +
 6 files changed, 190 insertions(+), 1 deletion(-)
 create mode 100644 hw/ppc/pnv_occ.c
 create mode 100644 include/hw/ppc/pnv_occ.h

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index dc19ee17fa57..ef67ea820158 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -6,7 +6,7 @@ obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o 
spapr_rtas.o
 obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
 obj-$(CONFIG_PSERIES) += spapr_cpu_core.o spapr_ovec.o
 # IBM PowerNV
-obj-$(CONFIG_POWERNV) += pnv.o pnv_xscom.o pnv_core.o pnv_lpc.o pnv_psi.o
+obj-$(CONFIG_POWERNV) += pnv.o pnv_xscom.o pnv_core.o pnv_lpc.o pnv_psi.o 
pnv_occ.o
 ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
 obj-y += spapr_pci_vfio.o
 endif
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 85b00bf235c6..ca5a078bc6c6 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -714,6 +714,11 @@ static void pnv_chip_init(Object *obj)
 object_property_add_child(obj, "psi", OBJECT(&chip->psi), NULL);
 object_property_add_const_link(OBJECT(&chip->psi), "xics",
OBJECT(qdev_get_machine()), &error_abort);
+
+object_initialize(&chip->occ, sizeof(chip->occ), TYPE_PNV_OCC);
+object_property_add_child(obj, "occ", OBJECT(&chip->occ), NULL);
+object_property_add_const_link(OBJECT(&chip->occ), "psi",
+   OBJECT(&chip->psi), &error_abort);
 }
 
 static void pnv_chip_icp_realize(PnvChip *chip, Error **errp)
@@ -824,6 +829,11 @@ static void pnv_chip_realize(DeviceState *dev, Error 
**errp)
 object_property_set_bool(OBJECT(&chip->lpc), true, "realized",
  &error_fatal);
 pnv_xscom_add_subregion(chip, PNV_XSCOM_LPC_BASE, &chip->lpc.xscom_regs);
+
+/* Create the simplified OCC model */
+object_property_set_bool(OBJECT(&chip->occ), true, "realized",
+ &error_fatal);
+pnv_xscom_add_subregion(chip, PNV_XSCOM_OCC_BASE, &chip->occ.xscom_regs);
 }
 
 static Property pnv_chip_properties[] = {
diff --git a/hw/ppc/pnv_occ.c b/hw/ppc/pnv_occ.c
new file mode 100644
index ..32f167b77ea3
--- /dev/null
+++ b/hw/ppc/pnv_occ.c
@@ -0,0 +1,136 @@
+/*
+ * QEMU PowerNV Emulation of a few OCC related registers
+ *
+ * Copyright (c) 2016, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "hw/hw.h"
+#include "sysemu/sysemu.h"
+#include "target/ppc/cpu.h"
+#include "qapi/error.h"
+#include "qemu/log.h"
+
+#include "hw/ppc/pnv.h"
+#include "hw/ppc/pnv_xscom.h"
+#include "hw/ppc/pnv_occ.h"
+
+#define OCB_OCI_OCCMISC 0x4020
+#define OCB_OCI_OCCMISC_AND 0x4021
+#define OCB_OCI_OCCMISC_OR  0x4022
+
+static void pnv_occ_set_misc(PnvOCC *occ, uint64_t val)
+{
+bool irq_state;
+
+val &= 0xull;
+
+occ->occmisc = val;
+irq_state = !!(val >> 63);
+pnv_psi_irq_set(occ->psi, PSIHB_IRQ_OCC, irq_state);
+}
+
+static uint64_t pnv_occ_xscom_read(void *opaque, hwaddr addr, unsigned size)
+{
+PnvOCC *occ = PNV_OCC(opaque);
+uint32_t offset = addr >> 3;
+uint64_t val = 0;
+
+switch (offset) {
+case OCB_OCI_OCCMISC:
+val = occ->occmisc;
+break;
+default:
+qemu_log_mask(LOG_UNIMP, "OCC Unimplemented register: Ox%"
+  HWADDR_PRIx "\n", addr);
+}
+re

[Qemu-devel] [PATCH for-2.10 8/8] ppc/pnv: Add support for POWER8+ LPC Controller

2017-03-08 Thread Cédric Le Goater
From: Benjamin Herrenschmidt 

It adds the Naples chip which supports proper LPC interrupts via the
LPC controller rather than via an external CPLD.

Signed-off-by: Benjamin Herrenschmidt 
[clg: - updated for qemu-2.9
  - ported on latest PowerNV patchset ]
Signed-off-by: Cédric Le Goater 
Reviewed-by: David Gibson 
---
 hw/ppc/pnv.c | 15 ++-
 hw/ppc/pnv_lpc.c | 47 +--
 include/hw/ppc/pnv_lpc.h |  9 +
 3 files changed, 68 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index ca5a078bc6c6..65dfe5b4c97b 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -376,7 +376,14 @@ static void pnv_lpc_isa_irq_handler_cpld(void *opaque, int 
n, int level)
 
 static void pnv_lpc_isa_irq_handler(void *opaque, int n, int level)
 {
- /* XXX TODO */
+PnvChip *chip = opaque;
+PnvLpcController *lpc = &chip->lpc;
+
+/* The Naples HW latches the 1 levels, clearing is done by SW */
+if (level) {
+lpc->lpc_hc_irqstat |= LPC_HC_IRQ_SERIRQ0 >> n;
+pnv_lpc_eval_irqs(lpc);
+}
 }
 
 static ISABus *pnv_isa_create(PnvChip *chip)
@@ -719,6 +726,12 @@ static void pnv_chip_init(Object *obj)
 object_property_add_child(obj, "occ", OBJECT(&chip->occ), NULL);
 object_property_add_const_link(OBJECT(&chip->occ), "psi",
OBJECT(&chip->psi), &error_abort);
+
+/*
+ * The LPC controller needs PSI to generate interrupts
+ */
+object_property_add_const_link(OBJECT(&chip->lpc), "psi",
+   OBJECT(&chip->psi), &error_abort);
 }
 
 static void pnv_chip_icp_realize(PnvChip *chip, Error **errp)
diff --git a/hw/ppc/pnv_lpc.c b/hw/ppc/pnv_lpc.c
index 78db52415b11..20cbb6a0dbbd 100644
--- a/hw/ppc/pnv_lpc.c
+++ b/hw/ppc/pnv_lpc.c
@@ -250,6 +250,34 @@ static const MemoryRegionOps pnv_lpc_xscom_ops = {
 .endianness = DEVICE_BIG_ENDIAN,
 };
 
+void pnv_lpc_eval_irqs(PnvLpcController *lpc)
+{
+bool lpc_to_opb_irq = false;
+
+/* Update LPC controller to OPB line */
+if (lpc->lpc_hc_irqser_ctrl & LPC_HC_IRQSER_EN) {
+uint32_t irqs;
+
+irqs = lpc->lpc_hc_irqstat & lpc->lpc_hc_irqmask;
+lpc_to_opb_irq = (irqs != 0);
+}
+
+/* We don't honor the polarity register, it's pointless and unused
+ * anyway
+ */
+if (lpc_to_opb_irq) {
+lpc->opb_irq_input |= OPB_MASTER_IRQ_LPC;
+} else {
+lpc->opb_irq_input &= ~OPB_MASTER_IRQ_LPC;
+}
+
+/* Update OPB internal latch */
+lpc->opb_irq_stat |= lpc->opb_irq_input & lpc->opb_irq_mask;
+
+/* Reflect the interrupt */
+pnv_psi_irq_set(lpc->psi, PSIHB_IRQ_LPC_I2C, lpc->opb_irq_stat != 0);
+}
+
 static uint64_t lpc_hc_read(void *opaque, hwaddr addr, unsigned size)
 {
 PnvLpcController *lpc = opaque;
@@ -300,12 +328,15 @@ static void lpc_hc_write(void *opaque, hwaddr addr, 
uint64_t val,
 break;
 case LPC_HC_IRQSER_CTRL:
 lpc->lpc_hc_irqser_ctrl = val;
+pnv_lpc_eval_irqs(lpc);
 break;
 case LPC_HC_IRQMASK:
 lpc->lpc_hc_irqmask = val;
+pnv_lpc_eval_irqs(lpc);
 break;
 case LPC_HC_IRQSTAT:
 lpc->lpc_hc_irqstat &= ~val;
+pnv_lpc_eval_irqs(lpc);
 break;
 case LPC_HC_ERROR_ADDRESS:
 break;
@@ -363,14 +394,15 @@ static void opb_master_write(void *opaque, hwaddr addr,
 switch (addr) {
 case OPB_MASTER_LS_IRQ_STAT:
 lpc->opb_irq_stat &= ~val;
+pnv_lpc_eval_irqs(lpc);
 break;
 case OPB_MASTER_LS_IRQ_MASK:
-/* XXX Filter out reserved bits */
 lpc->opb_irq_mask = val;
+pnv_lpc_eval_irqs(lpc);
 break;
 case OPB_MASTER_LS_IRQ_POL:
-/* XXX Filter out reserved bits */
 lpc->opb_irq_pol = val;
+pnv_lpc_eval_irqs(lpc);
 break;
 case OPB_MASTER_LS_IRQ_INPUT:
 /* Read only */
@@ -398,6 +430,8 @@ static const MemoryRegionOps opb_master_ops = {
 static void pnv_lpc_realize(DeviceState *dev, Error **errp)
 {
 PnvLpcController *lpc = PNV_LPC(dev);
+Object *obj;
+Error *error = NULL;
 
 /* Reg inits */
 lpc->lpc_hc_fw_rd_acc_size = LPC_HC_FW_RD_4B;
@@ -441,6 +475,15 @@ static void pnv_lpc_realize(DeviceState *dev, Error **errp)
 pnv_xscom_region_init(&lpc->xscom_regs, OBJECT(dev),
   &pnv_lpc_xscom_ops, lpc, "xscom-lpc",
   PNV_XSCOM_LPC_SIZE);
+
+/* get PSI object from chip */
+obj = object_property_get_link(OBJECT(dev), "psi", &error);
+if (!obj) {
+error_setg(errp, "%s: required link 'psi' not found: %s",
+   __func__, error_get_pretty(error));
+return;
+}
+lpc->psi = PNV_PSI(obj);
 }
 
 static void pnv_lpc_class_init(ObjectClass *klass, void *data)
diff --git a/include/hw/ppc/pnv_lpc.h b/include/hw/ppc/pnv_lpc.h
index 38e5506975aa..53040026c37b 100644
--- a/include/hw/ppc/pnv_lpc.h
+++

[Qemu-devel] [PATCH for-2.10 1/8] ppc/xics: add a xics_get_cpu_index_by_pir() helper

2017-03-08 Thread Cédric Le Goater
This helper will be used to translate the server number of the XIVE
(which is a PIR) into an ICPState index number (which is a cpu index).

Signed-off-by: Cédric Le Goater 
---
 hw/intc/xics.c| 11 +++
 hw/ppc/ppc.c  | 16 
 include/hw/ppc/xics.h |  1 +
 target/ppc/cpu.h  | 10 ++
 4 files changed, 38 insertions(+)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index e740989a1162..209e1a75ecb9 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -49,6 +49,17 @@ int xics_get_cpu_index_by_dt_id(int cpu_dt_id)
 return -1;
 }
 
+int xics_get_cpu_index_by_pir(int pir)
+{
+PowerPCCPU *cpu = ppc_get_vcpu_by_pir(pir);
+
+if (cpu) {
+return cpu->parent_obj.cpu_index;
+}
+
+return -1;
+}
+
 void xics_cpu_destroy(XICSFabric *xi, PowerPCCPU *cpu)
 {
 CPUState *cs = CPU(cpu);
diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index 5f93083d4a16..94bbe382a73a 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -1379,6 +1379,22 @@ PowerPCCPU *ppc_get_vcpu_by_dt_id(int cpu_dt_id)
 return NULL;
 }
 
+PowerPCCPU *ppc_get_vcpu_by_pir(int pir)
+{
+CPUState *cs;
+
+CPU_FOREACH(cs) {
+PowerPCCPU *cpu = POWERPC_CPU(cs);
+CPUPPCState *env = &cpu->env;
+
+if (env->spr_cb[SPR_PIR].default_value == pir) {
+return cpu;
+}
+}
+
+return NULL;
+}
+
 void ppc_cpu_parse_features(const char *cpu_model)
 {
 CPUClass *cc;
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 9a5e715fe553..42bd24e975cb 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -173,6 +173,7 @@ void xics_cpu_destroy(XICSFabric *xi, PowerPCCPU *cpu);
 
 /* Internal XICS interfaces */
 int xics_get_cpu_index_by_dt_id(int cpu_dt_id);
+int xics_get_cpu_index_by_pir(int pir);
 
 void icp_set_cppr(ICPState *icp, uint8_t cppr);
 void icp_set_mfrr(ICPState *icp, uint8_t mfrr);
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 7c4a1f50b38b..24a5af95cb45 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -2518,5 +2518,15 @@ int ppc_get_vcpu_dt_id(PowerPCCPU *cpu);
  */
 PowerPCCPU *ppc_get_vcpu_by_dt_id(int cpu_dt_id);
 
+/**
+ * ppc_get_vcpu_by_pir_id:
+ * @pir: Processor Identifier Register (SPR_PIR)
+ *
+ * Searches for a CPU by @pir.
+ *
+ * Returns: a PowerPCCPU struct
+ */
+PowerPCCPU *ppc_get_vcpu_by_pir(int pir);
+
 void ppc_maybe_bswap_register(CPUPPCState *env, uint8_t *mem_buf, int len);
 #endif /* PPC_CPU_H */
-- 
2.7.4




[Qemu-devel] [PATCH for-2.10 4/8] ppc/pnv: add memory regions for the ICP registers

2017-03-08 Thread Cédric Le Goater
This provides to a PowerNV chip (POWER8) access to the Interrupt
Management area, which contains the registers of the Interrupt Control
Presenters of each thread. These are used to accept, return, forward
interrupts in the system.

This area is modeled with a per-chip container memory region holding
all the ICP registers. Each thread of a chip is then associated with
its ICP registers using a memory subregion indexed by its PIR number
in the overall region.

Signed-off-by: Cédric Le Goater 
---
 hw/ppc/pnv.c  |  20 +++
 hw/ppc/pnv_core.c | 146 ++
 include/hw/ppc/pnv.h  |  20 +++
 include/hw/ppc/pnv_core.h |   1 +
 include/hw/ppc/xics.h |   3 +
 5 files changed, 190 insertions(+)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 461d3535e99c..7b13b08deadf 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -658,6 +658,16 @@ static void pnv_chip_init(Object *obj)
 object_property_add_child(obj, "lpc", OBJECT(&chip->lpc), NULL);
 }
 
+static void pnv_chip_icp_realize(PnvChip *chip, Error **errp)
+{
+char *name;
+
+name = g_strdup_printf("icp-%x", chip->chip_id);
+memory_region_init(&chip->icp_mmio, OBJECT(chip), name, PNV_ICP_SIZE);
+sysbus_init_mmio(SYS_BUS_DEVICE(chip), &chip->icp_mmio);
+g_free(name);
+}
+
 static void pnv_chip_realize(DeviceState *dev, Error **errp)
 {
 PnvChip *chip = PNV_CHIP(dev);
@@ -680,6 +690,14 @@ static void pnv_chip_realize(DeviceState *dev, Error 
**errp)
 }
 sysbus_mmio_map(SYS_BUS_DEVICE(chip), 0, PNV_XSCOM_BASE(chip));
 
+/* Interrupt Management Area. This is the memory region holding
+ * all the Interrupt Control Presenter (ICP) registers */
+pnv_chip_icp_realize(chip, &error);
+if (error) {
+error_propagate(errp, error);
+return;
+}
+
 /* Cores */
 pnv_chip_core_sanitize(chip, &error);
 if (error) {
@@ -709,6 +727,8 @@ static void pnv_chip_realize(DeviceState *dev, Error **errp)
 object_property_set_int(OBJECT(pnv_core),
 pcc->core_pir(chip, core_hwid),
 "pir", &error_fatal);
+object_property_add_const_link(OBJECT(pnv_core), "xics",
+   qdev_get_machine(), &error_fatal);
 object_property_set_bool(OBJECT(pnv_core), true, "realized",
  &error_fatal);
 object_unref(OBJECT(pnv_core));
diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
index d79d530b4881..8633afbff795 100644
--- a/hw/ppc/pnv_core.c
+++ b/hw/ppc/pnv_core.c
@@ -26,6 +26,128 @@
 #include "hw/ppc/pnv_core.h"
 #include "hw/ppc/pnv_xscom.h"
 
+static uint64_t pnv_core_icp_read(void *opaque, hwaddr addr, unsigned width)
+{
+ICPState *icp = opaque;
+bool byte0 = (width == 1 && (addr & 0x3) == 0);
+uint64_t val = 0x;
+
+switch (addr & 0xffc) {
+case 0: /* poll */
+val = icp_ipoll(icp, NULL);
+if (byte0) {
+val >>= 24;
+} else if (width != 4) {
+goto bad_access;
+}
+break;
+case 4: /* xirr */
+if (byte0) {
+val = icp_ipoll(icp, NULL) >> 24;
+} else if (width == 4) {
+val = icp_accept(icp);
+} else {
+goto bad_access;
+}
+break;
+case 12:
+if (byte0) {
+val = icp->mfrr;
+} else {
+goto bad_access;
+}
+break;
+case 16:
+if (width == 4) {
+val = icp->links[0];
+} else {
+goto bad_access;
+}
+break;
+case 20:
+if (width == 4) {
+val = icp->links[1];
+} else {
+goto bad_access;
+}
+break;
+case 24:
+if (width == 4) {
+val = icp->links[2];
+} else {
+goto bad_access;
+}
+break;
+default:
+bad_access:
+qemu_log_mask(LOG_GUEST_ERROR, "XICS: Bad ICP access 0x%"
+  HWADDR_PRIx"/%d\n", addr, width);
+}
+
+return val;
+}
+
+static void pnv_core_icp_write(void *opaque, hwaddr addr, uint64_t val,
+  unsigned width)
+{
+ICPState *icp = opaque;
+bool byte0 = (width == 1 && (addr & 0x3) == 0);
+
+switch (addr & 0xffc) {
+case 4: /* xirr */
+if (byte0) {
+icp_set_cppr(icp, val);
+} else if (width == 4) {
+icp_eoi(icp, val);
+} else {
+goto bad_access;
+}
+break;
+case 12:
+if (byte0) {
+icp_set_mfrr(icp, val);
+} else {
+goto bad_access;
+}
+break;
+case 16:
+if (width == 4) {
+icp->links[0] = val;
+} else {
+goto bad_access;
+}
+break;
+case 20:
+if (width == 4) {
+icp->links[1] = val;
+} else {
+goto

[Qemu-devel] [PATCH for-2.10 5/8] ppc/pnv: map the ICP memory regions

2017-03-08 Thread Cédric Le Goater
and populate the device tree accordingly for the guest to start using
interrupts. This also links the ICP object to its associated CPUState
(only used by KVM to control the kernel vCPU).

Signed-off-by: Cédric Le Goater 
---
 hw/ppc/pnv.c  | 55 +++
 hw/ppc/pnv_core.c | 12 
 2 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 7b13b08deadf..0ae11cc3a2ca 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -35,6 +35,7 @@
 #include "monitor/monitor.h"
 #include "hw/intc/intc.h"
 
+#include "hw/ppc/xics.h"
 #include "hw/ppc/pnv_xscom.h"
 
 #include "hw/isa/isa.h"
@@ -216,6 +217,47 @@ static void powernv_create_core_node(PnvChip *chip, 
PnvCore *pc, void *fdt)
servers_prop, sizeof(servers_prop;
 }
 
+static void powernv_populate_icp(PnvChip *chip, void *fdt, int offset,
+  uint32_t pir, uint32_t count)
+{
+uint64_t addr;
+char *name;
+const char compat[] = "IBM,power8-icp\0IBM,ppc-xicp";
+uint32_t irange[2], i, rsize;
+uint64_t *reg;
+
+/*
+ * TODO: add multichip ICP BAR
+ */
+addr = PNV_ICP_BASE(chip) | (pir << 12);
+
+irange[0] = cpu_to_be32(pir);
+irange[1] = cpu_to_be32(count);
+
+rsize = sizeof(uint64_t) * 2 * count;
+reg = g_malloc(rsize);
+for (i = 0; i < count; i++) {
+reg[i * 2] = cpu_to_be64(addr | ((pir + i) * 0x1000));
+reg[i * 2 + 1] = cpu_to_be64(0x1000);
+}
+
+name = g_strdup_printf("interrupt-controller@%"PRIX64, addr);
+offset = fdt_add_subnode(fdt, offset, name);
+_FDT(offset);
+g_free(name);
+
+_FDT((fdt_setprop(fdt, offset, "compatible", compat, sizeof(compat;
+_FDT((fdt_setprop(fdt, offset, "reg", reg, rsize)));
+_FDT((fdt_setprop_string(fdt, offset, "device_type",
+  "PowerPC-External-Interrupt-Presentation")));
+_FDT((fdt_setprop(fdt, offset, "interrupt-controller", NULL, 0)));
+_FDT((fdt_setprop(fdt, offset, "ibm,interrupt-server-ranges",
+   irange, sizeof(irange;
+_FDT((fdt_setprop_cell(fdt, offset, "#interrupt-cells", 1)));
+_FDT((fdt_setprop_cell(fdt, offset, "#address-cells", 0)));
+g_free(reg);
+}
+
 static void powernv_populate_chip(PnvChip *chip, void *fdt)
 {
 PnvChipClass *pcc = PNV_CHIP_GET_CLASS(chip);
@@ -229,6 +271,10 @@ static void powernv_populate_chip(PnvChip *chip, void *fdt)
 PnvCore *pnv_core = PNV_CORE(chip->cores + i * typesize);
 
 powernv_create_core_node(chip, pnv_core, fdt);
+
+/* Interrupt Control Presenters (ICP). One per thread. */
+powernv_populate_icp(chip, fdt, 0, pnv_core->pir,
+ CPU_CORE(pnv_core)->nr_threads);
 }
 
 if (chip->ram_size) {
@@ -697,6 +743,7 @@ static void pnv_chip_realize(DeviceState *dev, Error **errp)
 error_propagate(errp, error);
 return;
 }
+sysbus_mmio_map(SYS_BUS_DEVICE(chip), 1, PNV_ICP_BASE(chip));
 
 /* Cores */
 pnv_chip_core_sanitize(chip, &error);
@@ -711,6 +758,7 @@ static void pnv_chip_realize(DeviceState *dev, Error **errp)
  && (i < chip->nr_cores); core_hwid++) {
 char core_name[32];
 void *pnv_core = chip->cores + i * typesize;
+int j;
 
 if (!(chip->cores_mask & (1ull << core_hwid))) {
 continue;
@@ -738,6 +786,13 @@ static void pnv_chip_realize(DeviceState *dev, Error 
**errp)
 PNV_XSCOM_EX_CORE_BASE(pcc->xscom_core_base,
core_hwid),
 &PNV_CORE(pnv_core)->xscom_regs);
+
+/* Map the ICP registers for each thread */
+for (j = 0; j < CPU_CORE(pnv_core)->nr_threads; j++) {
+memory_region_add_subregion(&chip->icp_mmio,
+ (pcc->core_pir(chip, core_hwid) + j) << 12,
+  &PNV_CORE(pnv_core)->icp_mmios[j]);
+}
 i++;
 }
 g_free(typename);
diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
index 8633afbff795..d28fa445b11b 100644
--- a/hw/ppc/pnv_core.c
+++ b/hw/ppc/pnv_core.c
@@ -25,6 +25,7 @@
 #include "hw/ppc/pnv.h"
 #include "hw/ppc/pnv_core.h"
 #include "hw/ppc/pnv_xscom.h"
+#include "hw/ppc/xics.h"
 
 static uint64_t pnv_core_icp_read(void *opaque, hwaddr addr, unsigned width)
 {
@@ -165,7 +166,7 @@ static void powernv_cpu_reset(void *opaque)
 env->msr |= MSR_HVB; /* Hypervisor mode */
 }
 
-static void powernv_cpu_init(PowerPCCPU *cpu, Error **errp)
+static void powernv_cpu_init(PowerPCCPU *cpu, XICSFabric *xi, Error **errp)
 {
 CPUPPCState *env = &cpu->env;
 int core_pir;
@@ -185,6 +186,9 @@ static void powernv_cpu_init(PowerPCCPU *cpu, Error **errp)
 cpu_ppc_tb_init(env, PNV_TIMEBASE_FREQ);
 
 qemu_register_reset(powernv_cpu_reset, cpu);
+
+/* xics_cpu_setup() assigns the CPU t

[Qemu-devel] [PATCH for-2.10 3/8] ppc/pnv: create the ICP and ICS objects under the machine

2017-03-08 Thread Cédric Le Goater
Like this is done for the sPAPR machine, we use a simple array under
the PowerNV machine to store the Interrupt Control Presenters (ICP)
objects, one for each vCPU. This array is indexed by 'cpu_index' of
the CPUState.

We use a list to hold the different Interrupt Control Sources (ICS)
objects, as PowerNV needs to handle multiple sources: for PCI-E and
for the Processor Service Interface (PSI).

Finally, to interface with the XICS layer which manipulates the ICP
and ICS objects, we extend the PowerNV machine with an XICSFabric
interface and its associated handlers.

Signed-off-by: Cédric Le Goater 
---
 hw/ppc/pnv.c  | 89 +++
 include/hw/ppc/pnv.h  |  4 +++
 include/hw/ppc/xics.h |  1 +
 3 files changed, 94 insertions(+)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 09f0d22defb8..461d3535e99c 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -32,6 +32,8 @@
 #include "exec/address-spaces.h"
 #include "qemu/cutils.h"
 #include "qapi/visitor.h"
+#include "monitor/monitor.h"
+#include "hw/intc/intc.h"
 
 #include "hw/ppc/pnv_xscom.h"
 
@@ -416,6 +418,23 @@ static void ppc_powernv_init(MachineState *machine)
 machine->cpu_model = "POWER8";
 }
 
+/* Create the Interrupt Control Presenters before the vCPUs */
+pnv->nr_servers = pnv->num_chips * smp_cores * smp_threads;
+pnv->icps = g_new0(ICPState, pnv->nr_servers);
+for (i = 0; i < pnv->nr_servers; i++) {
+ICPState *icp = &pnv->icps[i];
+object_initialize(icp, sizeof(*icp), TYPE_ICP);
+qdev_set_parent_bus(DEVICE(icp), sysbus_get_default());
+object_property_add_child(OBJECT(pnv), "icp[*]", OBJECT(icp),
+  &error_fatal);
+object_property_add_const_link(OBJECT(icp), "xics", OBJECT(pnv),
+   &error_fatal);
+object_property_set_bool(OBJECT(icp), true, "realized", &error_fatal);
+}
+
+/* and the list of Interrupt Control Sources */
+QLIST_INIT(&pnv->ics);
+
 /* Create the processor chips */
 chip_typename = g_strdup_printf(TYPE_PNV_CHIP "-%s", machine->cpu_model);
 if (!object_class_by_name(chip_typename)) {
@@ -742,6 +761,48 @@ static void pnv_get_num_chips(Object *obj, Visitor *v, 
const char *name,
 visit_type_uint32(v, name, &POWERNV_MACHINE(obj)->num_chips, errp);
 }
 
+static ICSState *pnv_ics_get(XICSFabric *xi, int irq)
+{
+PnvMachineState *pnv = POWERNV_MACHINE(xi);
+ICSState *ics;
+
+QLIST_FOREACH(ics, &pnv->ics, list) {
+if (ics_valid_irq(ics, irq)) {
+return ics;
+}
+}
+return NULL;
+}
+
+static void pnv_ics_resend(XICSFabric *xi)
+{
+PnvMachineState *pnv = POWERNV_MACHINE(xi);
+ICSState *ics;
+
+QLIST_FOREACH(ics, &pnv->ics, list) {
+ics_resend(ics);
+}
+}
+
+static void pnv_ics_eoi(XICSFabric *xi, int irq)
+{
+PnvMachineState *pnv = POWERNV_MACHINE(xi);
+ICSState *ics;
+
+QLIST_FOREACH(ics, &pnv->ics, list) {
+if (ics_valid_irq(ics, irq)) {
+ics_eoi(ics, irq);
+}
+}
+}
+
+static ICPState *pnv_icp_get(XICSFabric *xi, int server)
+{
+PnvMachineState *pnv = POWERNV_MACHINE(xi);
+
+return (server < pnv->nr_servers) ? &pnv->icps[server] : NULL;
+}
+
 static void pnv_set_num_chips(Object *obj, Visitor *v, const char *name,
   void *opaque, Error **errp)
 {
@@ -783,9 +844,27 @@ static void powernv_machine_class_props_init(ObjectClass 
*oc)
   NULL);
 }
 
+static void pnv_pic_print_info(InterruptStatsProvider *obj,
+   Monitor *mon)
+{
+PnvMachineState *pnv = POWERNV_MACHINE(obj);
+ICSState *ics;
+int i;
+
+for (i = 0; i < pnv->nr_servers; i++) {
+icp_pic_print_info(&pnv->icps[i], mon);
+}
+
+QLIST_FOREACH(ics, &pnv->ics, list) {
+ics_pic_print_info(ics, mon);
+}
+}
+
 static void powernv_machine_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
+XICSFabricClass *xic = XICS_FABRIC_CLASS(oc);
+InterruptStatsProviderClass *ispc = INTERRUPT_STATS_PROVIDER_CLASS(oc);
 
 mc->desc = "IBM PowerNV (Non-Virtualized)";
 mc->init = ppc_powernv_init;
@@ -796,6 +875,11 @@ static void powernv_machine_class_init(ObjectClass *oc, 
void *data)
 mc->no_parallel = 1;
 mc->default_boot_order = NULL;
 mc->default_ram_size = 1 * G_BYTE;
+xic->icp_get = pnv_icp_get;
+xic->ics_get = pnv_ics_get;
+xic->ics_eoi = pnv_ics_eoi;
+xic->ics_resend = pnv_ics_resend;
+ispc->print_info = pnv_pic_print_info;
 
 powernv_machine_class_props_init(oc);
 }
@@ -806,6 +890,11 @@ static const TypeInfo powernv_machine_info = {
 .instance_size = sizeof(PnvMachineState),
 .instance_init = powernv_machine_initfn,
 .class_init= powernv_machine_class_init,
+.interfaces = (InterfaceInfo[]) {
+{ TYPE_XICS_FABRIC },
+{

[Qemu-devel] [PATCH for-2.10 6/8] ppc/pnv: Add cut down PSI bridge model and hookup external interrupt

2017-03-08 Thread Cédric Le Goater
From: Benjamin Herrenschmidt 

The PSI (Processor Service Interface) Controller is one of the engines
of the "Bridge" unit which connects the different interfaces to the
Power Processor.

This adds just enough of the PSI bridge to handle various on-chip and
the one external interrupt. The rest of PSI has to do with the link to
the IBM FSP service processor which we don't plan to emulate (not used
on OpenPower machines).

Signed-off-by: Benjamin Herrenschmidt 
[clg: - updated for qemu-2.9
  - changed the XSCOM interface to fit new model
  - QOMified the model
  - reworked set_xive and worked around a skiboot bug
  - removed the 'psi_mmio_to_xscom' mapping array ]
Signed-off-by: Cédric Le Goater 
---
 hw/ppc/Makefile.objs   |   2 +-
 hw/ppc/pnv.c   |  35 ++-
 hw/ppc/pnv_psi.c   | 583 +
 include/hw/ppc/pnv.h   |   8 +
 include/hw/ppc/pnv_psi.h   |  61 +
 include/hw/ppc/pnv_xscom.h |   3 +
 6 files changed, 685 insertions(+), 7 deletions(-)
 create mode 100644 hw/ppc/pnv_psi.c
 create mode 100644 include/hw/ppc/pnv_psi.h

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index 001293423c8d..dc19ee17fa57 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -6,7 +6,7 @@ obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o 
spapr_rtas.o
 obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
 obj-$(CONFIG_PSERIES) += spapr_cpu_core.o spapr_ovec.o
 # IBM PowerNV
-obj-$(CONFIG_POWERNV) += pnv.o pnv_xscom.o pnv_core.o pnv_lpc.o
+obj-$(CONFIG_POWERNV) += pnv.o pnv_xscom.o pnv_core.o pnv_lpc.o pnv_psi.o
 ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
 obj-y += spapr_pci_vfio.o
 endif
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 0ae11cc3a2ca..85b00bf235c6 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -356,15 +356,22 @@ static void ppc_powernv_reset(void)
  * have a CPLD that will collect the SerIRQ and shoot them as a
  * single level interrupt to the P8 chip. So let's setup a hook
  * for doing just that.
- *
- * Note: The actual interrupt input isn't emulated yet, this will
- * come with the PSI bridge model.
  */
 static void pnv_lpc_isa_irq_handler_cpld(void *opaque, int n, int level)
 {
-/* We don't yet emulate the PSI bridge which provides the external
- * interrupt, so just drop interrupts on the floor
- */
+PnvMachineState *pnv = POWERNV_MACHINE(qdev_get_machine());
+uint32_t old_state = pnv->cpld_irqstate;
+PnvChip *chip = opaque;
+
+if (level) {
+pnv->cpld_irqstate |= 1u << n;
+} else {
+pnv->cpld_irqstate &= ~(1u << n);
+}
+if (pnv->cpld_irqstate != old_state) {
+pnv_psi_irq_set(&chip->psi, PSIHB_IRQ_EXTERNAL,
+pnv->cpld_irqstate != 0);
+}
 }
 
 static void pnv_lpc_isa_irq_handler(void *opaque, int n, int level)
@@ -702,6 +709,11 @@ static void pnv_chip_init(Object *obj)
 
 object_initialize(&chip->lpc, sizeof(chip->lpc), TYPE_PNV_LPC);
 object_property_add_child(obj, "lpc", OBJECT(&chip->lpc), NULL);
+
+object_initialize(&chip->psi, sizeof(chip->psi), TYPE_PNV_PSI);
+object_property_add_child(obj, "psi", OBJECT(&chip->psi), NULL);
+object_property_add_const_link(OBJECT(&chip->psi), "xics",
+   OBJECT(qdev_get_machine()), &error_abort);
 }
 
 static void pnv_chip_icp_realize(PnvChip *chip, Error **errp)
@@ -722,6 +734,8 @@ static void pnv_chip_realize(DeviceState *dev, Error **errp)
 char *typename = pnv_core_typename(pcc->cpu_model);
 size_t typesize = object_type_get_instance_size(typename);
 int i, core_hwid;
+MachineState *machine = MACHINE(qdev_get_machine());
+PnvMachineState *pnv = POWERNV_MACHINE(machine);
 
 if (!object_class_by_name(typename)) {
 error_setg(errp, "Unable to find PowerNV CPU Core '%s'", typename);
@@ -797,6 +811,15 @@ static void pnv_chip_realize(DeviceState *dev, Error 
**errp)
 }
 g_free(typename);
 
+
+/* Processor Service Interface (PSI) Host Bridge */
+object_property_set_bool(OBJECT(&chip->psi), true, "realized",
+ &error_fatal);
+pnv_xscom_add_subregion(chip, PNV_XSCOM_PSI_BASE, &chip->psi.xscom_regs);
+
+/* link in the PSI ICS */
+QLIST_INSERT_HEAD(&pnv->ics, &chip->psi.ics, list);
+
 /* Create LPC controller */
 object_property_set_bool(OBJECT(&chip->lpc), true, "realized",
  &error_fatal);
diff --git a/hw/ppc/pnv_psi.c b/hw/ppc/pnv_psi.c
new file mode 100644
index ..6ba688aac075
--- /dev/null
+++ b/hw/ppc/pnv_psi.c
@@ -0,0 +1,583 @@
+/*
+ * QEMU PowerNV PowerPC PSI interface
+ *
+ * Copyright (c) 2016, IBM Corporation
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your opti

[Qemu-devel] [PATCH] memory_region: Fix name comments

2017-03-08 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

The 'name' parameter to memory_region_init_* had been marked as debug
only, however vmstate_region_ram uses it as a parameter to
qemu_ram_set_idstr to set RAMBlock names and these form part of the
migration stream.

Signed-off-by: Dr. David Alan Gilbert 
---
 include/exec/memory.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 6911023..de8f69e 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -307,7 +307,7 @@ struct MemoryRegionSection {
  *
  * @mr: the #MemoryRegion to be initialized
  * @owner: the object that tracks the region's reference count
- * @name: used for debugging; not visible to the user or ABI
+ * @name: Region name, becomes part of RAMBlock name used in migration stream
  * @size: size of the region; any subregions beyond this size will be clipped
  */
 void memory_region_init(MemoryRegion *mr,
@@ -355,7 +355,7 @@ void memory_region_unref(MemoryRegion *mr);
  * @ops: a structure containing read and write callbacks to be used when
  *   I/O is performed on the region.
  * @opaque: passed to the read and write callbacks of the @ops structure.
- * @name: used for debugging; not visible to the user or ABI
+ * @name: Region name, becomes part of RAMBlock name used in migration stream
  * @size: size of the region.
  */
 void memory_region_init_io(MemoryRegion *mr,
@@ -474,7 +474,7 @@ void memory_region_init_ram_device_ptr(MemoryRegion *mr,
  *
  * @mr: the #MemoryRegion to be initialized.
  * @owner: the object that tracks the region's reference count
- * @name: used for debugging; not visible to the user or ABI
+ * @name: Region name, becomes part of RAMBlock name used in migration stream
  * @orig: the region to be referenced; @mr will be equivalent to
  *@orig between @offset and @offset + @size - 1.
  * @offset: start of the section in @orig to be referenced.
@@ -537,7 +537,7 @@ void memory_region_init_rom_device(MemoryRegion *mr,
  *
  * @mr: the #MemoryRegion to be initialized
  * @owner: the object that tracks the region's reference count
- * @name: used for debugging; not visible to the user or ABI
+ * @name: Region name, becomes part of RAMBlock name used in migration stream
  * @size: size of the region.
  */
 static inline void memory_region_init_reservation(MemoryRegion *mr,
@@ -558,7 +558,7 @@ static inline void 
memory_region_init_reservation(MemoryRegion *mr,
  * @mr: the #MemoryRegion to be initialized
  * @owner: the object that tracks the region's reference count
  * @ops: a function that translates addresses into the @target region
- * @name: used for debugging; not visible to the user or ABI
+ * @name: Region name, becomes part of RAMBlock name used in migration stream
  * @size: size of the region.
  */
 void memory_region_init_iommu(MemoryRegion *mr,
-- 
2.9.3




Re: [Qemu-devel] RAMBlock's named ""

2017-03-08 Thread Juan Quintela
"Dr. David Alan Gilbert"  wrote:
> * Igor Mammedov (imamm...@redhat.com) wrote:
>> On Tue, 7 Mar 2017 19:46:56 +
>> "Dr. David Alan Gilbert"  wrote:
>> 
>> > We seem to have a couple of weird cases where we end up with
>> > RAMBlocks with no name;  I think they'll badly confuse
>> > the migration code, but I don't quite understand how they're
>> > happening.
>> > 
>> > 1) device_del e1000e
>> > 2) -object memory-backend-file  without wiring it up
>> > 
>> > I added some debug into migration/ram.c ram_save_setup to
>> > dump the names it was seeing in it's FOREACH.
>> > 
>> > 1)
>> >   (from https://bugzilla.redhat.com/show_bug.cgi?id=1425273)
>> >   The simplest reproducer of this is:
>> > 
>> > ./qemu-system-x86_64 -nographic -device e1000e,id=foo -m 1G -M
>> > pc,accel=kvm my.img
>> > 
>> >   with a Linux image and after it's booted do a device_del foo
>> > 
>> >   migration at that point sees an empty RAMBlock that was the ROM
>> >   associated with the device.  This doesn't happen on an e1000 device,
>> >   so I've not figured out what the difference is.
>> > 
>> >   This gives a : Unknown ramblock "", cannot accept migration
>> >   on the destination (correctly).
>> > 
>> >   (This happens on 2.7.0 as well, so it's nothing new)
>> > 
>> > 2)
>> >   ./qemu-system-x86_64 -nographic -object
>> > memory-backend-file,id=mem,size=512M,mem-path=/tmp
>> > 
>> >   Note I've not wired that memory to a NUMA node or a DIMM or anything, so
>> >   it's just hanging around.
>> >   Again that RAMBlock does exist and shows up in the migration code,
>> >   I think it'll try and migrate it.
>> it has to be registered with vmstate_register_ram() which
>> doesn't happen for non used hostmem object.
>> See:
>>   pc_dimm_memory_plug()
>> and
>>   memory_region_allocate_system_memory()
>> 
>> > The real fun is that there doesn't seem to be anything that stops
>> > two blocks having the same name!
>> code doesn't permit duplicate ids for -object created objects
>> but memory region api doesn't care about it as long as
>> memory_region name is unique child name within its parent object
>> children namespace.
>> 
>> you can do a check for empty / duplicate names at ram_block_add()
>> time and fail gracefully, but that probably will break things as
>> it hasn't been expected behavior before.
>
> There's actually code to check for setting duplicate RAMBlock names;
> what isn't caught is where two RAMBlocks have never had their names
> set or where they've been unset.
>
> I'm tempted to add a check at the start of migration; if one of these
> blocks exists during a migration it'll probably succeed; two of them
> however will cause chaos.

This is the best approach for the short term as far as I can see.

Later, Juan.



Re: [Qemu-devel] [PATCH] memory_region: Fix name comments

2017-03-08 Thread Philippe Mathieu-Daudé

On 03/08/2017 07:54 AM, Dr. David Alan Gilbert (git) wrote:

From: "Dr. David Alan Gilbert" 

The 'name' parameter to memory_region_init_* had been marked as debug
only, however vmstate_region_ram uses it as a parameter to
qemu_ram_set_idstr to set RAMBlock names and these form part of the
migration stream.

Signed-off-by: Dr. David Alan Gilbert 


Reviewed-by: Philippe Mathieu-Daudé 


---
 include/exec/memory.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 6911023..de8f69e 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -307,7 +307,7 @@ struct MemoryRegionSection {
  *
  * @mr: the #MemoryRegion to be initialized
  * @owner: the object that tracks the region's reference count
- * @name: used for debugging; not visible to the user or ABI
+ * @name: Region name, becomes part of RAMBlock name used in migration stream
  * @size: size of the region; any subregions beyond this size will be clipped
  */
 void memory_region_init(MemoryRegion *mr,
@@ -355,7 +355,7 @@ void memory_region_unref(MemoryRegion *mr);
  * @ops: a structure containing read and write callbacks to be used when
  *   I/O is performed on the region.
  * @opaque: passed to the read and write callbacks of the @ops structure.
- * @name: used for debugging; not visible to the user or ABI
+ * @name: Region name, becomes part of RAMBlock name used in migration stream
  * @size: size of the region.
  */
 void memory_region_init_io(MemoryRegion *mr,
@@ -474,7 +474,7 @@ void memory_region_init_ram_device_ptr(MemoryRegion *mr,
  *
  * @mr: the #MemoryRegion to be initialized.
  * @owner: the object that tracks the region's reference count
- * @name: used for debugging; not visible to the user or ABI
+ * @name: Region name, becomes part of RAMBlock name used in migration stream
  * @orig: the region to be referenced; @mr will be equivalent to
  *@orig between @offset and @offset + @size - 1.
  * @offset: start of the section in @orig to be referenced.
@@ -537,7 +537,7 @@ void memory_region_init_rom_device(MemoryRegion *mr,
  *
  * @mr: the #MemoryRegion to be initialized
  * @owner: the object that tracks the region's reference count
- * @name: used for debugging; not visible to the user or ABI
+ * @name: Region name, becomes part of RAMBlock name used in migration stream
  * @size: size of the region.
  */
 static inline void memory_region_init_reservation(MemoryRegion *mr,
@@ -558,7 +558,7 @@ static inline void 
memory_region_init_reservation(MemoryRegion *mr,
  * @mr: the #MemoryRegion to be initialized
  * @owner: the object that tracks the region's reference count
  * @ops: a function that translates addresses into the @target region
- * @name: used for debugging; not visible to the user or ABI
+ * @name: Region name, becomes part of RAMBlock name used in migration stream
  * @size: size of the region.
  */
 void memory_region_init_iommu(MemoryRegion *mr,





Re: [Qemu-devel] [PATCH] memory: reduce heap Rss size around 3M

2017-03-08 Thread Philippe Mathieu-Daudé

On 03/08/2017 12:11 PM, Yang Zhong wrote:

Since cpu-memory and memory have same address space,one malloced
memory is enough. This patch will skip memory malloc for memory
address space,which will reduce around 3M physical memory in heap.

Signed-off-by: Yang Zhong 


Reviewed-by: Philippe Mathieu-Daudé 


---
 memory.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/memory.c b/memory.c
index 284894b..799ca4c 100644
--- a/memory.c
+++ b/memory.c
@@ -2422,8 +2422,10 @@ AddressSpace *address_space_init_shareable(MemoryRegion 
*root, const char *name)
 AddressSpace *as;

 QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
-if (root == as->root && as->malloced) {
-as->ref_count++;
+if (root == as->root) {
+if (as->malloced) {
+as->ref_count++;
+}
 return as;
 }
 }





Re: [Qemu-devel] What's the next QEMU version after 2.9 ? (or: when is a good point in time to get rid of old interfaces)

2017-03-08 Thread Gerd Hoffmann
  Hi,

> libvirt suffered similar lack of clarity around when to bump major version
> number as opposed to minor version. To address this we recently adopted the
> rule[1] that major version number changes have no relation to features. The
> major number is simply incremented at the start of each calendar year.

I like the idea of time-based version numbering.  Changing the plan for
2.9 still is probably a bad idea given we have a 2.9 machine type
already etc.

But we can start changing things afterwards.  So we could start with 3.x
after 2.9, and bump the major for the first release each year
(libvirt-style).  Or we could use the year as major number and jump
straight to 17.x (mesa-style).

> From the POV of libvirt, I don't think saying that .0 releases have
> incompatible changes is particularly useful in itself. What *is* useful
> is to have a clear rule around a deprecation cycle. ie, a rule that says
> a feature will be marked deprecated for 3 releases or 12 months, before it
> is permitted to be removed/changed. If that were followed, there is no
> need to batch up such changes in a .0 release IMHO.

Yes, it's probably a good idea to deprecate things first.  When going
with the 12 months rule it could be useful to batch things and do a
yearly "spring cleaning" when the major is bumped.

We could also create new machine types for major versions only, to keep
the number somewhat under control.  Not sure how well that would work in
practice.  We'd probably need a ${type}-next machine type then (future
${type}-4 machine type, for development, incompatible changes allowed)
and make ${type}-3 the default machine type.

cheers, 
  Gerd




Re: [Qemu-devel] What's the next QEMU version after 2.9 ? (or: when is a good point in time to get rid of old interfaces)

2017-03-08 Thread Thomas Huth
On 08.03.2017 11:03, Peter Maydell wrote:
> On 8 March 2017 at 09:26, Thomas Huth  wrote:
>> But anyway, the more important thing that keeps me concerned is: Someone
>>  once told me that we should get rid of old parameters and interfaces
>> (like HMP commands) primarily only when we're changing to a new major
>> version number. As you all know, QEMU has a lot of legacy options, which
>> are likely rather confusing than helpful for the new users nowadays,
>> e.g. things like the "-net channel" option (which is fortunately even
>> hardly documented), but maybe also even the whole vlan/hub concept in
>> the net code, or legacy parameters like "-usbdevice". If we switch to
>> version 3.0, could we agree to remove at least some of them?
> 
> I think if we are going to deprecate and remove options we need
> a clear transition plan for doing so, which means at least one
> release where options are "still works, but warn that they
> are going away with pointer to documentation or similar info
> about their replacement syntax", before actually dropping them.

Yes, that's certainly a good idea. But as Daniel suggested in his mail,
I think we should also have the rule that the option should be marked as
deprecated in multiple releases first - so that the users have a chance
to speak up before something gets really removed (otherwise the option
could be removed right on the first day after the initial release with
the deprecation message, so there is no time for the user to notice this
and complain). Not sure whether we need three releases, as Daniel
suggested, though, but if that's common sense, that's fine for me, too.

Anyway, I've now started a Wiki page where we could track the removal of
deprecated interfaces:

 http://wiki.qemu-project.org/Features/LegacyRemoval

Feedback / updates / addition of other legacy interfaces is welcome!

 Thomas




Re: [Qemu-devel] [PATCH for-2.10 4/8] ppc/pnv: add memory regions for the ICP registers

2017-03-08 Thread Philippe Mathieu-Daudé

Hi Cédric,

On 03/08/2017 07:52 AM, Cédric Le Goater wrote:

This provides to a PowerNV chip (POWER8) access to the Interrupt
Management area, which contains the registers of the Interrupt Control
Presenters of each thread. These are used to accept, return, forward
interrupts in the system.

This area is modeled with a per-chip container memory region holding
all the ICP registers. Each thread of a chip is then associated with
its ICP registers using a memory subregion indexed by its PIR number
in the overall region.

Signed-off-by: Cédric Le Goater 
---
 hw/ppc/pnv.c  |  20 +++
 hw/ppc/pnv_core.c | 146 ++
 include/hw/ppc/pnv.h  |  20 +++
 include/hw/ppc/pnv_core.h |   1 +
 include/hw/ppc/xics.h |   3 +
 5 files changed, 190 insertions(+)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 461d3535e99c..7b13b08deadf 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -658,6 +658,16 @@ static void pnv_chip_init(Object *obj)
 object_property_add_child(obj, "lpc", OBJECT(&chip->lpc), NULL);
 }

+static void pnv_chip_icp_realize(PnvChip *chip, Error **errp)
+{
+char *name;
+
+name = g_strdup_printf("icp-%x", chip->chip_id);
+memory_region_init(&chip->icp_mmio, OBJECT(chip), name, PNV_ICP_SIZE);
+sysbus_init_mmio(SYS_BUS_DEVICE(chip), &chip->icp_mmio);
+g_free(name);
+}
+
 static void pnv_chip_realize(DeviceState *dev, Error **errp)
 {
 PnvChip *chip = PNV_CHIP(dev);
@@ -680,6 +690,14 @@ static void pnv_chip_realize(DeviceState *dev, Error 
**errp)
 }
 sysbus_mmio_map(SYS_BUS_DEVICE(chip), 0, PNV_XSCOM_BASE(chip));

+/* Interrupt Management Area. This is the memory region holding
+ * all the Interrupt Control Presenter (ICP) registers */
+pnv_chip_icp_realize(chip, &error);
+if (error) {
+error_propagate(errp, error);
+return;
+}
+
 /* Cores */
 pnv_chip_core_sanitize(chip, &error);
 if (error) {
@@ -709,6 +727,8 @@ static void pnv_chip_realize(DeviceState *dev, Error **errp)
 object_property_set_int(OBJECT(pnv_core),
 pcc->core_pir(chip, core_hwid),
 "pir", &error_fatal);
+object_property_add_const_link(OBJECT(pnv_core), "xics",
+   qdev_get_machine(), &error_fatal);
 object_property_set_bool(OBJECT(pnv_core), true, "realized",
  &error_fatal);
 object_unref(OBJECT(pnv_core));
diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
index d79d530b4881..8633afbff795 100644
--- a/hw/ppc/pnv_core.c
+++ b/hw/ppc/pnv_core.c
@@ -26,6 +26,128 @@
 #include "hw/ppc/pnv_core.h"
 #include "hw/ppc/pnv_xscom.h"

+static uint64_t pnv_core_icp_read(void *opaque, hwaddr addr, unsigned width)
+{
+ICPState *icp = opaque;
+bool byte0 = (width == 1 && (addr & 0x3) == 0);


hard to read, an inline function should produce the same code at be more 
easily reviewable.



+uint64_t val = 0x;
+
+switch (addr & 0xffc) {
+case 0: /* poll */
+val = icp_ipoll(icp, NULL);
+if (byte0) {
+val >>= 24;
+} else if (width != 4) {
+goto bad_access;
+}
+break;
+case 4: /* xirr */
+if (byte0) {
+val = icp_ipoll(icp, NULL) >> 24;
+} else if (width == 4) {
+val = icp_accept(icp);
+} else {
+goto bad_access;
+}
+break;
+case 12:
+if (byte0) {
+val = icp->mfrr;
+} else {
+goto bad_access;
+}
+break;
+case 16:
+if (width == 4) {
+val = icp->links[0];
+} else {
+goto bad_access;
+}
+break;
+case 20:
+if (width == 4) {
+val = icp->links[1];
+} else {
+goto bad_access;
+}
+break;
+case 24:
+if (width == 4) {
+val = icp->links[2];
+} else {
+goto bad_access;
+}
+break;
+default:
+bad_access:
+qemu_log_mask(LOG_GUEST_ERROR, "XICS: Bad ICP access 0x%"
+  HWADDR_PRIx"/%d\n", addr, width);
+}
+
+return val;
+}
+
+static void pnv_core_icp_write(void *opaque, hwaddr addr, uint64_t val,
+  unsigned width)
+{
+ICPState *icp = opaque;
+bool byte0 = (width == 1 && (addr & 0x3) == 0);
+
+switch (addr & 0xffc) {
+case 4: /* xirr */
+if (byte0) {
+icp_set_cppr(icp, val);
+} else if (width == 4) {
+icp_eoi(icp, val);
+} else {
+goto bad_access;
+}
+break;
+case 12:
+if (byte0) {
+icp_set_mfrr(icp, val);
+} else {
+goto bad_access;
+}
+break;
+case 16:
+if (width == 4) {
+icp->links[0] = val;
+} else {
+ 

Re: [Qemu-devel] What's the next QEMU version after 2.9 ? (or: when is a good point in time to get rid of old interfaces)

2017-03-08 Thread Daniel P. Berrange
On Wed, Mar 08, 2017 at 12:22:24PM +0100, Thomas Huth wrote:
> On 08.03.2017 11:03, Peter Maydell wrote:
> > On 8 March 2017 at 09:26, Thomas Huth  wrote:
> >> But anyway, the more important thing that keeps me concerned is: Someone
> >>  once told me that we should get rid of old parameters and interfaces
> >> (like HMP commands) primarily only when we're changing to a new major
> >> version number. As you all know, QEMU has a lot of legacy options, which
> >> are likely rather confusing than helpful for the new users nowadays,
> >> e.g. things like the "-net channel" option (which is fortunately even
> >> hardly documented), but maybe also even the whole vlan/hub concept in
> >> the net code, or legacy parameters like "-usbdevice". If we switch to
> >> version 3.0, could we agree to remove at least some of them?
> > 
> > I think if we are going to deprecate and remove options we need
> > a clear transition plan for doing so, which means at least one
> > release where options are "still works, but warn that they
> > are going away with pointer to documentation or similar info
> > about their replacement syntax", before actually dropping them.
> 
> Yes, that's certainly a good idea. But as Daniel suggested in his mail,
> I think we should also have the rule that the option should be marked as
> deprecated in multiple releases first - so that the users have a chance
> to speak up before something gets really removed (otherwise the option
> could be removed right on the first day after the initial release with
> the deprecation message, so there is no time for the user to notice this
> and complain). Not sure whether we need three releases, as Daniel
> suggested, though, but if that's common sense, that's fine for me, too.

FYI, I didn't put any thought into my 3 releases / 12 months numbers. I
just arbitrarily picked them out of the hat, so don't consider it my
endorsement of that particular length of time :-) I think 2 is the minimum
number of releases we should deprecate for, beyond that, I'm open minded

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



Re: [Qemu-devel] [PATCH RFC 1/1] vmstate: draft fix for failed iotests case 68 and 91

2017-03-08 Thread Halil Pasic


On 03/08/2017 08:05 AM, QingFeng Hao wrote:
> 
> 
> 在 2017/3/7 18:19, Halil Pasic 写道:
>>
>> On 03/07/2017 11:05 AM, Kevin Wolf wrote:
>>> Am 07.03.2017 um 10:54 hat Halil Pasic geschrieben:

 On 03/07/2017 10:29 AM, Kevin Wolf wrote:
> Am 07.03.2017 um 03:53 hat QingFeng Hao geschrieben:
>> I am not very clear about the logic in vmstate.c, but from its context in
>> vmstate_save_state, it seems size should not be 0, otherwise the followed
>> for loop will keep working on the same element. So I just add a simple
>> check to pass that case, not sure if it's right but it can pass iotest
>> case 68 and 91 now.
>>
>> The iotest's failed output is:
>> 068 1s ... - output mismatch (see 068.out.bad)
>> --- 
>> /home/haoqf/KVMonz/gitcheck/work/qemu-master/tree/qemu/tests/qemu-iotests/068.out
>>2017-03-06 05:52:24.817328899 +0100
>> +++ 068.out.bad 2017-03-07 03:28:44.426714519 +0100
>> @@ -3,9 +3,13 @@
>>   === Saving and reloading a VM state to/from a qcow2 image ===
>>
>>   Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=131072
>> +qemu-system-s390x: migration/vmstate.c:336: vmstate_save_state: 
>> Assertion `first_elem || !n_elems' failed.
>> +./common.config: line 109: 52497 Aborted ( if [ -n 
>> "${QEMU_NEED_PID}" ]; then
>> +echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid";
>> +fi; exec "$QEMU_PROG" $QEMU_OPTIONS "$@" )
>>
>>
>> Signed-off-by: QingFeng Hao 
>> ---
>>   migration/vmstate.c | 8 
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/migration/vmstate.c b/migration/vmstate.c
>> index 78b3cd4..ff28dde 100644
>> --- a/migration/vmstate.c
>> +++ b/migration/vmstate.c
>> @@ -106,6 +106,10 @@ int vmstate_load_state(QEMUFile *f, const 
>> VMStateDescription *vmsd,
>>   int i, n_elems = vmstate_n_elems(opaque, field);
>>   int size = vmstate_size(opaque, field);
>>   +if (size == 0) {
>> +field++;
>> +continue;
>> +}
>>   vmstate_handle_alloc(first_elem, field, opaque);
>>   if (field->flags & VMS_POINTER) {
>>   first_elem = *(void **)first_elem;
>> @@ -322,6 +326,10 @@ void vmstate_save_state(QEMUFile *f, const 
>> VMStateDescription *vmsd,
>>   int64_t old_offset, written_bytes;
>>   QJSON *vmdesc_loop = vmdesc;
>>   +if (size == 0) {
>> +field++;
>> +continue;
>> +}
>>   trace_vmstate_save_state_loop(vmsd->name, field->name, 
>> n_elems);
>>   if (field->flags & VMS_POINTER) {
>>   first_elem = *(void **)first_elem;
> This is really a live migration fix, so I'm adding Juan and Dave to CC.
 You are right, this is migration stuff and has very little to do with
 qemu-block.
> I suspect the real question is why a field with size 0 was even stored
> in the vmstate to begin with.
>
 I have looked onto the issue. It affects s390x only if we
 are running without KVM. Basically, S390CPU.irqstate is unused
 if we do not use KVM, and thus no buffer is allocated.

 IMHO this is a missing field and the cleaner way to handle such
 missing fields is exist. However this used to work, and I recommended
 QuiFeng Hao to discuss the problem upstream.

 By the way, I think, if we want to go back to the old behavior
 and support VMS_VBUFFER with size 0 and nullptr, a much
 cleaner way to do the fix is to change the assert to:

 assert(first_elem  || !n_elems || !size)

 Obviously the other clean way to fix is to implement exists.
>>> If you're right that this specific vmstate was valid in earlier
>>> versions, then I think it's clear that we need to make it work again.
>>> Otherwise we're breaking migration from old versions.
>> Not really. We would not break migration because nothing was written to
>> the stream for VMS_VBUFFER of size 0 except the vmdesc which is at the end,
>> 'debug only', and does not affect migration compatibility.
>>
>> IMHO it is an API question. I would have said, there is no data, therefore
>> there is no field if it's from scratch. But with prior history, I agree
>> with Dave, we should restore old behavior -- which was changed 
>> unintentionally
>> because I made a wrong assumption.
> Halil,
> Unfortunately, another assert failed after I change the code as below in
> vmstate_save_state and vmstate_load_state:
> assert(first_elem  || !n_elems || !size);
> 
> The error is:
> +qemu-system-s390x: migration/vmstate.c:341: vmstate_save_state: Assertion 
> `field->flags & VMS_ARRAY_OF_POINTER' failed.
> It's the code as below:
>  if (!curr_elem) {

Sorry, I failed to mention this yesterday. You should a

Re: [Qemu-devel] [PATCH] X86/HMP: Expose x86 model specific registers via human monitor

2017-03-08 Thread Julian Kirsch
On 08.03.2017 04:09, Richard Henderson wrote:
> On 03/08/2017 11:16 AM, Julian Kirsch wrote:
>> For instance, many modern x86-64 operating systems maintain access to 
>> internal
>> data structures via the MSR_GSBASE/MSR_KERNELGSBASE MSRs. Giving
>> introspection utilities (such as a remotely attached gdb) a way of
>> accessing these registers improves analysis results drastically.
> 
> If this is just for gdb, then we should provide access via the normal gdbstub,
> plus appropriate xml files.  There are plenty of examples of this for other 
> cpus.
> 
> 
> r~
> 

Hi Richard,

I considered adding this functionality to the gdbstub.
However, my understanding of this approach is that it would make the patch
dependent on a particular (newly added) xml file being present at the gdb client
side. The current solution avoids this by letting gdb clients simply use the
newly introduced commands by means of the "monitor" command already built into 
gdb.
While I appreciate that this is a bit hacky, I, on the other hand, cannot come
up with any good setting in which a normal user space gdb client might need
access to the MSRs except for the qemu one, such that adding a new xml file to
gdb just for this special use case might be difficult to communicate on the
gdb-devel list.
Another question arising in this context would also consider what and how many
MSRs the xml file(s) should contain, including the logic to provide distinct
architectural MSRs for 32 and 64 bit targets. I hope you can see this approach
incurring non-negligible planning overhead with the end result providing same or
even less functionality.
Nevertheless, I'd move the functionality over to the gdbstub and bug on
gdb-devel to include a new (minimal) xml file, if it was the only way to get an
according patch upstream in qemu.

Best,
Julian



Re: [Qemu-devel] [PATCH] memory_region: Fix name comments

2017-03-08 Thread Peter Maydell
On 8 March 2017 at 11:54, Dr. David Alan Gilbert (git)
 wrote:
> From: "Dr. David Alan Gilbert" 
>
> The 'name' parameter to memory_region_init_* had been marked as debug
> only, however vmstate_region_ram uses it as a parameter to
> qemu_ram_set_idstr to set RAMBlock names and these form part of the
> migration stream.
>
> Signed-off-by: Dr. David Alan Gilbert 
> ---
>  include/exec/memory.h | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 6911023..de8f69e 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -307,7 +307,7 @@ struct MemoryRegionSection {
>   *
>   * @mr: the #MemoryRegion to be initialized
>   * @owner: the object that tracks the region's reference count
> - * @name: used for debugging; not visible to the user or ABI
> + * @name: Region name, becomes part of RAMBlock name used in migration stream

...for RAM-backend MRs only, presumably? Also, what are the
uniqueness constraints?

>   * @size: size of the region; any subregions beyond this size will be clipped
>   */
>  void memory_region_init(MemoryRegion *mr,
> @@ -355,7 +355,7 @@ void memory_region_unref(MemoryRegion *mr);
>   * @ops: a structure containing read and write callbacks to be used when
>   *   I/O is performed on the region.
>   * @opaque: passed to the read and write callbacks of the @ops structure.
> - * @name: used for debugging; not visible to the user or ABI
> + * @name: Region name, becomes part of RAMBlock name used in migration stream

ditto.

>   * @size: size of the region.
>   */
>  void memory_region_init_io(MemoryRegion *mr,
> @@ -474,7 +474,7 @@ void memory_region_init_ram_device_ptr(MemoryRegion *mr,
>   *
>   * @mr: the #MemoryRegion to be initialized.
>   * @owner: the object that tracks the region's reference count
> - * @name: used for debugging; not visible to the user or ABI
> + * @name: Region name, becomes part of RAMBlock name used in migration stream
>   * @orig: the region to be referenced; @mr will be equivalent to
>   *@orig between @offset and @offset + @size - 1.
>   * @offset: start of the section in @orig to be referenced.

The diff here is rather lacking in context, but this comment change
is for memory_region_init_alias(). Aliases presumably don't get
migrated (the thing they alias into will be migrated if it's RAM),
so the comment seems out of place here.

> @@ -537,7 +537,7 @@ void memory_region_init_rom_device(MemoryRegion *mr,
>   *
>   * @mr: the #MemoryRegion to be initialized
>   * @owner: the object that tracks the region's reference count
> - * @name: used for debugging; not visible to the user or ABI
> + * @name: Region name, becomes part of RAMBlock name used in migration stream
>   * @size: size of the region.
>   */
>  static inline void memory_region_init_reservation(MemoryRegion *mr,

Similarly, a reservation MR isn't RAM-backed.

> @@ -558,7 +558,7 @@ static inline void 
> memory_region_init_reservation(MemoryRegion *mr,
>   * @mr: the #MemoryRegion to be initialized
>   * @owner: the object that tracks the region's reference count
>   * @ops: a function that translates addresses into the @target region
> - * @name: used for debugging; not visible to the user or ABI
> + * @name: Region name, becomes part of RAMBlock name used in migration stream
>   * @size: size of the region.
>   */
>  void memory_region_init_iommu(MemoryRegion *mr,

...and nor is an IOMMU MR.

> --
> 2.9.3
>


thanks
-- PMM

-- 
12345678901234567890123456789012345678901234567890123456789012345678901234567890
 1 2 3 4 5 6 7 8



Re: [Qemu-devel] [PATCH] X86/HMP: Expose x86 model specific registers via human monitor

2017-03-08 Thread Dr. David Alan Gilbert
* Julian Kirsch (g...@kirschju.re) wrote:
> Provide read/write access to x86 model specific registers (MSRs) by means of
> two new HMP commands "msr-list" and "msr-set". The rationale behind this
> is to improve introspection capabilities for system virtualization mode.
> For instance, many modern x86-64 operating systems maintain access to internal
> data structures via the MSR_GSBASE/MSR_KERNELGSBASE MSRs. Giving
> introspection utilities (such as a remotely attached gdb) a way of
> accessing these registers improves analysis results drastically.

I'll leave those who know more about gdb to answer whether that's the best way
of doing it; but I can see them being useful to those just trying to debug
stuff from the monitor.

> Signed-off-by: Julian Kirsch 
> ---
> This patch moves the logic of the rdmsr and wrmsr functions to helper.c and
> replaces their current versions in misc_helper.c with stubs calling the new
> functions.

I think the patch needs to be split up; you should at least have:
  a) The big part that moves the helpers (if that's the right thing to do)
  b) The qmp code
  c) The hmp code

I also don't see why you need to move the helpers.

> The ordering of MSRs now loosely follows the ordering used in the KVM
> module. As qemu operates on cached values in the CPUX86State struct, the 
> msr-set
> command is implemented in a hackish way: In order to force qemu to flush the 
> new
> values to KVM a call to cpu_synchronize_post_init is triggered, eventually
> ending up in calling kvm_put_msrs at KVM_PUT_FULL_STATE level. As MSR writes
> could *still* be caught by the code logic in this function, the msr-set 
> function
> reads back the values written to determine whether the write was successful.
> This way, we don't have to duplicate the logic used in kvm_put_msrs 
> (has_msr_XXX)
> to x86_cpu_wrmsr.
> There are several things I would like to pooint out about this patch:
>   - The "msr-list" command currently displays MSR values for all virtual cpus.
> This is somewhat inconsistent with "info registers" just displaying the
> value of the current default cpu. One might think about just displaying 
> the
> current value and offer access to other CPU's MSRs by means of switching
> between CPUs using the "cpu" monitor command.

Yes, it's probably best to make it just the current CPU to be consistent.

>   - The new version of x86_cpu_rdmsr exposes MSRs that are arguably of
> questionable help for any human / tool using the monitor. However I merely
> felt a deep urge to reflect the full MSR list from kvm.c when writing the
> code.

No point in guessing which ones the human will need; may as well give them
everything so they can debug bugs that are obscure.

>   - While the need for msr-list is evident (see commit msg), msr-set could be
> used in more obscure cases. For instance, one might offer a way to access
> and configure performance counter MSRs of the guest via the hmp. If this
> feels too much like an inacceptable hack, I'll happily drop the msr-set
> part.

It feels reasonable to have it for debugging.

(Heck, aren't those big switch statements depressing, I'm sure there
must be a way to turn a chunk of them into a table that could be shared between
the helpers and even maybe the kvm code; anyway, not the subject of this patch).

Dave

> Best,
> Julian
> 
>  cpus.c|  99 +
>  hmp-commands.hx   |  29 +++
>  hmp.c |  31 +++
>  hmp.h |   2 +
>  qapi-schema.json  |  49 +
>  target/i386/cpu.h |   3 +
>  target/i386/helper.c  | 518 
> ++
>  target/i386/misc_helper.c | 298 +-
>  8 files changed, 742 insertions(+), 287 deletions(-)
> 
> diff --git a/cpus.c b/cpus.c
> index c857ad2957..c5088d2294 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1919,6 +1919,105 @@ exit:
>  fclose(f);
>  }
>  
> +MsrInfoList *qmp_msr_list(uint32_t msr_idx, Error **errp)
> +{
> +#ifdef TARGET_I386
> +bool valid;
> +X86CPU *cpu;
> +CPUX86State *env;
> +CPUState *cpu_state;
> +MsrInfoList *head = NULL, *cur_item = NULL;
> +
> +CPU_FOREACH(cpu_state) {
> +cpu = X86_CPU(cpu_state);
> +env = &cpu->env;
> +
> +cpu_synchronize_state(cpu_state);
> +
> +MsrInfoList *info;
> +info = g_malloc0(sizeof(*info));
> +info->value = g_malloc0(sizeof(*info->value));
> +
> +info->value->cpu_idx = cpu_state->cpu_index;
> +info->value->msr_idx = msr_idx;
> +info->value->value = x86_cpu_rdmsr(env, msr_idx, &valid);
> +
> +if (!valid) {
> +error_setg(errp, "Failed to retreive MSR 0x%08x on CPU %u",
> +   msr_idx, cpu_state->cpu_index);
> +return head;
> +}
> +
> +/* XXX: waiting for the qapi to support GSList */
> +if (!cur_item) {
> +head = cur_item = in

[Qemu-devel] [PATCH for-2.9 v3] file-posix: Consider max_segments for BlockLimits.max_transfer

2017-03-08 Thread Fam Zheng
BlockLimits.max_transfer can be too high without this fix, guest will
encounter I/O error or even get paused with werror=stop or rerror=stop. The
cause is explained below.

Linux has a separate limit, /sys/block/.../queue/max_segments, which in
the worst case can be more restrictive than the BLKSECTGET which we
already consider (note that they are two different things). So, the
failure scenario before this patch is:

1) host device has max_sectors_kb = 4096 and max_segments = 64;
2) guest learns max_sectors_kb limit from QEMU, but doesn't know
   max_segments;
3) guest issues e.g. a 512KB request thinking it's okay, but actually
   it's not, because it will be passed through to host device as an
   SG_IO req that has niov > 64;
4) host kernel doesn't like the segmenting of the request, and returns
   -EINVAL;

This patch checks the max_segments sysfs entry for the host device and
calculates a "conservative" bytes limit using the page size, which is
then merged into the existing max_transfer limit. Guest will discover
this from the usual virtual block device interfaces. (In the case of
scsi-generic, it will be done in the INQUIRY reply interception in
device model.)

The other possibility is to actually propagate it as a separate limit,
but it's not better. On the one hand, there is a big complication: the
limit is per-LUN in QEMU PoV (because we can attach LUNs from different
host HBAs to the same virtio-scsi bus), but the channel to communicate
it in a per-LUN manner is missing down the stack; on the other hand,
two limits versus one doesn't change much about the valid size of I/O
(because guest has no control over host segmenting).

Also, the idea to fall back to bounce buffering in QEMU, upon -EINVAL,
was explored. Unfortunately there is no neat way to ensure the bounce
buffer is less segmented (in terms of DMA addr) than the guest buffer.

Practically, this bug is not very common. It is only reported on a
Emulex (lpfc), so it's okay to get it fixed in the easier way.

Reviewed-by: Paolo Bonzini 
Signed-off-by: Fam Zheng 

---

v3: Clearer commit message. [Kevin]
v2: Use /sys/dev/block/MAJOR:MINOR/queue/max_segments. [Paolo]
---
 block/file-posix.c | 47 +++
 1 file changed, 47 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 4de1abd..c4c0663 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -668,6 +668,48 @@ static int hdev_get_max_transfer_length(BlockDriverState 
*bs, int fd)
 #endif
 }
 
+static int hdev_get_max_segments(const struct stat *st)
+{
+#ifdef CONFIG_LINUX
+char buf[32];
+const char *end;
+char *sysfspath;
+int ret;
+int fd = -1;
+long max_segments;
+
+sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
+major(st->st_rdev), minor(st->st_rdev));
+fd = open(sysfspath, O_RDONLY);
+if (fd == -1) {
+ret = -errno;
+goto out;
+}
+do {
+ret = read(fd, buf, sizeof(buf));
+} while (ret == -1 && errno == EINTR);
+if (ret < 0) {
+ret = -errno;
+goto out;
+} else if (ret == 0) {
+ret = -EIO;
+goto out;
+}
+buf[ret] = 0;
+/* The file is ended with '\n', pass 'end' to accept that. */
+ret = qemu_strtol(buf, &end, 10, &max_segments);
+if (ret == 0 && end && *end == '\n') {
+ret = max_segments;
+}
+
+out:
+g_free(sysfspath);
+return ret;
+#else
+return -ENOTSUP;
+#endif
+}
+
 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
 BDRVRawState *s = bs->opaque;
@@ -679,6 +721,11 @@ static void raw_refresh_limits(BlockDriverState *bs, Error 
**errp)
 if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
 bs->bl.max_transfer = pow2floor(ret);
 }
+ret = hdev_get_max_segments(&st);
+if (ret > 0) {
+bs->bl.max_transfer = MIN(bs->bl.max_transfer,
+  ret * getpagesize());
+}
 }
 }
 
-- 
2.9.3




Re: [Qemu-devel] [PATCH 2/2] vl: Print CPU help after we've registered the CPU accelerators

2017-03-08 Thread Eduardo Habkost
On Wed, Mar 08, 2017 at 01:33:45PM +1100, David Gibson wrote:
> On Tue, Mar 07, 2017 at 09:02:37AM -0300, Eduardo Habkost wrote:
> > On Tue, Mar 07, 2017 at 02:31:05PM +1100, David Gibson wrote:
> > > On Mon, Mar 06, 2017 at 08:47:52AM -0300, Eduardo Habkost wrote:
> > > > On Mon, Mar 06, 2017 at 10:06:27AM +1100, David Gibson wrote:
> > > > > On Fri, Mar 03, 2017 at 11:58:07AM -0300, Eduardo Habkost wrote:
> > > > > > On Tue, Jan 31, 2017 at 02:11:59PM +0100, Thomas Huth wrote:
> > > > > > > When running with KVM on POWER, we register some CPU types during
> > > > > > > the initialization function of the ppc64 KVM code (which 
> > > > > > > unfortunately
> > > > > > > also can not be done via a type_init() like it is done on x86).
> > > > > > 
> > > > > > Can you elaborate why it can't be done via type_init()? If the
> > > > > > QOM type hierarchy depends on any runtime data unavailable at
> > > > > > type_init(), we should fix that.
> > > > > 
> > > > > Hmm.. how?  This is specifically for the special 'host' cpu in the
> > > > > case of KVM.  We can't use a static configuration here, because there
> > > > > are things on the host that could limit what features of the CPU are
> > > > > available for guest use.  So, we need KVM to be initialized in order
> > > > > to query that information.
> > > > 
> > > > There's nothing saying you need to query that information before
> > > > type_register() or during class_init, does it? The behavior of
> > > > TYPE_HOST_POWERPC_CPU after object_new() is called can be as
> > > > dynamic as you want it to, but the QOM type hierarchy is supposed
> > > > to be static.
> > > 
> > > So, the thing is we have some properties that logically belong in the
> > > CPU class, rather than instance, and that's correct for all TCG CPUs,
> > > but not for the KVM host CPU.  It seems nasty to have to force all
> > > those things into the instance just because of KVM.
> > 
> > You can still register any class properties you want, without
> > querying KVM first. Are there properties that you are able to
> > register only after querying KVM? Is the set of class properties
> > on TYPE_HOST_POWERPC_CPU different depending on the host
> > capabilities?
> 
> Sorry, I used "properties" sloppily, not meaning QOM properties.
> 
> There are several fields in the CPU class which describe CPU
> capabilities - bitmasks indicating which instructions are available,
> and L1 cacheline sizes.  There are some other things that are in the
> CPU instance at the moment, but arguably would belong better in the
> class: available page sizes and PTE encodings, for example.
> 
> At the real hardware level these things are dependent only the model
> of CPU - hence belonging in cpu class, not instance.  But, because of
> the way virtualization works on POWER, some of the features may not be
> available to guests, due to configuration of the hypervisor.  So for
> the "host" cpu we need to query KVM to see which CPU features are
> actually available.
> 

I see. If there is data that is available only at PowerPCCPUClass
and you don't want to duplicate it at PowerPCCPU, we can have a
solution for that: instead of late-registration of the class, we
could leave those fields to be populated after KVM is
initialized.

Anyway, I don't think this is urgent: the code already treats
"host" as an exception in ppc_cpu_list(), and (AFAICS) the
original problem this patch addresses is related to the
inaccurate alias information only (which is not a consequence of
the late type_register() call).

> > I'm looking at target/ppc/cpu-models.c, and I only see the
> > class_init function setting PowerPCCPUClass::pvr and
> > PowerPCCPUClass::svr. Am I missing anything?
> 
> Yes.  Notice that the .parent set there is not the base powerpc cpu
> class, but a "family" class.  Those families are defined in
> translate_init.c by another hairy macro POWERPC_FAMILY().  The family
> class_init function (generally done as a block below the macro
> invocation) initializes several other things, including insns_flags.
> 
> kvmppc_host_cpu_class_init(), uses the family's copy of insns_flags as
> a base, but needs to query the host and adjust it for a couple of
> instruction groups that can be disabled from the hypervisor.
> 
> [Aside: we also update the cache sizes based on host information.
> The reasons for that are complicated.  This particular case doesn't
> require a late class init though, because although it's queried from
> the host, it's not queried from kvm specifically]
> 
> > You can still keep pvr and svr on the CPU class. The only thing
> > that's different on TYPE_HOST_POWERPC_CPU is that it won't have
> > the preset values on PowerPCCPUClass. But it can initialize the
> > instance values on ->instance_init() or ->realize() instead.
> > 
> > I even see ppc_cpu_class_by_pvr*() treating TYPE_HOST_POWERPC_CPU
> > as special on PVR lookup, already. So it looks like the code is
> > almost ready for that?
> 
> PVR is a red herring.  insn_flag

[Qemu-devel] [PATCH] qemu-doc: Fix broken URLs of amnhltm.zip and dosidle210.zip

2017-03-08 Thread Thomas Huth
There are some broken URLs in the qemu-doc which reference tools that
are not available at their original location anymore. Fortunately, they
have been mirrored to archive.org, so point to that location instead.

Signed-off-by: Thomas Huth 
---
 qemu-doc.texi | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/qemu-doc.texi b/qemu-doc.texi
index 794ab4a..50411bc 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -1878,8 +1878,8 @@ resolution modes which the Cirrus Logic BIOS does not 
support (i.e. >=
 Windows 9x does not correctly use the CPU HLT
 instruction. The result is that it takes host CPU cycles even when
 idle. You can install the utility from
-@url{http://www.user.cityline.ru/~maxamn/amnhltm.zip} to solve this
-problem. Note that no such tool is needed for NT, 2000 or XP.
+@url{http://web.archive.org/web/20060212132151/http://www.user.cityline.ru/~maxamn/amnhltm.zip}
+to solve this problem. Note that no such tool is needed for NT, 2000 or XP.
 
 @subsubsection Windows 2000 disk full problem
 
@@ -1927,9 +1927,9 @@ vvfat block device ("-hdb 
fat:directory_which_holds_the_SP").
 @subsubsection CPU usage reduction
 
 DOS does not correctly use the CPU HLT instruction. The result is that
-it takes host CPU cycles even when idle. You can install the utility
-from @url{http://www.vmware.com/software/dosidle210.zip} to solve this
-problem.
+it takes host CPU cycles even when idle. You can install the utility from
+@url{http://web.archive.org/web/20051222085335/http://www.vmware.com/software/dosidle210.zip}
+to solve this problem.
 
 @node QEMU System emulator for non PC targets
 @chapter QEMU System emulator for non PC targets
-- 
1.8.3.1




Re: [Qemu-devel] [Qemu-trivial] [PATCH] qemu-doc: Fix broken URLs of amnhltm.zip and dosidle210.zip

2017-03-08 Thread Laurent Vivier
On 08/03/2017 13:13, Thomas Huth wrote:
> There are some broken URLs in the qemu-doc which reference tools that
> are not available at their original location anymore. Fortunately, they
> have been mirrored to archive.org, so point to that location instead.
> 
> Signed-off-by: Thomas Huth 

Reviewed-by: Laurent Vivier 

> ---
>  qemu-doc.texi | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/qemu-doc.texi b/qemu-doc.texi
> index 794ab4a..50411bc 100644
> --- a/qemu-doc.texi
> +++ b/qemu-doc.texi
> @@ -1878,8 +1878,8 @@ resolution modes which the Cirrus Logic BIOS does not 
> support (i.e. >=
>  Windows 9x does not correctly use the CPU HLT
>  instruction. The result is that it takes host CPU cycles even when
>  idle. You can install the utility from
> -@url{http://www.user.cityline.ru/~maxamn/amnhltm.zip} to solve this
> -problem. Note that no such tool is needed for NT, 2000 or XP.
> +@url{http://web.archive.org/web/20060212132151/http://www.user.cityline.ru/~maxamn/amnhltm.zip}
> +to solve this problem. Note that no such tool is needed for NT, 2000 or XP.
>  
>  @subsubsection Windows 2000 disk full problem
>  
> @@ -1927,9 +1927,9 @@ vvfat block device ("-hdb 
> fat:directory_which_holds_the_SP").
>  @subsubsection CPU usage reduction
>  
>  DOS does not correctly use the CPU HLT instruction. The result is that
> -it takes host CPU cycles even when idle. You can install the utility
> -from @url{http://www.vmware.com/software/dosidle210.zip} to solve this
> -problem.
> +it takes host CPU cycles even when idle. You can install the utility from
> +@url{http://web.archive.org/web/20051222085335/http://www.vmware.com/software/dosidle210.zip}
> +to solve this problem.
>  
>  @node QEMU System emulator for non PC targets
>  @chapter QEMU System emulator for non PC targets
> 




Re: [Qemu-devel] [PATCH for-2.9 v3] file-posix: Consider max_segments for BlockLimits.max_transfer

2017-03-08 Thread Kevin Wolf
Am 08.03.2017 um 13:08 hat Fam Zheng geschrieben:
> BlockLimits.max_transfer can be too high without this fix, guest will
> encounter I/O error or even get paused with werror=stop or rerror=stop. The
> cause is explained below.
> 
> Linux has a separate limit, /sys/block/.../queue/max_segments, which in
> the worst case can be more restrictive than the BLKSECTGET which we
> already consider (note that they are two different things). So, the
> failure scenario before this patch is:
> 
> 1) host device has max_sectors_kb = 4096 and max_segments = 64;
> 2) guest learns max_sectors_kb limit from QEMU, but doesn't know
>max_segments;
> 3) guest issues e.g. a 512KB request thinking it's okay, but actually
>it's not, because it will be passed through to host device as an
>SG_IO req that has niov > 64;
> 4) host kernel doesn't like the segmenting of the request, and returns
>-EINVAL;
> 
> This patch checks the max_segments sysfs entry for the host device and
> calculates a "conservative" bytes limit using the page size, which is
> then merged into the existing max_transfer limit. Guest will discover
> this from the usual virtual block device interfaces. (In the case of
> scsi-generic, it will be done in the INQUIRY reply interception in
> device model.)
> 
> The other possibility is to actually propagate it as a separate limit,
> but it's not better. On the one hand, there is a big complication: the
> limit is per-LUN in QEMU PoV (because we can attach LUNs from different
> host HBAs to the same virtio-scsi bus), but the channel to communicate
> it in a per-LUN manner is missing down the stack; on the other hand,
> two limits versus one doesn't change much about the valid size of I/O
> (because guest has no control over host segmenting).
> 
> Also, the idea to fall back to bounce buffering in QEMU, upon -EINVAL,
> was explored. Unfortunately there is no neat way to ensure the bounce
> buffer is less segmented (in terms of DMA addr) than the guest buffer.
> 
> Practically, this bug is not very common. It is only reported on a
> Emulex (lpfc), so it's okay to get it fixed in the easier way.
> 
> Reviewed-by: Paolo Bonzini 
> Signed-off-by: Fam Zheng 

Thanks, applied to the block branch.

Kevin



[Qemu-devel] [RFC PATCH qemu] dtc: Allow compiling with old gcc

2017-03-08 Thread Alexey Kardashevskiy
After 6e85fce0225f "dtc: Update requirement to v1.4.2" QEMU stopped
compiling in CentOS7:

In file included from /home/aik/p/qemu/dtc/libfdt/libfdt.h:54:0,
 from /home/aik/p/qemu/device_tree.c:30:
/home/aik/p/qemu/dtc/libfdt/libfdt_env.h:64:0: error: "__bitwise" redefined 
[-Werror]
 #define __bitwise
 ^
In file included from /usr/include/asm/ptrace.h:27:0,
 from /usr/include/asm/sigcontext.h:11,
 from /usr/include/bits/sigcontext.h:27,
 from /usr/include/signal.h:340,
 from /home/aik/p/qemu/include/qemu/osdep.h:86,
 from /home/aik/p/qemu/device_tree.c:14:
/usr/include/linux/types.h:21:0: note: this is the location of the previous 
definition
 #define __bitwise __bitwise__
 ^
cc1: all warnings being treated as errors
make: *** [device_tree.o] Error 1
make: *** Waiting for unfinished jobs

The reason is that CentOS7 comes with libfdt 1.4.0 so QEMU tries using
the internal one which does not compile as CentOS7 comes with gcc v4.8.5
which reports warnings which it would not if the OS's libfdt was used
(libfdt_env.h has not changed between 1.4.0 and 1.4.2).

gcc 6.2.0 from Ubuntu v16.10 handles this fine.

This replaces -I with -isystem to suppress the warning (which turns
to an error because of -Werror).

Signed-off-by: Alexey Kardashevskiy 
---
 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index 6c21975f02..2c2a5df14a 100755
--- a/configure
+++ b/configure
@@ -3414,7 +3414,7 @@ EOF
symlink "$source_path/dtc/Makefile" "dtc/Makefile"
symlink "$source_path/dtc/scripts" "dtc/scripts"
 fi
-fdt_cflags="-I\$(SRC_PATH)/dtc/libfdt"
+fdt_cflags="-isystem\$(SRC_PATH)/dtc/libfdt"
 fdt_libs="-L\$(BUILD_DIR)/dtc/libfdt $fdt_libs"
   elif test "$fdt" = "yes" ; then
 # have neither and want - prompt for system/submodule install
-- 
2.11.0




Re: [Qemu-devel] [PATCH v8 1/2] block/vxhs.c: Add support for a new block device type called "vxhs"

2017-03-08 Thread Ketan Nilangekar


> On Mar 8, 2017, at 1:13 AM, Daniel P. Berrange  wrote:
> 
>> On Tue, Mar 07, 2017 at 05:27:55PM -0800, ashish mittal wrote:
>> Thanks! There is one more input I need some help with!
>> 
>> VxHS network library opens a fixed number of connection channels to a
>> given host, and all the vdisks (that connect to the same host) share
>> these connection channels.
>> 
>> Therefore, we need to open secure channels to a specific target host
>> only once for the first vdisk that connects to that host. All the
>> other vdisks that connect to the same target host will share the same
>> set of secure communication channels.
>> 
>> I hope the above scheme is acceptable?
> 
> I don't think I'm in favour of such an approach, as it forces a single
> QEMU process to use the same privileges for all disks it uses.
> 
> Consider an example where a QEMU process has two disks, one shared
> readonly disk and one exclusive writable disk, both on the same host.
> 

This is not a use case for VxHS as a solution. We do not support sharing of 
vdisks across QEMU instances.

Vxhs library was thus not designed to open different connections for individual 
vdisks within a QEMU instance.

Implementing this will involve rewrite of significant parts of libvxhs client 
and server. Is this a new requirement for acceptance into QEMU?


> It is reasonable as an administrator to want to use different credentials
> for each of these. ie, they might have a set of well known credentials to
> authenticate to get access to the read-only disk, but have a different set
> of strictly limited access credentials to get access to the writable disk
> 
> Trying to re-use the same connection for multiple cause prevents QEMU from
> authenticating with different credentials per disk, so I don't think that
> is a good approach - each disk should have totally independant state.
> 
>> 
>> If yes, then we have a couple of options to implement this:
>> 
>> (1) Accept the TLS credentials per vdisk using the previously
>> discussed --object tls-creds-x509 mechanism. In this case, if more
>> than one vdisk have the same host info, then we will use only the
>> first one's creds to set up the secure connection, and ignore the
>> others. vdisks that connect to different target hosts will use their
>> individual tls-creds-x509 to set up the secure channels. This is, of
>> course, not relevant for qemu-img/qemu-io as they can open only one
>> vdisk at a time.
>> 
>> (2) Instead of having a per-vdisk --object tls-creds-x509, have a
>> single such argument on the command line for vxhs block device code to
>> consume - if that is possible! One way to achieve this could be the
>> user/password authentication we discussed earlier, which we could use
>> to pass the directory where cert/keys are kept.
>> 
>> (3) Use the instance UUID, when available, to lookup the cert files
>> per instance (i.e. for qemu-kvm), and use the --object tls-creds-x509
>> mechanism, when instance UUID is NULL (i.e. qemu-io, qemu-img etc).
>> The cert/key files are anyway protected by file permissions in either
>> case, so I guess there is no additional security provided by either
>> method.
> 
> Regards,
> Daniel
> -- 
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o- http://virt-manager.org :|
> |: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



Re: [Qemu-devel] [PATCH V3] virtio: do not require IOMMU to be created in advance

2017-03-08 Thread Marcel Apfelbaum

On 03/08/2017 11:16 AM, Peter Xu wrote:

On Wed, Mar 08, 2017 at 05:09:45PM +0800, Peter Xu wrote:

On Wed, Mar 08, 2017 at 10:17:09AM +0200, Marcel Apfelbaum wrote:

[...]



I am sorry I was not clear enough:
If a device is added after the system is up (hotplug), we cannot depend on the 
"machine_done"
event to enable "bus master".
This is why we have
   if (qdev_hotplug)
pci_init_bus_master(pci_dev);

The code you proposed changes the order, so this call is done *after* realize.

My question was: What if any other device may require the bus_master_as
at realize time (and can be hot-plugged) ?
For example: hcd-ehci/hcd-ohci devices call pci_get_address_space()
and caches the bus_master_as.


Oh, I didn't notice that there are other devices that used
bus_master_as during realization. If so... Would this really work even
without hot plug? Considering that bus_master_as won't be inited until
machine done phase?


Please ignore my question... I think the answer is that these devices
are only caching the pointer of bus_master_as. So it won't really use
the address space before machine_done.

If so, IMHO moving pci_init_bus_master() after device specific
realize() is okay as well then, right?



It should work, right. But you may want to double-check anyway :)

Thanks,
Marcel


Thanks,

-- peterx






Re: [Qemu-devel] [PATCH v3] intel_iommu: check misordered init when realize

2017-03-08 Thread Marcel Apfelbaum

On 03/08/2017 12:08 PM, Jason Wang wrote:



On 2017年03月08日 16:35, Paolo Bonzini wrote:


- Original Message -

From: "Marcel Apfelbaum" 
To: "Michael S. Tsirkin" , "Peter Xu" 
Cc: qemu-devel@nongnu.org, "Paolo Bonzini" , "yi l liu" 
, "Jintack Lim"
, "Jason Wang" , "Alex Williamson" 

Sent: Thursday, March 2, 2017 7:24:44 AM
Subject: Re: [PATCH v3] intel_iommu: check misordered init when realize

On 03/02/2017 07:13 AM, Michael S. Tsirkin wrote:

On Thu, Mar 02, 2017 at 11:32:18AM +0800, Peter Xu wrote:

Intel vIOMMU devices are created with "-device" parameter, while here
actually we need to make sure the dmar device be created before other
PCI devices (like vfio-pci, virtio-pci ones) so that we know iommu_fn
will be setup correctly before realizations of those PCI devices (it is
legal that PCI device fetch these info during its realization). Now this
ordering yet cannot be achieved elsewhere, and devices will be created
in the order that user specified. We need to avoid that.

This patch tries to detect this kind of misordering issue during init of
VT-d device, then report to guest if misordering happened. In the
future, we can provide something better to solve it, e.g., to support
device init ordering, then we can live without this patch.

Signed-off-by: Peter Xu 

Unfortunately with virtio it's a regression, as it used to
work with iommu. So I'm afraid we need to look into supporting
arbitrary order right now :(


Hi,

A fast way to do it is to initialize iommu with a new keyword, like
  -iommu intel-iommu
Or maybe use the existing "-object" somehow ?
  From vl.c comments:
"Initial object creation happens before all other
 QEMU data types are created "

Another idea is to add a third run on QEMU cmd line arguments,
but this would affect the whole system.

However having an ordering system beats all other ideas.

The ordering should be explicit in the command line.  Since we do not have
cycles, it should be possible.

The best solution would have been to add something like

-device intel-iommu,id=root-complex-iommu -global 
mch.iommu=root-complex-iommu

but it's too late for that.

Paolo



Yes, it's probably too late other than having workarounds for 2.9.


I suggested to Laine to require libvirt to initialize first the iommu device.
and since most QEMU users use libvirt anyway, we may just be OK (kind of).


And to eliminate similar issues, we may initializing the pci devices by 
traversing the PCI tree from root complex in level order.


Interesting idea, we'll explore it for 2.10.
But we need initialization order for all machine components, not only PCI.

Thanks,
Marcel



Thanks





Re: [Qemu-devel] [PATCH for-2.10 4/8] ppc/pnv: add memory regions for the ICP registers

2017-03-08 Thread Cédric Le Goater
On 03/08/2017 12:24 PM, Philippe Mathieu-Daudé wrote:
> Hi Cédric,
> 
> On 03/08/2017 07:52 AM, Cédric Le Goater wrote:
>> This provides to a PowerNV chip (POWER8) access to the Interrupt
>> Management area, which contains the registers of the Interrupt Control
>> Presenters of each thread. These are used to accept, return, forward
>> interrupts in the system.
>>
>> This area is modeled with a per-chip container memory region holding
>> all the ICP registers. Each thread of a chip is then associated with
>> its ICP registers using a memory subregion indexed by its PIR number
>> in the overall region.
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  hw/ppc/pnv.c  |  20 +++
>>  hw/ppc/pnv_core.c | 146 
>> ++
>>  include/hw/ppc/pnv.h  |  20 +++
>>  include/hw/ppc/pnv_core.h |   1 +
>>  include/hw/ppc/xics.h |   3 +
>>  5 files changed, 190 insertions(+)
>>
>> diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
>> index 461d3535e99c..7b13b08deadf 100644
>> --- a/hw/ppc/pnv.c
>> +++ b/hw/ppc/pnv.c
>> @@ -658,6 +658,16 @@ static void pnv_chip_init(Object *obj)
>>  object_property_add_child(obj, "lpc", OBJECT(&chip->lpc), NULL);
>>  }
>>
>> +static void pnv_chip_icp_realize(PnvChip *chip, Error **errp)
>> +{
>> +char *name;
>> +
>> +name = g_strdup_printf("icp-%x", chip->chip_id);
>> +memory_region_init(&chip->icp_mmio, OBJECT(chip), name, PNV_ICP_SIZE);
>> +sysbus_init_mmio(SYS_BUS_DEVICE(chip), &chip->icp_mmio);
>> +g_free(name);
>> +}
>> +
>>  static void pnv_chip_realize(DeviceState *dev, Error **errp)
>>  {
>>  PnvChip *chip = PNV_CHIP(dev);
>> @@ -680,6 +690,14 @@ static void pnv_chip_realize(DeviceState *dev, Error 
>> **errp)
>>  }
>>  sysbus_mmio_map(SYS_BUS_DEVICE(chip), 0, PNV_XSCOM_BASE(chip));
>>
>> +/* Interrupt Management Area. This is the memory region holding
>> + * all the Interrupt Control Presenter (ICP) registers */
>> +pnv_chip_icp_realize(chip, &error);
>> +if (error) {
>> +error_propagate(errp, error);
>> +return;
>> +}
>> +
>>  /* Cores */
>>  pnv_chip_core_sanitize(chip, &error);
>>  if (error) {
>> @@ -709,6 +727,8 @@ static void pnv_chip_realize(DeviceState *dev, Error 
>> **errp)
>>  object_property_set_int(OBJECT(pnv_core),
>>  pcc->core_pir(chip, core_hwid),
>>  "pir", &error_fatal);
>> +object_property_add_const_link(OBJECT(pnv_core), "xics",
>> +   qdev_get_machine(), &error_fatal);
>>  object_property_set_bool(OBJECT(pnv_core), true, "realized",
>>   &error_fatal);
>>  object_unref(OBJECT(pnv_core));
>> diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
>> index d79d530b4881..8633afbff795 100644
>> --- a/hw/ppc/pnv_core.c
>> +++ b/hw/ppc/pnv_core.c
>> @@ -26,6 +26,128 @@
>>  #include "hw/ppc/pnv_core.h"
>>  #include "hw/ppc/pnv_xscom.h"
>>
>> +static uint64_t pnv_core_icp_read(void *opaque, hwaddr addr, unsigned width)
>> +{
>> +ICPState *icp = opaque;
>> +bool byte0 = (width == 1 && (addr & 0x3) == 0);
> 
> hard to read, an inline function should produce the same code at be more 
> easily reviewable.

true. I can improve that if a resend is needed.

Thanks,

C.

>> +uint64_t val = 0x;
>> +
>> +switch (addr & 0xffc) {
>> +case 0: /* poll */
>> +val = icp_ipoll(icp, NULL);
>> +if (byte0) {
>> +val >>= 24;
>> +} else if (width != 4) {
>> +goto bad_access;
>> +}
>> +break;
>> +case 4: /* xirr */
>> +if (byte0) {
>> +val = icp_ipoll(icp, NULL) >> 24;
>> +} else if (width == 4) {
>> +val = icp_accept(icp);
>> +} else {
>> +goto bad_access;
>> +}
>> +break;
>> +case 12:
>> +if (byte0) {
>> +val = icp->mfrr;
>> +} else {
>> +goto bad_access;
>> +}
>> +break;
>> +case 16:
>> +if (width == 4) {
>> +val = icp->links[0];
>> +} else {
>> +goto bad_access;
>> +}
>> +break;
>> +case 20:
>> +if (width == 4) {
>> +val = icp->links[1];
>> +} else {
>> +goto bad_access;
>> +}
>> +break;
>> +case 24:
>> +if (width == 4) {
>> +val = icp->links[2];
>> +} else {
>> +goto bad_access;
>> +}
>> +break;
>> +default:
>> +bad_access:
>> +qemu_log_mask(LOG_GUEST_ERROR, "XICS: Bad ICP access 0x%"
>> +  HWADDR_PRIx"/%d\n", addr, width);
>> +}
>> +
>> +return val;
>> +}
>> +
>> +static void pnv_core_icp_write(void *opaque, hwaddr addr, uint64_t val,
>> +  unsigned width)
>> +{
>> +ICPState *icp = opaque;
>> +bool byte0 =

Re: [Qemu-devel] [PATCH v2] Add inst_dirty_pages_rate in 'info migrate'

2017-03-08 Thread Daniel P. Berrange
On Wed, Mar 08, 2017 at 04:28:19PM +0800, Chao Fan wrote:
> Auto-converge aims to accelerate migration by slowing down the
> generation of dirty pages. But user doesn't know how to determine the
> throttle value, so, a new item "inst-dirty-pages-rate" in "info migrate"
> would be helpful for user's determination.

The "info migrate" command already reports a "dirty-pages-rate" value.

Maybe I'm mis-understanding what you're calculcating, this this proposal
looks the same, except reporting in bytes rather than page counts.

QEMU in fact already records the bytes count internally too in the
'dirty_pages_bytes' parameter which is calculated from taking
'dirty_pages_size * TARGET_PAGE_SIZE'.

So I wonder if we can just export the existing dirty-pages-bytes
value in info migrate, and avoid needing this new code here:

> +static void migration_inst_rate(void)
> +{
> +int64_t dirty_pages_time_now;
> +if (!dirty_pages_time_prev) {
> +dirty_pages_time_prev = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +}
> +dirty_pages_time_now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +if (dirty_pages_time_now > dirty_pages_time_prev + 1000) {
> +RAMBlock *block;
> +MigrationState *s = migrate_get_current();
> +int64_t inst_dirty_pages = 0;
> +int64_t i;
> +unsigned long *num;
> +unsigned long len = 0;
> +
> +rcu_read_lock();
> +DirtyMemoryBlocks *blocks = atomic_rcu_read(
> + &ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]);
> +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> +if (len == 0) {
> +len = block->offset;
> +}
> +len += block->used_length;
> +}
> +ram_addr_t idx = (len >> TARGET_PAGE_BITS) / DIRTY_MEMORY_BLOCK_SIZE;
> +if (((len >> TARGET_PAGE_BITS) % DIRTY_MEMORY_BLOCK_SIZE) != 0) {
> +idx++;
> +}
> +for (i = 0; i < idx; i++) {
> +num = blocks->blocks[i];
> +inst_dirty_pages += bitmap_weight(num, DIRTY_MEMORY_BLOCK_SIZE);
> +}
> +rcu_read_unlock();
> +
> +s->inst_dirty_pages_rate = inst_dirty_pages * TARGET_PAGE_SIZE *
> +1000 / (dirty_pages_time_now - dirty_pages_time_prev);
> +}
> +dirty_pages_time_prev = dirty_pages_time_now;
>  }

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



Re: [Qemu-devel] [PATCH] X86/HMP: Expose x86 model specific registers via human monitor

2017-03-08 Thread Eduardo Habkost
On Wed, Mar 08, 2017 at 11:59:11AM +, Dr. David Alan Gilbert wrote:
[...]
> > +MsrInfoList *qmp_msr_list(uint32_t msr_idx, Error **errp)
> > +{
> > +#ifdef TARGET_I386
> > +bool valid;
> > +X86CPU *cpu;
> > +CPUX86State *env;
> > +CPUState *cpu_state;
> > +MsrInfoList *head = NULL, *cur_item = NULL;
> > +
> > +CPU_FOREACH(cpu_state) {
> > +cpu = X86_CPU(cpu_state);
> > +env = &cpu->env;
> > +
> > +cpu_synchronize_state(cpu_state);
> > +
> > +MsrInfoList *info;
> > +info = g_malloc0(sizeof(*info));
> > +info->value = g_malloc0(sizeof(*info->value));
> > +
> > +info->value->cpu_idx = cpu_state->cpu_index;
> > +info->value->msr_idx = msr_idx;
> > +info->value->value = x86_cpu_rdmsr(env, msr_idx, &valid);
> > +
> > +if (!valid) {
> > +error_setg(errp, "Failed to retreive MSR 0x%08x on CPU %u",
> > +   msr_idx, cpu_state->cpu_index);
> > +return head;
> > +}
> > +
> > +/* XXX: waiting for the qapi to support GSList */
> > +if (!cur_item) {
> > +head = cur_item = info;
> > +} else {
> > +cur_item->next = info;
> > +cur_item = info;
> > +}
> > +}
> > +
> > +return head;
> > +#else
> > +error_setg(errp, "MSRs are not supported on this architecture");
> > +return NULL;
> > +#endif
> > +}
> 
> This should be abstracted some how so that we don't need
> x86 specifics in cpus.c; perhaps just an architecture call
> back on the CPU.

If it's only supported by x86, I would just move the
implementation to a x86-specific file, and add a stub for the
other architectures. See qmp_query_gic_capabilities() for an
example.

Also, the command should be added to
qmp_unregister_commands_hack() so we don't even report it as
available on other architectures.

-- 
Eduardo



Re: [Qemu-devel] [PATCH v3 00/11] MTTCG fix-ups for 2.9

2017-03-08 Thread Pranith Kumar
On Wed, Mar 8, 2017 at 2:39 AM, Alex Bennée  wrote:
>
> Pranith Kumar  writes:
>
>> Hi Alex,
>>
>> On Tue, Mar 7, 2017 at 10:50 AM, Alex Bennée  wrote:
>>> Hi,
>>>
>>> This is the latest iteration of fixes for problems introduced by
>>> MTTCG. Aside from the usual slew of addressing review comments and
>>> applying tags I've also pulled in Yongbok Kim's MIPS patch (to replace
>>> mine) and Paolo's VMEXIT fix for the problem reported by Alexander
>>> Boettcher.
>>>
>>> Unless anyone shouts I'll wrap these up into a pull request for Peter
>>> tomorrow.
>>
>>
>> Can you also pick up this patch? I am not sure who else will.
>>
>> http://patchwork.ozlabs.org/patch/733293/
>
> Does the problem only exhibit itself with --accel tcg,thread=multi? I
> was figuring it would go in with a larger x86 MTTCG enabling patch set
> post 2.9. If it can occur with the defaults now I'll happily add it
> to the pullreq I'm preparing.
>

It happens only with thread=multi, so I guess later on is OK.

Thanks,
-- 
Pranith



Re: [Qemu-devel] [RFC PATCH qemu] dtc: Allow compiling with old gcc

2017-03-08 Thread Peter Maydell
On 8 March 2017 at 13:49, Alexey Kardashevskiy  wrote:
> After 6e85fce0225f "dtc: Update requirement to v1.4.2" QEMU stopped
> compiling in CentOS7:
>
> In file included from /home/aik/p/qemu/dtc/libfdt/libfdt.h:54:0,
>  from /home/aik/p/qemu/device_tree.c:30:
> /home/aik/p/qemu/dtc/libfdt/libfdt_env.h:64:0: error: "__bitwise" redefined 
> [-Werror]
>  #define __bitwise
>  ^
> In file included from /usr/include/asm/ptrace.h:27:0,
>  from /usr/include/asm/sigcontext.h:11,
>  from /usr/include/bits/sigcontext.h:27,
>  from /usr/include/signal.h:340,
>  from /home/aik/p/qemu/include/qemu/osdep.h:86,
>  from /home/aik/p/qemu/device_tree.c:14:
> /usr/include/linux/types.h:21:0: note: this is the location of the previous 
> definition
>  #define __bitwise __bitwise__
>  ^
> cc1: all warnings being treated as errors
> make: *** [device_tree.o] Error 1
> make: *** Waiting for unfinished jobs
>
> The reason is that CentOS7 comes with libfdt 1.4.0 so QEMU tries using
> the internal one which does not compile as CentOS7 comes with gcc v4.8.5
> which reports warnings which it would not if the OS's libfdt was used
> (libfdt_env.h has not changed between 1.4.0 and 1.4.2).
>
> gcc 6.2.0 from Ubuntu v16.10 handles this fine.
>
> This replaces -I with -isystem to suppress the warning (which turns
> to an error because of -Werror).

Thanks for the bug report. I think it would be cleaner to fix this
by fixing the problem upstream in libfdt and then moving our
submodule forward to the fixed version. (libfdt should not be
defining __ prefixed symbols as these are reserved for the
system.)

thanks
-- PMM



Re: [Qemu-devel] [PATCH 2/2] vl: Print CPU help after we've registered the CPU accelerators

2017-03-08 Thread Peter Maydell
On 3 March 2017 at 15:58, Eduardo Habkost  wrote:
> On Tue, Jan 31, 2017 at 02:11:59PM +0100, Thomas Huth wrote:
>> When running with KVM on POWER, we register some CPU types during
>> the initialization function of the ppc64 KVM code (which unfortunately
>> also can not be done via a type_init() like it is done on x86).
>
> Can you elaborate why it can't be done via type_init()? If the
> QOM type hierarchy depends on any runtime data unavailable at
> type_init(), we should fix that.

On ARM we (currently) have the KVM-only 'host' CPU type be
added to the type hierarchy only at runtime in kvm_init(),
but we deal with the '-help' problem by having arm_cpu_list() do

#ifdef CONFIG_KVM
/* The 'host' CPU type is dynamically registered only if KVM is
 * enabled, so we have to special-case it here:
 */
(*cpu_fprintf)(f, "  host (only available in KVM mode)\n");
#endif

thanks
-- PMM



Re: [Qemu-devel] [PULL 00/27] Block layer fixes for 2.9.0-rc0

2017-03-08 Thread Peter Maydell
On 7 March 2017 at 16:40, Kevin Wolf  wrote:
> The following changes since commit ff79d5e939c38677a575e3493eb9b4d36eb21865:
>
>   Merge remote-tracking branch 'remotes/xtensa/tags/20170306-xtensa' into 
> staging (2017-03-07 09:57:14 +)
>
> are available in the git repository at:
>
>
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to b69f00dde490e88d55f5ee731545e690b2dc61f8:
>
>   commit: Don't use error_abort in commit_start (2017-03-07 14:53:29 +0100)
>
> 
> Block layer fixes for 2.9.0-rc0
>

Applied, thanks.

-- PMM



Re: [Qemu-devel] [PATCH 2/2] vl: Print CPU help after we've registered the CPU accelerators

2017-03-08 Thread Eduardo Habkost
On Wed, Mar 08, 2017 at 03:31:01PM +0100, Peter Maydell wrote:
> On 3 March 2017 at 15:58, Eduardo Habkost  wrote:
> > On Tue, Jan 31, 2017 at 02:11:59PM +0100, Thomas Huth wrote:
> >> When running with KVM on POWER, we register some CPU types during
> >> the initialization function of the ppc64 KVM code (which unfortunately
> >> also can not be done via a type_init() like it is done on x86).
> >
> > Can you elaborate why it can't be done via type_init()? If the
> > QOM type hierarchy depends on any runtime data unavailable at
> > type_init(), we should fix that.
> 
> On ARM we (currently) have the KVM-only 'host' CPU type be
> added to the type hierarchy only at runtime in kvm_init(),
> but we deal with the '-help' problem by having arm_cpu_list() do
> 
> #ifdef CONFIG_KVM
> /* The 'host' CPU type is dynamically registered only if KVM is
>  * enabled, so we have to special-case it here:
>  */
> (*cpu_fprintf)(f, "  host (only available in KVM mode)\n");
> #endif
> 

We already have similar code at ppc_cpu_list():

#ifdef CONFIG_KVM
cpu_fprintf(f, "\n");
cpu_fprintf(f, "PowerPC %-16s\n", "host");
#endif

If I understood it correctly, the current problem is just
inaccurate alias information being printed because the alias
table is modified by the accelerator.

My current suggestion is to avoid printing anything that depends
on the current machine/accelerator at the help output.

-- 
Eduardo



Re: [Qemu-devel] [Qemu-trivial] [PATCH for-2.9?] tests: Ignore more test executables

2017-03-08 Thread Eric Blake
On 03/08/2017 04:00 AM, Laurent Vivier wrote:
> and tests/test-arm-mptimer?

Introduced in commit 882fac37. Looks like my usual subset build focusing
on x86 doesn't see it, but indeed an all-targets build shows it. I'll
spin a v2.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH for-2.9? v2] tests: Ignore more test executables

2017-03-08 Thread Eric Blake
Ignore test executables when building in-tree:
test-arm-mptimer introduced in commit 882fac3
test-crypto-hmac introduced in commit 4fd460b
test-aio-multithread introduced in commit 0c330a7

Signed-off-by: Eric Blake 
---

v2: pick up another built binary [Laurent]

Doesn't affect the built binaries, but does make it harder to
accidentally commit an unintended binary when doing 'git add .'.
Therefore, it's up to the maintainer if this is 2.9 or 2.10
material.  (test-arm-mptimer has been around since the 2.8 release)

 tests/.gitignore | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tests/.gitignore b/tests/.gitignore
index dc37519..73598b6 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -11,6 +11,8 @@ check-qom-proplist
 qht-bench
 rcutorture
 test-aio
+test-aio-multithread
+test-arm-mptimer
 test-base64
 test-bitops
 test-bitcnt
@@ -24,6 +26,7 @@ test-crypto-afsplit
 test-crypto-block
 test-crypto-cipher
 test-crypto-hash
+test-crypto-hmac
 test-crypto-ivgen
 test-crypto-pbkdf
 test-crypto-secret
-- 
2.9.3




Re: [Qemu-devel] [Qemu-trivial] [PATCH for-2.9? v2] tests: Ignore more test executables

2017-03-08 Thread Laurent Vivier
On 08/03/2017 16:15, Eric Blake wrote:
> Ignore test executables when building in-tree:
> test-arm-mptimer introduced in commit 882fac3
> test-crypto-hmac introduced in commit 4fd460b
> test-aio-multithread introduced in commit 0c330a7
> 
> Signed-off-by: Eric Blake 

Reviewed-by: Laurent Vivier 

> ---
> 
> v2: pick up another built binary [Laurent]
> 
> Doesn't affect the built binaries, but does make it harder to
> accidentally commit an unintended binary when doing 'git add .'.
> Therefore, it's up to the maintainer if this is 2.9 or 2.10
> material.  (test-arm-mptimer has been around since the 2.8 release)
> 
>  tests/.gitignore | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/tests/.gitignore b/tests/.gitignore
> index dc37519..73598b6 100644
> --- a/tests/.gitignore
> +++ b/tests/.gitignore
> @@ -11,6 +11,8 @@ check-qom-proplist
>  qht-bench
>  rcutorture
>  test-aio
> +test-aio-multithread
> +test-arm-mptimer
>  test-base64
>  test-bitops
>  test-bitcnt
> @@ -24,6 +26,7 @@ test-crypto-afsplit
>  test-crypto-block
>  test-crypto-cipher
>  test-crypto-hash
> +test-crypto-hmac
>  test-crypto-ivgen
>  test-crypto-pbkdf
>  test-crypto-secret
> 




Re: [Qemu-devel] [PATCH] X86/HMP: Expose x86 model specific registers via human monitor

2017-03-08 Thread Julian Kirsch
On 08.03.2017 12:59, Dr. David Alan Gilbert wrote:
> * Julian Kirsch (g...@kirschju.re) wrote:
> 
> I'll leave those who know more about gdb to answer whether that's the best way
> of doing it; but I can see them being useful to those just trying to debug
> stuff from the monitor.
> 

Just want to make a side note here that the qemu monitor is fully accessible
from gdb using the "monitor" command. Hence adding the commands to hmp makes
them available for the monitor and gdb clients at the same time.

> 
> I think the patch needs to be split up; you should at least have:
>   a) The big part that moves the helpers (if that's the right thing to do)
>   b) The qmp code
>   c) The hmp code
> 
> I also don't see why you need to move the helpers.

Yes, splitting the patch up into three smaller ones sounds like a good idea.

The helpers are moved to helpers.c because none of the functions in
misc_helpers.c seems to be contained in any header file (just referenced by
tcg). I didn't want to be an exception to this rule. Moving the functionality to
cpu_x86_[rd|wr]msr and including their prototype in target/i386/cpu.h also makes
them available to the qmp code in cpus.c in a way I considered elegant enough.
Besides, helper_cpuid/cpu_x86_cpuid seem to follow the same structure. Correct
me if I'm wrong on this one.


>> There are several things I would like to pooint out about this patch:
>>   - The "msr-list" command currently displays MSR values for all virtual 
>> cpus.
>> This is somewhat inconsistent with "info registers" just displaying the
>> value of the current default cpu. One might think about just displaying 
>> the
>> current value and offer access to other CPU's MSRs by means of switching
>> between CPUs using the "cpu" monitor command.
> 
> Yes, it's probably best to make it just the current CPU to be consistent.
> 

Will take care of this as well. This will causes me to rename "msr-list" to
"msr-get".

>>   - The new version of x86_cpu_rdmsr exposes MSRs that are arguably of
>> questionable help for any human / tool using the monitor. However I 
>> merely
>> felt a deep urge to reflect the full MSR list from kvm.c when writing the
>> code.
> 
> No point in guessing which ones the human will need; may as well give them
> everything so they can debug bugs that are obscure.
> 

Alright, then let's keep the shadiness.

>>   - While the need for msr-list is evident (see commit msg), msr-set could be
>> used in more obscure cases. For instance, one might offer a way to access
>> and configure performance counter MSRs of the guest via the hmp. If this
>> feels too much like an inacceptable hack, I'll happily drop the msr-set
>> part.
> 
> It feels reasonable to have it for debugging.
> 

Fine with me.

> (Heck, aren't those big switch statements depressing, I'm sure there
> must be a way to turn a chunk of them into a table that could be shared 
> between
> the helpers and even maybe the kvm code; anyway, not the subject of this 
> patch).
> 

(I fully agree.)

> Dave
> 

Best,
Julian



Re: [Qemu-devel] [PATCH 2/2] vl: Print CPU help after we've registered the CPU accelerators

2017-03-08 Thread Thomas Huth
On 08.03.2017 15:50, Eduardo Habkost wrote:
> On Wed, Mar 08, 2017 at 03:31:01PM +0100, Peter Maydell wrote:
>> On 3 March 2017 at 15:58, Eduardo Habkost  wrote:
>>> On Tue, Jan 31, 2017 at 02:11:59PM +0100, Thomas Huth wrote:
 When running with KVM on POWER, we register some CPU types during
 the initialization function of the ppc64 KVM code (which unfortunately
 also can not be done via a type_init() like it is done on x86).
>>>
>>> Can you elaborate why it can't be done via type_init()? If the
>>> QOM type hierarchy depends on any runtime data unavailable at
>>> type_init(), we should fix that.
>>
>> On ARM we (currently) have the KVM-only 'host' CPU type be
>> added to the type hierarchy only at runtime in kvm_init(),
>> but we deal with the '-help' problem by having arm_cpu_list() do
>>
>> #ifdef CONFIG_KVM
>> /* The 'host' CPU type is dynamically registered only if KVM is
>>  * enabled, so we have to special-case it here:
>>  */
>> (*cpu_fprintf)(f, "  host (only available in KVM mode)\n");
>> #endif
>>
> 
> We already have similar code at ppc_cpu_list():
> 
> #ifdef CONFIG_KVM
> cpu_fprintf(f, "\n");
> cpu_fprintf(f, "PowerPC %-16s\n", "host");
> #endif
> 
> If I understood it correctly, the current problem is just
> inaccurate alias information being printed because the alias
> table is modified by the accelerator.

Yes.

> My current suggestion is to avoid printing anything that depends
> on the current machine/accelerator at the help output.

I'll have a look at that...

 Thomas




Re: [Qemu-devel] [PATCH for-2.10 1/3] block: Add errp to b{lk, drv}_truncate()

2017-03-08 Thread Max Reitz
On 07.03.2017 11:47, Kevin Wolf wrote:
> Am 06.03.2017 um 20:54 hat Max Reitz geschrieben:
>> For one thing, this allows us to drop the error message generation from
>> qemu-img.c and blockdev.c and instead have it unified in
>> bdrv_truncate().
>>
>> Signed-off-by: Max Reitz 
> 
>>  block/commit.c |  5 +++--
>>  block/mirror.c |  2 +-
> 
> These still pass NULL after the series. Fixing it would require to add a
> way to complete jobs with an Error object, but maybe we should do this
> sooner or later anyway. Not necessarily part of this series, though.
> 
>>  block/vhdx.c   |  6 +++---
>>  block/vpc.c|  2 +-
> 
> vpc can easily be fixed to pass the actual errp from vpc_create() to the
> blk_truncate() call. In vhdx.c, the blk_truncate() call is a bit more
> deeply nested, but it doesn't seem completely unreasonable there either.

I'll look into it, thanks.

>> --- a/qemu-io-cmds.c
>> +++ b/qemu-io-cmds.c
>> @@ -1569,7 +1569,7 @@ static int truncate_f(BlockBackend *blk, int argc, 
>> char **argv)
>>  return 0;
>>  }
>>  
>> -ret = blk_truncate(blk, offset);
>> +ret = blk_truncate(blk, offset, NULL);
>>  if (ret < 0) {
>>  printf("truncate: %s\n", strerror(-ret));
>>  return 0;
> 
> Come on, using NULL immediately before a printf()? Doing the right thing
> here is trivial.

Yeah, but it's also qemu-io, so I somehow felt both "Better not touch it
because it may break tests" and "Nobody really cares about a nice
message here anyway".

Will fix, since apparently at least you do care. ;-)

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v2] qmp-shell: add persistent command history

2017-03-08 Thread Nir Soffer
On Wed, Mar 8, 2017 at 8:29 AM, Markus Armbruster  wrote:
> John Snow  writes:
>
>> On 03/07/2017 03:16 AM, Markus Armbruster wrote:
>>> John Snow  writes:
>>>
 On 03/06/2017 03:18 AM, Markus Armbruster wrote:
> Nir Soffer  writes:
>
>> On Fri, Mar 3, 2017 at 9:29 PM, John Snow  wrote:
>>>
>>>
>>> On 03/03/2017 02:26 PM, Nir Soffer wrote:
 On Fri, Mar 3, 2017 at 8:54 PM, John Snow  wrote:
> Use the existing readline history function we are utilizing
> to provide persistent command history across instances of qmp-shell.
>
> This assists entering debug commands across sessions that may be
> interrupted by QEMU sessions terminating, where the qmp-shell has
> to be relaunched.
>
> Signed-off-by: John Snow 
> ---
>
> v2: Adjusted the errors to whine about non-ENOENT errors, but still
> intercept all errors as non-fatal.
> Save history atexit() to match bash standard behavior
>
>  scripts/qmp/qmp-shell | 19 +++
>  1 file changed, 19 insertions(+)
>
> diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell
> index 0373b24..55a8285 100755
> --- a/scripts/qmp/qmp-shell
> +++ b/scripts/qmp/qmp-shell
> @@ -70,6 +70,9 @@ import json
>  import ast
>  import readline
>  import sys
> +import os
> +import errno
> +import atexit
>
>  class QMPCompleter(list):
>  def complete(self, text, state):
> @@ -109,6 +112,7 @@ class QMPShell(qmp.QEMUMonitorProtocol):
>  self._pretty = pretty
>  self._transmode = False
>  self._actions = list()
> +self._histfile = os.path.join(os.path.expanduser('~'), 
> '.qmp_history')
>
> I selfishly object to this filename, because I'm using it with
>
> $ socat UNIX:/work/armbru/images/test-qmp 
> READLINE,history=$HOME/.qmp_history,prompt='QMP> '
>
> Just kidding.  But seriously, shouldn't this be named after the
> *application* (qmp-shell) rather than the protocol (qmp)?
>

 Seeing as the history itself is the qmp-shell syntax, you are correct.

 (Hey, it would be interesting to store the generated QMP into the
 qmp_history, though...!)
>>>
>>> Hah!  Saving it would be easy enough, but reloading it...  okay, call it
>>> a "backup" and declare victory when saving works.
>>>
>
>  def __get_address(self, arg):
>  """
> @@ -137,6 +141,21 @@ class QMPShell(qmp.QEMUMonitorProtocol):
>  # XXX: default delimiters conflict with some command names 
> (eg. query-),
>  # clearing everything as it doesn't seem to matter
>  readline.set_completer_delims('')
> +try:
> +readline.read_history_file(self._histfile)
> +except Exception as e:
> +if isinstance(e, IOError) and e.errno == errno.ENOENT:
> +# File not found. No problem.
> +pass
> +else:
> +print "Failed to read history '%s'; %s" % 
> (self._histfile, e)

 I would handle only IOError, since any other error means a bug in this 
 code
 or in the underlying readline library, and the best way to handle this 
 is to
 let it fail loudly.

>>>
>>> Disagree. No reason to stop the shell from working because a QOL feature
>>> didn't initialize correctly.
>>>
>>> The warning will be enough to solicit reports and fixes if necessary.
>>
>> I agree, the current solution is good tradeoff.
>
> For what it's worth, bash seems to silently ignore a history file it
> can't read.  Tested by running "HISTFILE=xxx bash", then chmod 0 xxx, da
> capo.
>

 Right, done by example.

>> One thing missing, is a call to readline.set_history_length, without
>> it the history
>> will grow without limit, see:
>> https://docs.python.org/2/library/readline.html#readline.set_history_length
>
> Should this be addressed for 2.9?
>

 You can add a limit if you want to; I don't have suggestions for which
 completely arbitrary limit makes sense, so I left it blank intentionally.
>>>
>>> For what it's worth, bash defaults HISTSIZE to 500.
>>>
>>> GNU Readline lets users configure it in ~/.inputrc.  Conditional
>>> configuration is possible, i.e. something like
>>>
>>> $if Qmp-shell
>>> set history-size 5000
>>> $endif
>>>
>>> should work, provided qmp-shell sets rl_readline_name as it should.
>>>
>>
>> Spoke too soon. I don't see a way to control this in python's readline
>> library... I'm not very familiar with readline

Re: [Qemu-devel] [PATCH] X86/HMP: Expose x86 model specific registers via human monitor

2017-03-08 Thread Julian Kirsch
On 08.03.2017 14:57, Eduardo Habkost wrote:

>>
>> This should be abstracted some how so that we don't need
>> x86 specifics in cpus.c; perhaps just an architecture call
>> back on the CPU.
> 
> If it's only supported by x86, I would just move the
> implementation to a x86-specific file, and add a stub for the
> other architectures. See qmp_query_gic_capabilities() for an
> example.
> 
> Also, the command should be added to
> qmp_unregister_commands_hack() so we don't even report it as
> available on other architectures.
> 

Awesome, thanks for your comments, I'll move the qmp commands to
target/i386/monitor.c and unregister them for architectures other than I386. Do
I have to explicitly take care of unregistering the hmp commands as well?

-Julian



Re: [Qemu-devel] [PATCH] qdev: Constify data pointed by few arguments and local variables

2017-03-08 Thread Krzysztof Kozlowski
On Mon, Mar 06, 2017 at 09:09:48AM -0300, Eduardo Habkost wrote:
> Hi,
> 
> 
> On Sun, Mar 05, 2017 at 11:46:33PM +0200, Krzysztof Kozlowski wrote:
> > In few places the function arguments and local variables are not
> > modifying data passed through pointers so this can be made const for
> > code safeness.
> > 
> > Signed-off-by: Krzysztof Kozlowski 
> 
> I believe most changes below are misleading to users of the API,
> and make the code less safe. Most of the pointers passed as
> argument to those functions will be stored at non-const pointer
> fields inside the objects.
> 
> > ---
> >  hw/core/qdev-properties-system.c |  6 +++---
> >  hw/core/qdev-properties.c|  7 ---
> >  include/hw/qdev-properties.h | 11 +++
> >  3 files changed, 14 insertions(+), 10 deletions(-)
> > 
> > diff --git a/hw/core/qdev-properties-system.c 
> > b/hw/core/qdev-properties-system.c
> > index c34be1c1bace..abbf3ef754d8 100644
> > --- a/hw/core/qdev-properties-system.c
> > +++ b/hw/core/qdev-properties-system.c
> > @@ -405,7 +405,7 @@ void qdev_prop_set_drive(DeviceState *dev, const char 
> > *name,
> >  if (value) {
> >  ref = blk_name(value);
> >  if (!*ref) {
> > -BlockDriverState *bs = blk_bs(value);
> > +const BlockDriverState *bs = blk_bs(value);
> >  if (bs) {
> >  ref = bdrv_get_node_name(bs);
> >  }
> 
> This part looks safe, but still misleading: the
> object_property_set_str() call will end up changing a non-const
> pointer field in the object. I'm not sure what's the benefit of
> this change.

I might be missing something... but I am touching only the 'bs' and it
is passed to bdrv_get_node_name() also as const. The
bdrv_get_node_name() just accepts pointer to const.

I am not sure why are you referring to object_property_set_str(). The
value returned by bdrv_get_node_name() is pointer to const.
object_property_set_str() also takes pointer to const.

So entire path, starting from 'bs' uses pointer to const... thus I
find misleading that 'bs' is not pointing to const data. It should.

> 
> > @@ -416,7 +416,7 @@ void qdev_prop_set_drive(DeviceState *dev, const char 
> > *name,
> >  }
> >  
> >  void qdev_prop_set_chr(DeviceState *dev, const char *name,
> > -   Chardev *value)
> > +   const Chardev *value)
> 
> This wrapper will end up storing 'value' in a non-const pointer
> in the object (e.g. PL011State::chr). Declaring 'value' as const
> is misleading.

The object_property_set_str() takes data as pointer to const. If data
ends up as being non-const, then this is the mistake -
object_property_set_str().

> 
> 
> >  {
> >  assert(!value || value->label);
> >  object_property_set_str(OBJECT(dev),
> > @@ -424,7 +424,7 @@ void qdev_prop_set_chr(DeviceState *dev, const char 
> > *name,
> >  }
> >  
> >  void qdev_prop_set_netdev(DeviceState *dev, const char *name,
> > -  NetClientState *value)
> > +  const NetClientState *value)
> 
> Same case here: the wrapper will store 'value' in a non-const
> NetClientState pointer in the object.
> 
> >  {
> >  assert(!value || value->name);
> >  object_property_set_str(OBJECT(dev),
> > diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
> > index 6ab4265eb478..34ec10f0caac 100644
> > --- a/hw/core/qdev-properties.c
> > +++ b/hw/core/qdev-properties.c
> > @@ -1010,7 +1010,8 @@ void qdev_prop_set_string(DeviceState *dev, const 
> > char *name, const char *value)
> >  object_property_set_str(OBJECT(dev), value, name, &error_abort);
> >  }
> >  
> > -void qdev_prop_set_macaddr(DeviceState *dev, const char *name, uint8_t 
> > *value)
> > +void qdev_prop_set_macaddr(DeviceState *dev, const char *name,
> > +   const uint8_t *value)
> 
> This one makes sense to me.
> 
> >  {
> >  char str[2 * 6 + 5 + 1];
> >  snprintf(str, sizeof(str), "%02x:%02x:%02x:%02x:%02x:%02x",
> > @@ -1028,10 +1029,10 @@ void qdev_prop_set_enum(DeviceState *dev, const 
> > char *name, int value)
> >  name, &error_abort);
> >  }
> >  
> > -void qdev_prop_set_ptr(DeviceState *dev, const char *name, void *value)
> > +void qdev_prop_set_ptr(DeviceState *dev, const char *name, const void 
> > *value)
> >  {
> >  Property *prop;
> > -void **ptr;
> > +const void **ptr;
> 
> This one is also misleading: the field pointed by
> qdev_prop_find() is normally a non-const pointer.

Understood, I'll drop it.

Best regards,
Krzysztof




Re: [Qemu-devel] [PATCH V3] virtio: do not require IOMMU to be created in advance

2017-03-08 Thread Igor Mammedov
On Tue, 7 Mar 2017 14:47:30 +0200
Marcel Apfelbaum  wrote:

> On 03/07/2017 11:09 AM, Jason Wang wrote:
> > After commit 96a8821d2141 ("virtio: unbreak virtio-pci with IOMMU
> > after caching ring translations"), IOMMU was required to be created in
> > advance. This is because we can only get the correct dma_as after pci
> > IOMMU (e.g intel_iommu) was initialized. This is suboptimal and
> > inconvenient for user. This patch releases this by:
> >
> > - introduce a bus_master_ready method for PCIDeviceClass and trigger
> >   this during pci_init_bus_master
> > - implement virtio-pci method and 1) reset the dma_as 2) re-register
> >   the memory listener to the new dma_as
> >  
> 
> Hi Jason,
> 
> > Cc: Paolo Bonzini 
> > Signed-off-by: Jason Wang 
> > ---
> > Changes from V2:
> > - delay pci_init_bus_master() after pci device is realized to make
> >   bus_master_ready a more generic method
> > ---
> >  hw/pci/pci.c   | 11 ---
> >  hw/virtio/virtio-pci.c |  9 +
> >  hw/virtio/virtio.c | 19 +++
> >  include/hw/pci/pci.h   |  1 +
> >  include/hw/virtio/virtio.h |  1 +
> >  5 files changed, 38 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > index 273f1e4..22e6ad9 100644
> > --- a/hw/pci/pci.c
> > +++ b/hw/pci/pci.c
> > @@ -83,6 +83,7 @@ static const VMStateDescription vmstate_pcibus = {
> >  static void pci_init_bus_master(PCIDevice *pci_dev)
> >  {
> >  AddressSpace *dma_as = pci_device_iommu_address_space(pci_dev);
> > +PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(pci_dev);
> >
> >  memory_region_init_alias(&pci_dev->bus_master_enable_region,
> >   OBJECT(pci_dev), "bus master",
> > @@ -90,6 +91,9 @@ static void pci_init_bus_master(PCIDevice *pci_dev)
> >  memory_region_set_enabled(&pci_dev->bus_master_enable_region, false);
> >  address_space_init(&pci_dev->bus_master_as,
> > &pci_dev->bus_master_enable_region, pci_dev->name);
> > +if (pc->bus_master_ready) {
> > +pc->bus_master_ready(pci_dev);
> > +}
> >  }
> >
> >  static void pcibus_machine_done(Notifier *notifier, void *data)
> > @@ -995,9 +999,6 @@ static PCIDevice *do_pci_register_device(PCIDevice 
> > *pci_dev, PCIBus *bus,
> >  pci_dev->devfn = devfn;
> >  pci_dev->requester_id_cache = pci_req_id_cache_get(pci_dev);
> >
> > -if (qdev_hotplug) {
> > -pci_init_bus_master(pci_dev);
> > -}
> >  pstrcpy(pci_dev->name, sizeof(pci_dev->name), name);
> >  pci_dev->irq_state = 0;
> >  pci_config_alloc(pci_dev);
> > @@ -1995,6 +1996,10 @@ static void pci_qdev_realize(DeviceState *qdev, 
> > Error **errp)
> >  }
> >  }
> >
> > +if (qdev_hotplug) {
> > +pci_init_bus_master(pci_dev);
> > +}
> > +  
> 
> How does it help if we move qdev_hotplug check outside 
> "do_pci_register_device"?
> I suppose you want the code to run after the "realize" function?
> If so, what happens if a "realize" function of another device needs the 
> bus_master_as
> (at hotplug time)?

Conceptually,
I'm not sure that inherited device class realize
should be aware of uninitialized bus_master,
which belongs to PCI device, nor should it initialize
it by calling pci_init_bus_master() explicitly,
it's parent's class job (PCIDevice).

more close to current code:
if xen-pci-passthrough were hotplugged then it looks like
this hunk could break it:
hw/xen/xen_pt.c:
 memory_listener_register(&s->memory_listener, &s->dev.bus_master_as);
would happen with uninitialized bus_master_as
as realize is called before pci_init_bus_master();

So the same question as Marcel's but other way around,
why this hunk "has to" be moved.


> 
> Thanks,
> Marcel
> 
> >  /* rom loading */
> >  is_default_rom = false;
> >  if (pci_dev->romfile == NULL && pc->romfile != NULL) {
> > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> > index b76f3f6..21cbda2 100644
> > --- a/hw/virtio/virtio-pci.c
> > +++ b/hw/virtio/virtio-pci.c
> > @@ -1845,6 +1845,14 @@ static void virtio_pci_exit(PCIDevice *pci_dev)
> >  address_space_destroy(&proxy->modern_as);
> >  }
> >
> > +static void virtio_pci_bus_master_ready(PCIDevice *pci_dev)
> > +{
> > +VirtIOPCIProxy *proxy = VIRTIO_PCI(pci_dev);
> > +VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
> > +
> > +virtio_device_reset_dma_as(vdev);
> > +}
> > +
> >  static void virtio_pci_reset(DeviceState *qdev)
> >  {
> >  VirtIOPCIProxy *proxy = VIRTIO_PCI(qdev);
> > @@ -1904,6 +1912,7 @@ static void virtio_pci_class_init(ObjectClass *klass, 
> > void *data)
> >  dc->props = virtio_pci_properties;
> >  k->realize = virtio_pci_realize;
> >  k->exit = virtio_pci_exit;
> > +k->bus_master_ready = virtio_pci_bus_master_ready;
> >  k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
> >  k->revision = VIRTIO_PCI_ABI_VERSION;
> >  k->class_id = PCI_CLASS_OTHERS;
> > diff --git a/hw/virt

Re: [Qemu-devel] [PATCH for-2.10 1/3] block: Add errp to b{lk, drv}_truncate()

2017-03-08 Thread Max Reitz
On 07.03.2017 11:47, Kevin Wolf wrote:
> Am 06.03.2017 um 20:54 hat Max Reitz geschrieben:
>> For one thing, this allows us to drop the error message generation from
>> qemu-img.c and blockdev.c and instead have it unified in
>> bdrv_truncate().
>>
>> Signed-off-by: Max Reitz 
> 
>>  block/commit.c |  5 +++--
>>  block/mirror.c |  2 +-
> 
> These still pass NULL after the series. Fixing it would require to add a
> way to complete jobs with an Error object, but maybe we should do this
> sooner or later anyway. Not necessarily part of this series, though.
> 
>>  block/vhdx.c   |  6 +++---
>>  block/vpc.c|  2 +-
> 
> vpc can easily be fixed to pass the actual errp from vpc_create() to the
> blk_truncate() call. In vhdx.c, the blk_truncate() call is a bit more
> deeply nested, but it doesn't seem completely unreasonable there either.

vhdx is insofar not completely unreasonable because it already sometimes
generates Error objects and sometimes doesn't; so if I were to expand on
that inconsistent behavior, I wouldn't make matters worse. But still I
don't feel quite comfortable touching that.

Let's see, maybe I prepend a patch that makes vhdx_create() always
generate Error objects.

Max



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [Bug 1671173] [NEW] OS started to crash with a message: "Trying to execute code outside RAM or ROM"

2017-03-08 Thread Konstantin Tcholokachvili
Public bug reported:

There is a project (https://github.com/narke/colorForth ) wich always
worked with qemu up to version 2.5.1.1 but doesn't works from version
2.6 onwards. It continues to work with bochs.

Downlaod: git clone https://github.com/narke/colorForth.git
Build: make
Test: qemu-system-i386 -drive format=raw,file=cf2012.img,index=0,if=floppy


System information: Ubuntu LTS 16.04 x86-64
Affected qemu versions: 2.6 to present (2.8)


I got the message:


WARNING: Image format was not specified for 'cf2012.img' and probing guessed 
raw.
 Automatically detecting the format is dangerous for raw images, write 
operations on block 0 will be restricted.
 Specify the 'raw' format explicitly to remove the restrictions.
qemu-system-i386: Trying to execute code outside RAM or ROM at 0x8998c426
This usually means one of the following happened:

(1) You told QEMU to execute a kernel for the wrong machine type, and it 
crashed on startup (eg trying to run a raspberry pi kernel on a versatilepb 
QEMU machine)
(2) You didn't give QEMU a kernel or BIOS filename at all, and QEMU executed a 
ROM full of no-op instructions until it fell off the end
(3) Your guest kernel has a bug and crashed by jumping off into nowhere

This is almost always one of the first two, so check your command line and that 
you are using the right type of kernel for this machine.
If you think option (3) is likely then you can try debugging your guest with 
the -d debug options; in particular -d guest_errors will cause the log to 
include a dump of the guest register state at this point.

Execution cannot continue; stopping here.


Thank you in advance.

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1671173

Title:
  OS started to crash with a message: "Trying to execute code outside
  RAM or ROM"

Status in QEMU:
  New

Bug description:
  There is a project (https://github.com/narke/colorForth ) wich always
  worked with qemu up to version 2.5.1.1 but doesn't works from version
  2.6 onwards. It continues to work with bochs.

  Downlaod: git clone https://github.com/narke/colorForth.git
  Build: make
  Test: qemu-system-i386 -drive format=raw,file=cf2012.img,index=0,if=floppy

  
  System information: Ubuntu LTS 16.04 x86-64
  Affected qemu versions: 2.6 to present (2.8)

  
  I got the message:

  
  WARNING: Image format was not specified for 'cf2012.img' and probing guessed 
raw.
   Automatically detecting the format is dangerous for raw images, 
write operations on block 0 will be restricted.
   Specify the 'raw' format explicitly to remove the restrictions.
  qemu-system-i386: Trying to execute code outside RAM or ROM at 0x8998c426
  This usually means one of the following happened:

  (1) You told QEMU to execute a kernel for the wrong machine type, and it 
crashed on startup (eg trying to run a raspberry pi kernel on a versatilepb 
QEMU machine)
  (2) You didn't give QEMU a kernel or BIOS filename at all, and QEMU executed 
a ROM full of no-op instructions until it fell off the end
  (3) Your guest kernel has a bug and crashed by jumping off into nowhere

  This is almost always one of the first two, so check your command line and 
that you are using the right type of kernel for this machine.
  If you think option (3) is likely then you can try debugging your guest with 
the -d debug options; in particular -d guest_errors will cause the log to 
include a dump of the guest register state at this point.

  Execution cannot continue; stopping here.

  
  Thank you in advance.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1671173/+subscriptions



Re: [Qemu-devel] [PATCH 1/2] RAMBlocks: qemu_ram_is_shared

2017-03-08 Thread Halil Pasic


On 03/07/2017 07:36 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> Provide a helper to say whether a RAMBlock was created as a
> shared mapping.
> 
> Signed-off-by: Dr. David Alan Gilbert 
> ---
>  exec.c| 5 +
>  include/exec/cpu-common.h | 1 +
>  2 files changed, 6 insertions(+)
> 
> diff --git a/exec.c b/exec.c
> index aabb035..c2cc455 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1560,6 +1560,11 @@ const char *qemu_ram_get_idstr(RAMBlock *rb)
>  return rb->idstr;
>  }
> 
> +bool qemu_ram_is_shared(RAMBlock *rb)
> +{
> +return rb->flags & RAM_SHARED;

Hm, it appears to me RAM_SHARED is defined conditionally, that's within 
#if !defined(CONFIG_USER_ONLY)
#endif

The function however seems to be defined and using RAM_SHARED
unconditionally. Either one or the other seems inappropriate to me.

Regards,
Halil




Re: [Qemu-devel] [PATCH 1/2] RAMBlocks: qemu_ram_is_shared

2017-03-08 Thread Dr. David Alan Gilbert
* Halil Pasic (pa...@linux.vnet.ibm.com) wrote:
> 
> 
> On 03/07/2017 07:36 PM, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" 
> > 
> > Provide a helper to say whether a RAMBlock was created as a
> > shared mapping.
> > 
> > Signed-off-by: Dr. David Alan Gilbert 
> > ---
> >  exec.c| 5 +
> >  include/exec/cpu-common.h | 1 +
> >  2 files changed, 6 insertions(+)
> > 
> > diff --git a/exec.c b/exec.c
> > index aabb035..c2cc455 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -1560,6 +1560,11 @@ const char *qemu_ram_get_idstr(RAMBlock *rb)
> >  return rb->idstr;
> >  }
> > 
> > +bool qemu_ram_is_shared(RAMBlock *rb)
> > +{
> > +return rb->flags & RAM_SHARED;
> 
> Hm, it appears to me RAM_SHARED is defined conditionally, that's within 
> #if !defined(CONFIG_USER_ONLY)
> #endif
> 
> The function however seems to be defined and using RAM_SHARED
> unconditionally. Either one or the other seems inappropriate to me.

Oops! Thanks for spotting that; I'll go and fix that.

Dave

> Regards,
> Halil
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH v2 0/9] SMMUv3 Emulation support

2017-03-08 Thread Auger Eric
Hi,
On 22/08/2016 18:17, Prem Mallappa wrote:
> v1 -> v2:
>   - Adopted review comments from Eric Auger
>   - Make SMMU_DPRINTF to internally call qemu_log
>   (since translation requests are too many, we need control
>on the type of log we want)
>   - SMMUTransCfg modified to suite simplicity
>   - Change RegInfo to uint64 register array
>   - Code cleanup
>   - Test cleanups
>   - Reshuffled patches
> 
> RFC -> v1:
>   - As per SMMUv3 spec 16.0 (only is_ste_consistant() is noticeable)
>   - Reworked register access/update logic
>   - Factored out translation code for
>   - single point bug fix
>   - sharing/removal in future
>   - (optional) Unit tests added, with PCI test device
>   - S1 with 4k/64k, S1+S2 with 4k/64k
>   - (S1 or S2) only can be verified by Linux 4.7 driver
>   - (optional) Priliminary ACPI support
> 
> RFC:
>   - Implements SMMUv3 spec 11.0
>   - Supported for PCIe devices, 
>   - Command Queue and Event Queue supported
>   - LPAE only, S1 is supported and Tested, S2 not tested
>   - BE mode Translation not supported
>   - IRQ support (legacy, no MSI)
>   - Tested with DPDK and e1000 
> 
> Patch 1: Add new log type for IOMMU transactions
> 
> Patch 2: Adds support in virt.c to create both SMMUv3 device and dts entries
> 
> Patch 2: Adds SMMUv3 model to QEMU
>   Multiple files, big ones, translate functionality is split across to
>   accomodate SMMUv2 model, and to remove when common translation feature
>   (if) becomes available.
> 
> Patch 3: Adds SMMU build support
> 
> Patch 4: Some devicetree function to add support for SMMU's multiple interrupt
>assignment with names
> 
> << optional patches >>
> Optional patches are posted for completeness or for those who wants to test.
> 
> Patch 5: A simple PCI device which does DMA from 'src' to 'dst' given
>src_addr, dst_addr and size, and is used by unit test, uses
>pci_dma_read and pci_dma_write in a crude way but serves the purpose.
> 
> Patch 6: Current libqos PCI helpers are x86 only, this addes a generic 
> interface
> 
> Patch 7: Unit tests for SMMU, 
>   - initializes SMMU device 
>   - initializes Test device
>   - allocates page tables 1:1 mapping va == pa
>   - allocates STE/CD accordingly for S1, S2, S1+S2
>   - initiates DMA via PCI test device
>   - verifies transfered data
> 
> Patch 8: Added ACPI IORT tables, was needed for internal project purpose, but 
>posting here for anyone looking for testing ACPI on ARM platforms.
>(P.S: Linux side IORT patches are WIP)
> 
> Repo:
> https://github.com/pmallappa/qemu/tree/upstream/smmuv3/v2
> 
> To Test:
> $ make tests/smmuv3-test
> $ QTEST_QEMU_BINARY=aarch64-softmmu/qemu-system-aarch64 tests/smmuv3-test
> << expect lot of prints >>
> 
> Any comments welcome..
As Prem was forced to stop his activity on this series, I volunteer to
pursue his work. Prior to starting the work, I just would like to check
nobody works on this already or objects.

If not, I intend to rebase and will do my utmost to align, when sensible
with what was done on Xilinx vsmmuv2/intel iommu.

Thanks

Eric
> 
> Cheers
> /Prem
> 
> Prem Mallappa (9):
>   log: Add new IOMMU type
>   devicetree: Added new APIs to make use of more fdt functions
>   hw: arm: SMMUv3 emulation model
>   hw: arm: Added SMMUv3 files for build
>   hw: arm: Add SMMUv3 to virt platform, create DTS accordingly
>   [optional] hw: misc: added testdev for smmu
>   [optional] tests: libqos: generic pci probing helpers
>   [optional] tests: SMMUv3 unit tests
>   [optional] arm: smmu-v3: ACPI IORT initial support
> 
>  default-configs/aarch64-softmmu.mak |1 +
>  device_tree.c   |   35 +
>  hw/arm/Makefile.objs|1 +
>  hw/arm/smmu-common.c|  152 
>  hw/arm/smmu-common.h|  141 
>  hw/arm/smmu-v3.c| 1369 
> +++
>  hw/arm/smmuv3-internal.h|  432 +++
>  hw/arm/virt-acpi-build.c|   43 ++
>  hw/arm/virt.c   |   62 ++
>  hw/misc/Makefile.objs   |2 +-
>  hw/misc/pci-testdev-smmu.c  |  239 ++
>  hw/misc/pci-testdev-smmu.h  |   22 +
>  hw/vfio/common.c|2 +-
>  include/hw/acpi/acpi-defs.h |   84 +++
>  include/hw/arm/smmu.h   |   33 +
>  include/hw/arm/virt.h   |2 +
>  include/qemu/log.h  |1 +
>  include/sysemu/device_tree.h|   18 +
>  tests/Makefile.include  |4 +
>  tests/libqos/pci-generic.c  |  197 +
>  tests/libqos/pci-generic.h  |   58 ++
>  tests/smmuv3-test.c |  952 +++

Re: [Qemu-devel] [PATCH v8 1/2] block/vxhs.c: Add support for a new block device type called "vxhs"

2017-03-08 Thread ashish mittal
On Wed, Mar 8, 2017 at 5:04 AM, Ketan Nilangekar
 wrote:
>
>
>> On Mar 8, 2017, at 1:13 AM, Daniel P. Berrange  wrote:
>>
>>> On Tue, Mar 07, 2017 at 05:27:55PM -0800, ashish mittal wrote:
>>> Thanks! There is one more input I need some help with!
>>>
>>> VxHS network library opens a fixed number of connection channels to a
>>> given host, and all the vdisks (that connect to the same host) share
>>> these connection channels.
>>>
>>> Therefore, we need to open secure channels to a specific target host
>>> only once for the first vdisk that connects to that host. All the
>>> other vdisks that connect to the same target host will share the same
>>> set of secure communication channels.
>>>
>>> I hope the above scheme is acceptable?
>>
>> I don't think I'm in favour of such an approach, as it forces a single
>> QEMU process to use the same privileges for all disks it uses.
>>
>> Consider an example where a QEMU process has two disks, one shared
>> readonly disk and one exclusive writable disk, both on the same host.
>>
>
> This is not a use case for VxHS as a solution. We do not support sharing of 
> vdisks across QEMU instances.
>
> Vxhs library was thus not designed to open different connections for 
> individual vdisks within a QEMU instance.
>
> Implementing this will involve rewrite of significant parts of libvxhs client 
> and server. Is this a new requirement for acceptance into QEMU?
>
>
>> It is reasonable as an administrator to want to use different credentials
>> for each of these. ie, they might have a set of well known credentials to
>> authenticate to get access to the read-only disk, but have a different set
>> of strictly limited access credentials to get access to the writable disk
>>
>> Trying to re-use the same connection for multiple cause prevents QEMU from
>> authenticating with different credentials per disk, so I don't think that
>> is a good approach - each disk should have totally independant state.
>>

libvxhs does not make any claim to fit all the general purpose
use-cases. It was purpose-built to be the communication channel for
our block device service. As such, we do not need/support all the
general use-cases. For the same reason we changed the name of the
library from linqnio to libvxhs (v8 changelog, #2).

Having dedicated communication channels for each device, or sharing
the channels between multiple devices, should both be acceptable
choices. The latter, IO multiplexing, is also a widely adopted IO
model. It just happens to fit our use-cases better.

Binding access control to a communication channel would prevent
anybody from using the latter approach. Having a separate way to allow
access-control would permit the use of latter also.

>>>
>>> If yes, then we have a couple of options to implement this:
>>>
>>> (1) Accept the TLS credentials per vdisk using the previously
>>> discussed --object tls-creds-x509 mechanism. In this case, if more
>>> than one vdisk have the same host info, then we will use only the
>>> first one's creds to set up the secure connection, and ignore the
>>> others. vdisks that connect to different target hosts will use their
>>> individual tls-creds-x509 to set up the secure channels. This is, of
>>> course, not relevant for qemu-img/qemu-io as they can open only one
>>> vdisk at a time.
>>>
>>> (2) Instead of having a per-vdisk --object tls-creds-x509, have a
>>> single such argument on the command line for vxhs block device code to
>>> consume - if that is possible! One way to achieve this could be the
>>> user/password authentication we discussed earlier, which we could use
>>> to pass the directory where cert/keys are kept.
>>>
>>> (3) Use the instance UUID, when available, to lookup the cert files
>>> per instance (i.e. for qemu-kvm), and use the --object tls-creds-x509
>>> mechanism, when instance UUID is NULL (i.e. qemu-io, qemu-img etc).
>>> The cert/key files are anyway protected by file permissions in either
>>> case, so I guess there is no additional security provided by either
>>> method.
>>
>> Regards,
>> Daniel
>> --
>> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
>> |: http://libvirt.org  -o- http://virt-manager.org :|
>> |: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



Re: [Qemu-devel] [PATCH v8 1/2] block/vxhs.c: Add support for a new block device type called "vxhs"

2017-03-08 Thread Daniel P. Berrange
On Wed, Mar 08, 2017 at 09:59:32AM -0800, ashish mittal wrote:
> On Wed, Mar 8, 2017 at 5:04 AM, Ketan Nilangekar
>  wrote:
> >
> >
> >> On Mar 8, 2017, at 1:13 AM, Daniel P. Berrange  wrote:
> >>
> >>> On Tue, Mar 07, 2017 at 05:27:55PM -0800, ashish mittal wrote:
> >>> Thanks! There is one more input I need some help with!
> >>>
> >>> VxHS network library opens a fixed number of connection channels to a
> >>> given host, and all the vdisks (that connect to the same host) share
> >>> these connection channels.
> >>>
> >>> Therefore, we need to open secure channels to a specific target host
> >>> only once for the first vdisk that connects to that host. All the
> >>> other vdisks that connect to the same target host will share the same
> >>> set of secure communication channels.
> >>>
> >>> I hope the above scheme is acceptable?
> >>
> >> I don't think I'm in favour of such an approach, as it forces a single
> >> QEMU process to use the same privileges for all disks it uses.
> >>
> >> Consider an example where a QEMU process has two disks, one shared
> >> readonly disk and one exclusive writable disk, both on the same host.
> >>
> >
> > This is not a use case for VxHS as a solution. We do not support sharing of 
> > vdisks across QEMU instances.
> >
> > Vxhs library was thus not designed to open different connections for 
> > individual vdisks within a QEMU instance.
> >
> > Implementing this will involve rewrite of significant parts of libvxhs 
> > client and server. Is this a new requirement for acceptance into QEMU?
> >
> >
> >> It is reasonable as an administrator to want to use different credentials
> >> for each of these. ie, they might have a set of well known credentials to
> >> authenticate to get access to the read-only disk, but have a different set
> >> of strictly limited access credentials to get access to the writable disk
> >>
> >> Trying to re-use the same connection for multiple cause prevents QEMU from
> >> authenticating with different credentials per disk, so I don't think that
> >> is a good approach - each disk should have totally independant state.
> >>
> 
> libvxhs does not make any claim to fit all the general purpose
> use-cases. It was purpose-built to be the communication channel for
> our block device service. As such, we do not need/support all the
> general use-cases. For the same reason we changed the name of the
> library from linqnio to libvxhs (v8 changelog, #2).

I raise these kind of points because they are relevant to apps like
OpenStack, against which you've proposed VHXS support. OpenStack
intends to allow a single volume to be shared by multiple guests,
so declare that out of scope you're crippling certain use cases
within openstack. Of course you're free to make such a decision,
but it makes VHXS a less compelling technology to use IMHO.

> Having dedicated communication channels for each device, or sharing
> the channels between multiple devices, should both be acceptable
> choices. The latter, IO multiplexing, is also a widely adopted IO
> model. It just happens to fit our use-cases better.
> 
> Binding access control to a communication channel would prevent
> anybody from using the latter approach. Having a separate way to allow
> access-control would permit the use of latter also.

Sharing access control across multiple disks does not fit in effectively
with the model used by apps that manage QEMU. Libvirt, and apps above
libvirt, like OpenStack, oVirt, and things like Kubernetes all represent
the information required to connect to a network block device, on a
per-disk basis - there's no sense of having some information that is
shared across all disks associated with a VM.

So from the POV of modelling this in QEMU, all information needs to be
specified against the individual -drive / -blockdev. If you really must,
you will just have to reject configurations which imply talking to the
same host, with incompatible parameters. Better would be to dynamically
determine if you can re-use connections, vs spawn new connection based
on the config.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



[Qemu-devel] [PATCH for-2.9] block/archipelago: Make it compile

2017-03-08 Thread Max Reitz
In order to use error_setg() and similar functions, we need to include
qapi/error.h.

Signed-off-by: Max Reitz 
---
 block/archipelago.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/archipelago.c b/block/archipelago.c
index 2449cfc702..01c4ff2f7d 100644
--- a/block/archipelago.c
+++ b/block/archipelago.c
@@ -55,6 +55,7 @@
 #include "block/block_int.h"
 #include "qemu/error-report.h"
 #include "qemu/thread.h"
+#include "qapi/error.h"
 #include "qapi/qmp/qint.h"
 #include "qapi/qmp/qstring.h"
 #include "qapi/qmp/qjson.h"
-- 
2.12.0




[Qemu-devel] Performance difference between QEMU 2.2.0 and 2.6.0

2017-03-08 Thread Wenwen Wang
Hello all,

I just found that there is a significant performance difference between
QEMU version 2.2.0 and 2.6.0.

Here are the execution times of the benchmark 458.sjeng from SPEC CINT2006
with test workload using user only emulation from ARM guest to x86-32 host:

qemu-2.2.0: 29 seconds
qemu-2.6.0: 38 seconds

I am sure that the evaluation environments are the same in the two
executions.

I also evaluated other benchmarks, it seems that they also have some
performance differences.

Could anyone tell me what this difference comes from or give me any hint?

Many thanks in advance!

Wenwen


[Qemu-devel] [Bug 1274170] Re: qemu window hides in the background on osx

2017-03-08 Thread Colin Burgess
This seems to only affect fullscreen mode, so if that's the case, I
don't see it as resolved.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1274170

Title:
  qemu window hides in the background on osx

Status in QEMU:
  Incomplete

Bug description:
  When launching qemu on OSX (10.8.5), the window comes up in the
  background.  A bit of googling shows that the addition of [NSApp
  activateIgnoringOtherApps:YES]; before the [NSApp run]; in main fixes
  this.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1274170/+subscriptions



Re: [Qemu-devel] [PATCH for-2.9] block/archipelago: Make it compile

2017-03-08 Thread Eric Blake
On 03/08/2017 12:18 PM, Max Reitz wrote:
> In order to use error_setg() and similar functions, we need to include
> qapi/error.h.
> 
> Signed-off-by: Max Reitz 
> ---
>  block/archipelago.c | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Eric Blake 

Does this mean our automated buildbots aren't building archipelago by
default?

> 
> diff --git a/block/archipelago.c b/block/archipelago.c
> index 2449cfc702..01c4ff2f7d 100644
> --- a/block/archipelago.c
> +++ b/block/archipelago.c
> @@ -55,6 +55,7 @@
>  #include "block/block_int.h"
>  #include "qemu/error-report.h"
>  #include "qemu/thread.h"
> +#include "qapi/error.h"
>  #include "qapi/qmp/qint.h"
>  #include "qapi/qmp/qstring.h"
>  #include "qapi/qmp/qjson.h"
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH for-2.9] block/archipelago: Make it compile

2017-03-08 Thread Max Reitz
On 08.03.2017 19:25, Eric Blake wrote:
> On 03/08/2017 12:18 PM, Max Reitz wrote:
>> In order to use error_setg() and similar functions, we need to include
>> qapi/error.h.
>>
>> Signed-off-by: Max Reitz 
>> ---
>>  block/archipelago.c | 1 +
>>  1 file changed, 1 insertion(+)
> 
> Reviewed-by: Eric Blake 
> 
> Does this mean our automated buildbots aren't building archipelago by
> default?

Well, at least patchew didn't catch that I definitely broke archipelago
in my bdrv_truncate() errp series...

[CC-ing Fam]

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH for-2.9] block/archipelago: Make it compile

2017-03-08 Thread Eric Blake
On 03/08/2017 12:25 PM, Eric Blake wrote:
> On 03/08/2017 12:18 PM, Max Reitz wrote:
>> In order to use error_setg() and similar functions, we need to include
>> qapi/error.h.
>>
>> Signed-off-by: Max Reitz 
>> ---
>>  block/archipelago.c | 1 +
>>  1 file changed, 1 insertion(+)
> 
> Reviewed-by: Eric Blake 
> 
> Does this mean our automated buildbots aren't building archipelago by
> default?
> 

Oh, I meant to add:

Should probably mention that it was commit da34e65 that introduced the
problem (if I'm right?).  Does that mean we have not been compiling
archipelago.c since Mar 2016?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH for-2.9] block/archipelago: Make it compile

2017-03-08 Thread Max Reitz
On 08.03.2017 19:28, Eric Blake wrote:
> On 03/08/2017 12:25 PM, Eric Blake wrote:
>> On 03/08/2017 12:18 PM, Max Reitz wrote:
>>> In order to use error_setg() and similar functions, we need to include
>>> qapi/error.h.
>>>
>>> Signed-off-by: Max Reitz 
>>> ---
>>>  block/archipelago.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>
>> Reviewed-by: Eric Blake 
>>
>> Does this mean our automated buildbots aren't building archipelago by
>> default?
>>
> 
> Oh, I meant to add:
> 
> Should probably mention that it was commit da34e65 that introduced the
> problem (if I'm right?).  Does that mean we have not been compiling
> archipelago.c since Mar 2016?

At least here on my machine it does indeed compile before that commit
and fails afterwards. So, yes, maybe we actually have not. :-)

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH for-2.9] block/archipelago: Make it compile

2017-03-08 Thread Philippe Mathieu-Daudé

On 03/08/2017 03:25 PM, Eric Blake wrote:

On 03/08/2017 12:18 PM, Max Reitz wrote:

In order to use error_setg() and similar functions, we need to include
qapi/error.h.

Signed-off-by: Max Reitz 
---
 block/archipelago.c | 1 +
 1 file changed, 1 insertion(+)


Reviewed-by: Eric Blake 


Reviewed-by: Philippe Mathieu-Daudé 



Does this mean our automated buildbots aren't building archipelago by
default?



diff --git a/block/archipelago.c b/block/archipelago.c
index 2449cfc702..01c4ff2f7d 100644
--- a/block/archipelago.c
+++ b/block/archipelago.c
@@ -55,6 +55,7 @@
 #include "block/block_int.h"
 #include "qemu/error-report.h"
 #include "qemu/thread.h"
+#include "qapi/error.h"
 #include "qapi/qmp/qint.h"
 #include "qapi/qmp/qstring.h"
 #include "qapi/qmp/qjson.h"







  1   2   >