Re: [Qemu-devel] ssi_auto_connect_slave

2017-05-24 Thread boddu pavan via Qemu-devel
+qemu-dev 

On Monday, May 22, 2017 6:34 PM, boddu pavan wrote:

 Hi,
I see that the ssi_auto_connect_slave helper in ssi.c
(https://github.com/qemu/qemu/blob/master/hw/ssi/ssi.c#L166) is used to connect
the SSISlave children of an SPI controller to their respective SPI bus. But this
function cannot handle the scenario where the controller has two SPI buses (like
xilinx_spips); in that case each SSISlave is registered on both buses.
I cannot imagine a good way to fix this other than assigning a bus-number
property to each SSISlave. Would that be acceptable as a patch, if provided?
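Something along these lines, for instance (an untested sketch; the
property and field names are placeholders, not existing code):

    /* let each SSISlave say which of the controller's buses it is on */
    static Property ssi_slave_properties[] = {
        DEFINE_PROP_UINT8("spi-bus", SSISlave, bus_num, 0),
        DEFINE_PROP_END_OF_LIST(),
    };

ssi_auto_connect_slave() could then attach a child only to the
controller bus whose number matches its "spi-bus" property.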
Thanks,
Sai Pavan

   


Re: [Qemu-devel] [PATCH v3 1/3] migration: Introduce unregister_savevm_live()

2017-05-24 Thread Juan Quintela
David Gibson  wrote:
> On Wed, May 24, 2017 at 08:28:59AM +0200, Juan Quintela wrote:
>> Bharata B Rao  wrote:
>> > Introduce a new function unregister_savevm_live() to unregister the vmstate
>> > handlers registered via register_savevm_live().
>> >
>> > register_savevm() allocates the SaveVMHandlers itself, while
>> > register_savevm_live() is passed the SaveVMHandlers by the caller.
>> > During unregistration, we want to free the SaveVMHandlers in the
>> > former case but not in the latter. Hence this new API is needed to
>> > differentiate the two.
>> >
>> > This new API will be needed by PowerPC to unregister the HTAB savevm
>> > handlers.
>> >
>> > Signed-off-by: Bharata B Rao 
>> > Reviewed-by: David Gibson 
>> > Cc: Juan Quintela 
>> > Cc: Dr. David Alan Gilbert 
>> 
>> Hi
>> 
>> How about this one?
>> I have only test-compiled it.
>> 
>> The advantage from my point of view is that we always do the right
>> thing.  And as the migration code already knows whether ops has to be
>> freed or not, I think it is a better API.
>> 
>> What do you think?
>
> I think this is a better approach.  Do you want to push this one
> directly, Juan, or do you want me to take it through my tree?

I will push it in my next pull request.  I mean, I will first send it
for review on its own at the top level.

Thanks, Juan.



Re: [Qemu-devel] [PATCH v3 00/10] Clock framework API.

2017-05-24 Thread KONRAD Frederic

Ping.

Thanks,
Fred

On 28/02/2017 at 11:02, fred.kon...@greensocs.com wrote:

From: KONRAD Frederic 

Hi,

This is the third version of the clock framework API. It contains:

  * The first 6 patches, which introduce the framework.
  * The 7th patch, which introduces a fixed-clock model.
  * The rest, which gives an example of how to model a PLL, based on the
existing zynqmp-crf extracted from the qemu xilinx tree.

No specific behavior is expected yet when the CRF register set is accessed, but
the user can see, for example, the dp_video_ref and vpll_to_lpd rates changing
in the monitor with the "info qtree" command when the vpll_ctrl register is
modified.

bus: main-system-bus
  type System
  dev: xlnx.zynqmp_crf, id ""
gpio-out "sysbus-irq" 1
qemu-clk "dbg_trace" 0
qemu-clk "dp_stc_ref" 0
qemu-clk "dpll_to_lpd" 1250
qemu-clk "acpu_clk" 0
qemu-clk "pcie_ref" 0
qemu-clk "topsw_main" 0
qemu-clk "topsw_lsbus" 0
qemu-clk "dp_audio_ref" 0
qemu-clk "sata_ref" 0
qemu-clk "dp_video_ref" 1428571
qemu-clk "vpll_clk" 5000
qemu-clk "apll_to_lpd" 1250
qemu-clk "dpll_clk" 5000
qemu-clk "gpu_ref" 0
qemu-clk "aux_refclk" 0
qemu-clk "video_clk" 2700
qemu-clk "gdma_ref" 0
qemu-clk "gt_crx_ref_clk" 0
qemu-clk "dbg_fdp" 0
qemu-clk "apll_clk" 5000
qemu-clk "pss_alt_ref_clk" 0
qemu-clk "ddr" 0
qemu-clk "dbg_tstmp" 0
qemu-clk "pss_ref_clk" 5000
qemu-clk "dpdma_ref" 0
qemu-clk "vpll_to_lpd" 1250
mmio fd1a/010c

This series is based on the current master
(d992f2f1368ceb92e6bfd8efece174110f4236ff).

Thanks,
Fred

V2 -> V3:
  * Rebased on current master.
  * Renamed qemu_clk / QEMUClock as suggested by Cédric.
  * Renamed in_rate to ref_rate and out_rate to rate.
  * Renamed qemu_clk_bind_clock to qemu_clk_bind.
  * Example added to the documentation as suggested by Peter.

V1 -> V2:
  * Rebased on current master.
  * Some function renamed and documentation fixed.

RFC -> V1:
  * Rebased on current master.
  * The docs have been fixed.
  * qemu_clk_init_device helper has been provided to ease the initialization
of the devices.

KONRAD Frederic (10):
  qemu-clk: introduce qemu-clk qom object
  qemu-clk: allow to add a clock to a device
  qemu-clk: allow to bind two clocks together
  qemu-clk: introduce an init array to help the device construction
  qdev-monitor: print the device's clock with info qtree
  docs: add qemu-clock documentation
  introduce fixed-clock
  introduce zynqmp_crf
  zynqmp: add the zynqmp_crf to the platform
  zynqmp: add reference clock

 Makefile.objs                 |   1 +
 docs/clock.txt                | 278 ++++
 hw/arm/xlnx-zynqmp.c          |  56 +++
 hw/misc/Makefile.objs         |   2 +
 hw/misc/fixed-clock.c         |  88 ++++
 hw/misc/xilinx_zynqmp_crf.c   | 968 ++++++
 include/hw/arm/xlnx-zynqmp.h  |   8 +
 include/hw/misc/fixed-clock.h |  30 ++
 include/qemu/qemu-clock.h     | 161 +++
 qdev-monitor.c                |   2 +
 qemu-clock.c                  | 176 ++++
 11 files changed, 1770 insertions(+)
 create mode 100644 docs/clock.txt
 create mode 100644 hw/misc/fixed-clock.c
 create mode 100644 hw/misc/xilinx_zynqmp_crf.c
 create mode 100644 include/hw/misc/fixed-clock.h
 create mode 100644 include/qemu/qemu-clock.h
 create mode 100644 qemu-clock.c





[Qemu-devel] [PATCH] migration: Allow unregister of save_live handlers

2017-05-24 Thread Juan Quintela
Non-save_live migration handlers have an ops member that is
dynamically allocated by the migration code.  Save_live handlers get it
passed as an argument, and it remains the responsibility of the caller.
Add a new member, is_allocated, that remembers whether ops has to be
freed.  This allows unregister_savevm() to work with save_live handlers.
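As an illustration (an untested sketch modelled on how the RAM handlers
are registered in migration/ram.c), a save_live user keeps ownership of
its handlers, so is_allocated stays false and unregister_savevm() will
not free them:

    static SaveVMHandlers savevm_ram_handlers = {
        .save_live_setup = ram_save_setup,
        /* ... */
    };

    register_savevm_live(NULL, "ram", 0, 4, &savevm_ram_handlers, NULL);

whereas register_savevm() allocates ops itself and now sets
is_allocated = true.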

Signed-off-by: Juan Quintela 
---
 include/migration/vmstate.h | 2 ++
 migration/savevm.c  | 5 -
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index f97411d..1d20e30 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -57,6 +57,8 @@ typedef struct SaveVMHandlers {
   uint64_t *non_postcopiable_pending,
   uint64_t *postcopiable_pending);
 LoadStateHandler *load_state;
+/* Has been allocated by migration code */
+bool is_allocated;
 } SaveVMHandlers;
 
 int register_savevm(DeviceState *dev,
diff --git a/migration/savevm.c b/migration/savevm.c
index d971e5e..187f386 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -628,6 +628,7 @@ int register_savevm(DeviceState *dev,
 SaveVMHandlers *ops = g_new0(SaveVMHandlers, 1);
 ops->save_state = save_state;
 ops->load_state = load_state;
+ops->is_allocated = true;
 return register_savevm_live(dev, idstr, instance_id, version_id,
 ops, opaque);
 }
@@ -651,7 +652,9 @@ void unregister_savevm(DeviceState *dev, const char *idstr, 
void *opaque)
 if (strcmp(se->idstr, id) == 0 && se->opaque == opaque) {
 QTAILQ_REMOVE(&savevm_state.handlers, se, entry);
 g_free(se->compat);
-g_free(se->ops);
+if (se->ops->is_allocated) {
+g_free(se->ops);
+}
 g_free(se);
 }
 }
-- 
2.9.3




Re: [Qemu-devel] [PATCH V6 08/10] migration: calculate vCPU blocktime on dst side

2017-05-24 Thread Peter Xu
On Tue, May 23, 2017 at 02:31:09PM +0300, Alexey Perevalov wrote:
> This patch provides blocktime calculation per vCPU,
> as a summary and as an overlapped value for all vCPUs.
> 
> This approach was suggested by Peter Xu, as an improvement of the
> previous approach where QEMU kept a tree with the faulted page address
> and a cpu bitmask in it. Now QEMU keeps an array with the faulted page
> address as value and the vCPU as index. This helps to find the proper
> vCPU at UFFDIO_COPY time. It also keeps a list of blocktimes per vCPU
> (which could be traced with page_fault_addr).
> 
> Blocktime will not be calculated if the postcopy_blocktime field of
> MigrationIncomingState wasn't initialized.
> 
> Signed-off-by: Alexey Perevalov 
> ---
>  migration/postcopy-ram.c | 102 ++-
>  migration/trace-events   |   5 ++-
>  2 files changed, 105 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index d647769..e70c44b 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -23,6 +23,7 @@
>  #include "postcopy-ram.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/balloon.h"
> +#include 
>  #include "qemu/error-report.h"
>  #include "trace.h"
>  
> @@ -577,6 +578,101 @@ static int ram_block_enable_notify(const char 
> *block_name, void *host_addr,
>  return 0;
>  }
>  
> +static int get_mem_fault_cpu_index(uint32_t pid)
> +{
> +CPUState *cpu_iter;
> +
> +CPU_FOREACH(cpu_iter) {
> +if (cpu_iter->thread_id == pid) {
> +return cpu_iter->cpu_index;
> +}
> +}
> +trace_get_mem_fault_cpu_index(pid);
> +return -1;
> +}
> +
> +static void mark_postcopy_blocktime_begin(uint64_t addr, uint32_t ptid,
> +RAMBlock *rb)
> +{
> +int cpu;
> +unsigned long int nr_bit;
> +MigrationIncomingState *mis = migration_incoming_get_current();
> +PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
> +int64_t now_ms;
> +
> +if (!dc || ptid == 0) {
> +return;
> +}
> +cpu = get_mem_fault_cpu_index(ptid);
> +if (cpu < 0) {
> +return;
> +}
> +nr_bit = get_copied_bit_offset(addr);
> +if (test_bit(nr_bit, mis->copied_pages)) {
> +return;
> +}
> +now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +if (dc->vcpu_addr[cpu] == 0) {
> +atomic_inc(&dc->smp_cpus_down);
> +}
> +
> +atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);
> +atomic_xchg__nocheck(&dc->last_begin, now_ms);
> +atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], now_ms);

Looks like this is not what you and Dave have discussed?

(Btw, sorry to have not followed the thread recently, so I just went
 over the discussion again...)

What I see Dave suggested is (copied from Dave's email):

    blocktime_start:
        set CPU stall address
        check bitmap entry
          if set then zero stall-address

While here it is:

    blocktime_start:
        check bitmap entry
          if set then return
        set CPU stall address

I don't think the current version can really solve the race condition.
See this possible sequence:

    receive-thread                   fault-thread
    --------------                   ------------
                                     blocktime_start
                                       check bitmap entry,
                                         if set then return
    blocktime_end
      set bitmap entry
      read CPU stall address,
        if non-0 then zero it
                                     set CPU stall address [1]

Then imho the address set at [1] will stay set forever.

I think we should follow exactly what Dave has suggested.
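In code, that ordering would look roughly like this (an untested sketch
that just reuses the names from the patch above):

    /* set the stall address first, then check the bitmap; undo the
     * stall address if the page turns out to be copied already */
    atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);
    nr_bit = get_copied_bit_offset(addr);
    if (test_bit(nr_bit, mis->copied_pages)) {
        atomic_xchg__nocheck(&dc->vcpu_addr[cpu], 0);
        return;
    }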

And.. after a second thought, I am afraid even this would not cover all
the race conditions. What if we take the UFFDIO_COPY ioctl into account?
AFAIU after UFFDIO_COPY the faulted vcpu can be running again, and then
the question is: can it quickly trigger another page fault?

Firstly, a workable sequence is (adding the UFFDIO_COPY ioctl, and
showing vcpu-thread X as well):

    receive-thread                 fault-thread         vcpu-thread X
    --------------                 ------------         -------------
                                                        fault at addr A1
                                   fault_addr[X]=A1
    UFFDIO_COPY page A1
    check fault_addr[X] with A1
      if match, clear fault_addr[X]
                                                        vcpu X starts

This is fine.

While since "vcpu X starts" can be right after UFFDIO_COPY, can this
be possible?

  receive-thread   fault-threadvcpu-thread X
  --   -
   fault at addr A1
   fault_addr[X]=A1
  UFFDIO_COPY page A1
   vcpu X starts
   fault at addr A2
   fault_addr[X]=A2
  check fault_addr[X] with A1
if match, clear fault_addr[X]
^
|
+-- he

Re: [Qemu-devel] [PATCH V6 07/10] migration: add bitmap for copied page

2017-05-24 Thread Alexey
On Wed, May 24, 2017 at 02:57:36PM +0800, Peter Xu wrote:
> On Tue, May 23, 2017 at 02:31:08PM +0300, Alexey Perevalov wrote:
> > This patch adds the ability to track already-copied pages; it's
> > necessary for calculating vCPU block time in the postcopy migration
> > feature, and maybe for recovery after postcopy migration failure.
> > 
> > Functions which work with RAMBlock are placed in ram.c, because
> > TARGET_WORDS_BIGENDIAN is poisoned in postcopy-ram.c, which is
> > hardware-independent code.
> > 
> > Signed-off-by: Alexey Perevalov 
> > ---
> >  include/migration/migration.h | 16 +++
> >  migration/postcopy-ram.c  | 22 ---
> >  migration/ram.c   | 63 +++
> >  3 files changed, 97 insertions(+), 4 deletions(-)
> > 
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index 449cb07..4e05c83 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -101,6 +101,20 @@ struct MigrationIncomingState {
> >  LoadStateEntry_Head loadvm_handlers;
> >  
> >  /*
> > + * bitmap indicates whether a page was copied,
> > + * based on ramblock offset;
> > + * currently it is used only for blocktime calculation in
> > + * postcopy migration, so the lifetime of this entry is:
> > + * from when the user requests blocktime calculation
> > + * until the end of postcopy migration.
> > + * As an example it could represent the following memory map:
> > + *  _________________________________
> > + * | 4k pages | hugepages | 4k pages |
> 
> Can we really do this?

> 
> The problem is, AFAIU the migration stream sends pages only in target
> page size units, even for huge pages... so even for huge pages we
> would still need a per-TARGET_PAGE_SIZE bitmap?
Sending, maybe, but copying via the userfault fd is done in
hugepage-size units in the case of a hugetlbfs memory backend.
> 
> (Please refer to ram_state_init() on init of RAMBlock.bmap)
I thought about using a bitmap per ramblock, but I decided not to,
for the following reasons:
1. There are ramblocks of only 4k; for such a ramblock it's not
optimal.
2. The copied-pages bitmap is an attribute of incoming migration,
but RAMBlock is a general structure. Although bmap is about
dirty pages of precopy migration... hmm, but RAMBlock also
contains unsentmap, and that is for postcopy.
> 
> > + *
> > + * */
> > +unsigned long *copied_pages;
> > +
> > +/*
> >   * PostcopyBlocktimeContext to keep information for postcopy
> >   * live migration, to calculate vCPU block time
> >   * */
> > @@ -279,6 +293,8 @@ int migrate_compress_threads(void);
> >  int migrate_decompress_threads(void);
> >  bool migrate_use_events(void);
> >  bool migrate_postcopy_blocktime(void);
> > +unsigned long int get_copied_bit_offset(ram_addr_t addr);
> > +unsigned long int *copied_pages_bitmap_new(void);
> >  
> >  /* Sending on the return path - generic and then for each message type */
> >  void migrate_send_rp_message(MigrationIncomingState *mis,
> > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > index 5435a40..d647769 100644
> > --- a/migration/postcopy-ram.c
> > +++ b/migration/postcopy-ram.c
> > @@ -94,23 +94,34 @@ static void destroy_blocktime_context(struct 
> > PostcopyBlocktimeContext *ctx)
> >  
> >  static void postcopy_migration_cb(Notifier *n, void *data)
> >  {
> > -PostcopyBlocktimeContext *ctx = container_of(n, 
> > PostcopyBlocktimeContext,
> > -   postcopy_notifier);
> >  MigrationState *s = data;
> >  if (migration_has_finished(s) || migration_has_failed(s)) {
> > +MigrationIncomingState *mis = migration_incoming_get_current();
> > +PostcopyBlocktimeContext *ctx = mis->blocktime_ctx;
> > +
> > +if (!ctx) {
> > +return;
> > +}
> 
> If we are in postcopy_migration_cb() already, then we have registered
> the notifiers, then... could this (!ctx) really happen?
> 
ok, maybe it's an unnecessary sanity check.

> > +
> >  g_free(ctx->page_fault_vcpu_time);
> >  /* g_free is NULL robust */
> >  ctx->page_fault_vcpu_time = NULL;
> >  g_free(ctx->vcpu_addr);
> >  ctx->vcpu_addr = NULL;
> > +g_free(mis->copied_pages);
> > +mis->copied_pages = NULL;
> >  }
> >  }
> >  
> >  static void migration_exit_cb(Notifier *n, void *data)
> >  {
> > -PostcopyBlocktimeContext *ctx = container_of(n, 
> > PostcopyBlocktimeContext,
> > -   exit_notifier);
> > +MigrationIncomingState *mis = migration_incoming_get_current();
> > +PostcopyBlocktimeContext *ctx = mis->blocktime_ctx;
> > +if (!ctx) {
> > +return;
> > +}
> 
> I failed to find out why this one was touched up...
> 
> Thanks,
> 
> >  destroy_blocktime_context(ctx);
> > +mis->blocktime_ctx = NULL;
> >  }
> >  
> >  static struct

Re: [Qemu-devel] [PATCH v11 0/5] migration/ppc: migrating DRC and ccs_list

2017-05-24 Thread David Gibson
I've merged 1..4 with some minor changes for style and error
handling.  5/5 I still need to review in a bit more detail.

On Mon, May 22, 2017 at 04:35:46PM -0300, Daniel Henrique Barboza wrote:
> v11:
> - rebased with dgibson/ppc-for-2.10 branch
> - patch 1:
> * use PCDIMMDevice* as index of pending_dimm_unplugs list instead of
> using its uint64_t address;
> * spapr_del_lmbs() function was merged with spapr_memory_unplug_request();
> * wherever applicable, retrieve the machine by using HotplugHandler;
> * added more info about the sPAPRDIMMState cache in the comment of
> spapr.h.
> 
> v10:
> - removed 'migrating pending_events' patch this series
> - patch 1:
> * removed extra line between definitions;
> * removed spapr_pending_dimms functions definitons from spapr.h
> * turned spapr_pending_dimms functions into static
> - patch 2:
> * fixed the switch() statement - PHB and VIO cases goes to default,
> default now executes assert()
> - patch 3:
> * minor style changes/fixes
> * changed switch default to execute assert() and now uses the same
> logic for both CPU and LMB DRCs 
> - patch 4 (*new*):
> * this new patch implements a new function to recover the pending DIMM
> unplug LMB state inside the spapr_lmb_release callback
> 
> v9:
> - patch 1 (*new*): added a qtailq in sPAPRMachineState called
> pending_dimm_unplugs
> that stores the DIMM LMB state during the unplug process.
> - patch 2 (*new*): merged v8-patch1 and v8-patch2: removing detach_cb and
> detach_cb_opaque.
> - patch 3:
> * removed dk->vmsd entry. We're using vmstate_register instead
> * added 'awaiting_allocation' flag in the DRC migration
> - patch 4 (*new*): migrating spapr->pending_dimm_unplugs qtailq to allow
> for an ongoing PCDIMM unplug to continue after a migration.
> 
> v8:
> - new patch added: 'removing spapr_drc_detach_cb opaques'. This new patch 
> removes
> the need for the detach_cb_opaques inside the removal callback functions. See
> the commit message of the patch for more info.
> 
> v7:
> - removed the first patch. DRC registration is now done by vmstate_register
> in patch 2.
> - added a new patch that changes spapr_dr_connector_new to receive as argument
> the detach_cb.
> - removed the callback logic of patch 2 since there is no need to restore the
> detach_cb on post-load due to the detach_cb on spapr_dr_connector_new change.
> - added word separators in the VMSD names of patch 3 and 4.
> 
> v6: - Rebased with QEMU master after 6+ months.
> - Simplified the logic in patch 1.
> - Reworked patch 2: added CPU DRC migration, removed a function pointer 
> from DRC
> class and minor improvements.
> - Added clarifications from the previous v5 discussions in the commit 
> messages.
> 
> v5: - Rebased to David's ppc-for-2.8.
> 
> v4: - Introduce a way to set customized instance_id in SaveStateEntry. Use it
>   to set instance_id for DRC using its unique index to address David 
>   Gibson's concern.
> - Rename VMS_CSTM to VMS_LINKED based on Paolo Bonzini's suggestions.
> - Clean up qjson stuff in put_qtailq. 
> - Add trace for put_qtailq and get_qtailq based on David Gilbert's 
>   suggestion.
> 
> - Based on David's ppc-for-2.7. 
> 
> v3: - Simplify overall design followng discussion with Paolo. No longer need
>   metadata to migrate QTAILQ.
> - Extend VMStateInfo instead of adding similar fields to VMStateField.
> - Clean up macros in qemu/queue.h.
> (link: https://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg05695.html)
> 
> v2: - Introduce a general approach to migrate QTAILQ in qemu/queue.h.
> - Migrate signalled field in the DRC state.
> - Put the newly added migrating fields in subsections so that backward 
>   migration is not broken.  
> - Set detach_cb field right after migration so that a migrated hot-unplug
>   event could finish its course.
> (link: https://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg04188.html)
> 
> v1: - Inital version.
> (link: https://lists.nongnu.org/archive/html/qemu-devel/2016-04/msg02601.html)
> 
> 
> To make hotplug of guest devices (PCI, CPU and memory) work together
> with guest migration, the spapr DRC state needs to be transmitted
> during migration. This patch set defines the VMStateDescription struct
> for the spapr DRC state to enable that.
> 
> To fix the potential race between hotplug events on the guest and
> guest migration, the ccs_list of the spapr state also needs to be
> transmitted during migration. This patch set takes care of that as well.
> 
> 
> Daniel Henrique Barboza (4):
>   hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineState
>   hw/ppc: removing drc->detach_cb and drc->detach_cb_opaque
>   hw/ppc: migrating the DRC state of hotplugged devices
>   hw/ppc/spapr.c: recover pending LMB unplug info in spapr_lmb_release
> 
> Jianjun Duan (1):
>   migration: spapr: migrate ccs_list in spapr state
> 
>  hw/ppc/spapr.c  | 175 
> 

[Qemu-devel] [PATCH] net: Mark the 'hubport' netdev as deprecated

2017-05-24 Thread Thomas Huth
The 'hubport' netdev is closely tied to the 'vlan' concept which
has been marked as deprecated in commit a2dbe1356faff3cb6 already.
Thus we should also mark the hubport netdevs as deprecated to make
the remaining users aware that they should not use this anymore.
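With this patch, a command line like the following (an illustrative
example, not taken from the patch) now prints the warning during
startup, roughly as:

  qemu-system-x86_64 -netdev hubport,id=p1,hubid=0 ...
  qemu-system-x86_64: -netdev hubport,id=p1,hubid=0: hubports are
  deprecated and will be removed in a future release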

Signed-off-by: Thomas Huth 
---
 net/hub.c   | 4 
 qemu-options.hx | 6 --
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/hub.c b/net/hub.c
index 32d8cf5..85bd5bc 100644
--- a/net/hub.c
+++ b/net/hub.c
@@ -13,6 +13,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/error-report.h"
 #include "monitor/monitor.h"
 #include "net/net.h"
 #include "clients.h"
@@ -286,6 +287,9 @@ int net_init_hubport(const Netdev *netdev, const char *name,
 {
 const NetdevHubPortOptions *hubport;
 
+error_report("hubports are deprecated and will be removed in a "
+ "future release");
+
 assert(netdev->type == NET_CLIENT_DRIVER_HUBPORT);
 assert(!peer);
 hubport = &netdev->u.hubport;
diff --git a/qemu-options.hx b/qemu-options.hx
index dc1a48a..efb555c 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1810,7 +1810,7 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
 "-netdev vhost-user,id=str,chardev=dev[,vhostforce=on|off]\n"
 "configure a vhost-user network, backed by a chardev 
'dev'\n"
 "-netdev hubport,id=str,hubid=n\n"
-"configure a hub port on QEMU VLAN 'n'\n", QEMU_ARCH_ALL)
+"configure a hub port on QEMU VLAN 'n' (deprecated)\n", 
QEMU_ARCH_ALL)
 DEF("net", HAS_ARG, QEMU_OPTION_net,
 "-net 
nic[,vlan=n][,macaddr=mac][,model=type][,name=str][,addr=str][,vectors=v]\n"
 "old way to create a new NIC and connect it to VLAN 'n'\n"
@@ -2239,7 +2239,9 @@ Create a hub port on QEMU "vlan" @var{hubid}.
 
 The hubport netdev lets you connect a NIC to a QEMU "vlan" instead of a single
 netdev.  @code{-net} and @code{-device} with parameter @option{vlan} create the
-required hub automatically.
+required hub automatically. Note that the "vlan" concept, and thus also the
+hubport option, is considered deprecated and might be removed in a future
+release of QEMU.
 
 @item -netdev vhost-user,chardev=@var{id}[,vhostforce=on|off][,queues=n]
 
-- 
1.8.3.1




[Qemu-devel] [PATCH] pc: ACPI BIOS: use highest NUMA node for hotplug mem hole SRAT entry

2017-05-24 Thread Ladi Prosek
For reasons unknown, Windows won't online all memory, both memory given
on the command line and memory hot-plugged later, unless the hotplug mem
hole SRAT entry specifies a node greater than or equal to the ones where
memory is added.

Using the highest node on the machine makes recent versions of Windows
happy.

With this example command line:
  ... \
  -m 1024,slots=4,maxmem=32G \
  -numa node,nodeid=0 \
  -numa node,nodeid=1 \
  -numa node,nodeid=2 \
  -numa node,nodeid=3 \
  -object memory-backend-ram,size=1G,id=mem-mem1 \
  -device pc-dimm,id=dimm-mem1,memdev=mem-mem1,node=1

Windows reports a total of 1G of RAM without this commit and the expected
2G with this commit.

Signed-off-by: Ladi Prosek 
---
 hw/i386/acpi-build.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index afcadac..9653583 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2404,14 +2404,17 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 }
 
 /*
- * Entry is required for Windows to enable memory hotplug in OS.
+ * Entry is required for Windows to enable memory hotplug in OS
+ * and for Linux to enable SWIOTLB even if booted with less than
+ * 4G of RAM. Windows works better if the entry sets proximity
+ * to the highest NUMA node in the machine.
  * Memory devices may override proximity set by this entry,
  * providing _PXM method if necessary.
  */
 if (hotplugabble_address_space_size) {
 numamem = acpi_data_push(table_data, sizeof *numamem);
 build_srat_memory(numamem, pcms->hotplug_memory.base,
-  hotplugabble_address_space_size, 0,
+  hotplugabble_address_space_size, pcms->numa_nodes - 1,
   MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
 }
 
-- 
2.9.3




Re: [Qemu-devel] [virtio-dev] Re: [PATCH RFC] virtio-net: enable configurable tx queue size

2017-05-24 Thread Wei Wang

On 05/24/2017 11:19 AM, Jason Wang wrote:



On 2017-05-23 18:36, Wei Wang wrote:

On 05/23/2017 02:24 PM, Jason Wang wrote:



On 2017-05-23 13:15, Wei Wang wrote:

On 05/23/2017 10:04 AM, Jason Wang wrote:



On 2017-05-22 19:52, Wei Wang wrote:

On 05/20/2017 04:42 AM, Michael S. Tsirkin wrote:

On Fri, May 19, 2017 at 10:32:19AM +0800, Wei Wang wrote:

This patch enables the virtio-net tx queue size to be configurable
between 256 (the default queue size) and 1024 by the user. The queue
size specified by the user should be a power of 2.

Setting the tx queue size to be 1024 requires the guest driver to
support the VIRTIO_NET_F_MAX_CHAIN_SIZE feature.
This should be a generic ring feature, not one specific to 
virtio net.

OK. How about making two more changes below:

1) make the default tx queue size = 1024 (instead of 256).


As has been pointed out, you need to handle machine-type compatibility
for the default value too in this case.


The driver gets the size info from the device, so would it cause any
compatibility issue if we change the default ring size to 1024 in the
vhost case? In other words, is there any software (i.e. any virtio-net
driver) that functions based on the assumption of a 256-entry queue?


I don't know. But is it safe if e.g. we migrate from 1024 to an older
qemu with 256 as its queue size?


Yes, I think it is safe, because the default queue size is only used
when the device is being set up (e.g. feature negotiation).
During migration (when the device has already been running), the
destination machine will load the device state based on the queue size
that is being used (i.e. vring.num).
The default value is not used any more after the setup phase.


I haven't checked all cases, but there's two obvious things:

- After migration and after a reset, it will go back to 256 on dst.


Please let me clarify what we want first: when QEMU boots and it
realizes the virtio-net device, if the tx_queue_size is not given on
the command line, we want to use 1024 as the queue size, that is,
virtio_add_queue(,1024,), which sets vring.num=1024 and
vring.num_default=1024.

When migration happens, the vring.num variable (which has been 1024) is
sent to the destination machine, where virtio_load() will assign the
destination-side vring.num to that value (1024). So, vring.num=1024
continues to work on the destination machine with an old QEMU. I don't
see an issue here.

If a reset happens, I think the device and driver will re-do the
initialization steps. So, if they are with the old QEMU, then they use
the old QEMU realize() function to do virtio_add_queue(,256,), and the
driver will re-do the probe() steps and take vring.num=256; then
everything works fine.




- The ABI is changed, e.g. -M pc-q35-2.10 returns 1024 on 2.11

Didn't get this. Could you please explain more? Which ABI would be
changed, and why does it affect q35?









For live migration, the queue size that is being used will also be
transferred to the destination.




We can reduce the size (to 256) if the MAX_CHAIN_SIZE feature
is not supported by the guest.
In this way, people who apply the QEMU patch can directly use the
largest queue size (1024) without adding a boot command line option.

2) The vhost backend does not use writev, so I think when the vhost
backend is used, using a 1024 queue size should not depend on the
MAX_CHAIN_SIZE feature.


But do we need to consider even larger queue size now?


Need Michael's feedback on this. Meanwhile, I'll get the next version
of the code ready and check whether a larger queue size would cause any
corner cases.


The problem is, do we really need a new config field for this? Or is it
sufficient to just introduce a flag which means "I support up to 1024
sgs"?




For now, it also works without the new config field, max_chain_size,
but I would prefer to keep the new config field, because:

Without it, the driver will work on an assumed value, 1023.


This is the fact, and it's too late to change legacy drivers.


If in the future QEMU needs to change it to 1022, then how can the
new QEMU tell the old driver, which supports the MAX_CHAIN_SIZE
feature but works with the old hardcoded value 1023?


Can a config field help in this case? The problem is similar to
ANY_HEADER_SG; the only thing we can do is clarify the limitation for
new drivers.




I think it helps, because the driver will do

    virtio_cread_feature(vdev, VIRTIO_NET_F_MAX_CHAIN_SIZE,
                         struct virtio_net_config, max_chain_size,
                         &chain_size);

to get the max_chain_size from the device. So when a new QEMU has a new
value of max_chain_size, an old driver will get the new value.

Best,
Wei








Re: [Qemu-devel] [PATCH] migration: Allow unregister of save_live handlers

2017-05-24 Thread Peter Xu
On Wed, May 24, 2017 at 09:37:19AM +0200, Juan Quintela wrote:
> Non-save_live migration handlers have an ops member that is
> dynamically allocated by the migration code.  Save_live handlers get
> it passed as an argument, and it remains the responsibility of the
> caller.  Add a new member, is_allocated, that remembers whether ops
> has to be freed.  This allows unregister_savevm() to work with
> save_live handlers.
> 
> Signed-off-by: Juan Quintela 
> ---
>  include/migration/vmstate.h | 2 ++
>  migration/savevm.c  | 5 -
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index f97411d..1d20e30 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -57,6 +57,8 @@ typedef struct SaveVMHandlers {
>uint64_t *non_postcopiable_pending,
>uint64_t *postcopiable_pending);
>  LoadStateHandler *load_state;
> +/* Has been allocated by migration code */
> +bool is_allocated;
>  } SaveVMHandlers;
>  
>  int register_savevm(DeviceState *dev,
> diff --git a/migration/savevm.c b/migration/savevm.c
> index d971e5e..187f386 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -628,6 +628,7 @@ int register_savevm(DeviceState *dev,
>  SaveVMHandlers *ops = g_new0(SaveVMHandlers, 1);
>  ops->save_state = save_state;
>  ops->load_state = load_state;
> +ops->is_allocated = true;
>  return register_savevm_live(dev, idstr, instance_id, version_id,
>  ops, opaque);
>  }
> @@ -651,7 +652,9 @@ void unregister_savevm(DeviceState *dev, const char 
> *idstr, void *opaque)
>  if (strcmp(se->idstr, id) == 0 && se->opaque == opaque) {
>  QTAILQ_REMOVE(&savevm_state.handlers, se, entry);
>  g_free(se->compat);
> -g_free(se->ops);
> +if (se->ops->is_allocated) {

Would it be good to check against (se->ops && se->ops->is_allocated)?
Since I see that devices registered via
vmstate_register_with_alias_id() won't have this se->ops. I just don't
know whether that case will be allowed to be unregistered with the
current function.

Thanks,

> +g_free(se->ops);
> +}
>  g_free(se);
>  }
>  }
> -- 
> 2.9.3
> 

-- 
Peter Xu



Re: [Qemu-devel] [virtio-dev] Re: [virtio-dev] Re: [PATCH v2 00/16] Vhost-pci for inter-VM communication

2017-05-24 Thread Wei Wang

On 05/24/2017 11:24 AM, Jason Wang wrote:



On 2017-05-23 18:48, Wei Wang wrote:

On 05/23/2017 02:32 PM, Jason Wang wrote:



On 2017-05-23 13:47, Wei Wang wrote:

On 05/23/2017 10:08 AM, Jason Wang wrote:



On 2017-05-22 19:46, Wang, Wei W wrote:

On Monday, May 22, 2017 10:28 AM, Jason Wang wrote:

On 2017年05月19日 23:33, Stefan Hajnoczi wrote:

On Fri, May 19, 2017 at 11:10:33AM +0800, Jason Wang wrote:

On 2017-05-18 11:03, Wei Wang wrote:

On 05/17/2017 02:22 PM, Jason Wang wrote:

On 2017-05-17 14:16, Jason Wang wrote:

On 2017-05-16 15:12, Wei Wang wrote:

Hi:

Care to post the driver codes too?

OK. It may take some time to clean up the driver code 
before post
it out. You can first have a check of the draft at the 
repo here:

https://github.com/wei-w-wang/vhost-pci-driver

Best,
Wei
Interesting, looks like there's one copy on the tx side. We used to
have zerocopy support for tun for VM2VM traffic. Could you please
try to compare it with your vhost-pci-net by:

We can analyze the whole data path - from VM1's network stack sending
packets to VM2's network stack receiving packets. The number of copies
is actually the same for both.
That's why I'm asking you to compare the performance. The only reason
for vhost-pci is performance. You should prove it.

There is another reason for vhost-pci besides maximum performance:
vhost-pci makes it possible for end-users to run networking or storage
appliances in compute clouds.  Cloud providers do not allow end-users
to run custom vhost-user processes on the host, so you need vhost-pci.


Stefan
Then it has non-NFV use cases, and the question goes back to the
performance comparison between vhost-pci and zerocopy vhost_net. If it
does not perform better, it is less interesting, at least in this case.


Probably I can share what we got about vhost-pci and vhost-user:
https://github.com/wei-w-wang/vhost-pci-discussion/blob/master/vhost_pci_vs_vhost_user.pdf 


Right now, I don’t have the environment to add the vhost_net test.


Thanks, the number looks good. But I have some questions:

- Is the number measured through your vhost-pci kernel driver code?


Yes, the kernel driver code.


Interesting: in the above link, "l2fwd" was used in the vhost-pci
testing. I want to know more about the test configuration: if l2fwd is
the one that dpdk has, I want to know how you can make it work with a
kernel driver (maybe a packet socket, I think?). If not, I want to know
how you configure it (e.g. through a bridge, act_mirred or others). And
in OVS dpdk, are dpdk l2fwd + pmd used in the testing?




Oh, that l2fwd is a kernel module from OPNFV vsperf
(http://artifacts.opnfv.org/vswitchperf/docs/userguide/quickstart.html).
Both the legacy and vhost-pci cases use the same l2fwd module.
No bridge is used; the module already works at L2, forwarding packets
between two net devices.


Thanks for the pointer. Just to confirm: I think the virtio-net kernel
driver is used in the OVS-dpdk test?


Yes. In both cases, the guests are using kernel drivers.



Another question is: can we manage to remove the copy on the tx side?
If not, is it a limitation of the virtio protocol?




No, we can't. Take this example: VM1's vhost-pci <-> VM2's virtio-net.
VM1 sees VM2's memory, but VM2 only sees its own memory.
What this copy achieves is to get data from VM1's memory into VM2's
memory, so that VM2 can deliver its own memory to its network stack.

Best,
Wei






Re: [Qemu-devel] [PATCH v2 00/18] Block layer thread safety, part 1

2017-05-24 Thread Paolo Bonzini


On 11/05/2017 16:41, Paolo Bonzini wrote:
> This series uses mutexes or atomic operations around core block layer
> operations.  The remaining parts include:
> 
> I've removed the failing assertion in bdrv_aligned_pwritev in order to
> test block migration.
> 
> Paolo
> 
> v1->v2: add missing comment for 'wakeup' member [Fam]
> rewrite throttle-groups part [Stefan]
> rename ThrottleState CoMutex [Fam]
> minor stats64 changes [Fam, Roman, me]
> fixed bdrv_flush [Fam, me]
> dropped request spinlock optimization for now [Stefan]
> avoid global dirty bitmap mutex [Fam]
> avoid introducing unlocked bdrv_get_dirty API [Stefan]
> replaced spinlock with mutex for accounting [Stefan]
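>
> As a flavour of the conversion (a sketch; these exact lines are not
> from the series, the field names are taken from the patch titles):
>
>     /* writer: a plain increment becomes an atomic one */
>     atomic_inc(&bs->quiesce_counter);
>
>     /* reader: a plain load becomes atomic_read() */
>     if (atomic_read(&bs->copy_on_read)) {
>         ...
>     }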
> 
> 
> Paolo Bonzini (18):
>   block: access copy_on_read with atomic ops
>   block: access quiesce_counter with atomic ops
>   block: access io_limits_disabled with atomic ops
>   block: access serialising_in_flight with atomic ops
>   block: access wakeup with atomic ops
>   block: access io_plugged with atomic ops
>   throttle-groups: only start one coroutine from drained_begin
>   throttle-groups: do not use qemu_co_enter_next
>   throttle-groups: protect throttled requests with a CoMutex
>   util: add stats64 module
>   block: use Stat64 for wr_highest_offset
>   block: access write_gen with atomics
>   block: protect tracked_requests and flush_queue with reqs_lock
>   block: introduce dirty_bitmap_mutex
>   migration/block: reset dirty bitmap before reading
>   block: protect modification of dirty bitmaps with a mutex
>   block: introduce block_account_one_io
>   block: make accounting thread-safe
> 
>  block.c|   9 +-
>  block/accounting.c |  64 --
>  block/block-backend.c  |   5 +-
>  block/dirty-bitmap.c   | 112 ++--
>  block/io.c |  51 ++-
>  block/mirror.c |  14 ++-
>  block/nfs.c|   4 +-
>  block/qapi.c   |   2 +-
>  block/sheepdog.c   |   3 +-
>  block/throttle-groups.c|  91 ++-
>  blockdev.c |  46 ++
>  include/block/accounting.h |   8 +-
>  include/block/block.h  |   5 +-
>  include/block/block_int.h  |  61 -
>  include/block/dirty-bitmap.h   |  25 --
>  include/qemu/stats64.h | 193 +
>  include/sysemu/block-backend.h |  10 +--
>  migration/block.c  |  17 ++--
>  util/Makefile.objs |   1 +
>  util/stats64.c | 136 +
>  20 files changed, 679 insertions(+), 178 deletions(-)
>  create mode 100644 include/qemu/stats64.h
>  create mode 100644 util/stats64.c
> 

Ping?

Paolo



Re: [Qemu-devel] [PATCH] pc: ACPI BIOS: use highest NUMA node for hotplug mem hole SRAT entry

2017-05-24 Thread Igor Mammedov
On Wed, 24 May 2017 10:09:14 +0200
Ladi Prosek  wrote:

> For reasons unknown, Windows won't online all memory, both memory given
> on the command line and memory hot-plugged later, unless the hotplug
> mem hole SRAT entry specifies a node greater than or equal to the ones
> where memory is added.
> 
> Using the highest node on the machine makes recent versions of Windows
> happy.
> 
> With this example command line:
>   ... \
>   -m 1024,slots=4,maxmem=32G \
>   -numa node,nodeid=0 \
>   -numa node,nodeid=1 \
>   -numa node,nodeid=2 \
>   -numa node,nodeid=3 \
>   -object memory-backend-ram,size=1G,id=mem-mem1 \
>   -device pc-dimm,id=dimm-mem1,memdev=mem-mem1,node=1
> 
> Windows reports a total of 1G of RAM without this commit and the expected
> 2G with this commit.
> 
> Signed-off-by: Ladi Prosek 
Reviewed-by: Igor Mammedov 

> ---
>  hw/i386/acpi-build.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index afcadac..9653583 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2404,14 +2404,17 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
> MachineState *machine)
>  }
>  
>  /*
> - * Entry is required for Windows to enable memory hotplug in OS.
> + * Entry is required for Windows to enable memory hotplug in OS
> + * and for Linux to enable SWIOTLB even if booted with less than
> + * 4G of RAM. Windows works better if the entry sets proximity
> + * to the highest NUMA node in the machine.
>   * Memory devices may override proximity set by this entry,
>   * providing _PXM method if necessary.
>   */
>  if (hotplugabble_address_space_size) {
>  numamem = acpi_data_push(table_data, sizeof *numamem);
>  build_srat_memory(numamem, pcms->hotplug_memory.base,
> -  hotplugabble_address_space_size, 0,
> +  hotplugabble_address_space_size, pcms->numa_nodes - 1,
>MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
>  }
>  




Re: [Qemu-devel] [PATCH] migration: Allow unregister of save_live handlers

2017-05-24 Thread Juan Quintela
Peter Xu  wrote:
> On Wed, May 24, 2017 at 09:37:19AM +0200, Juan Quintela wrote:
>> Non-save_live migration handlers have an ops member that is
>> dynamically allocated by the migration code.  Save_live handlers get
>> it passed as an argument, and it remains the responsibility of the
>> caller.  Add a new member, is_allocated, that remembers whether ops
>> has to be freed.  This allows unregister_savevm() to work with
>> save_live handlers.
>> 
>> Signed-off-by: Juan Quintela 
>> ---
>>  include/migration/vmstate.h | 2 ++
>>  migration/savevm.c  | 5 -
>>  2 files changed, 6 insertions(+), 1 deletion(-)
>> 
>> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
>> index f97411d..1d20e30 100644
>> --- a/include/migration/vmstate.h
>> +++ b/include/migration/vmstate.h
>> @@ -57,6 +57,8 @@ typedef struct SaveVMHandlers {
>>uint64_t *non_postcopiable_pending,
>>uint64_t *postcopiable_pending);
>>  LoadStateHandler *load_state;
>> +/* Has been allocated by migration code */
>> +bool is_allocated;
>>  } SaveVMHandlers;
>>  
>>  int register_savevm(DeviceState *dev,
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index d971e5e..187f386 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -628,6 +628,7 @@ int register_savevm(DeviceState *dev,
>>  SaveVMHandlers *ops = g_new0(SaveVMHandlers, 1);
>>  ops->save_state = save_state;
>>  ops->load_state = load_state;
>> +ops->is_allocated = true;
>>  return register_savevm_live(dev, idstr, instance_id, version_id,
>>  ops, opaque);
>>  }
>> @@ -651,7 +652,9 @@ void unregister_savevm(DeviceState *dev, const char 
>> *idstr, void *opaque)
>>  if (strcmp(se->idstr, id) == 0 && se->opaque == opaque) {
>>  QTAILQ_REMOVE(&savevm_state.handlers, se, entry);
>>  g_free(se->compat);
>> -g_free(se->ops);
>> +if (se->ops->is_allocated) {
>
> Would it be good to check against (se->ops && se->ops->is_allocated)?
> Since I see that devices registered via
> vmstate_register_with_alias_id() won't have this se->ops. I just don't
> know whether that case will be allowed to be unregistered with the
> current function.

Good point.  I thought that not having ->ops was wrong, but I had
completely forgotten about the alias case.  Fixing it.

Thanks, Juan.



[Qemu-devel] [PATCH] vhost-user: pass message as a pointer to process_message_reply()

2017-05-24 Thread Maxime Coquelin
process_message_reply() was recently updated to get the full message
content instead of only its request field.

There is no need to copy the whole struct content onto the stack, so
just pass a pointer to it.

Cc: Zhiyong Yang 
Fixes: 60cd11024f41 ("hw/virtio: fix vhost user fails to startup when MQ")
Signed-off-by: Maxime Coquelin 
---
 hw/virtio/vhost-user.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index b87a176..baf2487 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -162,11 +162,11 @@ fail:
 }
 
 static int process_message_reply(struct vhost_dev *dev,
- VhostUserMsg msg)
+ VhostUserMsg *msg)
 {
 VhostUserMsg msg_reply;
 
-if ((msg.flags & VHOST_USER_NEED_REPLY_MASK) == 0) {
+if ((msg->flags & VHOST_USER_NEED_REPLY_MASK) == 0) {
 return 0;
 }
 
@@ -174,10 +174,10 @@ static int process_message_reply(struct vhost_dev *dev,
 return -1;
 }
 
-if (msg_reply.request != msg.request) {
+if (msg_reply.request != msg->request) {
 error_report("Received unexpected msg type."
  "Expected %d received %d",
- msg.request, msg_reply.request);
+ msg->request, msg_reply.request);
 return -1;
 }
 
@@ -324,7 +324,7 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
 }
 
 if (reply_supported) {
-return process_message_reply(dev, msg);
+return process_message_reply(dev, &msg);
 }
 
 return 0;
@@ -716,7 +716,7 @@ static int vhost_user_net_set_mtu(struct vhost_dev *dev, 
uint16_t mtu)
 
 /* If reply_ack supported, slave has to ack specified MTU is valid */
 if (reply_supported) {
-return process_message_reply(dev, msg);
+return process_message_reply(dev, &msg);
 }
 
 return 0;
-- 
2.9.4




Re: [Qemu-devel] new to qemu with a simple question

2017-05-24 Thread Wang Dong

Thank you, Fam.


On 05/24/2017 11:11 AM, Fam Zheng wrote:

On Wed, 05/24 10:56, Wang Dong wrote:

Some C source code is generated from JSON files.

I wonder why this is? What are the benefits of this?

This allows you to concentrate on the high-level semantics of the QMP API,
without worrying about C syntax and structural code here and there. In
short: more conciseness and less code duplication.

If done in C, it would be many more lines of code and hardly readable.
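For example, a complete QMP command is declared in the schema with a
single line like this (a simplified illustration of the
qapi-schema.json format):

    { 'command': 'query-status', 'returns': 'StatusInfo' }

and the generator then emits all the C marshalling and dispatch code
for it.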

Fam



--
Best regards. Wang Dong




Re: [Qemu-devel] [PATCH] vhost-user: pass message as a pointer to process_message_reply()

2017-05-24 Thread Marc-André Lureau
On Wed, May 24, 2017 at 11:35 AM Maxime Coquelin 
wrote:

> process_message_reply() was recently updated to get full message
> content instead of only its request field.
>
> There is no need to copy all the struct content into the stack,
> so just pass its pointer.
>
> Cc: Zhiyong Yang 
> Fixes: 60cd11024f41 ("hw/virtio: fix vhost user fails to startup when MQ")
> Signed-off-by: Maxime Coquelin 
> ---
>  hw/virtio/vhost-user.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index b87a176..baf2487 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -162,11 +162,11 @@ fail:
>  }
>
>  static int process_message_reply(struct vhost_dev *dev,
> - VhostUserMsg msg)
> + VhostUserMsg *msg)
>

Can you make it const?


>  {
>  VhostUserMsg msg_reply;
>
> -if ((msg.flags & VHOST_USER_NEED_REPLY_MASK) == 0) {
> +if ((msg->flags & VHOST_USER_NEED_REPLY_MASK) == 0) {
>  return 0;
>  }
>
> @@ -174,10 +174,10 @@ static int process_message_reply(struct vhost_dev
> *dev,
>  return -1;
>  }
>
> -if (msg_reply.request != msg.request) {
> +if (msg_reply.request != msg->request) {
>  error_report("Received unexpected msg type."
>   "Expected %d received %d",
> - msg.request, msg_reply.request);
> + msg->request, msg_reply.request);
>  return -1;
>  }
>
> @@ -324,7 +324,7 @@ static int vhost_user_set_mem_table(struct vhost_dev
> *dev,
>  }
>
>  if (reply_supported) {
> -return process_message_reply(dev, msg);
> +return process_message_reply(dev, &msg);
>  }
>
>  return 0;
> @@ -716,7 +716,7 @@ static int vhost_user_net_set_mtu(struct vhost_dev
> *dev, uint16_t mtu)
>
>  /* If reply_ack supported, slave has to ack specified MTU is valid */
>  if (reply_supported) {
> -return process_message_reply(dev, msg);
> +return process_message_reply(dev, &msg);
>  }
>
>  return 0;
> --
> 2.9.4
>
>
> --
Marc-André Lureau


Re: [Qemu-devel] [PATCH] vhost-user: pass message as a pointer to process_message_reply()

2017-05-24 Thread Maxime Coquelin



On 05/24/2017 10:40 AM, Marc-André Lureau wrote:



On Wed, May 24, 2017 at 11:35 AM Maxime Coquelin
<maxime.coque...@redhat.com> wrote:


process_message_reply() was recently updated to get full message
content instead of only its request field.

There is no need to copy all the struct content into the stack,
so just pass its pointer.

Cc: Zhiyong Yang <zhiyong.y...@intel.com>
Fixes: 60cd11024f41 ("hw/virtio: fix vhost user fails to startup
when MQ")
Signed-off-by: Maxime Coquelin <maxime.coque...@redhat.com>
---
  hw/virtio/vhost-user.c | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index b87a176..baf2487 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -162,11 +162,11 @@ fail:
  }

  static int process_message_reply(struct vhost_dev *dev,
- VhostUserMsg msg)
+ VhostUserMsg *msg)


Can you make it const?

Sure!

Thanks,
Maxime



Re: [Qemu-devel] [PATCH v2 4/4] 9pfs: local: metadata file for the VirtFS root

2017-05-24 Thread Greg Kurz
On Wed, 24 May 2017 01:08:55 +0200
Leo Gaspard  wrote:

> On 05/23/2017 07:13 PM, Eric Blake wrote:
> > We have to block VIRTFS_META_DIR at any depth in the hierarchy, but
> > can/should we change the blocking of VIRTFS_META_ROOT_FILE to only
> > happen at the root directory, rather than at all directories?  On the
> > other hand, if you can simultaneously map /path/to/a for one mount
> > point, and /path/to/a/b for another, then the root file for B is visible
> > at a lower depth than the root file for A, and blocking the metafile
> > name everywhere means that the mount A can't influence the behavior on
> > the mount for B.  
> 
> If you take this kind of vulnerabilities into account, then you also
> have to consider a mix of mapped-file and mapped-attr mounts, or even
> worse a proxy with a mapped-file mount (which I think is currently
> vulnerable to this threat if the "proxy" path points above the
> "local,security_model=mapped-file" path, as the check is done in
> "local_" functions, which are I guess not used for proxy-type virtfs)
> 
> I'm clearly not saying it's an invalid attack (there is no explicit
> documentation stating it's insecure to "nest" virtual mounts"), just
> saying it's a much larger attack surface than one internal to virtfs
> mapped-file only. Then the question of what is reasonably possible to
> forbid to the user arises, and that's not one I could answer.
> 

The attack surface is infinite. Untrusted code that could help a guest to
forge custom metadata may reside anywhere actually, not necessarily in
another QEMU-based guest... 

My motivation to hide VIRTFS_META_ROOT_FILE everywhere was more for
consistency (ie. open(VIRTFS_META_ROOT_FILE) always fails, no matter
the cwd) and for simplicity.

Cheers,

--
Greg

> Cheers & HTH,
> Leo
> 





Re: [Qemu-devel] [PATCH] vhost-user: pass message as a pointer to process_message_reply()

2017-05-24 Thread Yang, Zhiyong
Hi, Maxime:

> -Original Message-
> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> Sent: Wednesday, May 24, 2017 4:35 PM
> To: Yang, Zhiyong ; m...@redhat.com; qemu-
> de...@nongnu.org; jfrei...@redhat.com; marcandre.lur...@redhat.com
> Cc: Maxime Coquelin 
> Subject: [PATCH] vhost-user: pass message as a pointer to
> process_message_reply()
> 
> process_message_reply() was recently updated to get full message content
> instead of only its request field.
> 
> There is no need to copy all the struct content into the stack, so just pass 
> its
> pointer.
> 
> Cc: Zhiyong Yang 
> Fixes: 60cd11024f41 ("hw/virtio: fix vhost user fails to startup when MQ")
> Signed-off-by: Maxime Coquelin 

Good modification. Thanks.

Reviewed-by:  Zhiyong Yang 

> ---
>  hw/virtio/vhost-user.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index
> b87a176..baf2487 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -162,11 +162,11 @@ fail:
>  }
> 
>  static int process_message_reply(struct vhost_dev *dev,
> - VhostUserMsg msg)
> + VhostUserMsg *msg)
>  {
>  VhostUserMsg msg_reply;
> 
> -if ((msg.flags & VHOST_USER_NEED_REPLY_MASK) == 0) {
> +if ((msg->flags & VHOST_USER_NEED_REPLY_MASK) == 0) {
>  return 0;
>  }
> 
> @@ -174,10 +174,10 @@ static int process_message_reply(struct vhost_dev
> *dev,
>  return -1;
>  }
> 
> -if (msg_reply.request != msg.request) {
> +if (msg_reply.request != msg->request) {
>  error_report("Received unexpected msg type."
>   "Expected %d received %d",
> - msg.request, msg_reply.request);
> + msg->request, msg_reply.request);
>  return -1;
>  }
> 
> @@ -324,7 +324,7 @@ static int vhost_user_set_mem_table(struct vhost_dev
> *dev,
>  }
> 
>  if (reply_supported) {
> -return process_message_reply(dev, msg);
> +return process_message_reply(dev, &msg);
>  }
> 
>  return 0;
> @@ -716,7 +716,7 @@ static int vhost_user_net_set_mtu(struct vhost_dev
> *dev, uint16_t mtu)
> 
>  /* If reply_ack supported, slave has to ack specified MTU is valid */
>  if (reply_supported) {
> -return process_message_reply(dev, msg);
> +return process_message_reply(dev, &msg);
>  }
> 
>  return 0;
> --
> 2.9.4




Re: [Qemu-devel] [PATCH] vhost-user: pass message as a pointer to process_message_reply()

2017-05-24 Thread Jens Freimann
On Wed, May 24, 2017 at 10:42:30AM +0200, Maxime Coquelin wrote:
> 
> 
> On 05/24/2017 10:40 AM, Marc-André Lureau wrote:
> > 
> > 
> > On Wed, May 24, 2017 at 11:35 AM Maxime Coquelin
> > <maxime.coque...@redhat.com> wrote:
> > 
> > process_message_reply() was recently updated to get full message
> > content instead of only its request field.
> > 
> > There is no need to copy all the struct content into the stack,
> > so just pass its pointer.
> > 
> > Cc: Zhiyong Yang <zhiyong.y...@intel.com>
> > Fixes: 60cd11024f41 ("hw/virtio: fix vhost user fails to startup
> > when MQ")
> > Signed-off-by: Maxime Coquelin <maxime.coque...@redhat.com>
> > ---
> >   hw/virtio/vhost-user.c | 12 ++--
> >   1 file changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index b87a176..baf2487 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -162,11 +162,11 @@ fail:
> >   }
> > 
> >   static int process_message_reply(struct vhost_dev *dev,
> > - VhostUserMsg msg)
> > + VhostUserMsg *msg)
> > 
> > 
> > Can you make it const?
> Sure!

with that:

Reviewed-by: Jens Freimann 



Re: [Qemu-devel] [PATCH v2 0/4] 9pfs: local: fix metadata of mapped-file security mode

2017-05-24 Thread Greg Kurz
On Wed, 24 May 2017 00:59:29 +0200
Leo Gaspard  wrote:

> On 05/23/2017 04:32 PM, Greg Kurz wrote:
> > v2: - posted patch for CVE-2017-7493 separately
> > - other changes available in each patch changelog
> > 
> > Leo,
> > 
> > If you find time to test this series, I'll gladly add your Tested-by: to
> > it before merging.  
> 
> Just tested with a base of 2.9.0 with patches [1] [2] (from my
> distribution), [3] (required to apply cleanly) and this patchset.
> 
> Things appear to work as expected, and .virtfs_metadata{,_root} appear
> to be neither readable nor writable by any user.
> 

Shall I add your Tested-by: to the patch then ?

> That said, one thing still bothering me with the fix in [3] is that it
> still "leaks" the host's uid/gid to the guest when a corresponding file
> in .virtfs_metadata is not present (while I'd have expected it to appear
> as root:root in the guest), but that's a separate issue, and I guess
> backward compatibility prevents fixing it.
> 

Heh, I had a tentative patch to create root:root credentials and 0700 mode
bits by default... but this could indeed break some setups, so I decided
not to post it.

> Thanks for these patches!

Thanks for the testing! :)

Cheers,

--
Greg

> Leo
> 
> 
> [1]
> https://github.com/NixOS/nixpkgs/blob/master/pkgs/applications/virtualization/qemu/force-uid0-on-9p.patch
> 
> [2]
> https://github.com/NixOS/nixpkgs/blob/master/pkgs/applications/virtualization/qemu/no-etc-install.patch
> 
> [3] https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg03663.html
> 





[Qemu-devel] [PATCH] target/s390x/cpu_models: Allow some additional feature bits for the "qemu" CPU

2017-05-24 Thread Thomas Huth
Currently we only present the plain z900 feature bits to the guest,
but QEMU already emulates some additional features (but not all of
the next CPU generation, so we can not use the next CPU level as
default yet). Since newer Linux kernels are checking the feature bits
and refuse to work if a required feature is missing, it would be nice
to have a way to present more of the supported features when we are
running with the "qemu" CPU.
This patch now adds the supported features to the "full_feat" bitmap,
so that additional features can be enabled on the command line now,
for example with:

 qemu-system-s390x -cpu qemu,stfle=true,ldisp=true,eimm=true,stckf=true

Signed-off-by: Thomas Huth 
---
 Note: This is a revised version of my previous patch "Set some additional
 feature bits for the qemu CPU" - but this time, the feature bits are not
 enabled by default, so this version should not create any issues with
 migration, I think. Rationale for not enabling them by default yet: it does
 not make sense to add all the migration compatibility magic here yet as
 long as we do not provide enough facilities yet for running recent Linux
 kernels, so having just a way to enable the features on the command line
 for further development/experiments should be enough right now.
 Another change wrt. to my original patch: I've now also added the
 additional feature bits that Aurelien suggested.

 target/s390x/cpu_models.c | 33 ++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index 8d27363..654a6bc 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -658,6 +658,29 @@ static void check_compatibility(const S390CPUModel 
*max_model,
   "available in the configuration: ");
 }
 
+/**
+ * The base TCG CPU model "qemu" is based on the z900. However, we already
+ * can also emulate some additional features of later CPU generations, so
+ * we add these additional feature bits here.
+ */
+static void add_qemu_cpu_model_features(S390FeatBitmap fbm)
+{
+int i, feats[] = {
+S390_FEAT_STFLE,
+S390_FEAT_EXTENDED_IMMEDIATE,
+S390_FEAT_LONG_DISPLACEMENT,
+S390_FEAT_LONG_DISPLACEMENT_FAST,
+S390_FEAT_STORE_CLOCK_FAST,
+S390_FEAT_GENERAL_INSTRUCTIONS_EXT,
+S390_FEAT_EXECUTE_EXT,
+S390_FEAT_STFLE_45,
+};
+
+for (i = 0; i < ARRAY_SIZE(feats); i++) {
+set_bit(feats[i], fbm);
+}
+}
+
 static S390CPUModel *get_max_cpu_model(Error **errp)
 {
 static S390CPUModel max_model;
@@ -670,10 +693,11 @@ static S390CPUModel *get_max_cpu_model(Error **errp)
 if (kvm_enabled()) {
 kvm_s390_get_host_cpu_model(&max_model, errp);
 } else {
-/* TCG emulates a z900 */
+/* TCG emulates a z900 (with some optional additional features) */
 max_model.def = &s390_cpu_defs[0];
 bitmap_copy(max_model.features, max_model.def->default_feat,
 S390_FEAT_MAX);
+add_qemu_cpu_model_features(max_model.features);
 }
 if (!*errp) {
 cached = true;
@@ -925,11 +949,14 @@ static void s390_host_cpu_model_initfn(Object *obj)
 
 static void s390_qemu_cpu_model_initfn(Object *obj)
 {
+static S390CPUDef s390_qemu_cpu_defs;
 S390CPU *cpu = S390_CPU(obj);
 
 cpu->model = g_malloc0(sizeof(*cpu->model));
-/* TCG emulates a z900 */
-cpu->model->def = &s390_cpu_defs[0];
+/* TCG emulates a z900 (with some optional additional features) */
+memcpy(&s390_qemu_cpu_defs, &s390_cpu_defs[0], sizeof(s390_qemu_cpu_defs));
+add_qemu_cpu_model_features(s390_qemu_cpu_defs.full_feat);
+cpu->model->def = &s390_qemu_cpu_defs;
 bitmap_copy(cpu->model->features, cpu->model->def->default_feat,
 S390_FEAT_MAX);
 }
-- 
1.8.3.1




[Qemu-devel] [PATCH 0/3] Remove of loadvm handlers

2017-05-24 Thread Juan Quintela
Hi

The loadvm handlers are a separate list that is only used in a single
place.  Just move everything to use the savevm_handlers (yes, it is a
list, and we could have a better name).

Once there, vmstate_load() had three arguments but only needs two.  Fix that.

Please, review.

Juan Quintela (3):
  migration: Use savevm_handlers instead of loadvm copy
  migration: loadvm handlers are not used
  migration: Remove section_id parameter from vmstate_load

 include/migration/migration.h |  5 
 include/migration/vmstate.h   |  2 --
 include/qemu/typedefs.h   |  1 -
 migration/migration.c |  2 --
 migration/savevm.c| 58 ---
 5 files changed, 16 insertions(+), 52 deletions(-)

-- 
2.9.3




[Qemu-devel] [PATCH 1/3] migration: Use savevm_handlers instead of loadvm copy

2017-05-24 Thread Juan Quintela
From: Juan Quintela 

There is no reason for having the loadvm_handlers at all.  There is
only one use, and we can use the savevm handlers.

We will remove the loadvm handlers on a following patch.

Signed-off-by: Juan Quintela 
---
 migration/savevm.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index d971e5e..55ac8c1 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1792,7 +1792,7 @@ struct LoadStateEntry {
  * Returns: true if the footer was good
  *  false if there is a problem (and calls error_report to say why)
  */
-static bool check_section_footer(QEMUFile *f, LoadStateEntry *le)
+static bool check_section_footer(QEMUFile *f, SaveStateEntry *se)
 {
 uint8_t read_mark;
 uint32_t read_section_id;
@@ -1805,15 +1805,15 @@ static bool check_section_footer(QEMUFile *f, 
LoadStateEntry *le)
 read_mark = qemu_get_byte(f);
 
 if (read_mark != QEMU_VM_SECTION_FOOTER) {
-error_report("Missing section footer for %s", le->se->idstr);
+error_report("Missing section footer for %s", se->idstr);
 return false;
 }
 
 read_section_id = qemu_get_be32(f);
-if (read_section_id != le->section_id) {
+if (read_section_id != se->section_id) {
 error_report("Mismatched section id in footer for %s -"
  " read 0x%x expected 0x%x",
- le->se->idstr, read_section_id, le->section_id);
+ se->idstr, read_section_id, se->section_id);
 return false;
 }
 
@@ -1887,7 +1887,7 @@ qemu_loadvm_section_start_full(QEMUFile *f, 
MigrationIncomingState *mis)
  " device '%s'", instance_id, idstr);
 return ret;
 }
-if (!check_section_footer(f, le)) {
+if (!check_section_footer(f, se)) {
 return -EINVAL;
 }
 
@@ -1898,29 +1898,29 @@ static int
 qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
 {
 uint32_t section_id;
-LoadStateEntry *le;
+SaveStateEntry *se;
 int ret;
 
 section_id = qemu_get_be32(f);
 
 trace_qemu_loadvm_state_section_partend(section_id);
-QLIST_FOREACH(le, &mis->loadvm_handlers, entry) {
-if (le->section_id == section_id) {
+QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+if (se->section_id == section_id) {
 break;
 }
 }
-if (le == NULL) {
+if (se == NULL) {
 error_report("Unknown savevm section %d", section_id);
 return -EINVAL;
 }
 
-ret = vmstate_load(f, le->se, le->version_id);
+ret = vmstate_load(f, se, se->version_id);
 if (ret < 0) {
 error_report("error while loading state section id %d(%s)",
- section_id, le->se->idstr);
+ section_id, se->idstr);
 return ret;
 }
-if (!check_section_footer(f, le)) {
+if (!check_section_footer(f, se)) {
 return -EINVAL;
 }
 
-- 
2.9.3




[Qemu-devel] [PATCH 2/3] migration: loadvm handlers are not used

2017-05-24 Thread Juan Quintela
From: Juan Quintela 

So we remove all traces of them.

Signed-off-by: Juan Quintela 
---
 include/migration/migration.h |  5 -
 include/migration/vmstate.h   |  2 --
 include/qemu/typedefs.h   |  1 -
 migration/migration.c |  2 --
 migration/savevm.c| 28 +---
 5 files changed, 1 insertion(+), 37 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 0e807b6..d1a353a 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -50,8 +50,6 @@ enum mig_rp_message_type {
 MIG_RP_MSG_MAX
 };
 
-typedef QLIST_HEAD(, LoadStateEntry) LoadStateEntry_Head;
-
 /* State for the incoming migration */
 struct MigrationIncomingState {
 QEMUFile *from_src_file;
@@ -89,9 +87,6 @@ struct MigrationIncomingState {
 /* The coroutine we should enter (back) after failover */
 Coroutine *migration_incoming_co;
 QemuSemaphore colo_incoming_sem;
-
-/* See savevm.c */
-LoadStateEntry_Head loadvm_handlers;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index f97411d..6689562 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -1020,8 +1020,6 @@ extern const VMStateInfo vmstate_info_qtailq;
 
 #define SELF_ANNOUNCE_ROUNDS 5
 
-void loadvm_free_handlers(MigrationIncomingState *mis);
-
 int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
void *opaque, int version_id);
 void vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 33a6aa1..51958bf 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -39,7 +39,6 @@ typedef struct I2SCodec I2SCodec;
 typedef struct ISABus ISABus;
 typedef struct ISADevice ISADevice;
 typedef struct IsaDma IsaDma;
-typedef struct LoadStateEntry LoadStateEntry;
 typedef struct MACAddr MACAddr;
 typedef struct MachineClass MachineClass;
 typedef struct MachineState MachineState;
diff --git a/migration/migration.c b/migration/migration.c
index ad29e53..891a260 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -121,7 +121,6 @@ MigrationIncomingState *migration_incoming_get_current(void)
 if (!once) {
 mis_current.state = MIGRATION_STATUS_NONE;
 memset(&mis_current, 0, sizeof(MigrationIncomingState));
-QLIST_INIT(&mis_current.loadvm_handlers);
 qemu_mutex_init(&mis_current.rp_mutex);
 qemu_event_init(&mis_current.main_thread_load_event, false);
 once = true;
@@ -134,7 +133,6 @@ void migration_incoming_state_destroy(void)
 struct MigrationIncomingState *mis = migration_incoming_get_current();
 
 qemu_event_destroy(&mis->main_thread_load_event);
-loadvm_free_handlers(mis);
 }
 
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 55ac8c1..96b7173 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1779,13 +1779,6 @@ static int loadvm_process_command(QEMUFile *f)
 return 0;
 }
 
-struct LoadStateEntry {
-QLIST_ENTRY(LoadStateEntry) entry;
-SaveStateEntry *se;
-int section_id;
-int version_id;
-};
-
 /*
  * Read a footer off the wire and check that it matches the expected section
  *
@@ -1821,22 +1814,11 @@ static bool check_section_footer(QEMUFile *f, 
SaveStateEntry *se)
 return true;
 }
 
-void loadvm_free_handlers(MigrationIncomingState *mis)
-{
-LoadStateEntry *le, *new_le;
-
-QLIST_FOREACH_SAFE(le, &mis->loadvm_handlers, entry, new_le) {
-QLIST_REMOVE(le, entry);
-g_free(le);
-}
-}
-
 static int
 qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
 {
 uint32_t instance_id, version_id, section_id;
 SaveStateEntry *se;
-LoadStateEntry *le;
 char idstr[256];
 int ret;
 
@@ -1873,15 +1855,7 @@ qemu_loadvm_section_start_full(QEMUFile *f, 
MigrationIncomingState *mis)
 return -EINVAL;
 }
 
-/* Add entry */
-le = g_malloc0(sizeof(*le));
-
-le->se = se;
-le->section_id = section_id;
-le->version_id = version_id;
-QLIST_INSERT_HEAD(&mis->loadvm_handlers, le, entry);
-
-ret = vmstate_load(f, le->se, le->version_id);
+ret = vmstate_load(f, se, se->version_id);
 if (ret < 0) {
 error_report("error while loading state for instance 0x%x of"
  " device '%s'", instance_id, idstr);
-- 
2.9.3




[Qemu-devel] [PATCH 3/3] migration: Remove section_id parameter from vmstate_load

2017-05-24 Thread Juan Quintela
From: Juan Quintela 

Everything else assumes that we always load a device from its own
savevm handler.

Signed-off-by: Juan Quintela 
---
 migration/savevm.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 96b7173..e847a6b 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -721,13 +721,13 @@ void vmstate_unregister(DeviceState *dev, const 
VMStateDescription *vmsd,
 }
 }
 
-static int vmstate_load(QEMUFile *f, SaveStateEntry *se, int version_id)
+static int vmstate_load(QEMUFile *f, SaveStateEntry *se)
 {
 trace_vmstate_load(se->idstr, se->vmsd ? se->vmsd->name : "(old)");
 if (!se->vmsd) { /* Old style */
-return se->ops->load_state(f, se->opaque, version_id);
+return se->ops->load_state(f, se->opaque, se->version_id);
 }
-return vmstate_load_state(f, se->vmsd, se->opaque, version_id);
+return vmstate_load_state(f, se->vmsd, se->opaque, se->version_id);
 }
 
 static void vmstate_save_old_style(QEMUFile *f, SaveStateEntry *se, QJSON 
*vmdesc)
@@ -1855,7 +1855,7 @@ qemu_loadvm_section_start_full(QEMUFile *f, 
MigrationIncomingState *mis)
 return -EINVAL;
 }
 
-ret = vmstate_load(f, se, se->version_id);
+ret = vmstate_load(f, se);
 if (ret < 0) {
 error_report("error while loading state for instance 0x%x of"
  " device '%s'", instance_id, idstr);
@@ -1888,7 +1888,7 @@ qemu_loadvm_section_part_end(QEMUFile *f, 
MigrationIncomingState *mis)
 return -EINVAL;
 }
 
-ret = vmstate_load(f, se, se->version_id);
+ret = vmstate_load(f, se);
 if (ret < 0) {
 error_report("error while loading state section id %d(%s)",
  section_id, se->idstr);
-- 
2.9.3




Re: [Qemu-devel] [PATCH] block/gluster: glfs_lseek() workaround

2017-05-24 Thread Niels de Vos
On Tue, May 23, 2017 at 01:27:50PM -0400, Jeff Cody wrote:
> On current released versions of glusterfs, glfs_lseek() will sometimes
> return invalid values for SEEK_DATA or SEEK_HOLE.  For SEEK_DATA and
> SEEK_HOLE, the returned value should be >= the passed offset, or < 0 in
> the case of error:
> 
> LSEEK(2):
> 
> off_t lseek(int fd, off_t offset, int whence);
> 
> [...]
> 
> SEEK_HOLE
>   Adjust  the file offset to the next hole in the file greater
>   than or equal to offset.  If offset points into the middle of
>   a hole, then the file offset is set to offset.  If there is no
>   hole past offset, then the file offset is adjusted to the end
>   of the file (i.e., there is  an implicit hole at the end of
>   any file).
> 
> [...]
> 
> RETURN VALUE
>   Upon  successful  completion,  lseek()  returns  the resulting
>   offset location as measured in bytes from the beginning of the
>   file.  On error, the value (off_t) -1 is returned and errno is
>   set to indicate the error
> 
> However, occasionally glfs_lseek() for SEEK_HOLE/DATA will return a
> value less than the passed offset, yet greater than zero.
> 
> For instance, here are example values observed from this call:
> 
> offs = glfs_lseek(s->fd, start, SEEK_HOLE);
> if (offs < 0) {
> return -errno;  /* D1 and (H3 or H4) */
> }
> 
> start == 7608336384
> offs == 7607877632
> 
> This causes QEMU to abort on the assert test.  When this value is
> returned, errno is also 0.
> 
> This is a reported and known bug to glusterfs:
> https://bugzilla.redhat.com/show_bug.cgi?id=1425293
> 
> Although this is being fixed in gluster, we still should work around it
> in QEMU, given that multiple released versions of gluster behave this
> way.

Versions of GlusterFS 3.8.0 - 3.8.8 are affected, 3.8.9 has the fix
already and was released in February this year. We encourage users to
update to recent versions, and provide a stable (bugfix only) update
each month.

The Red Hat Gluster Storage product that is often used in combination
with QEMU (+oVirt) does unfortunately not have an update where this is
fixed.

Using an unfixed Gluster storage environment can cause QEMU to segfault.
Although fixes exist for Gluster, not all users will have them
installed. Preventing the segfault in QEMU due to a broken storage
environment makes sense to me.

> This patch treats the return case of (offs < start) the same as if an
> error value other than ENXIO is returned; we will assume we learned
> nothing, and there are no holes in the file.
> 
> Signed-off-by: Jeff Cody 
> ---
>  block/gluster.c | 18 --
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/block/gluster.c b/block/gluster.c
> index 7c76cd0..c147909e 100644
> --- a/block/gluster.c
> +++ b/block/gluster.c
> @@ -1275,7 +1275,14 @@ static int find_allocation(BlockDriverState *bs, off_t 
> start,
>  if (offs < 0) {
>  return -errno;  /* D3 or D4 */
>  }
> -assert(offs >= start);
> +
> +if (offs < start) {
> +/* This is not a valid return by lseek().  We are safe to just return
> + * -EIO in this case, and we'll treat it like D4. Unfortunately some
> + *  versions of libgfapi will return offs < start, so an assert here
> + *  will unnecessarily abort QEMU. */

This is not really correct: the problem is not in libgfapi, but in the
protocol translator on the server side. The version of libgfapi does not
matter here; it is the version on the server that does. But that might be
too much detail for the comment.

> +return -EIO;
> +}
>  
>  if (offs > start) {
>  /* D2: in hole, next data at offs */
> @@ -1307,7 +1314,14 @@ static int find_allocation(BlockDriverState *bs, off_t 
> start,
>  if (offs < 0) {
>  return -errno;  /* D1 and (H3 or H4) */
>  }
> -assert(offs >= start);
> +
> +if (offs < start) {
> +/* This is not a valid return by lseek().  We are safe to just return
> + * -EIO in this case, and we'll treat it like H4. Unfortunately some
> + *  versions of libgfapi will return offs < start, so an assert here
> + *  will unnecessarily abort QEMU. */
> +return -EIO;
> +}
>  
>  if (offs > start) {
>  /*
> -- 
> 2.9.3
> 

You might want to explain the problem a little differently in the commit
message. It is also fine if you think it would become too detailed; my
explanation is in the archives now in any case.

Reviewed-by: Niels de Vos 
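
To show the defensive shape of the fix in plain POSIX terms, here is a
minimal sketch under assumptions (seek_checked is a hypothetical helper
name; the real change lives inside find_allocation() in block/gluster.c):
any SEEK_DATA/SEEK_HOLE result below the requested offset is turned into
an I/O error instead of being asserted on.

    #define _GNU_SOURCE        /* SEEK_DATA / SEEK_HOLE on Linux */
    #include <errno.h>
    #include <unistd.h>

    /* Next data/hole offset at or after 'start', or -errno on failure.
     * Buggy servers may return 0 <= offs < start; map that to -EIO so a
     * broken storage backend cannot abort the process via an assert. */
    static off_t seek_checked(int fd, off_t start, int whence)
    {
        off_t offs = lseek(fd, start, whence);   /* SEEK_DATA or SEEK_HOLE */

        if (offs < 0) {
            return -errno;   /* e.g. -ENXIO when past EOF with SEEK_DATA */
        }
        if (offs < start) {
            return -EIO;     /* invalid result: assume we learned nothing */
        }
        return offs;
    }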



Re: [Qemu-devel] [PATCH v3 15/24] docker: add powerpc build target

2017-05-24 Thread Alex Bennée

Philippe Mathieu-Daudé  writes:

> Hi Alex,
>
> On 05/22/2017 11:08 AM, Alex Bennée wrote:
>>
>> Philippe Mathieu-Daudé  writes:
>>
>>> Signed-off-by: Philippe Mathieu-Daudé 
>>> ---
>>>  tests/docker/Makefile.include  |  4 +--
>>>  .../docker/dockerfiles/debian-powerpc-cross.docker | 40 
>>> ++
>>>  2 files changed, 42 insertions(+), 2 deletions(-)
>>>  create mode 100644 tests/docker/dockerfiles/debian-powerpc-cross.docker
>>>
>>> diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
>>> index 111b8090b2..9815976486 100644
>>> --- a/tests/docker/Makefile.include
>>> +++ b/tests/docker/Makefile.include
>>> @@ -56,11 +56,13 @@ docker-image-%: $(DOCKER_FILES_DIR)/%.docker
>>> "BUILD","$*")
>>>
>>>  docker-image-debian-mipsel-cross: 
>>> EXTRA_FILES:=tests/docker/dockerfiles/debian-apt-fake.sh
>>> +docker-image-debian-powerpc-cross: 
>>> EXTRA_FILES:=tests/docker/dockerfiles/debian-apt-fake.sh
>>>
>>>  # Enforce dependancies for composite images
>>>  docker-image-debian-armhf-cross: docker-image-debian
>>>  docker-image-debian-arm64-cross: docker-image-debian
>>>  docker-image-debian-mipsel-cross: docker-image-debian
>>> +docker-image-debian-powerpc-cross: docker-image-debian
>>>
>>>  # Expand all the pre-requistes for each docker image and test combination
>>>  $(foreach i,$(DOCKER_IMAGES), \
>>> @@ -111,8 +113,6 @@ docker:
>>> @echo 'NOUSER   Define to disable adding current user 
>>> to containers passwd.'
>>> @echo 'NOCACHE=1Ignore cache when build images.'
>>> @echo 'EXECUTABLE=Include executable in image.'
>>> -   @echo 'EXTRA_FILES=" [... ]"'
>>> -   @echo ' Include extra files in image.'
>>
>> I'm fairly sure you didn't want to do this.
>
> Oops, good catch :) Rebase mistake.
> Are you OK with putting those 2 lines back? (I think so, since it is pretty
> much a copy/paste of the mipsel Dockerfile.)
> I'll send a fixed version with your other R-b after Fam's review, thanks!

Sure its fine with that fixed:

Reviewed-by: Alex Bennée 

>
>>
>>>
>>>  # This rule if for directly running against an arbitrary docker target.
>>>  # It is called by the expanded docker targets (e.g. make
>>> diff --git a/tests/docker/dockerfiles/debian-powerpc-cross.docker 
>>> b/tests/docker/dockerfiles/debian-powerpc-cross.docker
>>> new file mode 100644
>>> index 00..fa2cc7a657
>>> --- /dev/null
>>> +++ b/tests/docker/dockerfiles/debian-powerpc-cross.docker
>>> @@ -0,0 +1,40 @@
>>> +#
>>> +# Docker powerpc cross-compiler target
>>> +#
>>> +# This docker target builds on the base debian image.
>>> +#
>>> +FROM qemu:debian
>>> +MAINTAINER Philippe Mathieu-Daudé 
>>> +
>>> +# Add the foreign architecture we want and install dependencies
>>> +RUN dpkg --add-architecture powerpc
>>> +RUN apt-get update
>>> +RUN DEBIAN_FRONTEND=noninteractive eatmydata \
>>> +apt-get install -y --no-install-recommends \
>>> +crossbuild-essential-powerpc
>>> +
>>> +#  to fix "following packages have unmet dependencies" ...
>>> +ADD debian-apt-fake.sh /usr/local/bin/apt-fake
>>> +RUN apt-get install -y --no-install-recommends \
>>> +equivs \
>>> +pkg-config
>>> +RUN apt-fake install \
>>> +pkg-config:powerpc=0.28-1.1-fake && \
>>> +ln -s pkg-config /usr/bin/powerpc-linux-gnu-pkg-config
>>> +ENV PKG_CONFIG_PATH /usr/lib/powerpc-linux-gnu/pkgconfig
>>> +# 
>>> +
>>> +# Specify the cross prefix for this image (see tests/docker/common.rc)
>>> +ENV QEMU_CONFIGURE_OPTS --cross-prefix=powerpc-linux-gnu-
>>> +
>>> +RUN DEBIAN_FRONTEND=noninteractive eatmydata \
>>> +apt-get build-dep -yy -a powerpc qemu
>>> +RUN DEBIAN_FRONTEND=noninteractive \
>>> +apt-get install -y --no-install-recommends \
>>> +glusterfs-common:powerpc \
>>> +libbz2-dev:powerpc \
>>> +liblzo2-dev:powerpc \
>>> +libncursesw5-dev:powerpc \
>>> +libnfs-dev:powerpc \
>>> +librdmacm-dev:powerpc \
>>> +libsnappy-dev:powerpc
>>
>>
>> --
>> Alex Bennée
>>


--
Alex Bennée



Re: [Qemu-devel] [PATCH] docker: Add flex and bison to centos6 image

2017-05-24 Thread Alex Bennée

Fam Zheng  writes:

> Currently there are warnings about flex and bison being missing when
> building in the centos6 image:
>
> make[1]: flex: Command not found
>  BISON dtc-parser.tab.c
> make[1]: bison: Command not found
>
> Add them.
>
> Reported-by: Thomas Huth 
> Signed-off-by: Fam Zheng 

Reviewed-by: Alex Bennée 

> ---
>  tests/docker/dockerfiles/centos6.docker | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/docker/dockerfiles/centos6.docker 
> b/tests/docker/dockerfiles/centos6.docker
> index 34e0d3b..17a4d24 100644
> --- a/tests/docker/dockerfiles/centos6.docker
> +++ b/tests/docker/dockerfiles/centos6.docker
> @@ -1,7 +1,7 @@
>  FROM centos:6
>  RUN yum install -y epel-release
>  ENV PACKAGES libfdt-devel ccache \
> -tar git make gcc g++ \
> +tar git make gcc g++ flex bison \
>  zlib-devel glib2-devel SDL-devel pixman-devel \
>  epel-release
>  RUN yum install -y $PACKAGES


--
Alex Bennée



Re: [Qemu-devel] [PATCH] pc: ACPI BIOS: use highest NUMA node for hotplug mem hole SRAT entry

2017-05-24 Thread no-reply
Hi,

This series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Message-id: 20170524080914.23484-1-lpro...@redhat.com
Subject: [Qemu-devel] [PATCH] pc: ACPI BIOS: use highest NUMA node for hotplug 
mem hole SRAT entry
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-quick@centos6
time make docker-test-mingw@fedora
time make docker-test-build@min-glib
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]  patchew/20170524080914.23484-1-lpro...@redhat.com -> 
patchew/20170524080914.23484-1-lpro...@redhat.com
 * [new tag] patchew/20170524083446.32261-1-maxime.coque...@redhat.com 
-> patchew/20170524083446.32261-1-maxime.coque...@redhat.com
Switched to a new branch 'test'
4beffef pc: ACPI BIOS: use highest NUMA node for hotplug mem hole SRAT entry

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-git7xdn0/src/dtc'...
Submodule path 'dtc': checked out '558cd81bdd432769b59bff01240c44f82cfb1a9d'
  BUILD   centos6
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-git7xdn0/src'
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPYRUNNER
RUN test-quick in qemu:centos6 
Packages installed:
SDL-devel-1.2.14-7.el6_7.1.x86_64
ccache-3.1.6-2.el6.x86_64
epel-release-6-8.noarch
gcc-4.4.7-17.el6.x86_64
git-1.7.1-4.el6_7.1.x86_64
glib2-devel-2.28.8-5.el6.x86_64
libfdt-devel-1.4.0-1.el6.x86_64
make-3.81-23.el6.x86_64
package g++ is not installed
pixman-devel-0.32.8-1.el6.x86_64
tar-1.23-15.el6_8.x86_64
zlib-devel-1.2.3-29.el6.x86_64

Environment variables:
PACKAGES=libfdt-devel ccache tar git make gcc g++ zlib-devel 
glib2-devel SDL-devel pixman-devel epel-release
HOSTNAME=ceab1df38e0d
TERM=xterm
MAKEFLAGS= -j8
HISTSIZE=1000
J=8
USER=root
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
MAIL=/var/spool/mail/root
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
LANG=en_US.UTF-8
TARGET_LIST=
HISTCONTROL=ignoredups
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
LOGNAME=root
LESSOPEN=||/usr/bin/lesspipe.sh %s
FEATURES= dtc
DEBUG=
G_BROKEN_FILENAMES=1
CCACHE_HASHDIR=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu 
--prefix=/var/tmp/qemu-build/install
No C++ compiler available; disabling C++ specific optional code
Install prefix/var/tmp/qemu-build/install
BIOS directory/var/tmp/qemu-build/install/share/qemu
binary directory  /var/tmp/qemu-build/install/bin
library directory /var/tmp/qemu-build/install/lib
module directory  /var/tmp/qemu-build/install/lib/qemu
libexec directory /var/tmp/qemu-build/install/libexec
include directory /var/tmp/qemu-build/install/include
config directory  /var/tmp/qemu-build/install/etc
local state directory   /var/tmp/qemu-build/install/var
Manual directory  /var/tmp/qemu-build/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path   /tmp/qemu-test/src
C compilercc
Host C compiler   cc
C++ compiler  
Objective-C compiler cc
ARFLAGS   rv
CFLAGS-O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g 
QEMU_CFLAGS   -I/usr/include/pixman-1   -I$(SRC_PATH)/dtc/libfdt -pthread 
-I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include   -fPIE -DPIE -m64 -mcx16 
-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv  -Wendif-labels 
-Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security 
-Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration 
-Wold-style-definition -Wtype-limits -fstack-protector-all
LDFLAGS   -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g 
make  make
install   install
pythonpython -B
smbd  /usr/sbin/smbd
module supportno
host CPU  x86_64
host big endian   no
target list   x86_64-softmmu aarch64-softmmu
tcg debug enabled no
gprof enabled no
sparse enabledno
strip binariesyes
profiler  no
static build  no
pixmansystem
SDL support   yes (1.2.14)
GTK support   no 
GTK GL supportno
VTE support   no 
TLS priority  NORMAL
GNUTLS supportno
GNUTLS rndno
libgcrypt no
libgcrypt kdf no
nettleno 
nettle kdfno
libtasn1  no
curses supportno
virgl support no
curl support  no
mingw32 support   no
Audio drivers oss
Block whitelist (rw) 
Block whitelist (ro) 
VirtFS supportno
VNC support   yes
VNC SASL support  no
VNC JPEG support  no
VNC PNG su

[Qemu-devel] [PATCH v2] vhost-user: pass message as a pointer to process_message_reply()

2017-05-24 Thread Maxime Coquelin
process_message_reply() was recently updated to get full message
content instead of only its request field.

There is no need to copy all the struct content into the stack,
so just pass its pointer as const.

Cc: Zhiyong Yang 
Fixes: 60cd11024f41 ("hw/virtio: fix vhost user fails to startup when MQ")
Reviewed-by: Jens Freimann 
Reviewed-by: Zhiyong Yang 
Signed-off-by: Maxime Coquelin 
---
V2:
 - Make msg pointer const (Marc-Andre)
 - Apply R-b's

 hw/virtio/vhost-user.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index b87a176..dde094a 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -162,11 +162,11 @@ fail:
 }
 
 static int process_message_reply(struct vhost_dev *dev,
- VhostUserMsg msg)
+ const VhostUserMsg *msg)
 {
 VhostUserMsg msg_reply;
 
-if ((msg.flags & VHOST_USER_NEED_REPLY_MASK) == 0) {
+if ((msg->flags & VHOST_USER_NEED_REPLY_MASK) == 0) {
 return 0;
 }
 
@@ -174,10 +174,10 @@ static int process_message_reply(struct vhost_dev *dev,
 return -1;
 }
 
-if (msg_reply.request != msg.request) {
+if (msg_reply.request != msg->request) {
 error_report("Received unexpected msg type."
  "Expected %d received %d",
- msg.request, msg_reply.request);
+ msg->request, msg_reply.request);
 return -1;
 }
 
@@ -324,7 +324,7 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
 }
 
 if (reply_supported) {
-return process_message_reply(dev, msg);
+return process_message_reply(dev, &msg);
 }
 
 return 0;
@@ -716,7 +716,7 @@ static int vhost_user_net_set_mtu(struct vhost_dev *dev, 
uint16_t mtu)
 
 /* If reply_ack supported, slave has to ack specified MTU is valid */
 if (reply_supported) {
-return process_message_reply(dev, msg);
+return process_message_reply(dev, &msg);
 }
 
 return 0;
-- 
2.9.4
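
For readers following the reply-ack logic, here is a compressed,
self-contained sketch of the round trip that process_message_reply()
implements. The NEED_REPLY bit position and the "u64 == 0 means success"
convention follow hw/virtio/vhost-user.c and the vhost-user spec as far as
I can tell; read_reply() is a stub standing in for the real socket read.

    #include <stdint.h>

    #define VHOST_USER_NEED_REPLY_MASK (1u << 3)   /* flags bit per the spec */

    typedef struct {
        uint32_t request;
        uint32_t flags;
        uint64_t u64;      /* reply payload: 0 = success, non-zero = error */
    } Msg;

    /* Stub for vhost_user_read(): pretend the slave acked successfully. */
    static int read_reply(const Msg *sent, Msg *reply)
    {
        reply->request = sent->request;
        reply->u64 = 0;
        return 0;
    }

    static int process_reply(const Msg *msg)
    {
        Msg reply;

        if (!(msg->flags & VHOST_USER_NEED_REPLY_MASK)) {
            return 0;                  /* no ack was requested */
        }
        if (read_reply(msg, &reply) < 0) {
            return -1;
        }
        if (reply.request != msg->request) {
            return -1;                 /* reply to the wrong request */
        }
        return reply.u64 ? -1 : 0;     /* slave signals success with u64 == 0 */
    }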




Re: [Qemu-devel] [PATCH] pc: ACPI BIOS: use highest NUMA node for hotplug mem hole SRAT entry

2017-05-24 Thread Laszlo Ersek
On 05/24/17 10:09, Ladi Prosek wrote:
> For reasons unknown, Windows won't online all memory, both at command
> line and hot-plugged later, unless the hotplug mem hole SRAT entry
> specifies a node greater or equal to the ones where memory is added.

s/greater or equal to/greater *than* or equal to/

Thanks,
Laszlo
(always finding the important issues! ;) )

> 
> Using the highest node on the machine makes recent versions of Windows
> happy.
> 
> With this example command line:
>   ... \
>   -m 1024,slots=4,maxmem=32G \
>   -numa node,nodeid=0 \
>   -numa node,nodeid=1 \
>   -numa node,nodeid=2 \
>   -numa node,nodeid=3 \
>   -object memory-backend-ram,size=1G,id=mem-mem1 \
>   -device pc-dimm,id=dimm-mem1,memdev=mem-mem1,node=1
> 
> Windows reports a total of 1G of RAM without this commit and the expected
> 2G with this commit.
> 
> Signed-off-by: Ladi Prosek 
> ---
>  hw/i386/acpi-build.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index afcadac..9653583 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2404,14 +2404,17 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
> MachineState *machine)
>  }
>  
>  /*
> - * Entry is required for Windows to enable memory hotplug in OS.
> + * Entry is required for Windows to enable memory hotplug in OS
> + * and for Linux to enable SWIOTLB even if booted with less than
> + * 4G of RAM. Windows works better if the entry sets proximity
> + * to the highest NUMA node in the machine.
>   * Memory devices may override proximity set by this entry,
>   * providing _PXM method if necessary.
>   */
>  if (hotplugabble_address_space_size) {
>  numamem = acpi_data_push(table_data, sizeof *numamem);
>  build_srat_memory(numamem, pcms->hotplug_memory.base,
> -  hotplugabble_address_space_size, 0,
> +  hotplugabble_address_space_size, pcms->numa_nodes 
> - 1,
>MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
>  }
>  
> 
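
For background, the entry being tweaked is an SRAT Memory Affinity
structure. A simplified sketch of its layout and of what a
build_srat_memory()-style helper fills in follows (stand-in struct with
field offsets per the ACPI spec; the real QEMU struct is QEMU_PACKED and
stores fields little-endian, which is omitted here for brevity):

    #include <stdint.h>
    #include <string.h>

    #define MEM_AFFINITY_ENABLED      (1u << 0)
    #define MEM_AFFINITY_HOTPLUGGABLE (1u << 1)

    /* ACPI SRAT Memory Affinity structure (type 1, 40 bytes), simplified. */
    typedef struct {
        uint8_t  type;       /* 1 = Memory Affinity */
        uint8_t  length;     /* 40 */
        uint32_t proximity;  /* NUMA node; the patch now passes numa_nodes - 1 */
        uint16_t reserved1;
        uint32_t base_lo, base_hi;   /* 64-bit base address, split */
        uint32_t len_lo, len_hi;     /* 64-bit range length, split */
        uint32_t reserved2;
        uint32_t flags;
        uint64_t reserved3;
    } SratMemAffinity;

    static void fill_srat_memory(SratMemAffinity *m, uint64_t base,
                                 uint64_t len, int node, uint32_t flags)
    {
        memset(m, 0, sizeof(*m));    /* reserved fields must be zero */
        m->type = 1;
        m->length = 40;
        m->proximity = (uint32_t)node;
        m->base_lo = (uint32_t)base;
        m->base_hi = (uint32_t)(base >> 32);
        m->len_lo = (uint32_t)len;
        m->len_hi = (uint32_t)(len >> 32);
        m->flags = flags;   /* e.g. HOTPLUGGABLE | ENABLED, as above */
    }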




Re: [Qemu-devel] [PATCH v2] vhost-user: pass message as a pointer to process_message_reply()

2017-05-24 Thread Marc-André Lureau
Hi

On Wed, May 24, 2017 at 1:06 PM Maxime Coquelin 
wrote:

> process_message_reply() was recently updated to get full message
> content instead of only its request field.
>
> There is no need to copy all the struct content into the stack,
> so just pass its pointer as const.
>
> Cc: Zhiyong Yang 
> Fixes: 60cd11024f41 ("hw/virtio: fix vhost user fails to startup when MQ")
> Reviewed-by: Jens Freimann 
> Reviewed-by: Zhiyong Yang 
> Signed-off-by: Maxime Coquelin 
>

Reviewed-by: Marc-André Lureau 


> ---
> V2:
>  - Make msg pointer const (Marc-Andre)
>  - Apply R-b's
>
>  hw/virtio/vhost-user.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index b87a176..dde094a 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -162,11 +162,11 @@ fail:
>  }
>
>  static int process_message_reply(struct vhost_dev *dev,
> - VhostUserMsg msg)
> + const VhostUserMsg *msg)
>  {
>  VhostUserMsg msg_reply;
>
> -if ((msg.flags & VHOST_USER_NEED_REPLY_MASK) == 0) {
> +if ((msg->flags & VHOST_USER_NEED_REPLY_MASK) == 0) {
>  return 0;
>  }
>
> @@ -174,10 +174,10 @@ static int process_message_reply(struct vhost_dev
> *dev,
>  return -1;
>  }
>
> -if (msg_reply.request != msg.request) {
> +if (msg_reply.request != msg->request) {
>  error_report("Received unexpected msg type."
>   "Expected %d received %d",
> - msg.request, msg_reply.request);
> + msg->request, msg_reply.request);
>  return -1;
>  }
>
> @@ -324,7 +324,7 @@ static int vhost_user_set_mem_table(struct vhost_dev
> *dev,
>  }
>
>  if (reply_supported) {
> -return process_message_reply(dev, msg);
> +return process_message_reply(dev, &msg);
>  }
>
>  return 0;
> @@ -716,7 +716,7 @@ static int vhost_user_net_set_mtu(struct vhost_dev
> *dev, uint16_t mtu)
>
>  /* If reply_ack supported, slave has to ack specified MTU is valid */
>  if (reply_supported) {
> -return process_message_reply(dev, msg);
> +return process_message_reply(dev, &msg);
>  }
>
>  return 0;
> --
> 2.9.4
>
>
> --
Marc-André Lureau


[Qemu-devel] [PATCH] load_uboot_image: don't assume a full header read

2017-05-24 Thread Andrew Jones
Don't allow load_uboot_image() to proceed when fewer bytes than the
header size were read.

Signed-off-by: Andrew Jones 
---
 hw/core/loader.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/core/loader.c b/hw/core/loader.c
index bf17b42cbec2..f72930ca4a41 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -611,8 +611,9 @@ static int load_uboot_image(const char *filename, hwaddr 
*ep, hwaddr *loadaddr,
 return -1;
 
 size = read(fd, hdr, sizeof(uboot_image_header_t));
-if (size < 0)
+if (size < 0 || size < sizeof(uboot_image_header_t)) {
 goto out;
+}
 
 bswap_uboot_header(hdr);
 
-- 
2.9.4
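
Rejecting a short read outright is reasonable for a fixed-size header on a
regular file. For comparison, a more general alternative (a hedged sketch,
not what the patch does) is a read-exactly loop that retries on EINTR and
on legitimate short reads:

    #include <errno.h>
    #include <unistd.h>

    /* Read exactly 'len' bytes or fail: 0 on success, -1 on error or
     * premature EOF.  read() may legitimately return less than requested,
     * so loop instead of trusting a single call. */
    static int read_exact(int fd, void *buf, size_t len)
    {
        char *p = buf;

        while (len > 0) {
            ssize_t n = read(fd, p, len);
            if (n < 0) {
                if (errno == EINTR) {
                    continue;        /* interrupted; retry */
                }
                return -1;
            }
            if (n == 0) {
                return -1;           /* EOF before the full buffer */
            }
            p += n;
            len -= (size_t)n;
        }
        return 0;
    }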




Re: [Qemu-devel] [PATCH] pc: ACPI BIOS: use highest NUMA node for hotplug mem hole SRAT entry

2017-05-24 Thread Ladi Prosek
On Wed, May 24, 2017 at 11:07 AM, Laszlo Ersek  wrote:
> On 05/24/17 10:09, Ladi Prosek wrote:
>> For reasons unknown, Windows won't online all memory, both at command
>> line and hot-plugged later, unless the hotplug mem hole SRAT entry
>> specifies a node greater or equal to the ones where memory is added.
>
> s/greater or equal to/greater *than* or equal to/
>
> Thanks,
> Laszlo
> (always finding the important issues! ;) )

Haha, thanks! Igor, please let me know if you'd like me to send a
corrected version.

>>
>> Using the highest node on the machine makes recent versions of Windows
>> happy.
>>
>> With this example command line:
>>   ... \
>>   -m 1024,slots=4,maxmem=32G \
>>   -numa node,nodeid=0 \
>>   -numa node,nodeid=1 \
>>   -numa node,nodeid=2 \
>>   -numa node,nodeid=3 \
>>   -object memory-backend-ram,size=1G,id=mem-mem1 \
>>   -device pc-dimm,id=dimm-mem1,memdev=mem-mem1,node=1
>>
>> Windows reports a total of 1G of RAM without this commit and the expected
>> 2G with this commit.
>>
>> Signed-off-by: Ladi Prosek 
>> ---
>>  hw/i386/acpi-build.c | 7 +--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index afcadac..9653583 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -2404,14 +2404,17 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
>> MachineState *machine)
>>  }
>>
>>  /*
>> - * Entry is required for Windows to enable memory hotplug in OS.
>> + * Entry is required for Windows to enable memory hotplug in OS
>> + * and for Linux to enable SWIOTLB even if booted with less than
>> + * 4G of RAM. Windows works better if the entry sets proximity
>> + * to the highest NUMA node in the machine.
>>   * Memory devices may override proximity set by this entry,
>>   * providing _PXM method if necessary.
>>   */
>>  if (hotplugabble_address_space_size) {
>>  numamem = acpi_data_push(table_data, sizeof *numamem);
>>  build_srat_memory(numamem, pcms->hotplug_memory.base,
>> -  hotplugabble_address_space_size, 0,
>> +  hotplugabble_address_space_size, pcms->numa_nodes 
>> - 1,
>>MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
>>  }
>>
>>
>



Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS

2017-05-24 Thread Greg Kurz
On Wed, 24 May 2017 15:07:54 +1000
David Gibson  wrote:

> On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:
> > If the OS is not started, QEMU sends an event to the OS
> > that is lost and cannot be recovered. An unplug is not
> > able to restore QEMU in a coherent state.
> > So, while the OS is not started, disable CPU and memory hotplug.
> > We use option vector 6 to know if the OS is started
> > 
> > Signed-off-by: Laurent Vivier   
> 
> Urgh.. I'm not terribly confident that this is really correct.  As
> discussed on the previous patch, you're essentially using OV6 as a
> flag that CAS is complete.
> 
> But while it undoubtedly makes the race window much smaller, I don't
> see that there's any guarantee the guest OS will really be able to
> handle hotplug events immediately after CAS.
> 
> In particular if the CAS process completes partially but then needs to
> trigger a reboot, I think that would end up setting the ov6 variable,
> but the OS would definitely not be in a state to accept events.
> 

Actually, we never have any guarantee that the OS will process an event
that we've sent (think of a kernel crash just after a successful CAS
negotiation, for example, or any failure in the various guest components
involved in the hotplug process).

> Mike, I really think we need some input from someone familiar with how
> these hotplug events are supposed to work.  What do we need to do to
> handle lost or stale events, such as those delivered when an OS is not
> booted.
> 

AFAIK, in the PowerVM world, the HMC exposes a user configurable timeout.

https://www.ibm.com/support/knowledgecenter/POWER8/p8hat/p8hat_dlparprocpoweraddp6.htm

I'm not sure we can do anything better than being able to "cancel" a previous
hotplug attempt if it takes too long, but I'm not necessarily the expert you're
looking for :)

> > ---
> >  hw/ppc/spapr.c | 22 +++---
> >  1 file changed, 19 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index eceb4cc..2e9320d 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -2625,6 +2625,7 @@ out:
> >  static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState 
> > *dev,
> >  Error **errp)
> >  {
> > +sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
> >  PCDIMMDevice *dimm = PC_DIMM(dev);
> >  PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> >  MemoryRegion *mr = ddc->get_memory_region(dimm);
> > @@ -2645,6 +2646,13 @@ static void spapr_memory_pre_plug(HotplugHandler 
> > *hotplug_dev, DeviceState *dev,
> >  goto out;
> >  }
> >  
> > +if (dev->hotplugged) {
> > +if (!ms->os_name) {
> > +error_setg(&local_err, "Memory hotplug not supported without 
> > OS");
> > +goto out;
> > +}
> > +}
> > +
> >  out:
> >  error_propagate(errp, local_err);
> >  }
> > @@ -2874,6 +2882,7 @@ static void spapr_core_pre_plug(HotplugHandler 
> > *hotplug_dev, DeviceState *dev,
> >  Error **errp)
> >  {
> >  MachineState *machine = MACHINE(OBJECT(hotplug_dev));
> > +sPAPRMachineState *ms = SPAPR_MACHINE(machine);
> >  MachineClass *mc = MACHINE_GET_CLASS(hotplug_dev);
> >  Error *local_err = NULL;
> >  CPUCore *cc = CPU_CORE(dev);
> > @@ -2884,9 +2893,16 @@ static void spapr_core_pre_plug(HotplugHandler 
> > *hotplug_dev, DeviceState *dev,
> >  int node_id;
> >  int index;
> >  
> > -if (dev->hotplugged && !mc->has_hotpluggable_cpus) {
> > -error_setg(&local_err, "CPU hotplug not supported for this 
> > machine");
> > -goto out;
> > +if (dev->hotplugged) {
> > +if (!mc->has_hotpluggable_cpus) {
> > +error_setg(&local_err,
> > +   "CPU hotplug not supported for this machine");
> > +goto out;
> > +}
> > +if (!ms->os_name) {
> > +error_setg(&local_err, "CPU hotplug not supported without OS");
> > +goto out;
> > +}
> >  }
> >  
> >  if (strcmp(base_core_type, type)) {  
> 





Re: [Qemu-devel] [PATCH V6 08/10] migration: calculate vCPU blocktime on dst side

2017-05-24 Thread Alexey
On Wed, May 24, 2017 at 03:53:05PM +0800, Peter Xu wrote:
> On Tue, May 23, 2017 at 02:31:09PM +0300, Alexey Perevalov wrote:
> > This patch provides blocktime calculation per vCPU,
> > as a summary and as an overlapped value for all vCPUs.
> > 
> > This approach was suggested by Peter Xu, as an improvement of the
> > previous approach where QEMU kept a tree with faulted page address and cpus
> > bitmask
> > in it. Now QEMU is keeping array with faulted page address as value and vCPU
> > as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
> > list for blocktime per vCPU (could be traced with page_fault_addr)
> > 
> > Blocktime will not be calculated if the postcopy_blocktime field of
> > MigrationIncomingState wasn't initialized.
> > 
> > Signed-off-by: Alexey Perevalov 
> > ---
> >  migration/postcopy-ram.c | 102 
> > ++-
> >  migration/trace-events   |   5 ++-
> >  2 files changed, 105 insertions(+), 2 deletions(-)
> > 
> > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > index d647769..e70c44b 100644
> > --- a/migration/postcopy-ram.c
> > +++ b/migration/postcopy-ram.c
> > @@ -23,6 +23,7 @@
> >  #include "postcopy-ram.h"
> >  #include "sysemu/sysemu.h"
> >  #include "sysemu/balloon.h"
> > +#include 
> >  #include "qemu/error-report.h"
> >  #include "trace.h"
> >  
> > @@ -577,6 +578,101 @@ static int ram_block_enable_notify(const char 
> > *block_name, void *host_addr,
> >  return 0;
> >  }
> >  
> > +static int get_mem_fault_cpu_index(uint32_t pid)
> > +{
> > +CPUState *cpu_iter;
> > +
> > +CPU_FOREACH(cpu_iter) {
> > +if (cpu_iter->thread_id == pid) {
> > +return cpu_iter->cpu_index;
> > +}
> > +}
> > +trace_get_mem_fault_cpu_index(pid);
> > +return -1;
> > +}
> > +
> > +static void mark_postcopy_blocktime_begin(uint64_t addr, uint32_t ptid,
> > +RAMBlock *rb)
> > +{
> > +int cpu;
> > +unsigned long int nr_bit;
> > +MigrationIncomingState *mis = migration_incoming_get_current();
> > +PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
> > +int64_t now_ms;
> > +
> > +if (!dc || ptid == 0) {
> > +return;
> > +}
> > +cpu = get_mem_fault_cpu_index(ptid);
> > +if (cpu < 0) {
> > +return;
> > +}
> > +nr_bit = get_copied_bit_offset(addr);
> > +if (test_bit(nr_bit, mis->copied_pages)) {
> > +return;
> > +}
> > +now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > +if (dc->vcpu_addr[cpu] == 0) {
> > +atomic_inc(&dc->smp_cpus_down);
> > +}
> > +
> > +atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);
> > +atomic_xchg__nocheck(&dc->last_begin, now_ms);
> > +atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], now_ms);
> 
> Looks like this is not what you and Dave have discussed?
> 
> (Btw, sorry to have not followed the thread recently, so I just went
>  over the discussion again...)
> 
> What I see that Dave suggested is (I copied from Dave's email):
> 
> blocktime_start:
>set CPU stall address
>check bitmap entry
>  if set then zero stall-address
> 
> While here it is:
> 
> blocktime_start:
>check bitmap entry
>  if set then return
>set CPU stall address
> 
> I don't think the current version can really solve the race condition. See
> this possible sequence:
> 
>receive-thread fault-thread 
>-- 
>   blocktime_start
> check bitmap entry,
>   if set then return
>blocktime_end
>  set bitmap entry
>  read CPU stall address,
>if none-0 then zero it
> set CPU stall address [1]
> 
> Then imho the address set at [1] will stay stalled forever.
> 
Agreed, my check is in the incorrect order.

> I think we should follow exactly what Dave has suggested.
> 
> And... on second thought, I am afraid even this would not cover
> all race conditions. What if we take the UFFDIO_COPY ioctl into account?
> AFAIU after UFFDIO_COPY the faulted vcpu can be running again, then
> the question is, can it quickly trigger another page fault?
>
yes, it can

> Firstly, a workable sequence is (adding UFFDIO_COPY ioctl in, and
> showing vcpu-thread X as well):
> 
>   receive-thread   fault-threadvcpu-thread X
>   --   -
>fault at addr A1
>fault_addr[X]=A1
>   UFFDIO_COPY page A1
>   check fault_addr[X] with A1
> if match, clear fault_addr[X]
>vcpu X starts
> 
> This is fine.
> 
> But since "vcpu X starts" can happen right after UFFDIO_COPY, is this
> possible?
The previous picture isn't possible, because mark_postcopy_blocktime_end
is called right after the ioctl, and the vCPU is w
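
To make the suggested ordering concrete, here is a minimal sketch using C11
atomics and hypothetical names (fault_addr[] for the per-vCPU stall address,
copied[] standing in for the copied-pages bitmap). The point is that the
fault side publishes the address before testing the bitmap, so whichever
side runs second cleans up, and no address can stay stalled forever:

    #include <stdatomic.h>
    #include <stdint.h>

    #define NCPU  8
    #define NBITS 1024

    static _Atomic uint64_t fault_addr[NCPU];   /* 0 = vCPU not stalled */
    static _Atomic int copied[NBITS];           /* stand-in for copied_pages */

    /* fault thread: blocktime_start */
    static void blocktime_start(int cpu, uint64_t addr, int bit)
    {
        atomic_store(&fault_addr[cpu], addr);     /* 1. set CPU stall address */
        if (atomic_load(&copied[bit])) {          /* 2. check bitmap entry */
            atomic_store(&fault_addr[cpu], 0);    /* 3. already copied: un-stall */
        }
    }

    /* receive thread: blocktime_end, right after a successful UFFDIO_COPY */
    static void blocktime_end(uint64_t addr, int bit)
    {
        atomic_store(&copied[bit], 1);            /* 1. set bitmap entry */
        for (int cpu = 0; cpu < NCPU; cpu++) {    /* 2. read stall addresses */
            if (atomic_load(&fault_addr[cpu]) == addr) {
                atomic_store(&fault_addr[cpu], 0);   /* account blocktime here */
            }
        }
    }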

Re: [Qemu-devel] [PATCH] pc: ACPI BIOS: use highest NUMA node for hotplug mem hole SRAT entry

2017-05-24 Thread Igor Mammedov
On Wed, 24 May 2017 11:16:14 +0200
Ladi Prosek  wrote:

> On Wed, May 24, 2017 at 11:07 AM, Laszlo Ersek  wrote:
> > On 05/24/17 10:09, Ladi Prosek wrote:  
> >> For reasons unknown, Windows won't online all memory, both at command
> >> line and hot-plugged later, unless the hotplug mem hole SRAT entry
> >> specifies a node greater or equal to the ones where memory is added.  
> >
> > s/greater or equal to/greater *than* or equal to/
> >
> > Thanks,
> > Laszlo
> > (always finding the important issues! ;) )  
> 
> Haha, thanks! Igor, please let me know if you'd like me to send a
> corrected version.
if you are going to respin, see the nit below

Also, could you check that 'make V=1 check' is happy?
Since the patch changes the SRAT table, the bios-tables test may fail.
If that happens, then please add a note to the commit message.
You can verify that updating the reference blobs fixes make check with
tests/acpi-test-data/rebuild-expected-aml.sh

---
note to maintainer:
tests/acpi-test-data needs to be updated due to the change in the SRAT table

> 
> >>
> >> Using the highest node on the machine makes recent versions of Windows
> >> happy.
> >>
> >> With this example command line:
> >>   ... \
> >>   -m 1024,slots=4,maxmem=32G \
> >>   -numa node,nodeid=0 \
> >>   -numa node,nodeid=1 \
> >>   -numa node,nodeid=2 \
> >>   -numa node,nodeid=3 \
> >>   -object memory-backend-ram,size=1G,id=mem-mem1 \
> >>   -device pc-dimm,id=dimm-mem1,memdev=mem-mem1,node=1
> >>
> >> Windows reports a total of 1G of RAM without this commit and the expected
> >> 2G with this commit.
> >>
> >> Signed-off-by: Ladi Prosek 
> >> ---
> >>  hw/i386/acpi-build.c | 7 +--
> >>  1 file changed, 5 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> >> index afcadac..9653583 100644
> >> --- a/hw/i386/acpi-build.c
> >> +++ b/hw/i386/acpi-build.c
> >> @@ -2404,14 +2404,17 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
> >> MachineState *machine)
> >>  }
> >>
> >>  /*
> >> - * Entry is required for Windows to enable memory hotplug in OS.
> >> + * Entry is required for Windows to enable memory hotplug in OS
> >> + * and for Linux to enable SWIOTLB even if booted with less than
s/even if/when/

> >> + * 4G of RAM. Windows works better if the entry sets proximity
> >> + * to the highest NUMA node in the machine.
> >>   * Memory devices may override proximity set by this entry,
> >>   * providing _PXM method if necessary.
> >>   */
> >>  if (hotplugabble_address_space_size) {
> >>  numamem = acpi_data_push(table_data, sizeof *numamem);
> >>  build_srat_memory(numamem, pcms->hotplug_memory.base,
> >> -  hotplugabble_address_space_size, 0,
> >> +  hotplugabble_address_space_size, 
> >> pcms->numa_nodes - 1,
> >>MEM_AFFINITY_HOTPLUGGABLE | 
> >> MEM_AFFINITY_ENABLED);
> >>  }
> >>
> >>  
> >  




Re: [Qemu-devel] [PATCH 0/3] Remove of loadvm handlers

2017-05-24 Thread Peter Xu
On Wed, May 24, 2017 at 10:55:16AM +0200, Juan Quintela wrote:
> Hi
> 
> We just have a loadvm handlers that are a new list only used in a
> single place.  Just move everything to use the savevm_handlers (yes,
> it is a list, and we could have a better name).
> 
> Once there, vmstate_load() had three arguments but only needs two.  Fix that.
> 
> Please, review.
> 
> Juan Quintela (3):
>   migration: Use savevm_handlers instead of loadvm copy
>   migration: loadvm handlers are not used
>   migration: Remove section_id parameter from vmstate_load
> 
>  include/migration/migration.h |  5 
>  include/migration/vmstate.h   |  2 --
>  include/qemu/typedefs.h   |  1 -
>  migration/migration.c |  2 --
>  migration/savevm.c| 58 
> ---
>  5 files changed, 16 insertions(+), 52 deletions(-)
> 
> -- 
> 2.9.3
> 

This series looks nice to me. :-)

Series:

Reviewed-by: Peter Xu 

Thanks,

-- 
Peter Xu



Re: [Qemu-devel] [PATCH 1/7] qcow2: Remove unused Error in do_perform_cow()

2017-05-24 Thread Alberto Garcia
On Tue 23 May 2017 10:21:49 PM CEST, Eric Blake wrote:

>> qcow2_encrypt_sectors() does not need an Error parameter, and we're
>> not checking its value anyway, so we can safely remove it.
>
> Misleading. You are NOT removing the Error parameter from
> qcow2_encrypt_sectors(), but rather are explicitly ignoring any errors
> by passing NULL.

Ok, I'll update the comment in the next revision with something like
what you suggest.

Berto



Re: [Qemu-devel] [Qemu-ppc] [PATCH 1/4] spapr: add pre_plug function for memory

2017-05-24 Thread Greg Kurz
On Wed, 24 May 2017 14:52:36 +1000
David Gibson  wrote:
[...]
> 
> This patch seems like a good idea regardless of the rest, so I've
> fixed the minor nits Greg pointed out and merged to ppc-for-2.10.
> 

David,

Commit d2e4c6a1437fab2fbb4553b598f25e282c475199 in your ppc-for-2.10 branch
doesn't compile:

+static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState 
*dev,
+  Error **errp)
+{
+PCDIMMDevice *dimm = PC_DIMM(dev);
+PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
+MemoryRegion *mr = ddc->get_memory_region(dimm);
+uint64_t size = memory_region_size(mr);
+char *mem_dev;
+
+if (size % SPAPR_MEMORY_BLOCK_SIZE) {
+error_setg(&local_err, "Hotplugged memory size must be a multiple of "

s/&local_err/errp/

+  "%lld MB", SPAPR_MEMORY_BLOCK_SIZE / M_BYTE);
+goto out;

s/goto out/return/

+}
+
+mem_dev = object_property_get_str(OBJECT(dimm), PC_DIMM_MEMDEV_PROP, NULL);
+if (mem_dev && !kvmppc_is_mem_backend_page_size_ok(mem_dev)) {
+error_setg(errp, "Memory backend has bad page size. "
+   "Use 'memory-backend-file' with correct mem-path.");
+}
+}
+




Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS

2017-05-24 Thread Igor Mammedov
On Wed, 24 May 2017 11:28:57 +0200
Greg Kurz  wrote:

> On Wed, 24 May 2017 15:07:54 +1000
> David Gibson  wrote:
> 
> > On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:  
> > > If the OS is not started, QEMU sends an event to the OS
> > > that is lost and cannot be recovered. An unplug is not
> > > able to restore QEMU in a coherent state.
> > > So, while the OS is not started, disable CPU and memory hotplug.
> > > We use option vector 6 to know if the OS is started
> > > 
> > > Signed-off-by: Laurent Vivier 
> > 
> > Urgh.. I'm not terribly confident that this is really correct.  As
> > discussed on the previous patch, you're essentially using OV6 as a
> > flag that CAS is complete.
> > 
> > But while it undoubtedly makes the race window much smaller, I don't
> > see that there's any guarantee the guest OS will really be able to
> > handle hotplug events immediately after CAS.
> > 
> > In particular if the CAS process completes partially but then needs to
> > trigger a reboot, I think that would end up setting the ov6 variable,
> > but the OS would definitely not be in a state to accept events.
wouldn't the guest on reboot pick up the updated fdt and online the cpu
that was hotplugged before the crash along with the initial cpus?

> We never have any guarantee that the OS will process an event that
> we've sent actually (think of a kernel crash just after a successful
> CAS negotiation for example, or any failure with the various guest
> components involved in the process of hotplug).
> 
> > Mike, I really think we need some input from someone familiar with how
> > these hotplug events are supposed to work.  What do we need to do to
> > handle lost or stale events, such as those delivered when an OS is not
> > booted.
> >   
> 
> AFAIK, in the PowerVM world, the HMC exposes a user configurable timeout.
> 
> https://www.ibm.com/support/knowledgecenter/POWER8/p8hat/p8hat_dlparprocpoweraddp6.htm
> 
> I'm not sure we can do anything better than being able to "cancel" a previous
> hotplug attempt if it takes too long, but I'm not necessarily the expert 
> you're
> looking for :)
From the x86/ACPI world:
 - if hotplug happens early at boot before the guest OS is running,
   the hotplug notification (SCI interrupt) stays pending, and once the
   guest is up it will/should handle it and online the CPU
 - if the guest crashed and is rebooted, it will pick up updated ACPI
   tables (fdt equivalent)
   with all present cpus (including the one hotplugged before the crash)
   and online the hotplugged cpu along with the coldplugged ones
 - if the guest loses the SCI somehow, it's considered a guest issue and
   such a cpu stays un-removable until the guest picks it up somehow
   (reboot, manually running the cpus scan
   method from ACPI, or another cpu hotplug event) and explicitly ejects it.

Taking into account that CPUs don't support surprise removal and require
guest cooperation, it's fine to leave a CPU plugged in until the guest
ejects it. That's what I'd expect to happen on bare metal:
you hotplug a CPU, the hardware notifies the OS about it, and that's all;
the cpu won't suddenly pop out if the OS isn't able to online it.

Moreover, that hotplugged cpu might be executing some code, or one of the
already present cpus might be executing initialization routines to online
it (think of host overcommit and arbitrary delays), so it is not really safe
to remove a hotplugged but not yet onlined cpu without the OS's consent
(i.e. explicit eject by OS/firmware). I think the lost event handling should
be fixed on the guest side and not in QEMU.




Re: [Qemu-devel] [virtio-dev] Re: virtio crypto device implemenation

2017-05-24 Thread Cornelia Huck
On Wed, 24 May 2017 04:13:47 +0300
"Michael S. Tsirkin"  wrote:

> On Tue, May 23, 2017 at 04:08:25PM +, Zeng, Xin wrote:
> > Hi, Michael, 
> >As you know, Lei Gong from Huawei and I are working together on the virtio
> > crypto device spec; he is focusing on the symmetric algorithm part, and I am
> > focusing on the asymmetric part.  Now I am planning the implementation for
> > the asymmetric part; would you please give me your opinion on the questions below?
> >Current virtio crypto device implementation from Lei Gong:
> >The virtio crypto device implementation has been upstreamed to QEMU and 
> > it has a qemu backend implementation for symmetric algorithm part, the 
> > front end Linux device driver for symmetric part has been upstreamed to 
> > Linux kernel as well.
> >My questions:
> >From my side, I planned to add the asymmetric part support to the upstreamed
> > front end device driver. I don't want to add the asymmetric algorithm
> > support to the current virtio crypto device's qemu backend; instead, I would
> > like to implement and upstream a DPDK vhost-user based backend for the
> > asymmetric algorithm, and accordingly Lei Gong will help to upstream a
> > vhost user agent for the virtio crypto device in QEMU.  Is this approach
> > acceptable? Is a qemu backend a mandatory requirement for the virtio crypto
> > device?  Is there a general policy for this?
> > 
> > Thanks
> 
> Parity on QEMU side is naturally preferable.  I don't think we should require 
> it
> at all times, but if there's no implementation outside vhost-user,
> and if the feature includes a non-trivial amount of code, how
> will it be tested? I don't think we want to require all testers to use
> dpdk. An implementation under tests using libvhost-user might
> be a solution.

From the s390 perspective, I'd naturally prefer a qemu implementation.
I think there is value in being able to try it out on a variety of
platforms, so that you can shake out problems such as endianness easily.




Re: [Qemu-devel] [Qemu-ppc] [PATCH 1/4] spapr: add pre_plug function for memory

2017-05-24 Thread David Gibson
On Wed, May 24, 2017 at 11:55:13AM +0200, Greg Kurz wrote:
> On Wed, 24 May 2017 14:52:36 +1000
> David Gibson  wrote:
> [...]
> > 
> > This patch seems like a good idea regardless of the rest, so I've
> > fixed the minor nits Greg pointed out and merged to ppc-for-2.10.
> > 
> 
> David,
> 
> Commit d2e4c6a1437fab2fbb4553b598f25e282c475199 in your ppc-for-2.10 branch
> doesn't compile:
> 
> +static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState 
> *dev,
> +  Error **errp)
> +{
> +PCDIMMDevice *dimm = PC_DIMM(dev);
> +PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> +MemoryRegion *mr = ddc->get_memory_region(dimm);
> +uint64_t size = memory_region_size(mr);
> +char *mem_dev;
> +
> +if (size % SPAPR_MEMORY_BLOCK_SIZE) {
> +error_setg(&local_err, "Hotplugged memory size must be a multiple of 
> "
> 
> s/&local_err/errp/
> 
> +  "%lld MB", SPAPR_MEMORY_BLOCK_SIZE / M_BYTE);
> +goto out;
> 
> s/goto out/return/
> 
> +}
> +
> +mem_dev = object_property_get_str(OBJECT(dimm), PC_DIMM_MEMDEV_PROP, 
> NULL);
> +if (mem_dev && !kvmppc_is_mem_backend_page_size_ok(mem_dev)) {
> +error_setg(errp, "Memory backend has bad page size. "
> +   "Use 'memory-backend-file' with correct mem-path.");
> +}
> +}
> +

Sorry, I found and fixed that already, but forgot to push the update.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




[Qemu-devel] [PATCH v2] migration: Allow unregister of save_live handlers

2017-05-24 Thread Juan Quintela
For non save_live handlers, the migration code dynamically allocates the
ops member.  Save_live handlers have it passed as an argument, and it
remains the responsibility of the caller.  Add a new member,
is_allocated, that remembers whether ops has to be freed.  This
allows unregister_savevm() to work with save_live handlers.

Signed-off-by: Juan Quintela 

---

check that se->ops is not NULL (thanks peterx)
---
 include/migration/vmstate.h | 2 ++
 migration/savevm.c  | 5 -
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index f97411d..1d20e30 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -57,6 +57,8 @@ typedef struct SaveVMHandlers {
   uint64_t *non_postcopiable_pending,
   uint64_t *postcopiable_pending);
 LoadStateHandler *load_state;
+/* Has been allocated by migration code */
+bool is_allocated;
 } SaveVMHandlers;
 
 int register_savevm(DeviceState *dev,
diff --git a/migration/savevm.c b/migration/savevm.c
index d971e5e..569add3 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -628,6 +628,7 @@ int register_savevm(DeviceState *dev,
 SaveVMHandlers *ops = g_new0(SaveVMHandlers, 1);
 ops->save_state = save_state;
 ops->load_state = load_state;
+ops->is_allocated = true;
 return register_savevm_live(dev, idstr, instance_id, version_id,
 ops, opaque);
 }
@@ -651,7 +652,9 @@ void unregister_savevm(DeviceState *dev, const char *idstr, 
void *opaque)
 if (strcmp(se->idstr, id) == 0 && se->opaque == opaque) {
 QTAILQ_REMOVE(&savevm_state.handlers, se, entry);
 g_free(se->compat);
-g_free(se->ops);
+if (se->ops && se->ops->is_allocated) {
+g_free(se->ops);
+}
 g_free(se);
 }
 }
-- 
2.9.3




Re: [Qemu-devel] [PATCH V6 08/10] migration: calculate vCPU blocktime on dst side

2017-05-24 Thread Peter Xu
On Wed, May 24, 2017 at 12:37:20PM +0300, Alexey wrote:
> On Wed, May 24, 2017 at 03:53:05PM +0800, Peter Xu wrote:
> > On Tue, May 23, 2017 at 02:31:09PM +0300, Alexey Perevalov wrote:
> > > This patch provides blocktime calculation per vCPU,
> > > as a summary and as an overlapped value for all vCPUs.
> > > 
> > > This approach was suggested by Peter Xu, as an improvement on the
> > > previous approach where QEMU kept a tree with the faulted page address
> > > and a cpus bitmask in it. Now QEMU keeps an array with the faulted
> > > page address as value and the vCPU as index. It helps to find the
> > > proper vCPU at UFFD_COPY time. It also keeps a list of blocktimes per
> > > vCPU (which can be traced with page_fault_addr).
> > > 
> > > Blocktime will not be calculated if the postcopy_blocktime field of
> > > MigrationIncomingState wasn't initialized.
> > > 
> > > Signed-off-by: Alexey Perevalov 
> > > ---
> > >  migration/postcopy-ram.c | 102 
> > > ++-
> > >  migration/trace-events   |   5 ++-
> > >  2 files changed, 105 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > index d647769..e70c44b 100644
> > > --- a/migration/postcopy-ram.c
> > > +++ b/migration/postcopy-ram.c
> > > @@ -23,6 +23,7 @@
> > >  #include "postcopy-ram.h"
> > >  #include "sysemu/sysemu.h"
> > >  #include "sysemu/balloon.h"
> > > +#include 
> > >  #include "qemu/error-report.h"
> > >  #include "trace.h"
> > >  
> > > @@ -577,6 +578,101 @@ static int ram_block_enable_notify(const char 
> > > *block_name, void *host_addr,
> > >  return 0;
> > >  }
> > >  
> > > +static int get_mem_fault_cpu_index(uint32_t pid)
> > > +{
> > > +CPUState *cpu_iter;
> > > +
> > > +CPU_FOREACH(cpu_iter) {
> > > +if (cpu_iter->thread_id == pid) {
> > > +return cpu_iter->cpu_index;
> > > +}
> > > +}
> > > +trace_get_mem_fault_cpu_index(pid);
> > > +return -1;
> > > +}
> > > +
> > > +static void mark_postcopy_blocktime_begin(uint64_t addr, uint32_t ptid,
> > > +RAMBlock *rb)
> > > +{
> > > +int cpu;
> > > +unsigned long int nr_bit;
> > > +MigrationIncomingState *mis = migration_incoming_get_current();
> > > +PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
> > > +int64_t now_ms;
> > > +
> > > +if (!dc || ptid == 0) {
> > > +return;
> > > +}
> > > +cpu = get_mem_fault_cpu_index(ptid);
> > > +if (cpu < 0) {
> > > +return;
> > > +}
> > > +nr_bit = get_copied_bit_offset(addr);
> > > +if (test_bit(nr_bit, mis->copied_pages)) {
> > > +return;
> > > +}
> > > +now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > +if (dc->vcpu_addr[cpu] == 0) {
> > > +atomic_inc(&dc->smp_cpus_down);
> > > +}
> > > +
> > > +atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);
> > > +atomic_xchg__nocheck(&dc->last_begin, now_ms);
> > > +atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], now_ms);
> > 
> > Looks like this is not what you and Dave have discussed?
> > 
> > (Btw, sorry to have not followed the thread recently, so I just went
> >  over the discussion again...)
> > 
> > What I see that Dave suggested is (I copied from Dave's email):
> > 
> > blocktime_start:
> >set CPU stall address
> >check bitmap entry
> >  if set then zero stall-address
> > 
> > While here it is:
> > 
> > blocktime_start:
> >check bitmap entry
> >  if set then return
> >set CPU stall address
> > 
> > I don't think the current version can really solve the race condition. See
> > this possible sequence:
> > 
> >    receive-thread                 fault-thread
> >    --------------                 ------------
> >                                   blocktime_start
> >                                     check bitmap entry,
> >                                       if set then return
> >    blocktime_end
> >      set bitmap entry
> >      read CPU stall address,
> >        if non-0 then zero it
> >                                   set CPU stall address [1]
> > 
> > Then imho the address set at [1] will be stall again until forever.
> > 
> agree, my check is in the incorrect order
> 
> > I think we should follow exactly what Dave has suggested.
> > 
> > And.. after a second thought, I am afraid even this would not satisfy
> > all race conditions. What if we consider the UFFDIO_COPY ioctl in?
> > AFAIU after UFFDIO_COPY the faulted vcpu can be running again, then
> > the question is, can it quickly trigger another page fault?
> >
> yes, it can
> 
> > Firstly, a workable sequence is (adding UFFDIO_COPY ioctl in, and
> > showing vcpu-thread X as well):
> > 
> >    receive-thread         fault-thread         vcpu-thread X
> >    --------------         ------------         -------------
> >                                                fault at addr A1
> >                           fault_addr[X]=A1
> >    UFFDIO_COPY page A1
> >    check fault_

Re: [Qemu-devel] [PATCH v2] migration: Allow unregister of save_live handlers

2017-05-24 Thread Peter Xu
On Wed, May 24, 2017 at 01:21:52PM +0200, Juan Quintela wrote:
> Non-save_live migration handlers have an ops member that is
> dynamically allocated by migration code.  Save_live handlers have it
> passed as an argument, and it remains the responsibility of the caller.
> Add a new member, is_allocated, that remembers whether ops has to be
> freed.  This allows unregister_savevm() to work with save_live handlers.
> 
> Signed-off-by: Juan Quintela 

I thought Laurent will prepare one patch that will directly remove
existing register_savevm() callers, no?

> 
> ---
> 
> check that se->ops is not NULL (thanks peterx)
> ---
>  include/migration/vmstate.h | 2 ++
>  migration/savevm.c  | 5 -
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index f97411d..1d20e30 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -57,6 +57,8 @@ typedef struct SaveVMHandlers {
>uint64_t *non_postcopiable_pending,
>uint64_t *postcopiable_pending);
>  LoadStateHandler *load_state;
> +/* Has been allocated by migration code */
> +bool is_allocated;
>  } SaveVMHandlers;
>  
>  int register_savevm(DeviceState *dev,
> diff --git a/migration/savevm.c b/migration/savevm.c
> index d971e5e..569add3 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -628,6 +628,7 @@ int register_savevm(DeviceState *dev,
>  SaveVMHandlers *ops = g_new0(SaveVMHandlers, 1);
>  ops->save_state = save_state;
>  ops->load_state = load_state;
> +ops->is_allocated = true;
>  return register_savevm_live(dev, idstr, instance_id, version_id,
>  ops, opaque);
>  }
> @@ -651,7 +652,9 @@ void unregister_savevm(DeviceState *dev, const char 
> *idstr, void *opaque)
>  if (strcmp(se->idstr, id) == 0 && se->opaque == opaque) {
>  QTAILQ_REMOVE(&savevm_state.handlers, se, entry);
>  g_free(se->compat);
> -g_free(se->ops);
> +if (se->ops && se->ops->is_allocated) {
> +g_free(se->ops);
> +}
>  g_free(se);
>  }
>  }
> -- 
> 2.9.3
> 

-- 
Peter Xu



Re: [Qemu-devel] [PATCH V6 04/10] migration: split ufd_version_check onto receive/request features part

2017-05-24 Thread Peter Xu
On Wed, May 24, 2017 at 09:45:48AM +0300, Alexey wrote:

[...]

> > 
> > >  return false;
> > >  }
> > >  
> > > -ioctl_mask = (__u64)1 << _UFFDIO_REGISTER |
> > > - (__u64)1 << _UFFDIO_UNREGISTER;
> > > +ioctl_mask = 1 << _UFFDIO_REGISTER |
> > > + 1 << _UFFDIO_UNREGISTER;
> > 
> > Could I ask why we explicitly removed (__u64) here? The old
> > one seems better to me.
> maybe my change is not robust; in any case, thanks for pointing it out. But
> now I think there should be a constant instead of ioctl_mask, like
> UFFD_API_IOCTLS; the whole point of that check is to make sure the kernel
> returns no error to us and accepted the features.
> ok, from the beginning:
> 
> if we request an unsupported feature (we check it before) or the internal
> state of the userfault ctx inside the kernel isn't UFFD_STATE_WAIT_API (for
> example we are in the middle of the copying process)
>   the ioctl should end with an EINVAL error and the ioctls field in
>   uffdio_api will be empty
> 
> Right now I think the ioctls check for UFFD_API is not necessary.
> We just say here that we will use _UFFDIO_REGISTER, _UFFDIO_UNREGISTER,
> but the kernel supports those unconditionally, by contrast with the
> UFFDIO_REGISTER ioctl - that one also returns an ioctl field in the
> uffdio_register structure, and there can be variations there.

Sorry I didn't get the point...

AFAIU here (__u64) makes the constant "1" a 64-bit value, just like
when we do a bit shift we normally write "1ULL << 40". I liked it since
even if _UFFDIO_REGISTER were defined as >= 32 it would not overflow, whereas
by default the constant "1" is "int" typed (and that is 32 bits wide).

Thanks,

-- 
Peter Xu



Re: [Qemu-devel] [PATCH v2] migration: Allow unregister of save_live handlers

2017-05-24 Thread Juan Quintela
Peter Xu  wrote:
> On Wed, May 24, 2017 at 01:21:52PM +0200, Juan Quintela wrote:
>> Non-save_live migration handlers have an ops member that is
>> dynamically allocated by migration code.  Save_live handlers have it
>> passed as an argument, and it remains the responsibility of the caller.
>> Add a new member, is_allocated, that remembers whether ops has to be
>> freed.  This allows unregister_savevm() to work with save_live handlers.
>> 
>> Signed-off-by: Juan Quintela 
>
> I thought Laurent will prepare one patch that will directly remove
> existing register_savevm() callers, no?

s390, vmxnet3 and slirp are using it.  I am not sure that we will get
those ones "changed" so easily.

Later, Juan.



Re: [Qemu-devel] [PATCH 2/4] migration: Inactivate images after .save_live_complete_precopy()

2017-05-24 Thread Juan Quintela
Kevin Wolf  wrote:
> Block migration may still access the image during its
> .save_live_complete_precopy() implementation, so we should only
> inactivate the image afterwards.
>
> Another reason for the change is that inactivating an image fails when
> there is still a non-device BlockBackend using it, which includes the
> BBs used by block migration. We want to give block migration a chance to
> release the BBs before trying to inactivate the image (this will be done
> in another patch).
>
> Signed-off-by: Kevin Wolf 

Reviewed-by: Juan Quintela 


> ---
>  migration/migration.c | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 0304c01..846ba09 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1787,17 +1787,19 @@ static void migration_completion(MigrationState *s, 
> int current_active_state,
>  
>  if (!ret) {
>  ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> +if (ret >= 0) {
> +qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
> +qemu_savevm_state_complete_precopy(s->to_dst_file, false);
> +}
>  /*
>   * Don't mark the image with BDRV_O_INACTIVE flag if
>   * we will go into COLO stage later.
>   */
>  if (ret >= 0 && !migrate_colo_enabled()) {
>  ret = bdrv_inactivate_all();
> -}
> -if (ret >= 0) {
> -qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
> -qemu_savevm_state_complete_precopy(s->to_dst_file, false);
> -s->block_inactive = true;
> +if (ret >= 0) {
> +s->block_inactive = true;
> +}
>  }
>  }
>  qemu_mutex_unlock_iothread();



Re: [Qemu-devel] [PATCH V6 08/10] migration: calculate vCPU blocktime on dst side

2017-05-24 Thread Alexey Perevalov

On 05/24/2017 02:22 PM, Peter Xu wrote:

On Wed, May 24, 2017 at 12:37:20PM +0300, Alexey wrote:

On Wed, May 24, 2017 at 03:53:05PM +0800, Peter Xu wrote:

On Tue, May 23, 2017 at 02:31:09PM +0300, Alexey Perevalov wrote:

This patch provides blocktime calculation per vCPU,
as a summary and as an overlapped value for all vCPUs.

This approach was suggested by Peter Xu, as an improvement on the previous
approach where QEMU kept a tree with the faulted page address and a cpus
bitmask in it. Now QEMU keeps an array with the faulted page address as value
and the vCPU as index. It helps to find the proper vCPU at UFFD_COPY time.
It also keeps a list of blocktimes per vCPU (which can be traced with
page_fault_addr).

Blocktime will not be calculated if the postcopy_blocktime field of
MigrationIncomingState wasn't initialized.

Signed-off-by: Alexey Perevalov 
---
  migration/postcopy-ram.c | 102 ++-
  migration/trace-events   |   5 ++-
  2 files changed, 105 insertions(+), 2 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index d647769..e70c44b 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -23,6 +23,7 @@
  #include "postcopy-ram.h"
  #include "sysemu/sysemu.h"
  #include "sysemu/balloon.h"
+#include 
  #include "qemu/error-report.h"
  #include "trace.h"
  
@@ -577,6 +578,101 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,

  return 0;
  }
  
+static int get_mem_fault_cpu_index(uint32_t pid)

+{
+CPUState *cpu_iter;
+
+CPU_FOREACH(cpu_iter) {
+if (cpu_iter->thread_id == pid) {
+return cpu_iter->cpu_index;
+}
+}
+trace_get_mem_fault_cpu_index(pid);
+return -1;
+}
+
+static void mark_postcopy_blocktime_begin(uint64_t addr, uint32_t ptid,
+RAMBlock *rb)
+{
+int cpu;
+unsigned long int nr_bit;
+MigrationIncomingState *mis = migration_incoming_get_current();
+PostcopyBlocktimeContext *dc = mis->blocktime_ctx;
+int64_t now_ms;
+
+if (!dc || ptid == 0) {
+return;
+}
+cpu = get_mem_fault_cpu_index(ptid);
+if (cpu < 0) {
+return;
+}
+nr_bit = get_copied_bit_offset(addr);
+if (test_bit(nr_bit, mis->copied_pages)) {
+return;
+}
+now_ms = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+if (dc->vcpu_addr[cpu] == 0) {
+atomic_inc(&dc->smp_cpus_down);
+}
+
+atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);
+atomic_xchg__nocheck(&dc->last_begin, now_ms);
+atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], now_ms);

Looks like this is not what you and Dave have discussed?

(Btw, sorry to have not followed the thread recently, so I just went
  over the discussion again...)

What I see that Dave suggested is (I copied from Dave's email):

blocktime_start:
set CPU stall address
check bitmap entry
  if set then zero stall-address

While here it is:

blocktime_start:
check bitmap entry
  if set then return
set CPU stall address

I don't think the current version can really solve the race condition. See
this possible sequence:

   receive-thread                 fault-thread
   --------------                 ------------
                                  blocktime_start
                                    check bitmap entry,
                                      if set then return
   blocktime_end
     set bitmap entry
     read CPU stall address,
       if non-0 then zero it
                                  set CPU stall address [1]
 
Then imho the address set at [1] will be stall again until forever.



agree, my check is in the incorrect order


I think we should follow exactly what Dave has suggested.
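
Concretely, the ordering Dave suggested could look something like this in
mark_postcopy_blocktime_begin() (a sketch reusing the names from the patch
above, not the final code):

    /* publish the stall address and timestamps first... */
    atomic_xchg__nocheck(&dc->vcpu_addr[cpu], addr);
    atomic_xchg__nocheck(&dc->last_begin, now_ms);
    atomic_xchg__nocheck(&dc->page_fault_vcpu_time[cpu], now_ms);
    /* ...then re-check the copied-pages bitmap: if the receive thread
     * already marked the page as copied, undo the stall address so it
     * cannot be left set forever */
    if (test_bit(nr_bit, mis->copied_pages)) {
        atomic_xchg__nocheck(&dc->vcpu_addr[cpu], 0);
    }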

And.. after a second thought, I am afraid even this would not satisfy
all race conditions. What if we consider the UFFDIO_COPY ioctl in?
AFAIU after UFFDIO_COPY the faulted vcpu can be running again, then
the question is, can it quickly trigger another page fault?


yes, it can


Firstly, a workable sequence is (adding UFFDIO_COPY ioctl in, and
showing vcpu-thread X as well):

   receive-thread         fault-thread         vcpu-thread X
   --------------         ------------         -------------
                                               fault at addr A1
                          fault_addr[X]=A1
   UFFDIO_COPY page A1
   check fault_addr[X] with A1
     if match, clear fault_addr[X]
                                               vcpu X starts

This is fine.

While since "vcpu X starts" can be right after UFFDIO_COPY, can this
be possible?

The previous picture isn't possible, because mark_postcopy_blocktime_end
is called right after the ioctl, and the vCPU wakes up
inside the ioctl, so the check of fault_addr will happen after vcpu X starts.


   receive-thread         fault-thread         vcpu-thread X
   --------------         ------------         -------------
                                               fault at addr A1
                          fault_addr[X]

Re: [Qemu-devel] [PATCH V6 04/10] migration: split ufd_version_check onto receive/request features part

2017-05-24 Thread Alexey Perevalov

On 05/24/2017 02:33 PM, Peter Xu wrote:

On Wed, May 24, 2017 at 09:45:48AM +0300, Alexey wrote:

[...]


  return false;
  }
  
-ioctl_mask = (__u64)1 << _UFFDIO_REGISTER |

- (__u64)1 << _UFFDIO_UNREGISTER;
+ioctl_mask = 1 << _UFFDIO_REGISTER |
+ 1 << _UFFDIO_UNREGISTER;

Could I ask why we explicitly removed (__u64) here? The old
one seems better to me.

maybe my change is not robust; in any case, thanks for pointing it out. But
now I think there should be a constant instead of ioctl_mask, like
UFFD_API_IOCTLS; the whole point of that check is to make sure the kernel
returns no error to us and accepted the features.
ok, from the beginning:

if we request an unsupported feature (we check it before) or the internal
state of the userfault ctx inside the kernel isn't UFFD_STATE_WAIT_API (for
example we are in the middle of the copying process)
the ioctl should end with an EINVAL error and the ioctls field in
uffdio_api will be empty

Right now I think the ioctls check for UFFD_API is not necessary.
We just say here that we will use _UFFDIO_REGISTER, _UFFDIO_UNREGISTER,
but the kernel supports those unconditionally, by contrast with the
UFFDIO_REGISTER ioctl - that one also returns an ioctl field in the
uffdio_register structure, and there can be variations there.

Sorry I didn't get the point...

I misprinted:
> We just say here that we will use _UFFDIO_REGISTER

s/_UFFDIO_REGISTER/_UFFDIO_API/g

but the point is that ioctl_mask is not necessary here; the kernel always
returns it. But for _UFFDIO_UNREGISTER, later, not in this function, yes,
that check is required.
 



AFAIU here (__u64) makes the constant "1" a 64-bit value, just like
when we do a bit shift we normally write "1ULL << 40". I liked it since
even if _UFFDIO_REGISTER were defined as >= 32 it would not overflow, whereas
by default the constant "1" is "int" typed (and that is 32 bits wide).
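
A two-line illustration of the point (40 is just a hypothetical bit number
larger than 31):

    __u64 bad  = 1 << 40;        /* "1" is a 32-bit int: the shift is undefined */
    __u64 good = (__u64)1 << 40; /* widen first, then shift: well-defined */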



Thanks,



--
Best regards,
Alexey Perevalov



Re: [Qemu-devel] [PATCH V6 07/10] migration: add bitmap for copied page

2017-05-24 Thread Peter Xu
On Wed, May 24, 2017 at 10:56:37AM +0300, Alexey wrote:
> On Wed, May 24, 2017 at 02:57:36PM +0800, Peter Xu wrote:
> > On Tue, May 23, 2017 at 02:31:08PM +0300, Alexey Perevalov wrote:
> > > This patch adds the ability to track already copied
> > > pages; it's necessary for calculating vCPU block time in the
> > > postcopy migration feature, and maybe for restore after
> > > postcopy migration failure.
> > > 
> > > Functions which work with RAMBlock are placed into ram.c,
> > > because TARGET_WORDS_BIGENDIAN is poisoned in postcopy-ram.c -
> > > hardware-independent code.
> > > 
> > > Signed-off-by: Alexey Perevalov 
> > > ---
> > >  include/migration/migration.h | 16 +++
> > >  migration/postcopy-ram.c  | 22 ---
> > >  migration/ram.c   | 63 
> > > +++
> > >  3 files changed, 97 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > > index 449cb07..4e05c83 100644
> > > --- a/include/migration/migration.h
> > > +++ b/include/migration/migration.h
> > > @@ -101,6 +101,20 @@ struct MigrationIncomingState {
> > >  LoadStateEntry_Head loadvm_handlers;
> > >  
> > >  /*
> > > + * bitmap indicates whether a page was copied,
> > > + * based on ramblock offset;
> > > + * now it is used only for blocktime calculation in
> > > + * postcopy migration, so the lifetime of this entry is
> > > + * from when the user requested blocktime calculation
> > > + * till the end of postcopy migration;
> > > + * as an example it could represent the following memory map
> > > + * ___
> > > + * |4k pages | hugepages | 4k pages
> > 
> > Can we really do this?
> 
> > 
> > The problem is AFAIU migration stream is sending pages only in target
> > page size, even for huge pages... so even for huge pages we should
> > still need per TARGET_PAGE_SIZE bitmap?
> sending maybe, but copying to the userfault fd is done in hugepage size

Yes you are right. :)

> in case of hugetlbfs memory backend.
> > 
> > (Please refer to ram_state_init() on init of RAMBlock.bmap)
> I thought about using a bitmap per ramblock, but I decided not to do it,
> for the following reasons:
>   1. There is only 4k ramblocks, for such ramblock it's not
>   optimal

Could you explain what you mean by "there is only 4k ramblocks"?

>   2. the copied pages bitmap is an attribute of incoming migration,
>   but ramblock is a general structure; although bmap is about
>   dirty pages of precopy migrations, hmm, RAMBlock also
>   contains unsentmap and that one is for postcopy.

I thought you meant that copied pages are fields specific to incoming
migration, and so not suitable for the RAMBlock struct? But the last
sentence seems not to support that idea... Hmm... Actually it sounds like
a good idea to me to put it into RAMBlock (but I may be wrong).

Thanks,

-- 
Peter Xu



Re: [Qemu-devel] [PATCH v5 1/4] net/rocker: Remove the dead error handling

2017-05-24 Thread Marcel Apfelbaum


- Original Message -
> From: "Markus Armbruster" 
> To: "Philippe Mathieu-Daudé" 
> Cc: qemu-devel@nongnu.org, "Mao Zhongyi" , 
> j...@resnulli.us, jasow...@redhat.com, "Michael
> S. Tsirkin" , "Marcel Apfelbaum" 
> Sent: Wednesday, May 24, 2017 8:35:04 AM
> Subject: Re: [Qemu-devel] [PATCH v5 1/4] net/rocker: Remove the dead error 
> handling
> 
> Philippe Mathieu-Daudé  writes:
> 
> > Hi Markus,
> >
> > On 05/23/2017 06:27 AM, Markus Armbruster wrote:
> > [...]
> >> There's one more cleanup opportunity:
> >>
> > [...]
> >>>  if (pci_dma_read(dev, le64_to_cpu(info->desc.buf_addr), info->buf,
> >>>  size)) {
> >>>  return NULL;
> >>>  }
> >>
> >> None of the pci_dma_read() calls outside rocker check the return value.
> >> Just as well, because it always returns 0.  Please clean this up in a
> >> separate followup patch.
> >
> > It may be the correct way to do it but this sounds like we are missing
> > something somewhere... pci_dma_read() calls pci_dma_rw() which always
> > returns 0. Why not let it returns void? It is inlined and never used
> > by address. Else we should document why returning 0 is correct, and
> > what is the reason to not use a void prototype.
> >
> > pci_dma_rw() calls dma_memory_rw() which does return a boolean value,
> > false on success (MEMTX_OK) and true on error
> > (MEMTX_ERROR/DECODE_ERROR)
> 
> PCI question.  Michael, Marcel?
> 

Hi Markus,

Looking at the git history, pci_dma_rw used to call cpu_physical_memory_rw
which, at that time (commit ec17457), returned void. Since the interface 
dictated
to return int, 0 is returned as "always OK".

The callers to pci_dma_read did not bother to check it for obvious reasons 
(even if they should).

In the meantime the memory API has changed to allow returning errors, but since 
the callers of
pci_dma_rw don't check the return value, why bother to update the PCI DMA?

History aside (and my speculations above), it seems the right move is to update
the return value and have callers check it, but honestly I don't have any idea
whether the emulated devices expect pci dma to fail.
Adding Paolo and David for more insights.
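
For illustration, that move could be as small as forwarding the underlying
result (a sketch only, relying on the behaviour of dma_memory_rw() described
above; callers would then actually have to start checking the value):

    static inline int pci_dma_rw(PCIDevice *dev, dma_addr_t addr,
                                 void *buf, dma_addr_t len, DMADirection dir)
    {
        /* propagate the error instead of unconditionally returning 0 */
        if (dma_memory_rw(pci_get_address_space(dev), addr, buf, len, dir)) {
            return -1;
        }
        return 0;
    }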

Thanks,
Marcel






[Qemu-devel] [PATCH] migration: remove register_savevm()

2017-05-24 Thread Laurent Vivier
We can replace the four remaining calls of register_savevm() with
calls to register_savevm_live(). That lets us remove the function, and
as we no longer allocate the ops pointer with g_new0()
we don't have to free it either.

Signed-off-by: Laurent Vivier 
---
 hw/net/vmxnet3.c|  8 ++--
 hw/s390x/s390-skeys.c   |  9 +++--
 hw/s390x/s390-virtio-ccw.c  |  8 ++--
 include/migration/vmstate.h |  8 
 migration/savevm.c  | 16 
 slirp/slirp.c   |  8 ++--
 6 files changed, 25 insertions(+), 32 deletions(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 8b1fab2..4df3110 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -2262,6 +2262,11 @@ static const MemoryRegionOps b1_ops = {
 },
 };
 
+static SaveVMHandlers savevm_vmxnet3_msix = {
+.save_state = vmxnet3_msix_save,
+.load_state = vmxnet3_msix_load,
+};
+
 static uint64_t vmxnet3_device_serial_num(VMXNET3State *s)
 {
 uint64_t dsn_payload;
@@ -2331,8 +2336,7 @@ static void vmxnet3_pci_realize(PCIDevice *pci_dev, Error 
**errp)
   vmxnet3_device_serial_num(s));
 }
 
-register_savevm(dev, "vmxnet3-msix", -1, 1,
-vmxnet3_msix_save, vmxnet3_msix_load, s);
+register_savevm_live(dev, "vmxnet3-msix", -1, 1, &savevm_vmxnet3_msix, s);
 }
 
 static void vmxnet3_instance_init(Object *obj)
diff --git a/hw/s390x/s390-skeys.c b/hw/s390x/s390-skeys.c
index e2d4e1a..a3dc088 100644
--- a/hw/s390x/s390-skeys.c
+++ b/hw/s390x/s390-skeys.c
@@ -363,6 +363,11 @@ static inline bool s390_skeys_get_migration_enabled(Object 
*obj, Error **errp)
 return ss->migration_enabled;
 }
 
+static SaveVMHandlers savevm_s390_storage_keys = {
+.save_state = s390_storage_keys_save,
+.load_state = s390_storage_keys_load,
+};
+
 static inline void s390_skeys_set_migration_enabled(Object *obj, bool value,
 Error **errp)
 {
@@ -376,8 +381,8 @@ static inline void s390_skeys_set_migration_enabled(Object 
*obj, bool value,
 ss->migration_enabled = value;
 
 if (ss->migration_enabled) {
-register_savevm(NULL, TYPE_S390_SKEYS, 0, 1, s390_storage_keys_save,
-s390_storage_keys_load, ss);
+register_savevm_live(NULL, TYPE_S390_SKEYS, 0, 1,
+ &savevm_s390_storage_keys, ss);
 } else {
 unregister_savevm(DEVICE(ss), TYPE_S390_SKEYS, ss);
 }
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index fdd4384..697a2d6 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -104,6 +104,11 @@ void s390_memory_init(ram_addr_t mem_size)
 s390_skeys_init();
 }
 
+static SaveVMHandlers savevm_gtod = {
+.save_state = gtod_save,
+.load_state = gtod_load,
+};
+
 static void ccw_init(MachineState *machine)
 {
 int ret;
@@ -146,8 +151,7 @@ static void ccw_init(MachineState *machine)
 s390_create_virtio_net(BUS(css_bus), "virtio-net-ccw");
 
 /* Register savevm handler for guest TOD clock */
-register_savevm(NULL, "todclock", 0, 1,
-gtod_save, gtod_load, kvm_state);
+register_savevm_live(NULL, "todclock", 0, 1, &savevm_gtod, kvm_state);
 }
 
 static void s390_cpu_plug(HotplugHandler *hotplug_dev,
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index f97411d..21aa6ca 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -59,14 +59,6 @@ typedef struct SaveVMHandlers {
 LoadStateHandler *load_state;
 } SaveVMHandlers;
 
-int register_savevm(DeviceState *dev,
-const char *idstr,
-int instance_id,
-int version_id,
-SaveStateHandler *save_state,
-LoadStateHandler *load_state,
-void *opaque);
-
 int register_savevm_live(DeviceState *dev,
  const char *idstr,
  int instance_id,
diff --git a/migration/savevm.c b/migration/savevm.c
index d971e5e..32badfc 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -617,21 +617,6 @@ int register_savevm_live(DeviceState *dev,
 return 0;
 }
 
-int register_savevm(DeviceState *dev,
-const char *idstr,
-int instance_id,
-int version_id,
-SaveStateHandler *save_state,
-LoadStateHandler *load_state,
-void *opaque)
-{
-SaveVMHandlers *ops = g_new0(SaveVMHandlers, 1);
-ops->save_state = save_state;
-ops->load_state = load_state;
-return register_savevm_live(dev, idstr, instance_id, version_id,
-ops, opaque);
-}
-
 void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque)
 {
 SaveStateEntry *se, *new_se;
@@ -651,7 +636,6 @@ void unregister_savevm(DeviceState *dev, const char *idstr, 
void *opaque)
 if

Re: [Qemu-devel] [RFC 0/3] qmp: Return extra information on qom-list-types

2017-05-24 Thread Markus Armbruster
Eduardo Habkost  writes:

> On Tue, May 23, 2017 at 04:12:43PM +0200, Markus Armbruster wrote:
>> Eduardo Habkost  writes:
>> 
>> > This series adds 'abstract' and 'parent-types' fields to the
>> > output of qom-list-types.
>> 
>> Peeking at PATCH 3, it looks like 'parent-types' is the transitive
>> closure of the single direct parent type (TypeImpl member parent_type).
>> Do we need information on interfaces as well?
>
> I think we should, but it is more complex so I plan to do in a
> separate patch.

Okay.

>> > For reference, below are the sizes of the output of
>> > "qom-list-types abstract=true" on qemu-system-x86_64, before and
>> > after applying the patches:
>> >
>> > * before: 11724 bytes
>> > * with 'abstract' field: 20119 bytes
>> > * with 'abstract' and 'parent-types': 44383 bytes
>> 
>> Obvious ways to save space:
>> 
>> * Make 'abstract' optional, default to false, present only when
>>   necessary (79 out of 456 times right now)
>
> Good idea.
>
>> 
>> * Pare down 'parent-types' to *direct* parent types.  The indirect ones
>>   are redundant.
>
> On the one hand, I assume clients don't care if a given type is a
> direct parent or indirect parent, and including only the direct
> parent type will require them to make extra queries.

Given the direct parents, computing their transitive closure is trivial.
So a single query should suffice.

> On the other hand, if the client wants to save a few queries it
> can use the "implements" argument, already? Not sure.
>
>> 
>> A less obvious way:
>> 
>> * 'parent-types' defines a relation with adjacency lists.  If the
>>   relation is tree-shaped, we can send the tree instead, i.e. a tree of
>>   device names rather than list of (device name, adjacency list).  New
>>   command unless we're willing to break qom-list-types.
>> 
>> > I'm not sure if extending qom-list-types with this info is the
>> > right way to go, or if we should keep qom-list-types as-is and
>> > add a new "query-qom-type" QMP command to query info on one QOM
>> > type at a time.
>> 
>> Might lead to more traffic rather than less.  Depends on what
>> information the client needs to query.
>
> I think queries that return all available QOM types are likely to
> (should?) be cached by clients.

Perhaps libvirt developers can help us here.

>  I believe the size will stay
> acceptable if we implement the two suggestions above.

Please do.



Re: [Qemu-devel] [PATCH V6 07/10] migration: add bitmap for copied page

2017-05-24 Thread Alexey Perevalov

On 05/24/2017 03:01 PM, Peter Xu wrote:

On Wed, May 24, 2017 at 10:56:37AM +0300, Alexey wrote:

On Wed, May 24, 2017 at 02:57:36PM +0800, Peter Xu wrote:

On Tue, May 23, 2017 at 02:31:08PM +0300, Alexey Perevalov wrote:

This patch adds the ability to track already copied
pages; it's necessary for calculating vCPU block time in the
postcopy migration feature, and maybe for restore after
postcopy migration failure.

Functions which work with RAMBlock are placed into ram.c,
because TARGET_WORDS_BIGENDIAN is poisoned in postcopy-ram.c -
hardware-independent code.

Signed-off-by: Alexey Perevalov 
---
  include/migration/migration.h | 16 +++
  migration/postcopy-ram.c  | 22 ---
  migration/ram.c   | 63 +++
  3 files changed, 97 insertions(+), 4 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 449cb07..4e05c83 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -101,6 +101,20 @@ struct MigrationIncomingState {
  LoadStateEntry_Head loadvm_handlers;
  
  /*

+ * bitmap indicates whether a page was copied,
+ * based on ramblock offset;
+ * now it is used only for blocktime calculation in
+ * postcopy migration, so the lifetime of this entry is
+ * from when the user requested blocktime calculation
+ * till the end of postcopy migration;
+ * as an example it could represent the following memory map
+ * ___
+ * |4k pages | hugepages | 4k pages

Can we really do this?
The problem is AFAIU migration stream is sending pages only in target
page size, even for huge pages... so even for huge pages we should
still need per TARGET_PAGE_SIZE bitmap?

sending maybe, but copying to the userfault fd is done in hugepage size

Yes you are right. :)


in case of hugetlbfs memory backend.

(Please refer to ram_state_init() on init of RAMBlock.bmap)

I thought about using a bitmap per ramblock, but I decided not to do it,
for the following reasons:
1. There is only 4k ramblocks, for such ramblock it's not
optimal

Could you explain what you mean by "there is only 4k ramblocks"?

sorry, I meant there could be ramblocks with 4k size;
as an example, your qemu hmp "info ramblock" shows
 Block Name              PSize      Offset        Used          Total
 /objects/mem            2 MiB      0x            0x2000        0x2000
 vga.vram                4 KiB      0x2006        0x0100        0x0100
 /rom@etc/acpi/tables    4 KiB      0x2113        0x0002        0x0020
 pc.bios                 4 KiB      0x2000        0x0004        0x0004
 :00:03.0/e1000.rom      4 KiB      0x2107        0x0004        0x0004
 :00:04.0/e1000.rom      4 KiB      0x210b        0x0004        0x0004
 :00:05.0/e1000.rom      4 KiB      0x210f        0x0004        0x0004
 pc.rom                  4 KiB      0x2004        0x0002        0x0002
 :00:02.0/vga.rom        4 KiB      0x2106        0x0001        0x0001
 /rom@etc/table-loader   4 KiB      0x2133        0x1000        0x1000
 /rom@etc/acpi/rsdp      4 KiB      0x21331000    0x1000        0x1000


/rom@etc/table-loader and /rom@etc/acpi/rsdp have only one 4K page.



2. the copied pages bitmap is an attribute of incoming migration,
but ramblock is a general structure; although bmap is about
dirty pages of precopy migrations, hmm, RAMBlock also
contains unsentmap and that one is for postcopy.

I thought you meant that copied pages are fields specific to incoming
migration, and so not suitable for the RAMBlock struct? But the last
sentence seems not to support that idea... Hmm... Actually it sounds like
a good idea to me to put it into RAMBlock (but I may be wrong).
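
If it did move there, a minimal shape might be (illustrative only -- the
field name receivedmap is an assumption, not something from the patch):

    struct RAMBlock {
        /* ... existing fields, including bmap and unsentmap ... */
        unsigned long *receivedmap; /* one bit per target page copied in */
    };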

Thanks,



--
Best regards,
Alexey Perevalov



Re: [Qemu-devel] [PATCH 3/5] migration: Remove use of old MigrationParams

2017-05-24 Thread Markus Armbruster
Juan Quintela  writes:

> We have change in the previous patch to use migration capabilities for
> it.  Notice that we continue using the old command line flags from
> migrate command from the time being.  Remove the set_params method as

for the time being

> now it is empty.
>
> For savevm, one can't do a:
>
> savevm -b/-i foo
>
> but now one can do:
>
> migrate_set_capability block on
> savevm foo
>
> And we can't use block migration. We could disable block capability
> unconditionally, but it would not be much better.

I think I get what you're trying to say, but only because I have plenty
of context right now.  Let me try to rephrase:

  migration: Use new configuration instead of old MigrationParams

  The previous commit introduced a MigrationCapability and a
  MigrationParameter for block migration.  Use them instead of the old
  MigrationParams.

  Take care to reject attempts to combine block migration with
  snapshots, e.g. like this:

  migrate_set_capability block on
  savevm foo

> Signed-off-by: Juan Quintela 

Preferably with a commit message I can still understand three weeks from
now:
Reviewed-by: Markus Armbruster 



Re: [Qemu-devel] [PATCH 3/5] migration: Remove use of old MigrationParams

2017-05-24 Thread Markus Armbruster
I got confused and replied to an old version.  Please ignore.

Markus Armbruster  writes:

> Juan Quintela  writes:
>
>> We have change in the previous patch to use migration capabilities for
>> it.  Notice that we continue using the old command line flags from
>> migrate command from the time being.  Remove the set_params method as
>
> for the time being
>
>> now it is empty.
>>
>> For savevm, one can't do a:
>>
>> savevm -b/-i foo
>>
>> but now one can do:
>>
>> migrate_set_capability block on
>> savevm foo
>>
>> And we can't use block migration. We could disable block capability
>> unconditionally, but it would not be much better.
>
> I think I get what you're trying to say, but only because I have plenty
> of context right now.  Let me try to rephrase:
>
>   migration: Use new configuration instead of old MigrationParams
>
>   The previous commit introduced a MigrationCapability and a
>   MigrationParameter for block migration.  Use them instead of the old
>   MigrationParams.
>
>   Take care to reject attempts to combine block migration with
>   snapshots, e.g. like this:
>
>   migrate_set_capability block on
>   savevm foo
>
>> Signed-off-by: Juan Quintela 
>
> Preferably with a commit message I can still understand three weeks from
> now:
> Reviewed-by: Markus Armbruster 



Re: [Qemu-devel] [PATCH 3/5] migration: Remove use of old MigrationParams

2017-05-24 Thread Markus Armbruster
Markus Armbruster  writes:

> Juan Quintela  writes:
>
>> We have change in the previous patch to use migration capabilities for
>> it.  Notice that we continue using the old command line flags from
>> migrate command from the time being.  Remove the set_params method as
>> now it is empty.
>>
>> For savevm, one can't do a:
>>
>> savevm -b/-i foo
>
> Yes (savem has no such options).
>
>> but now one can do:
>>
>> migrate_set_capability block on
>> savevm foo
>>
>> And we can't use block migration. We could disable block capability
>> unconditionally, but it would not be much better.
>
> This leaves me confused: what does the example do?  Reading ahead...
> looks like it fails with "Block migration and snapshots are
> incompatible".  What are you trying to say here?

I think I now get what you're trying to say, but only because I've
picked up enough context.  Let me try to rephrase:

  migration: Use new configuration instead of old MigrationParams

  The previous commit introduced a MigrationCapability and a
  MigrationParameter for block migration.  Use them instead of the old
  MigrationParams.

  Take care to reject attempts to combine block migration with
  snapshots, e.g. like this:

  migrate_set_capability block on
  savevm foo

>> Signed-off-by: Juan Quintela 
>> Reviewed-by: Eric Blake 
>
> Patch looks good to me.

Preferably with a commit message I can still understand three weeks from
now:
Reviewed-by: Markus Armbruster 



Re: [Qemu-devel] [PATCH] migration: remove register_savevm()

2017-05-24 Thread Samuel Thibault
Laurent Vivier, on mer. 24 mai 2017 14:10:48 +0200, wrote:
> We can replace the four remaining calls of register_savevm() with
> calls to register_savevm_live(). That lets us remove the function, and
> as we no longer allocate the ops pointer with g_new0()
> we don't have to free it either.
> 
> Signed-off-by: Laurent Vivier 

Acked-by: Samuel Thibault 

> ---
>  hw/net/vmxnet3.c|  8 ++--
>  hw/s390x/s390-skeys.c   |  9 +++--
>  hw/s390x/s390-virtio-ccw.c  |  8 ++--
>  include/migration/vmstate.h |  8 
>  migration/savevm.c  | 16 
>  slirp/slirp.c   |  8 ++--
>  6 files changed, 25 insertions(+), 32 deletions(-)
> 
> diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
> index 8b1fab2..4df3110 100644
> --- a/hw/net/vmxnet3.c
> +++ b/hw/net/vmxnet3.c
> @@ -2262,6 +2262,11 @@ static const MemoryRegionOps b1_ops = {
>  },
>  };
>  
> +static SaveVMHandlers savevm_vmxnet3_msix = {
> +.save_state = vmxnet3_msix_save,
> +.load_state = vmxnet3_msix_load,
> +};
> +
>  static uint64_t vmxnet3_device_serial_num(VMXNET3State *s)
>  {
>  uint64_t dsn_payload;
> @@ -2331,8 +2336,7 @@ static void vmxnet3_pci_realize(PCIDevice *pci_dev, 
> Error **errp)
>vmxnet3_device_serial_num(s));
>  }
>  
> -register_savevm(dev, "vmxnet3-msix", -1, 1,
> -vmxnet3_msix_save, vmxnet3_msix_load, s);
> +register_savevm_live(dev, "vmxnet3-msix", -1, 1, &savevm_vmxnet3_msix, 
> s);
>  }
>  
>  static void vmxnet3_instance_init(Object *obj)
> diff --git a/hw/s390x/s390-skeys.c b/hw/s390x/s390-skeys.c
> index e2d4e1a..a3dc088 100644
> --- a/hw/s390x/s390-skeys.c
> +++ b/hw/s390x/s390-skeys.c
> @@ -363,6 +363,11 @@ static inline bool 
> s390_skeys_get_migration_enabled(Object *obj, Error **errp)
>  return ss->migration_enabled;
>  }
>  
> +static SaveVMHandlers savevm_s390_storage_keys = {
> +.save_state = s390_storage_keys_save,
> +.load_state = s390_storage_keys_load,
> +};
> +
>  static inline void s390_skeys_set_migration_enabled(Object *obj, bool value,
>  Error **errp)
>  {
> @@ -376,8 +381,8 @@ static inline void 
> s390_skeys_set_migration_enabled(Object *obj, bool value,
>  ss->migration_enabled = value;
>  
>  if (ss->migration_enabled) {
> -register_savevm(NULL, TYPE_S390_SKEYS, 0, 1, s390_storage_keys_save,
> -s390_storage_keys_load, ss);
> +register_savevm_live(NULL, TYPE_S390_SKEYS, 0, 1,
> + &savevm_s390_storage_keys, ss);
>  } else {
>  unregister_savevm(DEVICE(ss), TYPE_S390_SKEYS, ss);
>  }
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index fdd4384..697a2d6 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -104,6 +104,11 @@ void s390_memory_init(ram_addr_t mem_size)
>  s390_skeys_init();
>  }
>  
> +static SaveVMHandlers savevm_gtod = {
> +.save_state = gtod_save,
> +.load_state = gtod_load,
> +};
> +
>  static void ccw_init(MachineState *machine)
>  {
>  int ret;
> @@ -146,8 +151,7 @@ static void ccw_init(MachineState *machine)
>  s390_create_virtio_net(BUS(css_bus), "virtio-net-ccw");
>  
>  /* Register savevm handler for guest TOD clock */
> -register_savevm(NULL, "todclock", 0, 1,
> -gtod_save, gtod_load, kvm_state);
> +register_savevm_live(NULL, "todclock", 0, 1, &savevm_gtod, kvm_state);
>  }
>  
>  static void s390_cpu_plug(HotplugHandler *hotplug_dev,
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index f97411d..21aa6ca 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -59,14 +59,6 @@ typedef struct SaveVMHandlers {
>  LoadStateHandler *load_state;
>  } SaveVMHandlers;
>  
> -int register_savevm(DeviceState *dev,
> -const char *idstr,
> -int instance_id,
> -int version_id,
> -SaveStateHandler *save_state,
> -LoadStateHandler *load_state,
> -void *opaque);
> -
>  int register_savevm_live(DeviceState *dev,
>   const char *idstr,
>   int instance_id,
> diff --git a/migration/savevm.c b/migration/savevm.c
> index d971e5e..32badfc 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -617,21 +617,6 @@ int register_savevm_live(DeviceState *dev,
>  return 0;
>  }
>  
> -int register_savevm(DeviceState *dev,
> -const char *idstr,
> -int instance_id,
> -int version_id,
> -SaveStateHandler *save_state,
> -LoadStateHandler *load_state,
> -void *opaque)
> -{
> -SaveVMHandlers *ops = g_new0(SaveVMHandlers, 1);
> -ops->save_state = save_state;
> -ops->load

Re: [Qemu-devel] [PULL 00/21] s390x patches

2017-05-24 Thread Stefan Hajnoczi
On Tue, May 23, 2017 at 01:12:45PM +0200, Cornelia Huck wrote:
> The following changes since commit 56821559f0ba682fe6b367815572e6f974d329ab:
> 
>   Merge remote-tracking branch 'dgilbert/tags/pull-hmp-20170517' into staging 
> (2017-05-18 13:36:15 +0100)
> 
> are available in the git repository at:
> 
>   git://github.com/cohuck/qemu tags/s390x-20170523
> 
> for you to fetch changes up to cb4f4bc3535f554daa3266aaa447843949a68193:
> 
>   s390/kvm: do not reset riccb on initial cpu reset (2017-05-19 12:31:28 
> +0200)
> 
> 
> s390x updates:
> - support for vfio-ccw to passthrough channel devices
> - allow ccw bios to boot from scsi generic devices
> - bugfix for initial reset
> 
> 
> 
> Christian Borntraeger (1):
>   s390/kvm: do not reset riccb on initial cpu reset
> 
> Cornelia Huck (1):
>   linux-headers: update
> 
> Dong Jia Shi (6):
>   s390x/css: realize css_create_sch
>   s390x/css: device support for s390-ccw passthrough
>   vfio/ccw: get io region info
>   vfio/ccw: get irqs info and set the eventfd fd
>   vfio/ccw: update sense data if a unit check is pending
>   MAINTAINERS: Add vfio-ccw maintainer
> 
> Eric Farman (8):
>   pc-bios/s390-ccw: Remove duplicate blk_factor adjustment
>   pc-bios/s390-ccw: Move SCSI block factor to outer read
>   pc-bios/s390-ccw: Break up virtio-scsi read into multiples
>   pc-bios/s390-ccw: Refactor scsi_inquiry function
>   pc-bios/s390-ccw: Get list of supported VPD pages
>   pc-bios/s390-ccw: Get Block Limits VPD device data
>   pc-bios/s390-ccw: Build a reasonable max_sectors limit
>   pc-bios/s390-ccw.img: rebuild image
> 
> Xiao Feng Ren (5):
>   s390x/css: add s390-squash-mcss machine option
>   s390x/css: realize css_sch_build_schib
>   vfio/ccw: vfio based subchannel passthrough driver
>   s390x/css: introduce and realize ccw-request callback
>   s390x/css: ccw translation infrastructure
> 
>  MAINTAINERS|   8 +
>  default-configs/s390x-softmmu.mak  |   1 +
>  hw/s390x/3270-ccw.c|   6 +-
>  hw/s390x/Makefile.objs |   1 +
>  hw/s390x/css-bridge.c  |   2 +
>  hw/s390x/css.c | 290 +-
>  hw/s390x/s390-ccw.c| 153 
>  hw/s390x/s390-virtio-ccw.c |  32 +-
>  hw/s390x/virtio-ccw.c  |   7 +-
>  hw/vfio/Makefile.objs  |   1 +
>  hw/vfio/ccw.c  | 434 
> +
>  include/hw/s390x/css-bridge.h  |   1 +
>  include/hw/s390x/css.h |  67 ++--
>  include/hw/s390x/s390-ccw.h|  39 ++
>  include/hw/s390x/s390-virtio-ccw.h |   1 +
>  include/hw/vfio/vfio-common.h  |   1 +
>  include/standard-headers/asm-x86/hyperv.h  |   7 +-
>  include/standard-headers/linux/input-event-codes.h |   1 +
>  include/standard-headers/linux/input.h |  11 +-
>  include/standard-headers/linux/pci_regs.h  |   3 +-
>  linux-headers/asm-arm/kvm.h|  10 +-
>  linux-headers/asm-arm/unistd-common.h  |   1 +
>  linux-headers/asm-arm64/kvm.h  |  10 +-
>  linux-headers/asm-powerpc/kvm.h|   3 +
>  linux-headers/asm-powerpc/unistd.h |   1 +
>  linux-headers/asm-s390/kvm.h   |  29 +-
>  linux-headers/asm-s390/unistd.h|   4 +-
>  linux-headers/asm-x86/kvm.h|   3 +
>  linux-headers/asm-x86/unistd_32.h  |   2 +
>  linux-headers/asm-x86/unistd_64.h  |   1 +
>  linux-headers/asm-x86/unistd_x32.h |   1 +
>  linux-headers/linux/kvm.h  |  25 ++
>  linux-headers/linux/userfaultfd.h  |  11 +-
>  linux-headers/linux/vfio.h |  18 +
>  linux-headers/linux/vfio_ccw.h |  24 ++
>  pc-bios/s390-ccw.img   | Bin 26472 -> 26480 bytes
>  pc-bios/s390-ccw/s390-ccw.h|   7 +
>  pc-bios/s390-ccw/scsi.h|  30 ++
>  pc-bios/s390-ccw/virtio-scsi.c |  85 +++-
>  pc-bios/s390-ccw/virtio-scsi.h |   2 +
>  pc-bios/s390-ccw/virtio.h  |   1 +
>  qemu-options.hx|   6 +-
>  scripts/update-linux-headers.sh|   2 +-
>  target/s390x/cpu.c |   7 +-
>  target/s390x/cpu.h |  16 +-
>  target/s390x/ioinst.c  |   9 +
>  46 files changed, 1293 insertions(+), 81 deletions(-)
>  

Re: [Qemu-devel] [PATCH v6 0/9] qemu-img: add measure sub-command

2017-05-24 Thread Stefan Hajnoczi
On Mon, May 08, 2017 at 10:15:27AM -0400, Stefan Hajnoczi wrote:
> v6:
>  * Change bdrv_measure() return type to BlockMeasureInfo * [Eric]
>  * Clarify that holes in sparse POSIX files are still counted [Eric]
> 
> v5:
>  * Use UINT64_MAX instead of ~0ULL [Berto]
>  * Document qemu-img measure ofmt, fmt, output_fmt, and snapshot_param
>[Berto]
> 
> v4:
>  * Make qcow2 refcount calculation conservative [Maor]
>  * Include actual qemu-img convert image size in test cases
> 
> v3:
>  * Drop RFC, this is ready to go for QEMU 2.10
>  * Use "required size" instead of "required bytes" in qemu-img output for
>consistency [Nir]
>  * Clarify BlockMeasureInfo semantics [Max]
>  * Clarify bdrv_measure() opts argument and error handling [Nir]
>  * Handle -o backing_file= for qcow2 [Max]
>  * Handle snapshot options in qemu-img measure
>  * Probe input image for allocated data clusters for qcow2.  Didn't centralize
>this because there are format-specific aspects such as the cluster_size.  
> It
>may make sense to centralize it later (with a bit more complexity) if
>support is added to more formats.
>  * Add qemu-img(1) man page section for 'measure' sub-command [Max]
>  * Extend test case to cover additional scenarios [Nir]
> 
> RFCv2:
>  * Publishing RFC again to discuss the new user-visible interfaces.  Code has
>changed quite a bit, I have not kept any Reviewed-by tags.
>  * Rename qemu-img sub-command "measure" and API bdrv_measure() [Nir]
>  * Report both "required bytes" and "fully allocated bytes" to handle the 
> empty
>image file and prealloc use cases [Nir and Dan]
>  * Use bdrv_getlength() instead of bdrv_nb_sectors() [Berto]
>  * Rename "err" label "out" in qemu-img-cmds.c [Nir]
>  * Add basic qcow2 support, doesn't support qemu-img convert from existing 
> files yet
> 
> RFCv1:
>  * Publishing patch series with just raw support, no qcow2 yet.  Please review
>the command-line interface and let me know if you are happy with this
>approach.
> 
> Users and management tools sometimes need to know the size required for a new
> disk image so that an LVM volume, SAN LUN, etc can be allocated ahead of time.
> Image formats like qcow2 have non-trivial metadata that makes it hard to
> estimate the exact size without knowledge of file format internals.
> 
> This patch series introduces a new qemu-img sub-command that calculates the
> required size for both image creation and conversion scenarios.
> 
> The conversion scenario is:
> 
>   $ qemu-img measure -f raw -O qcow2 input.img
>   required size: 1327680
>   fully allocated size: 1074069504
> 
> Here an existing image file is taken and the output includes the space 
> required
> for data from the input image file.
> 
> The creation scenario is:
> 
>   $ qemu-img measure -O qcow2 --size 5G
>   required size: 327680
>   fully allocated size: 1074069504
> 
> Stefan Hajnoczi (9):
>   block: add bdrv_measure() API
>   raw-format: add bdrv_measure() support
>   qcow2: extract preallocation calculation function
>   qcow2: make refcount size calculation conservative
>   qcow2: extract image creation option parsing
>   qcow2: add bdrv_measure() support
>   qemu-img: add measure subcommand
>   qemu-iotests: support per-format golden output files
>   iotests: add test 178 for qemu-img measure
> 
>  qapi/block-core.json |  25 +++
>  include/block/block.h|   2 +
>  include/block/block_int.h|   2 +
>  block.c  |  35 
>  block/qcow2.c| 372 
> +--
>  block/raw-format.c   |  26 +++
>  qemu-img.c   | 228 
>  qemu-img-cmds.hx |   6 +
>  qemu-img.texi|  30 
>  tests/qemu-iotests/178   | 168 ++
>  tests/qemu-iotests/178.out.qcow2 | 278 +
>  tests/qemu-iotests/178.out.raw   | 146 +++
>  tests/qemu-iotests/check |   5 +
>  tests/qemu-iotests/group |   1 +
>  14 files changed, 1230 insertions(+), 94 deletions(-)
>  create mode 100755 tests/qemu-iotests/178
>  create mode 100644 tests/qemu-iotests/178.out.qcow2
>  create mode 100644 tests/qemu-iotests/178.out.raw

Ping


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH] migration: keep bytes_xfer_prev init'd to zero

2017-05-24 Thread Felipe Franciosi

> On 23 May 2017, at 05:27, Peter Xu  wrote:
> 
> On Fri, May 19, 2017 at 10:59:02PM +0100, Felipe Franciosi wrote:
>> The first time migration_bitmap_sync() is called, bytes_xfer_prev is set
>> to ram_state.bytes_transferred which is, at this point, zero. The next
>> time migration_bitmap_sync() is called, an iteration has happened and
>> bytes_xfer_prev is set to 'x' bytes. Most likely, more than one second
>> has passed, so the auto converge logic will be triggered and
>> bytes_xfer_now will also be set to 'x' bytes.
>> 
>> This condition is currently masked by dirty_rate_high_cnt, which will
>> wait for a few iterations before throttling. It would otherwise always
>> assume zero bytes have been copied and therefore throttle the guest
>> (possibly) prematurely.
>> 
>> Given bytes_xfer_prev is only used by the auto convergence logic, it
>> makes sense to only set its value after a check has been made against
>> bytes_xfer_now.
>> 
>> Signed-off-by: Felipe Franciosi 
>> ---
>> migration/ram.c | 4 
>> 1 file changed, 4 deletions(-)
>> 
>> diff --git a/migration/ram.c b/migration/ram.c
>> index f59fdd4..793af39 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -670,10 +670,6 @@ static void migration_bitmap_sync(RAMState *rs)
>> 
>> rs->bitmap_sync_count++;
>> 
>> -if (!rs->bytes_xfer_prev) {
>> -rs->bytes_xfer_prev = ram_bytes_transferred();
>> -}
>> -
>> if (!rs->time_last_bitmap_sync) {
>> rs->time_last_bitmap_sync = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>> }
>> -- 
>> 1.9.5
>> 
>> 
> 
> I feel like this patch wants to correctly initialize bytes_xfer_prev;
> however, I still see a problem. E.g., when the user specifies auto-convergence
> during migration, in the first iteration we'll always have a very
> small bytes_xfer_prev (with this patch, it'll be zero) with a very big
> bytes_xfer_now (which is the ram_bytes_transferred() value).

Interesting point. Worth noting, that's no different than what happens today 
anyway (bytes_xfer_prev would be initialised to the first non-zero 
bytes_transferred). I therefore don't think it should stop or slow this patch's 
acceptance.

> If so, do
> you think squashing the change below together with the current one would be
> more accurate?

As a matter of fact I had a different idea (below) to fix what you are 
describing. I was still experimenting with this code, so haven't sent a patch 
yet. But I'm going to send a series soon for comments. Basically, I think some 
other changes are required to make sure these numbers are correct:

1) dirty_rate_high_cnt++ >= 2 should be ++dirty_rate_high_cnt >= 2
- The original commit msg from 070afca25 (Jason J. Herne) says that convergence 
should be triggered after two passes. The current code does it after three passes 
(and four passes the first time around; see number 2 below, and the sketch after 
this list).
- I personally feel this counter should go away altogether. If the migration is 
not converging and the VM is going to be throttled, there's no point stressing 
the network any further; just start throttling straight away.

2) dirty_pages_rate should be updated before the autoconverge logic.
- Right now, we delay throttling by a further iteration, as dirty_pages_rate is 
set after the first pass through the autoconverge logic (it is zero the first 
time around).
- The "if (rs->dirty_pages_rate &&..." part of the conditional can then be 
removed, as it won't ever be zero.

3) bytes_xfer counters should be updated alongside dirty_pages counters (for 
the same period).
- This fixes the issue you described, as bytes_xfer_* will correspond to the 
period.
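
The post-/pre-increment difference in 1) is easy to glance past; a minimal
sketch of the two conditions (illustrative counter only, not the real
migration state):

    int cnt = 0;
    /* post-increment compares 0, 1, 2, ... -> first true on the third pass */
    if (cnt++ >= 2) { /* throttle */ }

    int cnt2 = 0;
    /* pre-increment compares 1, 2, ... -> first true on the second pass */
    if (++cnt2 >= 2) { /* throttle */ }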

I'll send the series shortly. Thoughts in the meantime?

Thanks,
Felipe

> 
> diff --git a/migration/ram.c b/migration/ram.c
> index f59fdd4..e01a218 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -703,7 +703,8 @@ static void migration_bitmap_sync(RAMState *rs)
>throttling */
> bytes_xfer_now = ram_bytes_transferred();
> 
> -if (rs->dirty_pages_rate &&
> +/* Skip first iteration when bytes_xfer_prev not inited */
> +if (rs->bytes_xfer_prev && rs->dirty_pages_rate &&
>(rs->num_dirty_pages_period * TARGET_PAGE_SIZE >
>(bytes_xfer_now - rs->bytes_xfer_prev) / 2) &&
>(rs->dirty_rate_high_cnt++ >= 2)) {
> 
> Also, I am not sure whether we can move bytes_xfer_prev out of RAMState,
> since it's only used by migration_bitmap_sync().
> 
> Thanks,
> 
> -- 
> Peter Xu




Re: [Qemu-devel] [RFC 0/3] qmp: Return extra information on qom-list-types

2017-05-24 Thread Eduardo Habkost
On Wed, May 24, 2017 at 02:13:16PM +0200, Markus Armbruster wrote:
> Eduardo Habkost  writes:
> 
> > On Tue, May 23, 2017 at 04:12:43PM +0200, Markus Armbruster wrote:
> >> Eduardo Habkost  writes:
> >> 
> >> > This series adds 'abstract' and 'parent-types' fields to the
> >> > output of qom-list-types.
> >> 
> >> Peeking at PATCH 3, it looks like 'parent-types' is the transitive
> >> closure of the single direct parent type (TypeImpl member parent_type).
> >> Do we need information on interfaces as well?
> >
> > I think we should, but it is more complex so I plan to do in a
> > separate patch.
> 
> Okay.
> 
> >> > For reference, below are the sizes of the output of
> >> > "qom-list-types abstract=true" on qemu-system-x86_64, before and
> >> > after applying the patches:
> >> >
> >> > * before: 11724 bytes
> >> > * with 'abstract' field: 20119 bytes
> >> > * with 'abstract' and 'parent-types': 44383 bytes
> >> 
> >> Obvious ways to save space:
> >> 
> >> * Make 'abstract' optional, default to false, present only when
> >>   necessary (79 out of 456 times right now)
> >
> > Good idea.
> >
> >> 
> >> * Pare down 'parent-types' to *direct* parent types.  The indirect ones
> >>   are redundant.
> >
> > On the one hand, I assume clients don't care if a given type is a
> > direct parent or indirect parent, and including only the direct
> > parent type will require them to make extra queries.
> 
> Given the direct parents, computing their transitive closure is trivial.
> So a single query should suffice.

If you have the full list, yes.  I was thinking of something like
asking "is type FOO a (direct or indirect) child of BAR?".

"qom-list-types implements=FOO" wouldn't suffice.

"qom-list-types implements=BAR" would be enough, but it would
also return all subtypes of BAR.

However, I think this example is not very realistic.  libvirt,
for example, does query everything about QEMU capabilities only
once, and then cache it.  Even if a client doesn't cache it, the
size of "qom-list-types implements=BAR" should be reasonable.

> 
> > On the other hand, if the client wants to save a few queries it
> > can use the "implements" argument, already? Not sure.
> >
> >> 
> >> A less obvious way:
> >> 
> >> * 'parent-types' defines a relation with adjacency lists.  If the
> >>   relation is tree-shaped, we can send the tree instead, i.e. a tree of
> >>   device names rather than list of (device name, adjacency list).  New
> >>   command unless we're willing to break qom-list-types.
> >> 
> >> > I'm not sure if extending qom-list-types with this info is the
> >> > right way to go, or if we should keep qom-list-types as-is and
> >> > add a new "query-qom-type" QMP command to query info on one QOM
> >> > type at a time.
> >> 
> >> Might lead to more traffic rather than less.  Depends on what
> >> information the client needs to query.
> >
> > I think queries that return all available QOM types are likely to
> > (should?) be cached by clients.
> 
> Perhaps libvirt developers can help us here.

The only usage of qom-list-types I found in libvirt has no
command arguments, and is called by code that checks for cached
information first.

> 
> >  I believe the size will stay
> > acceptable if we implement the two suggestions above.
> 
> Please do.

-- 
Eduardo



Re: [Qemu-devel] [PATCH] libvhost-user: Fix VHOST_USER_NET_SET_MTU entry

2017-05-24 Thread Marc-André Lureau


- Original Message -
> From: "Dr. David Alan Gilbert" 
> 
> VHOST_USER_'s 20th entry is defined as VHOST_USER_INPUT_GET_CONFIG but
> should be VHOST_USER_NET_SET_MTU.
> 
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Marc-André Lureau 

> ---
>  contrib/libvhost-user/libvhost-user.c | 2 +-
>  contrib/libvhost-user/libvhost-user.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/contrib/libvhost-user/libvhost-user.c
> b/contrib/libvhost-user/libvhost-user.c
> index af4faad60b..645cb36c4c 100644
> --- a/contrib/libvhost-user/libvhost-user.c
> +++ b/contrib/libvhost-user/libvhost-user.c
> @@ -62,7 +62,7 @@ vu_request_to_string(int req)
>  REQ(VHOST_USER_GET_QUEUE_NUM),
>  REQ(VHOST_USER_SET_VRING_ENABLE),
>  REQ(VHOST_USER_SEND_RARP),
> -REQ(VHOST_USER_INPUT_GET_CONFIG),
> +REQ(VHOST_USER_NET_SET_MTU),
>  REQ(VHOST_USER_MAX),
>  };
>  #undef REQ
> diff --git a/contrib/libvhost-user/libvhost-user.h
> b/contrib/libvhost-user/libvhost-user.h
> index 156b50e989..ecf9539ffa 100644
> --- a/contrib/libvhost-user/libvhost-user.h
> +++ b/contrib/libvhost-user/libvhost-user.h
> @@ -60,7 +60,7 @@ typedef enum VhostUserRequest {
>  VHOST_USER_GET_QUEUE_NUM = 17,
>  VHOST_USER_SET_VRING_ENABLE = 18,
>  VHOST_USER_SEND_RARP = 19,
> -VHOST_USER_INPUT_GET_CONFIG = 20,
> +VHOST_USER_NET_SET_MTU = 20,
>  VHOST_USER_MAX
>  } VhostUserRequest;
>  
> --
> 2.12.2
> 
> 



Re: [Qemu-devel] new to qemu with a simple question

2017-05-24 Thread Eric Blake
On 05/23/2017 09:56 PM, Wang Dong wrote:
> 
> Hi guys,
> 
> I am new to qemu, but I need to do some work on it right now.
> 
> When I tried to read the QMP code, I found an interesting part of it.
> 
> Some C source code is generated from JSON files.
> 
> I wonder why this is. What are the benefits of this?

Boring and repetitive code that is easy to typo is best written by a
computer, which excels at boring and repetitive tasks.  Writing our
description in a more concise higher language and generating C code from
that lets us focus on the actual design, rather than the mundane
correctness of the code implementing the design.
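
As a concrete illustration, a QAPI schema entry such as
{ 'struct': 'Foo', 'data': { 'bar': 'str' } } (a made-up minimal example)
makes the generator emit roughly the following C; the exact output shape
here is an approximation, not a verbatim copy of a generated file:

    /* Emitted (approximately) by the QAPI generator scripts from the
     * schema entry { 'struct': 'Foo', 'data': { 'bar': 'str' } }. */
    typedef struct Foo Foo;

    struct Foo {
        char *bar;
    };

    void qapi_free_Foo(Foo *obj);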

It's the same reason that people use bison/yacc rather than hand-written
parsers for complex grammars - you isolate the correctness of the code
to the correctness of the generator, and free yourself to now only have
to worry about the bigger picture of the input you feed to the
generator.  And on another level, it's why we write programs in C
instead of assembly.  Abstraction is good.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-devel] [PATCH 1/3] migration: Use savevm_handlers instead of loadvm copy

2017-05-24 Thread Laurent Vivier
On 24/05/2017 10:55, Juan Quintela wrote:
> From: Juan Quintela 
> 
> There is no reason for having the loadvm_handlers at all.  There is
> only one use, and we can use the savevm handlers.
> 
> We will remove the loadvm handlers on a following patch.
> 
> Signed-off-by: Juan Quintela 
> ---
>  migration/savevm.c | 24 
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index d971e5e..55ac8c1 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1792,7 +1792,7 @@ struct LoadStateEntry {
>   * Returns: true if the footer was good
>   *  false if there is a problem (and calls error_report to say why)
>   */
> -static bool check_section_footer(QEMUFile *f, LoadStateEntry *le)
> +static bool check_section_footer(QEMUFile *f, SaveStateEntry *se)
>  {
>  uint8_t read_mark;
>  uint32_t read_section_id;
> @@ -1805,15 +1805,15 @@ static bool check_section_footer(QEMUFile *f, 
> LoadStateEntry *le)
>  read_mark = qemu_get_byte(f);
>  
>  if (read_mark != QEMU_VM_SECTION_FOOTER) {
> -error_report("Missing section footer for %s", le->se->idstr);
> +error_report("Missing section footer for %s", se->idstr);
>  return false;
>  }
>  
>  read_section_id = qemu_get_be32(f);
> -if (read_section_id != le->section_id) {
> +if (read_section_id != se->section_id) {
>  error_report("Mismatched section id in footer for %s -"
>   " read 0x%x expected 0x%x",
> - le->se->idstr, read_section_id, le->section_id);
> + se->idstr, read_section_id, se->section_id);
>  return false;
>  }
>  
> @@ -1887,7 +1887,7 @@ qemu_loadvm_section_start_full(QEMUFile *f, 
> MigrationIncomingState *mis)
>   " device '%s'", instance_id, idstr);
>  return ret;
>  }
> -if (!check_section_footer(f, le)) {
> +if (!check_section_footer(f, se)) {
>  return -EINVAL;
>  }
>  
> @@ -1898,29 +1898,29 @@ static int
>  qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
>  {
>  uint32_t section_id;
> -LoadStateEntry *le;
> +SaveStateEntry *se;
>  int ret;
>  
>  section_id = qemu_get_be32(f);
>  
>  trace_qemu_loadvm_state_section_partend(section_id);
> -QLIST_FOREACH(le, &mis->loadvm_handlers, entry) {
> -if (le->section_id == section_id) {
> +QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> +if (se->section_id == section_id) {
>  break;
>  }
>  }
> -if (le == NULL) {
> +if (se == NULL) {
>  error_report("Unknown savevm section %d", section_id);
>  return -EINVAL;
>  }
>  
> -ret = vmstate_load(f, le->se, le->version_id);
> +ret = vmstate_load(f, se, se->version_id);
Are you sure you can replace le->version_id by se->version_id?

Because according to the code in qemu_loadvm_section_start_full(),
we can have le->version_id <= se->version_id.

Thanks,
Laurent



Re: [Qemu-devel] [PATCH v2 05/13] vvfat: introduce offset_to_bootsector, offset_to_fat and offset_to_root_dir

2017-05-24 Thread Eric Blake
On 05/23/2017 11:10 PM, Philippe Mathieu-Daudé wrote:
> Hi Hervé,
> 
> On 05/22/2017 06:11 PM, Hervé Poussineau wrote:
>> - offset_to_bootsector is the number of sectors up to FAT bootsector
>> - offset_to_fat is the number of sectors up to first File Allocation
>> Table
>> - offset_to_root_dir is the number of sectors up to root directory sector
> 
> Perhaps your commit description could end here, with the 3 following
> lines added below the "---" separator.

No. Stuff after the --- is intended for things that are useful to
reviewers, but not helpful in the long run.  But in this case:

> 
>> Replace first_sectors_number - 1 by offset_to_bootsector.

Knowing the conversion that was made DOES make it easier to read this
patch, even in the long run.  So this information belongs before the ---.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-devel] [PATCH 1/4] qemu-io: Don't die on second open

2017-05-24 Thread Eric Blake
On 05/24/2017 01:28 AM, Fam Zheng wrote:
> On Thu, 05/18 21:32, Eric Blake wrote:
>> Failure to open a file in qemu-io should normally return 1 on
>> failure to end the command loop, on the presumption that when
>> batching commands all on the command line, failure to open means
>> nothing further can be attempted. But when executing qemu-io
>> interactively, there is a special case: if open is executed a
>> second time, we print a hint that the user should try the
>> interactive 'close' first.  But the hint is useless if we don't
>> actually LET them try 'close'.
>>
>> This has been awkward since at least as far back as commit
>> 43642b3, in 2011 (probably earlier, but git blame has a harder
>> time going past the file renames at that point).
>>

>> @@ -63,7 +63,7 @@ static int openfile(char *name, int flags, bool 
>> writethrough, bool force_share,
>>  if (qemuio_blk) {
>>  error_report("file open already, try 'help close'");
>>  QDECREF(opts);
>> -return 1;
>> +return 0;
>>  }
>>
>>  if (force_share) {
>> -- 
>> 2.9.4
>>
>>
> 
> Hmm, failure is failure, why is "return 0" better than "return 1"?

"return 0" is what tells the caller to keep interpreting commands.
"return 1" is what tells the caller to exit immediately.  Most qemu-io
commands return 0 always.  Only a few need to return 1 ("quit"
absolutely needs to, and we are arguing about whether "open" should do
so).  Pre-patch, "open" always failed with 1, exiting the interpreter
immediately with nonzero status.  Which is wrong if we print a help
message on the second attempt at open (since a second attempt is only
possible in interactive mode), but DOES match what you want to have
happen on a first open attempt from the command line:

$ qemu-io unreadable_file

But you are arguing that in interactive mode, when you do:

$ qemu-io
qemu-io> open unreadable_file

that you want to keep things in qemu-io, rather than exiting
immediately.  If so, then we need to make openfile() return 0 always,
and instead teach the caller to check whether it is invoked in command
line mode (cause non-zero exit status if nothing was opened) or in
interactive mode (keep qemu-io running, to let the user try to open
something else).  In command line mode, the caller should then use the
status of whether a file was actually opened (rather than the return
code) when dealing with the call to openfile() for the command line
argument.

That's a bigger change; but I can try it if you think it is worth it.
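
Roughly (a sketch of that bigger change, not an actual patch; openfile()
would then always return 0, and main() would do):

    /* command line mode: attempt the open, then key the exit status
     * off whether a block backend actually got opened */
    openfile(argv[optind], flags, writethrough, force_share, opts);
    if (!qemuio_blk) {
        exit(1);    /* nothing opened: fail instead of entering the loop */
    }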

> 
> If the control flow in the caller has a problem, fix it there? Specifically, I
> don't think a failed interactive open needs to exit the program, at all.
> 
> Fam
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-devel] [PATCH] kvmclock: update system_time_msr address forcibly

2017-05-24 Thread Denis V. Lunev
On 05/24/2017 05:07 PM, Denis Plotnikov wrote:
> Do an update of the system_time_msr address every time before reading
> the value of tsc_timestamp from the guest's kvmclock page.
>
> It should be done in a forcible manner because there is a situation
> when system_time_msr has been set by kvm but qemu isn't aware of it.
> This leads to updates of kvmclock_offset without respect for the
> guest's kvmclock values.
>
> The situation appears when an L2 linux guest runs over an L1 linux
> guest and the action inducing the system_time_msr update is TPR
> access reporting.
> Some L1 linux guests turn off processing of TPR accesses, and when L0
> gets an L2 exit induced by a TPR MSR access it doesn't enter L1 and
> processes it by itself.
> Thus, L1 kvm doesn't know about that TPR access happening and doesn't
> exit to qemu, which in turn doesn't set the system_time_msr address.
>
> This patch fixes this by making sure qemu knows the correct address
> every time it is needed.
>
> Signed-off-by: Denis Plotnikov 
> ---
>  hw/i386/kvm/clock.c | 32 +++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> index e713162..035196a 100644
> --- a/hw/i386/kvm/clock.c
> +++ b/hw/i386/kvm/clock.c
> @@ -48,11 +48,38 @@ struct pvclock_vcpu_time_info {
>  uint8_tpad[2];
>  } __attribute__((__packed__)); /* 32 bytes */
>  
> +static void update_all_system_time_msr(void)
> +{
> +CPUState *cpu;
> +CPUX86State *env;
> +struct {
> +struct kvm_msrs info;
> +struct kvm_msr_entry entries[1];
> +} msr_data;
> +int ret;
> +
> +msr_data.info.nmsrs = 1;
> +msr_data.entries[0].index = MSR_KVM_SYSTEM_TIME;
> +
> +CPU_FOREACH(cpu) {
> +ret = kvm_vcpu_ioctl(cpu, KVM_GET_MSRS, &msr_data);
> +
> +if (ret < 0) {
> +fprintf(stderr, "KVM_GET_MSRS failed: %s\n", strerror(ret));
> +abort();
> +}
> +
> +assert(ret == 1);
> +env = cpu->env_ptr;
> +env->system_time_msr = msr_data.entries[0].data;
> +}
> +}
> +
>  static uint64_t kvmclock_current_nsec(KVMClockState *s)
>  {
>  CPUState *cpu = first_cpu;
>  CPUX86State *env = cpu->env_ptr;
> -hwaddr kvmclock_struct_pa = env->system_time_msr & ~1ULL;
> +hwaddr kvmclock_struct_pa;
>  uint64_t migration_tsc = env->tsc;
>  struct pvclock_vcpu_time_info time;
>  uint64_t delta;
> @@ -60,6 +87,9 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
>  uint64_t nsec_hi;
>  uint64_t nsec;
>  
> +update_all_system_time_msr();
> +kvmclock_struct_pa = env->system_time_msr & ~1ULL;
> +
should we do this once per guest boot?

Den
>  if (!(env->system_time_msr & 1ULL)) {
>  /* KVM clock not active */
>  return 0;




Re: [Qemu-devel] [PATCH v4 1/8] tpm-backend: Remove unneeded member variable from backend class

2017-05-24 Thread Stefan Berger

On 05/16/2017 03:58 AM, Amarnath Valluri wrote:

TPMDriverOps inside TPMBackend is not required, as it is supposed to be a class
member. The only possible reason for keeping it in TPMBackend was to get the
backend type in tpm.c, where the dedicated backend API tpm_backend_get_type()
is present.

Signed-off-by: Amarnath Valluri 


Reviewed-by: Stefan Berger 


---
  hw/tpm/tpm_passthrough.c | 4 
  include/sysemu/tpm_backend.h | 1 -
  tpm.c| 2 +-
  3 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/hw/tpm/tpm_passthrough.c b/hw/tpm/tpm_passthrough.c
index 9234eb3..a0baf5f 100644
--- a/hw/tpm/tpm_passthrough.c
+++ b/hw/tpm/tpm_passthrough.c
@@ -46,8 +46,6 @@
  #define TPM_PASSTHROUGH(obj) \
  OBJECT_CHECK(TPMPassthruState, (obj), TYPE_TPM_PASSTHROUGH)

-static const TPMDriverOps tpm_passthrough_driver;
-
  /* data structures */
  typedef struct TPMPassthruThreadParams {
  TPMState *tpm_state;
@@ -462,8 +460,6 @@ static TPMBackend *tpm_passthrough_create(QemuOpts *opts, 
const char *id)
  /* let frontend set the fe_model to proper value */
  tb->fe_model = -1;

-tb->ops = &tpm_passthrough_driver;
-
  if (tpm_passthrough_handle_device_opts(opts, tb)) {
  goto err_exit;
  }
diff --git a/include/sysemu/tpm_backend.h b/include/sysemu/tpm_backend.h
index b58f52d..e7f590d 100644
--- a/include/sysemu/tpm_backend.h
+++ b/include/sysemu/tpm_backend.h
@@ -50,7 +50,6 @@ struct TPMBackend {
  enum TpmModel fe_model;
  char *path;
  char *cancel_path;
-const TPMDriverOps *ops;

  QLIST_ENTRY(TPMBackend) list;
  };
diff --git a/tpm.c b/tpm.c
index 9a7c711..0ee021a 100644
--- a/tpm.c
+++ b/tpm.c
@@ -258,7 +258,7 @@ static TPMInfo *qmp_query_tpm_inst(TPMBackend *drv)
  res->model = drv->fe_model;
  res->options = g_new0(TpmTypeOptions, 1);

-switch (drv->ops->type) {
+switch (tpm_backend_get_type(drv)) {
  case TPM_TYPE_PASSTHROUGH:
  res->options->type = TPM_TYPE_OPTIONS_KIND_PASSTHROUGH;
  tpo = g_new0(TPMPassthroughOptions, 1);






Re: [Qemu-devel] [PATCH v4 2/8] tpm-backend: Move thread handling inside TPMBackend

2017-05-24 Thread Stefan Berger

On 05/16/2017 03:58 AM, Amarnath Valluri wrote:

Move thread handling inside TPMBackend; this way backend implementations need
not maintain their own thread life cycle. Instead they need to implement a
'handle_request()' class method that is always called from a thread.

This change makes tpm_backend_int.h useless, hence it is removed.

Signed-off-by: Amarnath Valluri 


Reviewed-by: Stefan Berger 





Re: [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW

2017-05-24 Thread Alberto Garcia
On Tue 23 May 2017 04:36:52 PM CEST, Eric Blake wrote:

>> here's a patch series that rewrites the copy-on-write code in the
>> qcow2 driver to reduce the number of I/O operations.
>
> And it competes with Denis and Anton's patches:
> https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg04547.html
>
> What plan of attack should we take on merging the best parts of these
> two series?

I took a look at that series and unless I overlooked something important
it seems that there's actually not that much overlap. Denis and Anton's
series deals with cluster preallocation and mine reduces the number of
I/O operations when there's a COW scenario. Most of my modifications are
in the perform_cow() function, but they barely touch that one.

I think we can review both separately, either of us can rebase our
series on top of the other, I don't expect big changes or conflicts.

Berto



Re: [Qemu-devel] [PATCH] migration: remove register_savevm()

2017-05-24 Thread Dmitry Fleytman

> On 24 May 2017, at 15:10 PM, Laurent Vivier  wrote:
> 
> We can replace the four remaining calls of register_savevm() by
> calls to register_savevm_live(). So we can remove the function, and
> as we no longer allocate the ops pointer with g_new0()
> we don't have to free it either.
> 
> Signed-off-by: Laurent Vivier 
> ---
> hw/net/vmxnet3.c|  8 ++--


Acked-by: Dmitry Fleytman <dmi...@daynix.com>

> hw/s390x/s390-skeys.c   |  9 +++--
> hw/s390x/s390-virtio-ccw.c  |  8 ++--
> include/migration/vmstate.h |  8 
> migration/savevm.c  | 16 
> slirp/slirp.c   |  8 ++--
> 6 files changed, 25 insertions(+), 32 deletions(-)
> 
> diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
> index 8b1fab2..4df3110 100644
> --- a/hw/net/vmxnet3.c
> +++ b/hw/net/vmxnet3.c
> @@ -2262,6 +2262,11 @@ static const MemoryRegionOps b1_ops = {
> },
> };
> 
> +static SaveVMHandlers savevm_vmxnet3_msix = {
> +.save_state = vmxnet3_msix_save,
> +.load_state = vmxnet3_msix_load,
> +};
> +
> static uint64_t vmxnet3_device_serial_num(VMXNET3State *s)
> {
> uint64_t dsn_payload;
> @@ -2331,8 +2336,7 @@ static void vmxnet3_pci_realize(PCIDevice *pci_dev, 
> Error **errp)
>   vmxnet3_device_serial_num(s));
> }
> 
> -register_savevm(dev, "vmxnet3-msix", -1, 1,
> -vmxnet3_msix_save, vmxnet3_msix_load, s);
> +register_savevm_live(dev, "vmxnet3-msix", -1, 1, &savevm_vmxnet3_msix, 
> s);
> }
> 
> static void vmxnet3_instance_init(Object *obj)
> diff --git a/hw/s390x/s390-skeys.c b/hw/s390x/s390-skeys.c
> index e2d4e1a..a3dc088 100644
> --- a/hw/s390x/s390-skeys.c
> +++ b/hw/s390x/s390-skeys.c
> @@ -363,6 +363,11 @@ static inline bool 
> s390_skeys_get_migration_enabled(Object *obj, Error **errp)
> return ss->migration_enabled;
> }
> 
> +static SaveVMHandlers savevm_s390_storage_keys = {
> +.save_state = s390_storage_keys_save,
> +.load_state = s390_storage_keys_load,
> +};
> +
> static inline void s390_skeys_set_migration_enabled(Object *obj, bool value,
> Error **errp)
> {
> @@ -376,8 +381,8 @@ static inline void 
> s390_skeys_set_migration_enabled(Object *obj, bool value,
> ss->migration_enabled = value;
> 
> if (ss->migration_enabled) {
> -register_savevm(NULL, TYPE_S390_SKEYS, 0, 1, s390_storage_keys_save,
> -s390_storage_keys_load, ss);
> +register_savevm_live(NULL, TYPE_S390_SKEYS, 0, 1,
> + &savevm_s390_storage_keys, ss);
> } else {
> unregister_savevm(DEVICE(ss), TYPE_S390_SKEYS, ss);
> }
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index fdd4384..697a2d6 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -104,6 +104,11 @@ void s390_memory_init(ram_addr_t mem_size)
> s390_skeys_init();
> }
> 
> +static SaveVMHandlers savevm_gtod = {
> +.save_state = gtod_save,
> +.load_state = gtod_load,
> +};
> +
> static void ccw_init(MachineState *machine)
> {
> int ret;
> @@ -146,8 +151,7 @@ static void ccw_init(MachineState *machine)
> s390_create_virtio_net(BUS(css_bus), "virtio-net-ccw");
> 
> /* Register savevm handler for guest TOD clock */
> -register_savevm(NULL, "todclock", 0, 1,
> -gtod_save, gtod_load, kvm_state);
> +register_savevm_live(NULL, "todclock", 0, 1, &savevm_gtod, kvm_state);
> }
> 
> static void s390_cpu_plug(HotplugHandler *hotplug_dev,
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index f97411d..21aa6ca 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -59,14 +59,6 @@ typedef struct SaveVMHandlers {
> LoadStateHandler *load_state;
> } SaveVMHandlers;
> 
> -int register_savevm(DeviceState *dev,
> -const char *idstr,
> -int instance_id,
> -int version_id,
> -SaveStateHandler *save_state,
> -LoadStateHandler *load_state,
> -void *opaque);
> -
> int register_savevm_live(DeviceState *dev,
>  const char *idstr,
>  int instance_id,
> diff --git a/migration/savevm.c b/migration/savevm.c
> index d971e5e..32badfc 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -617,21 +617,6 @@ int register_savevm_live(DeviceState *dev,
> return 0;
> }
> 
> -int register_savevm(DeviceState *dev,
> -const char *idstr,
> -int instance_id,
> -int version_id,
> -SaveStateHandler *save_state,
> -LoadStateHandler *load_state,
> -void *opaque)
> -{
> -SaveVMHandlers *ops = g_new0(SaveVMHandlers, 1);
> -ops->save_state = save_state;
> -ops->load_state = load_state;
> -   

Re: [Qemu-devel] [PATCH v4 3/8] tpm-backend: Initialize and free data members in it's own methods

2017-05-24 Thread Stefan Berger

On 05/16/2017 03:58 AM, Amarnath Valluri wrote:

Initialize and free TPMBackend data members in its own instance_init() and
instance_finalize() methods.

Took the opportunity to remove the unneeded destroy() method from the
TPMDriverOps interface: as TPMBackend is a QEMU Object, we can use
object_unref() in place of tpm_backend_destroy() to free the backend object.

Signed-off-by: Amarnath Valluri 


Reviewed-by: Stefan Berger 




Re: [Qemu-devel] [PATCH v4 4/8] tpm-backend: Made few interface methods optional

2017-05-24 Thread Stefan Berger

On 05/16/2017 03:58 AM, Amarnath Valluri wrote:

This allows backend implementations to leave optional interface methods
unimplemented. For mandatory methods, assertion checks are added.

Took the opportunity to remove unused methods:
   tpm_backend_get_type()
   tpm_backend_get_desc()

Signed-off-by: Amarnath Valluri 


Reviewed-by: Stefan Berger




Re: [Qemu-devel] [RFC 0/3] qmp: Return extra information on qom-list-types

2017-05-24 Thread Eric Blake
On 05/24/2017 07:13 AM, Markus Armbruster wrote:

>>> Might lead to more traffic rather than less.  Depends on what
>>> information the client needs to query.
>>
>> I think queries that return all available QOM types are likely to
>> (should?) be cached by clients.
> 
> Perhaps libvirt developers can help us here.

Libvirt is very likely going to cache things (libvirt already tries to
cache as much information as it needs from a single run of a qemu
binary, and only recomputes the cache when either libvirt or qemu
binaries are detected to have changed to a newer timestamp).  Reading
the entire list, and then libvirt computing transitive closure on that
list if needed, is probably acceptable, compared to qemu having to
compute transitive closure up front for a larger QMP message.
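
For what it's worth, the client-side closure computation really is only a
few lines; a standalone sketch using glib (the type names are just
examples):

    /* Walks a cached map of type name -> direct parent name and prints
     * every direct and indirect parent.  Build with:
     *   gcc closure.c $(pkg-config --cflags --libs glib-2.0) */
    #include <glib.h>
    #include <stdio.h>

    static void print_ancestors(GHashTable *parents, const char *type)
    {
        const char *p = g_hash_table_lookup(parents, type);
        while (p) {
            printf("%s\n", p);
            p = g_hash_table_lookup(parents, p);
        }
    }

    int main(void)
    {
        GHashTable *parents = g_hash_table_new(g_str_hash, g_str_equal);
        g_hash_table_insert(parents, "virtio-net-pci", "virtio-pci");
        g_hash_table_insert(parents, "virtio-pci", "pci-device");
        g_hash_table_insert(parents, "pci-device", "device");
        print_ancestors(parents, "virtio-net-pci");
        g_hash_table_destroy(parents);
        return 0;
    }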

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-devel] [PATCH] Fix nmi injection failure when vcpu got blocked

2017-05-24 Thread Radim Krčmář
Please use tags in patches.
We usually begin the subject with "KVM: x86:" when touching
arch/x86/kvm/x86.c.

2017-05-24 13:48+0800, Zhuangyanying:
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> @@ -8394,7 +8394,8 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu 
> *vcpu)
>   if (vcpu->arch.pv.pv_unhalted)
>   return true;
>  
> - if (atomic_read(&vcpu->arch.nmi_queued))
> + if (vcpu->arch.nmi_pending ||
> + atomic_read(&vcpu->arch.nmi_queued))
>   return true;

Hm, I think we've been missing '&& kvm_x86_ops->nmi_allowed(vcpu)'.

The undesired resume if we have suppressed NMI is not making it much
worse, but wouldn't "kvm_test_request(KVM_REQ_NMI, vcpu)" also work
here?
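
i.e. something like this, combining both points above (kernel-style sketch
of the suggestion, not a tested patch):

        /* in kvm_vcpu_has_events(): honor a queued or pending NMI only
         * when injection is currently allowed */
        if (kvm_test_request(KVM_REQ_NMI, vcpu) ||
            (vcpu->arch.nmi_pending &&
             kvm_x86_ops->nmi_allowed(vcpu)))
                return true;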

>   if (kvm_test_request(KVM_REQ_SMI, vcpu))

Thanks.



Re: [Qemu-devel] [PATCH v4 1/8] tpm-backend: Remove unneeded member variable from backend class

2017-05-24 Thread Marc-André Lureau
Hi

On Tue, May 16, 2017 at 12:00 PM Amarnath Valluri <amarnath.vall...@intel.com> wrote:

> TPMDriverOps inside TPMBackend is not required, as it is supposed to be a
> class member. The only possible reason for keeping it in TPMBackend was to
> get the backend type in tpm.c, where the dedicated backend API
> tpm_backend_get_type() is present.
>
> Signed-off-by: Amarnath Valluri 
>

Please add & keep my r-b tags from previous iterations in the commit message
if there are no or only minor changes.



> ---
>  hw/tpm/tpm_passthrough.c | 4 
>  include/sysemu/tpm_backend.h | 1 -
>  tpm.c| 2 +-
>  3 files changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/hw/tpm/tpm_passthrough.c b/hw/tpm/tpm_passthrough.c
> index 9234eb3..a0baf5f 100644
> --- a/hw/tpm/tpm_passthrough.c
> +++ b/hw/tpm/tpm_passthrough.c
> @@ -46,8 +46,6 @@
>  #define TPM_PASSTHROUGH(obj) \
>  OBJECT_CHECK(TPMPassthruState, (obj), TYPE_TPM_PASSTHROUGH)
>
> -static const TPMDriverOps tpm_passthrough_driver;
> -
>  /* data structures */
>  typedef struct TPMPassthruThreadParams {
>  TPMState *tpm_state;
> @@ -462,8 +460,6 @@ static TPMBackend *tpm_passthrough_create(QemuOpts
> *opts, const char *id)
>  /* let frontend set the fe_model to proper value */
>  tb->fe_model = -1;
>
> -tb->ops = &tpm_passthrough_driver;
> -
>  if (tpm_passthrough_handle_device_opts(opts, tb)) {
>  goto err_exit;
>  }
> diff --git a/include/sysemu/tpm_backend.h b/include/sysemu/tpm_backend.h
> index b58f52d..e7f590d 100644
> --- a/include/sysemu/tpm_backend.h
> +++ b/include/sysemu/tpm_backend.h
> @@ -50,7 +50,6 @@ struct TPMBackend {
>  enum TpmModel fe_model;
>  char *path;
>  char *cancel_path;
> -const TPMDriverOps *ops;
>
>  QLIST_ENTRY(TPMBackend) list;
>  };
> diff --git a/tpm.c b/tpm.c
> index 9a7c711..0ee021a 100644
> --- a/tpm.c
> +++ b/tpm.c
> @@ -258,7 +258,7 @@ static TPMInfo *qmp_query_tpm_inst(TPMBackend *drv)
>  res->model = drv->fe_model;
>  res->options = g_new0(TpmTypeOptions, 1);
>
> -switch (drv->ops->type) {
> +switch (tpm_backend_get_type(drv)) {
>  case TPM_TYPE_PASSTHROUGH:
>  res->options->type = TPM_TYPE_OPTIONS_KIND_PASSTHROUGH;
>  tpo = g_new0(TPMPassthroughOptions, 1);
> --
> 2.7.4
>
>
> --
Marc-André Lureau


Re: [Qemu-devel] [PATCH v4 5/8] tpm-backend: Add new api to read backend TpmInfo

2017-05-24 Thread Stefan Berger

On 05/16/2017 03:58 AM, Amarnath Valluri wrote:

TPM configuration options are backend implementation details and shall not be
part of the base TPMBackend object, and they shall not be accessed directly
outside of the class. Hence added a new interface method, get_tpm_options(),
to TPMDriverOps, which shall be implemented by the derived classes to return
the configured TPM options.

A new TPM backend API, tpm_backend_query_tpm(), uses _get_tpm_options() to
prepare TpmInfo.

Signed-off-by: Amarnath Valluri 


Reviewed-by: Stefan Berger 





Re: [Qemu-devel] [PATCH 1/3] migration: Use savevm_handlers instead of loadvm copy

2017-05-24 Thread Juan Quintela
Laurent Vivier  wrote:
> On 24/05/2017 10:55, Juan Quintela wrote:
>> From: Juan Quintela 
>> 
>> There is no reason for having the loadvm_handlers at all.  There is
>> only one use, and we can use the savevm handlers.
>> 
>> We will remove the loadvm handlers on a following patch.
>> 
>> Signed-off-by: Juan Quintela 


>> +QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>> +if (se->section_id == section_id) {
>>  break;
>>  }
>>  }
>> -if (le == NULL) {
>> +if (se == NULL) {
>>  error_report("Unknown savevm section %d", section_id);
>>  return -EINVAL;
>>  }
>>  
>> -ret = vmstate_load(f, le->se, le->version_id);
>> +ret = vmstate_load(f, se, se->version_id);
> Are you sure you can replace le->version_id by se->version?
>
> Because according to code in qemu_loadvm_section_start_full(),
> we can have le->version_id <= se->version_id.

You are right.  Basically we don't use version_id anymore, but we used
to use it.

I will arrive at a different solution.  Thanks a lot.

Later, Juan.



Re: [Qemu-devel] [PATCH v4 6/8] tpm-backend: Move realloc_buffer() implementation to tpm-tis model

2017-05-24 Thread Stefan Berger

On 05/16/2017 03:58 AM, Amarnath Valluri wrote:

Buffer reallocation is very unlikely to be backend specific. Hence move it
inside the TIS.

Signed-off-by: Amarnath Valluri 


Reviewed-by: Stefan Berger 





Re: [Qemu-devel] [PATCH v4 7/8] tpm-passthrough: move reusable code to utils

2017-05-24 Thread Stefan Berger

On 05/16/2017 03:58 AM, Amarnath Valluri wrote:

Signed-off-by: Amarnath Valluri 


Reviewed-by: Stefan Berger 




Re: [Qemu-devel] [PATCH 2/4] block: Guarantee that *file is set on bdrv_get_block_status()

2017-05-24 Thread Eric Blake
On 05/18/2017 09:32 PM, Eric Blake wrote:
> We document that *file is valid if the return is not an error and
> includes BDRV_BLOCK_OFFSET_VALID, but forgot to obey this contract
> when a driver (such as blkdebug) lacks a callback.  Broken in
> commit 67a0fd2 (v2.6), when we added the file parameter.
> 

> +++ b/block/io.c
> @@ -1749,6 +1749,7 @@ static int64_t coroutine_fn 
> bdrv_co_get_block_status(BlockDriverState *bs,
>  int64_t n;
>  int64_t ret, ret2;
> 
> +*file = NULL;
>  total_sectors = bdrv_nb_sectors(bs);
>  if (total_sectors < 0) {
>  return total_sectors;
> @@ -1769,6 +1770,7 @@ static int64_t coroutine_fn 
> bdrv_co_get_block_status(BlockDriverState *bs,
>  ret = BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED;
>  if (bs->drv->protocol_name) {
>  ret |= BDRV_BLOCK_OFFSET_VALID | (sector_num * BDRV_SECTOR_SIZE);
> +*file = bs;
>  }
>  return ret;
>  }

Continuing context:

    *file = NULL;
    ret = bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors,
                                            pnum, file);


Guess I need a v2, to remove the now-redundant second initialization of
*file.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] Status of q35-lite?

2017-05-24 Thread Stefan Hajnoczi
Hi Chao,
Last year there was an effort to make "built-in" hardware optional so
that VMs can be launched with a minimal set of devices.  You made SATA,
PIT, and SMBus optional.

What is the current status of this work?

Are there more devices that could benefit from an optional switch?

For reference, here is the discussion that kicked off the effort to make
some core devices optional:
"[RFC 0/9] Introduce light weight PC platform pc-lite"
https://lists.nongnu.org/archive/html/qemu-devel/2016-06/msg04842.html

Stefan




Re: [Qemu-devel] VhostUserRequest # 20

2017-05-24 Thread Marc-André Lureau
Hi

- Original Message -
> * Marc-André Lureau (marcandre.lur...@redhat.com) wrote:
> > HI
> > 
> > - Original Message -
> > > Hi,
> > >   libvhost-user.h defines:
> > >VHOST_USER_INPUT_GET_CONFIG = 20,
> > 
> > That slipped by mistake from my vhost-user-input series WIP, please send a
> > fix.
> > (luckily, this is only internal header for now)
> 
> OK, just posted.
> 
> However, I just spotted, or should I say gcc just spotted another issue:
> 
>   CC  tests/vhost-user-bridge.o
> /home/dgilbert/git/qemu-world3/tests/vhost-user-bridge.c:228:23: warning:
> variables 'front' and 'iov' used in loop condition not modified in loop body
> [-Wfor-loop-analysis]
> for (cur = front; front != iov; cur++) {
>   ^~~~
> 1 warning generated.
> 
> 
> static void
> iov_restore_front(struct iovec *front, struct iovec *iov, size_t bytes)
> {
> struct iovec *cur;
> 
> for (cur = front; front != iov; cur++) {
> bytes -= cur->iov_len;
> }
> 
> cur->iov_base -= bytes;
> cur->iov_len += bytes;
> }
> 
> What's that actually intending to do?

It was meant to revert the effect of iov_discard_front()

The code should read:

for (cur = front; cur != iov; cur++) {

In practice, it doesn't reach the loop since the front sg buffer is big enough
to discard the header.
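
So the corrected helper would read (sketch of the intended fix):

    static void
    iov_restore_front(struct iovec *front, struct iovec *iov, size_t bytes)
    {
        struct iovec *cur;

        /* walk forward over the entries already consumed */
        for (cur = front; cur != iov; cur++) {
            bytes -= cur->iov_len;
        }

        /* give the remaining bytes back to the current entry */
        cur->iov_base -= bytes;
        cur->iov_len += bytes;
    }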

Does that look correct to you?


thanks

> 
> Dave
> 
> > thanks
> > 
> > > while
> > >   vhost-user.c defines:
> > >VHOST_USER_NET_SET_MTU = 20,
> > > 
> > > who wins?
> > > 
> > > Dave
> > > --
> > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> > > 
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 



Re: [Qemu-devel] [PATCH v4 8/8] tpm: Added support for TPM emulator

2017-05-24 Thread Stefan Berger

On 05/16/2017 03:58 AM, Amarnath Valluri wrote:

This change introduces a new TPM backend driver that can communicate with
swtpm (a software TPM emulator) using a Unix domain socket interface.

Swtpm uses two Unix sockets: one for plain TPM commands and responses, and
one for out-of-band control messages.

The swtpm and associated tools can be found here:
 https://github.com/stefanberger/swtpm

The swtpm's control channel protocol specification can be found here:
 https://github.com/stefanberger/swtpm/wiki/Control-Channel-Specification

Usage:
 # setup TPM state directory
 mkdir /tmp/mytpm
 chown -R tss:root /tmp/mytpm
 /usr/bin/swtpm_setup --tpm-state /tmp/mytpm --createek

 # Ask qemu to use TPM emulator with given tpm state directory
 qemu-system-x86_64 \
 [...] \
 -tpmdev emulator,id=tpm0,tpmstatedir=/tmp/mytpm,logfile=/tmp/swtpm.log 
\
 -device tpm-tis,tpmdev=tpm0 \
 [...]

Signed-off-by: Amarnath Valluri 


Since you are not supporting migration in this patch, you probably have
to add a migrate_add_blocker() call somewhere, along the lines of this:


https://github.com/stefanberger/qemu-tpm/commit/27d332dc3b2c6bfd0fcd38e69f5c899651f3a5d8#diff-3a0192eef5d20837af490c32bf396f4eR641
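
i.e. roughly this pattern (a sketch; the function name, placement and
error text are my assumptions, not the final patch):

    static Error *migration_blocker;

    static int tpm_emulator_block_migration(Error **errp)
    {
        /* refuse migration while the emulator backend is in use */
        error_setg(&migration_blocker,
                   "Migration is not supported with the TPM emulator");
        if (migrate_add_blocker(migration_blocker, errp) < 0) {
            error_free(migration_blocker);
            migration_blocker = NULL;
            return -1;
        }
        return 0;
    }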

Otherwise it looks good to me.

   Stefan




[Qemu-devel] [PATCH] kvmclock: update system_time_msr address forcibly

2017-05-24 Thread Denis Plotnikov
Do an update of the system_time_msr address every time before reading
the value of tsc_timestamp from the guest's kvmclock page.

It should be done in a forcible manner because there is a situation
when system_time_msr has been set by kvm but qemu isn't aware of it.
This leads to updates of kvmclock_offset without respect for the
guest's kvmclock values.

The situation appears when an L2 linux guest runs over an L1 linux
guest and the action inducing the system_time_msr update is TPR
access reporting.
Some L1 linux guests turn off processing of TPR accesses, and when L0
gets an L2 exit induced by a TPR MSR access it doesn't enter L1 and
processes it by itself.
Thus, L1 kvm doesn't know about that TPR access happening and doesn't
exit to qemu, which in turn doesn't set the system_time_msr address.

This patch fixes this by making sure qemu knows the correct address
every time it is needed.

Signed-off-by: Denis Plotnikov 
---
 hw/i386/kvm/clock.c | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index e713162..035196a 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -48,11 +48,38 @@ struct pvclock_vcpu_time_info {
 uint8_tpad[2];
 } __attribute__((__packed__)); /* 32 bytes */
 
+static void update_all_system_time_msr(void)
+{
+CPUState *cpu;
+CPUX86State *env;
+struct {
+struct kvm_msrs info;
+struct kvm_msr_entry entries[1];
+} msr_data;
+int ret;
+
+msr_data.info.nmsrs = 1;
+msr_data.entries[0].index = MSR_KVM_SYSTEM_TIME;
+
+CPU_FOREACH(cpu) {
+ret = kvm_vcpu_ioctl(cpu, KVM_GET_MSRS, &msr_data);
+
+if (ret < 0) {
+fprintf(stderr, "KVM_GET_MSRS failed: %s\n", strerror(ret));
+abort();
+}
+
+assert(ret == 1);
+env = cpu->env_ptr;
+env->system_time_msr = msr_data.entries[0].data;
+}
+}
+
 static uint64_t kvmclock_current_nsec(KVMClockState *s)
 {
 CPUState *cpu = first_cpu;
 CPUX86State *env = cpu->env_ptr;
-hwaddr kvmclock_struct_pa = env->system_time_msr & ~1ULL;
+hwaddr kvmclock_struct_pa;
 uint64_t migration_tsc = env->tsc;
 struct pvclock_vcpu_time_info time;
 uint64_t delta;
@@ -60,6 +87,9 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
 uint64_t nsec_hi;
 uint64_t nsec;
 
+update_all_system_time_msr();
+kvmclock_struct_pa = env->system_time_msr & ~1ULL;
+
 if (!(env->system_time_msr & 1ULL)) {
 /* KVM clock not active */
 return 0;
-- 
2.7.4




Re: [Qemu-devel] [PATCH] kvmclock: update system_time_msr address forcibly

2017-05-24 Thread Denis Plotnikov



On 24.05.2017 17:09, Denis V. Lunev wrote:

On 05/24/2017 05:07 PM, Denis Plotnikov wrote:

Do an update of the system_time_msr address every time before reading
the value of tsc_timestamp from the guest's kvmclock page.

It should be done in a forcible manner because there is a situation
when system_time_msr has been set by kvm but qemu isn't aware of it.
This leads to updates of kvmclock_offset without respect for the
guest's kvmclock values.

The situation appears when an L2 linux guest runs over an L1 linux
guest and the action inducing the system_time_msr update is TPR
access reporting.
Some L1 linux guests turn off processing of TPR accesses, and when L0
gets an L2 exit induced by a TPR MSR access it doesn't enter L1 and
processes it by itself.
Thus, L1 kvm doesn't know about that TPR access happening and doesn't
exit to qemu, which in turn doesn't set the system_time_msr address.

This patch fixes this by making sure qemu knows the correct address
every time it is needed.

Signed-off-by: Denis Plotnikov 
---
 hw/i386/kvm/clock.c | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index e713162..035196a 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -48,11 +48,38 @@ struct pvclock_vcpu_time_info {
 uint8_tpad[2];
 } __attribute__((__packed__)); /* 32 bytes */

+static void update_all_system_time_msr(void)
+{
+CPUState *cpu;
+CPUX86State *env;
+struct {
+struct kvm_msrs info;
+struct kvm_msr_entry entries[1];
+} msr_data;
+int ret;
+
+msr_data.info.nmsrs = 1;
+msr_data.entries[0].index = MSR_KVM_SYSTEM_TIME;
+
+CPU_FOREACH(cpu) {
+ret = kvm_vcpu_ioctl(cpu, KVM_GET_MSRS, &msr_data);
+
+if (ret < 0) {
+fprintf(stderr, "KVM_GET_MSRS failed: %s\n", strerror(ret));
+abort();
+}
+
+assert(ret == 1);
+env = cpu->env_ptr;
+env->system_time_msr = msr_data.entries[0].data;
+}
+}
+
 static uint64_t kvmclock_current_nsec(KVMClockState *s)
 {
 CPUState *cpu = first_cpu;
 CPUX86State *env = cpu->env_ptr;
-hwaddr kvmclock_struct_pa = env->system_time_msr & ~1ULL;
+hwaddr kvmclock_struct_pa;
 uint64_t migration_tsc = env->tsc;
 struct pvclock_vcpu_time_info time;
 uint64_t delta;
@@ -60,6 +87,9 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
 uint64_t nsec_hi;
 uint64_t nsec;

+update_all_system_time_msr();
+kvmclock_struct_pa = env->system_time_msr & ~1ULL;
+

should we do this once per guest boot?
Practically - yes. I can hardly imagine that the pv_clock page address
would be changed after being set once.

But we don't know the exact moment when the guest is going to write it.
And, to avoid depending on any other event, I decided to check it every
time before use, since it won't cause any performance issues: this
invocation happens on vm state changes only.


Den

 if (!(env->system_time_msr & 1ULL)) {
 /* KVM clock not active */
 return 0;




--
Best,
Denis



[Qemu-devel] Monitor file path length limitation

2017-05-24 Thread Simon

Hello,

It seems that the monitor file path length is limited to about 100 
characters (107 to be precise).


Currently I'm using the '-monitor' parameter to create the monitor file
in a directory which gathers all the files related to a given VM (monitor
file, PID file, disk image, etc.):


-monitor 
'unix:/home/user/vmtools/test/out/foobar.iso/foobar.iso_A/foobar.iso_B/foobar.iso_C/foobar.iso_D/foobar.iso_E/monitor.sock,server,nowait'


With this command Qemu does not produce any warning or error message but
creates the socket file "mon" instead of the expected "monitor.sock".


Things get even worse if I add further nesting:

-monitor 
'unix:/home/user/vmtools/test/out/foobar.iso/foobar.iso_A/foobar.iso_B/foobar.iso_C/foobar.iso_D/foobar.iso_E/foobar.iso_F/monitor.sock,server,nowait'


Now, still without any warning, Qemu creates a file named "foo" in the
parent directory ('foobar.iso_E/') instead of the expected directory
('foobar.iso_F/').


Is there a clean way to work around this limitation? For instance, is
there an alternative option I can use to generate this monitor file? Or
should I just assume that absolute paths are not supported in the
'-monitor' option and change the working directory before launching Qemu
so I can pass "./monitor.sock" as the monitor file path? (I really prefer
to use absolute paths whenever possible though.)


As a side-note, is there any technical reason for such a limitation?

Regards,
Simon.



Re: [Qemu-devel] Monitor file path length limitation

2017-05-24 Thread Eric Blake
On 05/24/2017 10:16 AM, Simon wrote:
> Hello,
> 
> It seems that the monitor file path length is limited to about 100
> characters (107 to be precise).

Welcome to the joy of Unix socket (AF_UNIX) files.  The kernel imposes a
hard length limit on sockaddr_un.sun_path[] (see 'man 7 unix') - and it
is indeed 107 characters plus a NUL terminator.
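
The limit is easy to see with a few lines of C (108 is the Linux value;
other Unixes differ):

    #include <sys/un.h>
    #include <stdio.h>

    int main(void)
    {
        struct sockaddr_un un;
        /* prints 108 on Linux: 107 usable characters plus the NUL */
        printf("sizeof(un.sun_path) = %zu\n", sizeof(un.sun_path));
        return 0;
    }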

> -monitor
> 'unix:/home/user/vmtools/test/out/foobar.iso/foobar.iso_A/foobar.iso_B/foobar.iso_C/foobar.iso_D/foobar.iso_E/monitor.sock,server,nowait'
> 
> 
> With this command Qemu does not produce any warning or error message but
> create the socket file "mon" instead of the expected "monitor.sock".

Ideally, qemu should be telling you your path is too long to be created
as a unix socket, and failing up front, rather than silently truncating
and perhaps doing the wrong thing.

> Is there a clean way to work around this limitation?

Don't use longer path names than the kernel supports.

> 
> As a side-note, is there any technical reason for such a limitation?

Yes, but you'll have to ask kernel folks for the reason.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-devel] Monitor file path length limitation

2017-05-24 Thread Simon

Thank you for your answer Eric, I've learned a new thing today :) !



Re: [Qemu-devel] [PATCH] migration: remove register_savevm()

2017-05-24 Thread Cornelia Huck
On Wed, 24 May 2017 14:10:48 +0200
Laurent Vivier  wrote:

> We can replace the four remaining calls of register_savevm() by
> calls to register_savevm_live(). So we can remove the function, and
> as we no longer allocate the ops pointer with g_new0()
> we don't have to free it either.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/net/vmxnet3.c|  8 ++--
>  hw/s390x/s390-skeys.c   |  9 +++--
>  hw/s390x/s390-virtio-ccw.c  |  8 ++--
>  include/migration/vmstate.h |  8 
>  migration/savevm.c  | 16 
>  slirp/slirp.c   |  8 ++--
>  6 files changed, 25 insertions(+), 32 deletions(-)

Acked-by: Cornelia Huck 




[Qemu-devel] [PATCH] sockets: report error if UNIX socket path is too long

2017-05-24 Thread Daniel P. Berrange
The 'struct sockaddr_un' only allows 108 bytes for the socket
path. Currently QEMU uses snprintf() and so silently truncates
the socket path provided by the user. This is undesirable because
the user will then be unable to connect to the path they asked
for. This change makes QEMU bounds-check the path and report an
explicit error message.

Signed-off-by: Daniel P. Berrange 
---
 util/qemu-sockets.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index d8183f7..e249dbe 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -855,6 +855,12 @@ static int unix_listen_saddr(UnixSocketAddress *saddr,
 memset(&un, 0, sizeof(un));
 un.sun_family = AF_UNIX;
 if (saddr->path && strlen(saddr->path)) {
+if (strlen(saddr->path) > sizeof(un.sun_path)) {
+error_setg(errp, "UNIX socket path '%s' is too long", saddr->path);
+error_append_hint(errp, "Path must be less than %zu bytes\n",
+  sizeof(un.sun_path));
+return -1;
+}
 snprintf(un.sun_path, sizeof(un.sun_path), "%s", saddr->path);
 } else {
 const char *tmpdir = getenv("TMPDIR");
-- 
2.9.3




Re: [Qemu-devel] Monitor file path length limitation

2017-05-24 Thread Daniel P. Berrange
On Wed, May 24, 2017 at 10:26:58AM -0500, Eric Blake wrote:
> On 05/24/2017 10:16 AM, Simon wrote:
> > Hello,
> > 
> > It seems that the monitor file path length is limited to about 100
> > characters (107 to be precise).
> 
> Welcome to the joy of Unix socket (AF_UNIX) files.  The kernel imposes a
> hard length limit on sockaddr_un.sun_path[] (see 'man 7 unix') - and it
> is indeed 107 characters plus a NUL terminator.
> 
> > -monitor
> > 'unix:/home/user/vmtools/test/out/foobar.iso/foobar.iso_A/foobar.iso_B/foobar.iso_C/foobar.iso_D/foobar.iso_E/monitor.sock,server,nowait'
> > 
> > 
> > With this command Qemu does not produce any warning or error message but
> > create the socket file "mon" instead of the expected "monitor.sock".
> 
> Ideally, qemu should be telling you your path is too long to be created
> as a unix socket, and failing up front, rather than silently truncating
> and perhaps doing the wrong thing.

I just copied you on a patch to do that :-)



Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-devel] [svt-core] [PATCH] kvmclock: update system_time_msr address forcibly

2017-05-24 Thread Roman Kagan
On Wed, May 24, 2017 at 05:07:24PM +0300, Denis Plotnikov wrote:
> Do an update of the system_time_msr address every time before reading
> the value of tsc_timestamp from the guest's kvmclock page.
> 
> It should be done in a forcible manner because there is a situation
> when system_time_msr has been set by kvm but qemu isn't aware of it.
> This leads to updates of kvmclock_offset without respect for the
> guest's kvmclock values.
> 
> The situation appears when an L2 linux guest runs over an L1 linux
> guest and the action inducing the system_time_msr update is TPR
> access reporting.
> Some L1 linux guests turn off processing of TPR accesses, and when L0
> gets an L2 exit induced by a TPR MSR access it doesn't enter L1 and
> processes it by itself.
> Thus, L1 kvm doesn't know about that TPR access happening and doesn't
> exit to qemu, which in turn doesn't set the system_time_msr address.
> 
> This patch fixes this by making sure qemu knows the correct address
> every time it is needed.
> 
> Signed-off-by: Denis Plotnikov 
> ---
>  hw/i386/kvm/clock.c | 32 +++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> index e713162..035196a 100644
> --- a/hw/i386/kvm/clock.c
> +++ b/hw/i386/kvm/clock.c
> @@ -48,11 +48,38 @@ struct pvclock_vcpu_time_info {
>  uint8_tpad[2];
>  } __attribute__((__packed__)); /* 32 bytes */
>  
> +static void update_all_system_time_msr(void)
> +{
> +CPUState *cpu;
> +CPUX86State *env;
> +struct {
> +struct kvm_msrs info;
> +struct kvm_msr_entry entries[1];
> +} msr_data;
> +int ret;
> +
> +msr_data.info.nmsrs = 1;
> +msr_data.entries[0].index = MSR_KVM_SYSTEM_TIME;
> +
> +CPU_FOREACH(cpu) {
> +ret = kvm_vcpu_ioctl(cpu, KVM_GET_MSRS, &msr_data);
> +
> +if (ret < 0) {
> +fprintf(stderr, "KVM_GET_MSRS failed: %s\n", strerror(ret));
> +abort();
> +}
> +
> +assert(ret == 1);
> +env = cpu->env_ptr;
> +env->system_time_msr = msr_data.entries[0].data;
> +}
> +}
> +
>  static uint64_t kvmclock_current_nsec(KVMClockState *s)
>  {
>  CPUState *cpu = first_cpu;
>  CPUX86State *env = cpu->env_ptr;
> -hwaddr kvmclock_struct_pa = env->system_time_msr & ~1ULL;
> +hwaddr kvmclock_struct_pa;
>  uint64_t migration_tsc = env->tsc;
>  struct pvclock_vcpu_time_info time;
>  uint64_t delta;
> @@ -60,6 +87,9 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
>  uint64_t nsec_hi;
>  uint64_t nsec;
>  
> +update_all_system_time_msr();

I'd rather just call cpu_synchronize_state(cpu) here.
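
That is (a sketch of the alternative; cpu_synchronize_state() pulls the
whole register and MSR state, including MSR_KVM_SYSTEM_TIME, back from
KVM before env->system_time_msr is read):

    /* instead of re-reading a single MSR by hand: */
    cpu_synchronize_state(cpu);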

> +kvmclock_struct_pa = env->system_time_msr & ~1ULL;
> +
>  if (!(env->system_time_msr & 1ULL)) {
>  /* KVM clock not active */
>  return 0;

Roman.



Re: [Qemu-devel] [PATCH] sockets: report error if UNIX socket path is too long

2017-05-24 Thread Eric Blake
On 05/24/2017 10:42 AM, Daniel P. Berrange wrote:
> The 'struct sockaddr_un' only allows 108 bytes for the socket
> path. Currently QEMU uses snprintf() and so silently truncates
> the socket path provided by the user. This is undesirable because
> the user will then be unable to connect to the path they asked
> for. This change makes QEMU bounds check and report an explicit
> error message.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  util/qemu-sockets.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> index d8183f7..e249dbe 100644
> --- a/util/qemu-sockets.c
> +++ b/util/qemu-sockets.c
> @@ -855,6 +855,12 @@ static int unix_listen_saddr(UnixSocketAddress *saddr,
>  memset(&un, 0, sizeof(un));
>  un.sun_family = AF_UNIX;
>  if (saddr->path && strlen(saddr->path)) {
> +if (strlen(saddr->path) > sizeof(un.sun_path)) {

Feels odd to be computing strlen(saddr->path) multiple times instead of
storing it in a temporary.

> +error_setg(errp, "UNIX socket path '%s' is too long", 
> saddr->path);
> +error_append_hint(errp, "Path must be less than %zu bytes\n",
> +  sizeof(un.sun_path));
> +return -1;
> +}
>  snprintf(un.sun_path, sizeof(un.sun_path), "%s", saddr->path);

Pre-existing, but now that we know things fit, isn't it faster to use
strcpy (or stpcpy or strncpy) instead of the overhead of snprintf?

You're on the right track, but it may be worth a v2 for further cleanups.
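
A v2 along those lines might look like this (sketch; the >= bound, which
leaves room for the NUL terminator, is my assumption beyond the comments
above):

    size_t pathlen = strlen(saddr->path);

    if (pathlen >= sizeof(un.sun_path)) {
        error_setg(errp, "UNIX socket path '%s' is too long", saddr->path);
        error_append_hint(errp, "Path must be less than %zu bytes\n",
                          sizeof(un.sun_path));
        return -1;
    }
    memcpy(un.sun_path, saddr->path, pathlen + 1);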

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




