date:20160705

On Tue,  5 Jul 2016 10:12:49 +0530
Bharata B Rao  wrote:

> Introduce CPUState.prefer_arch_id_over_cpu_index and
> MachineClass.prefer_arch_id_over_cpu_index that allow target
> machines to optionally switch to using arch_id instead of cpu_index
> as instance_id in vmstate_register(). This will help allow successful
> migration in cases where holes are introduced in cpu_index range
> after CPU hot removals.
> 
> Whether to use arch_id or cpu_index is based on machine type version
> and hence added MachineClass.prefer_arch_id_over_cpu_index. However
> the enforcement is via and during CPU creation and hence added
> CPUState.prefer_arch_id_over_cpu_index. So it becomes a two step
> process for the target to enable the use of arch_id:
> 
> 1. Set MachineClass.prefer_arch_id_over_cpu_index.
You could reuse the compat mechanism, as x86 does, instead of doing #1;
see PC_COMPAT_2_6 and how it's used.

> 2. Ensure CPUState.prefer_arch_id_over_cpu_index is set for all CPUs
>based on 1. above.
> 
> Suggested-by: Igor Mammedov 
> Signed-off-by: Bharata B Rao 
> ---
>  exec.c  | 10 --
>  include/hw/boards.h |  1 +
>  include/qom/cpu.h   |  4 
>  3 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 8ce8e90..7cc1d06 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -616,15 +616,21 @@ static void cpu_release_index(CPUState *cpu)
>  bitmap_clear(cpu_index_map, cpu->cpu_index, 1);
>  }
>  
> +/*
> + * TODO: cpu_index and instance_id are of type int
> while .get_arch_id()is
> + * of type int64_t. What is the consequence of changing instance_id
> to int64_t ?
> + */
>  static void cpu_vmstate_register(CPUState *cpu)
>  {
>  CPUClass *cc = CPU_GET_CLASS(cpu);
> +int instance_id = cpu->prefer_arch_id_over_cpu_index ?
> +  cc->get_arch_id(cpu) : cpu->cpu_index;
>  
>  if (qdev_get_vmsd(DEVICE(cpu)) == NULL) {
> -vmstate_register(NULL, cpu->cpu_index, &vmstate_cpu_common,
> cpu);
> +vmstate_register(NULL, instance_id, &vmstate_cpu_common,
> cpu); }
>  if (cc->vmsd != NULL) {
> -vmstate_register(NULL, cpu->cpu_index, cc->vmsd, cpu);
> +vmstate_register(NULL, instance_id, cc->vmsd, cpu);
>  }
>  }
>  
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 3ed6155..decabba 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -123,6 +123,7 @@ struct MachineClass {
>  ram_addr_t default_ram_size;
>  bool option_rom_has_mr;
>  bool rom_file_has_mr;
> +bool prefer_arch_id_over_cpu_index;
>  
>  HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
> DeviceState *dev);
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 32f3af3..1f1706e 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -273,6 +273,9 @@ struct qemu_work_item {
>   * @kvm_fd: vCPU file descriptor for KVM.
>   * @work_mutex: Lock to prevent multiple access to queued_work_*.
>   * @queued_work_first: First asynchronous work pending.
> + * @prefer_arch_id_over_cpu_index: Set to enforce the use of
> + * CPUClass.get_arch_id() over cpu_index during vmstate
> registration
> + * and any other uses by target machine where arch_id is
> preferred. *
>   * State of one CPU core or thread.
>   */
> @@ -360,6 +363,7 @@ struct CPUState {
> (absolute value) offset as small as possible.  This reduces
> code size, especially for hosts without large memory offsets.  */
>  uint32_t tcg_exit_req;
> +bool prefer_arch_id_over_cpu_index;
This property applies to all CPUs, so shouldn't it be a class property?

>  };
>  
>  QTAILQ_HEAD(CPUTailQ, CPUState);

Re: [Qemu-devel] [RFC PATCH v0 4/5] xics: Use arch_id instead of cpu_index in XICS code

On Tue, 5 Jul 2016 14:59:20 +1000
David Gibson  wrote:

> On Tue, Jul 05, 2016 at 10:12:51AM +0530, Bharata B Rao wrote:
> > xics maintains an array of ICPState structures which is indexed
> > by cpu_index. Change this to index the ICPState array by arch_id
> > for pseries-2.7 onwards. This allows migration of guest to succeed
> > when there are holes in cpu_index range due to CPU hot removal.
> > 
> > Signed-off-by: Bharata B Rao 
> > ---
> >  hw/intc/xics.c   | 14 ++
> >  hw/intc/xics_kvm.c   | 12 ++--
> >  hw/intc/xics_spapr.c | 33 ++---
> >  3 files changed, 42 insertions(+), 17 deletions(-)
> > 
> > diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> > index cd48f42..6b79a8e 100644
> > --- a/hw/intc/xics.c
> > +++ b/hw/intc/xics.c
> > @@ -50,9 +50,12 @@ int xics_get_cpu_index_by_dt_id(int cpu_dt_id)
> >  void xics_cpu_destroy(XICSState *xics, PowerPCCPU *cpu)
> >  {
> >  CPUState *cs = CPU(cpu);
> > -ICPState *ss = &xics->ss[cs->cpu_index];
> > +CPUClass *cc = CPU_GET_CLASS(cs);
> > +int server = cs->prefer_arch_id_over_cpu_index ?
> > cc->get_arch_id(cs) :
> > + cs->cpu_index;
> 
> Requiring this conditional in all the callsites is pretty ugly.  I'd
> prefer to see a get_arch_id() helper that implements this fallback
> internally.
A cpu can look at prefer_arch_id_over_cpu_index and switch
behavior of get_arch_id() internally.
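
A rough sketch of what such a helper could look like follows. It is not
QEMU code: MockCPU, mock_get_arch_id() and cpu_server_id() are hypothetical
stand-ins for CPUState, CPUClass.get_arch_id() and the helper being asked
for. The point is only that the conditional lives in one function instead
of at every XICS call site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CPUState; only the fields the sketch needs. */
typedef struct MockCPU {
    int cpu_index;                       /* allocation-order index */
    int64_t arch_id;                     /* stable, architecture-defined id */
    bool prefer_arch_id_over_cpu_index;  /* flag proposed by the patch */
} MockCPU;

/* Hypothetical stand-in for CPUClass.get_arch_id(). */
static int64_t mock_get_arch_id(const MockCPU *cpu)
{
    return cpu->arch_id;
}

/* The one place that implements the fallback to cpu_index. */
static int cpu_server_id(const MockCPU *cpu)
{
    assert(cpu != NULL);
    if (cpu->prefer_arch_id_over_cpu_index) {
        return (int)mock_get_arch_id(cpu);
    }
    return cpu->cpu_index;
}
```

With something like this, each call site would reduce to a single
"int server = cpu_server_id(cs);" line.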

> 
> > +ICPState *ss = &xics->ss[server];
> >  
> > -assert(cs->cpu_index < xics->nr_servers);
> > +assert(server < xics->nr_servers);
> >  assert(cs == ss->cs);
> >  
> >  ss->output = NULL;
> > @@ -63,10 +66,13 @@ void xics_cpu_setup(XICSState *xics, PowerPCCPU
> > *cpu) {
> >  CPUState *cs = CPU(cpu);
> >  CPUPPCState *env = &cpu->env;
> > -ICPState *ss = &xics->ss[cs->cpu_index];
> > +CPUClass *cc = CPU_GET_CLASS(cs);
> > +int server = cs->prefer_arch_id_over_cpu_index ?
> > cc->get_arch_id(cs) :
> > + cs->cpu_index;
> > +ICPState *ss = &xics->ss[server];
> >  XICSStateClass *info = XICS_COMMON_GET_CLASS(xics);
> >  
> > -assert(cs->cpu_index < xics->nr_servers);
> > +assert(server < xics->nr_servers);
> >  
> >  ss->cs = cs;
> >  
> > diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
> > index edbd62f..2610458 100644
> > --- a/hw/intc/xics_kvm.c
> > +++ b/hw/intc/xics_kvm.c
> > @@ -326,14 +326,14 @@ static const TypeInfo ics_kvm_info = {
> >   */
> >  static void xics_kvm_cpu_setup(XICSState *xics, PowerPCCPU *cpu)
> >  {
> > -CPUState *cs;
> > -ICPState *ss;
> > +CPUState *cs = CPU(cpu);
> >  KVMXICSState *xicskvm = XICS_SPAPR_KVM(xics);
> > +CPUClass *cc = CPU_GET_CLASS(cs);
> > +int server = cs->prefer_arch_id_over_cpu_index ?
> > cc->get_arch_id(cs) :
> > + cs->cpu_index;
> > +ICPState *ss = &xics->ss[server];
> >  
> > -cs = CPU(cpu);
> > -ss = &xics->ss[cs->cpu_index];
> > -
> > -assert(cs->cpu_index < xics->nr_servers);
> > +assert(server < xics->nr_servers);
> >  if (xicskvm->kernel_xics_fd == -1) {
> >  abort();
> >  }
> > diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
> > index 618826d..fbc8205 100644
> > --- a/hw/intc/xics_spapr.c
> > +++ b/hw/intc/xics_spapr.c
> > @@ -43,16 +43,21 @@ static target_ulong h_cppr(PowerPCCPU *cpu,
> > sPAPRMachineState *spapr, target_ulong opcode, target_ulong *args)
> >  {
> >  CPUState *cs = CPU(cpu);
> > +CPUClass *cc = CPU_GET_CLASS(cs);
> > +int server = cs->prefer_arch_id_over_cpu_index ?
> > cc->get_arch_id(cs) :
> > + cs->cpu_index;
> >  target_ulong cppr = args[0];
> >  
> > -icp_set_cppr(spapr->xics, cs->cpu_index, cppr);
> > +icp_set_cppr(spapr->xics, server, cppr);
> >  return H_SUCCESS;
> >  }
> >  
> >  static target_ulong h_ipi(PowerPCCPU *cpu, sPAPRMachineState
> > *spapr, target_ulong opcode, target_ulong *args)
> >  {
> > -target_ulong server = xics_get_cpu_index_by_dt_id(args[0]);
> > +CPUState *cs = CPU(cpu);
> > +target_ulong server = cs->prefer_arch_id_over_cpu_index ?
> > args[0] :
> > +  xics_get_cpu_index_by_dt_id(args[0]);
> >  target_ulong mfrr = args[1];
> >  
> >  if (server >= spapr->xics->nr_servers) {
> > @@ -67,7 +72,10 @@ static target_ulong h_xirr(PowerPCCPU *cpu,
> > sPAPRMachineState *spapr, target_ulong opcode, target_ulong *args)
> >  {
> >  CPUState *cs = CPU(cpu);
> > -uint32_t xirr = icp_accept(spapr->xics->ss + cs->cpu_index);
> > +CPUClass *cc = CPU_GET_CLASS(cs);
> > +int server = cs->prefer_arch_id_over_cpu_index ?
> > cc->get_arch_id(cs) :
> > + cs->cpu_index;
> > +uint32_t xirr = icp_accept(spapr->xics->ss + server);
> >  
> >  args[0] = xirr;
> >  return H_SUCCESS;
> > @@ -77,7 +85,10 @@ static target_ulong h_xirr_x(PowerPCCPU *cpu,
> > sPAPRMachineState *spapr, target_ulong opcode, target_ulong *args)
> >  {
>

Re: [Qemu-devel] [PATCH] quorum: Only compile when supported

On Sat, 07/02 14:36, Max Reitz wrote:
> On 28.06.2016 03:47, Fam Zheng wrote:
> > This was the only exceptional module init function that does something
> > else than a simple list of bdrv_register() calls, in all the block
> > drivers.
> 
> This sounds like this patch specifically wants to drop the check from
> bdrv_quorum_init().
> 
> I think keeping the check would be better; while we can probably assume
> that SHA256 is supported if CONFIG_GNUTLS_HASH is defined, it would
> still technically be better to actually check for that support.

This one is effectively a static check so I think it's practically okay to skip
it, and yes, the idea is this makes it easier to do dynamic driver loading.

Thanks, 
Fam

> 
> However, since this is probably only a theoretical issue, if there's a
> good reason (like preparation for dynamic block driver modules) for
> having to drop the check from bdrv_quorum_init(), I think it's alright
> to do so.
> 
> Max
> 
> > The qcrypto_hash_supports is actually a static check, determined at
> > compile time.  Follow the block-job-$(CONFIG_FOO) convention for
> > consistency.
> > 
> > Signed-off-by: Fam Zheng 
> > ---
> >  block/Makefile.objs | 2 +-
> >  block/quorum.c  | 4 
> >  2 files changed, 1 insertion(+), 5 deletions(-)
> > 
> > diff --git a/block/Makefile.objs b/block/Makefile.objs
> > index 44a5416..c87d605 100644
> > --- a/block/Makefile.objs
> > +++ b/block/Makefile.objs
> > @@ -3,7 +3,7 @@ block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o 
> > qcow2-snapshot.o qcow2-c
> >  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
> >  block-obj-y += qed-check.o
> >  block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
> > -block-obj-y += quorum.o
> > +block-obj-$(CONFIG_GNUTLS_HASH) += quorum.o
> >  block-obj-y += parallels.o blkdebug.o blkverify.o blkreplay.o
> >  block-obj-y += block-backend.o snapshot.o qapi.o
> >  block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
> > diff --git a/block/quorum.c b/block/quorum.c
> > index 331b726..18fbed8 100644
> > --- a/block/quorum.c
> > +++ b/block/quorum.c
> > @@ -1113,10 +1113,6 @@ static BlockDriver bdrv_quorum = {
> >  
> >  static void bdrv_quorum_init(void)
> >  {
> > -if (!qcrypto_hash_supports(QCRYPTO_HASH_ALG_SHA256)) {
> > -/* SHA256 hash support is required for quorum device */
> > -return;
> > -}
> >  bdrv_register(&bdrv_quorum);
> >  }
> >  
> > 
> 
>

Re: [Qemu-devel] [RFC PATCH v0 0/5] sPAPR: Fix migration when CPUs are removed in random order

2016-07-05 Thread Greg Kurz

On Tue,  5 Jul 2016 10:12:47 +0530
Bharata B Rao  wrote:

> device_add/del based CPU hotplug and unplug support is upstream for
> sPAPR PowerPC and is under development for x86. Both of these will
> support CPU device removal in random order (and not necessarily in LIFO
> order). Random order removal will result in holes in cpu_index range
> which causes migration to fail. This needs fixes in both generic code
> as well as arch specific code.
> 
> - We need to use arch_id as the instance_id when registering CPU devices
>   using vmstate_register. To support forward migration, as per Igor's
>   suggestion, this needs to be done conditionally based on machine type
>   version.
> - From pseries-2.7 onwards, we start using arch_id for migration as well
>   as in XICS code.
> 
> In fact, Igor even suggested to move the vmstate registration calls
> into cpu_common_realizefn and yet-to-be-introduced cpu_common_unrealizefn.
> But I haven't done that in this patchset as I am hitting an unrelated
> issue with that movement.
> 
> This patchset depends on Greg Kurz's patchset where among other things,
> he is deriving cpu_dt_id (arch_id) based on core-id.
> (https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg00210.html) 
>   

I'm respinning my patchset according to the latest comments I got. This
includes rebasing on top of Eduardo's tree to have the global parse_features
bits, as mentioned here:

https://lists.nongnu.org/archive/html/qemu-devel/2016-07/msg00622.html

I'll rework this patch to apply on the current code:

https://lists.nongnu.org/archive/html/qemu-devel/2016-06/msg07564.html

> Tested changes to XICS code with both kernel_irqchip=on/off.
> This applies on ppc-for-2.7 branch of David's tree.
> 
> Bharata B Rao (5):
>   cpu: Factor out cpu vmstate_[un]register into separate routines
>   cpu: Optionally use arch_id instead of cpu_index in cpu
> vmstate_register()
>   spapr: Implement CPUClass.get_arch_id() for PowerPC CPUs
>   xics: Use arch_id instead of cpu_index in XICS code
>   spapr: Prefer arch_id over cpu_index
> 
>  exec.c  | 53 
> +
>  hw/intc/xics.c  | 14 
>  hw/intc/xics_kvm.c  | 12 +-
>  hw/intc/xics_spapr.c| 33 ++--
>  hw/ppc/spapr.c  |  2 ++
>  hw/ppc/spapr_cpu_core.c |  5 +
>  include/hw/boards.h |  1 +
>  include/qom/cpu.h   |  4 
>  target-ppc/translate_init.c |  8 +++
>  9 files changed, 96 insertions(+), 36 deletions(-)
>
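
To make the hole problem from the cover letter concrete, here is a small
sketch. It assumes a first-free index allocator in the spirit of
cpu_get_free_index(); IndexMap, index_alloc(), index_release() and
same_ids() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CPUS 8

/* Hypothetical stand-in for the cpu_index bitmap. */
typedef struct IndexMap {
    bool used[MAX_CPUS];
} IndexMap;

/* Hand out the lowest free index, or -1 if none is left. */
static int index_alloc(IndexMap *m)
{
    for (int i = 0; i < MAX_CPUS; i++) {
        if (!m->used[i]) {
            m->used[i] = true;
            return i;
        }
    }
    return -1;
}

/* Give an index back, as happens on CPU hot removal. */
static void index_release(IndexMap *m, int idx)
{
    m->used[idx] = false;
}

/* True if both sides hold exactly the same set of ids. */
static bool same_ids(const IndexMap *a, const IndexMap *b)
{
    for (int i = 0; i < MAX_CPUS; i++) {
        if (a->used[i] != b->used[i]) {
            return false;
        }
    }
    return true;
}
```

If the source boots three CPUs and then removes the middle one, it holds
indexes {0, 2}. A destination started with two CPUs allocates {0, 1}, so
the instance ids no longer line up. An id derived from the core-id does
not depend on allocation order and would match on both sides.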

Re: [Qemu-devel] [PATCH] json-streamer: fix double-free on exiting during a parse

2016-07-05 Thread Changlong Xie


On 07/05/2016 02:56 PM, Fam Zheng wrote:
> On Mon, 07/04 14:40, Paolo Bonzini wrote:
> > Now that json-streamer tries not to leak tokens on incomplete parse,
> > the tokens can be freed twice if QEMU destroys the json-streamer
> > object during the parser->emit call.  To fix this, create the new
> > empty GQueue earlier, so that it is already in place when the old
> > one is passed to parser->emit.
> >
> > Reported-by: Changlong Xie 
> > Signed-off-by: Paolo Bonzini 
>
> Two meta questions:
>
> Is there a reproducer and/or test case coverage?

tests/qemu-iotests/071

> Does qemu-stable need this?

http://lists.nongnu.org/archive/html/qemu-devel/2016-07/msg00465.html

Thanks
-Xie

> Fam
>
> > ---
> >   qobject/json-streamer.c | 8 ++--
> >   1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
> > index 7164390..c51c202 100644
> > --- a/qobject/json-streamer.c
> > +++ b/qobject/json-streamer.c
> > @@ -39,6 +39,7 @@ static void json_message_process_token(JSONLexer *lexer, GString *input,
> >   {
> >   JSONMessageParser *parser = container_of(lexer, JSONMessageParser, lexer);
> >   JSONToken *token;
> > +GQueue *tokens;
> >
> >   switch (type) {
> >   case JSON_LCURLY:
> > @@ -96,9 +97,12 @@ out_emit:
> >   /* send current list of tokens to parser and reset tokenizer */
> >   parser->brace_count = 0;
> >   parser->bracket_count = 0;
> > -/* parser->emit takes ownership of parser->tokens.  */
> > -parser->emit(parser, parser->tokens);
> > +/* parser->emit takes ownership of parser->tokens.  Remove our own
> > + * reference to parser->tokens before handing it out to parser->emit.
> > + */
> > +tokens = parser->tokens;
> >   parser->tokens = g_queue_new();
> > +parser->emit(parser, tokens);
> >   parser->token_size = 0;
> >   }
> >
> > --
> > 1.8.3.1

Re: [Qemu-devel] [RFC PATCH v0 2/5] cpu: Optionally use arch_id instead of cpu_index in cpu vmstate_register()

On Tue,  5 Jul 2016 10:12:49 +0530
Bharata B Rao  wrote:

> Introduce CPUState.prefer_arch_id_over_cpu_index and
> MachineClass.prefer_arch_id_over_cpu_index that allow target
> machines to optionally switch to using arch_id instead of cpu_index
> as instance_id in vmstate_register(). This will help allow successful
> migration in cases where holes are introduced in cpu_index range
> after CPU hot removals.
> 
> Whether to use arch_id or cpu_index is based on machine type version
> and hence added MachineClass.prefer_arch_id_over_cpu_index. However
> the enforcement is via and during CPU creation and hence added
> CPUState.prefer_arch_id_over_cpu_index. So it becomes a two step
> process for the target to enable the use of arch_id:
> 
> 1. Set MachineClass.prefer_arch_id_over_cpu_index.
> 2. Ensure CPUState.prefer_arch_id_over_cpu_index is set for all CPUs
>based on 1. above.
> 
> Suggested-by: Igor Mammedov 
> Signed-off-by: Bharata B Rao 
> ---
>  exec.c  | 10 --
>  include/hw/boards.h |  1 +
>  include/qom/cpu.h   |  4 
>  3 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 8ce8e90..7cc1d06 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -616,15 +616,21 @@ static void cpu_release_index(CPUState *cpu)
>  bitmap_clear(cpu_index_map, cpu->cpu_index, 1);
>  }
>  
> +/*
> + * TODO: cpu_index and instance_id are of type int
> while .get_arch_id()is
> + * of type int64_t. What is the consequence of changing instance_id
> to int64_t ?
> + */
ARM could potentially have get_arch_id() == MPIDR (a 64-bit int), which
would get truncated in this case.

I wonder if we could add "int CPUState::migration_id" and let machine
code set it to what it uses for stable cpu id and then cpu_realize
could call vmstate_register with it.
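
As a rough illustration of the truncation concern, the sketch below checks
a 64-bit arch id before it is used as an int instance_id. The helper
arch_id_to_instance_id() is hypothetical, not an existing QEMU function;
it only shows one way machine code could refuse an id that does not fit
instead of silently dropping the upper bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a 64-bit arch id into an int usable as an
 * instance_id. Returns false and leaves *instance_id untouched when the
 * value is negative or does not fit in an int.
 */
static bool arch_id_to_instance_id(int64_t arch_id, int *instance_id)
{
    assert(instance_id != NULL);
    if (arch_id < 0 || arch_id > INT_MAX) {
        return false;
    }
    *instance_id = (int)arch_id;
    return true;
}
```

A machine that sets something like a migration_id could run such a check
once at CPU creation time and report an error for ids with bits set above
bit 31.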

Re: [Qemu-devel] [PATCH] json-streamer: fix double-free on exiting during a parse

On Tue, 07/05 15:16, Changlong Xie wrote:
> On 07/05/2016 02:56 PM, Fam Zheng wrote:
> > On Mon, 07/04 14:40, Paolo Bonzini wrote:
> > > Now that json-streamer tries not to leak tokens on incomplete parse,
> > > the tokens can be freed twice if QEMU destroys the json-streamer
> > > object during the parser->emit call.  To fix this, create the new
> > > empty GQueue earlier, so that it is already in place when the old
> > > one is passed to parser->emit.
> > > 
> > > Reported-by: Changlong Xie 
> > > Signed-off-by: Paolo Bonzini 
> > 
> > Two meta questions:
> > 
> > Is there a reproducer and/or test case coverage?
> 
> tests/qemu-iotests/071
> 
> > 
> > Does qemu-stable need this?
> > 
> 
> http://lists.nongnu.org/archive/html/qemu-devel/2016-07/msg00465.html

Got it! Thanks!

Fam
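
For readers following the fix, the ordering issue can be sketched without
GLib. Everything below is a simplified model with hypothetical names
(TokenList, Parser, parser_flush() and so on), not the json-streamer code
itself. The emit callback takes ownership of the list and may also tear
down the parser, so the parser has to drop its own reference before the
callback runs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the GQueue of tokens. */
typedef struct TokenList {
    int n;
} TokenList;

typedef struct Parser Parser;
struct Parser {
    TokenList *tokens;
    /* Callback takes ownership of the list it is given. */
    void (*emit)(Parser *p, TokenList *tokens);
};

static int free_count;  /* number of lists freed, for the demonstration */

static TokenList *token_list_new(void)
{
    return calloc(1, sizeof(TokenList));
}

static void token_list_free(TokenList *t)
{
    free_count++;
    free(t);
}

/* Releases whatever list the parser still owns. */
static void parser_destroy(Parser *p)
{
    if (p->tokens) {
        token_list_free(p->tokens);
        p->tokens = NULL;
    }
}

/* A callback that consumes its tokens and then destroys the parser. */
static void emit_and_destroy(Parser *p, TokenList *tokens)
{
    token_list_free(tokens);
    parser_destroy(p);
}

/* Fixed ordering: detach the old list before the callback can run. */
static void parser_flush(Parser *p)
{
    TokenList *tokens = p->tokens;

    p->tokens = token_list_new();
    p->emit(p, tokens);
}
```

With the old ordering (emit first, then replace p->tokens), the callback
and parser_destroy() would both free the same list. In this ordering each
of the two lists is freed exactly once.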

Re: [Qemu-devel] [RFC PATCH v0 2/5] cpu: Optionally use arch_id instead of cpu_index in cpu vmstate_register()

2016-07-05 Thread Greg Kurz

On Tue,  5 Jul 2016 10:12:49 +0530
Bharata B Rao  wrote:

> Introduce CPUState.prefer_arch_id_over_cpu_index and
> MachineClass.prefer_arch_id_over_cpu_index that allow target
> machines to optionally switch to using arch_id instead of cpu_index
> as instance_id in vmstate_register(). This will help allow successful
> migration in cases where holes are introduced in cpu_index range
> after CPU hot removals.
> 
> Whether to use arch_id or cpu_index is based on machine type version
> and hence added MachineClass.prefer_arch_id_over_cpu_index. However the
> enforcement is via and during CPU creation and hence added
> CPUState.prefer_arch_id_over_cpu_index. So it becomes a two step
> process for the target to enable the use of arch_id:
> 
> 1. Set MachineClass.prefer_arch_id_over_cpu_index.
> 2. Ensure CPUState.prefer_arch_id_over_cpu_index is set for all CPUs
>based on 1. above.
> 
> Suggested-by: Igor Mammedov 
> Signed-off-by: Bharata B Rao 
> ---
>  exec.c  | 10 --
>  include/hw/boards.h |  1 +
>  include/qom/cpu.h   |  4 
>  3 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 8ce8e90..7cc1d06 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -616,15 +616,21 @@ static void cpu_release_index(CPUState *cpu)
>  bitmap_clear(cpu_index_map, cpu->cpu_index, 1);
>  }
>  
> +/*
> + * TODO: cpu_index and instance_id are of type int while .get_arch_id()is
> + * of type int64_t. What is the consequence of changing instance_id to 
> int64_t ?
> + */

The current migration code assumes instance_id is a 32-bit word:

migration/savevm.c:qemu_put_be32(f, se->instance_id);
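
To spell out what that implies, here is a minimal sketch of a 32-bit
big-endian field. put_be32(), get_be32() and id_fits_wire_format() are
hypothetical helpers that write to a plain byte buffer; they only model
the width of the field, not the real QEMUFile API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Store a 32-bit value in big-endian byte order. */
static void put_be32(uint8_t *buf, uint32_t v)
{
    buf[0] = (uint8_t)(v >> 24);
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;
}

/* Load a 32-bit big-endian value. */
static uint32_t get_be32(const uint8_t *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}

/* True if a 64-bit id survives a round trip through the 32-bit field. */
static bool id_fits_wire_format(int64_t id)
{
    uint8_t buf[4];

    put_be32(buf, (uint32_t)id);
    return (int64_t)get_be32(buf) == id;
}
```

Any id with bits set above bit 31 comes back changed, so widening
instance_id alone would not be enough without also changing the stream
format.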

>  static void cpu_vmstate_register(CPUState *cpu)
>  {
>  CPUClass *cc = CPU_GET_CLASS(cpu);
> +int instance_id = cpu->prefer_arch_id_over_cpu_index ?
> +  cc->get_arch_id(cpu) : cpu->cpu_index;
>  
>  if (qdev_get_vmsd(DEVICE(cpu)) == NULL) {
> -vmstate_register(NULL, cpu->cpu_index, &vmstate_cpu_common, cpu);
> +vmstate_register(NULL, instance_id, &vmstate_cpu_common, cpu);
>  }
>  if (cc->vmsd != NULL) {
> -vmstate_register(NULL, cpu->cpu_index, cc->vmsd, cpu);
> +vmstate_register(NULL, instance_id, cc->vmsd, cpu);
>  }
>  }
>  
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 3ed6155..decabba 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -123,6 +123,7 @@ struct MachineClass {
>  ram_addr_t default_ram_size;
>  bool option_rom_has_mr;
>  bool rom_file_has_mr;
> +bool prefer_arch_id_over_cpu_index;
>  
>  HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
> DeviceState *dev);
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 32f3af3..1f1706e 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -273,6 +273,9 @@ struct qemu_work_item {
>   * @kvm_fd: vCPU file descriptor for KVM.
>   * @work_mutex: Lock to prevent multiple access to queued_work_*.
>   * @queued_work_first: First asynchronous work pending.
> + * @prefer_arch_id_over_cpu_index: Set to enforce the use of
> + * CPUClass.get_arch_id() over cpu_index during vmstate registration
> + * and any other uses by target machine where arch_id is preferred.
>   *
>   * State of one CPU core or thread.
>   */
> @@ -360,6 +363,7 @@ struct CPUState {
> (absolute value) offset as small as possible.  This reduces code
> size, especially for hosts without large memory offsets.  */
>  uint32_t tcg_exit_req;
> +bool prefer_arch_id_over_cpu_index;
>  };
>  
>  QTAILQ_HEAD(CPUTailQ, CPUState);

Re: [Qemu-devel] [RFC PATCH v0 1/5] cpu: Factor out cpu vmstate_[un]register into separate routines

On Tue, 5 Jul 2016 12:05:35 +0530
Bharata B Rao  wrote:

> On Tue, Jul 05, 2016 at 07:49:38AM +0200, Igor Mammedov wrote:
> > On Tue, 5 Jul 2016 10:46:07 +0530
> > Bharata B Rao  wrote:
> > 
> > > On Tue, Jul 05, 2016 at 02:56:13PM +1000, David Gibson wrote:
> > > > On Tue, Jul 05, 2016 at 10:12:48AM +0530, Bharata B Rao wrote:
> > > > > Consolidates cpu vmstate_[un]register calls into separate
> > > > > routines. No functionality change except that
> > > > > vmstate_unregister calls are now done under !CONFIG_USER_ONLY
> > > > > to match with vmstate_register calls.
> > > > > 
> > > > > Signed-off-by: Bharata B Rao 
> > > > 
> > > > Reviewed-by: David Gibson 
> > > > 
> > > > > ---
> > > > >  exec.c | 47 ---
> > > > >  1 file changed, 28 insertions(+), 19 deletions(-)
> > > > > 
> > > > > diff --git a/exec.c b/exec.c
> > > > > index 0122ef7..8ce8e90 100644
> > > > > --- a/exec.c
> > > > > +++ b/exec.c
> > > > > @@ -594,9 +594,7 @@ AddressSpace
> > > > > *cpu_get_address_space(CPUState *cpu, int asidx) /* Return
> > > > > the AddressSpace corresponding to the specified index */
> > > > > return cpu->cpu_ases[asidx].as; }
> > > > > -#endif
> > > > >  
> > > > > -#ifndef CONFIG_USER_ONLY
> > > > >  static DECLARE_BITMAP(cpu_index_map, MAX_CPUMASK_BITS);
> > > > >  
> > > > >  static int cpu_get_free_index(Error **errp)
> > > > > @@ -617,6 +615,31 @@ static void cpu_release_index(CPUState
> > > > > *cpu) {
> > > > >  bitmap_clear(cpu_index_map, cpu->cpu_index, 1);
> > > > >  }
> > > > > +
> > > > > +static void cpu_vmstate_register(CPUState *cpu)
> > > > > +{
> > > > > +CPUClass *cc = CPU_GET_CLASS(cpu);
> > > > > +
> > > > > +if (qdev_get_vmsd(DEVICE(cpu)) == NULL) {
> > > > > +vmstate_register(NULL, cpu->cpu_index,
> > > > > &vmstate_cpu_common, cpu);
> > > > > +}
> > > > > +if (cc->vmsd != NULL) {
> > > > > +vmstate_register(NULL, cpu->cpu_index, cc->vmsd,
> > > > > cpu);
> > > > > +}
> > > > > +}
> > > > > +
> > > > > +static void cpu_vmstate_unregister(CPUState *cpu)
> > > > > +{
> > > > > +CPUClass *cc = CPU_GET_CLASS(cpu);
> > > > > +
> > > > > +if (cc->vmsd != NULL) {
> > > > > +vmstate_unregister(NULL, cc->vmsd, cpu);
> > > > > +}
> > > > > +if (qdev_get_vmsd(DEVICE(cpu)) == NULL) {
> > > > > +vmstate_unregister(NULL, &vmstate_cpu_common, cpu);
> > > > > +}
> > > > > +}
> > > > > +
> > > > 
> > > > > Given you're factoring this out, would it make sense to define
> > > > no-op versions for CONFIG_USER_ONLY, to reduce the amount of
> > > > ifdefs at the call site?
> > > 
> > > I did that in a subsequent patch that moved the calls to these
> > > > routines into cpu_common_[un]realize(),
> > > but ended up in some unrelated issue and hence didn't include
> > > that patch yet.
> > I'd prefer to see it moved to cpu_common_[un]realize() directly
> > > without this intermediate transition as compat logic could be
> > implemented much cleaner if it's there.
> 
> If I implement cpu_common_unrealize() and the associated logic similar
> to the existing cpu_common_realize(), it would involve changes to all
> archs.
Maybe I'm missing something, but why would all archs be involved?
The only ones that could be affected are the ones that have their own
cpu_xxx_unrealize() implemented without calling CPUClass::unrealize.
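
A simplified model of the chaining in question is sketched below. It does
not use QOM: MockCPU, MockCPUClass, arch_class and the functions around
them are hypothetical. It assumes the vmstate unregistration would sit in
the common unrealize handler, and shows an arch handler that saves the
parent handler and calls it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CPUState. */
typedef struct MockCPU {
    int vmstate_registered;
    int arch_torn_down;
} MockCPU;

/* Hypothetical stand-in for the class with its unrealize hook. */
typedef struct MockCPUClass {
    void (*unrealize)(MockCPU *cpu);         /* current handler */
    void (*parent_unrealize)(MockCPU *cpu);  /* saved by the subclass */
} MockCPUClass;

static MockCPUClass arch_class;

/* Common code: where the vmstate unregistration would live. */
static void cpu_common_unrealize(MockCPU *cpu)
{
    cpu->vmstate_registered = 0;
}

/* Arch handler that does its own teardown and then chains up. */
static void arch_cpu_unrealize(MockCPU *cpu)
{
    cpu->arch_torn_down = 1;
    if (arch_class.parent_unrealize) {
        arch_class.parent_unrealize(cpu);
    }
}

/* Class setup: inherit the common handler, then override and remember it. */
static void arch_class_init(void)
{
    arch_class.unrealize = cpu_common_unrealize;
    arch_class.parent_unrealize = arch_class.unrealize;
    arch_class.unrealize = arch_cpu_unrealize;
}
```

An arch handler that skips the parent_unrealize call would leave the
vmstate registered, which is the case that would need touching.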

 
> I have done the change only to ppc in the below experimental
> patch. Is this kind of change preferred or is it possible to do this
> in non-invasive way ?
> 
> diff --git a/exec.c b/exec.c
> index 2e8ad14..5274cc8 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -617,7 +617,7 @@ static void cpu_release_index(CPUState *cpu)
>  }
>  
>  /* TODO: cpu_index is int while .get_arch_id()is int64_t */
> -static void cpu_vmstate_register(CPUState *cpu)
> +void cpu_vmstate_register(CPUState *cpu)
>  {
>  CPUClass *cc = CPU_GET_CLASS(cpu);
>  int instance_id = cpu->prefer_arch_id_over_cpu_index ?
> @@ -631,7 +631,7 @@ static void cpu_vmstate_register(CPUState *cpu)
>  }
>  }
>  
> -static void cpu_vmstate_unregister(CPUState *cpu)
> +void cpu_vmstate_unregister(CPUState *cpu)
>  {
>  CPUClass *cc = CPU_GET_CLASS(cpu);
>  
> @@ -660,6 +660,14 @@ static void cpu_release_index(CPUState *cpu)
>  {
>  return;
>  }
> +
> +void cpu_vmstate_register(CPUState *cpu)
> +{
> +}
> +
> +void cpu_vmstate_unregister(CPUState *cpu)
> +{
> +}
>  #endif
>  
>  void cpu_exec_exit(CPUState *cpu)
> @@ -680,8 +688,6 @@ void cpu_exec_exit(CPUState *cpu)
>  cpu->cpu_index = -1;
>  #if defined(CONFIG_USER_ONLY)
>  cpu_list_unlock();
> -#else
> -cpu_vmstate_unregister(cpu);
>  #endif
>  }
>  
> @@ -724,8 +730,6 @@ void cpu_exec_init(CPUState *cpu, Error **errp)
>  QTAILQ_INSERT_TAIL(&cpus, cpu, node);
>  #if defined(CONFIG_USER_ONLY)
>  cpu_list_unlock();
> -#else
> -cpu_vmstate_register(cpu);
>  #endif
>  }
>  
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h

[Qemu-devel] [Bug 1474263] Re: "Image format was not specified" warning should be suppressed for the vvfat (and probably nbd) driver

2016-07-05 Thread felix

> I could actually see the use of non-raw over NBD.  We support nested
> protocols (where you can use qcow2->qcow2->file), that is, where a file
> contains a qcow2 file whose contents are themselves a qcow2 image.
> (Perhaps useful in nested guests, where the outer qcow2 layer serves a
> disk to an L0 guest, which in turn uses the inner layer to present a
> disk to an L1 guest).  In such a case, opening just one layer of qcow2
> for service over NBD will expose the inner qcow2 image, and connecting
> qemu as an NBD client with format=raw will directly manipulate the qcow2
> data seen by the L0 guest, while connecting as an NBD client with
> format=qcow2 will see the raw data seen by the L1 guest.

Seems like an academic exercise, really. But if this use case is
practical, I believe three levels of wrapping is just as useful, and the
only way to work with that one is to run two or three instances of
qemu-nbd. With more layers, the set-up quickly becomes tangled and
unmanageable.

And I still doubt anyone would actually want to create a set-up like
this. It seems incredibly wasteful (but then, so is virtualisation in
general, so maybe that isn't an issue) and doesn't seem to accomplish
anything that couldn't be done with just one layer of wrapping.

https://bugs.launchpad.net/bugs/1474263

Title:
  "Image format was not specified" warning should be suppressed for the
  vvfat (and probably nbd) driver

Status in QEMU:
  New

Bug description:
  Running

  qemu -drive file.driver=vvfat,file.dir=.

  displays

  WARNING: Image format was not specified for 'json:{"dir": ".", "driver": "vvfat"}' and probing guessed raw.
           Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
           Specify the 'raw' format explicitly to remove the restrictions.

  However, since "images" provided by the vvfat driver are always raw
  (and the first sector isn't writeable in any case), this warning is
  superfluous and should not be displayed.

  A similar warning is displayed for NBD devices; I suspect it should be
  also disabled for similar reasons, but I'm not sure if serving non-raw
  images is actually a violation of the protocol. qemu-nbd translates
  them to raw images, for what it's worth, even if it may be suppressed
  with -f raw.

  Noticed on 2.3.0; the code that causes this behaviour is still
  apparently present in today's git master
  (f3a1b5068cea303a55e2a21a97e66d057eaae638). Speaking of versions: you
  may want to update the copyright notice that qemu -version displays.


Re: [Qemu-devel] [RFC PATCH v0 1/5] cpu: Factor out cpu vmstate_[un]register into separate routines

2016-07-05 Thread Greg Kurz

On Tue, 5 Jul 2016 12:05:35 +0530
Bharata B Rao  wrote:

> On Tue, Jul 05, 2016 at 07:49:38AM +0200, Igor Mammedov wrote:
> > On Tue, 5 Jul 2016 10:46:07 +0530
> > Bharata B Rao  wrote:
> >   
> > > On Tue, Jul 05, 2016 at 02:56:13PM +1000, David Gibson wrote:  
> > > > On Tue, Jul 05, 2016 at 10:12:48AM +0530, Bharata B Rao wrote:  
> > > > > Consolidates cpu vmstate_[un]register calls into separate
> > > > > routines. No functionality change except that vmstate_unregister
> > > > > calls are now done under !CONFIG_USER_ONLY to match with
> > > > > vmstate_register calls.
> > > > > 
> > > > > Signed-off-by: Bharata B Rao   
> > > > 
> > > > Reviewed-by: David Gibson 
> > > >   
> > > > > ---
> > > > >  exec.c | 47 ---
> > > > >  1 file changed, 28 insertions(+), 19 deletions(-)
> > > > > 
> > > > > diff --git a/exec.c b/exec.c
> > > > > index 0122ef7..8ce8e90 100644
> > > > > --- a/exec.c
> > > > > +++ b/exec.c
> > > > > @@ -594,9 +594,7 @@ AddressSpace *cpu_get_address_space(CPUState
> > > > > *cpu, int asidx) /* Return the AddressSpace corresponding to the
> > > > > specified index */ return cpu->cpu_ases[asidx].as;
> > > > >  }
> > > > > -#endif
> > > > >  
> > > > > -#ifndef CONFIG_USER_ONLY
> > > > >  static DECLARE_BITMAP(cpu_index_map, MAX_CPUMASK_BITS);
> > > > >  
> > > > >  static int cpu_get_free_index(Error **errp)
> > > > > @@ -617,6 +615,31 @@ static void cpu_release_index(CPUState *cpu)
> > > > >  {
> > > > >  bitmap_clear(cpu_index_map, cpu->cpu_index, 1);
> > > > >  }
> > > > > +
> > > > > +static void cpu_vmstate_register(CPUState *cpu)
> > > > > +{
> > > > > +CPUClass *cc = CPU_GET_CLASS(cpu);
> > > > > +
> > > > > +if (qdev_get_vmsd(DEVICE(cpu)) == NULL) {
> > > > > +vmstate_register(NULL, cpu->cpu_index,
> > > > > &vmstate_cpu_common, cpu);
> > > > > +}
> > > > > +if (cc->vmsd != NULL) {
> > > > > +vmstate_register(NULL, cpu->cpu_index, cc->vmsd, cpu);
> > > > > +}
> > > > > +}
> > > > > +
> > > > > +static void cpu_vmstate_unregister(CPUState *cpu)
> > > > > +{
> > > > > +CPUClass *cc = CPU_GET_CLASS(cpu);
> > > > > +
> > > > > +if (cc->vmsd != NULL) {
> > > > > +vmstate_unregister(NULL, cc->vmsd, cpu);
> > > > > +}
> > > > > +if (qdev_get_vmsd(DEVICE(cpu)) == NULL) {
> > > > > +vmstate_unregister(NULL, &vmstate_cpu_common, cpu);
> > > > > +}
> > > > > +}
> > > > > +  
> > > > 
> > > > > Given you're factoring this out, would it make sense to define
> > > > no-op versions for CONFIG_USER_ONLY, to reduce the amount of ifdefs
> > > > at the call site?  
> > > 
> > > > I did that in a subsequent patch that moved the calls to these
> > > > routines into cpu_common_[un]realize(), but ended up in some
> > > > unrelated issue and hence didn't include that patch yet.
> > I'd prefer to see it moved to cpu_common_[un]realize() directly
> > without this intermediate transition, as compat logic could be
> > implemented much more cleanly if it's there.  
> 
> If I implement cpu_common_unrealize() and the associated logic similar
> to the existing cpu_common_realize(), it would involve changes to all
> archs. I have done the change only to ppc in the below experimental patch.
> Is this kind of change preferred or is it possible to do this in
> non-invasive way ?
> 

Even if it affects all targets, the change is quite mechanical and should
not be a problem, I guess.

> diff --git a/exec.c b/exec.c
> index 2e8ad14..5274cc8 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -617,7 +617,7 @@ static void cpu_release_index(CPUState *cpu)
>  }
>  
>  /* TODO: cpu_index is int while .get_arch_id()is int64_t */
> -static void cpu_vmstate_register(CPUState *cpu)
> +void cpu_vmstate_register(CPUState *cpu)
>  {
>  CPUClass *cc = CPU_GET_CLASS(cpu);
>  int instance_id = cpu->prefer_arch_id_over_cpu_index ?
> @@ -631,7 +631,7 @@ static void cpu_vmstate_register(CPUState *cpu)
>  }
>  }
>  
> -static void cpu_vmstate_unregister(CPUState *cpu)
> +void cpu_vmstate_unregister(CPUState *cpu)
>  {
>  CPUClass *cc = CPU_GET_CLASS(cpu);
>  
> @@ -660,6 +660,14 @@ static void cpu_release_index(CPUState *cpu)
>  {
>  return;
>  }
> +
> +void cpu_vmstate_register(CPUState *cpu)
> +{
> +}
> +
> +void cpu_vmstate_unregister(CPUState *cpu)
> +{
> +}
>  #endif
>  
>  void cpu_exec_exit(CPUState *cpu)
> @@ -680,8 +688,6 @@ void cpu_exec_exit(CPUState *cpu)
>  cpu->cpu_index = -1;
>  #if defined(CONFIG_USER_ONLY)
>  cpu_list_unlock();
> -#else
> -cpu_vmstate_unregister(cpu);
>  #endif
>  }
>  
> @@ -724,8 +730,6 @@ void cpu_exec_init(CPUState *cpu, Error **errp)
>  QTAILQ_INSERT_TAIL(&cpus, cpu, node);
>  #if defined(CONFIG_USER_ONLY)
>  cpu_list_unlock();
> -#else
> -cpu_vmstate_register(cpu);
>  #endif
>  }
>  
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 1f1706e..08eab39 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -874,4

Re: [Qemu-devel] Regression: block: Add .bdrv_co_pwrite_zeroes()

2016-07-05 Thread Peter Lieven


On 05.07.2016 at 03:53, Eric Blake wrote:

On 07/04/2016 07:49 AM, Peter Lieven wrote:

Hi,

the above commit:

commit d05aa8bb4a8b6aa9a915ec5074fb12ae632d2323
Author: Eric Blake 
Date:   Wed Jun 1 15:10:03 2016 -0600

 block: Add .bdrv_co_pwrite_zeroes()

introduces a regression (at least for me).

The Limits from the iSCSI Block Limits VPD have no requirement of being
a power of two.
We use Dell Equallogic iSCSI SANs for instance. They have an internal
page size of 15MB. And
they advertise this page size as max_ws_len, opt_transfer_len and
opt_discard_alignment.

A non-power-of-2 max_ws_len shouldn't be a problem, but opt_transfer_len
and opt_discard_alignment not being a power of 2 impacts other code.
15MB is a rather odd page size.


I know, not my idea ;-) I think at least opt_discard_alignment of 15MB used
to work before.




I think we cannot assert that these alignments are a power of 2.

Perhaps that means we should just fix our code to round things down to
the nearest power of 2 (8MB) for the opt_transfer_len and
opt_discard_alignment values.  Can you post a stack-trace of the actual
assertion you are hitting?
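
To make the rounding suggestion concrete, here is a minimal standalone
sketch of rounding an alignment down to the nearest power of 2. The
names demo_is_power_of_2() and demo_pow2floor() are made up for this
illustration and are not taken from the QEMU tree; how the block layer
would actually apply such a helper is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v has exactly one bit set. Hypothetical helper for this sketch. */
static bool demo_is_power_of_2(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Largest power of 2 that is <= v; v must be non-zero. */
static uint64_t demo_pow2floor(uint64_t v)
{
    uint64_t p = 1;

    assert(v != 0);
    /* Compare against v / 2 so that doubling p can never overflow. */
    while (p <= v / 2) {
        p *= 2;
    }
    return p;
}
```

With the 15MB value from the report (15728640 bytes), this yields
8388608, i.e. the 8MB mentioned above.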



Sure:

qemu-system-x86_64: block/io.c:1165: bdrv_co_do_pwrite_zeroes: Assertion `is_power_of_2(alignment)' failed.

Program received signal SIGABRT, Aborted.
0x75222c37 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt full
#0  0x75222c37 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56
resultvar = 0
pid = 9610
selftid = 9610
#1  0x75226028 in __GI_abort () at abort.c:89
save_stage = 2
act = {__sigaction_handler = {sa_handler = 0x7fffe1fe, sa_sigaction 
= 0x7fffe1fe}, sa_mask = {__val = {140737307379972,
  93824998987040, 1165, 93823560581120, 140737306023251, 0, 
93825220359792, 47244640260, 93825009612448, 256, 0, 0, 0, 21474836480,
  140737354027008, 140737307395000}}, sa_flags = 1438406118, sa_restorer 
= 0x55bc5ca0 <__PRETTY_FUNCTION__.34924>}
sigs = {__val = {32, 0 }}
#2  0x7521bbf6 in __assert_fail_base (fmt=0x7536c3b8 "%s%s%s:%u: 
%s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x55bc55e6 "is_power_of_2(alignment)", 
file=file@entry=0x55bc5520 "block/io.c", line=line@entry=1165,
function=function@entry=0x55bc5ca0 <__PRETTY_FUNCTION__.34924> 
"bdrv_co_do_pwrite_zeroes") at assert.c:92
str = 0x586f5780 ""
total = 4096
#3  0x7521bca2 in __GI___assert_fail (assertion=assertion@entry=0x55bc55e6 
"is_power_of_2(alignment)",
file=file@entry=0x55bc5520 "block/io.c", line=line@entry=1165,
function=function@entry=0x55bc5ca0 <__PRETTY_FUNCTION__.34924> 
"bdrv_co_do_pwrite_zeroes") at assert.c:101
No locals.
#4  0x55a8a968 in bdrv_co_do_pwrite_zeroes (bs=bs@entry=0x565b2df0, 
offset=offset@entry=1359998976, count=count@entry=4096,
flags=flags@entry=6) at block/io.c:1165
drv = 0x560b3580 
qiov = {iov = 0x1, niov = 4096, nalloc = 0, size = 
140737148473344}
iov = {iov_base = 0x0, iov_len = 0}
ret = 0
need_flush = false
head = 0
tail = 0
max_write_zeroes = 
alignment = 15728640
__PRETTY_FUNCTION__ = "bdrv_co_do_pwrite_zeroes"
#5  0x55a8ae3b in bdrv_aligned_pwritev (bs=bs@entry=0x565b2df0, 
req=req@entry=0x62ee3970, offset=offset@entry=1359998976,
bytes=bytes@entry=4096, qiov=0x58909e58, flags=, 
flags@entry=0) at block/io.c:1290
---Type <return> to continue, or q <return> to quit---
drv = 0x560b3580 
waited = 
ret = 
start_sector = 2656248
end_sector = 2656256
__PRETTY_FUNCTION__ = "bdrv_aligned_pwritev"
#6  0x55a8b95f in bdrv_co_pwritev (bs=0x565b2df0, 
offset=offset@entry=1359998976, bytes=bytes@entry=4096,
qiov=qiov@entry=0x58909e58, flags=flags@entry=0) at block/io.c:1514
req = {bs = 0x565b2df0, offset = 1359998976, bytes = 4096, type = 
BDRV_TRACKED_WRITE, serialising = false,
  overlap_offset = 1359998976, overlap_bytes = 4096, list = {le_next = 
0x5a3c9cc0, le_prev = 0x565b5f28}, co = 0x57ccc100,
  wait_queue = {entries = {tqh_first = 0x0, tqh_last = 
0x62ee39b8}}, waiting_for = 0x0}
align = 512
head_buf = 0x0
tail_buf = 
local_qiov = {iov = 0x0, niov = 1448841360, nalloc = 21845, size = 0}
use_local_qiov = 
ret = 
__PRETTY_FUNCTION__ = "bdrv_co_pwritev"
#7  0x55a7cae3 in blk_co_pwritev (blk=0x565b2c30, 
offset=1359998976, bytes=4096, qiov=0x58909e58, flags=0)
at block/block-backend.c:788
ret = 
#8  0x55a7cc2e in blk_aio_write_entry (opaque=0x57da5200) at 
block/block-backend.c:977
acb = 0x57da

Re: [Qemu-devel] [PATCH v10 08/26] acpi: add DMAR scope definition for root IOAPIC

On Mon, Jul 04, 2016 at 06:22:56PM +0300, Michael S. Tsirkin wrote:

[...]

> > @@ -2425,6 +2427,9 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
> >  AcpiDmarHardwareUnit *drhd;
> >  uint8_t dmar_flags = 0;
> >  X86IOMMUState *iommu = x86_iommu_get_default();
> > +AcpiDmarDeviceScope *scope = NULL;
> > +/* Root complex IOAPIC use one path[0] only */
> > +uint8_t ioapic_scope_size = sizeof(*scope) + sizeof(scope->path[0]);
> 
> just use int or unsigned or size_t for types like this.

Will fix.
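
For illustration only, a sketch of the size computation with size_t
instead of uint8_t. The struct below is an assumed layout modelled on a
DMAR device scope entry with a trailing path array; the Demo* names are
invented here and the struct is not copied from acpi-defs.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One path element: PCI device and function number (assumed layout). */
typedef struct DemoDmarPath {
    uint8_t device;
    uint8_t function;
} DemoDmarPath;

/* Device scope header followed by a variable number of path elements. */
typedef struct DemoDmarDeviceScope {
    uint8_t entry_type;
    uint8_t length;
    uint16_t reserved;
    uint8_t enumeration_id;
    uint8_t bus;
    DemoDmarPath path[];    /* flexible array member */
} DemoDmarDeviceScope;

/* Size in bytes of a scope entry carrying npaths path elements. */
static size_t demo_scope_size(size_t npaths)
{
    /* Reject counts whose size computation would overflow size_t. */
    assert(npaths <= (SIZE_MAX - sizeof(DemoDmarDeviceScope))
                     / sizeof(DemoDmarPath));
    return sizeof(DemoDmarDeviceScope) + npaths * sizeof(DemoDmarPath);
}
```

Only at the point where the value is stored into the one-byte length
field does it need to be checked against the uint8_t range.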

[...]

> > diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
> > index ea9be0b..0dbdde3 100644
> > --- a/include/hw/acpi/acpi-defs.h
> > +++ b/include/hw/acpi/acpi-defs.h
> > @@ -571,6 +571,20 @@ enum {
> >  /*
> >   * Sub-structures for DMAR
> >   */
> > +
> > +#define ACPI_DMAR_DEV_SCOPE_TYPE_IOAPIC (0x03)
> 
> Again, pls use literal with comment in code.

Will fix.

[...]

> > diff --git a/include/hw/pci-host/q35.h b/include/hw/pci-host/q35.h
> > index c5c073d..312b47f 100644
> > --- a/include/hw/pci-host/q35.h
> > +++ b/include/hw/pci-host/q35.h
> > @@ -175,4 +175,12 @@ typedef struct Q35PCIHost {
> >  
> >  uint64_t mch_mcfg_base(void);
> >  
> > +/*
> > + * Arbitary but unique BNF number for IOAPIC device.
> > + *
> > + * TODO: make sure there would have no conflict with real PCI bus
> 
> How are you going to do this?

I haven't thought about it yet (it's on my todo list). Please let me
know if you have any suggestions.

Thanks,

-- peterx

Re: [Qemu-devel] [PATCH v2 15/18] target-i386: do not ignore error and fix apic parent

On Mon, 4 Jul 2016 18:39:21 +0200
Igor Mammedov  wrote:

> On Mon, 4 Jul 2016 17:20:59 +0300
> "Michael S. Tsirkin"  wrote:
> 
> > On Fri, Jun 24, 2016 at 06:06:03PM +0200, Igor Mammedov wrote:
> > > object_property_add_child() silently fails with error that it
> > > can't create duplicate propery 'apic' as we already have 'apic'
> > > property
> > 
> > I thought that one was 'apic-id'?
> indeed, but with error_abort it prints:
>   qemu-system-x86_64: attempt to add duplicate property 'apic' to
>   object (type 'qemu64-x86_64-cpu')
> 
> I'll try to trace where it comes from.

We have a CPU feature 'apic', so the patch is correct. I'll fix the commit
message, which mistakenly refers to APIC ID.
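
To illustrate why the failure went unnoticed, here is a toy model of
"adding a property with an optional error pointer". It is a sketch
under my own assumptions: DemoObject and demo_add_property() are
invented for this example and only mimic the general shape of the
behaviour described in the commit message, not the QOM implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_PROPS 8

/* Toy object holding a small table of property names. */
typedef struct DemoObject {
    const char *props[DEMO_MAX_PROPS];
    size_t nprops;
} DemoObject;

/*
 * Add a property name. Returns true on success. On failure a message is
 * stored in *errmsg, but only if the caller passed a non-NULL errmsg;
 * a caller passing NULL and ignoring the return value never learns
 * that anything went wrong.
 */
static bool demo_add_property(DemoObject *obj, const char *name,
                              const char **errmsg)
{
    size_t i;

    assert(obj != NULL && name != NULL);
    for (i = 0; i < obj->nprops; i++) {
        if (strcmp(obj->props[i], name) == 0) {
            if (errmsg) {
                *errmsg = "duplicate property";
            }
            return false;
        }
    }
    if (obj->nprops == DEMO_MAX_PROPS) {
        if (errmsg) {
            *errmsg = "property table full";
        }
        return false;
    }
    obj->props[obj->nprops++] = name;
    return true;
}
```

In this model, registering 'apic' twice fails the second time, while
'lapic' is accepted, which is the situation the patch resolves by
renaming the child property.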
 
> 
> > 
> > > registered for AIPC ID.
> > 
> > APIC
> > 
> > > As result generic device_realize puts
> > > apic as into unattached container.
> > > 
> > > As it's programming error, abort on it and fix property name for
> > > apic_state to 'lapic', this way apic is a child of cpu instance.
> > > 
> > > Signed-off-by: Igor Mammedov 
> > > ---
> > >  target-i386/cpu.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > > index ebf4140..04c0b79 100644
> > > --- a/target-i386/cpu.c
> > > +++ b/target-i386/cpu.c
> > > @@ -2763,8 +2763,9 @@ static void x86_cpu_apic_create(X86CPU *cpu,
> > > Error **errp) 
> > >  cpu->apic_state = DEVICE(object_new(apic_type));
> > >  
> > > -object_property_add_child(OBJECT(cpu), "apic",
> > > -  OBJECT(cpu->apic_state), NULL);
> > > +object_property_add_child(OBJECT(cpu), "lapic",
> > > > +  OBJECT(cpu->apic_state), &error_abort);
> > > > +
> > >  qdev_prop_set_uint8(cpu->apic_state, "id", cpu->apic_id);
> > >  /* TODO: convert to link<> */
> > >  apic = APIC_COMMON(cpu->apic_state);
> > > -- 
> > > 1.8.3.1
> 
>

Re: [Qemu-devel] [PATCH v10 17/26] x86-iommu: introduce IEC notifiers

On Mon, Jul 04, 2016 at 04:22:49PM +0200, Paolo Bonzini wrote:
> 
> 
> On 21/06/2016 09:47, Peter Xu wrote:
> > This patch introduces x86 IOMMU IEC (Interrupt Entry Cache)
> > invalidation notifier list. When vIOMMU receives IEC invalidate
> > request, all the registered units will be notified with specific
> > invalidation requests.
> > 
> > Intel IOMMU is the first provider that generates such a event.
> > 
> > Signed-off-by: Peter Xu 
> 
> Please consider switching this to a NotifierList.

Noted in my todo. Thanks,

-- peterx
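
As a rough picture of what a notifier-list based design looks like, here
is a minimal self-contained sketch. All Demo*/demo_* names are
hypothetical; this is not QEMU's NotifierList API, only the general
pattern of embedding a notifier in the consumer's state and walking a
list on each IEC invalidation.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoNotifier DemoNotifier;

/* A list node carrying the callback to invoke. */
struct DemoNotifier {
    void (*notify)(DemoNotifier *n, void *data);
    DemoNotifier *next;
};

typedef struct DemoNotifierList {
    DemoNotifier *head;
} DemoNotifierList;

/* Prepend a notifier to the list. */
static void demo_notifier_list_add(DemoNotifierList *list, DemoNotifier *n)
{
    assert(list != NULL && n != NULL && n->notify != NULL);
    n->next = list->head;
    list->head = n;
}

/* Invoke every registered notifier with the same payload. */
static void demo_notifier_list_notify(DemoNotifierList *list, void *data)
{
    DemoNotifier *n, *next;

    for (n = list->head; n != NULL; n = next) {
        next = n->next;     /* read before the callback runs */
        n->notify(n, data);
    }
}

/* Example consumer: counts how many invalidations it has seen. */
typedef struct DemoIecCounter {
    DemoNotifier notifier;  /* first member, so the cast below is valid */
    int invalidations;
} DemoIecCounter;

static void demo_iec_count(DemoNotifier *n, void *data)
{
    DemoIecCounter *c = (DemoIecCounter *)n;

    (void)data;
    c->invalidations++;
}
```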

Re: [Qemu-devel] [RFC PATCH v0 1/5] cpu: Factor out cpu vmstate_[un]register into separate routines

2016-07-05 Thread Bharata B Rao

On Tue, Jul 05, 2016 at 09:22:30AM +0200, Igor Mammedov wrote:
> On Tue, 5 Jul 2016 12:05:35 +0530
> Bharata B Rao  wrote:
> 
> > On Tue, Jul 05, 2016 at 07:49:38AM +0200, Igor Mammedov wrote:
> > > On Tue, 5 Jul 2016 10:46:07 +0530
> > > Bharata B Rao  wrote:
> > > 
> > > > On Tue, Jul 05, 2016 at 02:56:13PM +1000, David Gibson wrote:
> > > > > On Tue, Jul 05, 2016 at 10:12:48AM +0530, Bharata B Rao wrote:
> > > > > > Consolidates cpu vmstate_[un]register calls into separate
> > > > > > routines. No functionality change except that
> > > > > > vmstate_unregister calls are now done under !CONFIG_USER_ONLY
> > > > > > to match with vmstate_register calls.
> > > > > > 
> > > > > > Signed-off-by: Bharata B Rao 
> > > > > 
> > > > > Reviewed-by: David Gibson 
> > > > > 
> > > > > > ---
> > > > > >  exec.c | 47 ---
> > > > > >  1 file changed, 28 insertions(+), 19 deletions(-)
> > > > > > 
> > > > > > diff --git a/exec.c b/exec.c
> > > > > > index 0122ef7..8ce8e90 100644
> > > > > > --- a/exec.c
> > > > > > +++ b/exec.c
> > > > > > > @@ -594,9 +594,7 @@ AddressSpace *cpu_get_address_space(CPUState *cpu, int asidx)
> > > > > > >  /* Return the AddressSpace corresponding to the specified index */
> > > > > > >  return cpu->cpu_ases[asidx].as;
> > > > > > >  }
> > > > > > -#endif
> > > > > >  
> > > > > > -#ifndef CONFIG_USER_ONLY
> > > > > >  static DECLARE_BITMAP(cpu_index_map, MAX_CPUMASK_BITS);
> > > > > >  
> > > > > >  static int cpu_get_free_index(Error **errp)
> > > > > > @@ -617,6 +615,31 @@ static void cpu_release_index(CPUState
> > > > > > *cpu) {
> > > > > >  bitmap_clear(cpu_index_map, cpu->cpu_index, 1);
> > > > > >  }
> > > > > > +
> > > > > > +static void cpu_vmstate_register(CPUState *cpu)
> > > > > > +{
> > > > > > +CPUClass *cc = CPU_GET_CLASS(cpu);
> > > > > > +
> > > > > > +if (qdev_get_vmsd(DEVICE(cpu)) == NULL) {
> > > > > > > +vmstate_register(NULL, cpu->cpu_index, &vmstate_cpu_common, cpu);
> > > > > > +}
> > > > > > +if (cc->vmsd != NULL) {
> > > > > > > +vmstate_register(NULL, cpu->cpu_index, cc->vmsd, cpu);
> > > > > > +}
> > > > > > +}
> > > > > > +
> > > > > > +static void cpu_vmstate_unregister(CPUState *cpu)
> > > > > > +{
> > > > > > +CPUClass *cc = CPU_GET_CLASS(cpu);
> > > > > > +
> > > > > > +if (cc->vmsd != NULL) {
> > > > > > +vmstate_unregister(NULL, cc->vmsd, cpu);
> > > > > > +}
> > > > > > +if (qdev_get_vmsd(DEVICE(cpu)) == NULL) {
> > > > > > +vmstate_unregister(NULL, &vmstate_cpu_common, cpu);
> > > > > > +}
> > > > > > +}
> > > > > > +
> > > > > 
> > > > > > Given you're factoring this out, would it make sense to define
> > > > > > no-op versions for CONFIG_USER_ONLY, to reduce the amount of
> > > > > > ifdefs at the call site?
> > > > 
> > > > > I did that in a subsequent patch that moved the calls to these
> > > > > routines into cpu_common_[un]realize(), but ended up in some
> > > > > unrelated issue and hence didn't include that patch yet.
> > > > I'd prefer to see it moved to cpu_common_[un]realize() directly
> > > > without this intermediate transition, as compat logic could be
> > > > implemented much more cleanly if it's there.
> > 
> > If I implement cpu_common_unrealize() and the associated logic similar
> > to the existing cpu_common_realize(), it would involve changes to all
> > archs.
> maybe I'm missing something but why all arch will be involved?
> The only ones that could be affected are the ones that have their own
> cpu_xxx_unrealize() implemented without calling CPUClass::unrealize.

You are right, I didn't realize that most archs don't define their own
cpu_xxx_unrealize call. So this should be a simple change. Will include
a patch to move vmstate_[un]register() calls to cpu_common_[un]realize()
in the next version.

Regards,
Bharata.
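
For reference, the no-op stub pattern discussed in this thread can be
sketched outside of QEMU as follows. DEMO_USER_ONLY stands in for
CONFIG_USER_ONLY, and DemoCPU and the demo_* functions are placeholders
I made up; the point is only that the realize/unrealize callers need no
#ifdef once both variants of the helpers exist.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder for CPUState; only what the sketch needs. */
typedef struct DemoCPU {
    int cpu_index;
    int registered;     /* stands in for "vmstate is registered" */
} DemoCPU;

#ifndef DEMO_USER_ONLY
/* System emulation: really (un)register migration state. */
static void demo_vmstate_register(DemoCPU *cpu)
{
    cpu->registered = 1;
}

static void demo_vmstate_unregister(DemoCPU *cpu)
{
    cpu->registered = 0;
}
#else
/* User-only build: nothing to migrate, so the helpers do nothing. */
static void demo_vmstate_register(DemoCPU *cpu)
{
    (void)cpu;
}

static void demo_vmstate_unregister(DemoCPU *cpu)
{
    (void)cpu;
}
#endif

/* Callers are identical in both configurations. */
static void demo_cpu_realize(DemoCPU *cpu)
{
    assert(cpu != NULL);
    demo_vmstate_register(cpu);
}

static void demo_cpu_unrealize(DemoCPU *cpu)
{
    assert(cpu != NULL);
    demo_vmstate_unregister(cpu);
}
```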

Re: [Qemu-devel] [PATCH v10 24/26] kvm-irqchip: do explicit commit when update irq

On Mon, Jul 04, 2016 at 04:23:32PM +0200, Paolo Bonzini wrote:
> FWIW I prefer this to the "v10.2".

Let me drop v10.2 then. Thanks,

-- peterx

Re: [Qemu-devel] [PATCH 1/2] linux-user: Add loop control ioctls

2016-07-05 Thread Laurent Vivier



On 04/07/2016 at 18:06, Peter Maydell wrote:
> Add support for the /dev/loop-control ioctls:
>  LOOP_CTL_ADD
>  LOOP_CTL_REMOVE
>  LOOP_CTL_GET_FREE
> 
> Signed-off-by: Peter Maydell 

Reviewed-by: Laurent Vivier 

> ---
>  linux-user/ioctls.h   |  4 
>  linux-user/linux_loop.h   | 11 ++-
>  linux-user/syscall_defs.h |  4 
>  3 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h
> index 804f099..72cd32a 100644
> --- a/linux-user/ioctls.h
> +++ b/linux-user/ioctls.h
> @@ -356,6 +356,10 @@
>IOCTL(LOOP_GET_STATUS64, IOC_W, MK_PTR(MK_STRUCT(STRUCT_loop_info64)))
>IOCTL(LOOP_CHANGE_FD, 0, TYPE_INT)
>  
> +  IOCTL(LOOP_CTL_ADD, 0, TYPE_INT)
> +  IOCTL(LOOP_CTL_REMOVE, 0, TYPE_INT)
> +  IOCTL(LOOP_CTL_GET_FREE, 0, TYPE_NULL)
> +
>IOCTL(MTIOCTOP, IOC_W, MK_PTR(MK_STRUCT(STRUCT_mtop)))
>IOCTL(MTIOCGET, IOC_R, MK_PTR(MK_STRUCT(STRUCT_mtget)))
>IOCTL(MTIOCPOS, IOC_R, MK_PTR(MK_STRUCT(STRUCT_mtpos)))
> diff --git a/linux-user/linux_loop.h b/linux-user/linux_loop.h
> index 8974caa..fd7608b 100644
> --- a/linux-user/linux_loop.h
> +++ b/linux-user/linux_loop.h
> @@ -1,4 +1,6 @@
> -/* Copied from 2.6.25 kernel headers to avoid problems on older hosts.  */
> +/* Copied from 2.6.25 kernel headers to avoid problems on older hosts,
> + * and subsequently updated to match newer additions to the API.
> + */
>  #ifndef _LINUX_LOOP_H
>  #define _LINUX_LOOP_H
>  
> @@ -91,5 +93,12 @@ struct loop_info64 {
>  #define LOOP_SET_STATUS64   0x4C04
>  #define LOOP_GET_STATUS64   0x4C05
>  #define LOOP_CHANGE_FD   0x4C06
> +#define LOOP_SET_CAPACITY   0x4C07
> +#define LOOP_SET_DIRECT_IO  0x4C08
> +
> +/* /dev/loop-control interface */
> +#define LOOP_CTL_ADD        0x4C80
> +#define LOOP_CTL_REMOVE 0x4C81
> +#define LOOP_CTL_GET_FREE   0x4C82
>  
>  #endif
> diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
> index 6650e26..0591abc 100644
> --- a/linux-user/syscall_defs.h
> +++ b/linux-user/syscall_defs.h
> @@ -1129,6 +1129,10 @@ struct target_pollfd {
>  #define TARGET_LOOP_GET_STATUS64  0x4C05
>  #define TARGET_LOOP_CHANGE_FD 0x4C06
>  
> +#define TARGET_LOOP_CTL_ADD   0x4C80
> +#define TARGET_LOOP_CTL_REMOVE    0x4C81
> +#define TARGET_LOOP_CTL_GET_FREE  0x4C82
> +
>  /* fb ioctls */
>  #define TARGET_FBIOGET_VSCREENINFO 0x4600
>  #define TARGET_FBIOPUT_VSCREENINFO 0x4601
>

[Qemu-devel] [PATCH for-2.7 3/8] s390x/ipl: Support IPL from selected SCSI device

From: Alexander Yarygin 

If bootindex is specified for a device, we need to IPL from
it. Currently it works for ccw devices, but not for SCSI. To be able to
IPL from the specific device, pc-bios needs to know its address.
For this reason we add special QEMU_SCSI IPL type into the IPLB
structure, that contains the scsi device address.

We enhance the ipl block with a currently qemu-only parameter block
that allows us to specify a concrete scsi device.

Signed-off-by: Alexander Yarygin 
Reviewed-by: Eric Farman 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/ipl.c | 18 ++
 hw/s390x/ipl.h | 13 +
 2 files changed, 31 insertions(+)

diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
index e6bf7cf..78998cd 100644
--- a/hw/s390x/ipl.c
+++ b/hw/s390x/ipl.c
@@ -217,6 +217,8 @@ static bool s390_gen_initial_iplb(S390IPLState *ipl)
 VirtioCcwDevice *ccw_dev = (VirtioCcwDevice *) object_dynamic_cast(
 OBJECT(qdev_get_parent_bus(dev_st)->parent),
 TYPE_VIRTIO_CCW_DEVICE);
+SCSIDevice *sd = (SCSIDevice *) object_dynamic_cast(OBJECT(dev_st),
+TYPE_SCSI_DEVICE);
 if (ccw_dev) {
 ipl->iplb.len = cpu_to_be32(S390_IPLB_MIN_CCW_LEN);
 ipl->iplb.blk0_len =
@@ -225,6 +227,22 @@ static bool s390_gen_initial_iplb(S390IPLState *ipl)
 ipl->iplb.ccw.devno = cpu_to_be16(ccw_dev->sch->devno);
 ipl->iplb.ccw.ssid = ccw_dev->sch->ssid & 3;
 return true;
+} else if (sd) {
+SCSIBus *bus = scsi_bus_from_device(sd);
+VirtIOSCSI *vdev = container_of(bus, VirtIOSCSI, bus);
+VirtIOSCSICcw *scsi_ccw = container_of(vdev, VirtIOSCSICcw, vdev);
+VirtioCcwDevice *ccw = &scsi_ccw->parent_obj;
+
+ipl->iplb.len = cpu_to_be32(S390_IPLB_MIN_QEMU_SCSI_LEN);
+ipl->iplb.blk0_len =
+cpu_to_be32(S390_IPLB_MIN_QEMU_SCSI_LEN - S390_IPLB_HEADER_LEN);
+ipl->iplb.pbt = S390_IPL_TYPE_QEMU_SCSI;
+ipl->iplb.scsi.lun = cpu_to_be32(sd->lun);
+ipl->iplb.scsi.target = cpu_to_be16(sd->id);
+ipl->iplb.scsi.channel = cpu_to_be16(sd->channel);
+ipl->iplb.scsi.devno = cpu_to_be16(ccw->sch->devno);
+ipl->iplb.scsi.ssid = ccw->sch->ssid & 3;
+return true;
 }
 }
 
diff --git a/hw/s390x/ipl.h b/hw/s390x/ipl.h
index 9aa4d94..ed3f2c8 100644
--- a/hw/s390x/ipl.h
+++ b/hw/s390x/ipl.h
@@ -46,6 +46,16 @@ struct IplBlockFcp {
 } QEMU_PACKED;
 typedef struct IplBlockFcp IplBlockFcp;
 
+struct IplBlockQemuScsi {
+uint32_t lun;
+uint16_t target;
+uint16_t channel;
+uint8_t  reserved0[77];
+uint8_t  ssid;
+uint16_t devno;
+} QEMU_PACKED;
+typedef struct IplBlockQemuScsi IplBlockQemuScsi;
+
 union IplParameterBlock {
 struct {
 uint32_t len;
@@ -59,6 +69,7 @@ union IplParameterBlock {
 union {
 IplBlockCcw ccw;
 IplBlockFcp fcp;
+IplBlockQemuScsi scsi;
 };
 } QEMU_PACKED;
 struct {
@@ -102,10 +113,12 @@ typedef struct S390IPLState S390IPLState;
 
 #define S390_IPL_TYPE_FCP 0x00
 #define S390_IPL_TYPE_CCW 0x02
+#define S390_IPL_TYPE_QEMU_SCSI 0xff
 
 #define S390_IPLB_HEADER_LEN 8
 #define S390_IPLB_MIN_CCW_LEN 200
 #define S390_IPLB_MIN_FCP_LEN 384
+#define S390_IPLB_MIN_QEMU_SCSI_LEN 200
 
 static inline bool iplb_valid_len(IplParameterBlock *iplb)
 {
-- 
2.9.0
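
As a side note on the layout added above, the following standalone
sketch repeats the IplBlockQemuScsi field list under a demo name and
exposes its size and two offsets, plus a byte-order independent store
similar in spirit to what cpu_to_be32() is used for in the patch. The
Demo*/demo_* names are mine; the packed attribute is the GCC/Clang
extension also used by the pc-bios copy of the struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field list as IplBlockQemuScsi in the patch, under a demo name. */
struct DemoIplBlockQemuScsi {
    uint32_t lun;
    uint16_t target;
    uint16_t channel;
    uint8_t  reserved0[77];
    uint8_t  ssid;
    uint16_t devno;
} __attribute__((packed));

/* Total size of the SCSI-specific part of the block, in bytes. */
static size_t demo_scsi_block_size(void)
{
    return sizeof(struct DemoIplBlockQemuScsi);
}

/* Byte offset of the ssid field within the block. */
static size_t demo_scsi_ssid_offset(void)
{
    return offsetof(struct DemoIplBlockQemuScsi, ssid);
}

/* Byte offset of the devno field within the block. */
static size_t demo_scsi_devno_offset(void)
{
    return offsetof(struct DemoIplBlockQemuScsi, devno);
}

/* Store a 32-bit value in big-endian order, whatever the host order is. */
static void demo_store_be32(uint8_t *dst, uint32_t value)
{
    assert(dst != NULL);
    dst[0] = (uint8_t)(value >> 24);
    dst[1] = (uint8_t)(value >> 16);
    dst[2] = (uint8_t)(value >> 8);
    dst[3] = (uint8_t)value;
}
```

The fields add up to 88 bytes, with ssid at offset 85 and devno at
offset 86.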

[Qemu-devel] [PATCH for-2.7 5/8] s390x/css: factor out some generic code from virtio_ccw_device_realize()

From: Sascha Silbe 

A lot of what virtio_ccw_device_realize() does isn't specific to
virtio; it would apply to emulated CCW as well. Factor it out to make
it easier to implement emulated CCW devices later on.

Signed-off-by: Sascha Silbe 
Reviewed-by: Dong Jia Shi 
Reviewed-by: Halil Pasic 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/css.c | 143 +
 hw/s390x/virtio-ccw.c  | 122 +
 hw/s390x/virtio-ccw.h  |   2 -
 include/hw/s390x/css.h |  18 +++
 4 files changed, 175 insertions(+), 110 deletions(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index 7666881..54991f5 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -1340,6 +1340,116 @@ SubchDev *css_find_subch(uint8_t m, uint8_t cssid, uint8_t ssid, uint16_t schid)
 return channel_subsys.css[real_cssid]->sch_set[ssid]->sch[schid];
 }
 
+/**
+ * Return free device number in subchannel set.
+ *
+ * Return index of the first free device number in the subchannel set
+ * identified by @p cssid and @p ssid, beginning the search at @p
+ * start and wrapping around at MAX_DEVNO. Return a value exceeding
+ * MAX_SCHID if there are no free device numbers in the subchannel
+ * set.
+ */
+static uint32_t css_find_free_devno(uint8_t cssid, uint8_t ssid,
+uint16_t start)
+{
+uint32_t round;
+
+for (round = 0; round <= MAX_DEVNO; round++) {
+uint16_t devno = (start + round) % MAX_DEVNO;
+
+if (!css_devno_used(cssid, ssid, devno)) {
+return devno;
+}
+}
+return MAX_DEVNO + 1;
+}
+
+/**
+ * Return first free subchannel (id) in subchannel set.
+ *
+ * Return index of the first free subchannel in the subchannel set
+ * identified by @p cssid and @p ssid, if there is any. Return a value
+ * exceeding MAX_SCHID if there are no free subchannels in the
+ * subchannel set.
+ */
+static uint32_t css_find_free_subch(uint8_t cssid, uint8_t ssid)
+{
+uint32_t schid;
+
+for (schid = 0; schid <= MAX_SCHID; schid++) {
+if (!css_find_subch(1, cssid, ssid, schid)) {
+return schid;
+}
+}
+return MAX_SCHID + 1;
+}
+
+/**
+ * Return first free subchannel (id) in subchannel set for a device number
+ *
+ * Verify the device number @p devno is not used yet in the subchannel
+ * set identified by @p cssid and @p ssid. Set @p schid to the index
+ * of the first free subchannel in the subchannel set, if there is
+ * any. Return true if everything succeeded and false otherwise.
+ */
+static bool css_find_free_subch_for_devno(uint8_t cssid, uint8_t ssid,
+  uint16_t devno, uint16_t *schid,
+  Error **errp)
+{
+uint32_t free_schid;
+
+assert(schid);
+if (css_devno_used(cssid, ssid, devno)) {
+error_setg(errp, "Device %x.%x.%04x already exists",
+   cssid, ssid, devno);
+return false;
+}
+free_schid = css_find_free_subch(cssid, ssid);
+if (free_schid > MAX_SCHID) {
+error_setg(errp, "No free subchannel found for %x.%x.%04x",
+   cssid, ssid, devno);
+return false;
+}
+*schid = free_schid;
+return true;
+}
+
+/**
+ * Return first free subchannel (id) and device number
+ *
+ * Locate the first free subchannel and first free device number in
+ * any of the subchannel sets of the channel subsystem identified by
+ * @p cssid. Return false if no free subchannel / device number could
+ * be found. Otherwise set @p ssid, @p devno and @p schid to identify
+ * the available subchannel and device number and return true.
+ *
+ * May modify @p ssid, @p devno and / or @p schid even if no free
+ * subchannel / device number could be found.
+ */
+static bool css_find_free_subch_and_devno(uint8_t cssid, uint8_t *ssid,
+  uint16_t *devno, uint16_t *schid,
+  Error **errp)
+{
+uint32_t free_schid, free_devno;
+
+assert(ssid && devno && schid);
+for (*ssid = 0; *ssid <= MAX_SSID; (*ssid)++) {
+free_schid = css_find_free_subch(cssid, *ssid);
+if (free_schid > MAX_SCHID) {
+continue;
+}
+free_devno = css_find_free_devno(cssid, *ssid, free_schid);
+if (free_devno > MAX_DEVNO) {
+continue;
+}
+*schid = free_schid;
+*devno = free_devno;
+return true;
+}
+error_setg(errp, "Virtual channel subsystem is full!");
+return false;
+}
+
 bool css_subch_visible(SubchDev *sch)
 {
 if (sch->ssid > channel_subsys.max_ssid) {
@@ -1762,3 +1872,36 @@ PropertyInfo css_devid_propinfo = {
 .get = get_css_devid,
 .set = set_css_devid,
 };
+
+SubchDev *css_create_virtual_sch(CssDevId bus_id, Error **errp)
+{
+uint16_t schid = 0;
+SubchDev *sch;
+
+if (bus_id.valid) {
+/* Enforce use of virtual cssid. */
+
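
The wrap-around search done by css_find_free_devno() above can be
tried out in isolation with a sketch like the one below. It is not the
patch's code: the number space is shrunk to 0..7, the "used" table is a
plain array, and all demo_* names are invented. One deliberate
difference, stated as my own choice: the sketch wraps modulo
DEMO_MAX_DEVNO + 1 so that the highest number is also a candidate,
whereas the patch as posted uses % MAX_DEVNO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Highest valid device number in this toy model. */
#define DEMO_MAX_DEVNO 7u

/* demo_used[n] is true if device number n is taken. */
static bool demo_used[DEMO_MAX_DEVNO + 1];

/*
 * Return the first free device number, starting the search at start and
 * wrapping around past DEMO_MAX_DEVNO. Returns DEMO_MAX_DEVNO + 1 if
 * every number is in use.
 */
static uint32_t demo_find_free_devno(uint16_t start)
{
    uint32_t round;

    assert(start <= DEMO_MAX_DEVNO);
    for (round = 0; round <= DEMO_MAX_DEVNO; round++) {
        uint32_t devno = (start + round) % (DEMO_MAX_DEVNO + 1);

        if (!demo_used[devno]) {
            return devno;
        }
    }
    return DEMO_MAX_DEVNO + 1;
}
```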

[Qemu-devel] [PATCH for-2.7 1/8] pc-bios/s390-ccw: Pass selected SCSI device to IPL

From: "Eugene (jno) Dvurechenski" 

There is ,bootindex=%d argument to specify the lookup order of
boot devices.

If a bootindex is assigned to the device, then an IPL Parameter Info Block
is created for that device when it is IPLed from.

If it is a mere SCSI device (not FCP), then IPIB is created with a
special SCSI type and its fields are used to store SCSI address of the
device. This new ipl block is private to qemu for now.

If the device to IPL from is specified this way, then the SCSI bus lookup
is bypassed and the prescribed device uses the specified address.

Signed-off-by: Eugene (jno) Dvurechenski 
Signed-off-by: Alexander Yarygin 
Signed-off-by: Cornelia Huck 
---
 pc-bios/s390-ccw/iplb.h| 12 
 pc-bios/s390-ccw/main.c| 12 
 pc-bios/s390-ccw/virtio-scsi.c | 11 +++
 pc-bios/s390-ccw/virtio.h  |  2 ++
 4 files changed, 37 insertions(+)

diff --git a/pc-bios/s390-ccw/iplb.h b/pc-bios/s390-ccw/iplb.h
index 1cf509f..86abc56 100644
--- a/pc-bios/s390-ccw/iplb.h
+++ b/pc-bios/s390-ccw/iplb.h
@@ -43,6 +43,16 @@ struct IplBlockFcp {
 } __attribute__ ((packed));
 typedef struct IplBlockFcp IplBlockFcp;
 
+struct IplBlockQemuScsi {
+uint32_t lun;
+uint16_t target;
+uint16_t channel;
+uint8_t  reserved0[77];
+uint8_t  ssid;
+uint16_t devno;
+} __attribute__ ((packed));
+typedef struct IplBlockQemuScsi IplBlockQemuScsi;
+
 struct IplParameterBlock {
 uint32_t len;
 uint8_t  reserved0[3];
@@ -55,6 +65,7 @@ struct IplParameterBlock {
 union {
 IplBlockCcw ccw;
 IplBlockFcp fcp;
+IplBlockQemuScsi scsi;
 };
 } __attribute__ ((packed));
 typedef struct IplParameterBlock IplParameterBlock;
@@ -63,6 +74,7 @@ extern IplParameterBlock iplb __attribute__((__aligned__(PAGE_SIZE)));
 
 #define S390_IPL_TYPE_FCP 0x00
 #define S390_IPL_TYPE_CCW 0x02
+#define S390_IPL_TYPE_QEMU_SCSI 0xff
 
 static inline bool store_iplb(IplParameterBlock *iplb)
 {
diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c
index 9446ecc..345b848 100644
--- a/pc-bios/s390-ccw/main.c
+++ b/pc-bios/s390-ccw/main.c
@@ -84,6 +84,18 @@ static void virtio_setup(void)
 debug_print_int("ssid ", blk_schid.ssid);
 found = find_dev(&schib, dev_no);
 break;
+case S390_IPL_TYPE_QEMU_SCSI:
+{
+VDev *vdev = virtio_get_device();
+
+vdev->scsi_device_selected = true;
+vdev->selected_scsi_device.channel = iplb.scsi.channel;
+vdev->selected_scsi_device.target = iplb.scsi.target;
+vdev->selected_scsi_device.lun = iplb.scsi.lun;
+blk_schid.ssid = iplb.scsi.ssid & 0x3;
+found = find_dev(&schib, iplb.scsi.devno);
+break;
+}
 default:
 panic("List-directed IPL not supported yet!\n");
 }
diff --git a/pc-bios/s390-ccw/virtio-scsi.c b/pc-bios/s390-ccw/virtio-scsi.c
index 3bb48e9..d850a8d 100644
--- a/pc-bios/s390-ccw/virtio-scsi.c
+++ b/pc-bios/s390-ccw/virtio-scsi.c
@@ -204,6 +204,17 @@ static void virtio_scsi_locate_device(VDev *vdev)
 debug_print_int("config.scsi.max_target ", vdev->config.scsi.max_target);
 debug_print_int("config.scsi.max_lun", vdev->config.scsi.max_lun);
 
+if (vdev->scsi_device_selected) {
+sdev->channel = vdev->selected_scsi_device.channel;
+sdev->target = vdev->selected_scsi_device.target;
+sdev->lun = vdev->selected_scsi_device.lun;
+
+IPL_check(sdev->channel == 0, "non-zero channel requested");
+IPL_check(sdev->target <= vdev->config.scsi.max_target, "target# high");
+IPL_check(sdev->lun <= vdev->config.scsi.max_lun, "LUN# high");
+return;
+}
+
 for (target = 0; target <= vdev->config.scsi.max_target; target++) {
 sdev->channel = channel;
 sdev->target = target; /* sdev->lun will be 0 here */
diff --git a/pc-bios/s390-ccw/virtio.h b/pc-bios/s390-ccw/virtio.h
index 3c6e915..eb35ea5 100644
--- a/pc-bios/s390-ccw/virtio.h
+++ b/pc-bios/s390-ccw/virtio.h
@@ -274,6 +274,8 @@ struct VDev {
 uint64_t scsi_last_block;
 uint32_t scsi_dev_cyls;
 uint8_t scsi_dev_heads;
+bool scsi_device_selected;
+ScsiDevice selected_scsi_device;
 };
 typedef struct VDev VDev;
 
-- 
2.9.0
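
The selection logic of this patch, reduced to a standalone sketch: if a
device was pre-selected, validate its address and use it instead of
scanning. This is a simplification under my own assumptions. The Demo*
types are invented, failures are reported by return value rather than
through IPL_check(), and the fallback scan is replaced by a fixed
default address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct DemoScsiAddr {
    uint16_t channel;
    uint16_t target;
    uint32_t lun;
} DemoScsiAddr;

typedef struct DemoVDev {
    uint16_t max_target;        /* highest target the controller accepts */
    uint32_t max_lun;           /* highest LUN the controller accepts */
    bool scsi_device_selected;  /* set when the IPLB named a device */
    DemoScsiAddr selected;      /* address taken from the IPLB */
} DemoVDev;

/*
 * Fill *out with the address to boot from. Returns false if a device
 * was pre-selected but its address is out of range.
 */
static bool demo_locate_device(const DemoVDev *vdev, DemoScsiAddr *out)
{
    assert(vdev != NULL && out != NULL);

    if (vdev->scsi_device_selected) {
        if (vdev->selected.channel != 0 ||
            vdev->selected.target > vdev->max_target ||
            vdev->selected.lun > vdev->max_lun) {
            return false;
        }
        *out = vdev->selected;  /* bypass the bus scan entirely */
        return true;
    }

    /* Stand-in for the bus scan: this sketch just picks 0:0:0. */
    out->channel = 0;
    out->target = 0;
    out->lun = 0;
    return true;
}
```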

[Qemu-devel] [PATCH for-2.7 0/8] More s390x patches for 2.7

The following patches (and the s390x pci patches already on the
list) make up the last batch of s390x patches I plan to do for
2.7 (except for bug fixes that may crop up, of course).

We have several fixes/enhancements in the boot code and a bit
of refactoring in the css code that makes the future addition
of device types easier.

Alexander Yarygin (1):
  s390x/ipl: Support IPL from selected SCSI device

Cornelia Huck (1):
  pc-bios/s390-ccw.img: rebuild image

David Hildenbrand (1):
  s390x/ipl: fix reboots for migration from different bios

Eugene (jno) Dvurechenski (1):
  pc-bios/s390-ccw: Pass selected SCSI device to IPL

Jing Liu (2):
  s390x/css: Factor out virtual css bridge and bus
  s390x/css: Unplug handler of virtual css bridge

Sascha Silbe (2):
  s390x/css: factor out some generic code from
virtio_ccw_device_realize()
  s390x/css: use define for "virtual-css-bridge" literal

 hw/s390x/Makefile.objs |   2 +
 hw/s390x/ccw-device.c  |  27 
 hw/s390x/ccw-device.h  |  43 +++
 hw/s390x/css-bridge.c  | 124 +++
 hw/s390x/css.c | 143 ++
 hw/s390x/ipl.c |  37 +-
 hw/s390x/ipl.h |  15 +++
 hw/s390x/s390-virtio-ccw.c |   3 +-
 hw/s390x/virtio-ccw.c  | 271 -
 hw/s390x/virtio-ccw.h  |  20 +--
 include/hw/s390x/css-bridge.h  |  31 +
 include/hw/s390x/css.h |  18 +++
 pc-bios/s390-ccw.img   | Bin 26424 -> 26440 bytes
 pc-bios/s390-ccw/iplb.h|  12 ++
 pc-bios/s390-ccw/main.c|  12 ++
 pc-bios/s390-ccw/virtio-scsi.c |  11 ++
 pc-bios/s390-ccw/virtio.h  |   2 +
 17 files changed, 531 insertions(+), 240 deletions(-)
 create mode 100644 hw/s390x/ccw-device.c
 create mode 100644 hw/s390x/ccw-device.h
 create mode 100644 hw/s390x/css-bridge.c
 create mode 100644 include/hw/s390x/css-bridge.h

-- 
2.9.0

[Qemu-devel] [PATCH for-2.7 4/8] s390x/ipl: fix reboots for migration from different bios

From: David Hildenbrand 

When migrating from a different QEMU version, the start_address and
bios_start_address may differ. During migration these values are migrated
and overwrite the values that were detected by QEMU itself.

On a reboot, QEMU will reload its own BIOS, but use the migrated start
addresses, which does not work if the values differ.

Fix this by not relying on the migrated values anymore, but still
provide them during migration, so existing QEMUs continue to work.

Signed-off-by: David Hildenbrand 
Cc: qemu-sta...@nongnu.org
Signed-off-by: Cornelia Huck 
---
 hw/s390x/ipl.c | 11 +--
 hw/s390x/ipl.h |  2 ++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
index 78998cd..a54284c 100644
--- a/hw/s390x/ipl.c
+++ b/hw/s390x/ipl.c
@@ -69,8 +69,8 @@ static const VMStateDescription vmstate_ipl = {
 .version_id = 0,
 .minimum_version_id = 0,
 .fields = (VMStateField[]) {
-VMSTATE_UINT64(start_addr, S390IPLState),
-VMSTATE_UINT64(bios_start_addr, S390IPLState),
+VMSTATE_UINT64(compat_start_addr, S390IPLState),
+VMSTATE_UINT64(compat_bios_start_addr, S390IPLState),
 VMSTATE_STRUCT(iplb, S390IPLState, 0, vmstate_iplb, IplParameterBlock),
 VMSTATE_BOOL(iplb_valid, S390IPLState),
 VMSTATE_UINT8(cssid, S390IPLState),
@@ -192,6 +192,13 @@ static void s390_ipl_realize(DeviceState *dev, Error **errp)
 stq_p(rom_ptr(INITRD_PARM_SIZE), initrd_size);
 }
 }
+/*
+ * Don't ever use the migrated values, they could come from a different
+ * BIOS and therefore don't work. But still migrate the values, so
+ * QEMUs relying on it don't break.
+ */
+ipl->compat_start_addr = ipl->start_addr;
+ipl->compat_bios_start_addr = ipl->bios_start_addr;
 qemu_register_reset(qdev_reset_all_fn, dev);
 error:
 error_propagate(errp, err);
diff --git a/hw/s390x/ipl.h b/hw/s390x/ipl.h
index ed3f2c8..c891095 100644
--- a/hw/s390x/ipl.h
+++ b/hw/s390x/ipl.h
@@ -93,7 +93,9 @@ struct S390IPLState {
 /*< private >*/
 DeviceState parent_obj;
 uint64_t start_addr;
+uint64_t compat_start_addr;
 uint64_t bios_start_addr;
+uint64_t compat_bios_start_addr;
 bool enforce_bios;
 IplParameterBlock iplb;
 bool iplb_valid;
-- 
2.9.0

Re: [Qemu-devel] [PATCH] quorum: Only compile when supported

2016-07-05 Thread Sascha Silbe

Dear Fam (or Zheng?),

Fam Zheng  writes:

> This was the only exceptional module init function that does something
> else than a simple list of bdrv_register() calls, in all the block
> drivers.
>
> The qcrypto_hash_supports is actually a static check, determined at
> compile time.  Follow the block-job-$(CONFIG_FOO) convention for
> consistency.

Good idea.


[block/Makefile.objs]
> @@ -3,7 +3,7 @@ block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o 
> qcow2-snapshot.o qcow2-c
>  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>  block-obj-y += qed-check.o
>  block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
> -block-obj-y += quorum.o
> +block-obj-$(CONFIG_GNUTLS_HASH) += quorum.o
[...]
[block/quorum.c]
>  static void bdrv_quorum_init(void)
>  {
> -if (!qcrypto_hash_supports(QCRYPTO_HASH_ALG_SHA256)) {
> -/* SHA256 hash support is required for quorum device */
> -return;
> -}
>  bdrv_register(&bdrv_quorum);

The quorum driver needs SHA256 which was introduced in gnutls
2.11.1. However configure sets CONFIG_GNUTLS_HASH when gnutls 2.9.10+ is
present. You should either bump the version in configure or add an
explicit configure check for SHA256.
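
To illustrate the gap between the two thresholds, here is a small sketch that
compares version triples. The Version type and both helpers are hypothetical
and not part of configure or gnutls; 2.10.0 in the usage note is only an
example of a version that falls between the two numbers quoted above.

```c
#include <assert.h>

/* Hypothetical major.minor.patch triple. */
typedef struct Version {
    int major;
    int minor;
    int patch;
} Version;

/* Compare two versions; returns <0, 0 or >0 like strcmp(). */
static int version_cmp(Version a, Version b)
{
    assert(a.major >= 0 && b.major >= 0);
    if (a.major != b.major) {
        return a.major < b.major ? -1 : 1;
    }
    if (a.minor != b.minor) {
        return a.minor < b.minor ? -1 : 1;
    }
    if (a.patch != b.patch) {
        return a.patch < b.patch ? -1 : 1;
    }
    return 0;
}

/* Non-zero if 'have' is at least 'need'. */
static int version_at_least(Version have, Version need)
{
    return version_cmp(have, need) >= 0;
}
```

A version such as 2.10.0 satisfies the 2.9.10 check but not the 2.11.1 one,
which is exactly the window in which quorum would be built without SHA256.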

Sascha
-- 
Softwareentwicklung Sascha Silbe, Niederhofenstraße 5/1, 71229 Leonberg
https://se-silbe.de/
USt-IdNr. DE281696641

[Qemu-devel] [PATCH for-2.7 8/8] s390x/css: Unplug handler of virtual css bridge

From: Jing Liu 

The previous patch moved the virtual css bridge and bus out of
virtio-ccw, but kept a direct reference to the virtio-ccw specific
unplug function inside css-bridge.c.

To make the virtual css bus and bridge useful for non-virtio devices,
this introduces a common function pointer "unplug" through which the
virtio-ccw specific unplug parts are called. Thus, the tight coupling
to virtio-ccw can be removed.

This unplug pointer is a member of CCWDeviceClass, which is introduced
as part of an abstract device layer called "ccw-device". This layer
sits between DeviceState and the specific devices that are plugged into
the virtual css bus, like virtio-ccw devices. The specific unplug
handlers should be assigned to "unplug" during initialization.

Signed-off-by: Jing Liu 
Reviewed-by: Sascha Silbe 
Reviewed-by: Dong Jia Shi 
Reviewed-by: Yi Min Zhao 
Signed-off-by: Cornelia Huck 
---
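The pattern described above, reduced to a standalone sketch: a class struct
carries an optional device-specific hook, and the common bus code calls it
before doing the shared teardown. All names here (DevSketch, DevClassSketch,
bus_unplug and the two example hooks) are hypothetical; the real code uses
QOM classes and Error ** instead of plain return codes.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DevSketch DevSketch;

/* Hypothetical class struct with one optional device-specific hook. */
typedef struct DevClassSketch {
    int (*unplug)(DevSketch *dev);  /* may be NULL; returns 0 on success */
} DevClassSketch;

struct DevSketch {
    const DevClassSketch *klass;
    int plugged;     /* 1 while attached to the bus */
    int hook_calls;  /* counts device-specific hook invocations */
};

/* Common unplug path: device-specific part first, shared teardown second. */
static int bus_unplug(DevSketch *dev)
{
    assert(dev != NULL && dev->klass != NULL);
    if (dev->klass->unplug) {
        int ret = dev->klass->unplug(dev);
        if (ret != 0) {
            return ret;  /* propagate the failure, device stays plugged */
        }
    }
    dev->plugged = 0;    /* shared teardown, independent of device type */
    return 0;
}

/* Example hooks a specific device type could install. */
static int virtio_like_unplug(DevSketch *dev)
{
    dev->hook_calls++;
    return 0;
}

static int failing_unplug(DevSketch *dev)
{
    dev->hook_calls++;
    return -1;
}
```

The common code never names a specific device type, so a non-virtio device
only needs to provide its own hook (or none at all).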
 hw/s390x/Makefile.objs |  1 +
 hw/s390x/ccw-device.c  | 27 
 hw/s390x/ccw-device.h  | 43 +
 hw/s390x/css-bridge.c  | 39 +--
 hw/s390x/ipl.c | 14 
 hw/s390x/virtio-ccw.c  | 86 ++
 hw/s390x/virtio-ccw.h  | 11 +++
 7 files changed, 164 insertions(+), 57 deletions(-)
 create mode 100644 hw/s390x/ccw-device.c
 create mode 100644 hw/s390x/ccw-device.h

diff --git a/hw/s390x/Makefile.objs b/hw/s390x/Makefile.objs
index 141ce1a..41ac4ec 100644
--- a/hw/s390x/Makefile.objs
+++ b/hw/s390x/Makefile.objs
@@ -9,6 +9,7 @@ obj-y += css.o
 obj-y += s390-virtio-ccw.o
 obj-y += virtio-ccw.o
 obj-y += css-bridge.o
+obj-y += ccw-device.o
 obj-y += s390-pci-bus.o s390-pci-inst.o
 obj-y += s390-skeys.o
 obj-$(CONFIG_KVM) += s390-skeys-kvm.o
diff --git a/hw/s390x/ccw-device.c b/hw/s390x/ccw-device.c
new file mode 100644
index 000..28ea204
--- /dev/null
+++ b/hw/s390x/ccw-device.c
@@ -0,0 +1,27 @@
+/*
+ * Common device infrastructure for devices in the virtual css
+ *
+ * Copyright 2016 IBM Corp.
+ * Author(s): Jing Liu 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+#include "qemu/osdep.h"
+#include "ccw-device.h"
+
+static const TypeInfo ccw_device_info = {
+.name = TYPE_CCW_DEVICE,
+.parent = TYPE_DEVICE,
+.instance_size = sizeof(CcwDevice),
+.class_size = sizeof(CCWDeviceClass),
+.abstract = true,
+};
+
+static void ccw_device_register(void)
+{
+type_register_static(&ccw_device_info);
+}
+
+type_init(ccw_device_register)
diff --git a/hw/s390x/ccw-device.h b/hw/s390x/ccw-device.h
new file mode 100644
index 000..59ba01b
--- /dev/null
+++ b/hw/s390x/ccw-device.h
@@ -0,0 +1,43 @@
+/*
+ * Common device infrastructure for devices in the virtual css
+ *
+ * Copyright 2016 IBM Corp.
+ * Author(s): Jing Liu 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#ifndef HW_S390X_CCW_DEVICE_H
+#define HW_S390X_CCW_DEVICE_H
+#include "qom/object.h"
+#include "hw/qdev-core.h"
+#include "hw/s390x/css.h"
+
+typedef struct CcwDevice {
+DeviceState parent_obj;
+SubchDev *sch;
+/* .. */
+CssDevId bus_id;
+} CcwDevice;
+
+typedef struct CCWDeviceClass {
+DeviceClass parent_class;
+void (*unplug)(HotplugHandler *, DeviceState *, Error **);
+} CCWDeviceClass;
+
+static inline CcwDevice *to_ccw_dev_fast(DeviceState *d)
+{
+return container_of(d, CcwDevice, parent_obj);
+}
+
+#define TYPE_CCW_DEVICE "ccw-device"
+
+#define CCW_DEVICE(obj) OBJECT_CHECK(CcwDevice, (obj), TYPE_CCW_DEVICE)
+#define CCW_DEVICE_GET_CLASS(obj) \
+OBJECT_GET_CLASS(CCWDeviceClass, (obj), TYPE_CCW_DEVICE)
+#define CCW_DEVICE_CLASS(klass) \
+OBJECT_CLASS_CHECK(CCWDeviceClass, (klass), TYPE_CCW_DEVICE)
+
+#endif
diff --git a/hw/s390x/css-bridge.c b/hw/s390x/css-bridge.c
index e74cc1c..e4c24e2 100644
--- a/hw/s390x/css-bridge.c
+++ b/hw/s390x/css-bridge.c
@@ -11,13 +11,48 @@
  */
 #include "qemu/osdep.h"
 #include "qapi/error.h"
-#include "virtio-ccw.h"
 #include "hw/hotplug.h"
 #include "hw/sysbus.h"
 #include "qemu/bitops.h"
 #include "hw/s390x/css.h"
+#include "ccw-device.h"
 #include "hw/s390x/css-bridge.h"
 
+/*
+ * Invoke device-specific unplug handler, disable the subchannel
+ * (including sending a channel report to the guest) and remove the
+ * device from the virtual css bus.
+ */
+static void ccw_device_unplug(HotplugHandler *hotplug_dev,
+  DeviceState *dev, Error **errp)
+{
+CcwDevice *ccw_dev = CCW_DEVICE(dev);
+CCWDeviceClass *k = CCW_DEVICE_GET_CLASS(ccw_dev);
+SubchDev *sch = ccw_dev->sch;
+Error *err = NULL;
+
+if (k->unplug) {
+k->unplug(hotplug_dev, dev, &err);
+if (err) {
+error_propagate(errp, err);
+return;
+}
+}
+
+/*
+ * We should arrive here only for device_d

[Qemu-devel] [PATCH for-2.7 6/8] s390x/css: use define for "virtual-css-bridge" literal

From: Sascha Silbe 

Introduce a TYPE_* define (like we already use for a couple of other
QOM types) for the name of the virtual CSS bridge QOM type instead of
sprinkling the same string literal over several source files.

Signed-off-by: Sascha Silbe 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/s390-virtio-ccw.c | 2 +-
 hw/s390x/virtio-ccw.c  | 4 ++--
 hw/s390x/virtio-ccw.h  | 3 +++
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 52f079a..3b79e96 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -29,7 +29,7 @@
 #include "hw/s390x/s390-virtio-ccw.h"
 
 static const char *const reset_dev_types[] = {
-"virtual-css-bridge",
+TYPE_VIRTUAL_CSS_BRIDGE,
 "s390-sclp-event-facility",
 "s390-flic",
 "diag288",
diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 67d7867..0afc0d3 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -130,7 +130,7 @@ VirtualCssBus *virtual_css_bus_init(void)
 DeviceState *dev;
 
 /* Create bridge device */
-dev = qdev_create(NULL, "virtual-css-bridge");
+dev = qdev_create(NULL, TYPE_VIRTUAL_CSS_BRIDGE);
 qdev_init_nofail(dev);
 
 /* Create bus on bridge device */
@@ -1626,7 +1626,7 @@ static void virtual_css_bridge_class_init(ObjectClass *klass, void *data)
 }
 
 static const TypeInfo virtual_css_bridge_info = {
-.name  = "virtual-css-bridge",
+.name  = TYPE_VIRTUAL_CSS_BRIDGE,
 .parent= TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(SysBusDevice),
 .class_init= virtual_css_bridge_class_init,
diff --git a/hw/s390x/virtio-ccw.h b/hw/s390x/virtio-ccw.h
index 7243fb0..6144625 100644
--- a/hw/s390x/virtio-ccw.h
+++ b/hw/s390x/virtio-ccw.h
@@ -101,6 +101,9 @@ static inline int virtio_ccw_rev_max(VirtioCcwDevice *dev)
 return dev->max_rev;
 }
 
+/* virtual css bridge type */
+#define TYPE_VIRTUAL_CSS_BRIDGE "virtual-css-bridge"
+
 /* virtual css bus type */
 typedef struct VirtualCssBus {
 BusState parent_obj;
-- 
2.9.0

[Qemu-devel] [PATCH for-2.7 7/8] s390x/css: Factor out virtual css bridge and bus

From: Jing Liu 

Currently, the common base layers, virtual css bridge and bus, are
defined in hw/s390x/virtio-ccw.c(h). In order to support
multiple types of devices in the virtual channel subsystem,
especially non virtio-ccw ones, refactoring work needs to be done.

This work is a pure code move without any functional change,
except for dropping the empty function virtual_css_bridge_init() and
exporting virtio_ccw_busdev_unplug(). virtio_ccw_busdev_unplug()
is specific to virtio-ccw but gets referenced from the common
virtual css bridge code. To keep the functional changes to a
minimum we export this function from virtio-ccw.c and continue
to reference it inside virtual_css_bridge_class_init()
(now living in hw/s390x/css-bridge.c). A follow-up patch will
clean this up.

Signed-off-by: Jing Liu 
Reviewed-by: Sascha Silbe 
Reviewed-by: Dong Jia Shi 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/Makefile.objs|  1 +
 hw/s390x/css-bridge.c | 89 +++
 hw/s390x/s390-virtio-ccw.c|  1 +
 hw/s390x/virtio-ccw.c | 79 ++
 hw/s390x/virtio-ccw.h | 14 +--
 include/hw/s390x/css-bridge.h | 31 +++
 6 files changed, 127 insertions(+), 88 deletions(-)
 create mode 100644 hw/s390x/css-bridge.c
 create mode 100644 include/hw/s390x/css-bridge.h

diff --git a/hw/s390x/Makefile.objs b/hw/s390x/Makefile.objs
index 2203617..141ce1a 100644
--- a/hw/s390x/Makefile.objs
+++ b/hw/s390x/Makefile.objs
@@ -8,6 +8,7 @@ obj-y += ipl.o
 obj-y += css.o
 obj-y += s390-virtio-ccw.o
 obj-y += virtio-ccw.o
+obj-y += css-bridge.o
 obj-y += s390-pci-bus.o s390-pci-inst.o
 obj-y += s390-skeys.o
 obj-$(CONFIG_KVM) += s390-skeys-kvm.o
diff --git a/hw/s390x/css-bridge.c b/hw/s390x/css-bridge.c
new file mode 100644
index 000..e74cc1c
--- /dev/null
+++ b/hw/s390x/css-bridge.c
@@ -0,0 +1,89 @@
+/*
+ * css bridge implementation
+ *
+ * Copyright 2012,2016 IBM Corp.
+ * Author(s): Cornelia Huck 
+ *Pierre Morel 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "virtio-ccw.h"
+#include "hw/hotplug.h"
+#include "hw/sysbus.h"
+#include "qemu/bitops.h"
+#include "hw/s390x/css.h"
+#include "hw/s390x/css-bridge.h"
+
+static void virtual_css_bus_reset(BusState *qbus)
+{
+/* This should actually be modelled via the generic css */
+css_reset();
+}
+
+static void virtual_css_bus_class_init(ObjectClass *klass, void *data)
+{
+BusClass *k = BUS_CLASS(klass);
+
+k->reset = virtual_css_bus_reset;
+}
+
+static const TypeInfo virtual_css_bus_info = {
+.name = TYPE_VIRTUAL_CSS_BUS,
+.parent = TYPE_BUS,
+.instance_size = sizeof(VirtualCssBus),
+.class_init = virtual_css_bus_class_init,
+};
+
+VirtualCssBus *virtual_css_bus_init(void)
+{
+VirtualCssBus *cbus;
+BusState *bus;
+DeviceState *dev;
+
+/* Create bridge device */
+dev = qdev_create(NULL, TYPE_VIRTUAL_CSS_BRIDGE);
+qdev_init_nofail(dev);
+
+/* Create bus on bridge device */
+bus = qbus_create(TYPE_VIRTUAL_CSS_BUS, dev, "virtual-css");
+cbus = VIRTUAL_CSS_BUS(bus);
+
+/* Enable hotplugging */
+qbus_set_hotplug_handler(bus, dev, &error_abort);
+
+return cbus;
+ }
+
+/* Virtual-css Bus Bridge Device */
+
+static void virtual_css_bridge_class_init(ObjectClass *klass, void *data)
+{
+HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(klass);
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+hc->unplug = virtio_ccw_busdev_unplug;
+set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
+}
+
+static const TypeInfo virtual_css_bridge_info = {
+.name  = TYPE_VIRTUAL_CSS_BRIDGE,
+.parent= TYPE_SYS_BUS_DEVICE,
+.instance_size = sizeof(SysBusDevice),
+.class_init= virtual_css_bridge_class_init,
+.interfaces = (InterfaceInfo[]) {
+{ TYPE_HOTPLUG_HANDLER },
+{ }
+}
+};
+
+static void virtual_css_register(void)
+{
+type_register_static(&virtual_css_bridge_info);
+type_register_static(&virtual_css_bus_info);
+}
+
+type_init(virtual_css_register)
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 3b79e96..caf0a68 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -27,6 +27,7 @@
 #include "hw/compat.h"
 #include "ipl.h"
 #include "hw/s390x/s390-virtio-ccw.h"
+#include "hw/s390x/css-bridge.h"
 
 static const char *const reset_dev_types[] = {
 TYPE_VIRTUAL_CSS_BRIDGE,
diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 0afc0d3..8f1a0e8 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -33,31 +33,11 @@
 #include "hw/s390x/css.h"
 #include "virtio-ccw.h"
 #include "trace.h"
+#include "hw/s390x/css-bridge.h"
 
 static void virtio_ccw_bus_new(VirtioBusState *bus, size_t bus_s

Re: [Qemu-devel] [PATCH v2 0/3] seabios: add serial console support

On Mon, Jul 04, 2016 at 10:39:51PM +0200, Gerd Hoffmann wrote:
>   Hi,
> 
> Next round of patches.  Changes:
> 
>  * Moved it all to a new sercon.c file.
>  * Code maps cp437 to utf8 now, giving a much nicer display.  Compare
>"Use the ↑ and ↓ keys to change the selection." (this series) with
>"Use the ^ and v keys to change the selection." (sgabios)  ;-)
>  * Simplified keyboard code, using enqueue_key now.
>  * Restructured code, to clean things up and to address review comments.

Currently libvirt has an option to turn on serial console support
for the BIOS. When this is set it adds the sga device. How will
libvirt know when seabios has this feature built-in, and thus does
not need to add the sga device?


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

[Qemu-devel] [PATCH for-2.7 2/8] pc-bios/s390-ccw.img: rebuild image

Contains:
- pc-bios/s390-ccw: Pass selected SCSI device to IPL

Signed-off-by: Cornelia Huck 
---
 pc-bios/s390-ccw.img | Bin 26424 -> 26440 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)

diff --git a/pc-bios/s390-ccw.img b/pc-bios/s390-ccw.img
index ea5fd9a4797530e79e88f591f66c4de8e93a604e..089f6ba5e924c17796cccac4ecbfa7a78b00ba22 100644
GIT binary patch
delta 7745
[...]

Re: [Qemu-devel] [PATCH] quorum: Only compile when supported

On Tue, 07/05 09:58, Sascha Silbe wrote:
> Dear Fam (or Zheng?),

Hi Sascha,

Zheng is the last name here. :)

> 
> Fam Zheng  writes:
> 
> > This was the only exceptional module init function that does something
> > else than a simple list of bdrv_register() calls, in all the block
> > drivers.
> >
> > The qcrypto_hash_supports is actually a static check, determined at
> > compile time.  Follow the block-job-$(CONFIG_FOO) convention for
> > consistency.
> 
> Good idea.
> 
> 
> [block/Makefile.objs]
> > @@ -3,7 +3,7 @@ block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o 
> > qcow2-snapshot.o qcow2-c
> >  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
> >  block-obj-y += qed-check.o
> >  block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
> > -block-obj-y += quorum.o
> > +block-obj-$(CONFIG_GNUTLS_HASH) += quorum.o
> [...]
> [block/quorum.c]
> >  static void bdrv_quorum_init(void)
> >  {
> > -if (!qcrypto_hash_supports(QCRYPTO_HASH_ALG_SHA256)) {
> > -/* SHA256 hash support is required for quorum device */
> > -return;
> > -}
> >  bdrv_register(&bdrv_quorum);
> 
> The quorum driver needs SHA256 which was introduced in gnutls
> 2.11.1. However configure sets CONFIG_GNUTLS_HASH when gnutls 2.9.10+ is
> present. You should either bump the version in configure or add an
> explicit configure check for SHA256.

Yes, I just noticed commit 0c16c056a4f removed CONFIG_GNUTLS_HASH so I need to
rebase anyway (that commit also fixed this version requirement we have been
missing as you mentioned). Thanks for reviewing!

Fam

Re: [Qemu-devel] [RFC PATCH v0 1/5] cpu: Factor out cpu vmstate_[un]register into separate routines

On Tue, 5 Jul 2016 13:08:28 +0530
Bharata B Rao  wrote:

> On Tue, Jul 05, 2016 at 09:22:30AM +0200, Igor Mammedov wrote:
> > On Tue, 5 Jul 2016 12:05:35 +0530
> > Bharata B Rao  wrote:
> > 
> > > On Tue, Jul 05, 2016 at 07:49:38AM +0200, Igor Mammedov wrote:
> > > > On Tue, 5 Jul 2016 10:46:07 +0530
> > > > Bharata B Rao  wrote:
> > > > 
> > > > > On Tue, Jul 05, 2016 at 02:56:13PM +1000, David Gibson wrote:
> > > > > > On Tue, Jul 05, 2016 at 10:12:48AM +0530, Bharata B Rao
> > > > > > wrote:
> > > > > > > Consolidates cpu vmstate_[un]register calls into separate
> > > > > > > routines. No functionality change except that
> > > > > > > vmstate_unregister calls are now done
> > > > > > > under !CONFIG_USER_ONLY to match with vmstate_register
> > > > > > > calls.
> > > > > > > 
> > > > > > > Signed-off-by: Bharata B Rao 
> > > > > > 
> > > > > > Reviewed-by: David Gibson 
> > > > > > 
> > > > > > > ---
> > > > > > >  exec.c | 47
> > > > > > > --- 1 file
> > > > > > > changed, 28 insertions(+), 19 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/exec.c b/exec.c
> > > > > > > index 0122ef7..8ce8e90 100644
> > > > > > > --- a/exec.c
> > > > > > > +++ b/exec.c
> > > > > > > @@ -594,9 +594,7 @@ AddressSpace
> > > > > > > *cpu_get_address_space(CPUState *cpu, int asidx) /* Return
> > > > > > > the AddressSpace corresponding to the specified index */
> > > > > > > return cpu->cpu_ases[asidx].as; }
> > > > > > > -#endif
> > > > > > >  
> > > > > > > -#ifndef CONFIG_USER_ONLY
> > > > > > >  static DECLARE_BITMAP(cpu_index_map, MAX_CPUMASK_BITS);
> > > > > > >  
> > > > > > >  static int cpu_get_free_index(Error **errp)
> > > > > > > @@ -617,6 +615,31 @@ static void
> > > > > > > cpu_release_index(CPUState *cpu) {
> > > > > > >  bitmap_clear(cpu_index_map, cpu->cpu_index, 1);
> > > > > > >  }
> > > > > > > +
> > > > > > > +static void cpu_vmstate_register(CPUState *cpu)
> > > > > > > +{
> > > > > > > +CPUClass *cc = CPU_GET_CLASS(cpu);
> > > > > > > +
> > > > > > > +if (qdev_get_vmsd(DEVICE(cpu)) == NULL) {
> > > > > > > +vmstate_register(NULL, cpu->cpu_index,
> > > > > > > &vmstate_cpu_common, cpu);
> > > > > > > +}
> > > > > > > +if (cc->vmsd != NULL) {
> > > > > > > +vmstate_register(NULL, cpu->cpu_index, cc->vmsd,
> > > > > > > cpu);
> > > > > > > +}
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void cpu_vmstate_unregister(CPUState *cpu)
> > > > > > > +{
> > > > > > > +CPUClass *cc = CPU_GET_CLASS(cpu);
> > > > > > > +
> > > > > > > +if (cc->vmsd != NULL) {
> > > > > > > +vmstate_unregister(NULL, cc->vmsd, cpu);
> > > > > > > +}
> > > > > > > +if (qdev_get_vmsd(DEVICE(cpu)) == NULL) {
> > > > > > > +vmstate_unregister(NULL, &vmstate_cpu_common,
> > > > > > > cpu);
> > > > > > > +}
> > > > > > > +}
> > > > > > > +
> > > > > > 
> > > > > > Given you're factoring this out, would it make sense to
> > > > > > defined no-op versions for CONFIG_USER_ONLY, to reduce the
> > > > > > amount of ifdefs at the call site?
> > > > > 
> > > > > I did that in a subsequent patch that moved the calls to these
> > > > > routines into
> > > > > cpu_common_[un]realize(), but ended
> > > > > up in some unrelated issue and hence didn't include that
> > > > > patch yet.
> > > > I'd prefer to see it moved to cpu_common_[un]realize() directly
> > > > without this intermediate transition as compat logic could be
> > > > implemented much cleaner if it's there.
> > > 
> > > If I implement cpu_common_unrealize() and the associated logic
> > > similar to the existing cpu_common_realize(), it would involve
> > > changes to all archs.
> > maybe I'm missing something, but why would all archs be involved?
> > The only ones that could be affected are the ones that have their
> > own cpu_xxx_unrealize() implemented without calling
> > CPUClass::unrealize.
> 
> You are right, I didn't realize that most archs don't define their own
> cpu_xxx_unrealize call. So this should be a simple change. Will
> include a patch to move vmstate_[un]register() calls to
> cpu_common_[un]realize() in the next version.
just replace this patch with the cpu_common_[un]realize() variant

> 
> Regards,
> Bharata.
>

[Qemu-devel] [PATCH v11 00/28] IOMMU: Enable interrupt remapping for Intel IOMMU

This is v11 of Intel IR work. It is rebased to mst's branch
"tags/for_upstream", commit:

  "278a2a2 vmw_pvscsi: remove unnecessary internal msi state flag"

This series mainly fixes several issues raised in the v10 review
comments, fixes one bug with RHEL guests, adds Paolo's acked-by, and is
freshly rebased as mentioned above.

To make it fast, I only did quick tests for this version. But at
least they should cover basic functions like: IOAPIC, MSI, multiple
vcpus, different guests (4.7 upstream and rhel 7.2), vhost, split/off
irqchips. More tests to be done.

Meanwhile, there are several pending issues to be solved, which are
queued in my todo list, and I'll continue the work after this series is
merged.

Online branch:

  https://github.com/xzpeter/qemu vtd-intr-v11

Please review, thanks.

v11 changes (using v10 patch index):
- patch 2: split into two patches, one to rename VTD_* macros, the
  other to provide x86_iommu_get_default(). [mst]
- patch 5: removing DMAR_REPORT_F_INTR, with comments [mst]
- patch 8: use size_t for ioapic_scope_size, removing
  ACPI_DMAR_DEV_SCOPE_TYPE_IOAPIC [mst]
- patch 13: better handle "subhandle" field: this is a problem I found
  in v10 when testing with rhel guests.
- patch 18: tiny tweaks to let QEMU build with --disable-kvm ("#ifdef
  CONFIG_KVM")
- patch 24: still using v10 of this patch, and dropping v10.2 version
  (so this patch content is unmodified from v10)
- new patch added: throw error when kernel irqchip=on is specified.
- put all new trace-events into specific directories
- add acked-by for Paolo (patch 16, 21-24, 26, 27)

v10 changes:
- Fix issue when specifying more than 1 vcpu.  This was introduced in
  v9 after rebasing to Marcel's patches.  The problem is that, before
  Marcel's patch, we would first create the IOMMU and then the IOAPIC,
  while the order is switched after Marcel's changes.  This affects
  patch 18 ("register IOMMU IEC notifier for ioapic") and I need to do
  the registration after IOAPIC realization.
- Display a readable error message if the user specifies more than one
  x86 vIOMMU, rather than an assertion failure. (patch 2)
- Correct vtd iec notifier "global" parameter: if granularity bit is
  clear (not set), then it's a global invalidation (patch 17,
  inverted meaning for granularity).
- added one more patch (patch 26) to add some trace events for irqchip
  msi routes operations.
- rebase to latest master

v9 changes:
- addressed several possible acpi issues with BE machines, and comment
  fix [Igor]
- removed patch 16 in v8 since it's useless after rebasing to Marcel's
  patches
- move vtd_svt_mask into vtd_irte_get() and declare it as constant.
- rebase to latest master, with Marcel's "-device intel-iommu" patch v2
  - re-arrange patch order, moving x86-iommu to the beginning (so that
    I can add "intremap" property for it, which can be further shared
    by future AMD IOMMUs)
  - add device property "intremap" for X86 IOMMU device (new patch 4
    in v9)
  - replace all existing references to MachineState.iommu_intr with
    device property X86IOMMUState.intr_supported, removing
    MachineState.iommu_intr
  - some other minor changes due to the rebase

v8 changes:
- rebase to latest master
- patch 7
  - remove VTD_IR_IOAPICEntry, which is useless now
  - fix possible issue on big endian machines for VTD_IRTE,
    VTD_IR_MSIAddress
- patch 12
  - fix endianness issue with bit-field defines: fix BE issue with
    VTD_MSIMessage, do cpu_to_*() or reverse when necessary on
    bit-field uses.
- patch 19
  - used le32_to_cpu() for dest_id, and added my s-o-b line beneath
    Jan's.

v7 changes (using v6 patch index):
- patch 10: trivial change in debug string (remove one more "\n")
- patch 17-18: ioapic remote irr patches, sent separately
  already. So removed from this series.
- patch 24:
  - fix commit message: only irqfd msi routes are maintained, not
    all msi routes.
  - skip all IOAPIC msi entries (dev == NULL). We only need to
    housekeep irqfd users.
- added patches
  - pick up Radim's patch on adding MHMV ecap bits [Radim]
- remove all vtd_* patches, instead, use x86-iommu ones in the first
  place. This introduced lots of patch order changes and content
  changes, which affected everything from original patch 8 to the
  end. Sorry! [Jan]

v6 changes:
- patch 10: use write_with_attrs() rather than write(), preparing
  for SID verification [Jan]
- patch 17-18: add r-b line from Radim [Radim]
- new patch 19: put together Jan's EIM patch [Jan]
- new patch 20: add SID validation process
- new patch 21-22: introduce X86IOMMU class, which is the parent of
  IntelIOMMU class. Patch 21 only introduces the class and does
  nothing else, patch 22 cleans up all the vtd_*() hooks into x86
  ones. This is only a start. In the future, we can abstract more
  things into X86IOMMU class, like iotlb, address spaces mgmt,
  etc. [Jan]
- new patch 23-25: this is to do IEC notify to all irqfd consumers
  like vhost/vfio. patch 23 changed interface for
  kvm_irqchip_add_msi_route(), p

[Qemu-devel] [PATCH v11 01/28] x86-iommu: introduce parent class

Introduce a parent class for intel-iommu devices named "x86-iommu". This
is preparation work to abstract shared functionality out of the Intel
and AMD IOMMUs. Currently, only the parent class is introduced. It does
nothing yet.

Signed-off-by: Peter Xu 
---
 hw/i386/Makefile.objs |  2 +-
 hw/i386/intel_iommu.c |  5 ++--
 hw/i386/x86-iommu.c   | 53 +++
 include/hw/i386/intel_iommu.h |  3 ++-
 include/hw/i386/x86-iommu.h   | 46 +
 5 files changed, 105 insertions(+), 4 deletions(-)
 create mode 100644 hw/i386/x86-iommu.c
 create mode 100644 include/hw/i386/x86-iommu.h

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index b52d5b8..90e94ff 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -2,7 +2,7 @@ obj-$(CONFIG_KVM) += kvm/
 obj-y += multiboot.o
 obj-y += pc.o pc_piix.o pc_q35.o
 obj-y += pc_sysfw.o
-obj-y += intel_iommu.o
+obj-y += x86-iommu.o intel_iommu.o
 obj-$(CONFIG_XEN) += ../xenpv/ xen/
 
 obj-y += kvmvapic.o
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 464f2a0..a430d7d 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2061,17 +2061,18 @@ static void vtd_realize(DeviceState *dev, Error **errp)
 static void vtd_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
+X86IOMMUClass *x86_class = X86_IOMMU_CLASS(klass);
 
 dc->reset = vtd_reset;
-dc->realize = vtd_realize;
 dc->vmsd = &vtd_vmstate;
 dc->props = vtd_properties;
 dc->hotpluggable = false;
+x86_class->realize = vtd_realize;
 }
 
 static const TypeInfo vtd_info = {
 .name  = TYPE_INTEL_IOMMU_DEVICE,
-.parent= TYPE_SYS_BUS_DEVICE,
+.parent= TYPE_X86_IOMMU_DEVICE,
 .instance_size = sizeof(IntelIOMMUState),
 .class_init= vtd_class_init,
 };
diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
new file mode 100644
index 000..d739afb
--- /dev/null
+++ b/hw/i386/x86-iommu.c
@@ -0,0 +1,53 @@
+/*
+ * QEMU emulation of common X86 IOMMU
+ *
+ * Copyright (C) 2016 Peter Xu, Red Hat 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "hw/sysbus.h"
+#include "hw/boards.h"
+#include "hw/i386/x86-iommu.h"
+
+static void x86_iommu_realize(DeviceState *dev, Error **errp)
+{
+X86IOMMUClass *x86_class = X86_IOMMU_GET_CLASS(dev);
+if (x86_class->realize) {
+x86_class->realize(dev, errp);
+}
+}
+
+static void x86_iommu_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+dc->realize = x86_iommu_realize;
+}
+
+static const TypeInfo x86_iommu_info = {
+.name  = TYPE_X86_IOMMU_DEVICE,
+.parent= TYPE_SYS_BUS_DEVICE,
+.instance_size = sizeof(X86IOMMUState),
+.class_init= x86_iommu_class_init,
+.class_size= sizeof(X86IOMMUClass),
+.abstract  = true,
+};
+
+static void x86_iommu_register_types(void)
+{
+type_register_static(&x86_iommu_info);
+}
+
+type_init(x86_iommu_register_types)
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index b024ffa..680a0c4 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -23,6 +23,7 @@
 #define INTEL_IOMMU_H
 #include "hw/qdev.h"
 #include "sysemu/dma.h"
+#include "hw/i386/x86-iommu.h"
 
 #define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
 #define INTEL_IOMMU_DEVICE(obj) \
@@ -90,7 +91,7 @@ struct VTDIOTLBEntry {
 
 /* The iommu (DMAR) device state struct */
 struct IntelIOMMUState {
-SysBusDevice busdev;
+X86IOMMUState x86_iommu;
 MemoryRegion csrmem;
 uint8_t csr[DMAR_REG_SIZE]; /* register values */
 uint8_t wmask[DMAR_REG_SIZE];   /* R/W bytes */
diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
new file mode 100644
index 000..924f39a
--- /dev/null
+++ b/include/hw/i386/x86-iommu.h
@@ -0,0 +1,46 @@
+/*
+ * Common IOMMU interface for X86 platform
+ *
+ * Copyright (C) 2016 Peter Xu, Red Hat 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be

[Qemu-devel] [PATCH v11 02/28] intel_iommu: rename VTD_PCI_DEVFN_MAX to x86-iommu

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c | 11 +++
 include/hw/i386/intel_iommu.h |  1 -
 include/hw/i386/x86-iommu.h   |  2 ++
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index a430d7d..3ee5782 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -26,6 +26,8 @@
 #include "hw/pci/pci.h"
 #include "hw/pci/pci_bus.h"
 #include "hw/i386/pc.h"
+#include "hw/boards.h"
+#include "hw/i386/x86-iommu.h"
 
 /*#define DEBUG_INTEL_IOMMU*/
 #ifdef DEBUG_INTEL_IOMMU
@@ -192,7 +194,7 @@ static void vtd_reset_context_cache(IntelIOMMUState *s)
 
 VTD_DPRINTF(CACHE, "global context_cache_gen=1");
 while (g_hash_table_iter_next (&bus_it, NULL, (void**)&vtd_bus)) {
-for (devfn_it = 0; devfn_it < VTD_PCI_DEVFN_MAX; ++devfn_it) {
+for (devfn_it = 0; devfn_it < X86_IOMMU_PCI_DEVFN_MAX; ++devfn_it) {
 vtd_as = vtd_bus->dev_as[devfn_it];
 if (!vtd_as) {
 continue;
@@ -964,7 +966,7 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s,
 vtd_bus = vtd_find_as_from_bus_num(s, VTD_SID_TO_BUS(source_id));
 if (vtd_bus) {
 devfn = VTD_SID_TO_DEVFN(source_id);
-for (devfn_it = 0; devfn_it < VTD_PCI_DEVFN_MAX; ++devfn_it) {
+for (devfn_it = 0; devfn_it < X86_IOMMU_PCI_DEVFN_MAX; ++devfn_it) {
 vtd_as = vtd_bus->dev_as[devfn_it];
 if (vtd_as && ((devfn_it & mask) == (devfn & mask))) {
 VTD_DPRINTF(INV, "invalidate context-cahce of devfn 0x%"PRIx16,
@@ -1916,7 +1918,8 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
 
 if (!vtd_bus) {
 /* No corresponding free() */
-vtd_bus = g_malloc0(sizeof(VTDBus) + sizeof(VTDAddressSpace *) * VTD_PCI_DEVFN_MAX);
+vtd_bus = g_malloc0(sizeof(VTDBus) + sizeof(VTDAddressSpace *) * \
+X86_IOMMU_PCI_DEVFN_MAX);
 vtd_bus->bus = bus;
 key = (uintptr_t)bus;
 g_hash_table_insert(s->vtd_as_by_busptr, &key, vtd_bus);
@@ -2032,7 +2035,7 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 IntelIOMMUState *s = opaque;
 VTDAddressSpace *vtd_as;
 
-assert(0 <= devfn && devfn <= VTD_PCI_DEVFN_MAX);
+assert(0 <= devfn && devfn <= X86_IOMMU_PCI_DEVFN_MAX);
 
 vtd_as = vtd_find_add_as(s, bus, devfn);
 return &vtd_as->as;
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 680a0c4..0794309 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -35,7 +35,6 @@
 #define VTD_PCI_BUS_MAX 256
 #define VTD_PCI_SLOT_MAX32
 #define VTD_PCI_FUNC_MAX8
-#define VTD_PCI_DEVFN_MAX   256
 #define VTD_PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
 #define VTD_PCI_FUNC(devfn) ((devfn) & 0x07)
 #define VTD_SID_TO_BUS(sid) (((sid) >> 8) & 0xff)
diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
index 924f39a..fac693d 100644
--- a/include/hw/i386/x86-iommu.h
+++ b/include/hw/i386/x86-iommu.h
@@ -30,6 +30,8 @@
 #define  X86_IOMMU_GET_CLASS(obj) \
 OBJECT_GET_CLASS(X86IOMMUClass, obj, TYPE_X86_IOMMU_DEVICE)
 
+#define X86_IOMMU_PCI_DEVFN_MAX   256
+
 typedef struct X86IOMMUState X86IOMMUState;
 typedef struct X86IOMMUClass X86IOMMUClass;
 
-- 
2.4.11

[Qemu-devel] [PATCH v11 03/28] x86-iommu: provide x86_iommu_get_default

Instead of searching the device tree every time, one static variable is
declared for the default system x86 IOMMU device.

Signed-off-by: Peter Xu 
---
 hw/i386/acpi-build.c|  9 ++---
 hw/i386/x86-iommu.c | 23 +++
 include/hw/i386/x86-iommu.h |  6 ++
 3 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index b1adf04..1186401 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -52,7 +52,7 @@
 #include "hw/i386/ich9.h"
 #include "hw/pci/pci_bus.h"
 #include "hw/pci-host/q35.h"
-#include "hw/i386/intel_iommu.h"
+#include "hw/i386/x86-iommu.h"
 #include "hw/timer/hpet.h"
 
 #include "hw/acpi/aml-build.h"
@@ -2598,12 +2598,7 @@ static bool acpi_get_mcfg(AcpiMcfgInfo *mcfg)
 
 static bool acpi_has_iommu(void)
 {
-bool ambiguous;
-Object *intel_iommu;
-
-intel_iommu = object_resolve_path_type("", TYPE_INTEL_IOMMU_DEVICE,
-   &ambiguous);
-return intel_iommu && !ambiguous;
+return !!x86_iommu_get_default();
 }
 
 static
diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index d739afb..f395139 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -21,6 +21,28 @@
 #include "hw/sysbus.h"
 #include "hw/boards.h"
 #include "hw/i386/x86-iommu.h"
+#include "qemu/error-report.h"
+
+/* Default X86 IOMMU device */
+static X86IOMMUState *x86_iommu_default = NULL;
+
+static void x86_iommu_set_default(X86IOMMUState *x86_iommu)
+{
+assert(x86_iommu);
+
+if (x86_iommu_default) {
+error_report("QEMU does not support multiple vIOMMUs "
+ "for x86 yet.");
+exit(1);
+}
+
+x86_iommu_default = x86_iommu;
+}
+
+X86IOMMUState *x86_iommu_get_default(void)
+{
+return x86_iommu_default;
+}
 
 static void x86_iommu_realize(DeviceState *dev, Error **errp)
 {
@@ -28,6 +50,7 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
 if (x86_class->realize) {
 x86_class->realize(dev, errp);
 }
+x86_iommu_set_default(X86_IOMMU_DEVICE(dev));
 }
 
 static void x86_iommu_class_init(ObjectClass *klass, void *data)
diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
index fac693d..b2401a6 100644
--- a/include/hw/i386/x86-iommu.h
+++ b/include/hw/i386/x86-iommu.h
@@ -45,4 +45,10 @@ struct X86IOMMUState {
 SysBusDevice busdev;
 };
 
+/**
+ * x86_iommu_get_default - get default IOMMU device
+ * @return: pointer to default IOMMU device
+ */
+X86IOMMUState *x86_iommu_get_default(void);
+
 #endif
-- 
2.4.11
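
The single-default-device bookkeeping from this patch can be sketched in
isolation as below. It is a minimal sketch with a hypothetical Iommu
type; unlike the patch, the setter reports a second registration by
returning false rather than calling exit(1), so the behaviour is easy
to exercise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Iommu { int id; } Iommu;   /* hypothetical stand-in */

static Iommu *iommu_default;              /* NULL until a device registers */

/* Returns false if a default is already set (the patch exits instead). */
static bool iommu_set_default(Iommu *iommu)
{
    assert(iommu);
    if (iommu_default) {
        return false;
    }
    iommu_default = iommu;
    return true;
}

static Iommu *iommu_get_default(void)
{
    return iommu_default;
}

/* Counterpart of acpi_has_iommu(): true once a default device exists. */
static bool has_iommu(void)
{
    return iommu_get_default() != NULL;
}
```

The lookup is then a plain pointer read instead of a walk over the
object tree.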

[Qemu-devel] [PATCH v11 08/28] intel_iommu: set IR bit for ECAP register

Enable IR in IOMMU Extended Capability register.

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c  | 6 ++
 hw/i386/intel_iommu_internal.h | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 8ea408d..3d99544 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1958,6 +1958,8 @@ static AddressSpace *vtd_find_add_as(X86IOMMUState *x86_iommu, PCIBus *bus,
  */
 static void vtd_init(IntelIOMMUState *s)
 {
+X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
+
 memset(s->csr, 0, DMAR_REG_SIZE);
 memset(s->wmask, 0, DMAR_REG_SIZE);
 memset(s->w1cmask, 0, DMAR_REG_SIZE);
@@ -1979,6 +1981,10 @@ static void vtd_init(IntelIOMMUState *s)
  VTD_CAP_SAGAW | VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS;
 s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
 
+if (x86_iommu->intr_supported) {
+s->ecap |= VTD_ECAP_IR;
+}
+
 vtd_reset_context_cache(s);
 vtd_reset_iotlb(s);
 
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index b648e69..5b98a11 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -176,6 +176,8 @@
 /* (offset >> 4) << 8 */
 #define VTD_ECAP_IRO(DMAR_IOTLB_REG_OFFSET << 4)
 #define VTD_ECAP_QI (1ULL << 1)
+/* Interrupt Remapping support */
+#define VTD_ECAP_IR (1ULL << 3)
 
 /* CAP_REG */
 /* (offset >> 4) << 24 */
-- 
2.4.11

[Qemu-devel] [PATCH v11 04/28] x86-iommu: q35: generalize find_add_as()

Remove VT-d calls from the common q35 code. Instead, provide a general
find_add_as() hook for the x86-iommu type.

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c | 15 ---
 include/hw/i386/intel_iommu.h |  5 -
 include/hw/i386/x86-iommu.h   |  3 +++
 3 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 3ee5782..2ac79ab 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1910,8 +1910,10 @@ static Property vtd_properties[] = {
 };
 
 
-VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
+static AddressSpace *vtd_find_add_as(X86IOMMUState *x86_iommu, PCIBus *bus,
+ int devfn)
 {
+IntelIOMMUState *s = (IntelIOMMUState *)x86_iommu;
 uintptr_t key = (uintptr_t)bus;
 VTDBus *vtd_bus = g_hash_table_lookup(s->vtd_as_by_busptr, &key);
 VTDAddressSpace *vtd_dev_as;
@@ -1939,7 +1941,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
 address_space_init(&vtd_dev_as->as,
&vtd_dev_as->iommu, "intel_iommu");
 }
-return vtd_dev_as;
+return &vtd_dev_as->as;
 }
 
 /* Do the initialization. It will also be called when reset, so pay
@@ -2032,13 +2034,11 @@ static void vtd_reset(DeviceState *dev)
 
 static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 {
-IntelIOMMUState *s = opaque;
-VTDAddressSpace *vtd_as;
+X86IOMMUState *x86_iommu = opaque;
+X86IOMMUClass *x86_class = X86_IOMMU_GET_CLASS(x86_iommu);
 
 assert(0 <= devfn && devfn <= X86_IOMMU_PCI_DEVFN_MAX);
-
-vtd_as = vtd_find_add_as(s, bus, devfn);
-return &vtd_as->as;
+return x86_class->find_add_as(x86_iommu, bus, devfn);
 }
 
 static void vtd_realize(DeviceState *dev, Error **errp)
@@ -2071,6 +2071,7 @@ static void vtd_class_init(ObjectClass *klass, void *data)
 dc->props = vtd_properties;
 dc->hotpluggable = false;
 x86_class->realize = vtd_realize;
+x86_class->find_add_as = vtd_find_add_as;
 }
 
 static const TypeInfo vtd_info = {
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 0794309..e36b896 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -125,9 +125,4 @@ struct IntelIOMMUState {
 VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
 };
 
-/* Find the VTD Address space associated with the given bus pointer,
- * create a new one if none exists
- */
-VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn);
-
 #endif
diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
index b2401a6..581da16 100644
--- a/include/hw/i386/x86-iommu.h
+++ b/include/hw/i386/x86-iommu.h
@@ -21,6 +21,7 @@
 #define IOMMU_COMMON_H
 
 #include "hw/sysbus.h"
+#include "exec/memory.h"
 
 #define  TYPE_X86_IOMMU_DEVICE  ("x86-iommu")
 #define  X86_IOMMU_DEVICE(obj) \
@@ -39,6 +40,8 @@ struct X86IOMMUClass {
 SysBusDeviceClass parent;
 /* Intel/AMD specific realize() hook */
 DeviceRealize realize;
+/* Find/Add IOMMU address space for specific PCI device */
+AddressSpace *(*find_add_as)(X86IOMMUState *s, PCIBus *bus, int devfn);
 };
 
 struct X86IOMMUState {
-- 
2.4.11
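
The "find, or add on first use" behaviour behind find_add_as() can be
sketched without any QEMU types as follows. Bus and AddrSpace are
hypothetical stand-ins, and the sketch uses a fixed per-bus table
rather than the hash table plus VTDBus allocation found in
intel_iommu.c.

```c
#include <assert.h>
#include <stdlib.h>

#define DEVFN_MAX 256   /* mirrors X86_IOMMU_PCI_DEVFN_MAX */

typedef struct AddrSpace { int devfn; } AddrSpace;          /* hypothetical */
typedef struct Bus { AddrSpace *dev_as[DEVFN_MAX]; } Bus;   /* hypothetical */

/* Return the address space for devfn, allocating it on first use. */
static AddrSpace *find_add_as(Bus *bus, int devfn)
{
    assert(devfn >= 0 && devfn < DEVFN_MAX);
    if (!bus->dev_as[devfn]) {
        AddrSpace *as = calloc(1, sizeof(*as));
        if (!as) {
            return NULL;
        }
        as->devfn = devfn;
        bus->dev_as[devfn] = as;
    }
    return bus->dev_as[devfn];
}
```

Note that the sketch checks devfn < DEVFN_MAX, a strict bound, because
the table has exactly DEVFN_MAX slots.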

[Qemu-devel] [PATCH v11 05/28] x86-iommu: introduce "intremap" property

Add a property for intel-iommu devices that specifies whether interrupt
remapping (IR) is supported. IR is disabled by default. To enable it
(taking the Intel IOMMU as an example):

  -device intel-iommu,intremap=on

This property can be shared by Intel and future AMD IOMMUs.

Signed-off-by: Peter Xu 
---
 hw/i386/x86-iommu.c | 23 +++
 include/hw/i386/x86-iommu.h |  1 +
 2 files changed, 24 insertions(+)

diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index f395139..4280839 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -59,9 +59,32 @@ static void x86_iommu_class_init(ObjectClass *klass, void *data)
 dc->realize = x86_iommu_realize;
 }
 
+static bool x86_iommu_intremap_prop_get(Object *o, Error **errp)
+{
+X86IOMMUState *s = X86_IOMMU_DEVICE(o);
+return s->intr_supported;
+}
+
+static void x86_iommu_intremap_prop_set(Object *o, bool value, Error **errp)
+{
+X86IOMMUState *s = X86_IOMMU_DEVICE(o);
+s->intr_supported = value;
+}
+
+static void x86_iommu_instance_init(Object *o)
+{
+X86IOMMUState *s = X86_IOMMU_DEVICE(o);
+
+/* By default, do not support IR */
+s->intr_supported = false;
+object_property_add_bool(o, "intremap", x86_iommu_intremap_prop_get,
+ x86_iommu_intremap_prop_set, NULL);
+}
+
 static const TypeInfo x86_iommu_info = {
 .name  = TYPE_X86_IOMMU_DEVICE,
 .parent= TYPE_SYS_BUS_DEVICE,
+.instance_init = x86_iommu_instance_init,
 .instance_size = sizeof(X86IOMMUState),
 .class_init= x86_iommu_class_init,
 .class_size= sizeof(X86IOMMUClass),
diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
index 581da16..10779c1 100644
--- a/include/hw/i386/x86-iommu.h
+++ b/include/hw/i386/x86-iommu.h
@@ -46,6 +46,7 @@ struct X86IOMMUClass {
 
 struct X86IOMMUState {
 SysBusDevice busdev;
+bool intr_supported;/* Whether vIOMMU supports IR */
 };
 
 /**
-- 
2.4.11

[Qemu-devel] [PATCH v11 07/28] intel_iommu: allow queued invalidation for IR

Queued invalidation is required for IR. This patch adds basic support
for interrupt entry cache invalidation requests. Since no IR cache is
implemented yet, all such requests can simply be skipped for now.

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c  | 9 +
 hw/i386/intel_iommu_internal.h | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 2ac79ab..8ea408d 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1404,6 +1404,15 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
 }
 break;
 
+case VTD_INV_DESC_IEC:
+VTD_DPRINTF(INV, "Interrupt Entry Cache Invalidation "
+"not implemented yet");
+/*
+ * Since currently we do not cache interrupt entries, we can
+ * just mark this descriptor as "good" and move on.
+ */
+break;
+
 default:
 VTD_DPRINTF(GENERAL, "error: unkonw Invalidation Descriptor type "
 "hi 0x%"PRIx64 " lo 0x%"PRIx64 " type %"PRIu8,
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index e5f514c..b648e69 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -286,6 +286,8 @@ typedef struct VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_TYPE   0xf
 #define VTD_INV_DESC_CC 0x1 /* Context-cache Invalidate Desc */
 #define VTD_INV_DESC_IOTLB  0x2
+#define VTD_INV_DESC_IEC0x4 /* Interrupt Entry Cache
+   Invalidate Descriptor */
 #define VTD_INV_DESC_WAIT   0x5 /* Invalidation Wait Descriptor */
 #define VTD_INV_DESC_NONE   0   /* Not an Invalidate Descriptor */
 
-- 
2.4.11
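
As a rough illustration of how the new descriptor type slots into the
dispatch, here is a standalone sketch. The type values are the ones
listed in intel_iommu_internal.h above; process_inv_desc() and its
boolean return are simplifications made for this example, not the
signature of vtd_process_inv_desc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Type values as listed in intel_iommu_internal.h in the patch. */
#define INV_DESC_TYPE   0xf
#define INV_DESC_CC     0x1
#define INV_DESC_IOTLB  0x2
#define INV_DESC_IEC    0x4
#define INV_DESC_WAIT   0x5

/* Returns true if the descriptor was accepted, false on an unknown type. */
static bool process_inv_desc(uint64_t lo)
{
    switch (lo & INV_DESC_TYPE) {
    case INV_DESC_CC:
    case INV_DESC_IOTLB:
    case INV_DESC_WAIT:
        /* the real handlers are omitted in this sketch */
        return true;
    case INV_DESC_IEC:
        /* no interrupt entries are cached, so nothing to invalidate */
        return true;
    default:
        return false;
    }
}
```

Only the low four bits select the type, so other bits of the descriptor
do not affect the dispatch.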

[Qemu-devel] [PATCH v11 06/28] acpi: enable INTR for DMAR report structure

In the ACPI DMA Remapping Reporting structure, set the INTR_REMAP flag
when the IOMMU has interrupt remapping support enabled.

Signed-off-by: Peter Xu 
---
 hw/i386/acpi-build.c  | 14 +-
 include/hw/i386/intel_iommu.h |  2 ++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 1186401..773b20e 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -59,6 +59,7 @@
 
 #include "qapi/qmp/qint.h"
 #include "qom/qom-qobject.h"
+#include "hw/i386/x86-iommu.h"
 
 #include "hw/acpi/ipmi.h"
 
@@ -2513,6 +2514,10 @@ build_mcfg_q35(GArray *table_data, BIOSLinker *linker, AcpiMcfgInfo *info)
 build_header(linker, table_data, (void *)mcfg, sig, len, 1, NULL, NULL);
 }
 
+/*
+ * VT-d spec 8.1 DMA Remapping Reporting Structure
+ * (version Oct. 2014 or later)
+ */
 static void
 build_dmar_q35(GArray *table_data, BIOSLinker *linker)
 {
@@ -2520,10 +2525,17 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
 
 AcpiTableDmar *dmar;
 AcpiDmarHardwareUnit *drhd;
+uint8_t dmar_flags = 0;
+X86IOMMUState *iommu = x86_iommu_get_default();
+
+assert(iommu);
+if (iommu->intr_supported) {
+dmar_flags |= 0x1;  /* Flags: 0x1: INT_REMAP */
+}
 
 dmar = acpi_data_push(table_data, sizeof(*dmar));
 dmar->host_address_width = VTD_HOST_ADDRESS_WIDTH - 1;
-dmar->flags = 0;/* No intr_remap for now */
+dmar->flags = dmar_flags;
 
 /* DMAR Remapping Hardware Unit Definition structure */
 drhd = acpi_data_push(table_data, sizeof(*drhd));
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index e36b896..638d77f 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -44,6 +44,8 @@
 #define VTD_HOST_ADDRESS_WIDTH  39
 #define VTD_HAW_MASK((1ULL << VTD_HOST_ADDRESS_WIDTH) - 1)
 
+#define DMAR_REPORT_F_INTR  (1)
+
 typedef struct VTDContextEntry VTDContextEntry;
 typedef struct VTDContextCacheEntry VTDContextCacheEntry;
 typedef struct IntelIOMMUState IntelIOMMUState;
-- 
2.4.11
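
The two header fields touched here can be sketched on their own as
below. DmarHeaderFields is a hypothetical subset of AcpiTableDmar, and
the sketch uses the DMAR_REPORT_F_INTR name that the patch defines
rather than the literal 0x1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HOST_ADDRESS_WIDTH   39     /* VTD_HOST_ADDRESS_WIDTH in the patch */
#define DMAR_REPORT_F_INTR   1      /* bit 0: INTR_REMAP */

typedef struct DmarHeaderFields {   /* hypothetical subset of AcpiTableDmar */
    uint8_t host_address_width;
    uint8_t flags;
} DmarHeaderFields;

static DmarHeaderFields build_dmar_fields(bool intr_supported)
{
    DmarHeaderFields f = {
        /* the table stores the address width minus one */
        .host_address_width = HOST_ADDRESS_WIDTH - 1,
        .flags = 0,
    };
    if (intr_supported) {
        f.flags |= DMAR_REPORT_F_INTR;
    }
    return f;
}
```

With intremap off the flags byte stays zero, matching the behaviour
before this patch.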

[Qemu-devel] [PATCH v11 11/28] intel_iommu: handle interrupt remap enable

Handle writes to the IRE bit in the global command register.

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index c793031..a12091e 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1183,6 +1183,22 @@ static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
 }
 }
 
+/* Handle Interrupt Remap Enable/Disable */
+static void vtd_handle_gcmd_ire(IntelIOMMUState *s, bool en)
+{
+VTD_DPRINTF(CSR, "Interrupt Remap Enable %s", (en ? "on" : "off"));
+
+if (en) {
+s->intr_enabled = true;
+/* Ok - report back to driver */
+vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_IRES);
+} else {
+s->intr_enabled = false;
+/* Ok - report back to driver */
+vtd_set_clear_mask_long(s, DMAR_GSTS_REG, VTD_GSTS_IRES, 0);
+}
+}
+
 /* Handle write to Global Command Register */
 static void vtd_handle_gcmd_write(IntelIOMMUState *s)
 {
@@ -1207,6 +1223,10 @@ static void vtd_handle_gcmd_write(IntelIOMMUState *s)
 /* Set/update the interrupt remapping root-table pointer */
 vtd_handle_gcmd_sirtp(s);
 }
+if (changed & VTD_GCMD_IRE) {
+/* Interrupt remap enable/disable */
+vtd_handle_gcmd_ire(s, val & VTD_GCMD_IRE);
+}
 }
 
 /* Handle write to Context Command Register */
-- 
2.4.11
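
The command/status handshake can be sketched roughly as follows. The
bit positions (bit 25 for both IRE and IRES) are my reading of the VT-d
spec and are not shown in this patch, and computing "changed" as status
XOR command is likewise an assumption about the surrounding code. Iommu
is a hypothetical stand-in for IntelIOMMUState.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions (bit 25 for both), not taken from this patch. */
#define GCMD_IRE   (1u << 25)
#define GSTS_IRES  (1u << 25)

typedef struct Iommu {          /* hypothetical stand-in */
    uint32_t gsts;              /* global status register */
    bool intr_enabled;
} Iommu;

/* Record the new state and mirror it into the status register. */
static void handle_gcmd_ire(Iommu *s, bool en)
{
    s->intr_enabled = en;
    if (en) {
        s->gsts |= GSTS_IRES;
    } else {
        s->gsts &= ~GSTS_IRES;
    }
}

/* Act only on bits whose requested value differs from current status. */
static void handle_gcmd_write(Iommu *s, uint32_t val)
{
    uint32_t changed = s->gsts ^ val;

    if (changed & GCMD_IRE) {
        handle_gcmd_ire(s, val & GCMD_IRE);
    }
}
```

Writing the same value twice is a no-op, and clearing the bit turns
remapping off again.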

Re: [Qemu-devel] [PATCH] quorum: Only compile when supported

2016-07-05 Thread Alberto Garcia

On Tue 05 Jul 2016 09:58:25 AM CEST, Sascha Silbe wrote:

> The quorum driver needs SHA256 which was introduced in gnutls 2.11.1.

Are you sure about that?

* Version 1.7.4 (released 2007-02-05)

[...]

** API and ABI modifications:
GNUTLS_MAC_SHA256,
GNUTLS_MAC_SHA384,
GNUTLS_MAC_SHA512: New gnutls_mac_algorithm_t values.
GNUTLS_DIG_SHA256,
GNUTLS_DIG_SHA384,
GNUTLS_DIG_SHA512: New gnutls_digest_algorithm_t values.
GNUTLS_SIGN_RSA_SHA256,
GNUTLS_SIGN_RSA_SHA384,
GNUTLS_SIGN_RSA_SHA512: New gnutls_sign_algorithm_t values.

https://gitlab.com/gnutls/gnutls/blob/master/NEWS#L6570

Berto

[Qemu-devel] [PATCH v11 10/28] intel_iommu: define interrupt remap table addr register

Define the Interrupt Remap Table Address register to store the IR table
pointer. Also handle global command register writes properly so that
the table pointer and its size are stored.

A new debug flag, "DEBUG_IR", is added for interrupt remapping.
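
To make the register decoding concrete, here is a standalone sketch of
how the table base and entry count fall out of an IRTA value. The masks
follow the definitions in the diff; IrTable and decode_irta() are
hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdint.h>

#define HOST_ADDRESS_WIDTH 39
#define HAW_MASK        ((1ULL << HOST_ADDRESS_WIDTH) - 1)
#define IRTA_ADDR_MASK  (HAW_MASK ^ 0xfffULL)   /* 4 KiB-aligned base */
#define IRTA_SIZE_MASK  0xfULL                  /* low 4 bits: size field S */

typedef struct IrTable {        /* hypothetical holder for decoded values */
    uint64_t root;              /* table base address */
    uint32_t size;              /* number of entries, 2^(S+1) */
} IrTable;

static IrTable decode_irta(uint64_t value)
{
    IrTable t;

    t.size = 1u << ((value & IRTA_SIZE_MASK) + 1);
    t.root = value & IRTA_ADDR_MASK;
    return t;
}
```

For example, a size field of 7 gives 256 entries, and the maximum field
value of 0xf gives 65536.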

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c  | 52 +-
 hw/i386/intel_iommu_internal.h |  4 
 include/hw/i386/intel_iommu.h  |  5 
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 3d99544..c793031 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -33,7 +33,7 @@
 #ifdef DEBUG_INTEL_IOMMU
 enum {
 DEBUG_GENERAL, DEBUG_CSR, DEBUG_INV, DEBUG_MMU, DEBUG_FLOG,
-DEBUG_CACHE,
+DEBUG_CACHE, DEBUG_IR,
 };
 #define VTD_DBGBIT(x)   (1 << DEBUG_##x)
 static int vtd_dbgflags = VTD_DBGBIT(GENERAL) | VTD_DBGBIT(CSR);
@@ -903,6 +903,19 @@ static void vtd_root_table_setup(IntelIOMMUState *s)
 (s->root_extended ? "(extended)" : ""));
 }
 
+static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
+{
+uint64_t value = 0;
+value = vtd_get_quad_raw(s, DMAR_IRTA_REG);
+s->intr_size = 1UL << ((value & VTD_IRTA_SIZE_MASK) + 1);
+s->intr_root = value & VTD_IRTA_ADDR_MASK;
+
+/* TODO: invalidate interrupt entry cache */
+
+VTD_DPRINTF(CSR, "int remap table addr 0x%"PRIx64 " size %"PRIu32,
+s->intr_root, s->intr_size);
+}
+
 static void vtd_context_global_invalidate(IntelIOMMUState *s)
 {
 s->context_cache_gen++;
@@ -1141,6 +1154,16 @@ static void vtd_handle_gcmd_srtp(IntelIOMMUState *s)
 vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_RTPS);
 }
 
+/* Set Interrupt Remap Table Pointer */
+static void vtd_handle_gcmd_sirtp(IntelIOMMUState *s)
+{
+VTD_DPRINTF(CSR, "set Interrupt Remap Table Pointer");
+
+vtd_interrupt_remap_table_setup(s);
+/* Ok - report back to driver */
+vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_IRTPS);
+}
+
 /* Handle Translation Enable/Disable */
 static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
 {
@@ -1180,6 +1203,10 @@ static void vtd_handle_gcmd_write(IntelIOMMUState *s)
 /* Queued Invalidation Enable */
 vtd_handle_gcmd_qie(s, val & VTD_GCMD_QIE);
 }
+if (val & VTD_GCMD_SIRTP) {
+/* Set/update the interrupt remapping root-table pointer */
+vtd_handle_gcmd_sirtp(s);
+}
 }
 
 /* Handle write to Context Command Register */
@@ -1841,6 +1868,23 @@ static void vtd_mem_write(void *opaque, hwaddr addr,
 vtd_update_fsts_ppf(s);
 break;
 
+case DMAR_IRTA_REG:
+VTD_DPRINTF(IR, "DMAR_IRTA_REG write addr 0x%"PRIx64
+", size %d, val 0x%"PRIx64, addr, size, val);
+if (size == 4) {
+vtd_set_long(s, addr, val);
+} else {
+vtd_set_quad(s, addr, val);
+}
+break;
+
+case DMAR_IRTA_REG_HI:
+VTD_DPRINTF(IR, "DMAR_IRTA_REG_HI write addr 0x%"PRIx64
+", size %d, val 0x%"PRIx64, addr, size, val);
+assert(size == 4);
+vtd_set_long(s, addr, val);
+break;
+
 default:
 VTD_DPRINTF(GENERAL, "error: unhandled reg write addr 0x%"PRIx64
 ", size %d, val 0x%"PRIx64, addr, size, val);
@@ -2034,6 +2078,12 @@ static void vtd_init(IntelIOMMUState *s)
 /* Fault Recording Registers, 128-bit */
 vtd_define_quad(s, DMAR_FRCD_REG_0_0, 0, 0, 0);
 vtd_define_quad(s, DMAR_FRCD_REG_0_2, 0, 0, 0x8000ULL);
+
+/*
+ * Interrupt remapping registers, not support extended interrupt
+ * mode for now.
+ */
+vtd_define_quad(s, DMAR_IRTA_REG, 0, 0xf00fULL, 0);
 }
 
 /* Should not reset address_spaces when reset because devices will still use
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 5b98a11..309833f 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -172,6 +172,10 @@
 #define VTD_RTADDR_RTT  (1ULL << 11)
 #define VTD_RTADDR_ADDR_MASK(VTD_HAW_MASK ^ 0xfffULL)
 
+/* IRTA_REG */
+#define VTD_IRTA_ADDR_MASK  (VTD_HAW_MASK ^ 0xfffULL)
+#define VTD_IRTA_SIZE_MASK  (0xfULL)
+
 /* ECAP_REG */
 /* (offset >> 4) << 8 */
 #define VTD_ECAP_IRO(DMAR_IOTLB_REG_OFFSET << 4)
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 638d77f..83d1905 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -125,6 +125,11 @@ struct IntelIOMMUState {
 MemoryRegionIOMMUOps iommu_ops;
 GHashTable *vtd_as_by_busptr;   /* VTDBus objects indexed by PCIBus* reference */
 VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
+
+/* interrupt remapping */
+bool intr_enabled;  /* Whether guest enabled IR */
+dma_addr_t intr_root;   /* Interrupt remapping table poi

Re: [Qemu-devel] [PATCH] char: do not use atexit cleanup handler

2016-07-05 Thread Paolo Bonzini



On 04/07/2016 21:53, Gerd Hoffmann wrote:
>   Hi,
> 
>>>> What about graphics threads ? In particular I'd be thinking of spice
>>>> which uses threads and chardevs.
>>>
>>> I think it should be quiesced after pause_all_vcpus returns.  Marc-André
>>> should know, but it's better to check with Gerd.
>>
>> In theory, spice_server_vm_stop() should be called at this point,
> 
> Yes, that should handle the qxl worker thread.
> 
>> and all chardev in spice are stopped too there, as well as the qxl
>> worker processing thread (although the thread is not joined here
>> neither..).
> 
> The chardevs are handled in iothread context anyway, so I don't think
> they need any special care.

Writes to chardevs can be done from non-iothread threads, but SPICE
doesn't do that.

Paolo

[Qemu-devel] [PATCH v11 16/28] ioapic: introduce ioapic_entry_parse() helper

Abstract IOAPIC entry parsing logic into a helper function.
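
For illustration, the parsing can be sketched standalone as below. The
shift and mask values are assumptions based on the 82093AA redirection
entry layout and the x86 MSI format, since the patch only uses the
symbolic names; the ExtINT vector lookup is omitted, and memset() is
used in place of bzero().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed values; the patch refers to these only by name. */
#define LVT_DELIV_MODE_SHIFT         8
#define LVT_DEST_MODE_SHIFT          11
#define LVT_TRIGGER_MODE_SHIFT       15
#define LVT_MASKED_SHIFT             16
#define LVT_DEST_IDX_SHIFT           48
#define DM_MASK                      0x7
#define VECTOR_MASK                  0xff

#define APIC_DEFAULT_ADDRESS         0xfee00000u
#define MSI_ADDR_DEST_MODE_SHIFT     2
#define MSI_ADDR_DEST_IDX_SHIFT      4
#define MSI_DATA_VECTOR_SHIFT        0
#define MSI_DATA_DELIVERY_MODE_SHIFT 8
#define MSI_DATA_TRIGGER_SHIFT       15

struct entry_info {
    uint8_t masked, trig_mode, dest_mode, delivery_mode, vector;
    uint16_t dest_idx;
    uint32_t addr, data;        /* MSI message built from the fields */
};

static void entry_parse(uint64_t entry, struct entry_info *info)
{
    memset(info, 0, sizeof(*info));
    info->masked = (entry >> LVT_MASKED_SHIFT) & 1;
    info->trig_mode = (entry >> LVT_TRIGGER_MODE_SHIFT) & 1;
    info->dest_idx = (entry >> LVT_DEST_IDX_SHIFT) & 0xffff;
    info->dest_mode = (entry >> LVT_DEST_MODE_SHIFT) & 1;
    info->delivery_mode = (entry >> LVT_DELIV_MODE_SHIFT) & DM_MASK;
    info->vector = entry & VECTOR_MASK;     /* ExtINT lookup omitted */

    info->addr = APIC_DEFAULT_ADDRESS |
        ((uint32_t)info->dest_idx << MSI_ADDR_DEST_IDX_SHIFT) |
        ((uint32_t)info->dest_mode << MSI_ADDR_DEST_MODE_SHIFT);
    info->data = ((uint32_t)info->vector << MSI_DATA_VECTOR_SHIFT) |
        ((uint32_t)info->trig_mode << MSI_DATA_TRIGGER_SHIFT) |
        ((uint32_t)info->delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);
}
```

Keeping the parsed fields and the derived MSI message in one struct
lets callers test the mask bit and reuse addr/data without reparsing.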

Signed-off-by: Peter Xu 
---
 hw/intc/ioapic.c | 110 +++
 1 file changed, 54 insertions(+), 56 deletions(-)

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 36dd42a..c4469e4 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -50,18 +50,56 @@ static IOAPICCommonState *ioapics[MAX_IOAPICS];
 /* global variable from ioapic_common.c */
 extern int ioapic_no;
 
+struct ioapic_entry_info {
+/* fields parsed from IOAPIC entries */
+uint8_t masked;
+uint8_t trig_mode;
+uint16_t dest_idx;
+uint8_t dest_mode;
+uint8_t delivery_mode;
+uint8_t vector;
+
+/* MSI message generated from above parsed fields */
+uint32_t addr;
+uint32_t data;
+};
+
+static void ioapic_entry_parse(uint64_t entry, struct ioapic_entry_info *info)
+{
+bzero(info, sizeof(*info));
+info->masked = (entry >> IOAPIC_LVT_MASKED_SHIFT) & 1;
+info->trig_mode = (entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1;
+/*
+ * By default, this would be dest_id[8] + reserved[8]. When IR
+ * is enabled, this would be interrupt_index[15] +
+ * interrupt_format[1]. This field never means anything, but
+ * only used to generate corresponding MSI.
+ */
+info->dest_idx = (entry >> IOAPIC_LVT_DEST_IDX_SHIFT) & 0xffff;
+info->dest_mode = (entry >> IOAPIC_LVT_DEST_MODE_SHIFT) & 1;
+info->delivery_mode = (entry >> IOAPIC_LVT_DELIV_MODE_SHIFT) \
+& IOAPIC_DM_MASK;
+if (info->delivery_mode == IOAPIC_DM_EXTINT) {
+info->vector = pic_read_irq(isa_pic);
+} else {
+info->vector = entry & IOAPIC_VECTOR_MASK;
+}
+
+info->addr = APIC_DEFAULT_ADDRESS | \
+(info->dest_idx << MSI_ADDR_DEST_IDX_SHIFT) | \
+(info->dest_mode << MSI_ADDR_DEST_MODE_SHIFT);
+info->data = (info->vector << MSI_DATA_VECTOR_SHIFT) | \
+(info->trig_mode << MSI_DATA_TRIGGER_SHIFT) | \
+(info->delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);
+}
+
 static void ioapic_service(IOAPICCommonState *s)
 {
 AddressSpace *ioapic_as = PC_MACHINE(qdev_get_machine())->ioapic_as;
-uint32_t addr, data;
+struct ioapic_entry_info info;
 uint8_t i;
-uint8_t trig_mode;
-uint8_t vector;
-uint8_t delivery_mode;
 uint32_t mask;
 uint64_t entry;
-uint16_t dest_idx;
-uint8_t dest_mode;
 
 for (i = 0; i < IOAPIC_NUM_PINS; i++) {
 mask = 1 << i;
@@ -69,33 +107,18 @@ static void ioapic_service(IOAPICCommonState *s)
 int coalesce = 0;
 
 entry = s->ioredtbl[i];
-if (!(entry & IOAPIC_LVT_MASKED)) {
-trig_mode = ((entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1);
-/*
- * By default, this would be dest_id[8] +
- * reserved[8]. When IR is enabled, this would be
- * interrupt_index[15] + interrupt_format[1]. This
- * field never means anything, but only used to
- * generate corresponding MSI.
- */
-dest_idx = entry >> IOAPIC_LVT_DEST_IDX_SHIFT;
-dest_mode = (entry >> IOAPIC_LVT_DEST_MODE_SHIFT) & 1;
-delivery_mode =
-(entry >> IOAPIC_LVT_DELIV_MODE_SHIFT) & IOAPIC_DM_MASK;
-if (trig_mode == IOAPIC_TRIGGER_EDGE) {
+ioapic_entry_parse(entry, &info);
+if (!info.masked) {
+if (info.trig_mode == IOAPIC_TRIGGER_EDGE) {
 s->irr &= ~mask;
 } else {
 coalesce = s->ioredtbl[i] & IOAPIC_LVT_REMOTE_IRR;
 s->ioredtbl[i] |= IOAPIC_LVT_REMOTE_IRR;
 }
-if (delivery_mode == IOAPIC_DM_EXTINT) {
-vector = pic_read_irq(isa_pic);
-} else {
-vector = entry & IOAPIC_VECTOR_MASK;
-}
+
 #ifdef CONFIG_KVM
 if (kvm_irqchip_is_split()) {
-if (trig_mode == IOAPIC_TRIGGER_EDGE) {
+if (info.trig_mode == IOAPIC_TRIGGER_EDGE) {
 kvm_set_irq(kvm_state, i, 1);
 kvm_set_irq(kvm_state, i, 0);
 } else {
@@ -112,13 +135,7 @@ static void ioapic_service(IOAPICCommonState *s)
  * the IOAPIC message into a MSI one, and its
  * address space will decide whether we need a
  * translation. */
-addr = APIC_DEFAULT_ADDRESS | \
-(dest_idx << MSI_ADDR_DEST_IDX_SHIFT) |
-(dest_mode << MSI_ADDR_DEST_MODE_SHIFT);
-data = (vector << MSI_DATA_VECTOR_SHIFT) |
-(trig_mode << MSI_DATA_TRIGGER_SHIFT) |
-(delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);
-stl_le_phys(ioapic_as, addr, data);
+stl_le_phys(ioa

[Qemu-devel] [PATCH v11 12/28] intel_iommu: define several structs for IOMMU IR

Several data structs are defined to better support the rest of the
patches: IRTE to parse remapping table entries, and IOAPIC/MSI related
structure bits to parse interrupt entries to be filled in by the guest
kernel.
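
As a companion to the bitfield definition, the same IRTE fields can be
pulled out with explicit shifts and masks. This is an alternative
sketch, not what the patch does: the bit positions are derived from the
little-endian branch of the union below, and IrteFields/irte_decode()
are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct IrteFields {     /* hypothetical decoded view of an IRTE */
    bool present;
    uint8_t dest_mode, trigger_mode, delivery_mode, vector;
    uint8_t sid_q, sid_vtype;
    uint32_t dest_id;
    uint16_t source_id;
} IrteFields;

/* lo and hi are the two 64-bit halves of the entry (data[0], data[1]). */
static IrteFields irte_decode(uint64_t lo, uint64_t hi)
{
    IrteFields f;

    f.present       = lo & 1;
    f.dest_mode     = (lo >> 2) & 0x1;
    f.trigger_mode  = (lo >> 4) & 0x1;
    f.delivery_mode = (lo >> 5) & 0x7;
    f.vector        = (lo >> 16) & 0xff;
    f.dest_id       = (uint32_t)(lo >> 32);
    f.source_id     = hi & 0xffff;
    f.sid_q         = (hi >> 16) & 0x3;
    f.sid_vtype     = (hi >> 18) & 0x3;
    return f;
}
```

Shift-and-mask decoding does not depend on how the compiler lays out
bitfields, at the cost of spelling out each position by hand.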

Signed-off-by: Peter Xu 
---
 include/hw/i386/intel_iommu.h | 74 +++
 1 file changed, 74 insertions(+)

diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 83d1905..9a898c1 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -52,6 +52,8 @@ typedef struct IntelIOMMUState IntelIOMMUState;
 typedef struct VTDAddressSpace VTDAddressSpace;
 typedef struct VTDIOTLBEntry VTDIOTLBEntry;
 typedef struct VTDBus VTDBus;
+typedef union VTD_IRTE VTD_IRTE;
+typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
 
 /* Context-Entry */
 struct VTDContextEntry {
@@ -90,6 +92,78 @@ struct VTDIOTLBEntry {
 bool write_flags;
 };
 
+/* Interrupt Remapping Table Entry Definition */
+union VTD_IRTE {
+struct {
+#ifdef HOST_WORDS_BIGENDIAN
+uint32_t dest_id:32; /* Destination ID */
+uint32_t __reserved_1:8; /* Reserved 1 */
+uint32_t vector:8;   /* Interrupt Vector */
+uint32_t irte_mode:1;/* IRTE Mode */
+uint32_t __reserved_0:3; /* Reserved 0 */
+uint32_t __avail:4;  /* Available spaces for software */
+uint32_t delivery_mode:3;/* Delivery Mode */
+uint32_t trigger_mode:1; /* Trigger Mode */
+uint32_t redir_hint:1;   /* Redirection Hint */
+uint32_t dest_mode:1;/* Destination Mode */
+uint32_t fault_disable:1;/* Fault Processing Disable */
+uint32_t present:1;  /* Whether entry present/available */
+#else
+uint32_t present:1;  /* Whether entry present/available */
+uint32_t fault_disable:1;/* Fault Processing Disable */
+uint32_t dest_mode:1;/* Destination Mode */
+uint32_t redir_hint:1;   /* Redirection Hint */
+uint32_t trigger_mode:1; /* Trigger Mode */
+uint32_t delivery_mode:3;/* Delivery Mode */
+uint32_t __avail:4;  /* Available spaces for software */
+uint32_t __reserved_0:3; /* Reserved 0 */
+uint32_t irte_mode:1;/* IRTE Mode */
+uint32_t vector:8;   /* Interrupt Vector */
+uint32_t __reserved_1:8; /* Reserved 1 */
+uint32_t dest_id:32; /* Destination ID */
+#endif
+uint16_t source_id:16;   /* Source-ID */
+#ifdef HOST_WORDS_BIGENDIAN
+uint64_t __reserved_2:44;/* Reserved 2 */
+uint64_t sid_vtype:2;/* Source-ID Validation Type */
+uint64_t sid_q:2;/* Source-ID Qualifier */
+#else
+uint64_t sid_q:2;/* Source-ID Qualifier */
+uint64_t sid_vtype:2;/* Source-ID Validation Type */
+uint64_t __reserved_2:44;/* Reserved 2 */
+#endif
+} QEMU_PACKED;
+uint64_t data[2];
+};
+
+#define VTD_IR_INT_FORMAT_COMPAT (0) /* Compatible Interrupt */
+#define VTD_IR_INT_FORMAT_REMAP  (1) /* Remappable Interrupt */
+
+/* Programming format for MSI/MSI-X addresses */
+union VTD_IR_MSIAddress {
+struct {
+#ifdef HOST_WORDS_BIGENDIAN
+uint32_t __head:12;  /* Should always be: 0x0fee */
+uint32_t index_l:15; /* Interrupt index bit 14-0 */
+uint32_t int_mode:1; /* Interrupt format */
+uint32_t sub_valid:1;/* SHV: Sub-Handle Valid bit */
+uint32_t index_h:1;  /* Interrupt index bit 15 */
+uint32_t __not_care:2;
+#else
+uint32_t __not_care:2;
+uint32_t index_h:1;  /* Interrupt index bit 15 */
+uint32_t sub_valid:1;/* SHV: Sub-Handle Valid bit */
+uint32_t int_mode:1; /* Interrupt format */
+uint32_t index_l:15; /* Interrupt index bit 14-0 */
+uint32_t __head:12;  /* Should always be: 0x0fee */
+#endif
+} QEMU_PACKED;
+uint32_t data;
+};
+
+/* When IR is enabled, all MSI/MSI-X data bits should be zero */
+#define VTD_IR_MSI_DATA  (0)
+
 /* The iommu (DMAR) device state struct */
 struct IntelIOMMUState {
 X86IOMMUState x86_iommu;
-- 
2.4.11
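
Illustration (not part of the patch): as a rough sketch of the remappable
MSI address layout defined above, the code below decodes the same fields
with explicit shifts and masks instead of bitfields. The bit positions are
read off the little-endian branch of VTD_IR_MSIAddress; the names
ir_msi_addr_decode and struct ir_msi_fields are hypothetical and exist only
for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded fields (not a QEMU type). */
struct ir_msi_fields {
    uint16_t index;      /* 16-bit interrupt index: index_h:index_l */
    bool remappable;     /* int_mode == 1 (VTD_IR_INT_FORMAT_REMAP) */
    bool sub_valid;      /* SHV: Sub-Handle Valid bit */
};

/*
 * Decode the low 32 bits of an MSI address.
 * Returns false when bits 31:20 are not 0xfee.
 */
static bool ir_msi_addr_decode(uint32_t addr, struct ir_msi_fields *out)
{
    uint32_t head = addr >> 20;               /* __head, bits 31:20 */
    uint32_t index_l = (addr >> 5) & 0x7fff;  /* index_l, bits 19:5 */
    uint32_t index_h = (addr >> 2) & 0x1;     /* index_h, bit 2 */

    if (head != 0xfee) {
        return false;
    }

    out->index = (uint16_t)((index_h << 15) | index_l);
    out->remappable = (addr >> 4) & 0x1;      /* int_mode, bit 4 */
    out->sub_valid = (addr >> 3) & 0x1;       /* sub_valid, bit 3 */
    return true;
}
```

Working on the raw 32-bit value avoids any dependence on how the compiler
orders bitfields, which is the reason the union needs two
HOST_WORDS_BIGENDIAN branches.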

[Qemu-devel] [PATCH v11 09/28] acpi: add DMAR scope definition for root IOAPIC

To enable interrupt remapping for the Intel IOMMU device, each IOAPIC device
in the system reported via ACPI MADT must be explicitly enumerated under
one specific remapping hardware unit. This patch adds the root-complex
IOAPIC into the default DMAR device.

Please refer to VT-d spec 8.3.1.1 for more information.

Signed-off-by: Peter Xu 
---
 hw/i386/acpi-build.c| 20 +---
 include/hw/acpi/acpi-defs.h | 13 +
 include/hw/pci-host/q35.h   |  8 
 3 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 773b20e..a26a4bb 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -81,6 +81,9 @@
 #define ACPI_BUILD_DPRINTF(fmt, ...)
 #endif
 
+/* Default IOAPIC ID */
+#define ACPI_BUILD_IOAPIC_ID 0x0
+
 typedef struct AcpiMcfgInfo {
 uint64_t mcfg_base;
 uint32_t mcfg_size;
@@ -384,7 +387,6 @@ build_madt(GArray *table_data, BIOSLinker *linker, PCMachineState *pcms)
 io_apic = acpi_data_push(table_data, sizeof *io_apic);
 io_apic->type = ACPI_APIC_IO;
 io_apic->length = sizeof(*io_apic);
-#define ACPI_BUILD_IOAPIC_ID 0x0
 io_apic->io_apic_id = ACPI_BUILD_IOAPIC_ID;
 io_apic->address = cpu_to_le32(IO_APIC_DEFAULT_ADDRESS);
 io_apic->interrupt = cpu_to_le32(0);
@@ -2527,6 +2529,9 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
 AcpiDmarHardwareUnit *drhd;
 uint8_t dmar_flags = 0;
 X86IOMMUState *iommu = x86_iommu_get_default();
+AcpiDmarDeviceScope *scope = NULL;
+/* Root complex IOAPIC use one path[0] only */
+size_t ioapic_scope_size = sizeof(*scope) + sizeof(scope->path[0]);
 
 assert(iommu);
 if (iommu->intr_supported) {
@@ -2538,13 +2543,22 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
 dmar->flags = dmar_flags;
 
 /* DMAR Remapping Hardware Unit Definition structure */
-drhd = acpi_data_push(table_data, sizeof(*drhd));
+drhd = acpi_data_push(table_data, sizeof(*drhd) + ioapic_scope_size);
 drhd->type = cpu_to_le16(ACPI_DMAR_TYPE_HARDWARE_UNIT);
-drhd->length = cpu_to_le16(sizeof(*drhd));   /* No device scope now */
+drhd->length = cpu_to_le16(sizeof(*drhd) + ioapic_scope_size);
 drhd->flags = ACPI_DMAR_INCLUDE_PCI_ALL;
 drhd->pci_segment = cpu_to_le16(0);
 drhd->address = cpu_to_le64(Q35_HOST_BRIDGE_IOMMU_ADDR);
 
+/* Scope definition for the root-complex IOAPIC. See VT-d spec
+ * 8.3.1 (version Oct. 2014 or later). */
+scope = &drhd->scope[0];
+scope->entry_type = 0x03;   /* Type: 0x03 for IOAPIC */
+scope->length = ioapic_scope_size;
+scope->enumeration_id = ACPI_BUILD_IOAPIC_ID;
+scope->bus = Q35_PSEUDO_BUS_PLATFORM;
+scope->path[0] = cpu_to_le16(Q35_PSEUDO_DEVFN_IOAPIC);
+
 build_header(linker, table_data, (void *)(table_data->data + dmar_start),
  "DMAR", table_data->len - dmar_start, 1, NULL, NULL);
 }
diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index ea9be0b..41c1d95 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -571,6 +571,18 @@ enum {
 /*
  * Sub-structures for DMAR
  */
+
+/* Device scope structure for DRHD. */
+struct AcpiDmarDeviceScope {
+uint8_t entry_type;
+uint8_t length;
+uint16_t reserved;
+uint8_t enumeration_id;
+uint8_t bus;
+uint16_t path[0];   /* list of dev:func pairs */
+} QEMU_PACKED;
+typedef struct AcpiDmarDeviceScope AcpiDmarDeviceScope;
+
 /* Type 0: Hardware Unit Definition */
 struct AcpiDmarHardwareUnit {
 uint16_t type;
@@ -579,6 +591,7 @@ struct AcpiDmarHardwareUnit {
 uint8_t reserved;
 uint16_t pci_segment;   /* The PCI Segment associated with this unit */
 uint64_t address;   /* Base address of remapping hardware register-set */
+AcpiDmarDeviceScope scope[0];
 } QEMU_PACKED;
 typedef struct AcpiDmarHardwareUnit AcpiDmarHardwareUnit;
 
diff --git a/include/hw/pci-host/q35.h b/include/hw/pci-host/q35.h
index 0d64032..94486fd 100644
--- a/include/hw/pci-host/q35.h
+++ b/include/hw/pci-host/q35.h
@@ -179,4 +179,12 @@ typedef struct Q35PCIHost {
 
 uint64_t mch_mcfg_base(void);
 
+/*
+ * Arbitrary but unique BDF number for IOAPIC device.
+ *
+ * TODO: make sure this does not conflict with a real PCI bus
+ */
+#define Q35_PSEUDO_BUS_PLATFORM (0xff)
+#define Q35_PSEUDO_DEVFN_IOAPIC (0x00)
+
 #endif /* HW_Q35_H */
-- 
2.4.11
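
Illustration (not part of the patch): the size arithmetic for the device
scope can be checked with a small standalone sketch. It assumes the field
order of AcpiDmarDeviceScope shown above and a single 16-bit path entry
stored little-endian, as the patch does with cpu_to_le16(). The names
dmar_scope_hdr and dmar_write_ioapic_scope are hypothetical, and
__attribute__((packed)) assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed part of a DMAR device scope, same field order as the patch. */
struct dmar_scope_hdr {
    uint8_t entry_type;
    uint8_t length;
    uint16_t reserved;
    uint8_t enumeration_id;
    uint8_t bus;
} __attribute__((packed));

/*
 * Write one IOAPIC device scope with a single path entry into buf.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t dmar_write_ioapic_scope(uint8_t *buf, size_t buflen,
                                      uint8_t ioapic_id, uint8_t bus,
                                      uint16_t path)
{
    struct dmar_scope_hdr hdr;
    size_t total = sizeof(hdr) + sizeof(path);  /* 6 + 2 bytes */

    if (buflen < total || total > UINT8_MAX) {
        return 0;
    }

    memset(&hdr, 0, sizeof(hdr));
    hdr.entry_type = 0x03;                      /* 0x03: IOAPIC */
    hdr.length = (uint8_t)total;
    hdr.enumeration_id = ioapic_id;
    hdr.bus = bus;

    memcpy(buf, &hdr, sizeof(hdr));
    buf[sizeof(hdr)] = (uint8_t)(path & 0xff);  /* little-endian path */
    buf[sizeof(hdr) + 1] = (uint8_t)(path >> 8);
    return total;
}
```

With one path entry the scope is 8 bytes, which is the amount the patch
adds to both the acpi_data_push() size and drhd->length.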

Re: [Qemu-devel] [PULL 0/8] ipxe: update submodule from 4e03af8ec to 041863191

2016-07-05 Thread Paolo Bonzini



On 04/07/2016 14:58, Gerd Hoffmann wrote:
> On Mo, 2016-07-04 at 13:38 +0100, Peter Maydell wrote:
>> On 4 July 2016 at 07:30, Gerd Hoffmann  wrote:
>>>   Hi,
>>>
>>> Here comes the ipxe update for 2.7, rebasing the ipxe module to latest
>>> master and also adding boot roms for e1000e and vmxnet3.
>>>
>>> v2: two incremental tweaks to make sure the two new roms are installed
>>> properly.
>>>
>>> please pull,
>>>   Gerd
>>>
>>> The following changes since commit c7288767523f6510cf557707d3eb5e78e519b90d:
>>>
>>>   Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160623' 
>>> into staging (2016-06-23 11:53:14 +0100)
>>>
>>> are available in the git repository at:
>>>
>>>
>>>   git://git.kraxel.org/qemu tags/pull-ipxe-20160704-1
>>>
>>> for you to fetch changes up to 8df42d855c38a1b23b6ba9f38ab71b9d7fb24216:
>>>
>>>   build: add pc-bios to config-host.mak deps (2016-07-01 13:31:44 +0200)
>>>
>>> 
>>> ipxe: update submodule from 4e03af8ec to 041863191
>>
>> This submodule commit isn't present in the repo on git.qemu.org:
>>
>> $ git submodule update
>> fatal: reference is not a tree: 04186319181298083ef28695a8309028b26fe83c
>> Unable to checkout '04186319181298083ef28695a8309028b26fe83c' in
>> submodule path 'roms/ipxe'
> 
> Hmm, seems our mirror is a few commits behind.
> 
> # git log --oneline origin/master..ipxe.org/master 
> fda8916 [dhcpv6] Include RFC5970 client architecture options in DHCPv6
> requests
> 3d9f094 [dhcp] Allow for variable encapsulation of architecture-specific
> options
> 3bb61c3 [pxe] Disable interrupts on the PIC before starting NBP
> c22da4b [bios] Do not enable interrupts when printing to the console
> c9f6a86 [efi] Fix uninitialised data in HII IFR structures
> 0418631 [thunderx] Fix compilation with older versions of gcc
> 632e57f [efi] Do not copy garbage bytes into SNP device path MAC address
> 
> Commit 0418631 is dated Wed Jun 22, almost two weeks ago.  Last commit
> in our mirror is dated from Jun 20th.
> 
> Stefan?  Did something broke between 20th and 22th?  Or are our sync
> intervals that big?  In case of the later:  Can we make them smaller,
> like once per day or so?

It seems to have updated now.

Paolo

[Qemu-devel] [PATCH v11 18/28] x86-iommu: introduce IEC notifiers

This patch introduces the x86 IOMMU IEC (Interrupt Entry Cache)
invalidation notifier list. When the vIOMMU receives an IEC invalidate
request, all the registered units will be notified with specific
invalidation requests.

Intel IOMMU is the first provider that generates such an event.

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c  | 36 +---
 hw/i386/intel_iommu_internal.h | 24 
 hw/i386/trace-events   |  3 +++
 hw/i386/x86-iommu.c| 29 +
 include/hw/i386/x86-iommu.h| 40 
 5 files changed, 121 insertions(+), 11 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 71b274d..a79c5c1 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -904,6 +904,12 @@ static void vtd_root_table_setup(IntelIOMMUState *s)
 (s->root_extended ? "(extended)" : ""));
 }
 
+static void vtd_iec_notify_all(IntelIOMMUState *s, bool global,
+   uint32_t index, uint32_t mask)
+{
+x86_iommu_iec_notify_all(X86_IOMMU_DEVICE(s), global, index, mask);
+}
+
 static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
 {
 uint64_t value = 0;
@@ -911,7 +917,8 @@ static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
 s->intr_size = 1UL << ((value & VTD_IRTA_SIZE_MASK) + 1);
 s->intr_root = value & VTD_IRTA_ADDR_MASK;
 
-/* TODO: invalidate interrupt entry cache */
+/* Notify global invalidation */
+vtd_iec_notify_all(s, true, 0, 0);
 
 VTD_DPRINTF(CSR, "int remap table addr 0x%"PRIx64 " size %"PRIu32,
 s->intr_root, s->intr_size);
@@ -1413,6 +1420,21 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
 return true;
 }
 
+static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
+ VTDInvDesc *inv_desc)
+{
+VTD_DPRINTF(INV, "inv ir glob %d index %d mask %d",
+inv_desc->iec.granularity,
+inv_desc->iec.index,
+inv_desc->iec.index_mask);
+
+vtd_iec_notify_all(s, !inv_desc->iec.granularity,
+   inv_desc->iec.index,
+   inv_desc->iec.index_mask);
+
+return true;
+}
+
 static bool vtd_process_inv_desc(IntelIOMMUState *s)
 {
 VTDInvDesc inv_desc;
@@ -1453,12 +1475,12 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
 break;
 
 case VTD_INV_DESC_IEC:
-VTD_DPRINTF(INV, "Interrupt Entry Cache Invalidation "
-"not implemented yet");
-/*
- * Since currently we do not cache interrupt entries, we can
- * just mark this descriptor as "good" and move on.
- */
+VTD_DPRINTF(INV, "Invalidation Interrupt Entry Cache "
+"Descriptor hi 0x%"PRIx64 " lo 0x%"PRIx64,
+inv_desc.hi, inv_desc.lo);
+if (!vtd_process_inv_iec_desc(s, &inv_desc)) {
+return false;
+}
 break;
 
 default:
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index e1a08cb..10c20fe 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -296,12 +296,28 @@ typedef enum VTDFaultReason {
 
 #define VTD_CONTEXT_CACHE_GEN_MAX   0xUL
 
+/* Interrupt Entry Cache Invalidation Descriptor: VT-d 6.5.2.7. */
+struct VTDInvDescIEC {
+uint32_t type:4;/* Should always be 0x4 */
+uint32_t granularity:1; /* If set, it's global IR invalidation */
+uint32_t resved_1:22;
+uint32_t index_mask:5;  /* 2^N for continuous int invalidation */
+uint32_t index:16;  /* Start index to invalidate */
+uint32_t reserved_2:16;
+};
+typedef struct VTDInvDescIEC VTDInvDescIEC;
+
 /* Queued Invalidation Descriptor */
-struct VTDInvDesc {
-uint64_t lo;
-uint64_t hi;
+union VTDInvDesc {
+struct {
+uint64_t lo;
+uint64_t hi;
+};
+union {
+VTDInvDescIEC iec;
+};
 };
-typedef struct VTDInvDesc VTDInvDesc;
+typedef union VTDInvDesc VTDInvDesc;
 
 /* Masks for struct VTDInvDesc */
 #define VTD_INV_DESC_TYPE   0xf
diff --git a/hw/i386/trace-events b/hw/i386/trace-events
index ea77bc2..b4882c1 100644
--- a/hw/i386/trace-events
+++ b/hw/i386/trace-events
@@ -10,3 +10,6 @@ xen_pv_mmio_write(uint64_t addr) "WARNING: write to Xen PV Device MMIO space (ad
 # hw/i386/pc.c
 mhp_pc_dimm_assigned_slot(int slot) "0x%d"
 mhp_pc_dimm_assigned_address(uint64_t addr) "0x%"PRIx64
+
+# hw/i386/x86-iommu.c
+x86_iommu_iec_notify(bool global, uint32_t index, uint32_t mask) "Notify IEC invalidation: global=%d index=%" PRIu32 " mask=%" PRIu32
diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index 4280839..ce26b2a 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -22,6 +22,33 @@
 #include "hw/boards.h"
 #include "hw/i386/x86-iommu.h"
 #include "qemu/err

[Qemu-devel] [PATCH v11 13/28] intel_iommu: add IR translation faults defines

Add translation fault definitions for interrupt remapping. Please
refer to VT-d spec section 7.1.

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu_internal.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 309833f..2a9987f 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -271,6 +271,19 @@ typedef enum VTDFaultReason {
  * context-entry.
  */
 VTD_FR_CONTEXT_ENTRY_TT,
+
+/* Interrupt remapping transition faults */
+VTD_FR_IR_REQ_RSVD = 0x20, /* One or more IR request reserved
+* fields set */
+VTD_FR_IR_INDEX_OVER = 0x21, /* Index value greater than max */
+VTD_FR_IR_ENTRY_P = 0x22,/* Present (P) not set in IRTE */
+VTD_FR_IR_ROOT_INVAL = 0x23, /* IR Root table invalid */
+VTD_FR_IR_IRTE_RSVD = 0x24,  /* IRTE Rsvd field non-zero with
+  * Present flag set */
+VTD_FR_IR_REQ_COMPAT = 0x25, /* Encountered compatible IR
+  * request while disabled */
+VTD_FR_IR_SID_ERR = 0x26,   /* Invalid Source-ID */
+
 /* This is not a normal fault reason. We use this to indicate some faults
  * that are not referenced by the VT-d specification.
  * Fault event with such reason should not be recorded.
-- 
2.4.11

Re: [Qemu-devel] [PATCH v2 3/6] x86: fill high bits of mtrr mask

* Eduardo Habkost (ehabk...@redhat.com) wrote:
> On Mon, Jul 04, 2016 at 08:16:06PM +0100, Dr. David Alan Gilbert (git) wrote:
> [...]
> > @@ -2084,6 +2085,27 @@ static int kvm_get_msrs(X86CPU *cpu)
> >  }
> >  
> >  assert(ret == cpu->kvm_msr_buf->nmsrs);
> > +/*
> > + * MTRR masks: Each mask consists of 5 parts
> > + * a  10..0: must be zero
> > + * b  11   : valid bit
> > + * c n-1.12: actual mask bits
> > + * d  51..n: reserved must be zero
> > + * e  63.52: reserved must be zero
> > + *
> > + * 'n' is the number of physical bits supported by the CPU and is
> > + * apparently always <= 52.   We know our 'n' but don't know what
> > + * the destinations 'n' is; it might be smaller, in which case
> > + * it masks (c) on loading. It might be larger, in which case
> > + * we fill 'd' so that d..c is consistent irrespetive of the 'n'
> > + * we're migrating to.
> > + */
> > +if (cpu->fill_mtrr_mask && cpu->phys_bits < 52) {
> > +mtrr_top_bits = MAKE_64BIT_MASK(cpu->phys_bits, 52 - 
> > cpu->phys_bits);
> > +} else {
> > +mtrr_top_bits = 0;
> 
> How/where did you find this 52-bit limit? Is it documented
> somewhere?

It seems to come from AMD's original specification of AMD64; but you're
right, we could do with a constant rather than the magical 52 everywhere.

Looking in AMD doc 24593 Rev 3.26 (AMD64 Arch Programmer's manual vol. 2)
p.191 Fig 7.6 MTRRphysBasen Register, it shows PhysBase running from 51:32
and 63:52 being Reserved/MBZ.
The corresponding Intel doc (Intel 64 & IA-32 Architectures Dev manual 3A 11-25
fig 11-7) doesn't have that limit shown; however it does talk about 52-bit
physical addresses in a few places; e.g. 4.4 PAE paging talks about
'paging translates 32-bit linear addresses to 52-bit physical addresses'.
I think the most relevant place in the Intel doc is 5.13.3 'Reserved Bit
Checking', which has a Table 5-8 'IA-32e Mode Page Level Protection Matrix
with Execute-Disable Bit Capability Enabled'; this is a table of reserved
bit fields, and for each of the page tables it shows bits checked as
[51:MAXPHYADDR].
Any suggestions for a name for the 52 constant? I guess MaxMaxPhyAddress?

I guess someone decided that 4PB ought to be enough for anyone.

Dave

> 
> > +}
> > +
> >  for (i = 0; i < ret; i++) {
> >  uint32_t index = msrs[i].index;
> >  switch (index) {
> > @@ -2279,7 +2301,8 @@ static int kvm_get_msrs(X86CPU *cpu)
> >  break;
> >  case MSR_MTRRphysBase(0) ... MSR_MTRRphysMask(MSR_MTRRcap_VCNT - 
> > 1):
> >  if (index & 1) {
> > -env->mtrr_var[MSR_MTRRphysIndex(index)].mask = 
> > msrs[i].data;
> > +env->mtrr_var[MSR_MTRRphysIndex(index)].mask = 
> > msrs[i].data |
> > +   
> > mtrr_top_bits;
> >  } else {
> >  env->mtrr_var[MSR_MTRRphysIndex(index)].base = 
> > msrs[i].data;
> >  }
> > -- 
> > 2.7.4
> > 
> 
> -- 
> Eduardo
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
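
Illustration (not part of the thread): the fill described in the quoted
comment amounts to setting bits [51:n] of each MTRR mask. The sketch below
computes that value in standalone C. MAX_PHYS_BITS is a hypothetical name
for the "52" under discussion, and mtrr_top_bits() here is a free function
written for this example rather than the patch's local variable.

```c
#include <stdint.h>

/* Hypothetical name for the architectural upper limit discussed above. */
#define MAX_PHYS_BITS 52

/*
 * Return a mask with bits [MAX_PHYS_BITS-1 : phys_bits] set, i.e. the
 * reserved range 'd' of the quoted comment. Returns 0 when there is
 * nothing to fill.
 */
static uint64_t mtrr_top_bits(unsigned phys_bits)
{
    unsigned length;

    if (phys_bits >= MAX_PHYS_BITS) {
        return 0;
    }

    length = MAX_PHYS_BITS - phys_bits;          /* 1..52, so no shift by 64 */
    return (~0ULL >> (64 - length)) << phys_bits;
}
```

For a CPU with 40 physical bits this gives 0x000fff0000000000, so a mask
that was valid for 40 bits reads the same on a destination with up to 52
bits. The 4PB remark follows from 2^52 bytes being 4 PiB.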

[Qemu-devel] [PATCH v11 19/28] ioapic: register IOMMU IEC notifier for ioapic

Make IOAPIC the first consumer of x86 IOMMU IEC invalidation
notifiers. This is only used for the split irqchip case: when the vIOMMU
receives IR invalidation requests, IOAPIC will be notified to update
kernel irq routes. For simplicity, we just update all IOAPIC routes,
even if the invalidated entries are not IOAPIC ones.

Since we now create IOMMUs using the "-device" parameter, the IOMMU
device will be created after IOAPIC.  We need to do the registration
after machine init is done, by leveraging the machine_done notifier.

Signed-off-by: Peter Xu 
---
 hw/intc/ioapic.c  | 31 +++
 include/hw/i386/ioapic_internal.h |  2 ++
 2 files changed, 33 insertions(+)

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index c4469e4..819585d 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -31,6 +31,7 @@
 #include "sysemu/kvm.h"
 #include "target-i386/cpu.h"
 #include "hw/i386/apic-msidef.h"
+#include "hw/i386/x86-iommu.h"
 
 //#define DEBUG_IOAPIC
 
@@ -198,6 +199,16 @@ static void ioapic_update_kvm_routes(IOAPICCommonState *s)
 #endif
 }
 
+#ifdef CONFIG_KVM
+static void ioapic_iec_notifier(void *private, bool global,
+uint32_t index, uint32_t mask)
+{
+IOAPICCommonState *s = (IOAPICCommonState *)private;
+/* For simplicity, we just update all the routes */
+ioapic_update_kvm_routes(s);
+}
+#endif
+
 void ioapic_eoi_broadcast(int vector)
 {
 IOAPICCommonState *s;
@@ -354,6 +365,24 @@ static const MemoryRegionOps ioapic_io_ops = {
 .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
+static void ioapic_machine_done_notify(Notifier *notifier, void *data)
+{
+#ifdef CONFIG_KVM
+IOAPICCommonState *s = container_of(notifier, IOAPICCommonState,
+machine_done);
+
+if (kvm_irqchip_is_split()) {
+X86IOMMUState *iommu = x86_iommu_get_default();
+if (iommu) {
+/* Register this IOAPIC with IOMMU IEC notifier, so that
+ * when there are IR invalidates, we can be notified to
+ * update kernel IR cache. */
+x86_iommu_iec_register_notifier(iommu, ioapic_iec_notifier, s);
+}
+}
+#endif
+}
+
 static void ioapic_realize(DeviceState *dev, Error **errp)
 {
 IOAPICCommonState *s = IOAPIC_COMMON(dev);
@@ -364,6 +393,8 @@ static void ioapic_realize(DeviceState *dev, Error **errp)
 qdev_init_gpio_in(dev, ioapic_set_irq, IOAPIC_NUM_PINS);
 
 ioapics[ioapic_no] = s;
+s->machine_done.notify = ioapic_machine_done_notify;
+qemu_add_machine_init_done_notifier(&s->machine_done);
 }
 
 static void ioapic_class_init(ObjectClass *klass, void *data)
diff --git a/include/hw/i386/ioapic_internal.h b/include/hw/i386/ioapic_internal.h
index 31dafb3..84e3deb 100644
--- a/include/hw/i386/ioapic_internal.h
+++ b/include/hw/i386/ioapic_internal.h
@@ -25,6 +25,7 @@
 #include "hw/hw.h"
 #include "exec/memory.h"
 #include "hw/sysbus.h"
+#include "qemu/notify.h"
 
 #define MAX_IOAPICS 1
 
@@ -107,6 +108,7 @@ struct IOAPICCommonState {
 uint8_t ioregsel;
 uint32_t irr;
 uint64_t ioredtbl[IOAPIC_NUM_PINS];
+Notifier machine_done;
 };
 
 void ioapic_reset_common(DeviceState *dev);
-- 
2.4.11
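
Illustration (not part of the patch): the register/notify pattern used here
can be sketched as a plain singly linked list of callbacks. The callback
signature matches ioapic_iec_notifier() above; everything else
(iec_notifier_list, iec_register, iec_notify_all, count_refresh) is a
hypothetical stand-in, since the x86-iommu.c side of the real
implementation is cut off in this archive. Freeing the list is omitted to
keep the sketch short.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Same shape as the IOAPIC callback in the patch. */
typedef void (*iec_notify_fn)(void *private, bool global,
                              uint32_t index, uint32_t mask);

struct iec_notifier {
    iec_notify_fn fn;
    void *private;
    struct iec_notifier *next;
};

struct iec_notifier_list {
    struct iec_notifier *head;
};

/* Add a consumer; returns 0 on success, -1 on allocation failure. */
static int iec_register(struct iec_notifier_list *list,
                        iec_notify_fn fn, void *private)
{
    struct iec_notifier *n = malloc(sizeof(*n));

    if (!n) {
        return -1;
    }
    n->fn = fn;
    n->private = private;
    n->next = list->head;
    list->head = n;
    return 0;
}

/* Deliver one invalidation to every registered consumer. */
static void iec_notify_all(const struct iec_notifier_list *list,
                           bool global, uint32_t index, uint32_t mask)
{
    const struct iec_notifier *n;

    for (n = list->head; n; n = n->next) {
        n->fn(n->private, global, index, mask);
    }
}

/* Example consumer: ignores the range and counts full refreshes. */
static void count_refresh(void *private, bool global,
                          uint32_t index, uint32_t mask)
{
    (void)global;
    (void)index;
    (void)mask;
    ++*(int *)private;
}
```

Like the IOAPIC callback, count_refresh() treats every notification as a
reason to refresh everything, which trades some redundant work for not
having to track which entries belong to which consumer.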

[Qemu-devel] [PATCH v11 14/28] intel_iommu: Add support for PCI MSI remap

This patch enables interrupt remapping for PCI devices.

To do this, a memory region "iommu_ir" is added as a child region
of the original iommu memory region, covering range 0xfeeX (which is
the address range for APIC). All writes to this range will be treated
as MSIs, and translation is carried out only when IR is enabled.

Idea suggested by Paolo Bonzini.

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c  | 251 +
 hw/i386/intel_iommu_internal.h |   2 +
 include/hw/i386/intel_iommu.h  |  66 +++
 3 files changed, 319 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index a12091e..90bf9e9 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1982,6 +1982,252 @@ static Property vtd_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+/* Read IRTE entry with specific index */
+static int vtd_irte_get(IntelIOMMUState *iommu, uint16_t index,
+VTD_IRTE *entry)
+{
+dma_addr_t addr = 0x00;
+
+addr = iommu->intr_root + index * sizeof(*entry);
+if (dma_memory_read(&address_space_memory, addr, entry,
+sizeof(*entry))) {
+VTD_DPRINTF(GENERAL, "error: fail to access IR root at 0x%"PRIx64
+" + %"PRIu16, iommu->intr_root, index);
+return -VTD_FR_IR_ROOT_INVAL;
+}
+
+if (!entry->present) {
+VTD_DPRINTF(GENERAL, "error: present flag not set in IRTE"
+" entry index %u value 0x%"PRIx64 " 0x%"PRIx64,
+index, le64_to_cpu(entry->data[1]),
+le64_to_cpu(entry->data[0]));
+return -VTD_FR_IR_ENTRY_P;
+}
+
+if (entry->__reserved_0 || entry->__reserved_1 || \
+entry->__reserved_2) {
+VTD_DPRINTF(GENERAL, "error: IRTE entry index %"PRIu16
+" reserved fields non-zero: 0x%"PRIx64 " 0x%"PRIx64,
+index, le64_to_cpu(entry->data[1]),
+le64_to_cpu(entry->data[0]));
+return -VTD_FR_IR_IRTE_RSVD;
+}
+
+/*
+ * TODO: Check Source-ID corresponds to SVT (Source Validation
+ * Type) bits
+ */
+
+return 0;
+}
+
+/* Fetch IRQ information of specific IR index */
+static int vtd_remap_irq_get(IntelIOMMUState *iommu, uint16_t index, VTDIrq *irq)
+{
+VTD_IRTE irte;
+int ret = 0;
+
+bzero(&irte, sizeof(irte));
+
+ret = vtd_irte_get(iommu, index, &irte);
+if (ret) {
+return ret;
+}
+
+irq->trigger_mode = irte.trigger_mode;
+irq->vector = irte.vector;
+irq->delivery_mode = irte.delivery_mode;
+/* Not support EIM yet: please refer to vt-d 9.10 DST bits */
+#define  VTD_IR_APIC_DEST_MASK (0xff00ULL)
+#define  VTD_IR_APIC_DEST_SHIFT(8)
+irq->dest = (le32_to_cpu(irte.dest_id) & VTD_IR_APIC_DEST_MASK) >> \
+VTD_IR_APIC_DEST_SHIFT;
+irq->dest_mode = irte.dest_mode;
+irq->redir_hint = irte.redir_hint;
+
+VTD_DPRINTF(IR, "remapping interrupt index %d: trig:%u,vec:%u,"
+"deliver:%u,dest:%u,dest_mode:%u", index,
+irq->trigger_mode, irq->vector, irq->delivery_mode,
+irq->dest, irq->dest_mode);
+
+return 0;
+}
+
+/* Generate one MSI message from VTDIrq info */
+static void vtd_generate_msi_message(VTDIrq *irq, MSIMessage *msg_out)
+{
+VTD_MSIMessage msg = {};
+
+/* Generate address bits */
+msg.dest_mode = irq->dest_mode;
+msg.redir_hint = irq->redir_hint;
+msg.dest = irq->dest;
+msg.__addr_head = cpu_to_le32(0xfee);
+/* Keep this from original MSI address bits */
+msg.__not_used = irq->msi_addr_last_bits;
+
+/* Generate data bits */
+msg.vector = irq->vector;
+msg.delivery_mode = irq->delivery_mode;
+msg.level = 1;
+msg.trigger_mode = irq->trigger_mode;
+
+msg_out->address = msg.msi_addr;
+msg_out->data = msg.msi_data;
+}
+
+/* Interrupt remapping for MSI/MSI-X entry */
+static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu,
+   MSIMessage *origin,
+   MSIMessage *translated)
+{
+int ret = 0;
+VTD_IR_MSIAddress addr;
+uint16_t index;
+VTDIrq irq = {0};
+
+assert(origin && translated);
+
+if (!iommu || !iommu->intr_enabled) {
+goto do_not_translate;
+}
+
+if (origin->address & VTD_MSI_ADDR_HI_MASK) {
+VTD_DPRINTF(GENERAL, "error: MSI addr high 32 bits nonzero"
+" during interrupt remapping: 0x%"PRIx32,
+(uint32_t)((origin->address & VTD_MSI_ADDR_HI_MASK) >> \
+VTD_MSI_ADDR_HI_SHIFT));
+return -VTD_FR_IR_REQ_RSVD;
+}
+
+addr.data = origin->address & VTD_MSI_ADDR_LO_MASK;
+if (le16_to_cpu(addr.__head) != 0xfee) {
+VTD_DPRINTF(GENERAL, "error: MSI addr low 32 bits invalid: "
+"0x%"PRIx32, addr.data);
+return -VTD_FR_IR_REQ_RSVD;
+}
+
+/*

Re: [Qemu-devel] [PATCH v2 4/6] x86: Set physical address bits based on host

* Eduardo Habkost (ehabk...@redhat.com) wrote:
> On Mon, Jul 04, 2016 at 08:16:07PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" 
> > 
> > A special case based on the previous phys-bits property; if it's
> > the magic value 0 then use the hosts capabilities.
> > 
> > This becomes the default on new machine types.
> > 
> > Signed-off-by: Dr. David Alan Gilbert 
> > ---
> >  include/hw/i386/pc.h |  5 +
> >  target-i386/cpu.c| 36 +++-
> >  2 files changed, 40 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> > index d85e924..bf31609 100644
> > --- a/include/hw/i386/pc.h
> > +++ b/include/hw/i386/pc.h
> > @@ -379,6 +379,11 @@ bool e820_get_entry(int, uint32_t, uint64_t *, 
> > uint64_t *);
> >  .driver = TYPE_X86_CPU,\
> >  .property = "fill-mtrr-mask",\
> >  .value = "off",\
> > +},\
> > +{\
> > +.driver = TYPE_X86_CPU,\
> > +.property = "phys-bits",\
> > +.value = "40",\
> >  },
> >  
> >  #define PC_COMPAT_2_5 \
> > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > index 5737aba..d45d2a6 100644
> > --- a/target-i386/cpu.c
> > +++ b/target-i386/cpu.c
> > @@ -2957,6 +2957,40 @@ static void x86_cpu_realizefn(DeviceState *dev, 
> > Error **errp)
> > & CPUID_EXT2_AMD_ALIASES);
> >  }
> >  
> > +/* For 64bit systems think about the number of physical bits to 
> > present.
> > + * ideally this should be the same as the host; anything other than 
> > matching
> > + * the host can cause incorrect guest behaviour.
> > + * QEMU used to pick the magic value of 40 bits that corresponds to
> > + * consumer AMD devices but nothing esle.
> > + */
> > +if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
> > +uint32_t eax;
> > +/* Read the hosts physical address size, and compare it to what we
> > + * were asked for; note old machine types default to 40 bits
> > + */
> > +uint32_t host_phys_bits = 0;
> > +host_cpuid(0x8000, 0, &eax, NULL, NULL, NULL);
> > +if (eax >= 0x8008) {
> > +host_cpuid(0x8008, 0, &eax, NULL, NULL, NULL);
> > +/* Note: According to AMD doc 25481 rev 2.34 they have a field
> > + * at 23:16 that can specify a maximum physical address bits 
> > for
> > + * the guest that can override this value; but I've not seen
> > + * anything with that set.
> > + */
> > +host_phys_bits = eax & 0xff;
> > +} else {
> > +/* It's an odd 64 bit machine that doesn't have the leaf for
> > + * physical address bits; fall back to 36 that's most older 
> > Intel.
> > + */
> > +host_phys_bits = 36;
> > +}
> 
> Why do we need to calculate host_phys_bits when phys_bits is
> already set? Shouldn't we put all the code above after the "if
> (cpu->phys_bits)" check?

Because I reuse host_phys_bits to generate the warning if you've
explicitly set phys-bits and it doesn't match the host.

> > +
> > +if (cpu->phys_bits == 0) {
> > +/* The user asked for us to use the host physical bits */
> > +cpu->phys_bits = host_phys_bits;
> > +
> > +}
> > +}
> >  
> >  cpu_exec_init(cs, &error_abort);
> >  
> > @@ -3259,7 +3293,7 @@ static Property x86_cpu_properties[] = {
> >  DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
> >  DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
> >  DEFINE_PROP_BOOL("fill-mtrr-mask", X86CPU, fill_mtrr_mask, true),
> > -DEFINE_PROP_UINT32("phys-bits", X86CPU, phys_bits, 40),
> > +DEFINE_PROP_UINT32("phys-bits", X86CPU, phys_bits, 0),
> 
> I would put this part (that sets the default to 0) and the
> PC_COMPAT_2_6 part in a separate patch. This way we can include
> the mechanism for setting phys-bits=0 even if we didn't reach a
> conclusion about the proper pc-2.7 default yet.

Will do.

Dave

> 
> >  DEFINE_PROP_UINT32("level", X86CPU, env.cpuid_level, 0),
> >  DEFINE_PROP_UINT32("xlevel", X86CPU, env.cpuid_xlevel, 0),
> >  DEFINE_PROP_UINT32("xlevel2", X86CPU, env.cpuid_xlevel2, 0),
> > -- 
> > 2.7.4
> > 
> 
> -- 
> Eduardo
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
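
Illustration (not part of the thread): the selection logic under review can
be sketched as a pure function that takes the CPUID results as arguments,
so it runs on any host. It assumes the leaf numbers are the standard
0x80000000 (maximum extended leaf) and 0x80000008 (address sizes, physical
bits in EAX[7:0]); the hex constants in the quoted diff appear shortened by
the archive. pick_phys_bits() and its parameter names are hypothetical.

```c
#include <stdint.h>

/*
 * requested:     value of the phys-bits property (0 means "use host")
 * max_ext_leaf:  EAX returned by CPUID leaf 0x80000000
 * eax_addr_leaf: EAX returned by CPUID leaf 0x80000008 (ignored when
 *                that leaf is not available)
 */
static uint32_t pick_phys_bits(uint32_t requested, uint32_t max_ext_leaf,
                               uint32_t eax_addr_leaf)
{
    uint32_t host_phys_bits;

    if (max_ext_leaf >= 0x80000008u) {
        host_phys_bits = eax_addr_leaf & 0xff;  /* EAX[7:0] */
    } else {
        host_phys_bits = 36;                    /* fallback from the patch */
    }

    return requested == 0 ? host_phys_bits : requested;
}
```

Computing host_phys_bits unconditionally, as the patch does, keeps the host
value available for the mismatch warning mentioned in the reply even when
phys-bits was set explicitly.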

[Qemu-devel] [PATCH v11 20/28] intel_iommu: Add support for Extended Interrupt Mode

From: Jan Kiszka 

As neither QEMU nor KVM supports more than 255 CPUs so far, this is
simple: we only need to switch the destination ID translation in
vtd_remap_irq_get if EIME is set.

Once CFI support is there, it will have to take EIM into account as
well. So far, nothing to do for this.

This patch allows the use of x2APIC in the split irqchip mode of KVM.

Signed-off-by: Jan Kiszka 
[use le32_to_cpu() to retrieve dest_id]
Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c  | 16 +---
 hw/i386/intel_iommu_internal.h |  2 ++
 include/hw/i386/intel_iommu.h  |  1 +
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index a79c5c1..506d7cf 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -916,6 +916,7 @@ static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
 value = vtd_get_quad_raw(s, DMAR_IRTA_REG);
 s->intr_size = 1UL << ((value & VTD_IRTA_SIZE_MASK) + 1);
 s->intr_root = value & VTD_IRTA_ADDR_MASK;
+s->intr_eime = value & VTD_IRTA_EIME;
 
 /* Notify global invalidation */
 vtd_iec_notify_all(s, true, 0, 0);
@@ -2060,11 +2061,13 @@ static int vtd_remap_irq_get(IntelIOMMUState *iommu, uint16_t index, VTDIrq *irq
 irq->trigger_mode = irte.trigger_mode;
 irq->vector = irte.vector;
 irq->delivery_mode = irte.delivery_mode;
-/* Not support EIM yet: please refer to vt-d 9.10 DST bits */
+irq->dest = le32_to_cpu(irte.dest_id);
+if (!iommu->intr_eime) {
 #define  VTD_IR_APIC_DEST_MASK (0xff00ULL)
 #define  VTD_IR_APIC_DEST_SHIFT(8)
-irq->dest = (le32_to_cpu(irte.dest_id) & VTD_IR_APIC_DEST_MASK) >> \
-VTD_IR_APIC_DEST_SHIFT;
+irq->dest = (irq->dest & VTD_IR_APIC_DEST_MASK) >>
+VTD_IR_APIC_DEST_SHIFT;
+}
 irq->dest_mode = irte.dest_mode;
 irq->redir_hint = irte.redir_hint;
 
@@ -2326,7 +2329,7 @@ static void vtd_init(IntelIOMMUState *s)
 s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
 
 if (x86_iommu->intr_supported) {
-s->ecap |= VTD_ECAP_IR;
+s->ecap |= VTD_ECAP_IR | VTD_ECAP_EIM;
 }
 
 vtd_reset_context_cache(s);
@@ -2380,10 +2383,9 @@ static void vtd_init(IntelIOMMUState *s)
 vtd_define_quad(s, DMAR_FRCD_REG_0_2, 0, 0, 0x8000ULL);
 
 /*
- * Interrupt remapping registers, not support extended interrupt
- * mode for now.
+ * Interrupt remapping registers.
  */
-vtd_define_quad(s, DMAR_IRTA_REG, 0, 0xf00fULL, 0);
+vtd_define_quad(s, DMAR_IRTA_REG, 0, 0xf80fULL, 0);
 }
 
 /* Should not reset address_spaces when reset because devices will still use
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 10c20fe..72b0114 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -176,6 +176,7 @@
 
 /* IRTA_REG */
 #define VTD_IRTA_ADDR_MASK  (VTD_HAW_MASK ^ 0xfffULL)
+#define VTD_IRTA_EIME   (1ULL << 11)
 #define VTD_IRTA_SIZE_MASK  (0xfULL)
 
 /* ECAP_REG */
@@ -184,6 +185,7 @@
 #define VTD_ECAP_QI (1ULL << 1)
 /* Interrupt Remapping support */
 #define VTD_ECAP_IR (1ULL << 3)
+#define VTD_ECAP_EIM (1ULL << 4)
 
 /* CAP_REG */
 /* (offset >> 4) << 24 */
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 3bca390..2fdca5b 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -271,6 +271,7 @@ struct IntelIOMMUState {
 bool intr_enabled;  /* Whether guest enabled IR */
 dma_addr_t intr_root;   /* Interrupt remapping table pointer */
 uint32_t intr_size; /* Number of IR table entries */
+bool intr_eime; /* Extended interrupt mode enabled */
 };
 
 #endif
-- 
2.4.11

[Qemu-devel] [PATCH v11 15/28] q35: ioapic: add support for emulated IOAPIC IR

This patch translates all IOAPIC interrupts into MSI ones. One pseudo
ioapic address space is added to transfer the MSI message. By default,
it will be system memory address space. When IR is enabled, it will be
IOMMU address space.

Currently, only emulated IOAPIC is supported.
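The translation described above can be sketched in isolation. This is a minimal sketch, not the patch's code: the constant values below are assumptions based on the usual x86 MSI and IOAPIC layouts (the apic-msidef.h hunk is cut off in this posting), and SketchMsi and ioapic_entry_to_msi() are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Assumed values, following the usual x86 MSI address/data layout. */
#define APIC_DEFAULT_ADDRESS          0xfee00000u
#define MSI_ADDR_DEST_MODE_SHIFT      2
#define MSI_ADDR_DEST_IDX_SHIFT       4
#define MSI_DATA_VECTOR_SHIFT         0
#define MSI_DATA_DELIVERY_MODE_SHIFT  8
#define MSI_DATA_TRIGGER_SHIFT        15

/* Assumed bit positions inside a 64-bit IOAPIC redirection entry. */
#define IOAPIC_LVT_DELIV_MODE_SHIFT   8
#define IOAPIC_LVT_DEST_MODE_SHIFT    11
#define IOAPIC_LVT_TRIGGER_MODE_SHIFT 15
#define IOAPIC_LVT_DEST_IDX_SHIFT     48
#define IOAPIC_VECTOR_MASK            0xffu
#define IOAPIC_DM_MASK                0x7u

/* Hypothetical container for the resulting MSI write. */
typedef struct {
    uint32_t addr;
    uint32_t data;
} SketchMsi;

/* Build the MSI address/data pair from one redirection table entry. */
SketchMsi ioapic_entry_to_msi(uint64_t entry)
{
    uint32_t vector = (uint32_t)entry & IOAPIC_VECTOR_MASK;
    uint32_t delivery_mode =
        (uint32_t)(entry >> IOAPIC_LVT_DELIV_MODE_SHIFT) & IOAPIC_DM_MASK;
    uint32_t dest_mode = (uint32_t)(entry >> IOAPIC_LVT_DEST_MODE_SHIFT) & 1;
    uint32_t trig_mode = (uint32_t)(entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1;
    /* Top 16 bits: dest_id + reserved, or the IR index when IR is on. */
    uint32_t dest_idx = (uint16_t)(entry >> IOAPIC_LVT_DEST_IDX_SHIFT);
    SketchMsi msi;

    msi.addr = APIC_DEFAULT_ADDRESS |
               (dest_idx << MSI_ADDR_DEST_IDX_SHIFT) |
               (dest_mode << MSI_ADDR_DEST_MODE_SHIFT);
    msi.data = (vector << MSI_DATA_VECTOR_SHIFT) |
               (trig_mode << MSI_DATA_TRIGGER_SHIFT) |
               (delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);
    return msi;
}
```

Whether the resulting write is remapped then depends only on the address space it is issued to, which is the point of the pseudo ioapic address space.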

Idea suggested by Jan Kiszka and Rita Sinha in the following patch:

https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg01933.html

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c             |  6 +-
 hw/i386/pc.c                      |  3 +++
 hw/intc/ioapic.c                  | 28
 include/hw/i386/apic-msidef.h     |  1 +
 include/hw/i386/ioapic_internal.h |  1 +
 include/hw/i386/pc.h              |  4
 6 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 90bf9e9..7cc6d18 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -28,6 +28,7 @@
 #include "hw/i386/pc.h"
 #include "hw/boards.h"
 #include "hw/i386/x86-iommu.h"
+#include "hw/pci-host/q35.h"
 
 /*#define DEBUG_INTEL_IOMMU*/
 #ifdef DEBUG_INTEL_IOMMU
@@ -2379,7 +2380,8 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void 
*opaque, int devfn)
 
 static void vtd_realize(DeviceState *dev, Error **errp)
 {
-PCIBus *bus = PC_MACHINE(qdev_get_machine())->bus;
+PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
+PCIBus *bus = pcms->bus;
 IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
 
 VTD_DPRINTF(GENERAL, "");
@@ -2395,6 +2397,8 @@ static void vtd_realize(DeviceState *dev, Error **errp)
 vtd_init(s);
 sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, Q35_HOST_BRIDGE_IOMMU_ADDR);
 pci_setup_iommu(bus, vtd_host_dma_iommu, dev);
+/* Pseudo address space under root PCI bus. */
+pcms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
 }
 
 static void vtd_class_init(ObjectClass *klass, void *data)
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index cd1745e..4c67bae 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1375,6 +1375,9 @@ void pc_memory_init(PCMachineState *pcms,
 rom_add_option(option_rom[i].name, option_rom[i].bootindex);
 }
 pcms->fw_cfg = fw_cfg;
+
+/* Init default IOAPIC address space */
+pcms->ioapic_as = &address_space_memory;
 }
 
 qemu_irq pc_allocate_cpu_irq(void)
diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 273bb08..36dd42a 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -29,6 +29,8 @@
 #include "hw/i386/ioapic_internal.h"
 #include "include/hw/pci/msi.h"
 #include "sysemu/kvm.h"
+#include "target-i386/cpu.h"
+#include "hw/i386/apic-msidef.h"
 
 //#define DEBUG_IOAPIC
 
@@ -50,13 +52,15 @@ extern int ioapic_no;
 
 static void ioapic_service(IOAPICCommonState *s)
 {
+AddressSpace *ioapic_as = PC_MACHINE(qdev_get_machine())->ioapic_as;
+uint32_t addr, data;
 uint8_t i;
 uint8_t trig_mode;
 uint8_t vector;
 uint8_t delivery_mode;
 uint32_t mask;
 uint64_t entry;
-uint8_t dest;
+uint16_t dest_idx;
 uint8_t dest_mode;
 
 for (i = 0; i < IOAPIC_NUM_PINS; i++) {
@@ -67,7 +71,14 @@ static void ioapic_service(IOAPICCommonState *s)
 entry = s->ioredtbl[i];
 if (!(entry & IOAPIC_LVT_MASKED)) {
 trig_mode = ((entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1);
-dest = entry >> IOAPIC_LVT_DEST_SHIFT;
+/*
+ * By default, this would be dest_id[8] +
+ * reserved[8]. When IR is enabled, this would be
+ * interrupt_index[15] + interrupt_format[1]. This
+ * field never means anything, but only used to
+ * generate corresponding MSI.
+ */
+dest_idx = entry >> IOAPIC_LVT_DEST_IDX_SHIFT;
 dest_mode = (entry >> IOAPIC_LVT_DEST_MODE_SHIFT) & 1;
 delivery_mode =
 (entry >> IOAPIC_LVT_DELIV_MODE_SHIFT) & IOAPIC_DM_MASK;
@@ -97,8 +108,17 @@ static void ioapic_service(IOAPICCommonState *s)
 #else
 (void)coalesce;
 #endif
-apic_deliver_irq(dest, dest_mode, delivery_mode, vector,
- trig_mode);
+/* No matter whether IR is enabled, we translate
+ * the IOAPIC message into a MSI one, and its
+ * address space will decide whether we need a
+ * translation. */
+addr = APIC_DEFAULT_ADDRESS | \
+(dest_idx << MSI_ADDR_DEST_IDX_SHIFT) |
+(dest_mode << MSI_ADDR_DEST_MODE_SHIFT);
+data = (vector << MSI_DATA_VECTOR_SHIFT) |
+(trig_mode << MSI_DATA_TRIGGER_SHIFT) |
+(delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);
+stl_le_phys(ioapic_as, addr, data);
 }
 }
 }
diff --git a/include/hw/i386/apic-msidef.h b/include/hw/i386/apic-msidef.h
index 6e2eb71..

[Qemu-devel] [PATCH v11 22/28] kvm-irqchip: simplify kvm_irqchip_add_msi_route

Change the original MSIMessage parameter of kvm_irqchip_add_msi_route
into the vector number. The vector index provides more information than
the MSIMessage, since we can easily retrieve the MSIMessage from the
vector. This avoids fetching the MSIMessage every time before adding MSI
routes.

Meanwhile, the vector info will be used in the coming patches to further
enable gsi route update notifications.
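The interface change can be pictured with a small sketch in which the caller passes only a vector index and the callee looks the message up itself. The types and the function sketch_add_msi_route() are hypothetical stand-ins, not QEMU's API; the NULL-device case yielding a zeroed message is an assumption modelled on the non-PCI callers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for MSIMessage and PCIDevice. */
typedef struct {
    uint64_t address;
    uint32_t data;
} SketchMsiMessage;

typedef struct {
    const SketchMsiMessage *table;  /* per-vector messages */
    int nr_vectors;
} SketchDevice;

/*
 * Callers pass only the vector index; the message is fetched here.
 * Returns 0 on success and -1 when the vector is out of range.
 */
int sketch_add_msi_route(const SketchDevice *dev, int vector,
                         SketchMsiMessage *out)
{
    out->address = 0;
    out->data = 0;

    if (dev == NULL) {
        /* No device: keep the zeroed message (assumed behaviour). */
        return 0;
    }
    if (vector < 0 || vector >= dev->nr_vectors) {
        return -1;
    }
    *out = dev->table[vector];
    return 0;
}
```

Keeping the vector around, rather than a copy of the message, is also what lets later code re-read the current message when a route has to be refreshed.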

Signed-off-by: Peter Xu 
Reviewed-by: Paolo Bonzini 
---
 hw/i386/kvm/pci-assign.c |  8 ++--
 hw/misc/ivshmem.c        |  3 +--
 hw/vfio/pci.c            | 11 +--
 hw/virtio/virtio-pci.c   |  9 +++--
 include/sysemu/kvm.h     | 13 -
 kvm-all.c                | 18 --
 kvm-stub.c               |  2 +-
 target-i386/kvm.c        |  3 +--
 8 files changed, 41 insertions(+), 26 deletions(-)

diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index 3623aa1..82b752d 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -973,10 +973,9 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
 }
 
 if (ctrl_byte & PCI_MSI_FLAGS_ENABLE) {
-MSIMessage msg = msi_get_message(pci_dev, 0);
 int virq;
 
-virq = kvm_irqchip_add_msi_route(kvm_state, msg, pci_dev);
+virq = kvm_irqchip_add_msi_route(kvm_state, 0, pci_dev);
 if (virq < 0) {
 perror("assigned_dev_update_msi: kvm_irqchip_add_msi_route");
 return;
@@ -1041,7 +1040,6 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 uint16_t entries_nr = 0;
 int i, r = 0;
 MSIXTableEntry *entry = adev->msix_table;
-MSIMessage msg;
 
 /* Get the usable entry number for allocating */
 for (i = 0; i < adev->msix_max; i++, entry++) {
@@ -1078,9 +1076,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 continue;
 }
 
-msg.address = entry->addr_lo | ((uint64_t)entry->addr_hi << 32);
-msg.data = entry->data;
-r = kvm_irqchip_add_msi_route(kvm_state, msg, pci_dev);
+r = kvm_irqchip_add_msi_route(kvm_state, i, pci_dev);
 if (r < 0) {
 return r;
 }
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index c4dde3a..8512523 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -441,13 +441,12 @@ static void ivshmem_add_kvm_msi_virq(IVShmemState *s, int 
vector,
  Error **errp)
 {
 PCIDevice *pdev = PCI_DEVICE(s);
-MSIMessage msg = msix_get_message(pdev, vector);
 int ret;
 
 IVSHMEM_DPRINTF("ivshmem_add_kvm_msi_virq vector:%d\n", vector);
 assert(!s->msi_vectors[vector].pdev);
 
-ret = kvm_irqchip_add_msi_route(kvm_state, msg, pdev);
+ret = kvm_irqchip_add_msi_route(kvm_state, vector, pdev);
 if (ret < 0) {
 error_setg(errp, "kvm_irqchip_add_msi_route failed");
 return;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 44783c5..93455e5 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -417,11 +417,11 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool 
msix)
 }
 
 static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
-  MSIMessage *msg, bool msix)
+  int vector_n, bool msix)
 {
 int virq;
 
-if ((msix && vdev->no_kvm_msix) || (!msix && vdev->no_kvm_msi) || !msg) {
+if ((msix && vdev->no_kvm_msix) || (!msix && vdev->no_kvm_msi)) {
 return;
 }
 
@@ -429,7 +429,7 @@ static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, 
VFIOMSIVector *vector,
 return;
 }
 
-virq = kvm_irqchip_add_msi_route(kvm_state, *msg, &vdev->pdev);
+virq = kvm_irqchip_add_msi_route(kvm_state, vector_n, &vdev->pdev);
 if (virq < 0) {
 event_notifier_cleanup(&vector->kvm_interrupt);
 return;
@@ -495,7 +495,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
unsigned int nr,
 vfio_update_kvm_msi_virq(vector, *msg, pdev);
 }
 } else {
-vfio_add_kvm_msi_virq(vdev, vector, msg, true);
+vfio_add_kvm_msi_virq(vdev, vector, nr, true);
 }
 
 /*
@@ -639,7 +639,6 @@ retry:
 
 for (i = 0; i < vdev->nr_vectors; i++) {
 VFIOMSIVector *vector = &vdev->msi_vectors[i];
-MSIMessage msg = msi_get_message(&vdev->pdev, i);
 
 vector->vdev = vdev;
 vector->virq = -1;
@@ -656,7 +655,7 @@ retry:
  * Attempt to enable route through KVM irqchip,
  * default to userspace handling if unavailable.
  */
-vfio_add_kvm_msi_virq(vdev, vector, &msg, false);
+vfio_add_kvm_msi_virq(vdev, vector, i, false);
 }
 
 /* Set interrupt type prior to possible interrupts */
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 2b34b43..cbdfd59 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -699,14 +699,13 @@ static uint32_t virtio_read_config(PCIDevice *pci_dev,
 
 static int kvm_virtio_p

[Qemu-devel] [PATCH v11 17/28] intel_iommu: add support for split irqchip

In split irqchip mode, the IOAPIC works in user space and only updates
kernel irq routes when an entry changes. When IR is enabled, we directly
update the kernel with translated messages. It works just like a kernel
cache for the remapping entries.

Since KVM irqfd is using kernel gsi routes to deliver interrupts, as
long as we can support split irqchip, we will support irqfd as
well. Also, since kernel gsi routes will cache translated interrupts,
irqfd delivery will not suffer from any performance impact due to IR.

Also, since irqfd is now supported, vhost devices will be able to work
seamlessly with IR. Logically this should cover both the vhost-net and
vhost-user cases.
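A rough sketch of the "kernel cache" idea: before a route is handed to the kernel, its address is rebuilt from the hi/lo halves, passed through a remap callback, and written back in translated form. All names here are hypothetical, the 32-bit shift is an assumption, and sketch_toy_remap() returns a fixed message purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MSI_ADDR_HI_SHIFT 32  /* assumed split of the 64-bit address */

typedef struct {
    uint64_t address;
    uint32_t data;
} SketchMsiMessage;

/* Hypothetical mirror of the MSI part of a kernel routing entry. */
typedef struct {
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
} SketchRoute;

typedef int (*SketchRemapFn)(const SketchMsiMessage *src,
                             SketchMsiMessage *dst);

/* Replace the route's message with its remapped form, if a remapper exists. */
int sketch_fixup_route(SketchRoute *route, SketchRemapFn remap)
{
    SketchMsiMessage src, dst;
    int ret;

    if (remap == NULL) {
        return 0;  /* no IOMMU: the route is used as-is */
    }

    src.address = ((uint64_t)route->address_hi << SKETCH_MSI_ADDR_HI_SHIFT) |
                  route->address_lo;
    src.data = route->data;

    ret = remap(&src, &dst);
    if (ret) {
        return ret;  /* leave the route untouched on failure */
    }

    route->address_hi = (uint32_t)(dst.address >> SKETCH_MSI_ADDR_HI_SHIFT);
    route->address_lo = (uint32_t)(dst.address & 0xffffffffu);
    route->data = dst.data;
    return 0;
}

/* Toy remapper for illustration: always yields one fixed message. */
int sketch_toy_remap(const SketchMsiMessage *src, SketchMsiMessage *dst)
{
    (void)src;
    dst->address = 0xfee01000ull;
    dst->data = 0x41;
    return 0;
}
```

Because the kernel only ever sees the translated message, irqfd delivery needs no per-interrupt lookup, which matches the performance claim above.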

Reviewed-by: Paolo Bonzini 
[move trace-events lines into target-i386/trace-events]
Signed-off-by: Peter Xu 
---
 Makefile.objs |  1 +
 hw/i386/intel_iommu.c |  7 +++
 include/hw/i386/intel_iommu.h |  1 +
 include/hw/i386/x86-iommu.h   |  5 +
 target-i386/kvm.c | 27 +++
 target-i386/trace-events  |  4 
 6 files changed, 45 insertions(+)
 create mode 100644 target-i386/trace-events

diff --git a/Makefile.objs b/Makefile.objs
index 7f1f0a3..6d5ddcf 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -153,6 +153,7 @@ trace-events-y += hw/alpha/trace-events
 trace-events-y += ui/trace-events
 trace-events-y += audio/trace-events
 trace-events-y += net/trace-events
+trace-events-y += target-i386/trace-events
 trace-events-y += target-sparc/trace-events
 trace-events-y += target-s390x/trace-events
 trace-events-y += target-ppc/trace-events
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 7cc6d18..71b274d 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2167,6 +2167,12 @@ do_not_translate:
 return 0;
 }
 
+static int vtd_int_remap(X86IOMMUState *iommu, MSIMessage *src,
+ MSIMessage *dst, uint16_t sid)
+{
+return vtd_interrupt_remap_msi(INTEL_IOMMU_DEVICE(iommu), src, dst);
+}
+
 static MemTxResult vtd_mem_ir_read(void *opaque, hwaddr addr,
uint64_t *data, unsigned size,
MemTxAttrs attrs)
@@ -2412,6 +2418,7 @@ static void vtd_class_init(ObjectClass *klass, void *data)
 dc->hotpluggable = false;
 x86_class->realize = vtd_realize;
 x86_class->find_add_as = vtd_find_add_as;
+x86_class->int_remap = vtd_int_remap;
 }
 
 static const TypeInfo vtd_info = {
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index b3f17d7..3bca390 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -26,6 +26,7 @@
 #include "hw/i386/x86-iommu.h"
 #include "hw/i386/ioapic.h"
 #include "hw/pci/msi.h"
+#include "hw/sysbus.h"
 
 #define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
 #define INTEL_IOMMU_DEVICE(obj) \
diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
index 10779c1..b419ae5 100644
--- a/include/hw/i386/x86-iommu.h
+++ b/include/hw/i386/x86-iommu.h
@@ -22,6 +22,7 @@
 
 #include "hw/sysbus.h"
 #include "exec/memory.h"
+#include "hw/pci/pci.h"
 
 #define  TYPE_X86_IOMMU_DEVICE  ("x86-iommu")
 #define  X86_IOMMU_DEVICE(obj) \
@@ -32,6 +33,7 @@
 OBJECT_GET_CLASS(X86IOMMUClass, obj, TYPE_X86_IOMMU_DEVICE)
 
 #define X86_IOMMU_PCI_DEVFN_MAX   256
+#define X86_IOMMU_SID_INVALID (0xffff)
 
 typedef struct X86IOMMUState X86IOMMUState;
 typedef struct X86IOMMUClass X86IOMMUClass;
@@ -42,6 +44,9 @@ struct X86IOMMUClass {
 DeviceRealize realize;
 /* Find/Add IOMMU address space for specific PCI device */
 AddressSpace *(*find_add_as)(X86IOMMUState *s, PCIBus *bus, int devfn);
+/* MSI-based interrupt remapping */
+int (*int_remap)(X86IOMMUState *iommu, MSIMessage *src,
+ MSIMessage *dst, uint16_t sid);
 };
 
 struct X86IOMMUState {
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index f3698f1..bfa40b2 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -35,6 +35,7 @@
 #include "hw/i386/apic.h"
 #include "hw/i386/apic_internal.h"
 #include "hw/i386/apic-msidef.h"
+#include "hw/i386/intel_iommu.h"
 
 #include "exec/ioport.h"
 #include "standard-headers/asm-x86/hyperv.h"
@@ -42,6 +43,7 @@
 #include "hw/pci/msi.h"
 #include "migration/migration.h"
 #include "exec/memattrs.h"
+#include "trace.h"
 
 //#define DEBUG_KVM
 
@@ -3323,6 +3325,31 @@ int kvm_device_msix_deassign(KVMState *s, uint32_t 
dev_id)
 int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
  uint64_t address, uint32_t data, PCIDevice *dev)
 {
+X86IOMMUState *iommu = x86_iommu_get_default();
+
+if (iommu) {
+int ret;
+MSIMessage src, dst;
+X86IOMMUClass *class = X86_IOMMU_GET_CLASS(iommu);
+
+src.address = route->u.msi.address_hi;
+src.address <<= VTD_MSI_ADDR_HI_SHIFT;
+src.address |= route->u.msi.address_lo;
+src.data = route->u.msi.data;
+
+ret = cla

[Qemu-devel] [PATCH v11 23/28] kvm-irqchip: i386: add hook for add/remove virq

Add two hooks that are called when MSI routes are added or removed.
There are two kinds of MSI routes:

- in kvm_irqchip_add_irq_route(): before assigning IRQFD. Used by
  vhost, vfio, etc.

- in kvm_irqchip_send_msi(): when sending direct MSI message, if
  direct MSI not allowed, we will first create one MSI route entry
  in the kernel, then trigger it.

This patch only hooks the first one (the irqfd case). We do not need to
take care of the second one, since it is only used by QEMU userspace
(kvm-apic) and its messages are always translated at the time they are
triggered. For the first one, however, we need to record the routes so
that we can notify the kernel when a cache invalidation happens.

Also, we do not hook IOAPIC msi routes (we have explicit notifier for
IOAPIC to keep its cache updated). We only need to care about irqfd
users.
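The bookkeeping the two hooks perform can be sketched with a plain singly linked list. This is a hedged illustration only: SketchRouteEntry, SketchRouteList and both functions are hypothetical, QEMU's own code uses its QLIST macros, and this sketch frees the entry on release as a design choice of its own.

```c
#include <stdlib.h>

/* Hypothetical record of one MSI route that belongs to a PCI device. */
typedef struct SketchRouteEntry {
    const void *dev;  /* owning device, opaque here */
    int vector;       /* MSI/MSI-X vector index */
    int virq;         /* virtual IRQ (GSI) number */
    struct SketchRouteEntry *next;
} SketchRouteEntry;

typedef struct {
    SketchRouteEntry *head;
    int count;
} SketchRouteList;

/* "Added" hook: remember the route, unless it has no device behind it. */
int sketch_route_added(SketchRouteList *list, const void *dev,
                       int vector, int virq)
{
    SketchRouteEntry *entry;

    if (dev == NULL) {
        return 0;  /* e.g. IOAPIC routes are not tracked here */
    }
    entry = calloc(1, sizeof(*entry));
    if (entry == NULL) {
        return -1;
    }
    entry->dev = dev;
    entry->vector = vector;
    entry->virq = virq;
    entry->next = list->head;
    list->head = entry;
    list->count++;
    return 0;
}

/* "Released" hook: drop the entry for this virq. Returns 1 if one was found. */
int sketch_route_released(SketchRouteList *list, int virq)
{
    SketchRouteEntry **pp = &list->head;

    while (*pp != NULL) {
        if ((*pp)->virq == virq) {
            SketchRouteEntry *victim = *pp;
            *pp = victim->next;
            free(victim);
            list->count--;
            return 1;
        }
        pp = &(*pp)->next;
    }
    return 0;
}
```

Storing the device and vector, not the message, means the current message can be looked up again whenever the remapping cache is invalidated.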

Signed-off-by: Peter Xu 
Reviewed-by: Paolo Bonzini 
---
 include/sysemu/kvm.h     |  6 ++
 kvm-all.c                |  2 ++
 target-arm/kvm.c         | 11 +++
 target-i386/kvm.c        | 48
 target-i386/trace-events |  2 ++
 target-mips/kvm.c        | 11 +++
 target-ppc/kvm.c         | 11 +++
 target-s390x/kvm.c       | 11 +++
 8 files changed, 102 insertions(+)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index e5d90bd..0a16e0e 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -359,6 +359,12 @@ void kvm_arch_init_irq_routing(KVMState *s);
 int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
  uint64_t address, uint32_t data, PCIDevice *dev);
 
+/* Notify arch about newly added MSI routes */
+int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
+int vector, PCIDevice *dev);
+/* Notify arch about released MSI routes */
+int kvm_arch_release_virq_post(int virq);
+
 int kvm_arch_msi_data_to_gsi(uint32_t data);
 
 int kvm_set_irq(KVMState *s, int irq, int level);
diff --git a/kvm-all.c b/kvm-all.c
index d94c0e4..69ff658 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1133,6 +1133,7 @@ void kvm_irqchip_release_virq(KVMState *s, int virq)
 }
 }
 clear_gsi(s, virq);
+kvm_arch_release_virq_post(virq);
 }
 
 static unsigned int kvm_hash_msi(uint32_t data)
@@ -1281,6 +1282,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, int vector, 
PCIDevice *dev)
 }
 
 kvm_add_routing_entry(s, &kroute);
+kvm_arch_add_msi_route_post(&kroute, vector, dev);
 kvm_irqchip_commit_routes(s);
 
 return virq;
diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index 5c2bd7a..dbe393c 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -622,6 +622,17 @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry 
*route,
 return 0;
 }
 
+int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
+int vector, PCIDevice *dev)
+{
+return 0;
+}
+
+int kvm_arch_release_virq_post(int virq)
+{
+return 0;
+}
+
 int kvm_arch_msi_data_to_gsi(uint32_t data)
 {
 return (data - 32) & 0xffff;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 17cd24b..5d7a7a7 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -3352,6 +3352,54 @@ int kvm_arch_fixup_msi_route(struct 
kvm_irq_routing_entry *route,
 return 0;
 }
 
+typedef struct MSIRouteEntry MSIRouteEntry;
+
+struct MSIRouteEntry {
+PCIDevice *dev; /* Device pointer */
+int vector; /* MSI/MSIX vector index */
+int virq;   /* Virtual IRQ index */
+QLIST_ENTRY(MSIRouteEntry) list;
+};
+
+/* List of used GSI routes */
+static QLIST_HEAD(, MSIRouteEntry) msi_route_list = \
+QLIST_HEAD_INITIALIZER(msi_route_list);
+
+int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
+int vector, PCIDevice *dev)
+{
+MSIRouteEntry *entry;
+
+if (!dev) {
+/* These are (possibly) IOAPIC routes only used for split
+ * kernel irqchip mode, while what we are housekeeping are
+ * PCI devices only. */
+return 0;
+}
+
+entry = g_new0(MSIRouteEntry, 1);
+entry->dev = dev;
+entry->vector = vector;
+entry->virq = route->gsi;
+QLIST_INSERT_HEAD(&msi_route_list, entry, list);
+
+trace_kvm_x86_add_msi_route(route->gsi);
+return 0;
+}
+
+int kvm_arch_release_virq_post(int virq)
+{
+MSIRouteEntry *entry, *next;
+QLIST_FOREACH_SAFE(entry, &msi_route_list, list, next) {
+if (entry->virq == virq) {
+trace_kvm_x86_remove_msi_route(virq);
+QLIST_REMOVE(entry, list);
+break;
+}
+}
+return 0;
+}
+
 int kvm_arch_msi_data_to_gsi(uint32_t data)
 {
 abort();
diff --git a/target-i386/trace-events b/target-i386/trace-events
index 2113075..818058c 100644
--- a/target-i386/trace-events
+++ b/target-i386/trace-events
@@ -2,3 +2,5 @@
 
 # target-i386/kvm.c
 kvm_x86_fixup_msi_error(uint32_t

[Qemu-devel] [PATCH v11 21/28] intel_iommu: add SID validation for IR

This patch enables SID validation. Invalid interrupts will be dropped.
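The validation rule can be sketched as a standalone predicate. This is a minimal sketch under stated assumptions: the SVT encodings (0 = none, 1 = compare SID under the SQ mask, 2 = bus range) and the SQ mask table follow the VT-d IRTE layout as I understand it, and sketch_sid_valid() is a hypothetical helper, not the function in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed Source Validation Type encodings. */
enum {
    SKETCH_SVT_NONE = 0,  /* no verification */
    SKETCH_SVT_ALL  = 1,  /* compare SID under the SQ mask */
    SKETCH_SVT_BUS  = 2   /* requester bus must lie in a range */
};

/* SQ selects which low function-number bits are ignored in the compare. */
static const uint16_t sketch_sq_mask[4] = {0xffff, 0xfffb, 0xfff9, 0xfff8};

/*
 * irte_sid: source-id field stored in the IRTE
 * req_sid:  requester id (bus << 8 | devfn) of the interrupt source
 */
bool sketch_sid_valid(uint8_t svt, uint8_t sq, uint16_t irte_sid,
                      uint16_t req_sid)
{
    uint16_t mask;
    uint8_t bus, bus_min, bus_max;

    switch (svt) {
    case SKETCH_SVT_NONE:
        return true;
    case SKETCH_SVT_ALL:
        mask = sketch_sq_mask[sq & 3];
        return (irte_sid & mask) == (req_sid & mask);
    case SKETCH_SVT_BUS:
        /* Upper byte holds the last bus, lower byte the first bus. */
        bus_max = irte_sid >> 8;
        bus_min = irte_sid & 0xff;
        bus = req_sid >> 8;
        return bus >= bus_min && bus <= bus_max;
    default:
        /* Reserved encoding: treat as a verification failure. */
        return false;
    }
}
```

For instance, with SQ = 1 the requester IDs 0x00f8 and 0x00fc compare equal, because bit 2 of the function number is masked out.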

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c | 69 ---
 include/hw/i386/intel_iommu.h | 17 +++
 2 files changed, 75 insertions(+), 11 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 506d7cf..bbbf37e 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2008,9 +2008,13 @@ static Property vtd_properties[] = {
 
 /* Read IRTE entry with specific index */
 static int vtd_irte_get(IntelIOMMUState *iommu, uint16_t index,
-VTD_IRTE *entry)
+VTD_IRTE *entry, uint16_t sid)
 {
+static const uint16_t vtd_svt_mask[VTD_SQ_MAX] = \
+{0xffff, 0xfffb, 0xfff9, 0xfff8};
 dma_addr_t addr = 0x00;
+uint16_t mask, source_id;
+uint8_t bus, bus_max, bus_min;
 
 addr = iommu->intr_root + index * sizeof(*entry);
 if (dma_memory_read(&address_space_memory, addr, entry,
@@ -2037,23 +2041,58 @@ static int vtd_irte_get(IntelIOMMUState *iommu, 
uint16_t index,
 return -VTD_FR_IR_IRTE_RSVD;
 }
 
-/*
- * TODO: Check Source-ID corresponds to SVT (Source Validation
- * Type) bits
- */
+if (sid != X86_IOMMU_SID_INVALID) {
+/* Validate IRTE SID */
+source_id = le32_to_cpu(entry->source_id);
+switch (entry->sid_vtype) {
+case VTD_SVT_NONE:
+VTD_DPRINTF(IR, "No SID validation for IRTE index %d", index);
+break;
+
+case VTD_SVT_ALL:
+mask = vtd_svt_mask[entry->sid_q];
+if ((source_id & mask) != (sid & mask)) {
+VTD_DPRINTF(GENERAL, "SID validation for IRTE index "
+"%d failed (reqid 0x%04x sid 0x%04x)", index,
+sid, source_id);
+return -VTD_FR_IR_SID_ERR;
+}
+break;
+
+case VTD_SVT_BUS:
+bus_max = source_id >> 8;
+bus_min = source_id & 0xff;
+bus = sid >> 8;
+if (bus > bus_max || bus < bus_min) {
+VTD_DPRINTF(GENERAL, "SID validation for IRTE index %d "
+"failed (bus %d outside %d-%d)", index, bus,
+bus_min, bus_max);
+return -VTD_FR_IR_SID_ERR;
+}
+break;
+
+default:
+VTD_DPRINTF(GENERAL, "Invalid SVT bits (0x%x) in IRTE index "
+"%d", entry->sid_vtype, index);
+/* Take this as verification failure. */
+return -VTD_FR_IR_SID_ERR;
+break;
+}
+}
 
 return 0;
 }
 
 /* Fetch IRQ information of specific IR index */
-static int vtd_remap_irq_get(IntelIOMMUState *iommu, uint16_t index, VTDIrq 
*irq)
+static int vtd_remap_irq_get(IntelIOMMUState *iommu, uint16_t index,
+ VTDIrq *irq, uint16_t sid)
 {
 VTD_IRTE irte;
 int ret = 0;
 
 bzero(&irte, sizeof(irte));
 
-ret = vtd_irte_get(iommu, index, &irte);
+ret = vtd_irte_get(iommu, index, &irte, sid);
 if (ret) {
 return ret;
 }
@@ -2105,7 +2144,8 @@ static void vtd_generate_msi_message(VTDIrq *irq, 
MSIMessage *msg_out)
 /* Interrupt remapping for MSI/MSI-X entry */
 static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu,
MSIMessage *origin,
-   MSIMessage *translated)
+   MSIMessage *translated,
+   uint16_t sid)
 {
 int ret = 0;
 VTD_IR_MSIAddress addr;
@@ -2148,7 +2188,7 @@ static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu,
 index += origin->data & VTD_IR_MSI_DATA_SUBHANDLE;
 }
 
-ret = vtd_remap_irq_get(iommu, index, &irq);
+ret = vtd_remap_irq_get(iommu, index, &irq, sid);
 if (ret) {
 return ret;
 }
@@ -2195,7 +2235,8 @@ do_not_translate:
 static int vtd_int_remap(X86IOMMUState *iommu, MSIMessage *src,
  MSIMessage *dst, uint16_t sid)
 {
-return vtd_interrupt_remap_msi(INTEL_IOMMU_DEVICE(iommu), src, dst);
+return vtd_interrupt_remap_msi(INTEL_IOMMU_DEVICE(iommu),
+   src, dst, sid);
 }
 
 static MemTxResult vtd_mem_ir_read(void *opaque, hwaddr addr,
@@ -2221,11 +2262,17 @@ static MemTxResult vtd_mem_ir_write(void *opaque, 
hwaddr addr,
 {
 int ret = 0;
 MSIMessage from = {0}, to = {0};
+uint16_t sid = X86_IOMMU_SID_INVALID;
 
 from.address = (uint64_t) addr + VTD_INTERRUPT_ADDR_FIRST;
 from.data = (uint32_t) value;
 
-ret = vtd_interrupt_remap_msi(opaque, &from, &to);
+if (!attrs.unspecified) {
+/* We have explicit Source ID */
+sid = attrs.requester_id;
+}
+
+ret = vtd_interrupt_remap_msi(opaque, &from, &to, sid);
 if (ret) {
 /* TODO: report error */
 VTD_DPRINTF(GENERAL, "in

[Qemu-devel] [PATCH v11 27/28] kvm-all: add trace events for kvm irqchip ops

These will help us monitor irqchip route activity more easily.

Signed-off-by: Peter Xu 
Reviewed-by: Paolo Bonzini 
---
 kvm-all.c| 5 +
 trace-events | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/kvm-all.c b/kvm-all.c
index 3764ba9..ef81ca5 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1048,6 +1048,7 @@ void kvm_irqchip_commit_routes(KVMState *s)
 int ret;
 
 s->irq_routes->flags = 0;
+trace_kvm_irqchip_commit_routes();
 ret = kvm_vm_ioctl(s, KVM_SET_GSI_ROUTING, s->irq_routes);
 assert(ret == 0);
 }
@@ -1271,6 +1272,8 @@ int kvm_irqchip_add_msi_route(KVMState *s, int vector, 
PCIDevice *dev)
 return -EINVAL;
 }
 
+trace_kvm_irqchip_add_msi_route(virq);
+
 kvm_add_routing_entry(s, &kroute);
 kvm_arch_add_msi_route_post(&kroute, vector, dev);
 kvm_irqchip_commit_routes(s);
@@ -1301,6 +1304,8 @@ int kvm_irqchip_update_msi_route(KVMState *s, int virq, 
MSIMessage msg,
 return -EINVAL;
 }
 
+trace_kvm_irqchip_update_msi_route(virq);
+
 return kvm_update_routing_entry(s, &kroute);
 }
 
diff --git a/trace-events b/trace-events
index 4767059..52c6a6c 100644
--- a/trace-events
+++ b/trace-events
@@ -118,6 +118,9 @@ kvm_run_exit(int cpu_index, uint32_t reason) "cpu_index %d, 
reason %d"
 kvm_device_ioctl(int fd, int type, void *arg) "dev fd %d, type 0x%x, arg %p"
 kvm_failed_reg_get(uint64_t id, const char *msg) "Warning: Unable to retrieve 
ONEREG %" PRIu64 " from KVM: %s"
 kvm_failed_reg_set(uint64_t id, const char *msg) "Warning: Unable to set 
ONEREG %" PRIu64 " to KVM: %s"
+kvm_irqchip_commit_routes(void) ""
+kvm_irqchip_add_msi_route(int virq) "Adding MSI route virq=%d"
+kvm_irqchip_update_msi_route(int virq) "Updating MSI route virq=%d"
 
 # TCG related tracing (mostly disabled by default)
 # cpu-exec.c
-- 
2.4.11

[Qemu-devel] [PATCH v11 24/28] kvm-irqchip: x86: add msi route notify fn

One more IEC notifier is added to let msi routes know about the IEC
changes. When interrupt invalidation happens, all registered msi routes
will be updated for all PCI devices.

Since both vfio and vhost are possible gsi route consumers, this patch
will go one step further to keep them safe in split irqchip mode and
when irqfd is enabled.

Reviewed-by: Paolo Bonzini 
[move trace-events lines into target-i386/trace-events]
Signed-off-by: Peter Xu 
---
 hw/pci/pci.c | 15 +++
 include/hw/pci/pci.h |  2 ++
 kvm-all.c| 10 +-
 target-i386/kvm.c| 30 ++
 target-i386/trace-events |  1 +
 5 files changed, 49 insertions(+), 9 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 149994b..728c6d4 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2596,6 +2596,21 @@ PCIDevice *pci_get_function_0(PCIDevice *pci_dev)
 }
 }
 
+MSIMessage pci_get_msi_message(PCIDevice *dev, int vector)
+{
+MSIMessage msg;
+if (msix_enabled(dev)) {
+msg = msix_get_message(dev, vector);
+} else if (msi_enabled(dev)) {
+msg = msi_get_message(dev, vector);
+} else {
+/* Should never happen */
+error_report("%s: unknown interrupt type", __func__);
+abort();
+}
+return msg;
+}
+
 static const TypeInfo pci_device_type_info = {
 .name = TYPE_PCI_DEVICE,
 .parent = TYPE_DEVICE,
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 9ed1624..74d797d 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -805,4 +805,6 @@ extern const VMStateDescription vmstate_pci_device;
 .offset = vmstate_offset_pointer(_state, _field, PCIDevice), \
 }
 
+MSIMessage pci_get_msi_message(PCIDevice *dev, int vector);
+
 #endif
diff --git a/kvm-all.c b/kvm-all.c
index 69ff658..ca30a58 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1246,15 +1246,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, int vector, 
PCIDevice *dev)
 MSIMessage msg = {0, 0};
 
 if (dev) {
-if (msix_enabled(dev)) {
-msg = msix_get_message(dev, vector);
-} else if (msi_enabled(dev)) {
-msg = msi_get_message(dev, vector);
-} else {
-/* Should never happen */
-error_report("%s: unknown interrupt type", __func__);
-abort();
-}
+msg = pci_get_msi_message(dev, vector);
 }
 
 if (kvm_gsi_direct_mapping()) {
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5d7a7a7..f02ba0a 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -36,6 +36,7 @@
 #include "hw/i386/apic_internal.h"
 #include "hw/i386/apic-msidef.h"
 #include "hw/i386/intel_iommu.h"
+#include "hw/i386/x86-iommu.h"
 
 #include "exec/ioport.h"
 #include "standard-headers/asm-x86/hyperv.h"
@@ -3365,9 +3366,26 @@ struct MSIRouteEntry {
 static QLIST_HEAD(, MSIRouteEntry) msi_route_list = \
 QLIST_HEAD_INITIALIZER(msi_route_list);
 
+static void kvm_update_msi_routes_all(void *private, bool global,
+  uint32_t index, uint32_t mask)
+{
+int cnt = 0;
+MSIRouteEntry *entry;
+MSIMessage msg;
+/* TODO: explicit route update */
+QLIST_FOREACH(entry, &msi_route_list, list) {
+cnt++;
+msg = pci_get_msi_message(entry->dev, entry->vector);
+kvm_irqchip_update_msi_route(kvm_state, entry->virq,
+ msg, entry->dev);
+}
+trace_kvm_x86_update_msi_routes(cnt);
+}
+
 int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
 int vector, PCIDevice *dev)
 {
+static bool notify_list_inited = false;
 MSIRouteEntry *entry;
 
 if (!dev) {
@@ -3384,6 +3402,18 @@ int kvm_arch_add_msi_route_post(struct 
kvm_irq_routing_entry *route,
 QLIST_INSERT_HEAD(&msi_route_list, entry, list);
 
 trace_kvm_x86_add_msi_route(route->gsi);
+
+if (!notify_list_inited) {
+/* For the first time we do add route, add ourselves into
+ * IOMMU's IEC notify list if needed. */
+X86IOMMUState *iommu = x86_iommu_get_default();
+if (iommu) {
+x86_iommu_iec_register_notifier(iommu,
+kvm_update_msi_routes_all,
+NULL);
+}
+notify_list_inited = true;
+}
 return 0;
 }
 
diff --git a/target-i386/trace-events b/target-i386/trace-events
index 818058c..ccc49e3 100644
--- a/target-i386/trace-events
+++ b/target-i386/trace-events
@@ -4,3 +4,4 @@
 kvm_x86_fixup_msi_error(uint32_t gsi) "VT-d failed to remap interrupt for GSI 
%" PRIu32
 kvm_x86_add_msi_route(int virq) "Adding route entry for virq %d"
 kvm_x86_remove_msi_route(int virq) "Removing route entry for virq %d"
+kvm_x86_update_msi_routes(int num) "Updated %d MSI routes"
-- 
2.4.11

[Qemu-devel] [PATCH v11 28/28] intel_iommu: disallow kernel-irqchip=on with IR

When the user specifies "kernel-irqchip=on" together with IR, report an
error and quit.
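The condition being enforced can be written as a tiny predicate. This is a sketch only: the enum and function are hypothetical, and it assumes that an "in kernel" irqchip query is true for both the "on" and "split" modes, so "split" has to be excluded explicitly.

```c
#include <stdbool.h>

/* Hypothetical model of the three kernel-irqchip settings. */
typedef enum {
    SKETCH_IRQCHIP_OFF,
    SKETCH_IRQCHIP_SPLIT,
    SKETCH_IRQCHIP_ON
} SketchIrqchipMode;

/* Interrupt remapping is allowed for "off" and "split", not for "on". */
bool sketch_ir_mode_allowed(SketchIrqchipMode mode)
{
    /* Assumption: both "on" and "split" count as "in kernel". */
    bool in_kernel = (mode != SKETCH_IRQCHIP_OFF);
    bool is_split = (mode == SKETCH_IRQCHIP_SPLIT);

    return !(in_kernel && !is_split);
}
```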

Signed-off-by: Peter Xu 
Reviewed-by: Paolo Bonzini 
---
 hw/i386/intel_iommu.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 8a91e64..76471ad 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -20,6 +20,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/error-report.h"
 #include "hw/sysbus.h"
 #include "exec/address-spaces.h"
 #include "intel_iommu_internal.h"
@@ -29,6 +30,7 @@
 #include "hw/boards.h"
 #include "hw/i386/x86-iommu.h"
 #include "hw/pci-host/q35.h"
+#include "sysemu/kvm.h"
 
 /*#define DEBUG_INTEL_IOMMU*/
 #ifdef DEBUG_INTEL_IOMMU
@@ -2476,6 +2478,13 @@ static void vtd_realize(DeviceState *dev, Error **errp)
 pci_setup_iommu(bus, vtd_host_dma_iommu, dev);
 /* Pseudo address space under root PCI bus. */
 pcms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
+
+/* Currently Intel IOMMU IR only support "kernel-irqchip={off|split}" */
+if (kvm_irqchip_in_kernel() && !kvm_irqchip_is_split()) {
+error_report("Intel Interrupt Remapping cannot work with "
+ "kernel-irqchip=on, please use 'split|off'.");
+exit(1);
+}
 }
 
 static void vtd_class_init(ObjectClass *klass, void *data)
-- 
2.4.11

[Qemu-devel] [PATCH v11 25/28] kvm-irqchip: do explicit commit when update irq

In the past, we did a gsi route commit for each irqchip route update.
This is inefficient when updating many routes at the same time. This
patch removes the commit step from kvm_irqchip_update_msi_route().
Instead, we do an explicit commit after all routes have been updated.
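The effect of moving the commit out of the update path can be sketched with a counter standing in for the expensive step. Everything below is hypothetical (SketchRouting and its functions are not QEMU's API); the real commit pushes the whole routing table to the kernel.

```c
#define SKETCH_MAX_ROUTES 8

/* Hypothetical routing table; 'commits' counts the expensive pushes. */
typedef struct {
    unsigned int data[SKETCH_MAX_ROUTES];
    int commits;
} SketchRouting;

/* Update one route in the local table only; nothing is committed here. */
int sketch_update_route(SketchRouting *r, int virq, unsigned int data)
{
    if (virq < 0 || virq >= SKETCH_MAX_ROUTES) {
        return -1;
    }
    r->data[virq] = data;
    return 0;
}

/* Stand-in for pushing the whole table to the kernel. */
void sketch_commit_routes(SketchRouting *r)
{
    r->commits++;
}

/* Update the first n routes, then commit exactly once. */
int sketch_update_all(SketchRouting *r, const unsigned int *data, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        if (sketch_update_route(r, i, data[i]) < 0) {
            return -1;
        }
    }
    sketch_commit_routes(r);
    return 0;
}
```

The cost is that every caller that updates a single route now has to remember to commit, which is why the diff below adds a commit call after each existing update site.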

Signed-off-by: Peter Xu 
Reviewed-by: Paolo Bonzini 
---
 hw/i386/kvm/pci-assign.c | 2 ++
 hw/misc/ivshmem.c        | 1 +
 hw/vfio/pci.c            | 1 +
 hw/virtio/virtio-pci.c   | 1 +
 include/sysemu/kvm.h     | 2 +-
 kvm-all.c                | 2 --
 kvm-stub.c               | 4
 target-i386/kvm.c        | 1 +
 8 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index 82b752d..ec5dfe2 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -1014,6 +1014,7 @@ static void assigned_dev_update_msi_msg(PCIDevice 
*pci_dev)
 
 kvm_irqchip_update_msi_route(kvm_state, assigned_dev->msi_virq[0],
  msi_get_message(pci_dev, 0), pci_dev);
+kvm_irqchip_commit_routes(kvm_state);
 }
 
 static bool assigned_dev_msix_masked(MSIXTableEntry *entry)
@@ -1601,6 +1602,7 @@ static void assigned_dev_msix_mmio_write(void *opaque, 
hwaddr addr,
 if (ret) {
 error_report("Error updating irq routing entry (%d)", ret);
 }
+kvm_irqchip_commit_routes(kvm_state);
 }
 }
 }
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 8512523..241a70c 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -322,6 +322,7 @@ static int ivshmem_vector_unmask(PCIDevice *dev, unsigned vector,
 if (ret < 0) {
 return ret;
 }
+kvm_irqchip_commit_routes(kvm_state);
 
 return kvm_irqchip_add_irqfd_notifier_gsi(kvm_state, n, NULL, v->virq);
 }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 93455e5..b3b61bf 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -458,6 +458,7 @@ static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg,
  PCIDevice *pdev)
 {
 kvm_irqchip_update_msi_route(kvm_state, vector->virq, msg, pdev);
+kvm_irqchip_commit_routes(kvm_state);
 }
 
 static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index cbdfd59..f0677b7 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -842,6 +842,7 @@ static int virtio_pci_vq_vector_unmask(VirtIOPCIProxy *proxy,
 if (ret < 0) {
 return ret;
 }
+kvm_irqchip_commit_routes(kvm_state);
 }
 }
 
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 0a16e0e..c9c2436 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -371,7 +371,6 @@ int kvm_set_irq(KVMState *s, int irq, int level);
 int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg);
 
 void kvm_irqchip_add_irq_route(KVMState *s, int gsi, int irqchip, int pin);
-void kvm_irqchip_commit_routes(KVMState *s);
 
 void kvm_put_apic_state(DeviceState *d, struct kvm_lapic_state *kapic);
 void kvm_get_apic_state(DeviceState *d, struct kvm_lapic_state *kapic);
@@ -494,6 +493,7 @@ static inline void cpu_synchronize_post_init(CPUState *cpu)
 int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev);
 int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
  PCIDevice *dev);
+void kvm_irqchip_commit_routes(KVMState *s);
 void kvm_irqchip_release_virq(KVMState *s, int virq);
 
 int kvm_irqchip_add_adapter_route(KVMState *s, AdapterInfo *adapter);
diff --git a/kvm-all.c b/kvm-all.c
index ca30a58..3764ba9 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1094,8 +1094,6 @@ static int kvm_update_routing_entry(KVMState *s,
 
 *entry = *new_entry;
 
-kvm_irqchip_commit_routes(s);
-
 return 0;
 }
 
diff --git a/kvm-stub.c b/kvm-stub.c
index 982e590..64e23f6 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -135,6 +135,10 @@ int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
 return -ENOSYS;
 }
 
+void kvm_irqchip_commit_routes(KVMState *s)
+{
+}
+
 int kvm_irqchip_add_adapter_route(KVMState *s, AdapterInfo *adapter)
 {
 return -ENOSYS;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index f02ba0a..0e26862 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -3379,6 +3379,7 @@ static void kvm_update_msi_routes_all(void *private, bool global,
 kvm_irqchip_update_msi_route(kvm_state, entry->virq,
  msg, entry->dev);
 }
+kvm_irqchip_commit_routes(kvm_state);
 trace_kvm_x86_update_msi_routes(cnt);
 }
 
-- 
2.4.11
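
The effect of moving the commit out of the per-route update can be shown with a small stand-alone model. This is only a sketch under stated assumptions: `struct route_table`, `update_route()`, `commit_routes()` and `update_all_routes()` are hypothetical stand-ins for the KVM routing state, kvm_irqchip_update_msi_route() and kvm_irqchip_commit_routes(), and the `commits` counter stands for the operation that pushes the whole table to the kernel.

```c
#include <stddef.h>

#define MAX_ROUTES 64

/* Hypothetical model of a GSI routing table; not QEMU's real KVMState. */
struct route_table {
    unsigned int msi_data[MAX_ROUTES]; /* staged routing entries */
    unsigned int commits;              /* how often the table was pushed */
};

/* Stage one entry. As in the patch, the update itself no longer commits. */
static int update_route(struct route_table *t, size_t virq, unsigned int data)
{
    if (virq >= MAX_ROUTES) {
        return -1;
    }
    t->msi_data[virq] = data;
    return 0;
}

/* Push the whole table in one go (stands in for the routing commit). */
static void commit_routes(struct route_table *t)
{
    t->commits++;
}

/* Update n routes, then commit exactly once for the whole batch. */
static int update_all_routes(struct route_table *t, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (update_route(t, i, (unsigned int)(0x4000 + i)) < 0) {
            return -1; /* nothing is committed on failure */
        }
    }
    commit_routes(t);
    return 0;
}
```

With the commit inside the update, a batch of n routes would push the table n times; with the explicit commit it is pushed once, which is the saving the commit message describes.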

Re: [Qemu-devel] [PATCH] quorum: Only compile when supported

On Tue, Jun 28, 2016 at 09:47:47AM +0800, Fam Zheng wrote:
> This was the only exceptional module init function that does something
> else than a simple list of bdrv_register() calls, in all the block
> drivers.
> 
> The qcrypto_hash_supports is actually a static check, determined at
> compile time.  Follow the block-job-$(CONFIG_FOO) convention for
> consistency.
> 
> Signed-off-by: Fam Zheng 

The point of using qcrypto_hash_supports() is that it isolates the
block code Makefile rules from the details of the current specific
impl of the hash APIs in QEMU. As a prime example of why this is
important, try rebasing to GIT master, and you'll find we no longer
use gnutls for the hash APIs. We choose between libgcrypt, nettle
or an empty stub for hash impls now. I think it is a backwards step
to add back these makefile conditionals.

> ---
>  block/Makefile.objs | 2 +-
>  block/quorum.c  | 4 
>  2 files changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 44a5416..c87d605 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -3,7 +3,7 @@ block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o 
> qcow2-snapshot.o qcow2-c
>  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>  block-obj-y += qed-check.o
>  block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
> -block-obj-y += quorum.o
> +block-obj-$(CONFIG_GNUTLS_HASH) += quorum.o
>  block-obj-y += parallels.o blkdebug.o blkverify.o blkreplay.o
>  block-obj-y += block-backend.o snapshot.o qapi.o
>  block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
> diff --git a/block/quorum.c b/block/quorum.c
> index 331b726..18fbed8 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -1113,10 +1113,6 @@ static BlockDriver bdrv_quorum = {
>  
>  static void bdrv_quorum_init(void)
>  {
> -if (!qcrypto_hash_supports(QCRYPTO_HASH_ALG_SHA256)) {
> -/* SHA256 hash support is required for quorum device */
> -return;
> -}
>  bdrv_register(&bdrv_quorum);
>  }
>  
> -- 
> 2.9.0
> 
> 

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

[Qemu-devel] [PATCH v11 26/28] intel_iommu: support all masks in interrupt entry cache invalidation

From: Radim Krčmář 

Linux guests do not gracefully handle cases when the invalidation mask
they wanted is not supported, probably because real hardware always
allowed all.

We can just say that all 16 masks are supported, because both
ioapic_iec_notifier and kvm_update_msi_routes_all invalidate all caches.

Signed-off-by: Radim Krčmář 
---
 hw/i386/intel_iommu.c  | 2 +-
 hw/i386/intel_iommu_internal.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index bbbf37e..8a91e64 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2376,7 +2376,7 @@ static void vtd_init(IntelIOMMUState *s)
 s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
 
 if (x86_iommu->intr_supported) {
-s->ecap |= VTD_ECAP_IR | VTD_ECAP_EIM;
+s->ecap |= VTD_ECAP_IR | VTD_ECAP_EIM | VTD_ECAP_MHMV;
 }
 
 vtd_reset_context_cache(s);
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 72b0114..0829a50 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -186,6 +186,7 @@
 /* Interrupt Remapping support */
 #define VTD_ECAP_IR (1ULL << 3)
 #define VTD_ECAP_EIM(1ULL << 4)
+#define VTD_ECAP_MHMV   (15ULL << 20)
 
 /* CAP_REG */
 /* (offset >> 4) << 24 */
-- 
2.4.11
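
As a rough illustration of what the new define encodes: `15ULL << 20` sets a 4-bit field starting at bit 20 to its maximum value. The sketch below copies the three defines from the patch; `ecap_mhmv()` and `mask_supported()` are hypothetical helpers, not functions from intel_iommu.c, and reading the field as "mask values 0 through 15 are accepted" follows the commit message rather than a check against the VT-d specification.

```c
#include <stdint.h>

/* Values copied from the patch above. */
#define VTD_ECAP_IR   (1ULL << 3)
#define VTD_ECAP_EIM  (1ULL << 4)
#define VTD_ECAP_MHMV (15ULL << 20)

/* Hypothetical helper: extract the 4-bit field that starts at bit 20. */
static unsigned int ecap_mhmv(uint64_t ecap)
{
    return (unsigned int)((ecap >> 20) & 0xf);
}

/*
 * Hypothetical check: a requested invalidation mask is accepted if it
 * does not exceed the advertised maximum.
 */
static int mask_supported(uint64_t ecap, unsigned int mask)
{
    return mask <= ecap_mhmv(ecap);
}
```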

Re: [Qemu-devel] [PULL 0/1] Revert "bios: Add fast variant of SeaBIOS for use with -kernel on x86."

2016-07-05 Thread Peter Maydell

On 4 July 2016 at 16:26, Gerd Hoffmann  wrote:
>   Hi,
>
> Reverts patch and also remove pc-bios/bios-fast.bin.
>
> please pull,
>   Gerd
>
> The following changes since commit 3173a1fd549b7fa0f7029b2c6a6b86ba6efa92aa:
>
>   Merge remote-tracking branch 
> 'remotes/pmaydell/tags/pull-target-arm-20160704' into staging (2016-07-04 
> 14:33:05 +0100)
>
> are available in the git repository at:
>
>
>   git://git.kraxel.org/qemu tags/pull-seabios-20160704-3
>
> for you to fetch changes up to 3b1154fff16cf3bb9297d65145c89321bc1b7aa4:
>
>   Revert "bios: Add fast variant of SeaBIOS for use with -kernel on x86." 
> (2016-07-04 17:23:33 +0200)
>
> 
> Revert "bios: Add fast variant of SeaBIOS for use with -kernel on x86."
>
> 
> Gerd Hoffmann (1):
>   Revert "bios: Add fast variant of SeaBIOS for use with -kernel on x86."

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH] quorum: Only compile when supported

On Tue, 07/05 09:45, Daniel P. Berrange wrote:
> On Tue, Jun 28, 2016 at 09:47:47AM +0800, Fam Zheng wrote:
> > This was the only exceptional module init function that does something
> > else than a simple list of bdrv_register() calls, in all the block
> > drivers.
> > 
> > The qcrypto_hash_supports is actually a static check, determined at
> > compile time.  Follow the block-job-$(CONFIG_FOO) convention for
> > consistency.
> > 
> > Signed-off-by: Fam Zheng 
> 
> The point of using qcrypto_hash_supports() is that it isolates the
> block code Makefile rules from the details of the current specific
> impl of the hash APIs in QEMU. As a prime example of why this is
> important, try rebasing to GIT master, and you'll find we no longer
> use gnutls for the hash APIs. We choose between libgcrypt, nettle
> or a empty stub for hash impls now. I think it is a backwards step
> to add back these makefile conditionals

I don't want to spoil the isolation, but I think it's also worth keeping the
reasoning about whether a driver is supported as simple as possible. In other
words, if it's determined at configure time, there is no reason to use a
runtime check, right?

Fam


Re: [Qemu-devel] [PATCH for-2.7? 0/3] Two coroutine tweaks

On Mon, 07/04 19:09, Paolo Bonzini wrote:
> Hi,
> 
> I meant to send these close to soft freeze because the final patch is
> prone to conflicts, but it turns out I didn't send them.  Still they're
> pretty mechanical so I'm shooting them out now.  Any thoughts?

I didn't review patch 3 line by line, trusting coccinelle and the compiler;
otherwise the series looks good to me, and the parameter moving is particularly
nice! Thanks,

Reviewed-by: Fam Zheng

Re: [Qemu-devel] [PATCH] quorum: Only compile when supported

2016-07-05 Thread Sascha Silbe

Dear Alberto,

Alberto Garcia  writes:

> On Tue 05 Jul 2016 09:58:25 AM CEST, Sascha Silbe wrote:
>
>> The quorum driver needs SHA256 which was introduced in gnutls 2.11.1.
>
> Are you sure about that?
>
> * Version 1.7.4 (released 2007-02-05)
>
> [...]
>
> ** API and ABI modifications:
> GNUTLS_MAC_SHA256,
[...]

You are right, I misinterpreted the gnutls 2.11.1 NEWS entry. It added
the RSA_NULL_SHA256 cipher suite, not the SHA256 hash algorithm.

So the existing configure check is good enough, no need to bump the
required gnutls version.

Sascha
-- 
Softwareentwicklung Sascha Silbe, Niederhofenstraße 5/1, 71229 Leonberg
https://se-silbe.de/
USt-IdNr. DE281696641

[Qemu-devel] [PATCH] block/gluster: add support to choose libgfapi logfile

2016-07-05 Thread Prasanna Kumar Kalever

Currently all libgfapi logs default to '/dev/stderr', as this was hardcoded
in a call to the glfs logging API. If the debug level is set to DEBUG/TRACE,
the gfapi logs will be huge and flood the console.

This patch provides a command-line option to specify a log file path, so that
gfapi logs go to the specified file and persist there.

Usage: -drive file=gluster://hostname/volname/image.qcow2,file.debug=9,\
 file.logfile=/var/log/qemu/qemu-gfapi.log

Signed-off-by: Prasanna Kumar Kalever 
---
 block/gluster.c | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 16f7778..6875429 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -24,6 +24,7 @@ typedef struct GlusterAIOCB {
 typedef struct BDRVGlusterState {
 struct glfs *glfs;
 struct glfs_fd *fd;
+const char *logfile;
 bool supports_seek_data;
 int debug_level;
 } BDRVGlusterState;
@@ -34,6 +35,7 @@ typedef struct GlusterConf {
 char *volname;
 char *image;
 char *transport;
+const char *logfile;
 int debug_level;
 } GlusterConf;
 
@@ -181,7 +183,8 @@ static struct glfs *qemu_gluster_init(GlusterConf *gconf, const char *filename,
 ret = qemu_gluster_parseuri(gconf, filename);
 if (ret < 0) {
 error_setg(errp, "Usage: file=gluster[+transport]://[server[:port]]/"
-   "volname/image[?socket=...]");
+   "volname/image[?socket=...][,file.debug=N]"
+   "[,file.logfile=/path/filename.log]");
 errno = -ret;
 goto out;
 }
@@ -197,7 +200,7 @@ static struct glfs *qemu_gluster_init(GlusterConf *gconf, const char *filename,
 goto out;
 }
 
-ret = glfs_set_logging(glfs, "-", gconf->debug_level);
+ret = glfs_set_logging(glfs, gconf->logfile, gconf->debug_level);
 if (ret < 0) {
 goto out;
 }
@@ -256,6 +259,8 @@ static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret, void *arg)
 }
 
 #define GLUSTER_OPT_FILENAME "filename"
+#define GLUSTER_OPT_LOGFILE "logfile"
+#define GLUSTER_LOGFILE_DEFAULT "-" /* '-' handled in libgfapi as /dev/stderr */
 #define GLUSTER_OPT_DEBUG "debug"
 #define GLUSTER_DEBUG_DEFAULT 4
 #define GLUSTER_DEBUG_MAX 9
@@ -271,6 +276,11 @@ static QemuOptsList runtime_opts = {
 .help = "URL to the gluster image",
 },
 {
+.name = GLUSTER_OPT_LOGFILE,
+.type = QEMU_OPT_STRING,
+.help = "Logfile path of libgfapi",
+},
+{
 .name = GLUSTER_OPT_DEBUG,
 .type = QEMU_OPT_NUMBER,
 .help = "Gluster log level, valid range is 0-9",
@@ -339,6 +349,12 @@ static int qemu_gluster_open(BlockDriverState *bs,  QDict *options,
 
 filename = qemu_opt_get(opts, GLUSTER_OPT_FILENAME);
 
+s->logfile = qemu_opt_get(opts, GLUSTER_OPT_LOGFILE);
+if (!s->logfile) {
+s->logfile = GLUSTER_LOGFILE_DEFAULT;
+}
+gconf->logfile = s->logfile;
+
 s->debug_level = qemu_opt_get_number(opts, GLUSTER_OPT_DEBUG,
  GLUSTER_DEBUG_DEFAULT);
 if (s->debug_level < 0) {
@@ -422,6 +438,7 @@ static int qemu_gluster_reopen_prepare(BDRVReopenState *state,
 
 gconf = g_new0(GlusterConf, 1);
 
+gconf->logfile = s->logfile;
 gconf->debug_level = s->debug_level;
 reop_s->glfs = qemu_gluster_init(gconf, state->bs->filename, errp);
 if (reop_s->glfs == NULL) {
@@ -556,6 +573,11 @@ static int qemu_gluster_create(const char *filename,
 char *tmp = NULL;
 GlusterConf *gconf = g_new0(GlusterConf, 1);
 
+gconf->logfile = qemu_opt_get_del(opts, GLUSTER_OPT_LOGFILE);
+if (!gconf->logfile) {
+gconf->logfile = GLUSTER_LOGFILE_DEFAULT;
+}
+
 gconf->debug_level = qemu_opt_get_number_del(opts, GLUSTER_OPT_DEBUG,
  GLUSTER_DEBUG_DEFAULT);
 if (gconf->debug_level < 0) {
@@ -949,6 +971,11 @@ static QemuOptsList qemu_gluster_create_opts = {
 .help = "Preallocation mode (allowed values: off, full)"
 },
 {
+.name = GLUSTER_OPT_LOGFILE,
+.type = QEMU_OPT_STRING,
+.help = "Logfile path of libgfapi",
+},
+{
 .name = GLUSTER_OPT_DEBUG,
 .type = QEMU_OPT_NUMBER,
 .help = "Gluster log level, valid range is 0-9",
-- 
2.7.4
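
The same fallback appears twice in the patch (in the open and create paths). A minimal sketch of that logic as one helper, assuming nothing beyond what the patch shows: `gluster_pick_logfile()` is a hypothetical name, and the define is copied from the patch.

```c
#include <stddef.h>

/* Copied from the patch: '-' is handled in libgfapi as /dev/stderr. */
#define GLUSTER_LOGFILE_DEFAULT "-"

/*
 * Hypothetical helper: use the path given via file.logfile if there is
 * one, otherwise fall back to the default.
 */
static const char *gluster_pick_logfile(const char *requested)
{
    return requested ? requested : GLUSTER_LOGFILE_DEFAULT;
}
```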

Re: [Qemu-devel] [PATCH] quorum: Only compile when supported

2016-07-05 Thread Alberto Garcia

On Tue 05 Jul 2016 10:45:21 AM CEST, Daniel P. Berrange wrote:

> The point of using qcrypto_hash_supports() is that it isolates the
> block code Makefile rules from the details of the current specific
> impl of the hash APIs in QEMU. As a prime example of why this is
> important, try rebasing to GIT master, and you'll find we no longer
> use gnutls for the hash APIs. We choose between libgcrypt, nettle or a
> empty stub for hash impls now. I think it is a backwards step to add
> back these makefile conditionals

Now that you mention this, I wonder why we are not using glib for the
hashing functions. GChecksum is available since glib 2.16 (QEMU requires
2.22) and it supports MD5, SHA1, SHA256 and SHA512. I see that in git
master there are now a few more algorithms, but for the Quorum case those
are enough.

Berto

Re: [Qemu-devel] [PATCH 08/24] vhost-user: return a read error

2016-07-05 Thread Marc-André Lureau

Hi

On Tue, Jul 5, 2016 at 12:35 AM, Michael S. Tsirkin  wrote:
> On Mon, Jul 04, 2016 at 11:56:56PM +0200, Marc-André Lureau wrote:
>> Hi
>>
>> On Mon, Jul 4, 2016 at 5:47 PM, Michael S. Tsirkin  wrote:
>> > Why does vhost_user_set_log_base need to return error?
>> > If backend is not there to handle this message,
>> > then it is not changing memory so it's ok to ignore the error.
>>
>> How do you know it's not changing the memory?
>
> either it closed socket intentionally or it exited
> and kernel cleaned up.

And if it closed intentionally during migration, we want to catch this
as a bug, since it may still modify the memory.

>> Furthermore, if the migration happened, it's because backend claim
>> VHOST_F_LOG_ALL, thus it should really not fail
>
> I don't see why - could you explain pls?

If the backend claims migration support, it shouldn't have bad
migration behaviour such as closing the vhost-user socket.

>> > Same logic applies to many other messages.
>>
>> Pretty much all messages, the error can't be ignored, or operations
>> will just fail silentely randomly. I don't understand why vhost-user
>> io error can be ignored. Also it's quite inconsistent the way the code
>> is today, vhost_user_write() returns an error that is mostly ignored,
>> while vhost_user_read() is checked. Why having an error later when you
>> can report it earlier? I fail to understand the rationale of this
>> error handling.
>
> It's historical.  the way I see it, most errors due to disconnect
> can be ignored except
> maybe for the initial feature negotiation which is needed
> so we know what to tell guest.

The way I see it is that errors should not be ignored because it makes
it harder to track what is going on.

> Errors due to e.g. buffer being full should cause an assert
> as it's an internal qemu error.

I don't see why qemu would be responsible for, say, a suspended backend.

>
>
>>
>> --
>> Marc-André Lureau



-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH] quorum: Only compile when supported

On Tue, Jul 05, 2016 at 11:18:29AM +0200, Alberto Garcia wrote:
> On Tue 05 Jul 2016 10:45:21 AM CEST, Daniel P. Berrange wrote:
> 
> > The point of using qcrypto_hash_supports() is that it isolates the
> > block code Makefile rules from the details of the current specific
> > impl of the hash APIs in QEMU. As a prime example of why this is
> > important, try rebasing to GIT master, and you'll find we no longer
> > use gnutls for the hash APIs. We choose between libgcrypt, nettle or a
> > empty stub for hash impls now. I think it is a backwards step to add
> > back these makefile conditionals
> 
> Now that you mention this I wonder why we are not using glib for the
> hashing functions. GChecksum is available since glib 2.16 (QEMU requires
> 2.22) and it supports MD5, SHA1, SHA256 and SHA512. I see that in git
> master there's now a few algorithms more, but for the Quorum case those
> ones are enough.

The GChecksum API is inadequate for QEMU's needs, due to its limited
range of algorithms. We absolutely do not want different areas of
the code using different APIs either. The goal of the crypto APIs is
to provide a standard internal API for all cryptographic related
operations for use across the whole codebase. This has clarified much
of our code by removing countless #ifdef conditionals from the code
and similar from the build system. It also facilitates people auditing
QEMU use & implementation of crypto as there is only one place to look
at to review. It also ensures that QEMU is only using certified secure
crypto libraries, not some custom re-implementation of the crypto
algorithms that have never been through a security review. Finally, it
ensures that QEMU correctly responds to runtime configurable changes,
such as FIPS mode which restricts use of certain crypto algorithms
at runtime, even if they're technically available at compile time.

Regards,
Daniel

Re: [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches

* Michael S. Tsirkin (m...@redhat.com) wrote:
> On Mon, Jul 04, 2016 at 08:16:03PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" 
> > 
> > QEMU sets the guests physical address bits to 40; this is wrong
> > on most hardware, and can be detected by the guest.
> > It also stops you using really huge multi-TB VMs.
> > 
> > Red Hat has had a patch, that Andrea wrote, downstream for a couple
> > of years that reads the hosts value and uses that in the guest.  That's
> > correct as far as the guest sees it, and lets you create huge VMs.
> > 
> > The downside, is that if you've got a mix of hosts, say an i7 and a Xeon,
> > life gets complicated in migration; prior to 2.6 it all apparently
> > worked (although a guest that looked might spot the change).
> > In 2.6 Paolo started checking MSR writes and they failed when the
> > incoming MTRR mask didn't fit.
> > 
> > This series:
> >a) Fixes up mtrr masks so that if you're migrating between hosts
> >   of different physical address size it tries to do something sensible.
> > 
> >b) Lets you specify the guest physical address size via a CPU property, 
> > i.e.
> > -cpu SandyBridge,phys-bits=36
> > 
> >   The default on old machine types is to use the existing 40 bits value.
> > 
> >c) Lets you tell qemu to use the same setting as the host, i.e.
> > -cpu SandyBridge,phys-bits=0
> >  
> >   This is the default on new machine types.
> > 
> > Note that mixed size hosts are still not necessarily safe; a guest
> > started on a host with a large physical address size might start using
> > those bits and get upset when it's moved to a small host.
> > However that was already potentially broken in existing qemu that
> > used a magic value of 40.
> > 
> > There's potential to add some extra guards against people
> > doing silly stuff; e.g. stop people running VMs using 1TB of
> > address space on a tiny host.
> > 
> > Dave
> 
> This is all in target-i386 so if the maintainers want it this way, they
> can merge this, and I do not have strong objections, but I wanted to
> document an alternative that is IMHO somewhat nicer. Feel free to
> ignore.  See below.
> 
> How can guest use more memory than what host supports?
> I think there are two ways:
> 
> 1. more memory than host supports is supplied
>This is a configuration error. We can simply detect this
>and fail init, or print a warning, no need for new flags.

Yes we should do that; however there's a case that's potentially
currently working for people but actually kind of illegal.
That case is specifying a small amount of actual memory
but a large maxmem - i.e.:

 -m 2G,slots=16,maxmem=2T

On a host with a 39bit physaddress limit do you error
on that or not?  I think oVirt is currently doing something
similar to that, but I'm trying to get confirmation.
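
To put numbers on that example: a 39-bit limit covers 2^39 bytes (512 GiB), while maxmem=2T needs addresses up to 2^41 - 1, i.e. 41 bits. A minimal sketch of such a check, assuming RAM starts at address 0 and ignoring the PCI hole and anything else mapped above RAM; `bits_needed()` and `maxmem_fits()` are hypothetical names, not QEMU functions.

```c
#include <stdint.h>

/* Smallest number of address bits that can address 'bytes' bytes. */
static unsigned int bits_needed(uint64_t bytes)
{
    unsigned int bits = 0;
    uint64_t last = bytes ? bytes - 1 : 0; /* highest address used */

    while (last) {
        bits++;
        last >>= 1;
    }
    return bits;
}

/* Hypothetical check: does maxmem fit below the phys_bits limit? */
static int maxmem_fits(uint64_t maxmem_bytes, unsigned int phys_bits)
{
    return bits_needed(maxmem_bytes) <= phys_bits;
}
```

Whether a failed check should be an error or only a warning is exactly the open question above, since the 2G of RAM actually plugged in fits comfortably.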

> 2. pci addresses out of host range assigned by guest
>Again normally at least seabios will not do this,
>maybe OVMF will?
>we certainly can add an interface telling firmware
>what the limit is.
> 
> Thus an alternative is:
> - add interface to tell QEMU how much 64 bit memory can pci use.
> - teach firmware to limit itself to that
> - set guest bits to 48 unconditionally
> 
> 
> the disadvantage of this approach is that firmware needs to be changed

I guess it also needs the CRS to tell the guest OS not
to remap PCI stuff into that space?  I thought also from the previous
discussions that the guest would get a different exception if it
actually tried to use any of the bits below 48 it didn't have.

> the advantage is that we get seemless migration between different
> hosts as long as they both can support the configuration,
> without any management effort.

The reality (Linux guest) is that this already works as long as you don't
map anything into the high address space, and the firmware won't do
that unless it's pushed to by an excessive maxmem or huge
64bit PCI bars.

Dave

> 
> > 
> > v2
> >   Default on new machine types is to read from the host
> >   Use the MAKE_64BIT_MASK macro
> >   Validate phys_bits in the realise method
> >   Move reading the host physical bits to the realise method
> >   Set phys-bits even for 32bit guests
> >   Add warning when your phys-bits doesn't match your host in the none
> > default case
> > 
> > Dr. David Alan Gilbert (6):
> >   x86: Allow physical address bits to be set
> >   x86: Mask mtrr mask based on CPU physical address limits
> >   x86: fill high bits of mtrr mask
> >   x86: Set physical address bits based on host
> >   x86: fix up 32 bit phys_bits case
> >   x86: Add sanity checks on phys_bits
> > 
> >  include/hw/i386/pc.h | 10 
> >  target-i386/cpu.c| 71 
> > ++--
> >  target-i386/cpu.h|  6 +
> >  target-i386/kvm.c| 36 +++---
> >  4 files changed, 112 insertions(+), 11 deletions(-)
> > 
> > -- 
> > 2.7.4
--
Dr. David Alan Gilbert / dgi

Re: [Qemu-devel] [PATCH] quorum: Only compile when supported

On Tue, Jul 05, 2016 at 04:57:58PM +0800, Fam Zheng wrote:
> On Tue, 07/05 09:45, Daniel P. Berrange wrote:
> > On Tue, Jun 28, 2016 at 09:47:47AM +0800, Fam Zheng wrote:
> > > This was the only exceptional module init function that does something
> > > else than a simple list of bdrv_register() calls, in all the block
> > > drivers.
> > > 
> > > The qcrypto_hash_supports is actually a static check, determined at
> > > compile time.  Follow the block-job-$(CONFIG_FOO) convention for
> > > consistency.
> > > 
> > > Signed-off-by: Fam Zheng 
> > 
> > The point of using qcrypto_hash_supports() is that it isolates the
> > block code Makefile rules from the details of the current specific
> > impl of the hash APIs in QEMU. As a prime example of why this is
> > important, try rebasing to GIT master, and you'll find we no longer
> > use gnutls for the hash APIs. We choose between libgcrypt, nettle
> > or a empty stub for hash impls now. I think it is a backwards step
> > to add back these makefile conditionals
> 
> I don't want to spoil the isolation, but I think it's also worth to keep the
> reasoning of whether a driver is supported as simple as possible. In other
> words, if it's determined at configure time, there is no reason to use a
> runtime check, right?

While the crypto algorithms may all be built-in at compile time, there can
be situations where they are forbidden from use at runtime. For example
the FIPS mode activation will disable many algorithms from use. You might
say that this doesn't matter since FIPS mode doesn't affect SHA256, but the
point of having the generic crypto API inside QEMU is to stop us sprinkling
these kinds of ad-hoc assumptions about crypto policies across the code base.

Can you back up and explain in more detail what the actual problem you're trying
to solve is? IIUC, it is related to module loading, but I'm not seeing exactly
what it is. Surely when we load the quorum.so module, we'll just invoke the
bdrv_quorum_init() method as normal, so I would have expected the current
logic to continue to "just work", i.e. just because we load a module does
not mean that module should be required to register its block driver.

The other alternative though is to simply remove the hash check from the
init method *and* unconditionally compile it, and simply let the
quorum_open() method do the qcrypto_hash_supports() check. This would
be the same way that the LUKS block driver works - it has many crypto
algorithms in use, chosen dynamically, so it has no choice but to test
this at open() time.
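
A rough sketch of that alternative, as a stand-alone model rather than QEMU code: `hash_supports()` stands in for qcrypto_hash_supports(), `fips_mode` models a runtime policy switch, and `quorum_open_sketch()` is a hypothetical name. The assumption that MD5 is what the policy refuses is for illustration only.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the QEMU crypto API. */
enum hash_alg { HASH_ALG_MD5, HASH_ALG_SHA256 };

static bool fips_mode; /* models a runtime-configurable policy */

static bool hash_supports(enum hash_alg alg)
{
    /* Built in at compile time, but possibly refused at runtime. */
    if (fips_mode && alg == HASH_ALG_MD5) {
        return false;
    }
    return true;
}

/*
 * Registration stays unconditional; the support check moves to open
 * time, where it sees the policy in force when the device is opened.
 */
static int quorum_open_sketch(enum hash_alg alg)
{
    if (!hash_supports(alg)) {
        return -ENOTSUP;
    }
    return 0;
}
```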

> > > diff --git a/block/quorum.c b/block/quorum.c
> > > index 331b726..18fbed8 100644
> > > --- a/block/quorum.c
> > > +++ b/block/quorum.c
> > > @@ -1113,10 +1113,6 @@ static BlockDriver bdrv_quorum = {
> > >  
> > >  static void bdrv_quorum_init(void)
> > >  {
> > > -if (!qcrypto_hash_supports(QCRYPTO_HASH_ALG_SHA256)) {
> > > -/* SHA256 hash support is required for quorum device */
> > > -return;
> > > -}
> > >  bdrv_register(&bdrv_quorum);
> > >  }

Regards,
Daniel

Re: [Qemu-devel] [PATCH v2 5/6] x86: fix up 32 bit phys_bits case

On Mon, Jul 04, 2016 at 08:16:08PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> On 32 bit systems fix up phys_bits to be consistent with what
> we tell the guest; don't ever bother with using the phys_bits
> property.

> @@ -2990,6 +2986,15 @@ static void x86_cpu_realizefn(DeviceState *dev, Error 
> **errp)
>  cpu->phys_bits = host_phys_bits;
>  
>  }
> +} else {
> +/* For 32 bit systems don't use the user set value, but keep
> + * phys_bits consistent with what we tell the guest.
> + */
> +if (env->features[FEAT_1_EDX] & CPUID_PSE36) {
> +cpu->phys_bits = 36;
> +} else {
> +cpu->phys_bits = 32;
> +}

I kind of feel like we should report an error and exit if the
user/app has provided a phys_bits property value, rather than
silently ignoring their provided value, on the basis that this
is a user/app configuration error.
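
Something along these lines, as a sketch only: `phys_bits_32bit()` is a hypothetical helper, and the `user_set` flag is an assumption, since the quoted hunk does not show how "the user supplied a value" would be tracked.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch: return the phys_bits to use for a 32 bit guest,
 * or -1 if the user supplied a value (treated as a configuration error
 * instead of being silently ignored).
 */
static int phys_bits_32bit(bool has_pse36, bool user_set)
{
    if (user_set) {
        return -1; /* caller would report an error and exit */
    }
    return has_pse36 ? 36 : 32;
}
```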

Regards,
Daniel

Re: [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches

On Mon, Jul 04, 2016 at 08:16:03PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> QEMU sets the guests physical address bits to 40; this is wrong
> on most hardware, and can be detected by the guest.
> It also stops you using really huge multi-TB VMs.
> 
> Red Hat has had a patch, that Andrea wrote, downstream for a couple
> of years that reads the hosts value and uses that in the guest.  That's
> correct as far as the guest sees it, and lets you create huge VMs.
> 
> The downside, is that if you've got a mix of hosts, say an i7 and a Xeon,
> life gets complicated in migration; prior to 2.6 it all apparently
> worked (although a guest that looked might spot the change).
> In 2.6 Paolo started checking MSR writes and they failed when the
> incoming MTRR mask didn't fit.
> 
> This series:
>a) Fixes up mtrr masks so that if you're migrating between hosts
>   of different physical address size it tries to do something sensible.
> 
>b) Lets you specify the guest physical address size via a CPU property, 
> i.e.
> -cpu SandyBridge,phys-bits=36
> 
>   The default on old machine types is to use the existing 40 bits value.
> 
>c) Lets you tell qemu to use the same setting as the host, i.e.
> -cpu SandyBridge,phys-bits=0
>  
>   This is the default on new machine types.

As a general rule we've tried to say that if you pick an explicit CPU
model, we're migration safe. By having the phys-bits default value
always reflect the host CPU value, it feels like we've made the explicit
CPU model choice less safe, just like -cpu host is.

IOW, if choosing a named CPU model, it feels like we should have a
corresponding fixed phys-bit value for that CPU model, even if it
has to be quite conservative (eg default to bits=36). A phys-bits=0
value should only be used with  -cpu host.


Regards,
Daniel

Re: [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches

* Daniel P. Berrange (berra...@redhat.com) wrote:
> On Mon, Jul 04, 2016 at 08:16:03PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" 
> > 
> > QEMU sets the guests physical address bits to 40; this is wrong
> > on most hardware, and can be detected by the guest.
> > It also stops you using really huge multi-TB VMs.
> > 
> > Red Hat has had a patch, that Andrea wrote, downstream for a couple
> > of years that reads the hosts value and uses that in the guest.  That's
> > correct as far as the guest sees it, and lets you create huge VMs.
> > 
> > The downside, is that if you've got a mix of hosts, say an i7 and a Xeon,
> > life gets complicated in migration; prior to 2.6 it all apparently
> > worked (although a guest that looked might spot the change).
> > In 2.6 Paolo started checking MSR writes and they failed when the
> > incoming MTRR mask didn't fit.
> > 
> > This series:
> >a) Fixes up mtrr masks so that if you're migrating between hosts
> >   of different physical address size it tries to do something sensible.
> > 
> >b) Lets you specify the guest physical address size via a CPU property, 
> > i.e.
> > -cpu SandyBridge,phys-bits=36
> > 
> >   The default on old machine types is to use the existing 40 bits value.
> > 
> >c) Lets you tell qemu to use the same setting as the host, i.e.
> > -cpu SandyBridge,phys-bits=0
> >  
> >   This is the default on new machine types.
> 
> As a general rule we've tried to say that if you pick an explicit CPU
> model, we're migration safe. By having the phys-bits default value
> always reflect the host CPU value, it feels like we've made the explicit
> CPU model choice less safe, just like -cpu host is.
> 
> IOW, if choosing a named CPU model, it feels like we should have a
> corresponding fixed phys-bit value for that CPU model, even if it
> > has to be quite conservative (eg default to bits=36). A phys-bits=0
> value should only be used with  -cpu host.

phys-bits doesn't follow a cpu model in real hardware
  e.g. a SandyBridge Xeon and a SandyBridge i7 are different.

So unless we suddenly created at least 2x as many cpu models we can't
do that.

Dave

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd

2016-07-05 Thread Baptiste Reynal

On Tue, Jul 5, 2016 at 3:49 AM, Hailiang Zhang
 wrote:
> On 2016/7/4 20:22, Baptiste Reynal wrote:
>>
>> On Thu, Jan 7, 2016 at 1:19 PM, zhanghailiang
>>  wrote:
>>>
>>> For now, we still didn't support live memory snapshot, we have discussed
>>> a scheme which based on userfaultfd long time ago.
>>> You can find the discussion by the follow link:
>>> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01779.html
>>>
>>> The scheme is based on userfaultfd's write-protect capability.
>>> The userfaultfd write protection feature is available here:
>>> http://www.spinics.net/lists/linux-mm/msg97422.html
>>>
>>> The process of this live memory scheme is like bellow:
>>> 1. Pause VM
>>> 2. Enable write-protect fault notification by using userfaultfd to
>>> mark VM's memory to write-protect (readonly).
>>> 3. Save VM's static state (here is device state) to snapshot file
>>> 4. Resume VM, VM is going to run.
>>> 5. Snapshot thread begins to save VM's live state (here is RAM) into
>>> snapshot file.
>>> 6. During this time, all the actions of writing VM's memory will be
>>> blocked
>>>by kernel, and kernel will wakeup the fault treating thread in qemu to
>>>process this write-protect fault. The fault treating thread will
>>> deliver this
>>>page's address to snapshot thread.
>>> 7. snapshot thread gets this address, save this page into snasphot file,
>>> and then remove the write-protect by using userfaultfd API, after
>>> that,
>>> the actions of writing will be recovered.
>>> 8. Repeat step 5~7 until all VM's memory is saved to snapshot file
>>>
>>> Compared with the feature of 'migrate VM's state to file',
>>> the main difference for live memory snapshot is it has little time delay
>>> for
>>> catching VM's state. It just captures the VM's state while got users
>>> snapshot
>>> command, just like take a photo of VM's state.
>>>
>>> For now, we only support tcg accelerator, since userfaultfd is not
>>> supporting
>>> tracking write faults for KVM.
>>>
>>> Usage:
>>> 1. Take a snapshot
>>> #x86_64-softmmu/qemu-system-x86_64 -machine
>>> pc-i440fx-2.5,accel=tcg,usb=off -drive
>>> file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none
>>> -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m
>>> 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0
>>> --monitor stdio
>>> Issue snapshot command:
>>> (qemu)migrate -d file:/home/Snapshot
>>> 2. Revert to the snapshot
>>> #x86_64-softmmu/qemu-system-x86_64 -machine
>>> pc-i440fx-2.5,accel=tcg,usb=off -drive
>>> file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none
>>> -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1  -vnc :7 -m
>>> 8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0
>>> --monitor stdio -incoming file:/home/Snapshot
>>>
>>> NOTE:
>>> The userfaultfd write protection feature does not support THP for now,
>>> Before taking snapshot, please disable THP by:
>>> echo never > /sys/kernel/mm/transparent_hugepage/enabled
>>>
>>> TODO:
>>> - Reduce the influence for VM while taking snapshot
>>>
>>> zhanghailiang (13):
>>>postcopy/migration: Split fault related state into struct
>>>  UserfaultState
>>>migration: Allow the migrate command to work on file: urls
>>>migration: Allow -incoming to work on file: urls
>>>migration: Create a snapshot thread to realize saving memory snapshot
>>>migration: implement initialization work for snapshot
>>>QEMUSizedBuffer: Introduce two help functions for qsb
>>>savevm: Split qemu_savevm_state_complete_precopy() into two helper
>>>  functions
>>>snapshot: Save VM's device state into snapshot file
>>>migration/postcopy-ram: fix some helper functions to support
>>>  userfaultfd write-protect
>>>snapshot: Enable the write-protect notification capability for VM's
>>>  RAM
>>>snapshot/migration: Save VM's RAM into snapshot file
>>>migration/ram: Fix some helper functions' parameter to use
>>>  PageSearchStatus
>>>snapshot: Remove page's write-protect and copy the content during
>>>  setup stage
>>>
>>>   include/migration/migration.h |  41 +--
>>>   include/migration/postcopy-ram.h  |   9 +-
>>>   include/migration/qemu-file.h |   3 +-
>>>   include/qemu/typedefs.h   |   1 +
>>>   include/sysemu/sysemu.h   |   3 +
>>>   linux-headers/linux/userfaultfd.h |  21 +++-
>>>   migration/fd.c|  51 -
>>>   migration/migration.c | 101 -
>>>   migration/postcopy-ram.c  | 229
>>> --
>>>   migration/qemu-file-buf.c |  61 ++
>>>   migration/ram.c   | 104 -
>>>   migration/savevm.c|  90 ---
>>>   trace-events  |   1 +
>>>   13 files changed, 587 insertions(+),

Re: [Qemu-devel] [PATCH v2 0/3] seabios: add serial console support

2016-07-05 Thread Gerd Hoffmann

On Di, 2016-07-05 at 09:06 +0100, Daniel P. Berrange wrote:
> On Mon, Jul 04, 2016 at 10:39:51PM +0200, Gerd Hoffmann wrote:
> >   Hi,
> > 
> > Next round of patches.  Changes:
> > 
> >  * Moved it all to a new sercon.c file.
> >  * Code maps cp437 to utf8 now, giving a much nicer display.  Compare
> >"Use the ↑ and ↓ keys to change the selection." (this series) with
> >"Use the ^ and v keys to change the selection." (sgabios)  ;-)
> >  * Simplified keyboard code, using enqueue_key now.
> >  * Restructed code, to cleanup things and to address review comments.
> 
> Currently libvirt has an option to turn on serial console support
> for the BIOS. When this is set it adds the sga device. How will
> libvirt know when seabios has this feature built-in, and thus does
> not need to add the sga device ? 

Current code activates the serial console (unconditionally) in case no
vga is present.  I also want to support output on both vga and serial
console, but code for the latter isn't there yet.

What the final default behavior will be is not clear yet.  Not enabled?
Enabled in case no VGA is present?  Enabled unconditionally (similar to
ovmf)?

Likewise it is not clear yet how we'll enable/disable this.

One option would be to continue using sgabios.bin in fw_cfg.  But
instead of loading and executing it, activate the built-in serial console
if we find the file.  That would make the switch completely transparent
to upper layers.  But it is quite hackish of course ...

Comments and ideas are welcome.

cheers,
  Gerd

Re: [Qemu-devel] [PATCH] quorum: Only compile when supported

On Tue, Jul 05, 2016 at 10:26:56AM +0100, Daniel P. Berrange wrote:
> On Tue, Jul 05, 2016 at 11:18:29AM +0200, Alberto Garcia wrote:
> > On Tue 05 Jul 2016 10:45:21 AM CEST, Daniel P. Berrange wrote:
> > 
> > > The point of using qcrypto_hash_supports() is that it isolates the
> > > block code Makefile rules from the details of the current specific
> > > impl of the hash APIs in QEMU. As a prime example of why this is
> > > important, try rebasing to GIT master, and you'll find we no longer
> > > use gnutls for the hash APIs. We choose between libgcrypt, nettle or a
> > > empty stub for hash impls now. I think it is a backwards step to add
> > > back these makefile conditionals
> > 
> > Now that you mention this I wonder why we are not using glib for the
> > hashing functions. GChecksum is available since glib 2.16 (QEMU requires
> > 2.22) and it supports MD5, SHA1, SHA256 and SHA512. I see that in git
> > master there's now a few algorithms more, but for the Quorum case those
> > ones are enough.
> 
> The GChecksum API is inadequate for QEMU's needs, due to its limited
> range of algorithms. We absolutely do not want different areas of
> the code using different APIs either. The goal of the crypto APIs is
> to provide a standard internal API for all cryptographic related
> operations for use across the whole codebase. This has clarified much
> of our code by removing countless #ifdef conditionals from the code
> and similar from the build system. It also facilitates people auditing
> QEMU use & implementation of crypto as there is only one place to look
> at to review. It also ensures that QEMU is only using certified secure
> crypto libraries, not some custom re-implementation of the crypto
> algorithms that have never been through a security review. Finally is
> ensures that QEMU correctly responds to runtime configurable changes,
> such as FIPS mode which restricts use of certain crypto algorithms
> at runtime, even if they're technically available at compile time.

Actually, having said that, what we could do is replace the no-op
stub hash impl with a GChecksum-based impl. That way, if neither
gcrypt nor nettle is available, we can fall back to GChecksum for a
subset of the hash algorithms.

That would allow us to move the qcrypto_hash_supports() check out
of the quorum register method and into its open() method, and
avoid any Makefile.objs conditionals.
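
To illustrate the shape of that idea, here is a rough sketch. All names
in it (HashAlg, HashBackend, hash_supports, quorum_open_check) are made
up for illustration and are not the QEMU crypto API. The set of
algorithms the glib fallback covers is taken from the list quoted above,
and the use of SHA256 at open time is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical algorithm and backend identifiers, for illustration only. */
typedef enum {
    ALG_MD5, ALG_SHA1, ALG_SHA256, ALG_SHA512, ALG_RIPEMD160, ALG_COUNT
} HashAlg;

typedef enum { BACKEND_GCRYPT, BACKEND_NETTLE, BACKEND_GLIB } HashBackend;

/* Runtime query: can the selected backend compute this algorithm? */
static bool hash_supports(HashBackend backend, HashAlg alg)
{
    if (alg >= ALG_COUNT) {
        return false;
    }
    switch (backend) {
    case BACKEND_GLIB:
        /* Fallback covers only the subset mentioned in the thread. */
        return alg == ALG_MD5 || alg == ALG_SHA1 ||
               alg == ALG_SHA256 || alg == ALG_SHA512;
    case BACKEND_GCRYPT:
    case BACKEND_NETTLE:
        /* Assumed to cover every algorithm in this sketch. */
        return true;
    }
    return false;
}

/*
 * Check done when the driver is opened rather than at build time.
 * Returns 0 on success, -1 if the needed hash is unavailable.
 */
static int quorum_open_check(HashBackend backend)
{
    return hash_supports(backend, ALG_SHA256) ? 0 : -1;
}
```

With a check like this in open(), the driver is always compiled and
registered, and an unsupported hash only shows up as an open failure.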


Regards,
Daniel

Re: [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches

On Tue, Jul 05, 2016 at 10:33:25AM +0100, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (m...@redhat.com) wrote:
> > On Mon, Jul 04, 2016 at 08:16:03PM +0100, Dr. David Alan Gilbert (git) 
> > wrote:
> > > From: "Dr. David Alan Gilbert" 
> > > 
> > > QEMU sets the guests physical address bits to 40; this is wrong
> > > on most hardware, and can be detected by the guest.
> > > It also stops you using really huge multi-TB VMs.
> > > 
> > > Red Hat has had a patch, that Andrea wrote, downstream for a couple
> > > of years that reads the hosts value and uses that in the guest.  That's
> > > correct as far as the guest sees it, and lets you create huge VMs.
> > > 
> > > The downside, is that if you've got a mix of hosts, say an i7 and a Xeon,
> > > life gets complicated in migration; prior to 2.6 it all apparently
> > > worked (although a guest that looked might spot the change).
> > > In 2.6 Paolo started checking MSR writes and they failed when the
> > > incoming MTRR mask didn't fit.
> > > 
> > > This series:
> > >a) Fixes up mtrr masks so that if you're migrating between hosts
> > >   of different physical address size it tries to do something 
> > > sensible.
> > > 
> > >b) Lets you specify the guest physical address size via a CPU 
> > > property, i.e.
> > > -cpu SandyBridge,phys-bits=36
> > > 
> > >   The default on old machine types is to use the existing 40 bits 
> > > value.
> > > 
> > >c) Lets you tell qemu to use the same setting as the host, i.e.
> > > -cpu SandyBridge,phys-bits=0
> > >  
> > >   This is the default on new machine types.
> > > 
> > > Note that mixed size hosts are still not necessarily safe; a guest
> > > started on a host with a large physical address size might start using
> > > those bits and get upset when it's moved to a small host.
> > > However that was already potentially broken in existing qemu that
> > > used a magic value of 40.
> > > 
> > > There's potential to add some extra guards against people
> > > doing silly stuff; e.g. stop people running VMs using 1TB of
> > > address space on a tiny host.
> > > 
> > > Dave
> > 
> > This is all in target-i386 so if the maintainers want it this way, they
> > can merge this, and I do not have strong objections, but I wanted to
> > document an alternative that is IMHO somewhat nicer. Feel free to
> > ignore.  See below.
> > 
> > How can guest use more memory than what host supports?
> > I think there are two ways:
> > 
> > 1. more memory than host supports is supplied
> >This is a configuration error. We can simply detect this
> >and fail init, or print a warning, no need for new flags.
> 
> Yes we should do that; however there's a case that's potentially
> currently working for people but actually kind of illegal.
> That case is specifying a small amount of actual memory
> but a large maxmem - i.e.:
> 
>  -m 2G,slots=16,maxmem=2T
> 
> On a host with a 39bit physaddress limit do you error
> on that or not?  I think oVirt is currently doing something
> similar to that, but I'm trying to get confirmation.

That would only be a problem since pci is allocated above
maxmem so 64 bit pci addresses aren't accessible.
With my proposal we can actually force firmware to avoid
using 64 bit memory for that config.
Will work better than today.
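
To make the numbers in that example concrete, a minimal sketch of the
arithmetic follows. The helper names phys_limit and fits_in_phys_bits
are hypothetical and not part of the series: a 39 bit host can address
512G, so a guest address space reaching 2T does not fit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Size in bytes of the address space covered by phys_bits address bits.
 * Only widths 1..63 are handled, which covers x86 (at most 52).
 */
static uint64_t phys_limit(unsigned phys_bits)
{
    assert(phys_bits > 0 && phys_bits < 64);
    return UINT64_C(1) << phys_bits;
}

/* Does a guest address space ending at top_addr fit in phys_bits bits? */
static bool fits_in_phys_bits(uint64_t top_addr, unsigned phys_bits)
{
    return top_addr <= phys_limit(phys_bits);
}
```

For maxmem=2T the top address is at least 2^41, which is above the 2^39
limit of such a host, while the 2G of initial memory fits easily.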


> > 2. pci addresses out of host range assigned by guest
> >Again normally at least seabios will not do this,
> >maybe OVMF will?
> >we certainly can add an interface telling firmware
> >what the limit is.
> > 
> > Thus an alternative is:
> > - add interface to tell QEMU how much 64 bit memory can pci use.
> > - teach firmware to limit itself to that
> > - set guest bits to 48 unconditionally
> > 
> > 
> > the disadvantage of this approach is that firmware needs to be changed
> 
> I guess it also needs the CRS to tell the guest OS not
> to remap PCI stuff into that space?

CRS is a list of legal addresses, not list of illegal ones.
So just don't include what's illegal there.

>  I thought also from the previous
> discussions that the guest would get a different exception if it
> actually tried to use any of the bits below 48 it didn't have.

Basically if you try to map pci at an address outside CRS
you can get any kind of crash since there could be on-board
hardware handling these addresses.
So I do not think we care about that.


> > the advantage is that we get seemless migration between different
> > hosts as long as they both can support the configuration,
> > without any management effort.
> 
> The reality (Linux guest) is that this already works as long as you don't
> map anything into the high address space, and the firmware wont do
> that unless it's pushed to by an excessive maxmem or huge
> 64bit PCI bars.
> 
> Dave

Right. So the disadvantage isn't big at all, and I think the advantages
outweigh it.

> > 
> > > 
> > > v2
> > >   Default on new machine types is to read from the host
> > >   Use the MAKE_64BIT_MASK macro
>
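
For readers unfamiliar with the macro mentioned in the v2 notes, here is
a sketch of how a mask of that kind could be used to clamp an incoming
MTRR mask to the guest's physical address width. MASK64 is modeled on
QEMU's MAKE_64BIT_MASK(shift, length), and clamp_mtrr_mask is a
hypothetical helper; this is not the code from the series.

```c
#include <assert.h>
#include <stdint.h>

/* 'length' one-bits starting at bit 'shift'; valid for 1 <= length <= 64. */
#define MASK64(shift, length) \
    (((~UINT64_C(0)) >> (64 - (length))) << (shift))

/* Drop mask bits at or above the given physical address width. */
static uint64_t clamp_mtrr_mask(uint64_t mask, unsigned phys_bits)
{
    assert(phys_bits >= 1 && phys_bits <= 52);
    return mask & MASK64(0, phys_bits);
}
```

A mask built for 40 address bits (bits 39:12 plus the valid bit 11) would
lose bits 39:36 when clamped to a 36 bit destination.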

Re: [Qemu-devel] [PULL 00/36] pc, pci, virtio: new features, cleanups, fixes

2016-07-05 Thread Peter Maydell

On 4 July 2016 at 17:46, Michael S. Tsirkin  wrote:
> The following changes since commit e2c8f9e44e07d8210049abaa6042ec3c956f1dd4:
>
>   Merge remote-tracking branch 'remotes/thibault/tags/samuel-thibault' into 
> staging (2016-07-04 10:49:17 +0100)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
>
> for you to fetch changes up to 278a2a21f80031f7f5e9c436df96a13860726107:
>
>   vmw_pvscsi: remove unnecessary internal msi state flag (2016-07-04 19:43:39 
> +0300)
>
> 
> pc, pci, virtio: new features, cleanups, fixes
>
> iommus can not be added with -device.
> cleanups and fixes all over the place
>
> Signed-off-by: Michael S. Tsirkin 
>
> 

Hi. I'm afraid this fails 'make check' on 32-bit ARM:

TEST: tests/bios-tables-test... (pid=6348)
  /i386/acpi/piix4/tcg:OK
  /i386/acpi/piix4/tcg/bridge: OK
  /i386/acpi/piix4/tcg/ipmi:   OK
  /i386/acpi/piix4/tcg/cpuhp:  OK
  /i386/acpi/piix4/tcg/pxb:
qemu-system-i386: -object
memory-backend-file,size=4G,mem-path=/tmp/shmem,share,id=mb: memory
size 0x0 must be equal to or larger than page size 0x1000
socket_accept failed: Resource temporarily unavailable
**
ERROR:/home/petmay01/qemu/tests/libqtest.c:197:qtest_init: assertion
failed: (s->fd >= 0 && s->qmp_fd >= 0)
FAIL
GTester: last random seed: R02Sc223c7a9fe71ba2c8b765236341f9eb4
(pid=6367)
  /i386/acpi/q35/tcg:  OK
  /i386/acpi/q35/tcg/bridge:   OK
  /i386/acpi/q35/tcg/ipmi: OK
  /i386/acpi/q35/tcg/cpuhp:OK
  /i386/acpi/q35/tcg/pxb-pcie:
qemu-system-i386: -object
memory-backend-file,size=4G,mem-path=/tmp/shmem,share,id=mb: memory
size 0x0 must be equal to or larger than page size 0x1000
socket_accept failed: Resource temporarily unavailable
**
ERROR:/home/petmay01/qemu/tests/libqtest.c:197:qtest_init: assertion
failed: (s->fd >= 0 && s->qmp_fd >= 0)
FAIL
GTester: last random seed: R02S99f74d1571ae513e9bc2f6919818463a
(pid=6386)
FAIL: tests/bios-tables-test
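
One possible explanation for "memory size 0x0" with size=4G on a 32-bit
host is a 64-bit byte count being narrowed to a 32-bit type somewhere.
That is only a guess, not something confirmed here. The sketch below
just shows the effect; narrow_size and size_fits_32 are hypothetical
helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What an unchecked conversion to a 32-bit size type would yield. */
static uint32_t narrow_size(uint64_t size)
{
    return (uint32_t)size;
}

/* Checked variant: reports whether the value survives the narrowing. */
static bool size_fits_32(uint64_t size)
{
    return size <= UINT32_MAX;
}
```

4G is exactly 2^32, so its low 32 bits are all zero.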

There is also a new warning from clang's sanitizer:
GTESTER check-qtest-i386
/home/petmay01/linaro/qemu-for-merges/hw/pci/pci.c:196:23: runtime
error: shift exponent -1 is negative
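
Shifting by a negative count is undefined behaviour in C, which is what
the sanitizer is flagging. A generic guard could look like the sketch
below. checked_shl32 is a hypothetical helper, and this is not the code
at pci.c:196.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Shift value left by shift bits, refusing counts that would be
 * undefined for a 32-bit operand. Returns false and leaves *out
 * untouched when the count is out of range.
 */
static bool checked_shl32(uint32_t value, int shift, uint32_t *out)
{
    if (shift < 0 || shift >= 32) {
        return false;
    }
    *out = value << shift;
    return true;
}
```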

And tests/bios-tables-test hung under OSX:

(gdb) thread apply all bt

Thread 1 (process 53686):
#0  0x7fff868df762 in accept ()
#1  0x000100630b48 in socket_accept (sock=3) at
/Users/pm215/src/qemu-for-merges/tests/libqtest.c:86
#2  0x000100630741 in qtest_init (extra_args=0x7f83a1516dd0 "-net
none -display none -machine accel=tcg -device
pxb,id=pxb,bus_nr=0x80,bus=pci.0 -object
memory-backend-file,size=4G,mem-path=/tmp/shmem,share,id=mb -device
ivshmem-plain,memdev=mb,bus=pxb -drive i"...) at
/Users/pm215/src/qemu-for-merges/tests/libqtest.c:188
#3  0x00010062c3db in qtest_start [inlined] () at
/Users/pm215/src/qemu-for-merges/tests/libqtest.h:719
#4  0x00010062c3db in test_acpi_one (params=, data=0x7fff5f5d4168) at
libqtest.h:719
#5  0x00010062c2cd in test_acpi_piix4_tcg_pxb () at
/Users/pm215/src/qemu-for-merges/tests/bios-tables-test.c:876
#6  0x00010073991d in g_test_run_suite_internal ()
#7  0x000100739ae1 in g_test_run_suite_internal ()
#8  0x000100739ae1 in g_test_run_suite_internal ()
#9  0x000100739ae1 in g_test_run_suite_internal ()
#10 0x000100739ae1 in g_test_run_suite_internal ()
#11 0x000100739198 in g_test_run_suite ()
#12 0x00010062bdee in main (argc=, argv=) at
/Users/pm215/src/qemu-for-merges/tests/bios-tables-test.c:925

thanks
-- PMM

Re: [Qemu-devel] [PATCH v2 0/3] seabios: add serial console support

On Tue, Jul 05, 2016 at 12:00:48PM +0200, Gerd Hoffmann wrote:
> On Di, 2016-07-05 at 09:06 +0100, Daniel P. Berrange wrote:
> > On Mon, Jul 04, 2016 at 10:39:51PM +0200, Gerd Hoffmann wrote:
> > >   Hi,
> > > 
> > > Next round of patches.  Changes:
> > > 
> > >  * Moved it all to a new sercon.c file.
> > >  * Code maps cp437 to utf8 now, giving a much nicer display.  Compare
> > >"Use the ↑ and ↓ keys to change the selection." (this series) with
> > >"Use the ^ and v keys to change the selection." (sgabios)  ;-)
> > >  * Simplified keyboard code, using enqueue_key now.
> > >  * Restructed code, to cleanup things and to address review comments.
> > 
> > Currently libvirt has an option to turn on serial console support
> > for the BIOS. When this is set it adds the sga device. How will
> > libvirt know when seabios has this feature built-in, and thus does
> > not need to add the sga device ? 
> 
> Current code activates the serial console (unconditionally) in case no
> vga is present.  I want also support output on both vga and serial
> console, but code for the later isn't there yet.
> 
> What the final default behavior will be is not clear yet.  Not enabled?
> Enabled in case no VGA is present?  Enabled unconditionally (simliar to
> ovmf)?

(Bitter) experience in libvirt has shown us that magically enabling
things based on whether or not some other feature is enabled leads
to pain and suffering in the long term.

So from libvirt's POV, we would like an explicit command line flag
to turn on/off seabios serial console support, with no dependency
on whether VGA is present or not.

Regards,
Daniel

Re: [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches

* Michael S. Tsirkin (m...@redhat.com) wrote:
> On Tue, Jul 05, 2016 at 10:33:25AM +0100, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (m...@redhat.com) wrote:
> > > On Mon, Jul 04, 2016 at 08:16:03PM +0100, Dr. David Alan Gilbert (git) 
> > > wrote:
> > > > From: "Dr. David Alan Gilbert" 
> > > > 
> > > > QEMU sets the guests physical address bits to 40; this is wrong
> > > > on most hardware, and can be detected by the guest.
> > > > It also stops you using really huge multi-TB VMs.
> > > > 
> > > > Red Hat has had a patch, that Andrea wrote, downstream for a couple
> > > > of years that reads the hosts value and uses that in the guest.  That's
> > > > correct as far as the guest sees it, and lets you create huge VMs.
> > > > 
> > > > The downside, is that if you've got a mix of hosts, say an i7 and a 
> > > > Xeon,
> > > > life gets complicated in migration; prior to 2.6 it all apparently
> > > > worked (although a guest that looked might spot the change).
> > > > In 2.6 Paolo started checking MSR writes and they failed when the
> > > > incoming MTRR mask didn't fit.
> > > > 
> > > > This series:
> > > >a) Fixes up mtrr masks so that if you're migrating between hosts
> > > >   of different physical address size it tries to do something 
> > > > sensible.
> > > > 
> > > >b) Lets you specify the guest physical address size via a CPU 
> > > > property, i.e.
> > > > -cpu SandyBridge,phys-bits=36
> > > > 
> > > >   The default on old machine types is to use the existing 40 bits 
> > > > value.
> > > > 
> > > >c) Lets you tell qemu to use the same setting as the host, i.e.
> > > > -cpu SandyBridge,phys-bits=0
> > > >  
> > > >   This is the default on new machine types.
> > > > 
> > > > Note that mixed size hosts are still not necessarily safe; a guest
> > > > started on a host with a large physical address size might start using
> > > > those bits and get upset when it's moved to a small host.
> > > > However that was already potentially broken in existing qemu that
> > > > used a magic value of 40.
> > > > 
> > > > There's potential to add some extra guards against people
> > > > doing silly stuff; e.g. stop people running VMs using 1TB of
> > > > address space on a tiny host.
> > > > 
> > > > Dave
> > > 
> > > This is all in target-i386 so if the maintainers want it this way, they
> > > can merge this, and I do not have strong objections, but I wanted to
> > > document an alternative that is IMHO somewhat nicer. Feel free to
> > > ignore.  See below.
> > > 
> > > How can guest use more memory than what host supports?
> > > I think there are two ways:
> > > 
> > > 1. more memory than host supports is supplied
> > >This is a configuration error. We can simply detect this
> > >and fail init, or print a warning, no need for new flags.
> > 
> > Yes we should do that; however there's a case that's potentially
> > currently working for people but actually kind of illegal.
> > That case is specifying a small amount of actual memory
> > but a large maxmem - i.e.:
> > 
> >  -m 2G,slots=16,maxmem=2T
> > 
> > On a host with a 39bit physaddress limit do you error
> > on that or not?  I think oVirt is currently doing something
> > similar to that, but I'm trying to get confirmation.
> 
> That would only be a problem since pci is allocated above
> maxmem so 64 bit pci addresses aren't accessible.
> With my proposal we can actually force firmware to avoid
> using 64 bit memory for that config.
> Will work better than today.
> 
> 
> > > 2. pci addresses out of host range assigned by guest
> > >Again normally at least seabios will not do this,
> > >maybe OVMF will?
> > >we certainly can add an interface telling firmware
> > >what the limit is.
> > > 
> > > Thus an alternative is:
> > > - add interface to tell QEMU how much 64 bit memory can pci use.
> > > - teach firmware to limit itself to that
> > > - set guest bits to 48 unconditionally
> > > 
> > > 
> > > the disadvantage of this approach is that firmware needs to be changed
> > 
> > I guess it also needs the CRS to tell the guest OS not
> > to remap PCI stuff into that space?
> 
> CRS is a list of legal addresses, not list of illegal ones.
> So just don't include what's illegal there.
> 
> >  I thought also from the previous
> > discussions that the guest would get a different exception if it
> > actually tried to use any of the bits below 48 it didn't have.
> 
> Basically if you try to map pci at an address outside CRS
> you can get any kind of crash since there could be on-board
> hardware handling these addresses.
> So I do not think we care about that.

The issue about guest bits is not purely about PCI addresses though;
I thought it was also to do with visible behaviour/exceptions in
page tables.
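
As a sketch of the page table side: in 64-bit paging, the entry bits from
the CPU's physical address width up to bit 51 are reserved, and setting
any of them gives a page fault with the reserved-bit error code. So the
same entry can be fine under one width and faulting under another. The
helper names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask of bits phys_bits..51 of a 64-bit page-table entry. */
static uint64_t pte_reserved_mask(unsigned phys_bits)
{
    assert(phys_bits >= 32 && phys_bits <= 52);
    return ((UINT64_C(1) << 52) - 1) & ~((UINT64_C(1) << phys_bits) - 1);
}

/* Would this entry trip the reserved-bit check at the given width? */
static bool pte_has_reserved_bits(uint64_t pte, unsigned phys_bits)
{
    return (pte & pte_reserved_mask(phys_bits)) != 0;
}
```

An entry pointing at physical address 2^39 is legal with 40 bits but has
a reserved bit set with 36.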

> > > the advantage is that we get seemless migration between different
> > > hosts as long as they both can support the configuration,
> > > without any management effort.
> > 
> > The reality

Re: [Qemu-devel] [RFC 00/13] Live memory snapshot based on userfaultfd

2016-07-05 Thread Hailiang Zhang

On 2016/7/5 17:57, Baptiste Reynal wrote:

On Tue, Jul 5, 2016 at 3:49 AM, Hailiang Zhang
wrote:

On 2016/7/4 20:22, Baptiste Reynal wrote:

On Thu, Jan 7, 2016 at 1:19 PM, zhanghailiang
wrote:

For now, we still didn't support live memory snapshot, we have discussed
a scheme which based on userfaultfd long time ago.
You can find the discussion by the follow link:
https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg01779.html

The scheme is based on userfaultfd's write-protect capability.
The userfaultfd write protection feature is available here:
http://www.spinics.net/lists/linux-mm/msg97422.html

The process of this live memory scheme is like bellow:
1. Pause VM
2. Enable write-protect fault notification by using userfaultfd to
mark VM's memory to write-protect (readonly).
3. Save VM's static state (here is device state) to snapshot file
4. Resume VM, VM is going to run.
5. Snapshot thread begins to save VM's live state (here is RAM) into
snapshot file.
6. During this time, all the actions of writing VM's memory will be
blocked
by kernel, and kernel will wakeup the fault treating thread in qemu to
process this write-protect fault. The fault treating thread will
deliver this
page's address to snapshot thread.
7. snapshot thread gets this address, save this page into snasphot file,
and then remove the write-protect by using userfaultfd API, after
that,
the actions of writing will be recovered.
8. Repeat step 5~7 until all VM's memory is saved to snapshot file
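
The ordering in steps 2 and 5 to 8 can be sketched as a small
single-threaded simulation. It uses no userfaultfd calls at all: the Vm
structure and every function name are made up for illustration, and a
"fault" is just a check of a per-page flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NPAGES  8
#define PAGE_SZ 16

typedef struct {
    unsigned char ram[NPAGES][PAGE_SZ];   /* guest memory */
    unsigned char snap[NPAGES][PAGE_SZ];  /* snapshot file contents */
    bool wp[NPAGES];                      /* page still write-protected */
    size_t next;                          /* background saver cursor */
} Vm;

/* Step 2: mark every page write-protected. */
static void snapshot_begin(Vm *vm)
{
    for (size_t i = 0; i < NPAGES; i++) {
        vm->wp[i] = true;
    }
    vm->next = 0;
}

/* Step 7: save one page, then drop its write protection. */
static void save_page(Vm *vm, size_t page)
{
    if (!vm->wp[page]) {
        return;                           /* already saved */
    }
    memcpy(vm->snap[page], vm->ram[page], PAGE_SZ);
    vm->wp[page] = false;
}

/* Step 5: background saver handles one page; false when nothing is left. */
static bool snapshot_step(Vm *vm)
{
    while (vm->next < NPAGES && !vm->wp[vm->next]) {
        vm->next++;
    }
    if (vm->next == NPAGES) {
        return false;
    }
    save_page(vm, vm->next++);
    return true;
}

/* Step 6: a guest write to a protected page saves the old content first. */
static void guest_write(Vm *vm, size_t page, size_t off, unsigned char val)
{
    assert(page < NPAGES && off < PAGE_SZ);
    if (vm->wp[page]) {
        save_page(vm, page);              /* the "fault" path */
    }
    vm->ram[page][off] = val;
}
```

However guest writes interleave with the background saver, the snapshot
ends up holding the memory as it was when protection was enabled.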

Compared with the feature of 'migrate VM's state to file',
the main difference for live memory snapshot is it has little time delay
for
catching VM's state. It just captures the VM's state while got users
snapshot
command, just like take a photo of VM's state.

For now, we only support tcg accelerator, since userfaultfd is not
supporting
tracking write faults for KVM.

Usage:
1. Take a snapshot
#x86_64-softmmu/qemu-system-x86_64 -machine
pc-i440fx-2.5,accel=tcg,usb=off -drive
file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none
-device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -vnc :7 -m
8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0
--monitor stdio
Issue snapshot command:
(qemu)migrate -d file:/home/Snapshot
2. Revert to the snapshot
#x86_64-softmmu/qemu-system-x86_64 -machine
pc-i440fx-2.5,accel=tcg,usb=off -drive
file=/mnt/windows/win7_install.qcow2.bak,if=none,id=drive-ide0-0-1,format=qcow2,cache=none
-device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -vnc :7 -m
8192 -smp 1 -netdev tap,id=bn0 -device virtio-net-pci,id=net-pci0,netdev=bn0
--monitor stdio -incoming file:/home/Snapshot

NOTE:
The userfaultfd write protection feature does not support THP for now,
Before taking snapshot, please disable THP by:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
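
To check the setting from a program, note that the sysfs file lists all
modes and puts the active one in square brackets, e.g.
"always madvise [never]". A small parsing sketch follows;
thp_active_mode is a hypothetical helper and reading the file itself is
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed (active) token of line into out.
 * Returns false if no bracketed token exists or out is too small.
 */
static bool thp_active_mode(const char *line, char *out, size_t outlen)
{
    const char *open = strchr(line, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    size_t len;

    if (!open || !close) {
        return false;
    }
    len = (size_t)(close - open - 1);
    if (len + 1 > outlen) {
        return false;
    }
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return true;
}
```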

TODO:
- Reduce the influence for VM while taking snapshot

zhanghailiang (13):
postcopy/migration: Split fault related state into struct
UserfaultState
migration: Allow the migrate command to work on file: urls
migration: Allow -incoming to work on file: urls
migration: Create a snapshot thread to realize saving memory snapshot
migration: implement initialization work for snapshot
QEMUSizedBuffer: Introduce two help functions for qsb
savevm: Split qemu_savevm_state_complete_precopy() into two helper
functions
snapshot: Save VM's device state into snapshot file
migration/postcopy-ram: fix some helper functions to support
userfaultfd write-protect
snapshot: Enable the write-protect notification capability for VM's
RAM
snapshot/migration: Save VM's RAM into snapshot file
migration/ram: Fix some helper functions' parameter to use
PageSearchStatus
snapshot: Remove page's write-protect and copy the content during
setup stage

--
1.8.3.1

Hi,

Hi Hailiang,

Can I get the status of this patch series ? I cannot find a v2.

Yes, I haven't updated it for a long time; it is based on the userfault-wp API
in the kernel, and Andrea didn't update the related patches until recently.
I will update this series in the next one or two weeks

[Qemu-devel] [RFC PATCH V3 2/3] filter-rewriter: track connection and parse packet

Use the helpers declared in colo-base.h to track connections and parse packets.

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
 net/filter-rewriter.c | 50 ++
 1 file changed, 50 insertions(+)

diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index c4f2739..7f0da2c 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -48,6 +48,20 @@ typedef struct RewriterState {
 uint32_t hashtable_size;
 } RewriterState;
 
+/*
+ * Return 1 if the pkt is a TCP packet,
+ * 0 otherwise
+ */
+static int is_tcp_packet(Packet *pkt)
+{
+if (!parse_packet_early(pkt) &&
+pkt->ip->ip_p == IPPROTO_TCP) {
+return 1;
+} else {
+return 0;
+}
+}
+
 static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
  NetClientState *sender,
  unsigned flags,
@@ -55,11 +69,47 @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
  int iovcnt,
  NetPacketSent *sent_cb)
 {
+RewriterState *s = FILTER_COLO_REWRITER(nf);
+Connection *conn;
+ConnectionKey key = {{ 0 } };
+Packet *pkt;
+ssize_t size = iov_size(iov, iovcnt);
+char *buf = g_malloc0(size);
+
+iov_to_buf(iov, iovcnt, 0, buf, size);
+pkt = packet_new(buf, size);
+
 /*
  * if we get tcp packet
  * we will rewrite it to make secondary guest's
  * connection established successfully
  */
+if (is_tcp_packet(pkt)) {
+if (sender == nf->netdev) {
+fill_connection_key(pkt, &key, SECONDARY);
+} else {
+fill_connection_key(pkt, &key, PRIMARY);
+}
+
+conn = connection_get(s->connection_track_table,
+  &key,
+  &s->hashtable_size);
+if (!conn->processing) {
+g_queue_push_tail(&s->conn_list, conn);
+conn->processing = true;
+}
+
+if (sender == nf->netdev) {
+/* NET_FILTER_DIRECTION_TX */
+/* handle_primary_tcp_pkt */
+} else {
+/* NET_FILTER_DIRECTION_RX */
+/* handle_secondary_tcp_pkt */
+}
+}
+
+packet_destroy(pkt, NULL);
+pkt = NULL;
 return 0;
 }
 
-- 
2.7.4

[Qemu-devel] [RFC PATCH V3 3/3] filter-rewriter: rewrite tcp packet to keep secondary connection

Rewrite the TCP packets that the secondary guest receives and sends.
The walkthrough below assumes the COLO guest is a TCP server.

First, the client starts a TCP handshake with a packet carrying
seq=client_seq, ack=0, flag=SYN. The COLO primary guest receives this
packet and mirrors it (filter-mirror) to the secondary guest, which
receives it through filter-redirector.
Then the primary guest responds with
(seq=primary_seq,ack=client_seq+1,flag=ACK|SYN)
and the secondary guest responds with
(seq=secondary_seq,ack=client_seq+1,flag=ACK|SYN).
At this point filter-rewriter saves secondary_seq in its TCP connection.
To finish the handshake, the client sends
(seq=client_seq+1,ack=primary_seq+1,flag=ACK).
Here filter-rewriter learns primary_seq, rewrites the ack from
primary_seq+1 to secondary_seq+1 and recalculates the checksum, so the
secondary TCP connection stays valid.

When data is sent and received:
the client sends (seq=client_seq+1+data_len,ack=primary_seq+1,flag=ACK|PSH).
filter-rewriter rewrites the ack and sends the packet to the secondary guest.

The primary guest responds with
(seq=primary_seq+1,ack=client_seq+1+data_len,flag=ACK)
and the secondary guest responds with
(seq=secondary_seq+1,ack=client_seq+1+data_len,flag=ACK).
filter-rewriter rewrites the secondary guest's seq from secondary_seq+1
to primary_seq+1, so the TCP connection stays valid.

In the code, offset (= secondary_seq - primary_seq) is used
to rewrite seq or ack:
handle_primary_tcp_pkt: tcp_pkt->th_ack += offset;
handle_secondary_tcp_pkt: tcp_pkt->th_seq -= offset;
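
The offset arithmetic described above can be sketched in isolation as
follows. This is only an illustration, not the code from the patch: the
helper names seq_offset, rewrite_ack_to_secondary and
rewrite_seq_to_primary are hypothetical, all values are in host byte
order (the handlers in the patch convert with ntohl/htonl), and unsigned
32-bit wraparound is assumed to stand in for TCP sequence arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* offset = secondary_seq - primary_seq, modulo 2^32 */
static uint32_t seq_offset(uint32_t primary_seq, uint32_t secondary_seq)
{
    return secondary_seq - primary_seq;
}

/* Hypothetical helper: ack of a packet forwarded to the secondary guest */
static uint32_t rewrite_ack_to_secondary(uint32_t ack, uint32_t offset)
{
    return ack + offset;
}

/* Hypothetical helper: seq of a packet sent by the secondary guest */
static uint32_t rewrite_seq_to_primary(uint32_t seq, uint32_t offset)
{
    return seq - offset;
}
```

Because the subtraction and addition are both done modulo 2^32, the
rewrite still round-trips when one of the two initial sequence numbers
is close to wrapping.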

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
 net/colo-base.c   |   2 +
 net/colo-base.h   |   7 
 net/filter-rewriter.c | 108 +-
 trace-events  |   5 +++
 4 files changed, 120 insertions(+), 2 deletions(-)

diff --git a/net/colo-base.c b/net/colo-base.c
index 9673661..58fbd9d 100644
--- a/net/colo-base.c
+++ b/net/colo-base.c
@@ -123,6 +123,8 @@ Connection *connection_new(ConnectionKey *key)
 
 conn->ip_proto = key->ip_proto;
 conn->processing = false;
+conn->offset = 0;
+conn->syn_flag = 0;
 g_queue_init(&conn->primary_list);
 g_queue_init(&conn->secondary_list);
 
diff --git a/net/colo-base.h b/net/colo-base.h
index 62460c5..353bd55 100644
--- a/net/colo-base.h
+++ b/net/colo-base.h
@@ -71,6 +71,13 @@ typedef struct Connection {
 uint8_t ip_proto;
 /* be used by filter-rewriter */
 colo_conn_state state;
+/* offset = secondary_seq - primary_seq */
+tcp_seq  offset;
+/*
+ * This flag ensures that the offset is updated
+ * only once per TCP connection
+ */
+int syn_flag;
 } Connection;
 
 uint32_t connection_key_hash(const void *opaque);
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index 7f0da2c..f911f99 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -21,6 +21,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/iov.h"
 #include "net/checksum.h"
+#include "trace.h"
 
 #define FILTER_COLO_REWRITER(obj) \
 OBJECT_CHECK(RewriterState, (obj), TYPE_FILTER_REWRITER)
@@ -62,6 +63,89 @@ static int is_tcp_packet(Packet *pkt)
 }
 }
 
+/* handle tcp packet from primary guest */
+static int handle_primary_tcp_pkt(NetFilterState *nf,
+  Connection *conn,
+  Packet *pkt)
+{
+struct tcphdr *tcp_pkt;
+
+tcp_pkt = (struct tcphdr *)pkt->transport_layer;
+if (trace_event_get_state(TRACE_COLO_FILTER_REWRITER_DEBUG)) {
+char *sdebug, *ddebug;
+sdebug = g_strdup(inet_ntoa(pkt->ip->ip_src));
+ddebug = g_strdup(inet_ntoa(pkt->ip->ip_dst));
+trace_colo_filter_rewriter_pkt_info(__func__, sdebug, ddebug,
+ntohl(tcp_pkt->th_seq), ntohl(tcp_pkt->th_ack),
+tcp_pkt->th_flags);
+trace_colo_filter_rewriter_conn_offset(conn->offset);
+g_free(sdebug);
+g_free(ddebug);
+}
+
+if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_SYN)) {
+/*
+ * This flag ensures that the offset is updated
+ * only once per TCP connection
+ */
+conn->syn_flag = 1;
+}
+
+if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_ACK)) {
+if (conn->syn_flag) {
+/* offset = secondary_seq - primary seq */
+conn->offset -= (ntohl(tcp_pkt->th_ack) - 1);
+conn->syn_flag = 0;
+}
+/* handle packets to the secondary from the primary */
+tcp_pkt->th_ack = htonl(ntohl(tcp_pkt->th_ack) + conn->offset);
+
+net_checksum_calculate((uint8_t *)pkt->data, pkt->size);
+}
+
+return 0;
+}
+
+/* handle tcp packet from secondary guest */
+static int handle_secondary_tcp_pkt(NetFilterState *nf,
+Connection *conn,
+Packet *pkt)
+{
+struct tcphdr *tcp_pkt;
+
+tcp_pkt = (struct tcphdr *)pkt->transport_layer;
+
+if (trace_event_get_state(TRACE_COLO_FILTER_REWRITER_DEBUG)) {
+char *sdebug, *ddebug;
+sdebug = strdup(inet_ntoa(pkt->i

[Qemu-devel] [RFC PATCH V3 1/3] filter-rewriter: introduce filter-rewriter initialization

Filter-rewriter is part of the COLO project.
It rewrites some of the secondary guest's packets so that the
secondary guest's TCP connections can be established successfully.
This module rewrites the ack of TCP packets going from the primary
to the secondary, and the seq of TCP packets going from the secondary
to the primary.

usage:

colo secondary:
-object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0
-object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1
-object filter-rewriter,id=rew0,netdev=hn0,queue=all

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
 net/Makefile.objs |   1 +
 net/filter-rewriter.c | 108 ++
 qemu-options.hx   |  13 ++
 vl.c  |   3 +-
 4 files changed, 124 insertions(+), 1 deletion(-)
 create mode 100644 net/filter-rewriter.c

diff --git a/net/Makefile.objs b/net/Makefile.objs
index 119589f..645bd10 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -18,3 +18,4 @@ common-obj-y += filter-buffer.o
 common-obj-y += filter-mirror.o
 common-obj-y += colo-compare.o
 common-obj-y += colo-base.o
+common-obj-y += filter-rewriter.o
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
new file mode 100644
index 000..c4f2739
--- /dev/null
+++ b/net/filter-rewriter.c
@@ -0,0 +1,108 @@
+/*
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Copyright (c) 2016 Intel Corporation
+ *
+ * Author: Zhang Chen 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "net/colo-base.h"
+#include "net/filter.h"
+#include "net/net.h"
+#include "qemu-common.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qerror.h"
+#include "qapi-visit.h"
+#include "qom/object.h"
+#include "qemu/main-loop.h"
+#include "qemu/iov.h"
+#include "net/checksum.h"
+
+#define FILTER_COLO_REWRITER(obj) \
+OBJECT_CHECK(RewriterState, (obj), TYPE_FILTER_REWRITER)
+
+#define TYPE_FILTER_REWRITER "filter-rewriter"
+
+enum {
+PRIMARY = 0,
+SECONDARY,
+};
+
+typedef struct RewriterState {
+NetFilterState parent_obj;
+/* connection list: the connections belonged to this NIC could be found
+ * in this list.
+ * element type: Connection
+ */
+GQueue conn_list;
+NetQueue *incoming_queue;
+/* hashtable to save connection */
+GHashTable *connection_track_table;
+/* to save unprocessed_connections */
+GQueue unprocessed_connections;
+/* current hash size */
+uint32_t hashtable_size;
+} RewriterState;
+
+static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
+ NetClientState *sender,
+ unsigned flags,
+ const struct iovec *iov,
+ int iovcnt,
+ NetPacketSent *sent_cb)
+{
+/*
+ * if we get tcp packet
+ * we will rewrite it to make secondary guest's
+ * connection established successfully
+ */
+return 0;
+}
+
+static void colo_rewriter_cleanup(NetFilterState *nf)
+{
+RewriterState *s = FILTER_COLO_REWRITER(nf);
+
+g_queue_free(&s->conn_list);
+}
+
+static void colo_rewriter_setup(NetFilterState *nf, Error **errp)
+{
+RewriterState *s = FILTER_COLO_REWRITER(nf);
+
+g_queue_init(&s->conn_list);
+s->hashtable_size = 0;
+
+s->connection_track_table = g_hash_table_new_full(connection_key_hash,
+  connection_key_equal,
+  g_free,
+  connection_destroy);
+s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
+}
+
+static void colo_rewriter_class_init(ObjectClass *oc, void *data)
+{
+NetFilterClass *nfc = NETFILTER_CLASS(oc);
+
+nfc->setup = colo_rewriter_setup;
+nfc->cleanup = colo_rewriter_cleanup;
+nfc->receive_iov = colo_rewriter_receive_iov;
+}
+
+static const TypeInfo colo_rewriter_info = {
+.name = TYPE_FILTER_REWRITER,
+.parent = TYPE_NETFILTER,
+.class_init = colo_rewriter_class_init,
+.instance_size = sizeof(RewriterState),
+};
+
+static void register_types(void)
+{
+type_register_static(&colo_rewriter_info);
+}
+
+type_init(register_types);
diff --git a/qemu-options.hx b/qemu-options.hx
index 14bade5..4afd511 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3859,6 +3859,19 @@ Create a filter-redirector we need to differ outdev id from indev id, id can not
 be the same. we can just use indev or outdev, but at least one of indev or outdev
 need to be specified.
 
+@item -object filter-rewriter,id=@var{id},netdev=@var{netdevid},rewriter-mode=@var{mode}[,queue=@var{all|rx|tx}]
+
+Filter-rewriter is a part of COLO project.It will rewrite tcp packet

[Qemu-devel] [RFC PATCH V3 0/3] filter-rewriter: introduce filter-rewriter

Filter-rewriter is part of the COLO project,
so this patch set depends on colo-compare.
It rewrites some of the secondary guest's packets so that the
secondary guest's connections can be established successfully.
This module rewrites the ack of TCP packets going from the primary
to the secondary, and the seq of TCP packets going from the secondary
to the primary.

v3:
  - fix typo
  - add conn->syn_flag
  - add some comments
  - remove conn_list_lock
  - fix offset set bug

v2:
  - add more comments about packet flows
  - add some trace-event
  - add seq offset ( = secondary_seq - primary_seq)

v1:
  - initial patch

Zhang Chen (3):
  filter-rewriter: introduce filter-rewriter initialization
  filter-rewriter: track connection and parse packet
  filter-rewriter: rewrite tcp packet to keep secondary connection

 net/Makefile.objs |   1 +
 net/colo-base.c   |   2 +
 net/colo-base.h   |   7 ++
 net/filter-rewriter.c | 262 ++
 qemu-options.hx   |  13 +++
 trace-events  |   5 +
 vl.c  |   3 +-
 7 files changed, 292 insertions(+), 1 deletion(-)
 create mode 100644 net/filter-rewriter.c

-- 
2.7.4

Re: [Qemu-devel] [PATCH v8 7/7] trace: Add QAPI/QMP interfaces to query and control per-vCPU tracing state

2016-07-05 Thread Lluís Vilanova

Eric Blake writes:

> On 07/04/2016 03:41 AM, Lluís Vilanova wrote:
>> Signed-off-by: Lluís Vilanova 
>> Reviewed-by: Stefan Hajnoczi 
>> ---
>> hmp-commands-info.hx |6 +-
>> hmp-commands.hx  |7 +-
>> monitor.c|   17 +-
>> qapi/trace.json  |   32 +--
>> qmp-commands.hx  |   35 +++-
>> trace/qmp.c  |  148 
>> --
>> 6 files changed, 202 insertions(+), 43 deletions(-)
>> 

>> +++ b/qapi/trace.json
>> @@ -1,6 +1,6 @@
>> # -*- mode: python -*-
>> #
>> -# Copyright (C) 2011-2014 Lluís Vilanova 
>> +# Copyright (C) 2011-2016 Lluís Vilanova 
>> #
>> # This work is licensed under the terms of the GNU GPL, version 2 or later.
>> # See the COPYING file in the top-level directory.
>> @@ -29,11 +29,14 @@
>> #
>> # @name: Event name.
>> # @state: Tracing state.
>> +# @vcpu: Whether this is a per-vCPU event (since 2.7).
>> +#
>> +# An event is per-vCPU if it has the "vcpu" property in the "trace-events" file.

> Is this comment still true, now that we've split trace-events into
> multiple files?

It is true for the few events we have now with this property. I can change it
to:

  An event is per-vCPU if it has the "vcpu" property in the "trace-events"
  files.

Note the plural in "files".


Cheers,
  Lluis

Re: [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches

On Tue, Jul 05, 2016 at 11:13:26AM +0100, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (m...@redhat.com) wrote:
> > On Tue, Jul 05, 2016 at 10:33:25AM +0100, Dr. David Alan Gilbert wrote:
> > > * Michael S. Tsirkin (m...@redhat.com) wrote:
> > > > On Mon, Jul 04, 2016 at 08:16:03PM +0100, Dr. David Alan Gilbert (git) wrote:
> > > > > From: "Dr. David Alan Gilbert" 
> > > > > 
> > > > > QEMU sets the guests physical address bits to 40; this is wrong
> > > > > on most hardware, and can be detected by the guest.
> > > > > It also stops you using really huge multi-TB VMs.
> > > > > 
> > > > > Red Hat has had a patch, that Andrea wrote, downstream for a couple
> > > > > of years that reads the hosts value and uses that in the guest.  That's
> > > > > correct as far as the guest sees it, and lets you create huge VMs.
> > > > > 
> > > > > The downside, is that if you've got a mix of hosts, say an i7 and a Xeon,
> > > > > life gets complicated in migration; prior to 2.6 it all apparently
> > > > > worked (although a guest that looked might spot the change).
> > > > > In 2.6 Paolo started checking MSR writes and they failed when the
> > > > > incoming MTRR mask didn't fit.
> > > > > 
> > > > > This series:
> > > > >a) Fixes up mtrr masks so that if you're migrating between hosts
> > > > >   of different physical address size it tries to do something sensible.
> > > > > 
> > > > >b) Lets you specify the guest physical address size via a CPU property, i.e.
> > > > > -cpu SandyBridge,phys-bits=36
> > > > > 
> > > > >   The default on old machine types is to use the existing 40 bits value.
> > > > > 
> > > > >c) Lets you tell qemu to use the same setting as the host, i.e.
> > > > > -cpu SandyBridge,phys-bits=0
> > > > >  
> > > > >   This is the default on new machine types.
> > > > > 
> > > > > Note that mixed size hosts are still not necessarily safe; a guest
> > > > > started on a host with a large physical address size might start using
> > > > > those bits and get upset when it's moved to a small host.
> > > > > However that was already potentially broken in existing qemu that
> > > > > used a magic value of 40.
> > > > > 
> > > > > There's potential to add some extra guards against people
> > > > > doing silly stuff; e.g. stop people running VMs using 1TB of
> > > > > address space on a tiny host.
> > > > > 
> > > > > Dave
> > > > 
> > > > This is all in target-i386 so if the maintainers want it this way, they
> > > > can merge this, and I do not have strong objections, but I wanted to
> > > > document an alternative that is IMHO somewhat nicer. Feel free to
> > > > ignore.  See below.
> > > > 
> > > > How can guest use more memory than what host supports?
> > > > I think there are two ways:
> > > > 
> > > > 1. more memory than host supports is supplied
> > > >This is a configuration error. We can simply detect this
> > > >and fail init, or print a warning, no need for new flags.
> > > 
> > > Yes we should do that; however there's a case that's potentially
> > > currently working for people but actually kind of illegal.
> > > That case is specifying a small amount of actual memory
> > > but a large maxmem - i.e.:
> > > 
> > >  -m 2G,slots=16,maxmem=2T
> > > 
> > > On a host with a 39bit physaddress limit do you error
> > > on that or not?  I think oVirt is currently doing something
> > > similar to that, but I'm trying to get confirmation.
> > 
> > That would only be a problem since pci is allocated above
> > maxmem so 64 bit pci addresses aren't accessible.
> > With my proposal we can actually force firmware to avoid
> > using 64 bit memory for that config.
> > Will work better than today.
> > 
> > 
> > > > 2. pci addresses out of host range assigned by guest
> > > >Again normally at least seabios will not do this,
> > > >maybe OVMF will?
> > > >we certainly can add an interface telling firmware
> > > >what the limit is.
> > > > 
> > > > Thus an alternative is:
> > > > - add interface to tell QEMU how much 64 bit memory can pci use.
> > > > - teach firmware to limit itself to that
> > > > - set guest bits to 48 unconditionally
> > > > 
> > > > 
> > > > the disadvantage of this approach is that firmware needs to be changed
> > > 
> > > I guess it also needs the CRS to tell the guest OS not
> > > to remap PCI stuff into that space?
> > 
> > CRS is a list of legal addresses, not list of illegal ones.
> > So just don't include what's illegal there.
> > 
> > >  I thought also from the previous
> > > discussions that the guest would get a different exception if it
> > > actually tried to use any of the bits below 48 it didn't have.
> > 
> > Basically if you try to map pci at an address outside CRS
> > you can get any kind of crash since there could be on-board
> > hardware handling these addresses.
> > So I do not think we care about that.
> 
> The issue about guest bits is not pu

Re: [Qemu-devel] [PATCH v2 6/6] x86: Add sanity checks on phys_bits

* Eduardo Habkost (ehabk...@redhat.com) wrote:
> On Mon, Jul 04, 2016 at 08:16:09PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" 
> > 
> > Add some sanity checks on the phys-bits setting now that
> > the user can set it.
> >a) That it's in a sane range (52..32)
> >b) Warn if it mismatches the host and isn't the old default.
> > 
> > Signed-off-by: Dr. David Alan Gilbert 
> > ---
> >  target-i386/cpu.c | 13 +
> >  1 file changed, 13 insertions(+)
> > 
> > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > index e15abea..5402002 100644
> > --- a/target-i386/cpu.c
> > +++ b/target-i386/cpu.c
> > @@ -2985,6 +2985,19 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> >  /* The user asked for us to use the host physical bits */
> >  cpu->phys_bits = host_phys_bits;
> >  
> > +} else if (cpu->phys_bits > 52 || cpu->phys_bits < 32) {
> > +error_setg(errp, "phys_bits should be between 32 and 52 or 0 to"
> > + " use host size (but is %u)", cpu->phys_bits);
> > +return;
> > +}
> 
> This check belongs to patch 1/6, doesn't it?

Yes, I can move it to there.

> Here we have the same magic number (52), and I don't know where
> it came from. Maybe it should become a (documented) macro?
> 
> Also, won't this make the "phys_bits < 52" check added by patch
> 3/6 unnecessary?

Possibly; although it did feel safer to put that in where we were generating
the bitmask.

> > +/* Print a warning if the user set it to a value that's not the
> > + * host value; ignore the magic value 40 because it may well just
> > + * be the old machine type.
> > + */
> 
> With this, we won't print a warning if "phys-bits=40" is set
> explicitly. If we want to disable the warning only for the old
> machine-types, we can add a boolean flag that disables it.

Yes, I can do that.

> > +if (cpu->phys_bits != host_phys_bits && cpu->phys_bits != 40) {
> > +fprintf(stderr, "Warning: Host physical bits (%u)"
> > +" does not match phys_bits (%u)\n",
> > +host_phys_bits, cpu->phys_bits);
> 
> Shouldn't we use error_report() for this?
> 
> Also, this prints a warning for each VCPU. This is not the first
> time we want to print a warning only once (see
> x86_cpu_apic_id_from_index() in hw/i386/pc.c and ht_warned in
> target-i386/cpu.c). It looks like QEMU needs a warn_once()
> helper.

OK, will do.

Dave

> >  }
> >  } else {
> >  /* For 32 bit systems don't use the user set value, but keep
> > -- 
> > 2.7.4
> > 
> 
> -- 
> Eduardo
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
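
The review points above (a named limit instead of a bare 52, a flag to
silence the warning on old machine types, and warning only once rather
than per VCPU) could be combined roughly as sketched below. This is a
standalone illustration under stated assumptions, not QEMU code:
PHYS_BITS_MIN, PHYS_BITS_MAX, resolve_phys_bits, suppress_warning and
phys_bits_warnings are all hypothetical names, and plain fprintf stands
in for QEMU's error reporting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical named limits; the values 32 and 52 come from the patch */
#define PHYS_BITS_MIN 32u
#define PHYS_BITS_MAX 52u

/* Counts warnings actually printed, so the once-only behaviour is visible */
static unsigned phys_bits_warnings;

/*
 * Returns the phys_bits value to use, or -1 if the request is invalid.
 * requested == 0 means "use the host value".
 * suppress_warning models a per-machine-type flag (an assumption here).
 */
static int resolve_phys_bits(unsigned requested, unsigned host_bits,
                             bool suppress_warning)
{
    static bool warned;

    if (requested == 0) {
        return (int)host_bits;
    }
    if (requested < PHYS_BITS_MIN || requested > PHYS_BITS_MAX) {
        fprintf(stderr, "phys_bits should be between %u and %u or 0 to"
                " use host size (but is %u)\n",
                PHYS_BITS_MIN, PHYS_BITS_MAX, requested);
        return -1;
    }
    if (requested != host_bits && !suppress_warning && !warned) {
        warned = true;              /* warn once, not once per VCPU */
        phys_bits_warnings++;
        fprintf(stderr, "Warning: Host physical bits (%u)"
                " does not match phys_bits (%u)\n", host_bits, requested);
    }
    return (int)requested;
}
```

With this shape, an explicit phys-bits=40 on a new machine type still
warns, while an old machine type can pass suppress_warning=true.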

[Qemu-devel] [PATCH v1 2/2] Revert "block: don't register quorum driver if SHA256 support is unavailable"

The qcrypto hash APIs now guarantee that sha256 is available at
compile time, so skipping registration is no longer needed. A check
at time of open is kept to ensure good error reporting in the
(unlikely) case sha256 is runtime disabled.

This reverts commit e94867ed5f241008d0f53142b2704a075f9ed505.
---
 block/quorum.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 331b726..ed02cce 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -878,6 +878,12 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
 int i;
 int ret = 0;
 
+if (!qcrypto_hash_supports(QCRYPTO_HASH_ALG_SHA256)) {
+error_setg(errp,
+   "SHA256 hash support is required for quorum device");
+return -EINVAL;
+}
+
 qdict_flatten(options);
 
 /* count how many different children are present */
@@ -1113,10 +1119,6 @@ static BlockDriver bdrv_quorum = {
 
 static void bdrv_quorum_init(void)
 {
-if (!qcrypto_hash_supports(QCRYPTO_HASH_ALG_SHA256)) {
-/* SHA256 hash support is required for quorum device */
-return;
-}
 bdrv_register(&bdrv_quorum);
 }
 
-- 
2.7.4

[Qemu-devel] [PATCH v1 1/2] crypto: use glib as fallback for hash algorithm

GLib >= 2.16 provides the GChecksum API, which is good enough
for md5, sha1, sha256 and sha512. Use this as a final
fallback if neither nettle nor gcrypt is available. This
lets us remove the stub hash impl, and so callers can
be sure those 4 algs are always available at compile
time. They may still be disabled at runtime, so a check
for qcrypto_hash_supports() is still best practice to
report good error messages.

Signed-off-by: Daniel P. Berrange 
---
 crypto/Makefile.objs |  2 +-
 crypto/hash-glib.c   | 94 
 crypto/hash-stub.c   | 41 ---
 3 files changed, 95 insertions(+), 42 deletions(-)
 create mode 100644 crypto/hash-glib.c
 delete mode 100644 crypto/hash-stub.c

diff --git a/crypto/Makefile.objs b/crypto/Makefile.objs
index 1f86f4f..e409b89 100644
--- a/crypto/Makefile.objs
+++ b/crypto/Makefile.objs
@@ -2,6 +2,7 @@ crypto-obj-y = init.o
 crypto-obj-y += hash.o
 crypto-obj-$(CONFIG_NETTLE) += hash-nettle.o
 crypto-obj-$(if $(CONFIG_NETTLE),n,$(CONFIG_GCRYPT)) += hash-gcrypt.o
+crypto-obj-$(if $(CONFIG_NETTLE),n,$(if $(CONFIG_GCRYPT),n,y)) += hash-glib.o
 crypto-obj-y += aes.o
 crypto-obj-y += desrfb.o
 crypto-obj-y += cipher.o
@@ -30,4 +31,3 @@ crypto-aes-obj-y = aes.o
 
 stub-obj-y += random-stub.o
 stub-obj-y += pbkdf-stub.o
-stub-obj-y += hash-stub.o
diff --git a/crypto/hash-glib.c b/crypto/hash-glib.c
new file mode 100644
index 000..81ef7ca
--- /dev/null
+++ b/crypto/hash-glib.c
@@ -0,0 +1,94 @@
+/*
+ * QEMU Crypto hash algorithms
+ *
+ * Copyright (c) 2016 Red Hat, Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "crypto/hash.h"
+
+
+static int qcrypto_hash_alg_map[QCRYPTO_HASH_ALG__MAX] = {
+[QCRYPTO_HASH_ALG_MD5] = G_CHECKSUM_MD5,
+[QCRYPTO_HASH_ALG_SHA1] = G_CHECKSUM_SHA1,
+[QCRYPTO_HASH_ALG_SHA224] = -1,
+[QCRYPTO_HASH_ALG_SHA256] = G_CHECKSUM_SHA256,
+[QCRYPTO_HASH_ALG_SHA384] = -1,
+[QCRYPTO_HASH_ALG_SHA512] = G_CHECKSUM_SHA512,
+[QCRYPTO_HASH_ALG_RIPEMD160] = -1,
+};
+
+gboolean qcrypto_hash_supports(QCryptoHashAlgorithm alg)
+{
+if (alg < G_N_ELEMENTS(qcrypto_hash_alg_map) &&
+qcrypto_hash_alg_map[alg] != -1) {
+return true;
+}
+return false;
+}
+
+
+int qcrypto_hash_bytesv(QCryptoHashAlgorithm alg,
+const struct iovec *iov,
+size_t niov,
+uint8_t **result,
+size_t *resultlen,
+Error **errp)
+{
+int i, ret;
+GChecksum *cs;
+
+if (alg >= G_N_ELEMENTS(qcrypto_hash_alg_map) ||
+qcrypto_hash_alg_map[alg] == -1) {
+error_setg(errp,
+   "Unknown hash algorithm %d",
+   alg);
+return -1;
+}
+
+cs = g_checksum_new(qcrypto_hash_alg_map[alg]);
+
+for (i = 0; i < niov; i++) {
+g_checksum_update(cs, iov[i].iov_base, iov[i].iov_len);
+}
+
+ret = g_checksum_type_get_length(qcrypto_hash_alg_map[alg]);
+if (ret < 0) {
+error_setg(errp, "%s",
+   "Unable to get hash length");
+goto error;
+}
+if (*resultlen == 0) {
+*resultlen = ret;
+*result = g_new0(uint8_t, *resultlen);
+} else if (*resultlen != ret) {
+error_setg(errp,
+   "Result buffer size %zu is smaller than hash %d",
+   *resultlen, ret);
+goto error;
+}
+
+g_checksum_get_digest(cs, *result, resultlen);
+
+g_checksum_free(cs);
+return 0;
+
+ error:
+g_checksum_free(cs);
+return -1;
+}
diff --git a/crypto/hash-stub.c b/crypto/hash-stub.c
deleted file mode 100644
index 8a9b8d4..000
--- a/crypto/hash-stub.c
+++ /dev/null
@@ -1,41 +0,0 @@
-/*
- * QEMU Crypto hash algorithms
- *
- * Copyright (c) 2016 Red Hat, Inc.
- *
- * This library is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2 of the License, or (at your option) any later version.
- *
- * This library is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GN

[Qemu-devel] [PATCH v1 0/2] Use GChecksum as fallback hash impl

This uses the GChecksum APIs as final hash impl, instead of
a no-op stub. This lets us remove conditional registration
of the quorum driver.

Daniel P. Berrange (2):
  crypto: use glib as fallback for hash algorithm
  Revert "block: don't register quorum driver if SHA256 support is
unavailable"

 block/quorum.c   | 10 +++---
 crypto/Makefile.objs |  2 +-
 crypto/hash-glib.c   | 94 
 crypto/hash-stub.c   | 41 ---
 4 files changed, 101 insertions(+), 46 deletions(-)
 create mode 100644 crypto/hash-glib.c
 delete mode 100644 crypto/hash-stub.c

-- 
2.7.4
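
The lookup-table approach used by the fallback (a per-algorithm map with
-1 marking algorithms the backend cannot provide) can be illustrated
without GLib as below. This is a simplified sketch: enum hash_alg, enum
backend_type, alg_map and hash_supports are hypothetical stand-ins for
QCryptoHashAlgorithm, GChecksumType, qcrypto_hash_alg_map and
qcrypto_hash_supports().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the QEMU algorithm enum */
enum hash_alg {
    ALG_MD5, ALG_SHA1, ALG_SHA224, ALG_SHA256,
    ALG_SHA384, ALG_SHA512, ALG_RIPEMD160, ALG__MAX
};

/* Hypothetical stand-in for the backend's checksum types */
enum backend_type { BACKEND_MD5, BACKEND_SHA1, BACKEND_SHA256, BACKEND_SHA512 };

/* -1 marks an algorithm that the fallback backend does not implement */
static const int alg_map[ALG__MAX] = {
    [ALG_MD5]       = BACKEND_MD5,
    [ALG_SHA1]      = BACKEND_SHA1,
    [ALG_SHA224]    = -1,
    [ALG_SHA256]    = BACKEND_SHA256,
    [ALG_SHA384]    = -1,
    [ALG_SHA512]    = BACKEND_SHA512,
    [ALG_RIPEMD160] = -1,
};

/* Bounds-check first, then consult the table */
static bool hash_supports(unsigned alg)
{
    return alg < sizeof(alg_map) / sizeof(alg_map[0]) && alg_map[alg] != -1;
}
```

A caller such as a block driver's open routine would query this at
runtime and report a clear error instead of silently not registering.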

Re: [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches

2016-07-05 Thread Paolo Bonzini

On 05/07/2016 12:06, Michael S. Tsirkin wrote:
> >  -m 2G,slots=16,maxmem=2T
> > 
> > On a host with a 39bit physaddress limit do you error
> > on that or not?  I think oVirt is currently doing something
> > similar to that, but I'm trying to get confirmation.
> 
> That would only be a problem since pci is allocated above
> maxmem so 64 bit pci addresses aren't accessible.
> With my proposal we can actually force firmware to avoid
> using 64 bit memory for that config.
> Will work better than today.

So you would remove completely the 64-bit _CRS in this case?

How do you handle migration in the above scenario from say 46bit host to
39bit host, where the firmware has mapped (while running on the source)
a 64-bit BAR above the destination's maximum physical address?

Thanks,

Paolo

Re: [Qemu-devel] [PATCH v2 0/6] x86: Physical address limit patches

On Tue, Jul 05, 2016 at 12:59:42PM +0200, Paolo Bonzini wrote:
> 
> 
> On 05/07/2016 12:06, Michael S. Tsirkin wrote:
> > >  -m 2G,slots=16,maxmem=2T
> > > 
> > > On a host with a 39bit physaddress limit do you error
> > > on that or not?  I think oVirt is currently doing something
> > > similar to that, but I'm trying to get confirmation.
> > 
> > That would only be a problem since pci is allocated above
> > maxmem so 64 bit pci addresses aren't accessible.
> > With my proposal we can actually force firmware to avoid
> > using 64 bit memory for that config.
> > Will work better than today.
> 
> So you would remove completely the 64-bit _CRS in this case?

Yes.

> How do you handle migration in the above scenario from say 46bit host to
> 39bit host, where the firmware has mapped (while running on the source)
> a 64-bit BAR above the destination's maximum physical address?
> 
> Thanks,
> 
> Paolo

Again management would specify how much 64 bit pci space firmware should use.
If more is specified than host can support we can error out.

-- 
MST
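
The "error out if more is specified than the host can support" check
discussed in this thread could look roughly like the sketch below. It is
a deliberate simplification, not QEMU code: fits_phys_bits is a
hypothetical helper, and top_addr is assumed to be the highest guest
physical address needed plus one (for example maxmem plus any 64-bit PCI
window), ignoring holes in the real memory map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if all addresses below top_addr can be
 * expressed with phys_bits physical address bits.
 */
static bool fits_phys_bits(uint64_t top_addr, unsigned phys_bits)
{
    if (phys_bits >= 64) {
        return true;    /* avoid an undefined 64-bit shift */
    }
    return top_addr <= (UINT64_C(1) << phys_bits);
}
```

Under these assumptions, maxmem=2T (2^41 bytes) does not fit a host with
a 39-bit limit but does fit a 46-bit one.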

Re: [Qemu-devel] [PATCH 08/24] vhost-user: return a read error