Re: [Qemu-devel] [PATCH 2/3] spapr/pci: Free MSIs during reset

2019-07-28 Thread David Gibson
On Fri, Jul 26, 2019 at 04:44:44PM +0200, Greg Kurz wrote:
> When the machine is reset, the MSI bitmap is cleared but the allocated
> MSIs are not freed. Some operating systems, such as AIX, can detect the
> previous configuration and assert.
> 
> Empty the MSI cache; this performs the needed cleanup.
> 
> Signed-off-by: Greg Kurz 

Applied to ppc-for-4.2, thanks.

> ---
>  hw/ppc/spapr_pci.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index bc22568bfa71..e45507bf2b53 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -2078,6 +2078,8 @@ static void spapr_phb_reset(DeviceState *qdev)
>  if (spapr_phb_eeh_available(SPAPR_PCI_HOST_BRIDGE(qdev))) {
>  spapr_phb_vfio_reset(qdev);
>  }
> +
> +g_hash_table_remove_all(sphb->msi);
>  }
>  
>  static Property spapr_phb_properties[] = {
> 
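
Why a single g_hash_table_remove_all() call is enough here: if the MSI
table was created with destroy notifiers (as sphb->msi appears to be),
removing the entries also frees them. A standalone GLib sketch of that
behaviour, illustrative only and not QEMU code:

    #include <glib.h>

    int main(void)
    {
        /* The table owns its keys and values via the two destroy notifiers. */
        GHashTable *msi = g_hash_table_new_full(g_int_hash, g_int_equal,
                                                g_free, g_free);
        int *key = g_new(int, 1);

        *key = 42;
        g_hash_table_insert(msi, key, g_new0(int, 2));

        /* One call drops every entry and runs both notifiers, so cached
         * MSI configurations cannot leak across a machine reset. */
        g_hash_table_remove_all(msi);
        g_hash_table_unref(msi);
        return 0;
    }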

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH 3/3] spapr/irq: Drop spapr_irq_msi_reset()

2019-07-28 Thread David Gibson
On Fri, Jul 26, 2019 at 04:44:49PM +0200, Greg Kurz wrote:
> PHBs already take care of clearing the MSIs from the bitmap during reset
> or unplug. No need to do this globally from the machine code. Rather add
> an assert to ensure that PHBs have acted as expected.
> 
> Signed-off-by: Greg Kurz 

Applied to ppc-for-4.2, thanks.

> ---
>  hw/ppc/spapr.c |4 
>  hw/ppc/spapr_irq.c |7 ++-
>  include/hw/ppc/spapr_irq.h |1 -
>  3 files changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 5894329f29a9..855e9fbd9805 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1739,10 +1739,6 @@ static void spapr_machine_reset(MachineState *machine)
>  ppc_set_compat(first_ppc_cpu, spapr->max_compat_pvr, &error_fatal);
>  }
>  
> -if (!SPAPR_MACHINE_GET_CLASS(spapr)->legacy_irq_allocation) {
> -spapr_irq_msi_reset(spapr);
> -}
> -
>  /*
>   * NVLink2-connected GPU RAM needs to be placed on a separate NUMA node.
>   * We assign a new numa ID per GPU in spapr_pci_collect_nvgpu() which is
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index d07aed8ca9f9..c72d8433681d 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -57,11 +57,6 @@ void spapr_irq_msi_free(SpaprMachineState *spapr, int irq, uint32_t num)
>  bitmap_clear(spapr->irq_map, irq - SPAPR_IRQ_MSI, num);
>  }
>  
> -void spapr_irq_msi_reset(SpaprMachineState *spapr)
> -{
> -bitmap_clear(spapr->irq_map, 0, spapr->irq_map_nr);
> -}
> -
>  static void spapr_irq_init_kvm(SpaprMachineState *spapr,
>SpaprIrq *irq, Error **errp)
>  {
> @@ -729,6 +724,8 @@ int spapr_irq_post_load(SpaprMachineState *spapr, int version_id)
>  
>  void spapr_irq_reset(SpaprMachineState *spapr, Error **errp)
>  {
> +assert(bitmap_empty(spapr->irq_map, spapr->irq_map_nr));
> +
>  if (spapr->irq->reset) {
>  spapr->irq->reset(spapr, errp);
>  }
> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
> index f965a58f8954..44fe4f9e0e2e 100644
> --- a/include/hw/ppc/spapr_irq.h
> +++ b/include/hw/ppc/spapr_irq.h
> @@ -28,7 +28,6 @@ void spapr_irq_msi_init(SpaprMachineState *spapr, uint32_t nr_msis);
>  int spapr_irq_msi_alloc(SpaprMachineState *spapr, uint32_t num, bool align,
>  Error **errp);
>  void spapr_irq_msi_free(SpaprMachineState *spapr, int irq, uint32_t num);
> -void spapr_irq_msi_reset(SpaprMachineState *spapr);
>  
>  typedef struct SpaprIrq {
>  uint32_tnr_irqs;
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH v2 09/17] ppc/xive: Extend XiveTCTX with a XiveRouter pointer

2019-07-28 Thread David Gibson
On Thu, Jul 18, 2019 at 01:54:12PM +0200, Cédric Le Goater wrote:
> This is to perform lookups in the NVT table when a vCPU is dispatched
> and possibly resend interrupts.
> 
> Future XIVE chip will use a different class for the model of the
> interrupt controller and we might need to change the type of
> 'XiveRouter *' to 'Object *'
> 
> Signed-off-by: Cédric Le Goater 

Hrm.  This still bothers me.  AIUI there can be multiple XiveRouters
in the system, yes?  And at least theoretically can present irqs from
multiple routers?  In which case what's the rule for which one should
be associated with a specific XiveTCTX?

I guess it's the one on the same chip, but that needs to be explained
up front, with some justification of why that's the relevant one.

> ---
>  include/hw/ppc/xive.h | 2 ++
>  hw/intc/xive.c| 9 +
>  2 files changed, 11 insertions(+)
> 
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index 4851ff87e795..206b23ecfab3 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -320,6 +320,8 @@ typedef struct XiveTCTX {
>  qemu_irqos_output;
>  
>  uint8_t regs[XIVE_TM_RING_COUNT * XIVE_TM_RING_SIZE];
> +
> +struct XiveRouter  *xrtr;
>  } XiveTCTX;
>  
>  /*
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 88f2e560db0f..1b0eccb6df40 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -573,6 +573,14 @@ static void xive_tctx_realize(DeviceState *dev, Error **errp)
>  Object *obj;
>  Error *local_err = NULL;
>  
> +obj = object_property_get_link(OBJECT(dev), "xrtr", &local_err);
> +if (!obj) {
> +error_propagate(errp, local_err);
> +error_prepend(errp, "required link 'xrtr' not found: ");
> +return;
> +}
> +tctx->xrtr = XIVE_ROUTER(obj);
> +
>  obj = object_property_get_link(OBJECT(dev), "cpu", &local_err);
>  if (!obj) {
>  error_propagate(errp, local_err);
> @@ -666,6 +674,7 @@ Object *xive_tctx_create(Object *cpu, XiveRouter *xrtr, Error **errp)
>  object_property_add_child(cpu, TYPE_XIVE_TCTX, obj, &error_abort);
>  object_unref(obj);
>  object_property_add_const_link(obj, "cpu", cpu, &error_abort);
> +object_property_add_const_link(obj, "xrtr", OBJECT(xrtr), &error_abort);
>  object_property_set_bool(obj, true, "realized", &local_err);
>  if (local_err) {
>  goto error;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH v2 09/17] ppc/xive: Extend XiveTCTX with a XiveRouter pointer

2019-07-28 Thread Cédric Le Goater
On 28/07/2019 09:46, David Gibson wrote:
> On Thu, Jul 18, 2019 at 01:54:12PM +0200, Cédric Le Goater wrote:
>> This is to perform lookups in the NVT table when a vCPU is dispatched
>> and possibly resend interrupts.
>>
>> Future XIVE chip will use a different class for the model of the
>> interrupt controller and we might need to change the type of
>> 'XiveRouter *' to 'Object *'
>>
>> Signed-off-by: Cédric Le Goater 
> 
> Hrm.  This still bothers me. 

Your feeling is right. There should be a good reason to link two objects
together, as it can become an issue later on (such as with P10). It should
not be a hidden parameter to function calls, which is more or less what it
is here.
 
See below for more explanation.

> AIUI there can be multiple XiveRouters in the system, yes?  

Yes, and it works relatively well with 4 chips. I say relatively because
the presenter model is taking a shortcut we should fix.

> And at least theoretically can present irqs from multiple routers? 

Yes, the console being the simplest example. We only have one such device
per system, on the LPC bus of chip 0.
 
> In which case what's the rule for which one should be associated with 
> a specific XiveTCTX?
> I guess it's the one on the same chip, but that needs to be explained
> up front, with some justification of why that's the relevant one.

Yes. We try to minimize the traffic on the PowerBUS, so CPU targets
are generally on the same IC. The EAT on POWER10 should be on the same
HW chip.


I think we can address the proposed changes from another perspective,
the presenter's one. This is cleaner and reflects the HW design better.

The thread contexts are owned by the presenter. It can scan its list 
when doing CAM matching and when the thread context registers are being 
accessed by a CPU. Adding a XiveRouter parameter to all the TIMA 
operations seems like a better option and solves the problem.
 

The thread context registers are modeled under the CPU object today 
because it was practical for sPAPR but on HW, these are SRAM entries,
one for each HW thread of the chip. So maybe we should have introduced
another table under the XiveRouter to model the contexts, but there
was no real need for the XIVE POWER9 IC of the pseries machine. This 
design might be reaching its limits with PowerNV and POWER10.  


Looking at :
 
  [PATCH v2 15/17] ppc/pnv: Grab the XiveRouter object from XiveTCTX in 
pnv_xive_get_tctx()

we see that the code adds an extra check on the THREAD_ENABLE registers,
and for that, it needs the IC to which the thread context belongs. This
code is wrong. We should not need to find a XiveRouter object from a
XiveRouter handler.

This is because the xive_presenter_match() routine does:
   
CPU_FOREACH(cs) {
XiveTCTX *tctx = xive_router_get_tctx(xrtr, cs);
 
We should instead be looping on the different ICs of the system
looking for a match. Something else to fix. I think I will use the
PIR to match the CPU of a chip.
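
A toy of the loop shape described above (every type and helper here is
invented for illustration; this is not proposed code): iterate the
presenters and let each scan only its own thread contexts, instead of
CPU_FOREACH over every vCPU.

    #include <stddef.h>

    typedef struct TCTX { unsigned cam; } TCTX;

    typedef struct Presenter {
        TCTX *tctx;          /* thread contexts owned by this presenter */
        size_t nr_tctx;
    } Presenter;

    /* Scan one presenter's own contexts for a CAM match. */
    static TCTX *presenter_match(Presenter *p, unsigned cam)
    {
        for (size_t i = 0; i < p->nr_tctx; i++) {
            if (p->tctx[i].cam == cam) {
                return &p->tctx[i];
            }
        }
        return NULL;
    }

    /* Loop over the interrupt controllers of the system rather than
     * over every vCPU. */
    static TCTX *system_match(Presenter *chips, size_t nr_chips, unsigned cam)
    {
        for (size_t i = 0; i < nr_chips; i++) {
            TCTX *t = presenter_match(&chips[i], cam);
            if (t) {
                return t;
            }
        }
        return NULL;
    }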


Looking at POWER10, XIVE internal structures have changed and we will
need to introduce new IC models, one for PowerNV obviously but surely 
also one for pseries. A third one ... yes, sorry about that. If we go 
in that direction, it would be good to have a common XiveTCTX and not 
link it to a specific XiveRouter (P9 or P10). Another good reason not
to use that link.


So I will rework the end of that patchset. Thanks for having given me 
time to think more about it before merging. I did more experiments and
the models are now more precise, especially with guest and multichip
support. 


C.

 
> 
>> ---
>>  include/hw/ppc/xive.h | 2 ++
>>  hw/intc/xive.c| 9 +
>>  2 files changed, 11 insertions(+)
>>
>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
>> index 4851ff87e795..206b23ecfab3 100644
>> --- a/include/hw/ppc/xive.h
>> +++ b/include/hw/ppc/xive.h
>> @@ -320,6 +320,8 @@ typedef struct XiveTCTX {
>>  qemu_irqos_output;
>>  
>>  uint8_t regs[XIVE_TM_RING_COUNT * XIVE_TM_RING_SIZE];
>> +
>> +struct XiveRouter  *xrtr;
>>  } XiveTCTX;
>>  
>>  /*
>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>> index 88f2e560db0f..1b0eccb6df40 100644
>> --- a/hw/intc/xive.c
>> +++ b/hw/intc/xive.c
>> @@ -573,6 +573,14 @@ static void xive_tctx_realize(DeviceState *dev, Error **errp)
>>  Object *obj;
>>  Error *local_err = NULL;
>>  
>> +obj = object_property_get_link(OBJECT(dev), "xrtr", &local_err);
>> +if (!obj) {
>> +error_propagate(errp, local_err);
>> +error_prepend(errp, "required link 'xrtr' not found: ");
>> +return;
>> +}
>> +tctx->xrtr = XIVE_ROUTER(obj);
>> +
>>  obj = object_property_get_link(OBJECT(dev), "cpu", &local_err);
>>  if (!obj) {
>>  error_propagate(errp, local_err);
>> @@ -666,6 +674,7 @@ Object *xive_tctx_create(Object *cpu, XiveRouter *xrtr, Error **errp)
>>  object_property_add_child(cpu, TYPE_XIVE_TCTX, obj, &error_abort);
>>  object_unref(obj)

Re: [Qemu-devel] Question regarding tcg trace-events

2019-07-28 Thread Alex Bennée


sainath grandhi  writes:

> Hello
> I am working with qemu tracing support, combined with tcg.
> I read that if tcg property is used for trace-event, it generates a
> trace-event once during translation and another trace-event after the
> execution.
>
> I made the following change in target/i386/translate.c
>
> -static inline void gen_op_movl_seg_T0_vm(DisasContext *s, int seg_reg)
> +static inline void gen_op_movl_seg_T0_vm(DisasContext *s, int seg_reg, CPUX86State *env)
>  {
>  tcg_gen_ext16u_tl(s->T0, s->T0);
>  tcg_gen_st32_tl(s->T0, cpu_env,
>  offsetof(CPUX86State,segs[seg_reg].selector));
> +trace_seg_write_tcg(tcg_ctx->cpu, cpu_env, env->eip, seg_reg, env->segs[seg_reg].selector, s->T0);

This is a new trace point you've added?

>  tcg_gen_shli_tl(cpu_seg_base[seg_reg], s->T0, 4);
>
> I see seg_write_trans and seg_write_exec trace-events.
> The question I have is the following:
> I expected one seg_write_trans trace-event per seg_write_exec
> trace-event. However, I notice more than one seg_write_exec
> trace-event after a seg_write_trans

If a translated block is executed more than once (most are) you should
see more exec events than trans events.

> and in some cases seg_write_exec
> trace-events occur without a seg_write_trans.

That is odd.

> Why does this happen? Does this have something to do with TCG and TBs?

In TCG an execution block (TranslationBlock) is:

  - translated into TCG ops
  - generated into host code
  - added to the code cache

From this point, each time we need to execute something with the same
parameters (pc/flags) we fetch the already translated code and execute
it directly. There are more pointers to how the TCG works on the wiki.
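
To make that concrete, here is a tiny standalone model of a translation
cache (an illustration only, not TCG code): a block is translated once on
a cache miss and executed from the cache on every later hit, so a single
trans event can be followed by many exec events.

    #include <stdio.h>

    #define CACHE_SLOTS 16

    typedef struct { unsigned long pc; int valid; } TB;
    static TB cache[CACHE_SLOTS];

    /* Translate on a cache miss, reuse the cached block on a hit. */
    static void run_block(unsigned long pc)
    {
        TB *tb = &cache[pc % CACHE_SLOTS];
        if (!tb->valid || tb->pc != pc) {
            printf("trans event: pc=0x%lx\n", pc); /* translation-time trace */
            tb->pc = pc;
            tb->valid = 1;
        }
        printf("exec event:  pc=0x%lx\n", pc);     /* execution-time trace */
    }

    int main(void)
    {
        unsigned long trace[] = { 0x100, 0x200, 0x100, 0x100, 0x200 };
        int i;

        for (i = 0; i < 5; i++) {
            run_block(trace[i]);
        }
        /* Output contains 2 trans events but 5 exec events. */
        return 0;
    }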

--
Alex Bennée



Re: [Qemu-devel] [Virtio-fs] [PATCH 2/5] virtiofsd: prevent lo_lookup() NULL pointer dereference

2019-07-28 Thread piaojun
Hi Stefan,

On 2019/7/26 17:11, Stefan Hajnoczi wrote:
> Most lo_do_lookup() callers have already checked that the parent inode exists.
> lo_lookup() hasn't and can therefore hit a NULL pointer dereference when
> lo_inode(req, parent) returns NULL.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  contrib/virtiofsd/passthrough_ll.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
> index 9ae1381618..277a17fc03 100644
> --- a/contrib/virtiofsd/passthrough_ll.c
> +++ b/contrib/virtiofsd/passthrough_ll.c
> @@ -766,6 +766,10 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>   struct lo_data *lo = lo_data(req);
>   struct lo_inode *inode, *dir = lo_inode(req, parent);
>  
> + if (!dir) {
> + return EBADF;
> + }
> +

I worry that dir could be released or set to NULL just after the NULL
check. Could we use some lock to prevent this race?

Thanks,
Jun

>   memset(e, 0, sizeof(*e));
>   e->attr_timeout = lo->timeout;
>   e->entry_timeout = lo->timeout;
> 
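
A minimal sketch of the lookup-with-reference pattern that would close such
a race (hypothetical names and locking, not the passthrough_ll.c code): the
NULL check and the refcount increment happen inside one critical section,
so the inode cannot be freed between the check and its use.

    #include <pthread.h>
    #include <stdlib.h>

    struct inode {
        long refcount;                    /* protected by table_lock */
    };

    static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Pin the inode while holding the lock that guards the table. */
    static struct inode *inode_get(struct inode *entry)
    {
        struct inode *ret = NULL;

        pthread_mutex_lock(&table_lock);
        if (entry && entry->refcount > 0) {
            entry->refcount++;
            ret = entry;
        }
        pthread_mutex_unlock(&table_lock);
        return ret;
    }

    static void inode_put(struct inode *entry)
    {
        pthread_mutex_lock(&table_lock);
        if (--entry->refcount == 0) {
            free(entry);
        }
        pthread_mutex_unlock(&table_lock);
    }

    int main(void)
    {
        struct inode *ino = calloc(1, sizeof(*ino));
        struct inode *ref;

        ino->refcount = 1;                /* creation reference */
        ref = inode_get(ino);             /* pin before dereferencing */
        if (ref) {
            inode_put(ref);               /* drop the pin when done */
        }
        inode_put(ino);                   /* drop the creation reference */
        return 0;
    }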



[Qemu-devel] [PATCH 2/3] memory-device: break the loop if no hint is provided

2019-07-28 Thread Wei Yang
When there is no hint, the first non-overlapping range is the proper one.
Just break out of the loop instead of iterating the whole list.

Signed-off-by: Wei Yang 
---
 hw/mem/memory-device.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index df3261b32a..413b514586 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -180,6 +180,8 @@ static uint64_t memory_device_get_free_addr(MachineState *ms,
 range_make_empty(&new);
 break;
 }
+} else if (!hint) {
+break;
 }
 }
 
-- 
2.17.1




[Qemu-devel] [PATCH 1/3] memory-device: not necessary to use goto for the last check

2019-07-28 Thread Wei Yang
We are already at the last condition check.

Signed-off-by: Wei Yang 
---
 hw/mem/memory-device.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index 5f2c408036..df3261b32a 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -186,7 +186,6 @@ static uint64_t memory_device_get_free_addr(MachineState *ms,
 if (!range_contains_range(&as, &new)) {
 error_setg(errp, "could not find position in guest address space for "
"memory device - memory fragmented due to alignments");
-goto out;
 }
 out:
 g_slist_free(list);
-- 
2.17.1




[Qemu-devel] [PATCH 0/3] memory-device: refine memory_device_get_free_addr

2019-07-28 Thread Wei Yang
When we iterate the memory-device list to get the available range, it is not
necessary to iterate the whole list:

1) the first non-overlapping range is the proper one if no hint is provided
2) once tmp exceeds the hinted range, no later range can overlap with it

(A condensed sketch of the resulting loop follows the diffstat below.)

Wei Yang (3):
  memory-device: not necessary to use goto for the last check
  memory-device: break the loop if no hint is provided
  memory-device: break the loop if tmp exceed the hinted range

 hw/mem/memory-device.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
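
A condensed, self-contained sketch of the resulting search-loop shape
(illustrative only; the real memory_device_get_free_addr() also handles
alignment and proper error reporting). It assumes the used ranges are
disjoint and sorted by base address, as memory_device_build_list()
guarantees, and that the search starts at or below the first used range:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct { uint64_t base, size; } Range;

    static bool overlaps(const Range *a, const Range *b)
    {
        return a->base < b->base + b->size && b->base < a->base + a->size;
    }

    static uint64_t find_free(const Range *used, int n, uint64_t start,
                              uint64_t size, const uint64_t *hint)
    {
        Range new = { hint ? *hint : start, size };
        int i;

        for (i = 0; i < n; i++) {
            if (overlaps(&used[i], &new)) {
                if (hint) {
                    return UINT64_MAX;   /* the hinted spot is taken */
                }
                new.base = used[i].base + used[i].size;
            } else if (!hint || used[i].base > new.base + new.size - 1) {
                break;                   /* no later range can overlap */
            }
        }
        return new.base;
    }

    int main(void)
    {
        Range used[] = { { 0x1000, 0x1000 }, { 0x4000, 0x1000 } };

        printf("0x%llx\n",
               (unsigned long long)find_free(used, 2, 0x1000, 0x800, NULL));
        return 0;
    }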

-- 
2.17.1




[Qemu-devel] [PATCH 3/3] memory-device: break the loop if tmp exceed the hinted range

2019-07-28 Thread Wei Yang
The memory-device list built by memory_device_build_list is ordered by
address; this means that once the tmp range exceeds the hinted range,
none of the following ranges can overlap with it.

Signed-off-by: Wei Yang 
---
 hw/mem/memory-device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index 413b514586..aea47ab3e8 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -180,7 +180,7 @@ static uint64_t memory_device_get_free_addr(MachineState *ms,
 range_make_empty(&new);
 break;
 }
-} else if (!hint) {
+} else if (!hint || range_lob(&tmp) > range_upb(&new)) {
 break;
 }
 }
-- 
2.17.1




Re: [Qemu-devel] [PATCH v2 03/14] audio: add audiodev property to vnc and wav_capture

2019-07-28 Thread Zoltán Kővágó

On 2019-07-22 16:21, Markus Armbruster wrote:

"Zoltán Kővágó"  writes:


On 2019-07-16 08:23, Markus Armbruster wrote:

"Kővágó, Zoltán"  writes:


Signed-off-by: Kővágó, Zoltán 
---
   ui/vnc.h|  2 ++
   monitor/misc.c  | 12 +++-
   ui/vnc.c| 15 ++-
   hmp-commands.hx | 13 -
   qemu-options.hx |  6 ++
   5 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/ui/vnc.h b/ui/vnc.h
index 2f84db3142..6f54653455 100644
--- a/ui/vnc.h
+++ b/ui/vnc.h
@@ -183,6 +183,8 @@ struct VncDisplay
   #ifdef CONFIG_VNC_SASL
   VncDisplaySASL sasl;
   #endif
+
+AudioState *audio_state;
   };
 typedef struct VncTight {
diff --git a/monitor/misc.c b/monitor/misc.c
index e39a0e..f97810d370 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -1148,7 +1148,17 @@ static void hmp_wavcapture(Monitor *mon, const QDict *qdict)
   int bits = qdict_get_try_int(qdict, "bits", -1);
   int has_channels = qdict_haskey(qdict, "nchannels");
   int nchannels = qdict_get_try_int(qdict, "nchannels", -1);
+const char *audiodev = qdict_get_try_str(qdict, "audiodev");
   CaptureState *s;
+AudioState *as = NULL;
+
+if (audiodev) {
+as = audio_state_by_name(audiodev);
+if (!as) {
+monitor_printf(mon, "Invalid audiodev specified\n");
+return;
+}
+}


Note for later: if "audiodev" is specified, it must name an existing
AudioState.


 s = g_malloc0 (sizeof (*s));
  @@ -1156,7 +1166,7 @@ static void hmp_wavcapture(Monitor *mon, const QDict *qdict)
   bits = has_bits ? bits : 16;
   nchannels = has_channels ? nchannels : 2;
   -if (wav_start_capture(NULL, s, path, freq, bits, nchannels)) {
+if (wav_start_capture(as, s, path, freq, bits, nchannels)) {
   monitor_printf(mon, "Failed to add wave capture\n");
   g_free (s);
   return;


Note for later: this is the only other failure mode.


diff --git a/ui/vnc.c b/ui/vnc.c
index 140f364dda..24f9be5b5d 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -1222,7 +1222,7 @@ static void audio_add(VncState *vs)
   ops.destroy = audio_capture_destroy;
   ops.capture = audio_capture;
   -vs->audio_cap = AUD_add_capture(NULL, &vs->as, &ops, vs);
+vs->audio_cap = AUD_add_capture(vs->vd->audio_state, &vs->as, &ops, vs);
   if (!vs->audio_cap) {
   error_report("Failed to add audio capture");
   }
@@ -3369,6 +3369,9 @@ static QemuOptsList qemu_vnc_opts = {
   },{
   .name = "non-adaptive",
   .type = QEMU_OPT_BOOL,
+},{
+.name = "audiodev",
+.type = QEMU_OPT_STRING,
   },
   { /* end of list */ }
   },
@@ -3806,6 +3809,7 @@ void vnc_display_open(const char *id, Error **errp)
   const char *saslauthz;
   int lock_key_sync = 1;
   int key_delay_ms;
+const char *audiodev;
 if (!vd) {
   error_setg(errp, "VNC display not active");
@@ -3991,6 +3995,15 @@ void vnc_display_open(const char *id, Error **errp)
   }
   vd->ledstate = 0;
   +audiodev = qemu_opt_get(opts, "audiodev");
+if (audiodev) {
+vd->audio_state = audio_state_by_name(audiodev);
+if (!vd->audio_state) {
+error_setg(errp, "Audiodev '%s' not found", audiodev);
+goto fail;
+}
+}


Note for later: if "audiodev" is specified, it must name an existing
AudioState.

I like this error message better than the one in hmp_wavcapture().  Use
it there, too?

Move it into audio_state_by_name() by giving it an Error **errp
parameter?  Matter of taste, up to you.
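
A sketch of what that second option could look like (hypothetical: the
current audio_state_by_name() takes no Error ** parameter, and the lookup
helper below is invented for illustration):

    #include "qapi/error.h"

    AudioState *audio_state_by_name(const char *name, Error **errp)
    {
        AudioState *s = audio_state_lookup(name);   /* hypothetical helper */

        if (!s) {
            error_setg(errp, "Audiodev '%s' not found", name);
        }
        return s;
    }

Both hmp_wavcapture() and vnc_display_open() could then report the same
message through their respective error paths.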


+
   device_id = qemu_opt_get(opts, "display");
   if (device_id) {
   int head = qemu_opt_get_number(opts, "head", 0);
diff --git a/hmp-commands.hx b/hmp-commands.hx
index bfa5681dd2..fa7f009268 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -819,16 +819,19 @@ ETEXI
 {
   .name   = "wavcapture",
-.args_type  = "path:F,freq:i?,bits:i?,nchannels:i?",
-.params = "path [frequency [bits [channels]]]",
+.args_type  = "path:F,freq:i?,bits:i?,nchannels:i?,audiodev:s?",
+.params = "path [frequency [bits [channels [audiodev",
   .help   = "capture audio to a wave file (default frequency=44100 bits=16 channels=2)",
   .cmd= hmp_wavcapture,
   },
   STEXI
-@item wavcapture @var{filename} [@var{frequency} [@var{bits} [@var{channels}]]]
+@item wavcapture @var{filename} [@var{frequency} [@var{bits} [@var{channels} [@var{audiodev}
   @findex wavcapture
-Capture audio into @var{filename}. Using sample rate @var{frequency}
-bits per sample @var{bits} and number of channels @var{channels}.
+Capture audio into @var{filename} from @var{audiodev}, using sample rate
+@var{frequency} bits per sample @var{bits} and number of channels
+@var{channels}. When not using an -audiodev argument on command line,
+@var{audiodev} must be omitted, otherwise it must specify a valid
+audiodev.

Re: [Qemu-devel] [PATCH v2 03/14] audio: add audiodev property to vnc and wav_capture

2019-07-28 Thread Zoltán Kővágó

On 2019-07-28 15:42, Zoltán Kővágó wrote:

On 2019-07-22 16:21, Markus Armbruster wrote:

"Zoltán Kővágó"  writes:


On 2019-07-16 08:23, Markus Armbruster wrote:

"Kővágó, Zoltán"  writes:


Signed-off-by: Kővágó, Zoltán 
---
   ui/vnc.h    |  2 ++
   monitor/misc.c  | 12 +++-
   ui/vnc.c    | 15 ++-
   hmp-commands.hx | 13 -
   qemu-options.hx |  6 ++
   5 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/ui/vnc.h b/ui/vnc.h
index 2f84db3142..6f54653455 100644
--- a/ui/vnc.h
+++ b/ui/vnc.h
@@ -183,6 +183,8 @@ struct VncDisplay
   #ifdef CONFIG_VNC_SASL
   VncDisplaySASL sasl;
   #endif
+
+    AudioState *audio_state;
   };
 typedef struct VncTight {
diff --git a/monitor/misc.c b/monitor/misc.c
index e39a0e..f97810d370 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -1148,7 +1148,17 @@ static void hmp_wavcapture(Monitor *mon, const QDict *qdict)

   int bits = qdict_get_try_int(qdict, "bits", -1);
   int has_channels = qdict_haskey(qdict, "nchannels");
   int nchannels = qdict_get_try_int(qdict, "nchannels", -1);
+    const char *audiodev = qdict_get_try_str(qdict, "audiodev");
   CaptureState *s;
+    AudioState *as = NULL;
+
+    if (audiodev) {
+    as = audio_state_by_name(audiodev);
+    if (!as) {
+    monitor_printf(mon, "Invalid audiodev specified\n");
+    return;
+    }
+    }


Note for later: if "audiodev" is specified, it must name an existing
AudioState.


 s = g_malloc0 (sizeof (*s));
   @@ -1156,7 +1166,7 @@ static void hmp_wavcapture(Monitor *mon, const QDict *qdict)
   bits = has_bits ? bits : 16;
   nchannels = has_channels ? nchannels : 2;
   -    if (wav_start_capture(NULL, s, path, freq, bits, nchannels)) {
+    if (wav_start_capture(as, s, path, freq, bits, nchannels)) {
   monitor_printf(mon, "Failed to add wave capture\n");
   g_free (s);
   return;


Note for later: this is the only other failure mode.


diff --git a/ui/vnc.c b/ui/vnc.c
index 140f364dda..24f9be5b5d 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -1222,7 +1222,7 @@ static void audio_add(VncState *vs)
   ops.destroy = audio_capture_destroy;
   ops.capture = audio_capture;
   -    vs->audio_cap = AUD_add_capture(NULL, &vs->as, &ops, vs);
+    vs->audio_cap = AUD_add_capture(vs->vd->audio_state, &vs->as, &ops, vs);

   if (!vs->audio_cap) {
   error_report("Failed to add audio capture");
   }
@@ -3369,6 +3369,9 @@ static QemuOptsList qemu_vnc_opts = {
   },{
   .name = "non-adaptive",
   .type = QEMU_OPT_BOOL,
+    },{
+    .name = "audiodev",
+    .type = QEMU_OPT_STRING,
   },
   { /* end of list */ }
   },
@@ -3806,6 +3809,7 @@ void vnc_display_open(const char *id, Error **errp)

   const char *saslauthz;
   int lock_key_sync = 1;
   int key_delay_ms;
+    const char *audiodev;
 if (!vd) {
   error_setg(errp, "VNC display not active");
@@ -3991,6 +3995,15 @@ void vnc_display_open(const char *id, Error **errp)

   }
   vd->ledstate = 0;
   +    audiodev = qemu_opt_get(opts, "audiodev");
+    if (audiodev) {
+    vd->audio_state = audio_state_by_name(audiodev);
+    if (!vd->audio_state) {
+    error_setg(errp, "Audiodev '%s' not found", audiodev);
+    goto fail;
+    }
+    }


Note for later: if "audiodev" is specified, it must name an existing
AudioState.

I like this error message better than the one in hmp_wavcapture().  Use
it there, too?

Move it into audio_state_by_name() by giving it an Error **errp
parameter?  Matter of taste, up to you.


+
   device_id = qemu_opt_get(opts, "display");
   if (device_id) {
   int head = qemu_opt_get_number(opts, "head", 0);
diff --git a/hmp-commands.hx b/hmp-commands.hx
index bfa5681dd2..fa7f009268 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -819,16 +819,19 @@ ETEXI
 {
   .name   = "wavcapture",
-    .args_type  = "path:F,freq:i?,bits:i?,nchannels:i?",
-    .params = "path [frequency [bits [channels]]]",
+    .args_type  = "path:F,freq:i?,bits:i?,nchannels:i?,audiodev:s?",
+    .params = "path [frequency [bits [channels [audiodev",
   .help   = "capture audio to a wave file (default frequency=44100 bits=16 channels=2)",

   .cmd    = hmp_wavcapture,
   },
   STEXI
-@item wavcapture @var{filename} [@var{frequency} [@var{bits} [@var{channels}]]]
+@item wavcapture @var{filename} [@var{frequency} [@var{bits} [@var{channels} [@var{audiodev}

   @findex wavcapture
-Capture audio into @var{filename}. Using sample rate @var{frequency}
-bits per sample @var{bits} and number of channels @var{channels}.
+Capture audio into @var{filename} from @var{audiodev}, using sample rate
+@var{frequency} bits per sample @var{bits} and number of channels
+@var{channels}. When not using an -audiodev argument on command line,
+@var{audiodev} must be omitted, otherwise it must specify a valid
+audiodev.

Re: [Qemu-devel] [PATCH 2/2] iotests: use python logging for iotests.log()

2019-07-28 Thread Eduardo Habkost
On Fri, Jul 26, 2019 at 06:52:01PM -0400, John Snow wrote:
> We can turn logging on/off globally instead of per-function.
> 
> Remove use_log from run_job, and use python logging to turn on
> diffable output when we run through a script entry point.
> 
> (No, I have no idea why output on 245 changed. I really don't.)

I believe this happens because the logging StreamHandler will
flush the stream at every call.

I see the same output reordering on 245 if I add a
sys.stdout.flush() call to iotests.log(), or if I change
iotests.main() to use sys.stdout for the unittest runner output.
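
A standalone illustration of the effect (not iotests code): anything the
logging StreamHandler emits is flushed immediately, while output captured
in a buffer only appears when it is finally written out, so the logged
lines jump ahead of the unittest summary exactly as in the new 245.out.

    import io
    import logging
    import sys

    logging.basicConfig(level=logging.INFO, stream=sys.stdout,
                        format='%(message)s')

    buf = io.StringIO()                # stand-in for the captured runner output
    buf.write('..\nRan 2 tests\n\nOK\n')

    # Stand-in for iotests.log(): the StreamHandler writes and flushes now.
    logging.info('{"execute": "job-finalize", "arguments": {"id": "job0"}}')

    sys.stdout.write(buf.getvalue())   # the summary is only emitted at the end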

> 
> Signed-off-by: John Snow 
> ---
>  tests/qemu-iotests/030|  4 +--
>  tests/qemu-iotests/245|  1 +
>  tests/qemu-iotests/245.out| 24 +-
>  tests/qemu-iotests/iotests.py | 48 ---
>  4 files changed, 43 insertions(+), 34 deletions(-)
> 
> diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
> index 1b69f318c6..a382cb430b 100755
> --- a/tests/qemu-iotests/030
> +++ b/tests/qemu-iotests/030
> @@ -411,8 +411,8 @@ class TestParallelOps(iotests.QMPTestCase):
>  result = self.vm.qmp('block-job-set-speed', device='drive0', speed=0)
>  self.assert_qmp(result, 'return', {})
>  
> -self.vm.run_job(job='drive0', auto_dismiss=True, use_log=False)
> -self.vm.run_job(job='node4', auto_dismiss=True, use_log=False)
> +self.vm.run_job(job='drive0', auto_dismiss=True)
> +self.vm.run_job(job='node4', auto_dismiss=True)

So, this one is the only run_job() caller specifying
use_log=False.  It doesn't call activate_logging() anywhere, so
it seems OK.

However, we need to ensure all the other run_job() callers will
actually enable logging.  The remaining run_job() callers are:
206 207 210 211 212 213 237 245 255 256.

These look OK:
206:iotests.script_initialize(supported_fmts=['qcow2'])
245:iotests.activate_logging()
255:iotests.script_initialize(supported_fmts=['qcow2'])
256:iotests.script_initialize(supported_fmts=['qcow2'])
257:iotests.script_main(main, supported_fmts=['qcow2'])

These don't seem to call activate_logging() anywhere:
207 210 211 212 213 237.

What about other scripts calling log() that aren't calling activate_logging()?
Let's see:

  $ git grep -LE 'script_main|script_initialize|activate_logging' \
   $(grep -lP '(?
>  self.assert_no_active_block_jobs()
>  
>  # Test a block-stream and a block-commit job in parallel
> diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245
> index bc1ceb9792..3bc29acb33 100644
> --- a/tests/qemu-iotests/245
> +++ b/tests/qemu-iotests/245
> @@ -1000,4 +1000,5 @@ class TestBlockdevReopen(iotests.QMPTestCase):
>  self.reopen(opts, {'backing': 'hd2'})
>  
>  if __name__ == '__main__':
> +iotests.activate_logging()
>  iotests.main(supported_fmts=["qcow2"])
> diff --git a/tests/qemu-iotests/245.out b/tests/qemu-iotests/245.out
> index a19de5214d..15c3630e92 100644
> --- a/tests/qemu-iotests/245.out
> +++ b/tests/qemu-iotests/245.out
> @@ -1,17 +1,17 @@
> +{"execute": "job-finalize", "arguments": {"id": "commit0"}}
> +{"return": {}}
> +{"data": {"id": "commit0", "type": "commit"}, "event": "BLOCK_JOB_PENDING", 
> "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
> +{"data": {"device": "commit0", "len": 3145728, "offset": 3145728, "speed": 
> 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": 
> {"microseconds": "USECS", "seconds": "SECS"}}
> +{"execute": "job-finalize", "arguments": {"id": "stream0"}}
> +{"return": {}}
> +{"data": {"id": "stream0", "type": "stream"}, "event": "BLOCK_JOB_PENDING", 
> "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
> +{"data": {"device": "stream0", "len": 3145728, "offset": 3145728, "speed": 
> 0, "type": "stream"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": 
> {"microseconds": "USECS", "seconds": "SECS"}}
> +{"execute": "job-finalize", "arguments": {"id": "stream0"}}
> +{"return": {}}
> +{"data": {"id": "stream0", "type": "stream"}, "event": "BLOCK_JOB_PENDING", 
> "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
> +{"data": {"device": "stream0", "len": 3145728, "offset": 3145728, "speed": 
> 0, "type": "stream"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": 
> {"microseconds": "USECS", "seconds": "SECS"}}
>  ..
>  --
>  Ran 18 tests
>  
>  OK
> -{"execute": "job-finalize", "arguments": {"id": "commit0"}}
> -{"return": {}}
> -{"data": {"id": "commit0", "type": "commit"}, "event": "BLOCK_JOB_PENDING", 
> "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
> -{"data": {"device": "commit0", "len": 3145728, "offset": 3145728, "speed": 
> 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": 
> {"microseconds": "USECS", "seconds": "SECS"}}
> -{"execute": "job-finalize", "arguments": {"id": "stream0"}}
> -{"return": {}}
> -{"data": {"id": "stream0", "type": "strea

Re: [Qemu-devel] [PATCH 1/2] iotests: add script_initialize

2019-07-28 Thread Eduardo Habkost
On Fri, Jul 26, 2019 at 06:52:00PM -0400, John Snow wrote:
> Like script_main, but doesn't require a single point of entry.
> Replace all existing initialization sections with this drop-in replacement.
> 
> This brings debug support to all existing script-style iotests.
> 
> Note: supported_oses=['linux'] was omitted, as it is a default argument.
> 
> Signed-off-by: John Snow 
[...]

Looks good overall, I just have one comment:


> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index 727730422f..5e9b2989dd 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -891,9 +891,8 @@ def execute_unittest(output, verbosity, debug):
>  sys.stderr.write(re.sub(r'Ran (\d+) tests? in [\d.]+s',
>  r'Ran \1 tests', output.getvalue()))
>  
> -def execute_test(test_function=None,
> - supported_fmts=[], supported_oses=['linux'],
> - supported_cache_modes=[], unsupported_fmts=[]):
> +def execute_setup_common(supported_fmts=[], supported_oses=['linux'],
> + supported_cache_modes=[], unsupported_fmts=[]):
>  """Run either unittest or script-style tests."""
>  
>  # We are using TEST_DIR and QEMU_DEFAULT_MACHINE as proxies to
> @@ -925,16 +924,28 @@ def execute_test(test_function=None,
>  output = io.BytesIO()
>  
>  logging.basicConfig(level=(logging.DEBUG if debug else logging.WARN))
> +return output, verbosity, debug

Can't we make this simpler?

What about just returning `debug`, and letting execute_unittest()
take care of `verbosity` and `output`?
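
Something along these lines, perhaps (a sketch of the suggestion, not
tested code; the '-d' handling mirrors what iotests.py does today):

    def execute_setup_common(supported_fmts=[], supported_oses=['linux'],
                             supported_cache_modes=[], unsupported_fmts=[]):
        """Common setup; only report whether we are in debug mode."""
        # ... the existing format/OS/cache-mode checks go here ...
        debug = '-d' in sys.argv
        if debug:
            sys.argv.remove('-d')
        logging.basicConfig(level=(logging.DEBUG if debug else logging.WARN))
        return debug

    def execute_unittest(debug=False):
        """Derive verbosity and the output buffer locally."""
        verbosity = 2 if debug else 1
        output = None if debug else io.BytesIO()
        # ... run the unittest suite as before ...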

>  
> +def execute_test(test_function=None, *args, **kwargs):
> +"""Run either unittest or script-style tests."""
> +
> +unittest_args = execute_setup_common(*args, **kwargs)
>  if not test_function:
> -execute_unittest(output, verbosity, debug)
> +execute_unittest(*unittest_args)
>  else:
>  test_function()
>  
> +# This is called from script-style iotests without a single point of entry
> +def script_initialize(*args, **kwargs):
> +"""Initialize script-style tests without running any tests."""
> +execute_setup_common(*args, **kwargs)
> +
> +# This is called from script-style iotests with a single point of entry
>  def script_main(test_function, *args, **kwargs):
>  """Run script-style tests outside of the unittest framework"""
>  execute_test(test_function, *args, **kwargs)
>  
> +# This is called from unittest style iotests
>  def main(*args, **kwargs):
>  """Run tests using the unittest framework"""
>  execute_test(None, *args, **kwargs)
> -- 
> 2.21.0
> 

-- 
Eduardo



[Qemu-devel] [PATCH 0/2] add speed limit for multifd migration

2019-07-28 Thread Ivan Ren
Currently multifd migration is not rate-limited and will consume the
whole bandwidth of the NIC. These two patches add speed limiting to
it.

Ivan Ren (2):
  migration: add qemu_file_update_rate_transfer interface
  migration: add speed limit for multifd migration

 migration/qemu-file.c |  5 +
 migration/qemu-file.h |  1 +
 migration/ram.c   | 22 --
 3 files changed, 18 insertions(+), 10 deletions(-)

-- 
2.17.2 (Apple Git-113)




[Qemu-devel] [PATCH 1/2] migration: add qemu_file_update_rate_transfer interface

2019-07-28 Thread Ivan Ren
Add qemu_file_update_rate_transfer, which just updates bytes_xfer for
speed limiting. This will be used by further migration features
such as multifd migration.

Signed-off-by: Ivan Ren 
---
 migration/qemu-file.c | 5 +
 migration/qemu-file.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 0431585502..13e7f03f9b 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -615,6 +615,11 @@ void qemu_file_reset_rate_limit(QEMUFile *f)
 f->bytes_xfer = 0;
 }
 
+void qemu_file_update_rate_transfer(QEMUFile *f, int64_t len)
+{
+f->bytes_xfer += len;
+}
+
 void qemu_put_be16(QEMUFile *f, unsigned int v)
 {
 qemu_put_byte(f, v >> 8);
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 13baf896bd..6145d10aca 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -147,6 +147,7 @@ int qemu_peek_byte(QEMUFile *f, int offset);
 void qemu_file_skip(QEMUFile *f, int size);
 void qemu_update_position(QEMUFile *f, size_t size);
 void qemu_file_reset_rate_limit(QEMUFile *f);
+void qemu_file_update_rate_transfer(QEMUFile *f, int64_t len);
 void qemu_file_set_rate_limit(QEMUFile *f, int64_t new_rate);
 int64_t qemu_file_get_rate_limit(QEMUFile *f);
 void qemu_file_set_error(QEMUFile *f, int ret);
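
For context, the limiter works by comparing an accumulated byte counter
against a per-period budget; multifd sends its pages over separate
channels rather than through the QEMUFile, so those bytes have to be
added to the counter by hand. A simplified model (field and function
names follow qemu-file.c, but this is not the actual implementation):

    #include <stdint.h>

    typedef struct QEMUFileModel {
        int64_t bytes_xfer;   /* bytes accounted so far in this period */
        int64_t xfer_limit;   /* budget per rate-limit period */
    } QEMUFileModel;

    /* Data written through the file is accounted automatically... */
    static int qemu_file_rate_limit(QEMUFileModel *f)
    {
        return f->bytes_xfer >= f->xfer_limit;  /* 1 = caller must throttle */
    }

    /* ...but multifd bypasses the file, so it reports its bytes manually. */
    static void qemu_file_update_rate_transfer(QEMUFileModel *f, int64_t len)
    {
        f->bytes_xfer += len;
    }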
-- 
2.17.2 (Apple Git-113)




[Qemu-devel] [PATCH 2/2] migration: add speed limit for multifd migration

2019-07-28 Thread Ivan Ren
Limit the speed of multifd migration through the common QEMUFile
speed-limit mechanism.

Signed-off-by: Ivan Ren 
---
 migration/ram.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 889148dd84..e3fde16776 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -922,7 +922,7 @@ struct {
  * false.
  */
 
-static int multifd_send_pages(void)
+static int multifd_send_pages(RAMState *rs)
 {
 int i;
 static int next_channel;
@@ -954,6 +954,7 @@ static int multifd_send_pages(void)
 multifd_send_state->pages = p->pages;
 p->pages = pages;
 transferred = ((uint64_t) pages->used) * TARGET_PAGE_SIZE + p->packet_len;
+qemu_file_update_rate_transfer(rs->f, transferred);
 ram_counters.multifd_bytes += transferred;
 ram_counters.transferred += transferred;;
 qemu_mutex_unlock(&p->mutex);
@@ -962,7 +963,7 @@ static int multifd_send_pages(void)
 return 1;
 }
 
-static int multifd_queue_page(RAMBlock *block, ram_addr_t offset)
+static int multifd_queue_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
 {
 MultiFDPages_t *pages = multifd_send_state->pages;
 
@@ -981,12 +982,12 @@ static int multifd_queue_page(RAMBlock *block, ram_addr_t offset)
 }
 }
 
-if (multifd_send_pages() < 0) {
+if (multifd_send_pages(rs) < 0) {
 return -1;
 }
 
 if (pages->block != block) {
-return  multifd_queue_page(block, offset);
+return  multifd_queue_page(rs, block, offset);
 }
 
 return 1;
@@ -1054,7 +1055,7 @@ void multifd_save_cleanup(void)
 multifd_send_state = NULL;
 }
 
-static void multifd_send_sync_main(void)
+static void multifd_send_sync_main(RAMState *rs)
 {
 int i;
 
@@ -1062,7 +1063,7 @@ static void multifd_send_sync_main(void)
 return;
 }
 if (multifd_send_state->pages->used) {
-if (multifd_send_pages() < 0) {
+if (multifd_send_pages(rs) < 0) {
 error_report("%s: multifd_send_pages fail", __func__);
 return;
 }
@@ -1083,6 +1084,7 @@ static void multifd_send_sync_main(void)
 p->packet_num = multifd_send_state->packet_num++;
 p->flags |= MULTIFD_FLAG_SYNC;
 p->pending_job++;
+qemu_file_update_rate_transfer(rs->f, p->packet_len);
 qemu_mutex_unlock(&p->mutex);
 qemu_sem_post(&p->sem);
 }
@@ -2079,7 +2081,7 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
 static int ram_save_multifd_page(RAMState *rs, RAMBlock *block,
  ram_addr_t offset)
 {
-if (multifd_queue_page(block, offset) < 0) {
+if (multifd_queue_page(rs, block, offset) < 0) {
 return -1;
 }
 ram_counters.normal++;
@@ -3482,7 +3484,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 ram_control_before_iterate(f, RAM_CONTROL_SETUP);
 ram_control_after_iterate(f, RAM_CONTROL_SETUP);
 
-multifd_send_sync_main();
+multifd_send_sync_main(*rsp);
 qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 qemu_fflush(f);
 
@@ -3570,7 +3572,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 ram_control_after_iterate(f, RAM_CONTROL_ROUND);
 
 out:
-multifd_send_sync_main();
+multifd_send_sync_main(rs);
 qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 qemu_fflush(f);
 ram_counters.transferred += 8;
@@ -3629,7 +3631,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 
 rcu_read_unlock();
 
-multifd_send_sync_main();
+multifd_send_sync_main(rs);
 qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 qemu_fflush(f);
 
-- 
2.17.2 (Apple Git-113)




Re: [Qemu-devel] [PATCH qemu RFC 4/4] spapr: Implement SLOF-less client_architecture_support

2019-07-28 Thread Alexey Kardashevskiy



On 28/07/2019 16:09, David Gibson wrote:
> On Sat, Jul 20, 2019 at 11:28:50AM +1000, Alexey Kardashevskiy wrote:
>> QEMU already implements H_CAS called by SLOF. The existing handler
>> prepares a diff FDT and SLOF applies it on top of its current tree.
>> In SLOF-less setup when the user explicitly selected "bios=no",
> this reads the FDT from the OS, updates it and writes it back to the OS.
>> The new behavior is advertised to the OS via "/chosen/qemu,h_cas".
> 
> I don't love having two different paths here, I'm wondering if we can
> unify things at all.
> 
> I have wondered at some points if there's anything preventing us just
> giving a whole new device tree at CAS, rather than selected updates -
> that could simplify several things.


An update has a header, and this patch just copies the fdt over. If I
add a header to this new path, it will require a few more changes in
the guest, which I would rather avoid. Thanks,


> 
>> Signed-off-by: Alexey Kardashevskiy 
>> ---
>>  include/hw/ppc/spapr.h |  5 +
>>  hw/ppc/spapr.c | 24 -
>>  hw/ppc/spapr_hcall.c   | 49 +++---
>>  3 files changed, 70 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 7f5d7a70d27e..73cd9cf25b83 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -766,9 +766,14 @@ struct SpaprEventLogEntry {
>>  
>>  void spapr_events_init(SpaprMachineState *sm);
>>  void spapr_dt_events(SpaprMachineState *sm, void *fdt);
>> +int spapr_h_cas_do_compose_response(SpaprMachineState *spapr, void *fdt,
>> +SpaprOptionVector *ov5_updates);
>>  int spapr_h_cas_compose_response(SpaprMachineState *sm,
>>   target_ulong addr, target_ulong size,
>>   SpaprOptionVector *ov5_updates);
>> +#define FDT_MAX_SIZE 0x10
>> +void *spapr_build_fdt(SpaprMachineState *spapr);
>> +
>>  void close_htab_fd(SpaprMachineState *spapr);
>>  void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr);
>>  void spapr_free_hpt(SpaprMachineState *spapr);
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index b097a99951f1..f84895f4a8b4 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -978,6 +978,19 @@ static bool spapr_hotplugged_dev_before_cas(void)
>>  return false;
>>  }
>>  
>> +int spapr_h_cas_do_compose_response(SpaprMachineState *spapr, void *fdt,
>> +SpaprOptionVector *ov5_updates)
> 
> Not a great function name.
> 
>> +{
>> +/* Fixup cpu nodes */
>> +_FDT((spapr_fixup_cpu_dt(fdt, spapr)));
>> +
>> +if (spapr_dt_cas_updates(spapr, fdt, ov5_updates)) {
>> +return -1;
>> +}
>> +
>> +return 0;
>> +}
>> +
>>  int spapr_h_cas_compose_response(SpaprMachineState *spapr,
>>   target_ulong addr, target_ulong size,
>>   SpaprOptionVector *ov5_updates)
>> @@ -1009,10 +1022,7 @@ int spapr_h_cas_compose_response(SpaprMachineState *spapr,
>>  _FDT((fdt_open_into(fdt_skel, fdt, size)));
>>  g_free(fdt_skel);
>>  
>> -/* Fixup cpu nodes */
>> -_FDT((spapr_fixup_cpu_dt(fdt, spapr)));
>> -
>> -if (spapr_dt_cas_updates(spapr, fdt, ov5_updates)) {
>> +if (spapr_h_cas_do_compose_response(spapr, fdt, ov5_updates)) {
>>  return -1;
>>  }
>>  
>> @@ -1232,6 +1242,10 @@ static void spapr_dt_chosen(SpaprMachineState *spapr, void *fdt)
>>  
>>  /* We always implemented RTAS as hcall, tell guests to call it directly */
>>  _FDT(fdt_setprop_cell(fdt, chosen, "qemu,h_rtas", 1));
>> +/* Tell the guest that H_CAS will return the entire FDT now, not the diff */
>> +if (!spapr->bios_enabled) {
>> +_FDT(fdt_setprop_cell(fdt, chosen, "qemu,h_cas", 1));
> 
> As with H_RTAS, using qemu,hypertas-functions would be more
> appropriate for this.
> 
>> +}
>>  
>>  spapr_dt_ov5_platform_support(spapr, fdt, chosen);
>>  
>> @@ -1262,7 +1276,7 @@ static void spapr_dt_hypervisor(SpaprMachineState *spapr, void *fdt)
>>  }
>>  }
>>  
>> -static void *spapr_build_fdt(SpaprMachineState *spapr)
>> +void *spapr_build_fdt(SpaprMachineState *spapr)
>>  {
>>  MachineState *machine = MACHINE(spapr);
>>  MachineClass *mc = MACHINE_GET_CLASS(machine);
>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>> index b964d94f330b..c5cb06c9d507 100644
>> --- a/hw/ppc/spapr_hcall.c
>> +++ b/hw/ppc/spapr_hcall.c
>> @@ -17,6 +17,7 @@
>>  #include "hw/ppc/spapr_ovec.h"
>>  #include "mmu-book3s-v3.h"
>>  #include "hw/mem/memory-device.h"
>> +#include "sysemu/device_tree.h"
>>  
>>  static bool has_spr(PowerPCCPU *cpu, int spr)
>>  {
>> @@ -1774,9 +1775,51 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>>  /* legacy hash or new hash: */
>>  spapr_setup_hpt_and_vrma(spapr);
>>  }
>> -   

Re: [Qemu-devel] [Qemu-ppc] [PATCH 1/2] tests.acceptance.avocado_qemu: Add support for powerpc

2019-07-28 Thread Satheesh Rajendran
On Fri, Jul 26, 2019 at 01:49:00PM +0200, Greg Kurz wrote:
> On Fri, 26 Jul 2019 12:48:09 +0530
> sathn...@linux.vnet.ibm.com wrote:
> 
> > From: Satheesh Rajendran 
> > 
> > Currently the acceptance tests will not run in a powerpc little-endian
> > environment because the arch name does not match the qemu binary path;
> > let's handle it.
> > 
> 
> They do not match because "arch" as returned by uname() is
> something different from the "target" in QEMU. This usually
> matches, except with bi-endian architectures like ppc64.
> Uname "arch" may be ppc64 or ppc64le but "target" is always
> ppc64.

Yes, instead I would reword the commit message to sound like that.
Thanks!

> 
> > Signed-off-by: Satheesh Rajendran 
> > ---
> >  tests/acceptance/avocado_qemu/__init__.py | 4 
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/tests/acceptance/avocado_qemu/__init__.py b/tests/acceptance/avocado_qemu/__init__.py
> > index aee5d820ed..a05f0bb530 100644
> > --- a/tests/acceptance/avocado_qemu/__init__.py
> > +++ b/tests/acceptance/avocado_qemu/__init__.py
> > @@ -19,6 +19,7 @@ sys.path.append(os.path.join(SRC_ROOT_DIR, 'python'))
> >  
> >  from qemu.machine import QEMUMachine
> >  
> > +
> 
> empty line damage
> 
Sure, I did it because pylint complained about it; it can probably be
sent as a separate commit.
> >  def is_readable_executable_file(path):
> >  return os.path.isfile(path) and os.access(path, os.R_OK | os.X_OK)
> >  
> > @@ -39,6 +40,9 @@ def pick_default_qemu_bin(arch=None):
> >  """
> >  if arch is None:
> >  arch = os.uname()[4]
> > +# qemu binary path does not match arch for powerpc, handle it
> > +if 'ppc64le' in arch:
> > +arch = 'ppc64'
> 
> We also have other bi-endian targets (arm and aarch64). I'm not
> sure teaching pick_default_qemu_bin() about all of them is the
> way to go.
> 
It is good for tests that have an explicit arch mentioned,
but it will not work for platform-generic tests, like the one below
for example:

avocado run version.py 
JOB ID : ef3d99cf0232d38e5eb34c1552a8ab44ac77c45c
JOB LOG: /home/sath/tests/results/job-2019-07-29T01.45-ef3d99c/job.log
 (1/1) version.py:Version.test_qmp_human_info_version: CANCEL: No QEMU binary defined or found in the source tree (0.00 s)
RESULTS: PASS 0 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 1
JOB TIME   : 0.35 s

And moreover, we can preserve arch:ppc64 to run a big-endian guest
image. (One possible generic approach is sketched below.)
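
One generic way to cover such cases could be a single translation table in
pick_default_qemu_bin() (a sketch only; the armv7l entry is an assumption
added for illustration, only the ppc64le mapping comes from this thread):

    import os

    # uname -m values that do not match a QEMU target name.
    UNAME_TO_TARGET = {
        'ppc64le': 'ppc64',
        'armv7l': 'arm',
    }

    def is_readable_executable_file(path):
        return os.path.isfile(path) and os.access(path, os.R_OK | os.X_OK)

    def pick_default_qemu_bin(arch=None):
        if arch is None:
            arch = os.uname()[4]
        arch = UNAME_TO_TARGET.get(arch, arch)
        qemu_bin_relative_path = os.path.join("%s-softmmu" % arch,
                                              "qemu-system-%s" % arch)
        if is_readable_executable_file(qemu_bin_relative_path):
            return qemu_bin_relative_path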

> What about passing the right target in the first place ?
>

> ie, this in patch 2:
> 
> +def test_ppc64le_pseries(self):
> +"""
> +:avocado: tags=arch:ppc64
> 
> >  qemu_bin_relative_path = os.path.join("%s-softmmu" % arch,
> >"qemu-system-%s" % arch)

> >  if is_readable_executable_file(qemu_bin_relative_path):
> 
> 




Re: [Qemu-devel] [PATCH v2 09/17] ppc/xive: Extend XiveTCTX with a XiveRouter pointer

2019-07-28 Thread David Gibson
On Sun, Jul 28, 2019 at 11:06:27AM +0200, Cédric Le Goater wrote:
> On 28/07/2019 09:46, David Gibson wrote:
> > On Thu, Jul 18, 2019 at 01:54:12PM +0200, Cédric Le Goater wrote:
> >> This is to perform lookups in the NVT table when a vCPU is dispatched
> >> and possibly resend interrupts.
> >>
> >> Future XIVE chip will use a different class for the model of the
> >> interrupt controller and we might need to change the type of
> >> 'XiveRouter *' to 'Object *'
> >>
> >> Signed-off-by: Cédric Le Goater 
> > 
> > Hrm.  This still bothers me. 
> 
> Your feeling is right. There should be a good reason to link two objects
> together, as it can become an issue later on (such as with P10). It should
> not be a hidden parameter to function calls, which is more or less what it
> is here.
>  
> See below for more explanation.
> 
> > AIUI there can be multiple XiveRouters in the system, yes?  
> 
> Yes, and it works relatively well with 4 chips. I say relatively because
> the presenter model is taking a shortcut we should fix.
> 
> > And at least theoretically can present irqs from multiple routers? 
> 
> Yes, the console being the simplest example. We only have one such device
> per system, on the LPC bus of chip 0.
>  
> > In which case what's the rule for which one should be associated with 
> > a specific XiveTCTX?
> > I guess it's the one on the same chip, but that needs to be explained
> > up front, with some justification of why that's the relevant one.
> 
> Yes. We try to minimize the traffic on the PowerBUS, so CPU targets
> are generally on the same IC. The EAT on POWER10 should be on the same
> HW chip.
> 
> 
> I think we can address the proposed changes from another perspective,
> the presenter's one. This is cleaner and reflects the HW design better.
> 
> The thread contexts are owned by the presenter. It can scan its list 
> when doing CAM matching and when the thread context registers are being 
> accessed by a CPU. Adding a XiveRouter parameter to all the TIMA 
> operations seems like a better option and solves the problem.
>  
> 
> The thread context registers are modeled under the CPU object today 
> because it was practical for sPAPR but on HW, these are SRAM entries,
> one for each HW thread of the chip. So may be, we should have introduced
> an other table under the XiveRouter to model the contexts but there
> was no real need for the XIVE POWER9 IC of the pseries machine. This 
> design might be reaching its limits with PowerNV and POWER10.  
> 
> 
> Looking at :
>  
>   [PATCH v2 15/17] ppc/pnv: Grab the XiveRouter object from XiveTCTX in 
> pnv_xive_get_tctx()
> 
> we see that the code adds an extra check on the THREAD_ENABLE registers 
> and for that, it needs the IC to which the thread context belongs. This
> code is wrong. We should not need to find a XiveRouter object from a
> XiveRouter handler.
> 
> This is because the xive_presenter_match() routine does:
>
> CPU_FOREACH(cs) {
> XiveTCTX *tctx = xive_router_get_tctx(xrtr, cs);
>  
> We should instead be looping on the different ICs of the system
> looking for a match. Something else to fix. I think I will use the
> PIR to match the CPU of a chip.
> 
> 
> Looking at POWER10, XIVE internal structures have changed and we will
> need to introduce new IC models, one for PowerNV obviously but surely 
> also one for pseries. A third one ... yes, sorry about that. If we go 
> in that direction, it would be good to have a common XiveTCTX and not 
> link it to a specific XiveRouter (P9 or P10). Another good reason not
> to use that link.
> 
> 
> So I will rework the end of that patchset. Thanks for having given me 
> time to think more about it before merging. I did more experiments and
> the models are now more precise, especially with guest and multichip
> support.

Ok, good to hear.  I will wait for the respin.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH v2 03/14] audio: add audiodev property to vnc and wav_capture

2019-07-28 Thread Markus Armbruster
"Zoltán Kővágó"  writes:

> On 2019-07-28 15:42, Zoltán Kővágó wrote:
>> On 2019-07-22 16:21, Markus Armbruster wrote:
>>> "Zoltán Kővágó"  writes:
>>>
 On 2019-07-16 08:23, Markus Armbruster wrote:
[...]
>> diff --git a/hmp-commands.hx b/hmp-commands.hx
>> index bfa5681dd2..fa7f009268 100644
>> --- a/hmp-commands.hx
>> +++ b/hmp-commands.hx
>> @@ -819,16 +819,19 @@ ETEXI
>>  {
>>    .name   = "wavcapture",
>> -    .args_type  = "path:F,freq:i?,bits:i?,nchannels:i?",
>> -    .params = "path [frequency [bits [channels]]]",
>> +    .args_type  = "path:F,freq:i?,bits:i?,nchannels:i?,audiodev:s?",
>> +    .params = "path [frequency [bits [channels [audiodev",
>>    .help   = "capture audio to a wave file (default frequency=44100 bits=16 channels=2)",
>>    .cmd    = hmp_wavcapture,
>>    },
>>    STEXI
>> -@item wavcapture @var{filename} [@var{frequency} [@var{bits} [@var{channels}]]]
>> +@item wavcapture @var{filename} [@var{frequency} [@var{bits} [@var{channels} [@var{audiodev}
>>    @findex wavcapture
>> -Capture audio into @var{filename}. Using sample rate @var{frequency}
>> -bits per sample @var{bits} and number of channels @var{channels}.
>> +Capture audio into @var{filename} from @var{audiodev}, using sample rate
>> +@var{frequency} bits per sample @var{bits} and number of channels
>> +@var{channels}. When not using an -audiodev argument on command line,
>> +@var{audiodev} must be omitted, otherwise it must specify a valid
>> +audiodev.
>
> I can see the code for "must specify a valid audiodev" in
> hmp_wavcapture().  Where is "must be omitted" checked?

 It's not checked right now, but if the user specifies audiodev, it
 must be a valid audiodev id.  So if the user can guess the id (which
 is not too hard ATM, it's simply the driver's name), it will work even
 in this case.

> Preexisting: the list "sample rate @var{frequency} bits per sample
> @var{bits} and number of channels @var{channels}" lacks a comma after
> @var{frequency}, please fix that.  I'd put one after @var{bits}
> as well,
> but that's a matter of taste[*]
>
> The sentence is of the form "if not COND then A else B".  The
> less-negated form "if COND then B else A" is commonly easier to read.
>
> Documentation says "from @var{audiodev}".  But when "not using an
> -audiodev argument on command line, +@var{audiodev} must be omitted".
> Where does it sample from then?  I figure from some default audio
> device.  Where is that default audio device explained?  I skimmed the
> -audiodev documentation in qemu-options.hx, but couldn't see it there.

 Currently there are two ways to specify audio options, the legacy ones
 using the QEMU_AUDIO_* environment variables, and the new one using
 -audiodev arguments.  The two formats cannot be mixed, and eventually
 we should remove the legacy ones (IIRC my previous patch series
 already deprecated them), then we can get rid of this madness.  Maybe
 something like "When using the legacy environment variable based audio
 config, @var{audiodev} must be omitted." would be better?
>>>
>>> What about effectively de-documenting the deprecated method?  I.e. write
>>> as if it was already gone.  This should result in more readable
>>> documentation.
>>
>> Makes sense.  User will less likely use deprecated methods that way
>> and it simplifies the documentation.
>>
>>> Double-checking: will audiodev become mandatory once the deprecated
>>> method is gone?
>>
>> Yes.
>
> Actually, now that I took a second look at the params, it might be
> problematic.  Currently audiodev is the last parameter, so if it
> becomes mandatory, that will effectively mean other parameters will
> become mandatory too.  And if I were to change the order and move
> audiodev first, that would break backward compatibility during the
> deprecation period.

HMP is not a stable interface.  We can break compatibility there when we
have a reason.

In this case, we can either break it right away (insert audiodev before
the other optional parameters), or later (insert it last, move it when
it becomes mandatory).  Matter of taste.  I'd break it right away.



[Qemu-devel] [for-4.2 PATCH 0/6] Block-related record/replay fixes

2019-07-28 Thread Pavel Dovgalyuk
This set of patches includes the block-related updates
of the record/replay icount feature:
 - application of 'snapshot' option on the file layer instead of
   the top one: command line and documentation fix
 - implementation of bdrv_snapshot_goto for blkreplay driver
 - start/stop fix in replay mode with attached block devices
 - record/replay of bh oneshot events, used in the block layer

These patches are rebased upon the following series:
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg05568.html

---

Pavel Dovgalyuk (6):
  block: implement bdrv_snapshot_goto for blkreplay
  replay: disable default snapshot for record/replay
  replay: update docs for record/replay with block devices
  replay: don't drain/flush bdrv queue while RR is working
  replay: finish record/replay before closing the disks
  replay: add BH oneshot event for block layer


 block/blkreplay.c|8 
 block/block-backend.c|8 +---
 block/io.c   |   32 ++--
 block/iscsi.c|5 +++--
 block/nfs.c  |5 +++--
 block/null.c |4 +++-
 block/nvme.c |6 --
 block/rbd.c  |5 +++--
 block/vxhs.c |5 +++--
 cpus.c   |2 --
 docs/replay.txt  |   12 +---
 include/sysemu/replay.h  |3 +++
 replay/replay-events.c   |   16 
 replay/replay-internal.h |1 +
 replay/replay.c  |4 +++-
 stubs/Makefile.objs  |1 +
 stubs/replay-user.c  |9 +
 vl.c |   11 +--
 18 files changed, 113 insertions(+), 24 deletions(-)
 create mode 100644 stubs/replay-user.c

-- 
Pavel Dovgalyuk



[Qemu-devel] [for-4.2 PATCH 2/6] replay: disable default snapshot for record/replay

2019-07-28 Thread Pavel Dovgalyuk
From: Pavel Dovgalyuk 

This patch stops enabling the '-snapshot' option by default
in record/replay mode. This is needed for creating vmstates in record
and replay modes.
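
With this change a user-supplied '-snapshot' is refused under
record/replay via the replay blocker added below. A sketch of the
expected behaviour (the exact message comes from
QERR_REPLAY_NOT_SUPPORTED, so the wording here is indicative only):

    $ qemu-system-i386 -snapshot -icount shift=7,rr=record,rrfile=replay.bin ...
    qemu-system-i386: Record/replay feature is not supported for '-snapshot'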

Signed-off-by: Pavel Dovgalyuk 
Acked-by: Kevin Wolf 
---
 vl.c |   10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/vl.c b/vl.c
index b426b32134..b73fd078a8 100644
--- a/vl.c
+++ b/vl.c
@@ -1202,7 +1202,7 @@ static void configure_blockdev(BlockdevOptionsQueue 
*bdo_queue,
 qapi_free_BlockdevOptions(bdo->bdo);
 g_free(bdo);
 }
-if (snapshot || replay_mode != REPLAY_MODE_NONE) {
+if (snapshot) {
 qemu_opts_foreach(qemu_find_opts("drive"), drive_enable_snapshot,
   NULL, NULL);
 }
@@ -3051,7 +3051,13 @@ int main(int argc, char **argv, char **envp)
 drive_add(IF_PFLASH, -1, optarg, PFLASH_OPTS);
 break;
 case QEMU_OPTION_snapshot:
-snapshot = 1;
+{
+Error *blocker = NULL;
+snapshot = 1;
+error_setg(&blocker, QERR_REPLAY_NOT_SUPPORTED,
+   "-snapshot");
+replay_add_blocker(blocker);
+}
 break;
 case QEMU_OPTION_numa:
 opts = qemu_opts_parse_noisily(qemu_find_opts("numa"),




[Qemu-devel] [for-4.2 PATCH 1/6] block: implement bdrv_snapshot_goto for blkreplay

2019-07-28 Thread Pavel Dovgalyuk
From: Pavel Dovgalyuk 

This patch enables making snapshots of block devices that are used
through the blkreplay driver.
The new callback is required so that bdrv_snapshot_goto can work
without calling .bdrv_open, which blkreplay does not implement.

Signed-off-by: Pavel Dovgalyuk 
Acked-by: Kevin Wolf 
---
 block/blkreplay.c |8 
 1 file changed, 8 insertions(+)

diff --git a/block/blkreplay.c b/block/blkreplay.c
index 2b7931b940..c96ac8f4bc 100644
--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -126,6 +126,12 @@ static int coroutine_fn 
blkreplay_co_flush(BlockDriverState *bs)
 return ret;
 }
 
+static int blkreplay_snapshot_goto(BlockDriverState *bs,
+   const char *snapshot_id)
+{
+return bdrv_snapshot_goto(bs->file->bs, snapshot_id, NULL);
+}
+
 static BlockDriver bdrv_blkreplay = {
 .format_name= "blkreplay",
 .instance_size  = 0,
@@ -140,6 +146,8 @@ static BlockDriver bdrv_blkreplay = {
 .bdrv_co_pwrite_zeroes  = blkreplay_co_pwrite_zeroes,
 .bdrv_co_pdiscard   = blkreplay_co_pdiscard,
 .bdrv_co_flush  = blkreplay_co_flush,
+
+.bdrv_snapshot_goto = blkreplay_snapshot_goto,
 };
 
 static void bdrv_blkreplay_init(void)




[Qemu-devel] [for-4.2 PATCH 3/6] replay: update docs for record/replay with block devices

2019-07-28 Thread Pavel Dovgalyuk
From: Pavel Dovgalyuk 

This patch updates the description of the command lines for using
record/replay with attached block devices.

Signed-off-by: Pavel Dovgalyuk 
---
 docs/replay.txt |   12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/docs/replay.txt b/docs/replay.txt
index ee6aee9861..ce97c3f72f 100644
--- a/docs/replay.txt
+++ b/docs/replay.txt
@@ -27,7 +27,7 @@ Usage of the record/replay:
  * First, record the execution with the following command line:
 qemu-system-i386 \
  -icount shift=7,rr=record,rrfile=replay.bin \
- -drive file=disk.qcow2,if=none,id=img-direct \
+ -drive file=disk.qcow2,if=none,snapshot,id=img-direct \
  -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay \
  -device ide-hd,drive=img-blkreplay \
  -netdev user,id=net1 -device rtl8139,netdev=net1 \
@@ -35,7 +35,7 @@ Usage of the record/replay:
  * After recording, you can replay it by using another command line:
 qemu-system-i386 \
  -icount shift=7,rr=replay,rrfile=replay.bin \
- -drive file=disk.qcow2,if=none,id=img-direct \
+ -drive file=disk.qcow2,if=none,snapshot,id=img-direct \
  -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay \
  -device ide-hd,drive=img-blkreplay \
  -netdev user,id=net1 -device rtl8139,netdev=net1 \
@@ -223,7 +223,7 @@ Block devices record/replay module intercepts calls of
 bdrv coroutine functions at the top of block drivers stack.
 To record and replay block operations the drive must be configured
 as following:
- -drive file=disk.qcow2,if=none,id=img-direct
+ -drive file=disk.qcow2,if=none,snapshot,id=img-direct
  -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay
  -device ide-hd,drive=img-blkreplay
 
@@ -252,6 +252,12 @@ This snapshot is created at start of recording and 
restored at start
 of replaying. It also can be loaded while replaying to roll back
 the execution.
 
+'snapshot' flag of the disk image must be removed to save the snapshots
+in the overlay (or original image) instead of using the temporary overlay.
+ -drive file=disk.ovl,if=none,id=img-direct
+ -drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay
+ -device ide-hd,drive=img-blkreplay
+
 Use QEMU monitor to create additional snapshots. 'savevm <name>' command
 created the snapshot and 'loadvm <name>' restores it. To prevent corruption
 of the original disk image, use overlay files linked to the original images.




[Qemu-devel] [for-4.2 PATCH 5/6] replay: finish record/replay before closing the disks

2019-07-28 Thread Pavel Dovgalyuk
From: Pavel Dovgalyuk 

After recent updates, block devices cannot be closed on QEMU exit.
This happens due to the block request polling when replay is not finished.
Therefore we now stop execution recording before closing the block devices.

Signed-off-by: Pavel Dovgalyuk 
---
 replay/replay.c |2 ++
 vl.c|1 +
 2 files changed, 3 insertions(+)

diff --git a/replay/replay.c b/replay/replay.c
index 6a62ec3811..71c4e6b777 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -385,6 +385,8 @@ void replay_finish(void)
 g_free(replay_snapshot);
 replay_snapshot = NULL;
 
+replay_mode = REPLAY_MODE_NONE;
+
 replay_finish_events();
 }
 
diff --git a/vl.c b/vl.c
index b73fd078a8..2a341f5ad2 100644
--- a/vl.c
+++ b/vl.c
@@ -4499,6 +4499,7 @@ int main(int argc, char **argv, char **envp)
 
 /* No more vcpu or device emulation activity beyond this point */
 vm_shutdown();
+replay_finish();
 
 job_cancel_sync_all();
 bdrv_close_all();




[Qemu-devel] [PATCH v8 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT)

2019-07-28 Thread Tao Xu
This series of patches will build Heterogeneous Memory Attribute Table (HMAT)
according to the command line. The ACPI HMAT describes the memory attributes,
such as memory side cache attributes and bandwidth and latency details,
related to the Memory Proximity Domain.
The software is expected to use the HMAT information as a hint for optimization.

In the Linux kernel, the code in drivers/acpi/hmat/hmat.c parses and reports
the platform's HMAT tables.

The V7 patches link:
https://patchwork.kernel.org/cover/11046195/

Changelog:
v8:
- rebase to upstream
- Add check if numa->numa_state is NULL in pxb_dev_realize_common
- Use nb_nodes in spapr_populate_memory() (Igor)
v7:
- Defer patches 11-13 of v6, because the driver for _HMA hasn't been
  implemented in the kernel
- Drop the HMAT_LB_MEM_CACHE_LAST_LEVEL which is not used in
  ACPI 6.3 (Jonathan)
- Add bit mask in flags of hmat-lb (Jonathan)
- Add a macro to indicate whether the type is latency or bandwidth (Jonathan)

Liu Jingqi (5):
  hmat acpi: Build Memory Proximity Domain Attributes Structure(s)
  hmat acpi: Build System Locality Latency and Bandwidth Information
Structure(s)
  hmat acpi: Build Memory Side Cache Information Structure(s)
  numa: Extend the CLI to provide memory latency and bandwidth
information
  numa: Extend the CLI to provide memory side cache information

Tao Xu (6):
  hw/arm: simplify arm_load_dtb
  numa: move numa global variable nb_numa_nodes into MachineState
  numa: move numa global variable have_numa_distance into MachineState
  numa: move numa global variable numa_info into MachineState
  numa: Extend CLI to provide initiator information for numa nodes
  tests/bios-tables-test: add test cases for ACPI HMAT

 exec.c  |   5 +-
 hw/acpi/Kconfig |   5 +
 hw/acpi/Makefile.objs   |   1 +
 hw/acpi/aml-build.c |   9 +-
 hw/acpi/hmat.c  | 256 +
 hw/acpi/hmat.h  | 103 ++
 hw/arm/aspeed.c |   5 +-
 hw/arm/boot.c   |  20 +-
 hw/arm/collie.c |   8 +-
 hw/arm/cubieboard.c |   5 +-
 hw/arm/exynos4_boards.c |   7 +-
 hw/arm/highbank.c   |   8 +-
 hw/arm/imx25_pdk.c  |   5 +-
 hw/arm/integratorcp.c   |   8 +-
 hw/arm/kzm.c|   5 +-
 hw/arm/mainstone.c  |   5 +-
 hw/arm/mcimx6ul-evk.c   |   5 +-
 hw/arm/mcimx7d-sabre.c  |   5 +-
 hw/arm/musicpal.c   |   8 +-
 hw/arm/nseries.c|   5 +-
 hw/arm/omap_sx1.c   |   5 +-
 hw/arm/palm.c   |  10 +-
 hw/arm/raspi.c  |   6 +-
 hw/arm/realview.c   |   5 +-
 hw/arm/sabrelite.c  |   5 +-
 hw/arm/sbsa-ref.c   |  12 +-
 hw/arm/spitz.c  |   5 +-
 hw/arm/tosa.c   |   8 +-
 hw/arm/versatilepb.c|   5 +-
 hw/arm/vexpress.c   |   5 +-
 hw/arm/virt-acpi-build.c|  19 +-
 hw/arm/virt.c   |  17 +-
 hw/arm/xilinx_zynq.c|   8 +-
 hw/arm/xlnx-versal-virt.c   |   7 +-
 hw/arm/xlnx-zcu102.c|   5 +-
 hw/arm/z2.c |   8 +-
 hw/core/machine-hmp-cmds.c  |  12 +-
 hw/core/machine.c   |  38 +++-
 hw/core/numa.c  | 287 
 hw/i386/acpi-build.c|   7 +-
 hw/i386/pc.c|  13 +-
 hw/mem/pc-dimm.c|   2 +
 hw/pci-bridge/pci_expander_bridge.c |   8 +-
 hw/ppc/spapr.c  |  30 +--
 hw/ppc/spapr_pci.c  |   4 +-
 include/hw/acpi/aml-build.h |   2 +-
 include/hw/arm/boot.h   |   4 +-
 include/hw/boards.h |   1 +
 include/qemu/typedefs.h |   2 +
 include/sysemu/numa.h   |  30 ++-
 include/sysemu/sysemu.h |  23 +++
 qapi/machine.json   | 183 +-
 qemu-options.hx |  84 +++-
 tests/bios-tables-test.c|  43 +
 54 files changed, 1135 insertions(+), 246 deletions(-)
 create mode 100644 hw/acpi/hmat.c
 create mode 100644 hw/acpi/hmat.h

-- 
2.20.1




[Qemu-devel] [PATCH v8 09/11] numa: Extend the CLI to provide memory latency and bandwidth information

2019-07-28 Thread Tao Xu
From: Liu Jingqi 

Add -numa hmat-lb option to provide System Locality Latency and
Bandwidth Information. These memory attributes help to build
System Locality Latency and Bandwidth Information Structure(s)
in ACPI Heterogeneous Memory Attribute Table (HMAT).
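
As an illustration, the syntax exercised by the test case added later
in this series looks like this; base-lat/base-bw give the base unit,
latency/bandwidth the entry for a given initiator/target pair:

    -numa node,nodeid=0,memdev=m0
    -numa node,nodeid=1,memdev=m1,initiator=0
    -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,base-lat=10,latency=10
    -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=10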

Signed-off-by: Liu Jingqi 
Signed-off-by: Tao Xu 
---

No changes in v8.
---
 hw/core/numa.c| 127 ++
 include/sysemu/numa.h |   2 +
 qapi/machine.json | 100 -
 qemu-options.hx   |  45 ++-
 4 files changed, 271 insertions(+), 3 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index cfb6339810..83ead77191 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -37,6 +37,7 @@
 #include "qemu/option.h"
 #include "qemu/config-file.h"
 #include "qemu/cutils.h"
+#include "hw/acpi/hmat.h"
 
 QemuOptsList qemu_numa_opts = {
 .name = "numa",
@@ -183,6 +184,126 @@ void parse_numa_distance(MachineState *ms, 
NumaDistOptions *dist, Error **errp)
 ms->numa_state->have_numa_distance = true;
 }
 
+void parse_numa_hmat_lb(MachineState *ms, NumaHmatLBOptions *node,
+Error **errp)
+{
+int nb_numa_nodes = ms->numa_state->num_nodes;
+NodeInfo *numa_info = ms->numa_state->nodes;
+HMAT_LB_Info *hmat_lb = NULL;
+
+if (node->data_type <= HMATLB_DATA_TYPE_WRITE_LATENCY) {
+if (!node->has_latency) {
+error_setg(errp, "Missing 'latency' option.");
+return;
+}
+if (node->has_bandwidth) {
+error_setg(errp, "Invalid option 'bandwidth' since "
+   "the data type is latency.");
+return;
+}
+if (node->has_base_bw) {
+error_setg(errp, "Invalid option 'base_bw' since "
+   "the data type is latency.");
+return;
+}
+}
+
+if (node->data_type >= HMATLB_DATA_TYPE_ACCESS_BANDWIDTH) {
+if (!node->has_bandwidth) {
+error_setg(errp, "Missing 'bandwidth' option.");
+return;
+}
+if (node->has_latency) {
+error_setg(errp, "Invalid option 'latency' since "
+   "the data type is bandwidth.");
+return;
+}
+if (node->has_base_lat) {
+error_setg(errp, "Invalid option 'base_lat' since "
+   "the data type is bandwidth.");
+return;
+}
+}
+
+if (node->initiator >= nb_numa_nodes) {
+error_setg(errp, "Invalid initiator=%"
+   PRIu16 ", it should be less than %d.",
+   node->initiator, nb_numa_nodes);
+return;
+}
+if (!numa_info[node->initiator].has_cpu) {
+error_setg(errp, "Invalid initiator=%"
+   PRIu16 ", it isn't an initiator proximity domain.",
+   node->initiator);
+return;
+}
+
+if (node->target >= nb_numa_nodes) {
+error_setg(errp, "Invalid target=%"
+   PRIu16 ", it should be less than %d.",
+   node->target, nb_numa_nodes);
+return;
+}
+if (!numa_info[node->target].initiator_valid) {
+error_setg(errp, "Invalid target=%"
+   PRIu16 ", it hasn't a valid initiator proximity domain.",
+   node->target);
+return;
+}
+
+if (node->has_latency) {
+hmat_lb = ms->numa_state->hmat_lb[node->hierarchy][node->data_type];
+
+if (!hmat_lb) {
+hmat_lb = g_malloc0(sizeof(*hmat_lb));
+ms->numa_state->hmat_lb[node->hierarchy][node->data_type] = hmat_lb;
+} else if (hmat_lb->latency[node->initiator][node->target]) {
+error_setg(errp, "Duplicate configuration of the latency for "
+   "initiator=%" PRIu16 " and target=%" PRIu16 ".",
+   node->initiator, node->target);
+return;
+}
+
+/* Only the first time of setting the base unit is valid. */
+if ((hmat_lb->base_lat == 0) && (node->has_base_lat)) {
+hmat_lb->base_lat = node->base_lat;
+}
+
+hmat_lb->latency[node->initiator][node->target] = node->latency;
+}
+
+if (node->has_bandwidth) {
+hmat_lb = ms->numa_state->hmat_lb[node->hierarchy][node->data_type];
+
+if (!hmat_lb) {
+hmat_lb = g_malloc0(sizeof(*hmat_lb));
+ms->numa_state->hmat_lb[node->hierarchy][node->data_type] = hmat_lb;
+} else if (hmat_lb->bandwidth[node->initiator][node->target]) {
+error_setg(errp, "Duplicate configuration of the bandwidth for "
+   "initiator=%" PRIu16 " and target=%" PRIu16 ".",
+   node->initiator, node->target);
+return;
+}
+
+/* Only the first time of setting the base unit is valid. */
+if (hmat_lb->base_bw == 0) {
+if (!node->has_base_bw) {
+error_se

[Qemu-devel] [for-4.2 PATCH 4/6] replay: don't drain/flush bdrv queue while RR is working

2019-07-28 Thread Pavel Dovgalyuk
From: Pavel Dovgalyuk 

In record/replay mode the bdrv queue is controlled by the replay
mechanism, which does not allow saving or loading snapshots
while the bdrv queue is not empty. Stopping the VM is not blocked by a
non-empty queue, but flushing the queue there is still impossible,
because it may cause deadlocks in replay mode.
This patch disables bdrv_drain_all and bdrv_flush_all in
record/replay mode.

Stopping the machine while I/O requests are not finished is needed
for debugging. E.g., a breakpoint may be set at a specified step,
and forcing the I/O requests to finish may break the determinism
of the execution.

Signed-off-by: Pavel Dovgalyuk 
Acked-by: Kevin Wolf 
---
 block/io.c |   28 
 cpus.c |2 --
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index 06305c6ea6..2e71bcb8d6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -32,6 +32,7 @@
 #include "qemu/cutils.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
+#include "sysemu/replay.h"
 
 #define NOT_DONE 0x7fffffff /* used while emulated sync operation in progress */
 
@@ -598,6 +599,15 @@ void bdrv_drain_all_begin(void)
 return;
 }
 
+/*
+ * bdrv queue is managed by record/replay,
+ * waiting for finishing the I/O requests may
+ * be infinite
+ */
+if (replay_events_enabled()) {
+return;
+}
+
 /* AIO_WAIT_WHILE() with a NULL context can only be called from the main
  * loop AioContext, so make sure we're in the main context. */
 assert(qemu_get_current_aio_context() == qemu_get_aio_context());
@@ -627,6 +637,15 @@ void bdrv_drain_all_end(void)
 BlockDriverState *bs = NULL;
 int drained_end_counter = 0;
 
+/*
+ * bdrv queue is managed by record/replay,
+ * waiting for finishing the I/O requests may
+ * be endless
+ */
+if (replay_events_enabled()) {
+return;
+}
+
 while ((bs = bdrv_next_all_states(bs))) {
 AioContext *aio_context = bdrv_get_aio_context(bs);
 
@@ -1997,6 +2016,15 @@ int bdrv_flush_all(void)
 BlockDriverState *bs = NULL;
 int result = 0;
 
+/*
+ * bdrv queue is managed by record/replay,
+ * creating new flush request for stopping
+ * the VM may break the determinism
+ */
+if (replay_events_enabled()) {
+return result;
+}
+
 for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
 AioContext *aio_context = bdrv_get_aio_context(bs);
 int ret;
diff --git a/cpus.c b/cpus.c
index a7120ffbd5..f6b7252fdb 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1083,7 +1083,6 @@ static int do_vm_stop(RunState state, bool send_stop)
 }
 
 bdrv_drain_all();
-replay_disable_events();
 ret = bdrv_flush_all();
 
 return ret;
@@ -2169,7 +2168,6 @@ int vm_prepare_start(void)
 /* We are sending this now, but the CPUs will be resumed shortly later */
 qapi_event_send_resume();
 
-replay_enable_events();
 cpu_enable_ticks();
 runstate_set(RUN_STATE_RUNNING);
 vm_state_notify(1, RUN_STATE_RUNNING);




[Qemu-devel] [for-4.2 PATCH 6/6] replay: add BH oneshot event for block layer

2019-07-28 Thread Pavel Dovgalyuk
From: Pavel Dovgalyuk 

Replay is capable of recording normal BH events, but sometimes
there are single-use callbacks scheduled with the aio_bh_schedule_oneshot
function. This patch enables recording and replaying such callbacks.
The block layer uses these events for calling the completion function.
Replaying these calls makes the execution deterministic.
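
The helper itself lives in replay-events.c, which is cut off in this
excerpt. Conceptually it is a thin wrapper of roughly the following
shape (a sketch only; replay_add_event, REPLAY_ASYNC_EVENT_BH_ONESHOT
and replay_get_current_step are assumed internal helpers of the replay
event queue, not code quoted from the patch):

    void replay_bh_schedule_oneshot_event(AioContext *ctx,
                                          QEMUBHFunc *cb, void *opaque)
    {
        if (replay_events_enabled()) {
            /* queue the callback as a replay event, tagged with the
             * current step, so it runs at the same point on replay */
            replay_add_event(REPLAY_ASYNC_EVENT_BH_ONESHOT, cb, opaque,
                             replay_get_current_step());
        } else {
            /* record/replay inactive: behave like the old direct call */
            aio_bh_schedule_oneshot(ctx, cb, opaque);
        }
    }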

Signed-off-by: Pavel Dovgalyuk 
Acked-by: Kevin Wolf 
---
 block/block-backend.c|8 +---
 block/io.c   |4 ++--
 block/iscsi.c|5 +++--
 block/nfs.c  |5 +++--
 block/null.c |4 +++-
 block/nvme.c |6 --
 block/rbd.c  |5 +++--
 block/vxhs.c |5 +++--
 include/sysemu/replay.h  |3 +++
 replay/replay-events.c   |   16 
 replay/replay-internal.h |1 +
 replay/replay.c  |2 +-
 stubs/Makefile.objs  |1 +
 stubs/replay-user.c  |9 +
 14 files changed, 57 insertions(+), 17 deletions(-)
 create mode 100644 stubs/replay-user.c

diff --git a/block/block-backend.c b/block/block-backend.c
index 0056b526b8..b8b45e9e82 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -17,6 +17,7 @@
 #include "block/throttle-groups.h"
 #include "sysemu/blockdev.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/replay.h"
 #include "qapi/error.h"
 #include "qapi/qapi-events-block.h"
 #include "qemu/id.h"
@@ -1296,7 +1297,8 @@ BlockAIOCB *blk_abort_aio_request(BlockBackend *blk,
 acb->blk = blk;
 acb->ret = ret;
 
-aio_bh_schedule_oneshot(blk_get_aio_context(blk), error_callback_bh, acb);
+replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+ error_callback_bh, acb);
 return &acb->common;
 }
 
@@ -1352,8 +1354,8 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, 
int64_t offset, int bytes,
 
 acb->has_returned = true;
 if (acb->rwco.ret != NOT_DONE) {
-aio_bh_schedule_oneshot(blk_get_aio_context(blk),
-blk_aio_complete_bh, acb);
+replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+ blk_aio_complete_bh, acb);
 }
 
 return &acb->common;
diff --git a/block/io.c b/block/io.c
index 2e71bcb8d6..ebff47e6e1 100644
--- a/block/io.c
+++ b/block/io.c
@@ -367,8 +367,8 @@ static void coroutine_fn 
bdrv_co_yield_to_drain(BlockDriverState *bs,
 if (bs) {
 bdrv_inc_in_flight(bs);
 }
-aio_bh_schedule_oneshot(bdrv_get_aio_context(bs),
-bdrv_co_drain_bh_cb, &data);
+replay_bh_schedule_oneshot_event(bdrv_get_aio_context(bs),
+ bdrv_co_drain_bh_cb, &data);
 
 qemu_coroutine_yield();
 /* If we are resumed from some other event (such as an aio completion or a
diff --git a/block/iscsi.c b/block/iscsi.c
index 506bf5f875..2ced15066a 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -40,6 +40,7 @@
 #include "qemu/module.h"
 #include "qemu/option.h"
 #include "qemu/uuid.h"
+#include "sysemu/replay.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-misc.h"
 #include "qapi/qmp/qdict.h"
@@ -280,8 +281,8 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
 }
 
 if (iTask->co) {
-aio_bh_schedule_oneshot(iTask->iscsilun->aio_context,
- iscsi_co_generic_bh_cb, iTask);
+replay_bh_schedule_oneshot_event(iTask->iscsilun->aio_context,
+ iscsi_co_generic_bh_cb, iTask);
 } else {
 iTask->complete = 1;
 }
diff --git a/block/nfs.c b/block/nfs.c
index d93241b3bb..cfd6c956b3 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -37,6 +37,7 @@
 #include "qemu/uri.h"
 #include "qemu/cutils.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/replay.h"
 #include "qapi/qapi-visit-block-core.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qstring.h"
@@ -257,8 +258,8 @@ nfs_co_generic_cb(int ret, struct nfs_context *nfs, void 
*data,
 if (task->ret < 0) {
 error_report("NFS Error: %s", nfs_get_error(nfs));
 }
-aio_bh_schedule_oneshot(task->client->aio_context,
-nfs_co_generic_bh_cb, task);
+replay_bh_schedule_oneshot_event(task->client->aio_context,
+ nfs_co_generic_bh_cb, task);
 }
 
 static int coroutine_fn nfs_co_preadv(BlockDriverState *bs, uint64_t offset,
diff --git a/block/null.c b/block/null.c
index 699aa295cb..15e1d56746 100644
--- a/block/null.c
+++ b/block/null.c
@@ -17,6 +17,7 @@
 #include "qemu/module.h"
 #include "qemu/option.h"
 #include "block/block_int.h"
+#include "sysemu/replay.h"
 
 #define NULL_OPT_LATENCY "latency-ns"
 #define NULL_OPT_ZEROES  "read-zeroes"
@@ -179,7 +180,8 @@ static inline BlockAIOCB *null_aio_common(BlockDriverState 
*bs,
 timer_mod_ns(&acb->timer,
  qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + s->latency_ns);
 } 

[Qemu-devel] [PATCH v8 02/11] numa: move numa global variable nb_numa_nodes into MachineState

2019-07-28 Thread Tao Xu
Add a struct NumaState in MachineState and move the existing numa global
nb_numa_nodes (renamed to "num_nodes") into NumaState. Also add the
MachineClass variable numa_mem_supported to decide which submachines support NUMA.
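
Code that may run on machines without NUMA support then follows the
guarded access pattern visible in the exec.c hunk below; a minimal
sketch:

    MachineState *ms = MACHINE(qdev_get_machine());

    if (ms->numa_state != NULL && ms->numa_state->num_nodes > 0) {
        /* NUMA is supported here and at least one node is configured */
    }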

Suggested-by: Igor Mammedov 
Suggested-by: Eduardo Habkost 
Signed-off-by: Tao Xu 
---

Changes in v8:
- Add check if numa->numa_state is NULL in pxb_dev_realize_common
- Use nb_nodes in spapr_populate_memory() (Igor)
---
 exec.c  |  5 ++-
 hw/acpi/aml-build.c |  3 +-
 hw/arm/boot.c   |  4 +-
 hw/arm/sbsa-ref.c   |  4 +-
 hw/arm/virt-acpi-build.c| 10 +++--
 hw/arm/virt.c   |  4 +-
 hw/core/machine-hmp-cmds.c  | 12 --
 hw/core/machine.c   | 14 +--
 hw/core/numa.c  | 60 +
 hw/i386/acpi-build.c|  2 +-
 hw/i386/pc.c|  9 +++--
 hw/mem/pc-dimm.c|  2 +
 hw/pci-bridge/pci_expander_bridge.c |  8 +++-
 hw/ppc/spapr.c  | 20 +-
 include/hw/acpi/aml-build.h |  2 +-
 include/hw/boards.h |  1 +
 include/sysemu/numa.h   | 10 -
 17 files changed, 111 insertions(+), 59 deletions(-)

diff --git a/exec.c b/exec.c
index 3e78de3b8f..4fd6ec2bd0 100644
--- a/exec.c
+++ b/exec.c
@@ -1749,6 +1749,7 @@ long qemu_minrampagesize(void)
 long hpsize = LONG_MAX;
 long mainrampagesize;
 Object *memdev_root;
+MachineState *ms = MACHINE(qdev_get_machine());
 
 mainrampagesize = qemu_mempath_getpagesize(mem_path);
 
@@ -1776,7 +1777,9 @@ long qemu_minrampagesize(void)
  * so if its page size is smaller we have got to report that size instead.
  */
 if (hpsize > mainrampagesize &&
-(nb_numa_nodes == 0 || numa_info[0].node_memdev == NULL)) {
+(ms->numa_state == NULL ||
+ ms->numa_state->num_nodes == 0 ||
+ numa_info[0].node_memdev == NULL)) {
 static bool warned;
 if (!warned) {
 error_report("Huge page support disabled (n/a for main memory).");
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 555c24f21d..63c1cae8c9 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1726,10 +1726,11 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, 
uint64_t base,
  * ACPI spec 5.2.17 System Locality Distance Information Table
  * (Revision 2.0 or later)
  */
-void build_slit(GArray *table_data, BIOSLinker *linker)
+void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms)
 {
 int slit_start, i, j;
 slit_start = table_data->len;
+int nb_numa_nodes = ms->numa_state->num_nodes;
 
 acpi_data_push(table_data, sizeof(AcpiTableHeader));
 
diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index ba604f8277..d02d2dae85 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -598,9 +598,9 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info 
*binfo,
 }
 g_strfreev(node_path);
 
-if (nb_numa_nodes > 0) {
+if (ms->numa_state != NULL && ms->numa_state->num_nodes > 0) {
 mem_base = binfo->loader_start;
-for (i = 0; i < nb_numa_nodes; i++) {
+for (i = 0; i < ms->numa_state->num_nodes; i++) {
 mem_len = numa_info[i].node_mem;
 rc = fdt_add_memory_node(fdt, acells, mem_base,
  scells, mem_len, i);
diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index 2aba3c58c5..22847909bf 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -144,6 +144,7 @@ static void create_fdt(SBSAMachineState *sms)
 {
 void *fdt = create_device_tree(&sms->fdt_size);
 const MachineState *ms = MACHINE(sms);
+int nb_numa_nodes = ms->numa_state->num_nodes;
 int cpu;
 
 if (!fdt) {
@@ -760,7 +761,7 @@ sbsa_ref_cpu_index_to_props(MachineState *ms, unsigned 
cpu_index)
 static int64_t
 sbsa_ref_get_default_cpu_node_id(const MachineState *ms, int idx)
 {
-return idx % nb_numa_nodes;
+return idx % ms->numa_state->num_nodes;
 }
 
 static void sbsa_ref_instance_init(Object *obj)
@@ -787,6 +788,7 @@ static void sbsa_ref_class_init(ObjectClass *oc, void *data)
 mc->possible_cpu_arch_ids = sbsa_ref_possible_cpu_arch_ids;
 mc->cpu_index_to_instance_props = sbsa_ref_cpu_index_to_props;
 mc->get_default_cpu_node_id = sbsa_ref_get_default_cpu_node_id;
+mc->numa_mem_supported = true;
 }
 
 static const TypeInfo sbsa_ref_info = {
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 0afb372769..a2cc4b84fe 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -516,7 +516,8 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 int i, srat_start;
 uint64_t mem_base;
 MachineClass *mc = MACHINE_GET_CLASS(vms);
-const CPUArchIdList *cpu_list = mc->possible_cpu_arch_ids(MACHINE(vms));
+MachineState *ms = MACHINE(vms);
+co

[Qemu-devel] [PATCH v8 05/11] numa: Extend CLI to provide initiator information for numa nodes

2019-07-28 Thread Tao Xu
In ACPI 6.3 chapter 5.2.27, Heterogeneous Memory Attribute Table (HMAT),
the initiator represents the processor that accesses memory. In 5.2.27.3,
Memory Proximity Domain Attributes Structure, the attached initiator is
defined as the node where the memory controller responsible for a memory
proximity domain resides. With attached initiator information, the
topology of heterogeneous memory can be described.

Extend the CLI of the "-numa node" option to indicate the initiator NUMA
node-id. In the Linux kernel, the code in drivers/acpi/hmat/hmat.c parses
and reports the platform's HMAT tables.
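
For example (using the option syntax from the tests later in this
series), node 1 below is a memory-only node that declares node 0, which
holds both CPUs, as its initiator:

    -smp 2,sockets=2
    -numa node,nodeid=0,memdev=m0
    -numa cpu,node-id=0,socket-id=0
    -numa cpu,node-id=0,socket-id=1
    -numa node,nodeid=1,memdev=m1,initiator=0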

Reviewed-by: Jingqi Liu 
Suggested-by: Dan Williams 
Signed-off-by: Tao Xu 
---

No changes in v8.
---
 hw/core/machine.c | 24 
 hw/core/numa.c| 13 +
 include/sysemu/numa.h |  3 +++
 qapi/machine.json |  6 +-
 qemu-options.hx   | 27 +++
 5 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 1b6a304e1c..a2b88b4fb3 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -656,6 +656,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
const CpuInstanceProperties *props, Error 
**errp)
 {
 MachineClass *mc = MACHINE_GET_CLASS(machine);
+NodeInfo *numa_info = machine->numa_state->nodes;
 bool match = false;
 int i;
 
@@ -725,6 +726,16 @@ void machine_set_cpu_numa_node(MachineState *machine,
 match = true;
 slot->props.node_id = props->node_id;
 slot->props.has_node_id = props->has_node_id;
+
+if (numa_info[props->node_id].initiator_valid &&
+(props->node_id != numa_info[props->node_id].initiator)) {
+error_setg(errp, "The initiator of CPU NUMA node %" PRId64
+   " should be itself.", props->node_id);
+return;
+}
+numa_info[props->node_id].initiator_valid = true;
+numa_info[props->node_id].has_cpu = true;
+numa_info[props->node_id].initiator = props->node_id;
 }
 
 if (!match) {
@@ -1066,6 +1077,7 @@ static void machine_numa_finish_cpu_init(MachineState 
*machine)
 GString *s = g_string_new(NULL);
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(machine);
+NodeInfo *numa_info = machine->numa_state->nodes;
 
 assert(machine->numa_state->num_nodes);
 for (i = 0; i < possible_cpus->len; i++) {
@@ -1099,6 +1111,18 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
 machine_set_cpu_numa_node(machine, &props, &error_fatal);
 }
 }
+
+for (i = 0; i < machine->numa_state->num_nodes; i++) {
+if (numa_info[i].initiator_valid &&
+!numa_info[numa_info[i].initiator].has_cpu) {
+error_report("The initiator-id %"PRIu16 " of NUMA node %d"
+ " does not exist.", numa_info[i].initiator, i);
+error_printf("\n");
+
+exit(1);
+}
+}
+
 if (s->len && !qtest_enabled()) {
 warn_report("CPU(s) not present in any NUMA nodes: %s",
 s->str);
diff --git a/hw/core/numa.c b/hw/core/numa.c
index 8fcbba05d6..cfb6339810 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -128,6 +128,19 @@ static void parse_numa_node(MachineState *ms, 
NumaNodeOptions *node,
 numa_info[nodenr].node_mem = object_property_get_uint(o, "size", NULL);
 numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
 }
+
+if (node->has_initiator) {
+if (numa_info[nodenr].initiator_valid &&
+(node->initiator != numa_info[nodenr].initiator)) {
+error_setg(errp, "The initiator of NUMA node %" PRIu16 " has been "
+   "set to node %" PRIu16, nodenr,
+   numa_info[nodenr].initiator);
+return;
+}
+
+numa_info[nodenr].initiator_valid = true;
+numa_info[nodenr].initiator = node->initiator;
+}
 numa_info[nodenr].present = true;
 max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
 ms->numa_state->num_nodes++;
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 76da3016db..46ad06e000 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -10,6 +10,9 @@ struct NodeInfo {
 uint64_t node_mem;
 struct HostMemoryBackend *node_memdev;
 bool present;
+bool has_cpu;
+bool initiator_valid;
+uint16_t initiator;
 uint8_t distance[MAX_NODES];
 };
 
diff --git a/qapi/machine.json b/qapi/machine.json
index 6db8a7e2ec..05e367d26a 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -414,6 +414,9 @@
 # @memdev: memory backend object.  If specified for one node,
 #  it must be specified for all nodes.
 #
+# @initiator: the initiator numa nodeid that is closest (as in directly
+# attached) to this numa node (since 4.2)
+#
 # Since: 2.1
 ##
 { 'struct': 'NumaNodeOp

[Qemu-devel] [PATCH v8 07/11] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)

2019-07-28 Thread Tao Xu
From: Liu Jingqi 

This structure describes the memory access latency and bandwidth
information from various memory access initiator proximity domains.
The latency and bandwidth numbers represented in this structure
correspond to the rated latency and bandwidth for the platform.
The software could use this information as a hint for optimization.
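
For reference, the Length field emitted by build_hmat_lb() below is
32 + 4*s + 4*t + 2*s*t bytes for s initiator and t target proximity
domains: the 32-byte fixed part, one 4-byte entry per listed domain,
and an s-by-t matrix of 2-byte latency or bandwidth entries. For
instance, s = 1 and t = 2 gives 32 + 4 + 8 + 4 = 48 bytes.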

Reviewed-by: Jonathan Cameron 
Signed-off-by: Liu Jingqi 
Signed-off-by: Tao Xu 
---

No changes in v8.
---
 hw/acpi/hmat.c  | 95 -
 hw/acpi/hmat.h  | 41 ++
 include/qemu/typedefs.h |  1 +
 include/sysemu/numa.h   |  3 ++
 include/sysemu/sysemu.h | 21 +
 5 files changed, 160 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
index abf99b1adc..431818dc82 100644
--- a/hw/acpi/hmat.c
+++ b/hw/acpi/hmat.c
@@ -67,11 +67,81 @@ static void build_hmat_mpda(GArray *table_data, uint16_t 
flags, int initiator,
 build_append_int_noprefix(table_data, 0, 8);
 }
 
+/*
+ * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
+ * Structure: Table 5-142
+ */
+static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
+  uint32_t num_initiator, uint32_t num_target,
+  uint32_t *initiator_pxm, int type)
+{
+uint32_t s = num_initiator;
+uint32_t t = num_target;
+uint8_t m, n;
+uint8_t mask = 0x0f;
+int i;
+
+/* Type */
+build_append_int_noprefix(table_data, 1, 2);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 2);
+/* Length */
+build_append_int_noprefix(table_data, 32 + 4 * s + 4 * t + 2 * s * t, 4);
+/* Flags: Bits [3:0] Memory Hierarchy, Bits[7:4] Reserved */
+build_append_int_noprefix(table_data, hmat_lb->hierarchy & mask, 1);
+/* Data Type */
+build_append_int_noprefix(table_data, hmat_lb->data_type, 1);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 2);
+/* Number of Initiator Proximity Domains (s) */
+build_append_int_noprefix(table_data, s, 4);
+/* Number of Target Proximity Domains (t) */
+build_append_int_noprefix(table_data, t, 4);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 4);
+
+/* Entry Base Unit */
+if (HMAT_IS_LATENCY(type)) {
+build_append_int_noprefix(table_data, hmat_lb->base_lat, 8);
+} else {
+build_append_int_noprefix(table_data, hmat_lb->base_bw, 8);
+}
+
+/* Initiator Proximity Domain List */
+for (i = 0; i < s; i++) {
+build_append_int_noprefix(table_data, initiator_pxm[i], 4);
+}
+
+/* Target Proximity Domain List */
+for (i = 0; i < t; i++) {
+build_append_int_noprefix(table_data, i, 4);
+}
+
+/* Latency or Bandwidth Entries */
+for (i = 0; i < s; i++) {
+m = initiator_pxm[i];
+for (n = 0; n < t; n++) {
+uint16_t entry;
+
+if (HMAT_IS_LATENCY(type)) {
+entry = hmat_lb->latency[m][n];
+} else {
+entry = hmat_lb->bandwidth[m][n];
+}
+
+build_append_int_noprefix(table_data, entry, 2);
+}
+}
+}
+
 /* Build HMAT sub table structures */
 static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)
 {
 uint16_t flags;
-int i;
+uint32_t num_initiator = 0;
+uint32_t initiator_pxm[MAX_NODES];
+int i, hrchy, type;
+HMAT_LB_Info *numa_hmat_lb;
 
 for (i = 0; i < nstat->num_nodes; i++) {
 flags = 0;
@@ -82,6 +152,29 @@ static void hmat_build_table_structs(GArray *table_data, 
NumaState *nstat)
 
 build_hmat_mpda(table_data, flags, nstat->nodes[i].initiator, i);
 }
+
+for (i = 0; i < nstat->num_nodes; i++) {
+if (nstat->nodes[i].has_cpu) {
+initiator_pxm[num_initiator++] = i;
+}
+}
+
+/*
+ * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
+ * Structure: Table 5-142
+ */
+for (hrchy = HMAT_LB_MEM_MEMORY;
+ hrchy <= HMAT_LB_MEM_CACHE_3RD_LEVEL; hrchy++) {
+for (type = HMAT_LB_DATA_ACCESS_LATENCY;
+ type <= HMAT_LB_DATA_WRITE_BANDWIDTH; type++) {
+numa_hmat_lb = nstat->hmat_lb[hrchy][type];
+
+if (numa_hmat_lb) {
+build_hmat_lb(table_data, numa_hmat_lb, num_initiator,
+  nstat->num_nodes, initiator_pxm, type);
+}
+}
+}
 }
 
 void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *nstat)
diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
index 574cfba60a..5f050781e6 100644
--- a/hw/acpi/hmat.h
+++ b/hw/acpi/hmat.h
@@ -40,6 +40,47 @@
  */
 #define HMAT_PROX_INIT_VALID 0x1
 
+#define HMAT_IS_LATENCY(type) (type <= HMAT_LB_DATA_WRITE_LATENCY)
+
+struct HMAT_LB_Info {
+/*
+ * Indicates total number of Proximity Domains
+ * that can initiate memory access requests.
+ */
+uint32_tnum_initiator;
+/*
+

[Qemu-devel] [Bug 1838228] [NEW] Slirp Broadcast traffic

2019-07-28 Thread Chris Koch
Public bug reported:

Hi all,

Version: QEMU emulator version 3.1.0 (Debian 1:3.1+dfsg-7+build1)

I'm running some DHCP traffic to a *custom* DHCP server with user-mode
networking in QEMU. I'm using port 30067 for the server, so this does
not conflict with the built-in DHCP server.

DHCP broadcasts to and from the server, and I'm observing issues with
both sending and receiving packets.

Firstly, from the VM, a packet sent to IPv4 x.x.x.2:30067 (gateway)
makes it to the server, but a packet sent to 255.255.255.255 does not.
I'd suspect that Slirp has to support sending to the broadcast IP
address? Or is this something I can turn on with a configuration option?
(My QEMU version too old?)

Secondly, the source address in a DHCP IPv4 packet must be 0.0.0.0 (by
RFC). That means that any return packet will have 0.0.0.0 swapped in as
its destination address. However, that packet doesn't make it into the
VM at all. I know that if you deliver this packet to Linux, a raw socket
will spit it back out. The packets' destination address should not
prevent the packet from being delivered to the right VM, since Slirp
(should?) know exactly which VM the session belongs to. (It's a proxy,
not a router.)

WDYT? Did I miss some configuration options or use too old a version?

Thanks,
Chris

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1838228




[Qemu-devel] [PATCH v8 11/11] tests/bios-tables-test: add test cases for ACPI HMAT

2019-07-28 Thread Tao Xu
ACPI table HMAT has been introduced; QEMU now builds HMAT tables for
Heterogeneous Memory with the boot option '-numa node'.

Add test cases on PC and Q35 machines with 2 NUMA nodes.
Because HMAT is generated when the system enables NUMA, the
following tables need to be added for this test:
  tests/acpi-test-data/pc/*.acpihmat
  tests/acpi-test-data/pc/HMAT.*
  tests/acpi-test-data/q35/*.acpihmat
  tests/acpi-test-data/q35/HMAT.*

Reviewed-by: Jingqi Liu 
Suggested-by: Igor Mammedov 
Signed-off-by: Tao Xu 
---

No changes in v8.
---
 tests/bios-tables-test.c | 43 
 1 file changed, 43 insertions(+)

diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index a356ac3489..53ffc8081a 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -871,6 +871,47 @@ static void test_acpi_piix4_tcg_dimm_pxm(void)
 test_acpi_tcg_dimm_pxm(MACHINE_PC);
 }
 
+static void test_acpi_tcg_acpi_hmat(const char *machine)
+{
+test_data data;
+
+memset(&data, 0, sizeof(data));
+data.machine = machine;
+data.variant = ".acpihmat";
+test_acpi_one(" -smp 2,sockets=2"
+  " -m 128M,slots=2,maxmem=1G"
+  " -object memory-backend-ram,size=64M,id=m0"
+  " -object memory-backend-ram,size=64M,id=m1"
+  " -numa node,nodeid=0,memdev=m0"
+  " -numa node,nodeid=1,memdev=m1,initiator=0"
+  " -numa cpu,node-id=0,socket-id=0"
+  " -numa cpu,node-id=0,socket-id=1"
+  " -numa hmat-lb,initiator=0,target=0,hierarchy=memory,"
+  "data-type=access-latency,base-lat=10,latency=5"
+  " -numa hmat-lb,initiator=0,target=0,hierarchy=memory,"
+  "data-type=access-bandwidth,base-bw=20,bandwidth=5"
+  " -numa hmat-lb,initiator=0,target=1,hierarchy=memory,"
+  "data-type=access-latency,base-lat=10,latency=10"
+  " -numa hmat-lb,initiator=0,target=1,hierarchy=memory,"
+  "data-type=access-bandwidth,base-bw=20,bandwidth=10"
+  " -numa hmat-cache,node-id=0,size=0x2,total=1,level=1"
+  ",assoc=direct,policy=write-back,line=8"
+  " -numa hmat-cache,node-id=1,size=0x2,total=1,level=1"
+  ",assoc=direct,policy=write-back,line=8",
+  &data);
+free_test_data(&data);
+}
+
+static void test_acpi_q35_tcg_acpi_hmat(void)
+{
+test_acpi_tcg_acpi_hmat(MACHINE_Q35);
+}
+
+static void test_acpi_piix4_tcg_acpi_hmat(void)
+{
+test_acpi_tcg_acpi_hmat(MACHINE_PC);
+}
+
 static void test_acpi_virt_tcg(void)
 {
 test_data data = {
@@ -915,6 +956,8 @@ int main(int argc, char *argv[])
 qtest_add_func("acpi/q35/numamem", test_acpi_q35_tcg_numamem);
 qtest_add_func("acpi/piix4/dimmpxm", test_acpi_piix4_tcg_dimm_pxm);
 qtest_add_func("acpi/q35/dimmpxm", test_acpi_q35_tcg_dimm_pxm);
+qtest_add_func("acpi/piix4/acpihmat", test_acpi_piix4_tcg_acpi_hmat);
+qtest_add_func("acpi/q35/acpihmat", test_acpi_q35_tcg_acpi_hmat);
 } else if (strcmp(arch, "aarch64") == 0) {
 qtest_add_func("acpi/virt", test_acpi_virt_tcg);
 }
-- 
2.20.1




[Qemu-devel] [PATCH v8 03/11] numa: move numa global variable have_numa_distance into MachineState

2019-07-28 Thread Tao Xu
Move existing numa global have_numa_distance into NumaState.

Reviewed-by: Igor Mammedov 
Reviewed-by: Liu Jingqi 
Suggested-by: Igor Mammedov 
Suggested-by: Eduardo Habkost 
Signed-off-by: Tao Xu 
---

No changes in v8.
---
 hw/arm/sbsa-ref.c| 2 +-
 hw/arm/virt-acpi-build.c | 2 +-
 hw/arm/virt.c| 2 +-
 hw/core/numa.c   | 5 ++---
 hw/i386/acpi-build.c | 2 +-
 include/sysemu/numa.h| 4 ++--
 6 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index 22847909bf..7e4c471717 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -158,7 +158,7 @@ static void create_fdt(SBSAMachineState *sms)
 qemu_fdt_setprop_cell(fdt, "/", "#address-cells", 0x2);
 qemu_fdt_setprop_cell(fdt, "/", "#size-cells", 0x2);
 
-if (have_numa_distance) {
+if (ms->numa_state->have_numa_distance) {
 int size = nb_numa_nodes * nb_numa_nodes * 3 * sizeof(uint32_t);
 uint32_t *matrix = g_malloc0(size);
 int idx, i, j;
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index a2cc4b84fe..461a44b5b0 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -797,7 +797,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables 
*tables)
 if (ms->numa_state->num_nodes > 0) {
 acpi_add_table(table_offsets, tables_blob);
 build_srat(tables_blob, tables->linker, vms);
-if (have_numa_distance) {
+if (ms->numa_state->have_numa_distance) {
 acpi_add_table(table_offsets, tables_blob);
 build_slit(tables_blob, tables->linker, ms);
 }
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index c72b8fd3a7..6f0170cf1d 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -232,7 +232,7 @@ static void create_fdt(VirtMachineState *vms)
 "clk24mhz");
 qemu_fdt_setprop_cell(fdt, "/apb-pclk", "phandle", vms->clock_phandle);
 
-if (have_numa_distance) {
+if (nb_numa_nodes > 0 && ms->numa_state->have_numa_distance) {
 int size = nb_numa_nodes * nb_numa_nodes * 3 * sizeof(uint32_t);
 uint32_t *matrix = g_malloc0(size);
 int idx, i, j;
diff --git a/hw/core/numa.c b/hw/core/numa.c
index 4d5e308bf1..2142ec29e8 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -50,7 +50,6 @@ static int have_mem;
 static int max_numa_nodeid; /* Highest specified NUMA node ID, plus one.
  * For all nodes, nodeid < max_numa_nodeid
  */
-bool have_numa_distance;
 NodeInfo numa_info[MAX_NODES];
 
 
@@ -168,7 +167,7 @@ void parse_numa_distance(MachineState *ms, NumaDistOptions 
*dist, Error **errp)
 }
 
 numa_info[src].distance[dst] = val;
-have_numa_distance = true;
+ms->numa_state->have_numa_distance = true;
 }
 
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
@@ -441,7 +440,7 @@ void numa_complete_configuration(MachineState *ms)
  * asymmetric. In this case, the distances for both directions
  * of all node pairs are required.
  */
-if (have_numa_distance) {
+if (ms->numa_state->have_numa_distance) {
 /* Validate enough NUMA distance information was provided. */
 validate_numa_distance(ms);
 
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index d4c092358d..081a8fc116 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2692,7 +2692,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState 
*machine)
 if (pcms->numa_nodes) {
 acpi_add_table(table_offsets, tables_blob);
 build_srat(tables_blob, tables->linker, machine);
-if (have_numa_distance) {
+if (machine->numa_state->have_numa_distance) {
 acpi_add_table(table_offsets, tables_blob);
 build_slit(tables_blob, tables->linker, machine);
 }
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 3e8dbf20c1..2e5e998adb 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -6,8 +6,6 @@
 #include "sysemu/hostmem.h"
 #include "hw/boards.h"
 
-extern bool have_numa_distance;
-
 struct NodeInfo {
 uint64_t node_mem;
 struct HostMemoryBackend *node_memdev;
@@ -26,6 +24,8 @@ struct NumaState {
 /* Number of NUMA nodes */
 int num_nodes;
 
+/* Allow setting NUMA distance for different NUMA nodes */
+bool have_numa_distance;
 };
 typedef struct NumaState NumaState;
 
-- 
2.20.1




[Qemu-devel] [PATCH v8 01/11] hw/arm: simplify arm_load_dtb

2019-07-28 Thread Tao Xu
In struct arm_boot_info, kernel_filename, initrd_filename and
kernel_cmdline are copied from MachineState. This patch adds
MachineState as a parameter to arm_load_dtb() and moves the copying
of kernel_filename, initrd_filename and kernel_cmdline into
arm_load_kernel().

Reviewed-by: Igor Mammedov 
Reviewed-by: Liu Jingqi 
Suggested-by: Igor Mammedov 
Signed-off-by: Tao Xu 
---

Changes in v8:
- rebase to upstream
---
 hw/arm/aspeed.c   |  5 +
 hw/arm/boot.c | 14 --
 hw/arm/collie.c   |  8 +---
 hw/arm/cubieboard.c   |  5 +
 hw/arm/exynos4_boards.c   |  7 ++-
 hw/arm/highbank.c |  8 +---
 hw/arm/imx25_pdk.c|  5 +
 hw/arm/integratorcp.c |  8 +---
 hw/arm/kzm.c  |  5 +
 hw/arm/mainstone.c|  5 +
 hw/arm/mcimx6ul-evk.c |  5 +
 hw/arm/mcimx7d-sabre.c|  5 +
 hw/arm/musicpal.c |  8 +---
 hw/arm/nseries.c  |  5 +
 hw/arm/omap_sx1.c |  5 +
 hw/arm/palm.c | 10 ++
 hw/arm/raspi.c|  6 +-
 hw/arm/realview.c |  5 +
 hw/arm/sabrelite.c|  5 +
 hw/arm/sbsa-ref.c |  3 +--
 hw/arm/spitz.c|  5 +
 hw/arm/tosa.c |  8 +---
 hw/arm/versatilepb.c  |  5 +
 hw/arm/vexpress.c |  5 +
 hw/arm/virt.c |  8 +++-
 hw/arm/xilinx_zynq.c  |  8 +---
 hw/arm/xlnx-versal-virt.c |  7 ++-
 hw/arm/xlnx-zcu102.c  |  5 +
 hw/arm/z2.c   |  8 +---
 include/hw/arm/boot.h |  4 ++--
 30 files changed, 43 insertions(+), 147 deletions(-)

diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index 843b708247..f8733b86b9 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -241,9 +241,6 @@ static void aspeed_board_init(MachineState *machine,
 write_boot_rom(drive0, FIRMWARE_ADDR, fl->size, &error_abort);
 }
 
-aspeed_board_binfo.kernel_filename = machine->kernel_filename;
-aspeed_board_binfo.initrd_filename = machine->initrd_filename;
-aspeed_board_binfo.kernel_cmdline = machine->kernel_cmdline;
 aspeed_board_binfo.ram_size = ram_size;
 aspeed_board_binfo.loader_start = sc->info->memmap[ASPEED_SDRAM];
 aspeed_board_binfo.nb_cpus = bmc->soc.num_cpus;
@@ -252,7 +249,7 @@ static void aspeed_board_init(MachineState *machine,
 cfg->i2c_init(bmc);
 }
 
-arm_load_kernel(ARM_CPU(first_cpu), &aspeed_board_binfo);
+arm_load_kernel(ARM_CPU(first_cpu), machine, &aspeed_board_binfo);
 }
 
 static void palmetto_bmc_i2c_init(AspeedBoardState *bmc)
diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index c2b89b3bb9..ba604f8277 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -524,7 +524,7 @@ static void fdt_add_psci_node(void *fdt)
 }
 
 int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
- hwaddr addr_limit, AddressSpace *as)
+ hwaddr addr_limit, AddressSpace *as, MachineState *ms)
 {
 void *fdt = NULL;
 int size, rc, n = 0;
@@ -627,9 +627,9 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info 
*binfo,
 qemu_fdt_add_subnode(fdt, "/chosen");
 }
 
-if (binfo->kernel_cmdline && *binfo->kernel_cmdline) {
+if (ms->kernel_cmdline && *ms->kernel_cmdline) {
 rc = qemu_fdt_setprop_string(fdt, "/chosen", "bootargs",
- binfo->kernel_cmdline);
+ ms->kernel_cmdline);
 if (rc < 0) {
 fprintf(stderr, "couldn't set /chosen/bootargs\n");
 goto fail;
@@ -1261,7 +1261,7 @@ static void arm_setup_firmware_boot(ARMCPU *cpu, struct 
arm_boot_info *info)
  */
 }
 
-void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info *info)
+void arm_load_kernel(ARMCPU *cpu, MachineState *ms, struct arm_boot_info *info)
 {
 CPUState *cs;
 AddressSpace *as = arm_boot_address_space(cpu, info);
@@ -1282,7 +1282,9 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info 
*info)
  * doesn't support secure.
  */
 assert(!(info->secure_board_setup && kvm_enabled()));
-
+info->kernel_filename = ms->kernel_filename;
+info->kernel_cmdline = ms->kernel_cmdline;
+info->initrd_filename = ms->initrd_filename;
 info->dtb_filename = qemu_opt_get(qemu_get_machine_opts(), "dtb");
 info->dtb_limit = 0;
 
@@ -1294,7 +1296,7 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info 
*info)
 }
 
 if (!info->skip_dtb_autoload && have_dtb(info)) {
-if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as) < 0) {
+if (arm_load_dtb(info->dtb_start, info, info->dtb_limit, as, ms) < 0) {
 exit(1);
 }
 }
diff --git a/hw/arm/collie.c b/hw/arm/collie.c
index 3db3c56004..72bc8f26e5 100644
--- a/hw/arm/collie.c
+++ b/hw/arm/collie.c
@@ -26,9 +26,6 @@ static struct arm_boot_info collie_binfo = {
 
 static void collie_init(MachineState *machine)
 {
-const char *ker

[Qemu-devel] [PATCH v8 10/11] numa: Extend the CLI to provide memory side cache information

2019-07-28 Thread Tao Xu
From: Liu Jingqi 

Add -numa hmat-cache option to provide Memory Side Cache Information.
These memory attributes help to build Memory Side Cache Information
Structure(s) in ACPI Heterogeneous Memory Attribute Table (HMAT).
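
As with hmat-lb, an illustrative command line matching the syntax used
by the tests in this series (one level of direct-mapped, write-back
side cache with an 8-byte line on node 0):

    -numa hmat-cache,node-id=0,size=0x20000,total=1,level=1,assoc=direct,policy=write-back,line=8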

Signed-off-by: Liu Jingqi 
Signed-off-by: Tao Xu 
---

No changes in v8.
---
 hw/core/numa.c| 67 +++
 include/sysemu/numa.h |  2 ++
 qapi/machine.json | 81 +--
 qemu-options.hx   | 14 +++-
 4 files changed, 161 insertions(+), 3 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 83ead77191..75db35ac19 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -304,6 +304,67 @@ void parse_numa_hmat_lb(MachineState *ms, 
NumaHmatLBOptions *node,
 }
 }
 
+void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
+   Error **errp)
+{
+int nb_numa_nodes = ms->numa_state->num_nodes;
+HMAT_Cache_Info *hmat_cache = NULL;
+
+if (node->node_id >= nb_numa_nodes) {
+error_setg(errp, "Invalid node-id=%" PRIu32
+   ", it should be less than %d.",
+   node->node_id, nb_numa_nodes);
+return;
+}
+
+if (node->total > MAX_HMAT_CACHE_LEVEL) {
+error_setg(errp, "Invalid total=%" PRIu8
+   ", it should be less than or equal to %d.",
+   node->total, MAX_HMAT_CACHE_LEVEL);
+return;
+}
+if (node->level > node->total) {
+error_setg(errp, "Invalid level=%" PRIu8
+   ", it should be less than or equal to"
+   " total=%" PRIu8 ".",
+   node->level, node->total);
+return;
+}
+if (ms->numa_state->hmat_cache[node->node_id][node->level]) {
+error_setg(errp, "Duplicate configuration of the side cache for "
+   "node-id=%" PRIu32 " and level=%" PRIu8 ".",
+   node->node_id, node->level);
+return;
+}
+
+if ((node->level > 1) &&
+ms->numa_state->hmat_cache[node->node_id][node->level - 1] &&
+(node->size >=
+ms->numa_state->hmat_cache[node->node_id][node->level - 1]->size)) {
+error_setg(errp, "Invalid size=0x%" PRIx64
+   ", the size of level=%" PRIu8
+   " should be less than the size(0x%" PRIx64
+   ") of level=%" PRIu8 ".",
+   node->size, node->level,
+   ms->numa_state->hmat_cache[node->node_id]
+ [node->level - 1]->size,
+   node->level - 1);
+return;
+}
+
+hmat_cache = g_malloc0(sizeof(*hmat_cache));
+
+hmat_cache->mem_proximity = node->node_id;
+hmat_cache->size = node->size;
+hmat_cache->total_levels = node->total;
+hmat_cache->level = node->level;
+hmat_cache->associativity = node->assoc;
+hmat_cache->write_policy = node->policy;
+hmat_cache->line_size = node->line;
+
+ms->numa_state->hmat_cache[node->node_id][node->level] = hmat_cache;
+}
+
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
 {
 Error *err = NULL;
@@ -348,6 +409,12 @@ void set_numa_options(MachineState *ms, NumaOptions 
*object, Error **errp)
 goto end;
 }
 break;
+case NUMA_OPTIONS_TYPE_HMAT_CACHE:
+parse_numa_hmat_cache(ms, &object->u.hmat_cache, &err);
+if (err) {
+goto end;
+}
+break;
 default:
 abort();
 }
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index f0857b7ee6..9009bbdee3 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -43,6 +43,8 @@ void set_numa_options(MachineState *ms, NumaOptions *object, 
Error **errp);
 void parse_numa_opts(MachineState *ms);
 void parse_numa_hmat_lb(MachineState *ms, NumaHmatLBOptions *node,
 Error **errp);
+void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
+   Error **errp);
 void numa_complete_configuration(MachineState *ms);
 void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms);
 extern QemuOptsList qemu_numa_opts;
diff --git a/qapi/machine.json b/qapi/machine.json
index 25f5b0321d..65a3584b92 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -379,10 +379,12 @@
 #
 # @hmat-lb: memory latency and bandwidth information (Since: 4.2)
 #
+# @hmat-cache: memory side cache information (Since: 4.2)
+#
 # Since: 2.1
 ##
 { 'enum': 'NumaOptionsType',
-  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
+  'data': [ 'node', 'dist', 'cpu', 'hmat-lb', 'hmat-cache' ] }
 
 ##
 # @NumaOptions:
@@ -398,7 +400,8 @@
 'node': 'NumaNodeOptions',
 'dist': 'NumaDistOptions',
 'cpu': 'NumaCpuOptions',
-'hmat-lb': 'NumaHmatLBOptions' }}
+'hmat-lb': 'NumaHmatLBOptions',
+'hmat-cache': 'NumaHmatCacheOptions' }}
 
 ##
 # @NumaNodeOptions:
@@ -600,6 +603,80 @@
 '

[Qemu-devel] [PATCH v8 04/11] numa: move numa global variable numa_info into MachineState

2019-07-28 Thread Tao Xu
Move existing numa global numa_info (renamed as "nodes") into NumaState.

Reviewed-by: Igor Mammedov 
Suggested-by: Igor Mammedov 
Suggested-by: Eduardo Habkost 
Signed-off-by: Tao Xu 
---

No changes in v8.
---
 exec.c   |  2 +-
 hw/acpi/aml-build.c  |  6 --
 hw/arm/boot.c|  2 +-
 hw/arm/sbsa-ref.c|  3 ++-
 hw/arm/virt-acpi-build.c |  7 ---
 hw/arm/virt.c|  3 ++-
 hw/core/numa.c   | 15 +--
 hw/i386/pc.c |  4 ++--
 hw/ppc/spapr.c   | 10 +-
 hw/ppc/spapr_pci.c   |  4 +++-
 include/sysemu/numa.h|  5 +++--
 11 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/exec.c b/exec.c
index 4fd6ec2bd0..de87d3548b 100644
--- a/exec.c
+++ b/exec.c
@@ -1779,7 +1779,7 @@ long qemu_minrampagesize(void)
 if (hpsize > mainrampagesize &&
 (ms->numa_state == NULL ||
  ms->numa_state->num_nodes == 0 ||
- numa_info[0].node_memdev == NULL)) {
+ ms->numa_state->nodes[0].node_memdev == NULL)) {
 static bool warned;
 if (!warned) {
 error_report("Huge page support disabled (n/a for main memory).");
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 63c1cae8c9..26ccc1a3e2 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1737,8 +1737,10 @@ void build_slit(GArray *table_data, BIOSLinker *linker, 
MachineState *ms)
 build_append_int_noprefix(table_data, nb_numa_nodes, 8);
 for (i = 0; i < nb_numa_nodes; i++) {
 for (j = 0; j < nb_numa_nodes; j++) {
-assert(numa_info[i].distance[j]);
-build_append_int_noprefix(table_data, numa_info[i].distance[j], 1);
+assert(ms->numa_state->nodes[i].distance[j]);
+build_append_int_noprefix(table_data,
+  ms->numa_state->nodes[i].distance[j],
+  1);
 }
 }
 
diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index d02d2dae85..6472aa441e 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -601,7 +601,7 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info 
*binfo,
 if (ms->numa_state != NULL && ms->numa_state->num_nodes > 0) {
 mem_base = binfo->loader_start;
 for (i = 0; i < ms->numa_state->num_nodes; i++) {
-mem_len = numa_info[i].node_mem;
+mem_len = ms->numa_state->nodes[i].node_mem;
 rc = fdt_add_memory_node(fdt, acells, mem_base,
  scells, mem_len, i);
 if (rc < 0) {
diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index 7e4c471717..3a243e6a53 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -168,7 +168,8 @@ static void create_fdt(SBSAMachineState *sms)
 idx = (i * nb_numa_nodes + j) * 3;
 matrix[idx + 0] = cpu_to_be32(i);
 matrix[idx + 1] = cpu_to_be32(j);
-matrix[idx + 2] = cpu_to_be32(numa_info[i].distance[j]);
+matrix[idx + 2] =
+cpu_to_be32(ms->numa_state->nodes[i].distance[j]);
 }
 }
 
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 461a44b5b0..89899ec4c1 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -534,11 +534,12 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 
 mem_base = vms->memmap[VIRT_MEM].base;
 for (i = 0; i < ms->numa_state->num_nodes; ++i) {
-if (numa_info[i].node_mem > 0) {
+if (ms->numa_state->nodes[i].node_mem > 0) {
 numamem = acpi_data_push(table_data, sizeof(*numamem));
-build_srat_memory(numamem, mem_base, numa_info[i].node_mem, i,
+build_srat_memory(numamem, mem_base,
+  ms->numa_state->nodes[i].node_mem, i,
   MEM_AFFINITY_ENABLED);
-mem_base += numa_info[i].node_mem;
+mem_base += ms->numa_state->nodes[i].node_mem;
 }
 }
 
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 6f0170cf1d..46f39e20bc 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -242,7 +242,8 @@ static void create_fdt(VirtMachineState *vms)
 idx = (i * nb_numa_nodes + j) * 3;
 matrix[idx + 0] = cpu_to_be32(i);
 matrix[idx + 1] = cpu_to_be32(j);
-matrix[idx + 2] = cpu_to_be32(numa_info[i].distance[j]);
+matrix[idx + 2] =
+cpu_to_be32(ms->numa_state->nodes[i].distance[j]);
 }
 }
 
diff --git a/hw/core/numa.c b/hw/core/numa.c
index 2142ec29e8..8fcbba05d6 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -50,8 +50,6 @@ static int have_mem;
 static int max_numa_nodeid; /* Highest specified NUMA node ID, plus one.
  * For all nodes, nodeid < max_numa_nodeid
  */
-NodeInfo numa_info[MAX_NODES];
-
 
 static vo

[Qemu-devel] [PATCH v8 08/11] hmat acpi: Build Memory Side Cache Information Structure(s)

2019-07-28 Thread Tao Xu
From: Liu Jingqi 

This structure describes memory side cache information for memory
proximity domains, if the memory side cache is present and the
physical device forms the memory side cache.
The software can use this information to place data in memory
effectively, to maximize the performance of the system
memory that uses the memory side cache.
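
For illustration (not part of the patch), a hypothetical node with a single
cache level that is direct-mapped, write-back and has 64-byte lines would
pack its Cache Attributes DWORD as below, assuming the ACPI 6.3 Table 5-143
encodings (1 = Direct Mapped, 1 = Write Back):

    uint32_t cache_attr = 1           /* Bits [3:0]   total cache levels */
                        | (1 << 4)    /* Bits [7:4]   level described    */
                        | (1 << 8)    /* Bits [11:8]  direct mapped      */
                        | (1 << 12)   /* Bits [15:12] write back         */
                        | (64 << 16); /* Bits [31:16] 64-byte cache line */
    /* cache_attr == 0x00401111 */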

Reviewed-by: Jonathan Cameron 
Signed-off-by: Liu Jingqi 
Signed-off-by: Tao Xu 
---

No changes in v8.
---
 hw/acpi/hmat.c  | 64 -
 hw/acpi/hmat.h  | 17 +++
 include/qemu/typedefs.h |  1 +
 include/sysemu/numa.h   |  3 ++
 include/sysemu/sysemu.h |  2 ++
 5 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
index 431818dc82..01a6552d51 100644
--- a/hw/acpi/hmat.c
+++ b/hw/acpi/hmat.c
@@ -134,14 +134,63 @@ static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
 }
 }
 
+/* ACPI 6.3: 5.2.27.5 Memory Side Cache Information Structure: Table 5-143 */
+static void build_hmat_cache(GArray *table_data, HMAT_Cache_Info *hmat_cache)
+{
+/*
+ * Cache Attributes: Bits [3:0] - Total Cache Levels
+ * for this Memory Proximity Domain
+ */
+uint32_t cache_attr = hmat_cache->total_levels & 0xF;
+
+/* Bits [7:4] : Cache Level described in this structure */
+cache_attr |= (hmat_cache->level & 0xF) << 4;
+
+/* Bits [11:8] - Cache Associativity */
+cache_attr |= (hmat_cache->associativity & 0xF) << 8;
+
+/* Bits [15:12] - Write Policy */
+cache_attr |= (hmat_cache->write_policy & 0xF) << 12;
+
+/* Bits [31:16] - Cache Line size in bytes */
+cache_attr |= (hmat_cache->line_size & 0xFFFF) << 16;
+
+cache_attr = cpu_to_le32(cache_attr);
+
+/* Type */
+build_append_int_noprefix(table_data, 2, 2);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 2);
+/* Length */
+build_append_int_noprefix(table_data, 32, 4);
+/* Proximity Domain for the Memory */
+build_append_int_noprefix(table_data, hmat_cache->mem_proximity, 4);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 4);
+/* Memory Side Cache Size */
+build_append_int_noprefix(table_data, hmat_cache->size, 8);
+/* Cache Attributes */
+build_append_int_noprefix(table_data, cache_attr, 4);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 2);
+/*
+ * Number of SMBIOS handles (n)
+ * Linux kernel uses Memory Side Cache Information Structure
+ * without SMBIOS entries for now, so set Number of SMBIOS handles
+ * as 0.
+ */
+build_append_int_noprefix(table_data, 0, 2);
+}
+
 /* Build HMAT sub table structures */
 static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)
 {
 uint16_t flags;
 uint32_t num_initiator = 0;
 uint32_t initiator_pxm[MAX_NODES];
-int i, hrchy, type;
+int i, hrchy, type, level;
 HMAT_LB_Info *numa_hmat_lb;
+HMAT_Cache_Info *numa_hmat_cache;
 
 for (i = 0; i < nstat->num_nodes; i++) {
 flags = 0;
@@ -175,6 +224,19 @@ static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)
 }
 }
 }
+
+/*
+ * ACPI 6.3: 5.2.27.5 Memory Side Cache Information Structure:
+ * Table 5-143
+ */
+for (i = 0; i < nstat->num_nodes; i++) {
+for (level = 0; level <= MAX_HMAT_CACHE_LEVEL; level++) {
+numa_hmat_cache = nstat->hmat_cache[i][level];
+if (numa_hmat_cache) {
+build_hmat_cache(table_data, numa_hmat_cache);
+}
+}
+}
 }
 
 void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *nstat)
diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
index 5f050781e6..6c32f12e78 100644
--- a/hw/acpi/hmat.h
+++ b/hw/acpi/hmat.h
@@ -81,6 +81,23 @@ struct HMAT_LB_Info {
 uint16_t bandwidth[MAX_NODES][MAX_NODES];
 };
 
+struct HMAT_Cache_Info {
+/* The memory proximity domain to which the memory belongs. */
+uint32_t mem_proximity;
+/* Size of memory side cache in bytes. */
+uint64_t size;
+/* Total cache levels for this memory proximity domain. */
+uint8_t total_levels;
+/* Cache level described in this structure. */
+uint8_t level;
+/* Cache Associativity: None/Direct Mapped/Complex Cache Indexing */
+uint8_t associativity;
+/* Write Policy: None/Write Back(WB)/Write Through(WT) */
+uint8_t write_policy;
+/* Cache Line size in bytes. */
+uint16_t line_size;
+};
+
 void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *nstat);
 
 #endif
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index c0257e936b..d971f5109e 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -33,6 +33,7 @@ typedef struct FWCfgEntry FWCfgEntry;
 typedef struct FWCfgIoState FWCfgIoState;
 typedef struct FWCfgMemState FWCfgMemState;
 typedef struct FWCfgState FWCfgState;
+ty

Re: [Qemu-devel] [PATCH 1/2] migration: add qemu_file_update_rate_transfer interface

2019-07-28 Thread Wei Yang
On Mon, Jul 29, 2019 at 10:32:52AM +0800, Ivan Ren wrote:
>Add qemu_file_update_rate_transfer to just update bytes_xfer for
>speed limitation. This will be used for further migration features
>such as multifd migration.
>
>Signed-off-by: Ivan Ren 
>---
> migration/qemu-file.c | 5 +
> migration/qemu-file.h | 1 +
> 2 files changed, 6 insertions(+)
>
>diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>index 0431585502..13e7f03f9b 100644
>--- a/migration/qemu-file.c
>+++ b/migration/qemu-file.c
>@@ -615,6 +615,11 @@ void qemu_file_reset_rate_limit(QEMUFile *f)
> f->bytes_xfer = 0;
> }
> 
>+void qemu_file_update_rate_transfer(QEMUFile *f, int64_t len)

Looks good, except the function name. Not good at naming :-)

Reviewed-by: Wei Yang 

>+{
>+f->bytes_xfer += len;
>+}
>+
> void qemu_put_be16(QEMUFile *f, unsigned int v)
> {
> qemu_put_byte(f, v >> 8);
>diff --git a/migration/qemu-file.h b/migration/qemu-file.h
>index 13baf896bd..6145d10aca 100644
>--- a/migration/qemu-file.h
>+++ b/migration/qemu-file.h
>@@ -147,6 +147,7 @@ int qemu_peek_byte(QEMUFile *f, int offset);
> void qemu_file_skip(QEMUFile *f, int size);
> void qemu_update_position(QEMUFile *f, size_t size);
> void qemu_file_reset_rate_limit(QEMUFile *f);
>+void qemu_file_update_rate_transfer(QEMUFile *f, int64_t len);
> void qemu_file_set_rate_limit(QEMUFile *f, int64_t new_rate);
> int64_t qemu_file_get_rate_limit(QEMUFile *f);
> void qemu_file_set_error(QEMUFile *f, int ret);
>-- 
>2.17.2 (Apple Git-113)
>

-- 
Wei Yang
Help you, Help me



[Qemu-devel] [PATCH v8 06/11] hmat acpi: Build Memory Proximity Domain Attributes Structure(s)

2019-07-28 Thread Tao Xu
From: Liu Jingqi 

HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
(HMAT). The specification is available at the link below:
http://www.uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf

It describes the memory attributes, such as memory side cache
attributes and bandwidth and latency details, related to the
Memory Proximity Domain. The software is
expected to use this information as a hint for optimization.

This structure describes the Memory Proximity Domain Attributes of a memory
subsystem, its associativity with a processor proximity domain, and
hints for memory usage.

In the Linux kernel, the code in drivers/acpi/hmat/hmat.c parses and reports
the platform's HMAT tables.
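
For example, a hypothetical memory-only NUMA node 1 whose nearest initiator
is node 0 would be emitted with the helper added below (a sketch;
HMAT_PROX_INIT_VALID is the flag this patch uses for a valid initiator):

    /* describe node 1's memory; the attached initiator, node 0, is valid */
    build_hmat_mpda(table_data, HMAT_PROX_INIT_VALID, 0, 1);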

Reviewed-by: Jonathan Cameron 
Signed-off-by: Liu Jingqi 
Signed-off-by: Tao Xu 
---

No changes in v8.
---
 hw/acpi/Kconfig   |   5 +++
 hw/acpi/Makefile.objs |   1 +
 hw/acpi/hmat.c| 101 ++
 hw/acpi/hmat.h|  45 +++
 hw/i386/acpi-build.c  |   3 ++
 5 files changed, 155 insertions(+)
 create mode 100644 hw/acpi/hmat.c
 create mode 100644 hw/acpi/hmat.h

diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index 7c59cf900b..039bb99efa 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -7,6 +7,7 @@ config ACPI_X86
 select ACPI_NVDIMM
 select ACPI_CPU_HOTPLUG
 select ACPI_MEMORY_HOTPLUG
+select ACPI_HMAT
 
 config ACPI_X86_ICH
 bool
@@ -31,3 +32,7 @@ config ACPI_VMGENID
 bool
 default y
 depends on PC
+
+config ACPI_HMAT
+bool
+depends on ACPI
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 9bb2101e3b..c05019b059 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -6,6 +6,7 @@ common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
 common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
 common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
 common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
+common-obj-$(CONFIG_ACPI_HMAT) += hmat.o
 common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
 
 common-obj-y += acpi_interface.o
diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
new file mode 100644
index 00..abf99b1adc
--- /dev/null
+++ b/hw/acpi/hmat.c
@@ -0,0 +1,101 @@
+/*
+ * HMAT ACPI Implementation
+ *
+ * Copyright(C) 2019 Intel Corporation.
+ *
+ * Author:
+ *  Liu jingqi 
+ *  Tao Xu 
+ *
+ * HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
+ * (HMAT)
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/numa.h"
+#include "hw/acpi/hmat.h"
+
+/*
+ * ACPI 6.3:
+ * 5.2.27.3 Memory Proximity Domain Attributes Structure: Table 5-141
+ */
+static void build_hmat_mpda(GArray *table_data, uint16_t flags, int initiator,
+   int mem_node)
+{
+
+/* Memory Proximity Domain Attributes Structure */
+/* Type */
+build_append_int_noprefix(table_data, 0, 2);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 2);
+/* Length */
+build_append_int_noprefix(table_data, 40, 4);
+/* Flags */
+build_append_int_noprefix(table_data, flags, 2);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 2);
+/* Proximity Domain for the Attached Initiator */
+build_append_int_noprefix(table_data, initiator, 4);
+/* Proximity Domain for the Memory */
+build_append_int_noprefix(table_data, mem_node, 4);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 4);
+/*
+ * Reserved:
+ * Previously defined as the Start Address of the System Physical
+ * Address Range. Deprecated since ACPI Spec 6.3.
+ */
+build_append_int_noprefix(table_data, 0, 8);
+/*
+ * Reserved:
+ * Previously defined as the Range Length of the region in bytes.
+ * Deprecated since ACPI Spec 6.3.
+ */
+build_append_int_noprefix(table_data, 0, 8);
+}
+
+/* Build HMAT sub table structures */
+static void hmat_build_table_structs(GArray *table_data, NumaState *nstat)
+{
+uint16_t flags;
+int i;
+
+for (i = 0; i < nstat->num_nodes; i++) {
+flags = 0;
+
+if (nstat->nodes[i].initiator_valid) {
+flags |= HMAT_PROX_INIT_VALID;
+}
+
+build_hmat_mpda(table_data, flags, nstat->nodes[i].initiator, i);
+}
+}
+
+void build_hmat(GArray *table_data, BIOSLinker *l

Re: [Qemu-devel] [PATCH 2/2] migration: add speed limit for multifd migration

2019-07-28 Thread Wei Yang
On Mon, Jul 29, 2019 at 10:32:53AM +0800, Ivan Ren wrote:
>Limit the speed of multifd migration through the common speed limitation
>of the qemu file.
>
>Signed-off-by: Ivan Ren 
>---
> migration/ram.c | 22 --
> 1 file changed, 12 insertions(+), 10 deletions(-)
>
>diff --git a/migration/ram.c b/migration/ram.c
>index 889148dd84..e3fde16776 100644
>--- a/migration/ram.c
>+++ b/migration/ram.c
>@@ -922,7 +922,7 @@ struct {
>  * false.
>  */
> 
>-static int multifd_send_pages(void)
>+static int multifd_send_pages(RAMState *rs)
> {
> int i;
> static int next_channel;
>@@ -954,6 +954,7 @@ static int multifd_send_pages(void)
> multifd_send_state->pages = p->pages;
> p->pages = pages;
> transferred = ((uint64_t) pages->used) * TARGET_PAGE_SIZE + p->packet_len;
>+qemu_file_update_rate_transfer(rs->f, transferred);
> ram_counters.multifd_bytes += transferred;
> ram_counters.transferred += transferred;
> qemu_mutex_unlock(&p->mutex);
>@@ -962,7 +963,7 @@ static int multifd_send_pages(void)
> return 1;
> }
> 
>-static int multifd_queue_page(RAMBlock *block, ram_addr_t offset)
>-static int multifd_queue_page(RAMBlock *block, ram_addr_t offset)
> {
> MultiFDPages_t *pages = multifd_send_state->pages;
> 
>@@ -981,12 +982,12 @@ static int multifd_queue_page(RAMBlock *block, ram_addr_t offset)
> }
> }
> 
>-if (multifd_send_pages() < 0) {
>+if (multifd_send_pages(rs) < 0) {
> return -1;
> }
> 
> if (pages->block != block) {
>-return  multifd_queue_page(block, offset);
>+return  multifd_queue_page(rs, block, offset);
> }
> 
> return 1;
>@@ -1054,7 +1055,7 @@ void multifd_save_cleanup(void)
> multifd_send_state = NULL;
> }
> 
>-static void multifd_send_sync_main(void)
>+static void multifd_send_sync_main(RAMState *rs)
> {
> int i;
> 
>@@ -1062,7 +1063,7 @@ static void multifd_send_sync_main(void)
> return;
> }
> if (multifd_send_state->pages->used) {
>-if (multifd_send_pages() < 0) {
>+if (multifd_send_pages(rs) < 0) {
> error_report("%s: multifd_send_pages fail", __func__);
> return;
> }
>@@ -1083,6 +1084,7 @@ static void multifd_send_sync_main(void)
> p->packet_num = multifd_send_state->packet_num++;
> p->flags |= MULTIFD_FLAG_SYNC;
> p->pending_job++;
>+qemu_file_update_rate_transfer(rs->f, p->packet_len);

The original code seems to forget to update

ram_counters.multifd_bytes
ram_counters.transferred

Sounds like we need to update these counters here too.
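
Something along these lines, mirroring the accounting already done in
multifd_send_pages() (a sketch only, not a tested hunk):

    qemu_file_update_rate_transfer(rs->f, p->packet_len);
    ram_counters.multifd_bytes += p->packet_len;
    ram_counters.transferred += p->packet_len;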

> qemu_mutex_unlock(&p->mutex);
> qemu_sem_post(&p->sem);
> }
>@@ -2079,7 +2081,7 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
> static int ram_save_multifd_page(RAMState *rs, RAMBlock *block,
>  ram_addr_t offset)
> {
>-if (multifd_queue_page(block, offset) < 0) {
>+if (multifd_queue_page(rs, block, offset) < 0) {
> return -1;
> }
> ram_counters.normal++;
>@@ -3482,7 +3484,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> ram_control_before_iterate(f, RAM_CONTROL_SETUP);
> ram_control_after_iterate(f, RAM_CONTROL_SETUP);
> 
>-multifd_send_sync_main();
>+multifd_send_sync_main(*rsp);
> qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> qemu_fflush(f);
> 
>@@ -3570,7 +3572,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
> ram_control_after_iterate(f, RAM_CONTROL_ROUND);
> 
> out:
>-multifd_send_sync_main();
>+multifd_send_sync_main(rs);
> qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> qemu_fflush(f);
> ram_counters.transferred += 8;
>@@ -3629,7 +3631,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
> 
> rcu_read_unlock();
> 
>-multifd_send_sync_main();
>+multifd_send_sync_main(rs);
> qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> qemu_fflush(f);
> 
>-- 
>2.17.2 (Apple Git-113)
>

-- 
Wei Yang
Help you, Help me



Re: [Qemu-devel] [for-4.2 PATCH 0/6] Block-related record/replay fixes

2019-07-28 Thread no-reply
Patchew URL: https://patchew.org/QEMU/156438176555.22071.10523120047318890136.stgit@pasha-Precision-3630-Tower/



Hi,

This series failed build test on s390x host. Please find the details below.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
# Testing script will be invoked under the git checkout with
# HEAD pointing to a commit that has the patches applied on top of "base"
# branch
set -e

echo
echo "=== ENV ==="
env

echo
echo "=== PACKAGES ==="
rpm -qa

echo
echo "=== UNAME ==="
uname -a

CC=$HOME/bin/cc
INSTALL=$PWD/install
BUILD=$PWD/build
mkdir -p $BUILD $INSTALL
SRC=$PWD
cd $BUILD
$SRC/configure --cc=$CC --prefix=$INSTALL
make -j4
# XXX: we need reliable clean up
# make check -j4 V=1
make install
=== TEST SCRIPT END ===

  CC  replay/replay-snapshot.o
  CC  replay/replay-net.o
/var/tmp/patchew-tester-tmp-z2gfg7al/src/replay/replay-events.c: In function ‘replay_bh_schedule_oneshot_event’:
/var/tmp/patchew-tester-tmp-z2gfg7al/src/replay/replay-events.c:141:23: error: implicit declaration of function ‘replay_get_current_icount’; did you mean ‘replay_get_current_step’? [-Werror=implicit-function-declaration]
  141 | uint64_t id = replay_get_current_icount();
      |               ^
      |               replay_get_current_step
/var/tmp/patchew-tester-tmp-z2gfg7al/src/replay/replay-events.c:141:23: error: nested extern declaration of ‘replay_get_current_icount’ [-Werror=nested-externs]
cc1: all warnings being treated as errors
make: *** [/var/tmp/patchew-tester-tmp-z2gfg7al/src/rules.mak:69: replay/replay-events.o] Error 1
make: *** Waiting for unfinished jobs


The full log is available at
http://patchew.org/logs/156438176555.22071.10523120047318890136.stgit@pasha-Precision-3630-Tower/testing.s390x/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [Qemu-devel] [PATCH 1/3] memory-device: not necessary to use goto for the last check

2019-07-28 Thread Igor Mammedov
On Sun, 28 Jul 2019 21:13:02 +0800
Wei Yang  wrote:

> We are already at the last condition check.
> 
> Signed-off-by: Wei Yang 

Reviewed-by: Igor Mammedov 

> ---
>  hw/mem/memory-device.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
> index 5f2c408036..df3261b32a 100644
> --- a/hw/mem/memory-device.c
> +++ b/hw/mem/memory-device.c
> @@ -186,7 +186,6 @@ static uint64_t memory_device_get_free_addr(MachineState *ms,
>  if (!range_contains_range(&as, &new)) {
>  error_setg(errp, "could not find position in guest address space for 
> "
> "memory device - memory fragmented due to alignments");
> -goto out;
>  }
>  out:
>  g_slist_free(list);




Re: [Qemu-devel] [PATCH 2/2] migration: add speed limit for multifd migration

2019-07-28 Thread Ivan Ren
>> if (multifd_send_state->pages->used) {
>>-if (multifd_send_pages() < 0) {
>>+if (multifd_send_pages(rs) < 0) {
>> error_report("%s: multifd_send_pages fail", __func__);
>> return;
>> }
>>@@ -1083,6 +1084,7 @@ static void multifd_send_sync_main(void)
>> p->packet_num = multifd_send_state->packet_num++;
>> p->flags |= MULTIFD_FLAG_SYNC;
>> p->pending_job++;
>>+qemu_file_update_rate_transfer(rs->f, p->packet_len);
>
>The original code seems forget to update
>
>ram_counters.multifd_bytes
>ram_counters.transferred
>
>Sounds we need to update these counters here too.

Yes, Thanks for review
I'll send a new version with a new patch to fix it.

On Mon, Jul 29, 2019 at 2:40 PM Wei Yang wrote:

> On Mon, Jul 29, 2019 at 10:32:53AM +0800, Ivan Ren wrote:
> >Limit the speed of multifd migration through the common speed limitation
> >of the qemu file.
> >
> >Signed-off-by: Ivan Ren 
> >---
> > migration/ram.c | 22 --
> > 1 file changed, 12 insertions(+), 10 deletions(-)
> >
> >diff --git a/migration/ram.c b/migration/ram.c
> >index 889148dd84..e3fde16776 100644
> >--- a/migration/ram.c
> >+++ b/migration/ram.c
> >@@ -922,7 +922,7 @@ struct {
> >  * false.
> >  */
> >
> >-static int multifd_send_pages(void)
> >+static int multifd_send_pages(RAMState *rs)
> > {
> > int i;
> > static int next_channel;
> >@@ -954,6 +954,7 @@ static int multifd_send_pages(void)
> > multifd_send_state->pages = p->pages;
> > p->pages = pages;
> > transferred = ((uint64_t) pages->used) * TARGET_PAGE_SIZE + p->packet_len;
> >+qemu_file_update_rate_transfer(rs->f, transferred);
> > ram_counters.multifd_bytes += transferred;
> > ram_counters.transferred += transferred;
> > qemu_mutex_unlock(&p->mutex);
> >@@ -962,7 +963,7 @@ static int multifd_send_pages(void)
> > return 1;
> > }
> >
> >-static int multifd_queue_page(RAMBlock *block, ram_addr_t offset)
> >+static int multifd_queue_page(RAMState *rs, RAMBlock *block, ram_addr_t offset)
> > {
> > MultiFDPages_t *pages = multifd_send_state->pages;
> >
> >@@ -981,12 +982,12 @@ static int multifd_queue_page(RAMBlock *block, ram_addr_t offset)
> > }
> > }
> >
> >-if (multifd_send_pages() < 0) {
> >+if (multifd_send_pages(rs) < 0) {
> > return -1;
> > }
> >
> > if (pages->block != block) {
> >-return  multifd_queue_page(block, offset);
> >+return  multifd_queue_page(rs, block, offset);
> > }
> >
> > return 1;
> >@@ -1054,7 +1055,7 @@ void multifd_save_cleanup(void)
> > multifd_send_state = NULL;
> > }
> >
> >-static void multifd_send_sync_main(void)
> >+static void multifd_send_sync_main(RAMState *rs)
> > {
> > int i;
> >
> >@@ -1062,7 +1063,7 @@ static void multifd_send_sync_main(void)
> > return;
> > }
> > if (multifd_send_state->pages->used) {
> >-if (multifd_send_pages() < 0) {
> >+if (multifd_send_pages(rs) < 0) {
> > error_report("%s: multifd_send_pages fail", __func__);
> > return;
> > }
> >@@ -1083,6 +1084,7 @@ static void multifd_send_sync_main(void)
> > p->packet_num = multifd_send_state->packet_num++;
> > p->flags |= MULTIFD_FLAG_SYNC;
> > p->pending_job++;
> >+qemu_file_update_rate_transfer(rs->f, p->packet_len);
>
> The original code seems forget to update
>
> ram_counters.multifd_bytes
> ram_counters.transferred
>
> Sounds we need to update these counters here too.
>
> > qemu_mutex_unlock(&p->mutex);
> > qemu_sem_post(&p->sem);
> > }
> >@@ -2079,7 +2081,7 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss, bool last_stage)
> > static int ram_save_multifd_page(RAMState *rs, RAMBlock *block,
> >  ram_addr_t offset)
> > {
> >-if (multifd_queue_page(block, offset) < 0) {
> >+if (multifd_queue_page(rs, block, offset) < 0) {
> > return -1;
> > }
> > ram_counters.normal++;
> >@@ -3482,7 +3484,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> > ram_control_before_iterate(f, RAM_CONTROL_SETUP);
> > ram_control_after_iterate(f, RAM_CONTROL_SETUP);
> >
> >-multifd_send_sync_main();
> >+multifd_send_sync_main(*rsp);
> > qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> > qemu_fflush(f);
> >
> >@@ -3570,7 +3572,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
> > ram_control_after_iterate(f, RAM_CONTROL_ROUND);
> >
> > out:
> >-multifd_send_sync_main();
> >+multifd_send_sync_main(rs);
> > qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> > qemu_fflush(f);
> > ram_counters.transferred += 8;
> >@@ -3629,7 +3631,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
> >
> > rcu_read_unlock();
> >
> >-multifd_send_sync_main();
> >+multifd_send_sync_main(rs);
> > qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> > qemu_fflush(f);
>

Re: [Qemu-devel] [PATCH v7 02/11] numa: move numa global variable nb_numa_nodes into MachineState

2019-07-28 Thread Tao Xu

On 7/26/2019 10:17 PM, Eduardo Habkost wrote:

On Fri, Jul 26, 2019 at 03:43:43PM +0200, Igor Mammedov wrote:

On Wed, 24 Jul 2019 15:15:28 -0300
Eduardo Habkost  wrote:


On Wed, Jul 24, 2019 at 05:48:11PM +0200, Igor Mammedov wrote:

On Wed, 24 Jul 2019 12:02:41 -0300
Eduardo Habkost  wrote:
   

On Wed, Jul 24, 2019 at 04:27:21PM +0200, Igor Mammedov wrote:

On Tue, 23 Jul 2019 12:23:57 -0300
Eduardo Habkost  wrote:
   

On Tue, Jul 23, 2019 at 04:56:41PM +0200, Igor Mammedov wrote:

On Tue, 16 Jul 2019 22:51:12 +0800
Tao Xu  wrote:
   

Add struct NumaState in MachineState and move the existing numa global
nb_numa_nodes (renamed as "num_nodes") into NumaState. Also add a variable
numa_support into MachineClass to decide which submachines support NUMA.

Suggested-by: Igor Mammedov 
Suggested-by: Eduardo Habkost 
Signed-off-by: Tao Xu 
---

No changes in v7.

Changes in v6:
 - Rebase to upstream, move globals in arm/sbsa-ref and use
   numa_mem_supported
 - When used once or twice in the function, use
   ms->numa_state->num_nodes directly
 - Correct some mistakes
 - Use once monitor_printf in hmp_info_numa
---

[...]

  if (pxb->numa_node != NUMA_NODE_UNASSIGNED &&
-pxb->numa_node >= nb_numa_nodes) {
+pxb->numa_node >= ms->numa_state->num_nodes) {

this will crash if the user tries to use the device on a machine that doesn't
support numa; check that numa_state is not NULL before dereferencing
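
One possible shape for the guarded check (a sketch):

    if (pxb->numa_node != NUMA_NODE_UNASSIGNED &&
        (ms->numa_state == NULL ||
         pxb->numa_node >= ms->numa_state->num_nodes)) {
        /* report the invalid numa_node as before */
    }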


That's exactly why the machine_num_numa_nodes() was created in
v5, but then you asked for its removal.
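
For reference, the wrapper in question was presumably along these lines
(a sketch, not the actual v4/v5 code):

    static inline int machine_num_numa_nodes(MachineState *ms)
    {
        return ms->numa_state ? ms->numa_state->num_nodes : 0;
    }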

V4 to be more precise.
I dislike small wrappers because they usually don't simplify code and often
make it more obscure,
forcing the reader to jump around to see what's really going on.
The way it's implemented in this patch, it's obvious what's wrong right away.

In that particular case machine_num_numa_nodes() was also misused, since only
a handful of places (6) really need the NULL check, while the majority (48)
can directly access ms->numa_state->num_nodes without a NULL check.


I strongly disagree here.  Avoiding a ms->numa_state == NULL check
is pointless optimization,

I see it not as an optimization (the compiler would probably manage to
optimize out most of them) but rather as properly self-documented code.
Doing the check in places where it's not needed is confusing at best and
can mask or introduce subtle bugs at worst.
   

and leads to hard to spot bugs like
the one you saw above.

That one was actually easy to spot because of the way it's written in this 
patch.


When somebody is looking at a line of code containing
"ms->numa_state->num_nodes", how exactly are they supposed to
know if ms->numa_state is already guaranteed to be non-NULL, or
not?

read the code/patch
(at least I don't review just by looking at one line, and the less time
I have to spend reading extra code and finding answers about why it's
written the way it is, the better)

In this patch, code touching ms->numa_state is divided into two categories:
generic code (memory API, CLI entry point, generic machine call
site for numa specific code, devices, monitor/qmp) and numa-aware code
(the numa parser and numa-aware machines). The latter is the majority of
the affected code, where ms->numa_state != NULL.

Even after I forget how this works and read the code later, it would be
easy to make an educated guess/check about where the NULL check is not
needed by looking at the related code.


It's even easier to not have to check if/when numa_state can be
NULL, because the code is safe in either case.

You don't review code by looking at a single line, but you don't
need to make it harder than it is.



With machine_num_numa_nodes() one would have to look for an answer as to why
we are doing it (unless we add a comment that the check is there for no
reason in most cases, and it's an exercise for the reader to find out where
it's really needed).


Sorry, your justification doesn't make sense to me.  You don't
need to look for any answer at all if the code makes it irrelevant
whether numa_state is NULL.  Having a single caller with
numa_state == NULL would be enough justification for the check.



I don't see any justification for a wrapper in this case;
could we stop bikeshedding and just let the author move on with fixing bugs, please?


The author can move on and decide what to do, as I won't block a
patch only because of presence or absence of the wrapper.



Thank you Eduardo and Igor, I have submitted v8 to fix the bugs. Both ways
are okay for me. And if there are bugs later, I will fix them ASAP.