Re: [PATCH 0/2] target/i386: Some mmx/sse instructions don't require

2022-04-04 Thread Wei Li
Ping..

And the title is target/i386: Some mmx/sse instructions don't require
CR0.TS=0

On Fri, Mar 25, 2022 at 10:55 PM Wei Li  wrote:

> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/427
>
> All instructions decoded by 'gen_sse' are assumed to require CR0.TS=0. But
> according to the SDM, CRC32 doesn't require it. In fact, EMMS, FEMMS and some
> mmx/sse instructions (0F38F[0-F], 0F3AF[0-F]) don't require it.
>
> To solve the problem, first move EMMS and FEMMS out of gen_sse. Then
> instructions in 'gen_sse' require it only when modrm & 0xF0 is false.
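As a standalone illustration of the predicate the cover letter describes, here is a hedged sketch; the function name and the `opcode_low` parameter are inventions for this example, not QEMU's actual code, and the exact byte gen_sse dispatches on is my reading of the cover letter:

```c
#include <stdbool.h>

/* Hedged sketch: per the SDM, the 0F38Fx/0F3AFx encodings (CRC32, MOVBE, ...)
 * do not raise #NM when CR0.TS=1, so only opcodes outside the Fx row need the
 * TS check. "opcode_low" stands in for the low opcode byte being decoded. */
static bool needs_cr0_ts_check(unsigned opcode_low)
{
    return (opcode_low & 0xF0) != 0xF0;
}
```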
>
> Wei Li (2):
>   Move EMMS and FEMMS instructions out of gen_sse
>   Some mmx/sse instructions in 'gen_sse' don't require CR0.TS=0
>
>  target/i386/tcg/translate.c | 45 +
>  1 file changed, 21 insertions(+), 24 deletions(-)
>
> --
> 2.30.2
>
>
>
Thanks.
--
Wei Li


Re: [PATCH v1.1 1/9] qapi: fix example of netdev_add command

2022-04-04 Thread Victor Toso
Hi,

On Mon, Apr 04, 2022 at 08:14:11AM +0200, Markus Armbruster wrote:
> Victor Toso  writes:
> 
> > Example output has the optional member @dnssearch as string type. It
> > should be an array of strings instead. Fix it.
> 
> "of String objects".  Happy to fix this in my tree.

Sure

> 
> >
> > For reference, see NetdevUserOptions.
> >
> > Signed-off-by: Victor Toso 
> > ---
> >  qapi/net.json | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/qapi/net.json b/qapi/net.json
> > index 0d4578bd07..b92f3f5fb4 100644
> > --- a/qapi/net.json
> > +++ b/qapi/net.json
> > @@ -51,7 +51,7 @@
> >  #
> >  # -> { "execute": "netdev_add",
> >  #  "arguments": { "type": "user", "id": "netdev1",
> > -# "dnssearch": "example.org" } }
> > +# "dnssearch": [ { "str": "example.org" } ] } }
> >  # <- { "return": {} }
> >  #
> >  ##
> 
> Preferably with the commit message tweak:
> Reviewed-by: Markus Armbruster 

Thanks,
Victor




Re: [PATCH] [PATCH RFC v2] Implements Backend Program conventions for vhost-user-scsi

2022-04-04 Thread Stefan Hajnoczi
On Mon, 4 Apr 2022 at 00:16, Sakshi Kaushik  wrote:
> I have made the suggested changes and submitted v3.
> But I am not sure how to check this code by running it? When I try to run the 
> .c code I get the error message: 'qemu/osdep.h' no such file or directory.
>
> I have followed the build steps without error, but I'm not sure if I'm doing
> it correctly.

The executable will be in
build/contrib/vhost-user-scsi/vhost-user-scsi after you run 'make'. If
you ran make from the root of the QEMU source tree you can run the
program like this:

  $ build/contrib/vhost-user-scsi/vhost-user-scsi

This will let you check that the new command-line options work and
that the help output is updated.

The simplest way to check that vhost-user-scsi is listening on the
given UNIX domain socket path (--socket-path) or file descriptor
(--fd) is using:

  $ netstat --unix --listening --program

This shows all processes that are listening on a UNIX domain socket.
You can search for the PID of the vhost-user-scsi process to see if
it's listed.

Stefan



Re: [PATCH] target/s390x: Fix the accumulation of ccm in op_icm

2022-04-04 Thread David Hildenbrand
On 01.04.22 21:36, Richard Henderson wrote:
> Coverity rightly reports that 0xff << pos can overflow.
> This would affect the ICMH instruction.
> 
> Fixes: Coverity CID 1487161
> Signed-off-by: Richard Henderson 
> ---
>  target/s390x/tcg/translate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
> index 5acfc0ff9b..ea7baf0832 100644
> --- a/target/s390x/tcg/translate.c
> +++ b/target/s390x/tcg/translate.c
> @@ -2622,7 +2622,7 @@ static DisasJumpType op_icm(DisasContext *s, DisasOps 
> *o)
>  tcg_gen_qemu_ld8u(tmp, o->in2, get_mem_index(s));
>  tcg_gen_addi_i64(o->in2, o->in2, 1);
>  tcg_gen_deposit_i64(o->out, o->out, tmp, pos, 8);
> -ccm |= 0xff << pos;
> +ccm |= 0xffull << pos;
>  }
>  m3 = (m3 << 1) & 0xf;
>  pos -= 8;
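The overflow is easy to reproduce outside QEMU; this minimal illustration (not the s390x translator code) uses values mimicking ICMH, where pos can reach 32:

```c
#include <stdint.h>

/* With a plain int constant, 0xff << 32 shifts by at least the width of int,
 * which is undefined behaviour in C; widening the constant first (0xffull)
 * makes it a well-defined 64-bit shift, as in the patch. */
static uint64_t ccm_bits(int pos)
{
    return 0xffull << pos;
}
```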

Reviewed-by: David Hildenbrand 

-- 
Thanks,

David / dhildenb




Re: [PATCH v5 1/4] qapi/machine.json: Add cluster-id

2022-04-04 Thread Daniel P . Berrangé
On Sun, Apr 03, 2022 at 10:59:50PM +0800, Gavin Shan wrote:
> This adds cluster-id to CPU instance properties, which will be used
> by arm/virt machine. Besides, the cluster-id is also verified or
> dumped in various spots:
> 
>   * hw/core/machine.c::machine_set_cpu_numa_node() to associate
> CPU with its NUMA node.
> 
>   * hw/core/machine.c::machine_numa_finish_cpu_init() to associate
> CPU with NUMA node when no default association is provided.
> 
>   * hw/core/machine-hmp-cmds.c::hmp_hotpluggable_cpus() to dump
> cluster-id.
> 
> Signed-off-by: Gavin Shan 
> ---
>  hw/core/machine-hmp-cmds.c |  4 
>  hw/core/machine.c  | 16 
>  qapi/machine.json  |  6 --
>  3 files changed, 24 insertions(+), 2 deletions(-)

Missing changes to hw/core/machine-smp.c similar to 'dies' in that
file.

When 'dies' was added we added a 'dies_supported' flag, so we could
reject use of 'dies' when it was not supported - which is everywhere
except i386 target.

We need the same for 'clusters_supported' machine property since
AFAICT only the arm 'virt' machine is getting supported in this
series.

> 
> diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c
> index 4e2f319aeb..5cb5eecbfc 100644
> --- a/hw/core/machine-hmp-cmds.c
> +++ b/hw/core/machine-hmp-cmds.c
> @@ -77,6 +77,10 @@ void hmp_hotpluggable_cpus(Monitor *mon, const QDict 
> *qdict)
>  if (c->has_die_id) {
>  monitor_printf(mon, "die-id: \"%" PRIu64 "\"\n", c->die_id);
>  }
> +if (c->has_cluster_id) {
> +monitor_printf(mon, "cluster-id: \"%" PRIu64 "\"\n",
> +   c->cluster_id);
> +}
>  if (c->has_core_id) {
>  monitor_printf(mon, "core-id: \"%" PRIu64 "\"\n", 
> c->core_id);
>  }
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index d856485cb4..8748b64657 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -677,6 +677,11 @@ void machine_set_cpu_numa_node(MachineState *machine,
>  return;
>  }
>  
> +if (props->has_cluster_id && !slot->props.has_cluster_id) {
> +error_setg(errp, "cluster-id is not supported");
> +return;
> +}
> +
>  if (props->has_socket_id && !slot->props.has_socket_id) {
>  error_setg(errp, "socket-id is not supported");
>  return;
> @@ -696,6 +701,11 @@ void machine_set_cpu_numa_node(MachineState *machine,
>  continue;
>  }
>  
> +if (props->has_cluster_id &&
> +props->cluster_id != slot->props.cluster_id) {
> +continue;
> +}
> +
>  if (props->has_die_id && props->die_id != slot->props.die_id) {
>  continue;
>  }
> @@ -990,6 +1000,12 @@ static char *cpu_slot_to_string(const CPUArchId *cpu)
>  }
>  g_string_append_printf(s, "die-id: %"PRId64, cpu->props.die_id);
>  }
> +if (cpu->props.has_cluster_id) {
> +if (s->len) {
> +g_string_append_printf(s, ", ");
> +}
> +g_string_append_printf(s, "cluster-id: %"PRId64, 
> cpu->props.cluster_id);
> +}
>  if (cpu->props.has_core_id) {
>  if (s->len) {
>  g_string_append_printf(s, ", ");
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 9c460ec450..ea22b574b0 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -868,10 +868,11 @@
>  # @node-id: NUMA node ID the CPU belongs to
>  # @socket-id: socket number within node/board the CPU belongs to
>  # @die-id: die number within socket the CPU belongs to (since 4.1)
> -# @core-id: core number within die the CPU belongs to
> +# @cluster-id: cluster number within die the CPU belongs to
> +# @core-id: core number within cluster/die the CPU belongs to
>  # @thread-id: thread number within core the CPU belongs to
>  #
> -# Note: currently there are 5 properties that could be present
> +# Note: currently there are 6 properties that could be present
>  #   but management should be prepared to pass through other
>  #   properties with device_add command to allow for future
>  #   interface extension. This also requires the filed names to be kept in
> @@ -883,6 +884,7 @@
>'data': { '*node-id': 'int',
>  '*socket-id': 'int',
>  '*die-id': 'int',
> +'*cluster-id': 'int',
>  '*core-id': 'int',
>  '*thread-id': 'int'
>}
> -- 
> 2.23.0
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v5 2/4] hw/arm/virt: Consider SMP configuration in CPU topology

2022-04-04 Thread Daniel P . Berrangé
On Sun, Apr 03, 2022 at 10:59:51PM +0800, Gavin Shan wrote:
> Currently, the SMP configuration isn't considered when the CPU
> topology is populated. In this case, it's impossible to provide
> the default CPU-to-NUMA mapping or association based on the socket
> ID of the given CPU.
> 
> This takes the SMP configuration into account when the CPU topology
> is populated. The die ID for the given CPU isn't assigned since
> it's not supported on arm/virt machine yet.
> 
> Signed-off-by: Gavin Shan 
> ---
>  hw/arm/virt.c | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index d2e5ecd234..3174526730 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -2505,6 +2505,7 @@ static const CPUArchIdList 
> *virt_possible_cpu_arch_ids(MachineState *ms)
>  int n;
>  unsigned int max_cpus = ms->smp.max_cpus;
>  VirtMachineState *vms = VIRT_MACHINE(ms);
> +MachineClass *mc = MACHINE_GET_CLASS(vms);
>  
>  if (ms->possible_cpus) {
>  assert(ms->possible_cpus->len == max_cpus);
> @@ -2518,8 +2519,21 @@ static const CPUArchIdList 
> *virt_possible_cpu_arch_ids(MachineState *ms)
>  ms->possible_cpus->cpus[n].type = ms->cpu_type;
>  ms->possible_cpus->cpus[n].arch_id =
>  virt_cpu_mp_affinity(vms, n);
> +
> +assert(!mc->smp_props.dies_supported);
> +ms->possible_cpus->cpus[n].props.has_socket_id = true;
> +ms->possible_cpus->cpus[n].props.socket_id =
> +(n / (ms->smp.clusters * ms->smp.cores * ms->smp.threads)) %
> +ms->smp.sockets;
> +ms->possible_cpus->cpus[n].props.has_cluster_id = true;
> +ms->possible_cpus->cpus[n].props.cluster_id =
> +(n / (ms->smp.cores * ms->smp.threads)) % ms->smp.clusters;
> +ms->possible_cpus->cpus[n].props.has_core_id = true;
> +ms->possible_cpus->cpus[n].props.core_id =
> +(n / ms->smp.threads) % ms->smp.cores;
>  ms->possible_cpus->cpus[n].props.has_thread_id = true;
> -ms->possible_cpus->cpus[n].props.thread_id = n;
> +ms->possible_cpus->cpus[n].props.thread_id =
> +n % ms->smp.threads;
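The derivation in the quoted hunk can be checked standalone; this sketch mirrors the index arithmetic with made-up smp values and an invented struct, not the QEMU data structures:

```c
typedef struct {
    unsigned socket, cluster, core, thread;
} TopoIds;

/* Mirror of the hunk's arithmetic: CPU index n is decomposed innermost-first
 * as thread, then core, then cluster, then socket. */
static TopoIds topo_ids(unsigned n, unsigned sockets, unsigned clusters,
                        unsigned cores, unsigned threads)
{
    TopoIds t;
    t.socket  = (n / (clusters * cores * threads)) % sockets;
    t.cluster = (n / (cores * threads)) % clusters;
    t.core    = (n / threads) % cores;
    t.thread  = n % threads;
    return t;
}
```

For example, with 2 sockets x 2 clusters x 2 cores x 2 threads, CPU 5 lands in socket 0, cluster 1, core 0, thread 1.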

Does this need to be conditionalized behind a machine property, so that
we don't change behaviour of existing machine type versions ?

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v5 1/4] qapi/machine.json: Add cluster-id

2022-04-04 Thread Daniel P . Berrangé
On Mon, Apr 04, 2022 at 09:37:10AM +0100, Daniel P. Berrangé wrote:
> On Sun, Apr 03, 2022 at 10:59:50PM +0800, Gavin Shan wrote:
> > This adds cluster-id to CPU instance properties, which will be used
> > by arm/virt machine. Besides, the cluster-id is also verified or
> > dumped in various spots:
> > 
> >   * hw/core/machine.c::machine_set_cpu_numa_node() to associate
> > CPU with its NUMA node.
> > 
> >   * hw/core/machine.c::machine_numa_finish_cpu_init() to associate
> > CPU with NUMA node when no default association is provided.
> > 
> >   * hw/core/machine-hmp-cmds.c::hmp_hotpluggable_cpus() to dump
> > cluster-id.
> > 
> > Signed-off-by: Gavin Shan 
> > ---
> >  hw/core/machine-hmp-cmds.c |  4 
> >  hw/core/machine.c  | 16 
> >  qapi/machine.json  |  6 --
> >  3 files changed, 24 insertions(+), 2 deletions(-)
> 
> Missing changes to hw/core/machine-smp.c similar to 'dies' in that
> file.
> 
> When 'dies' was added we added a 'dies_supported' flag, so we could
> reject use of 'dies' when it was not supported - which is everywhere
> except i386 target.
> 
> We need the same for 'clusters_supported' machine property since
> AFAICT only the arm 'virt' machine is getting supported in this
> series.

Oh, actually I'm mixing up cluster-id and clusters - the latter is
already supported.


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] linux-user/ppc: Narrow type of ccr in save_user_regs

2022-04-04 Thread Peter Maydell
On Mon, 4 Apr 2022 at 07:55, Cédric Le Goater  wrote:
>
> On 4/1/22 21:16, Richard Henderson wrote:
> > Coverity warns that we shift a 32-bit value by N, and then
> > accumulate it into a 64-bit type (target_ulong on ppc64).
> >
> > The ccr is always 8 * 4-bit fields, and thus is always a
> > 32-bit quantity; narrow the type to avoid the warning.
> >
> > Fixes: Coverity CID 1487223
> > Signed-off-by: Richard Henderson 
> > ---
> >   linux-user/ppc/signal.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
>
> Queued for ppc-7.0

NB that this is only suppressing a coverity warning, not
correcting any incorrect behaviour, so if you don't have
anything else you were planning to send for 7.0 it could
also wait til 7.1.

thanks
-- PMM
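For readers following along, the narrowing Richard's patch makes can be illustrated standalone; this is a sketch with an invented helper, not the signal.c code:

```c
#include <stdint.h>

/* ccr is 8 condition-register fields of 4 bits each, i.e. exactly 32 bits.
 * Accumulating into a uint32_t keeps every shift within the value's width,
 * instead of shifting a 32-bit value and widening it into ppc64's 64-bit
 * target_ulong afterwards -- the pattern Coverity warned about. */
static uint32_t build_ccr(const uint8_t fields[8])
{
    uint32_t ccr = 0;
    for (int i = 0; i < 8; i++) {
        ccr |= (uint32_t)(fields[i] & 0xf) << (28 - 4 * i);
    }
    return ccr;
}
```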



Re: [PATCH v4 0/4] util/thread-pool: Expose minimum and maximum size

2022-04-04 Thread Nicolas Saenz Julienne
On Fri, 2022-04-01 at 11:35 +0200, Nicolas Saenz Julienne wrote:

Subject says 0/4 where it should've been 0/3.

> As discussed on the previous RFC[1] the thread-pool's dynamic thread
> management doesn't play well with real-time and latency sensitive
> systems. This series introduces a set of controls that'll permit
> achieving more deterministic behaviours, for example by fixing the
> pool's size.
> 
> We first introduce a new common interface to event loop configuration by
> moving iothread's already available properties into an abstract class
> called 'EventLoopBackend' and have both 'IOThread' and the newly
> created 'MainLoop' inherit the properties from that class.
> 
> With this new configuration interface in place it's relatively simple to
> introduce new options to fix the event loop's thread pool sizes. The
> resulting QAPI looks like this:
> 
> -object main-loop,id=main-loop,thread-pool-min=1,thread-pool-max=1
> 
> Note that all patches are bisect friendly and pass all the tests.
> 
> [1] 
> https://patchwork.ozlabs.org/project/qemu-devel/patch/20220202175234.656711-1-nsaen...@redhat.com/
> 
> @Stefan I kept your Signed-off-by, since the changes are trivial and not
> thread-pool related
> 
> ---
> Changes since v3:
>  - Avoid duplication in qom.json by creating EventLoopBaseProperties.
>  - Fix failures on first compilation due to race between
>event-loop-base.o and qapi header generation.
> 
> Changes since v2:
>  - Get rid of wrong locking/waiting
>  - Fix qapi versioning
>  - Better commit messages
> 
> Changes since v1:
>  - Address all Stefan's comments
>  - Introduce new fix
> 
> Nicolas Saenz Julienne (3):
>   Introduce event-loop-base abstract class
>   util/main-loop: Introduce the main loop into QOM
>   util/event-loop-base: Introduce options to set the thread pool size
> 
>  event-loop-base.c| 140 +++
>  include/block/aio.h  |  10 +++
>  include/block/thread-pool.h  |   3 +
>  include/qemu/main-loop.h |  10 +++
>  include/sysemu/event-loop-base.h |  41 +
>  include/sysemu/iothread.h|   6 +-
>  iothread.c   |  68 +--
>  meson.build  |  26 +++---
>  qapi/qom.json|  40 +++--
>  util/aio-posix.c |   1 +
>  util/async.c |  20 +
>  util/main-loop.c |  65 ++
>  util/thread-pool.c   |  55 +++-
>  13 files changed, 416 insertions(+), 69 deletions(-)
>  create mode 100644 event-loop-base.c
>  create mode 100644 include/sysemu/event-loop-base.h
> 

-- 
Nicolás Sáenz




Re: [PATCH] target/i386: Suppress coverity warning on fsave/frstor

2022-04-04 Thread Damien Hedde



On 4/1/22 20:46, Richard Henderson wrote:

Coverity warns that 14 << data32 may overflow with respect
to the target_ulong to which it is subsequently added.
We know this wasn't true because data32 is in [1,2],
but the suggested fix is perfectly fine.

Fixes: Coverity CID 1487135, 1487256
Signed-off-by: Richard Henderson 
---
  target/i386/tcg/fpu_helper.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index ebf5e73df9..30bc44fcf8 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -2466,7 +2466,7 @@ static void do_fsave(CPUX86State *env, target_ulong ptr, 
int data32,
  
  do_fstenv(env, ptr, data32, retaddr);
  
-ptr += (14 << data32);

+ptr += (target_ulong)14 << data32;
  for (i = 0; i < 8; i++) {
  tmp = ST(i);
  do_fstt(env, tmp, ptr, retaddr);
@@ -2488,7 +2488,7 @@ static void do_frstor(CPUX86State *env, target_ulong ptr, 
int data32,
  int i;
  
  do_fldenv(env, ptr, data32, retaddr);

-ptr += (14 << data32);
+ptr += (target_ulong)14 << data32;
  
  for (i = 0; i < 8; i++) {

  tmp = do_fldt(env, ptr, retaddr);


Reviewed-by: Damien Hedde 



Re: [PATCH] plugins: Assert mmu_idx in range before use in qemu_plugin_get_hwaddr

2022-04-04 Thread Damien Hedde

Reviewed-by: Damien Hedde 

On 4/1/22 21:02, Richard Henderson wrote:

Coverity reports out-of-bound accesses here.  This should be a
false positive due to how the index is decoded from MemOpIdx.
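The bound comes from how MemOpIdx packs the index; this hedged sketch mirrors that packing as I understand it (the 4-bit layout and helper names are assumptions for illustration, not the exact QEMU definitions):

```c
/* The memory op lives in the high bits and the mmu index in the low four
 * bits, so any decoded index is < 16 by construction -- which is why the
 * Coverity report should be a false positive. */
enum { MMU_IDX_BITS = 4, MMU_IDX_MAX = 1 << MMU_IDX_BITS };

static unsigned make_memop_idx(unsigned op, unsigned idx)
{
    return (op << MMU_IDX_BITS) | (idx & (MMU_IDX_MAX - 1));
}

static unsigned get_mmuidx(unsigned oi)
{
    return oi & (MMU_IDX_MAX - 1);
}
```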

Fixes: Coverity CID 1487201
Signed-off-by: Richard Henderson 
---
  plugins/api.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/plugins/api.c b/plugins/api.c
index 7bf71b189d..2078b16edb 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -289,6 +289,8 @@ struct qemu_plugin_hwaddr 
*qemu_plugin_get_hwaddr(qemu_plugin_meminfo_t info,
  enum qemu_plugin_mem_rw rw = get_plugin_meminfo_rw(info);
  hwaddr_info.is_store = (rw & QEMU_PLUGIN_MEM_W) != 0;
  
+assert(mmu_idx < NB_MMU_MODES);

+
  if (!tlb_plugin_lookup(cpu, vaddr, mmu_idx,
 hwaddr_info.is_store, &hwaddr_info)) {
  error_report("invalid use of qemu_plugin_get_hwaddr");




Re: [PATCH v1 1/9] qmp: Add dump machine type compatible properties

2022-04-04 Thread Maxim Davydov



On 3/30/22 14:03, Vladimir Sementsov-Ogievskiy wrote:

29.03.2022 00:15, Maxim Davydov wrote:

This patch adds the ability to get all the compat_props of the
corresponding supported machines for their comparison.

Example:
{ "execute" : "query-machines", "arguments" : { "is-full" : true } }

Signed-off-by: Maxim Davydov 
---
  hw/core/machine-qmp-cmds.c  | 25 +++-
  qapi/machine.json   | 58 +++--
  tests/qtest/fuzz/qos_fuzz.c |  2 +-
  3 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
index 4f4ab30f8c..8f3206ba8d 100644
--- a/hw/core/machine-qmp-cmds.c
+++ b/hw/core/machine-qmp-cmds.c
@@ -74,7 +74,8 @@ CpuInfoFastList *qmp_query_cpus_fast(Error **errp)
  return head;
  }
  -MachineInfoList *qmp_query_machines(Error **errp)
+MachineInfoList *qmp_query_machines(bool has_is_full, bool is_full,
+    Error **errp)
  {
  GSList *el, *machines = object_class_get_list(TYPE_MACHINE, 
false);

  MachineInfoList *mach_list = NULL;
@@ -107,6 +108,28 @@ MachineInfoList *qmp_query_machines(Error **errp)
  info->default_ram_id = g_strdup(mc->default_ram_id);
  info->has_default_ram_id = true;
  }
+    if (has_is_full && is_full && mc->compat_props) {


is_full is guaranteed to be zero when has_is_full is zero. So, it's 
enough to write:


   if (is_full && mc->compat_props) {


+    int i;
+    info->compat_props = NULL;
+    info->has_compat_props = true;
+
+    for (i = 0; i < mc->compat_props->len; i++) {
+    GlobalProperty *mt_prop = 
g_ptr_array_index(mc->compat_props,

+    i);
+    ObjectClass *klass = 
object_class_by_name(mt_prop->driver);

+    CompatProperty *prop;
+
+    prop = g_malloc0(sizeof(*prop));
+    if (klass && object_class_is_abstract(klass)) {
+    prop->abstract = true;
+    }
+    prop->driver = g_strdup(mt_prop->driver);
+    prop->property = g_strdup(mt_prop->property);
+    prop->value = g_strdup(mt_prop->value);
+
+    QAPI_LIST_PREPEND(info->compat_props, prop);
+    }
+    }
    QAPI_LIST_PREPEND(mach_list, info);
  }
diff --git a/qapi/machine.json b/qapi/machine.json
index 42fc68403d..16e961477c 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -130,6 +130,28 @@
  ##
  { 'command': 'query-cpus-fast', 'returns': [ 'CpuInfoFast' ] }
  +##
+# @CompatProperty:
+#
+# Machine type compatible property. It's based on GlobalProperty and 
created

+# for machine type compat properties (see scripts)
+#
+# @driver: name of the driver that has GlobalProperty
+#
+# @abstract: Bool value that shows that property is belonged to 
abstract class

+#
+# @property: global property name
+#
+# @value: global property value
+#
+# Since: 7.0
+##
+{ 'struct': 'CompatProperty',
+  'data': { 'driver': 'str',
+    'abstract': 'bool',
+    'property': 'str',
+    'value': 'str' } }
+
  ##
  # @MachineInfo:
  #
@@ -158,6 +180,9 @@
  #
  # @default-ram-id: the default ID of initial RAM memory backend 
(since 5.2)

  #
+# @compat-props: List of compatible properties that defines machine 
type

+#    (since 7.0)
+#
  # Since: 1.2
  ##
  { 'struct': 'MachineInfo',
@@ -165,18 +190,47 @@
  '*is-default': 'bool', 'cpu-max': 'int',
  'hotpluggable-cpus': 'bool', 'numa-mem-supported': 'bool',
  'deprecated': 'bool', '*default-cpu-type': 'str',
-    '*default-ram-id': 'str' } }
+    '*default-ram-id': 'str', '*compat-props': 
['CompatProperty'] } }

    ##
  # @query-machines:
  #
  # Return a list of supported machines
  #
+# @is-full: if true return will contain information about machine type
+#   compatible properties (since 7.0)


Should be 7.1.

Also, maybe call it "compat-props" to be consistent with output and 
with documentation?



+#
  # Returns: a list of MachineInfo
  #
  # Since: 1.2
+#
+# Example:
+#
+# -> { "execute" : "query-machines", "arguments" : { "is-full" : 
true } }

+# <- { "return": [
+#  {
+#  "hotpluggable-cpus": true,
+#  "name": "pc-q35-6.2",
+#  "compat-props": [
+#  {
+#  "abstract": false,
+#  "driver": "virtio-mem",
+#  "property": "unplugged-inaccessible",
+#  "value": "off"
+#   }
+#   ],
+#   "numa-mem-supported": false,
+#   "default-cpu-type": "qemu64-x86_64-cpu",
+#   "cpu-max": 288,
+#   "deprecated": false,
+#   "default-ram-id": "pc.ram"
+#   },
+#   ...
+#    }
  ##
-{ 'command': 'query-machines', '

Re: [PATCH] linux-user/ppc: Narrow type of ccr in save_user_regs

2022-04-04 Thread Cédric Le Goater

On 4/4/22 10:41, Peter Maydell wrote:

On Mon, 4 Apr 2022 at 07:55, Cédric Le Goater  wrote:


On 4/1/22 21:16, Richard Henderson wrote:

Coverity warns that we shift a 32-bit value by N, and then
accumulate it into a 64-bit type (target_ulong on ppc64).

The ccr is always 8 * 4-bit fields, and thus is always a
32-bit quantity; narrow the type to avoid the warning.

Fixes: Coverity CID 1487223
Signed-off-by: Richard Henderson 
---
   linux-user/ppc/signal.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)


Queued for ppc-7.0


NB that this is only suppressing a coverity warning, not
correcting any incorrect behaviour, so if you don't have
anything else you were planning to send for 7.0 it could
also wait til 7.1.


I have a couple of small fixes in :

  https://github.com/legoater/qemu/commits/ppc-for-upstream

  linux-user/ppc: Narrow type of ccr in save_user_regs
  ppc/pnv: Fix number of registers in the PCIe controller on POWER9
  hw/ppc: free env->tb_env in spapr_unrealize_vcpu()

Nothing critical indeed. So these can wait 7.1 ?

Thanks,

C.



Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept

2022-04-04 Thread Stefan Hajnoczi
On Fri, Apr 01, 2022 at 01:01:53PM +0200, Paolo Bonzini wrote:
> On 4/1/22 10:05, Emanuele Giuseppe Esposito wrote:
> > > The list itself would be used internally to implement the write-side
> > > lock and unlock primitives, but it would not be protected by the above
> > > functions.  So there would be a couple additional functions:
> > > 
> > >    bdrv_graph_list_lock <-> cpu_list_lock
> > >    bdrv_graph_list_unlock <-> cpu_list_unlock
> > 
> > The list would be graph_bdrv_states, why do we need to protect it with a
> > lock? Currently it is protected by BQL, and theoretically only
> > bdrv_graph_wrlock iterates on it. And as we defined in the assertion
> > below, wrlock is always in the main loop too.
> 
> You're right, CPU_FOREACH only appears in start_exclusive; so likewise you
> only need to walk the list in bdrv_graph_wrlock, i.e. only under BQL.
> 
> My thought was that, within the implementation, you'll need a mutex to
> protect has_waiter, and protecting the list with the same mutex made sense
> to me.  But indeed it's not necessary.

What is the relationship between this new API and aio_set_fd_handler()'s
is_external?

A few thoughts:

- The new API doesn't stop more I/O requests from being submitted, it
  just blocks the current coroutine so request processing is deferred.

- In other words, is_external is a flow control API whereas the new API
  queues up request coroutines without notifying the caller.

- The new API still needs to be combined with bdrv_drained_begin/end()
  to ensure in-flight requests are done.

- It's not obvious to me whether the new API obsoletes is_external. I
  think it probably doesn't.
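A minimal single-threaded sketch of the reader/writer discipline under discussion (coroutine queuing and wakeups elided; all names invented for this example):

```c
#include <stdbool.h>

typedef struct {
    int readers;   /* in-flight request coroutines holding the read side */
    bool writer;   /* a graph-modifying critical section is active */
} GraphLock;

/* Readers (request coroutines) are deferred, not rejected, while a writer
 * is active -- unlike is_external, which stops requests being submitted. */
static bool graph_rdlock_try(GraphLock *l)
{
    if (l->writer) {
        return false;   /* a real implementation queues the coroutine here */
    }
    l->readers++;
    return true;
}

static bool graph_wrlock_try(GraphLock *l)
{
    if (l->readers > 0 || l->writer) {
        return false;   /* write side waits for in-flight readers to finish */
    }
    l->writer = true;
    return true;
}
```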

Stefan




Re: [PATCH] linux-user/ppc: Narrow type of ccr in save_user_regs

2022-04-04 Thread Peter Maydell
On Mon, 4 Apr 2022 at 10:09, Cédric Le Goater  wrote:
>
> On 4/4/22 10:41, Peter Maydell wrote:
> > On Mon, 4 Apr 2022 at 07:55, Cédric Le Goater  wrote:
> >>
> >> On 4/1/22 21:16, Richard Henderson wrote:
> >>> Coverity warns that we shift a 32-bit value by N, and then
> >>> accumulate it into a 64-bit type (target_ulong on ppc64).
> >>>
> >>> The ccr is always 8 * 4-bit fields, and thus is always a
> >>> 32-bit quantity; narrow the type to avoid the warning.
> >>>
> >>> Fixes: Coverity CID 1487223
> >>> Signed-off-by: Richard Henderson 
> >>> ---
> >>>linux-user/ppc/signal.c | 2 +-
> >>>1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> Queued for ppc-7.0
> >
> > NB that this is only suppressing a coverity warning, not
> > correcting any incorrect behaviour, so if you don't have
> > anything else you were planning to send for 7.0 it could
> > also wait til 7.1.
>
> I have a couple of small fixes in :
>
>https://github.com/legoater/qemu/commits/ppc-for-upstream
>
>linux-user/ppc: Narrow type of ccr in save_user_regs
>ppc/pnv: Fix number of registers in the PCIe controller on POWER9
>hw/ppc: free env->tb_env in spapr_unrealize_vcpu()
>
> Nothing critical indeed. So these can wait 7.1 ?

Up to you -- they're all small enough to be OK going into
7.0, and the other two are fixing real bugs.

thanks
-- PMM



Re: [PATCH v4 0/4] util/thread-pool: Expose minimum and maximum size

2022-04-04 Thread Stefan Hajnoczi
On Fri, Apr 01, 2022 at 11:35:20AM +0200, Nicolas Saenz Julienne wrote:
> As discussed on the previous RFC[1] the thread-pool's dynamic thread
> management doesn't play well with real-time and latency sensitive
> systems. This series introduces a set of controls that'll permit
> achieving more deterministic behaviours, for example by fixing the
> pool's size.
> 
> We first introduce a new common interface to event loop configuration by
> moving iothread's already available properties into an abstract class
> called 'EventLoopBackend' and have both 'IOThread' and the newly
> created 'MainLoop' inherit the properties from that class.
> 
> With this new configuration interface in place it's relatively simple to
> introduce new options to fix the event loop's thread pool sizes. The
> resulting QAPI looks like this:
> 
> -object main-loop,id=main-loop,thread-pool-min=1,thread-pool-max=1
> 
> Note that all patches are bisect friendly and pass all the tests.
> 
> [1] 
> https://patchwork.ozlabs.org/project/qemu-devel/patch/20220202175234.656711-1-nsaen...@redhat.com/
> 
> @Stefan I kept your Signed-off-by, since the changes are trivial and not
> thread-pool related

Looks good to me. I will wait for Markus to review the QAPI schema changes.

Stefan




Re: [PATCH v7 12/12] hw/acpi: Make the PCI hot-plug aware of SR-IOV

2022-04-04 Thread Łukasz Gieryk
On Thu, Mar 31, 2022 at 02:38:41PM +0200, Igor Mammedov wrote:
> it's unclear what's being hotplugged and unplugged; it would be better if
> you included the QEMU CLI and relevant qmp/monitor commands to reproduce it.

Qemu CLI:
-
-device pcie-root-port,slot=0,id=rp0
-device nvme-subsys,id=subsys0
-device 
nvme,id=nvme0,bus=rp0,serial=deadbeef,subsys=subsys0,sriov_max_vfs=1,sriov_vq_flexible=2,sriov_vi_flexible=1

Guest OS:
-
sudo nvme virt-mgmt /dev/nvme0 -c 0 -r 1 -a 1 -n 0
sudo nvme virt-mgmt /dev/nvme0 -c 0 -r 0 -a 1 -n 0
echo 1 > /sys/bus/pci/devices/:01:00.0/reset
sleep 1
echo 1 > /sys/bus/pci/devices/:01:00.0/sriov_numvfs
nvme virt-mgmt /dev/nvme0 -c 1 -r 1 -a 8 -n 1
nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 8 -n 2
nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 9 -n 0
sleep 2
echo 01:00.1 > /sys/bus/pci/drivers/nvme/bind

Qemu monitor:
-
device_del nvme0
 



Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept

2022-04-04 Thread Paolo Bonzini
On Mon, Apr 4, 2022 at 11:25 AM Stefan Hajnoczi  wrote:

> - The new API doesn't stop more I/O requests from being submitted, it
>   just blocks the current coroutine so request processing is deferred.
>

New I/O requests would not complete until the write-side critical section
ends. However they would still be accepted: from the point of view of the
guest, the "consumed" index of the virtio ring would move forward, unlike
bdrv_drained_begin/end().

- In other words, is_external is a flow control API whereas the new API
>   queues up request coroutines without notifying the caller.
>

Yes, I think this is the same I wrote above.

> - The new API still needs to be combined with bdrv_drained_begin/end()
>   to ensure in-flight requests are done.
>

I don't think so, because in-flight requests would take the lock for
reading. The write side would not start until those in-flight requests
release the lock.

- It's not obvious to me whether the new API obsoletes is_external. I think
> it probably doesn't.
>

I agree that it doesn't. This new lock is only protecting ->parents and
->children. bdrv_drained_begin()/end() remains necessary, for example, when
you need to send a request during the drained section. An example is
block_resize.

In addition, bdrv_drained_begin()/end() ensures that the callback of
blk_aio_*() functions has been invoked (see commit 46aaf2a566,
"block-backend: Decrease in_flight only after callback", 2018-09-25).  This
new lock would not ensure that.

As an aside, instead of is_external, QEMU could remove/add the ioeventfd
handler in the blk->dev_ops->drained_begin and blk->dev_ops->drained_end
callbacks respectively. But that's just a code cleanup.

Paolo


Re: [PATCH v6 15/19] vfio-user: handle device interrupts

2022-04-04 Thread Stefan Hajnoczi
On Wed, Mar 30, 2022 at 09:40:42AM +, Thanos Makatos wrote:
> > -Original Message-
> > From: Jag Raman 
> > Sent: 29 March 2022 20:07
> > To: Stefan Hajnoczi 
> > Cc: Alex Williamson ; qemu-devel  > de...@nongnu.org>; Michael S. Tsirkin ; Philippe Mathieu-
> > Daudé ; Paolo Bonzini ; Beraldo
> > Leal ; Daniel P. Berrangé ;
> > edua...@habkost.net; Marcel Apfelbaum ;
> > Eric Blake ; Markus Armbruster ;
> > Juan Quintela ; dgilb...@redhat.com; John Levon
> > ; Thanos Makatos ;
> > Elena Ufimtseva ; John Johnson
> > ; Kanth Ghatraju 
> > Subject: Re: [PATCH v6 15/19] vfio-user: handle device interrupts
> > 
> > 
> > 
> > > On Mar 29, 2022, at 10:24 AM, Stefan Hajnoczi 
> > wrote:
> > >
> > > On Sat, Mar 26, 2022 at 11:47:36PM +, Jag Raman wrote:
> > >>
> > >>
> > >>> On Mar 7, 2022, at 5:24 AM, Stefan Hajnoczi 
> > wrote:
> > >>>
> > >>> On Thu, Feb 17, 2022 at 02:49:02AM -0500, Jagannathan Raman wrote:
> >  Forward remote device's interrupts to the guest
> > 
> >  Signed-off-by: Elena Ufimtseva 
> >  Signed-off-by: John G Johnson 
> >  Signed-off-by: Jagannathan Raman 
> >  ---
> >  include/hw/pci/pci.h  |   6 ++
> >  include/hw/remote/vfio-user-obj.h |   6 ++
> >  hw/pci/msi.c  |  13 +++-
> >  hw/pci/msix.c |  12 +++-
> >  hw/remote/machine.c   |  11 +--
> >  hw/remote/vfio-user-obj.c | 107 ++
> >  stubs/vfio-user-obj.c |   6 ++
> >  MAINTAINERS   |   1 +
> >  hw/remote/trace-events|   1 +
> >  stubs/meson.build |   1 +
> >  10 files changed, 158 insertions(+), 6 deletions(-)
> >  create mode 100644 include/hw/remote/vfio-user-obj.h
> >  create mode 100644 stubs/vfio-user-obj.c
> > 
> >  diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> >  index c3f3c90473..d42d526a48 100644
> >  --- a/include/hw/pci/pci.h
> >  +++ b/include/hw/pci/pci.h
> >  @@ -129,6 +129,8 @@ typedef uint32_t PCIConfigReadFunc(PCIDevice
> > *pci_dev,
> >  typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num,
> > pcibus_t addr, pcibus_t size, int type);
> >  typedef void PCIUnregisterFunc(PCIDevice *pci_dev);
> >  +typedef void PCIMSINotify(PCIDevice *pci_dev, unsigned vector);
> >  +typedef void PCIMSIxNotify(PCIDevice *pci_dev, unsigned vector);
> > 
> >  typedef struct PCIIORegion {
> > pcibus_t addr; /* current PCI mapping address. -1 means not mapped 
> >  */
> >  @@ -323,6 +325,10 @@ struct PCIDevice {
> > /* Space to store MSIX table & pending bit array */
> > uint8_t *msix_table;
> > uint8_t *msix_pba;
> >  +
> >  +PCIMSINotify *msi_notify;
> >  +PCIMSIxNotify *msix_notify;
> >  +
> > /* MemoryRegion container for msix exclusive BAR setup */
> > MemoryRegion msix_exclusive_bar;
> > /* Memory Regions for MSIX table and pending bit entries. */
> >  diff --git a/include/hw/remote/vfio-user-obj.h 
> >  b/include/hw/remote/vfio-
> > user-obj.h
> >  new file mode 100644
> >  index 00..87ab78b875
> >  --- /dev/null
> >  +++ b/include/hw/remote/vfio-user-obj.h
> >  @@ -0,0 +1,6 @@
> >  +#ifndef VFIO_USER_OBJ_H
> >  +#define VFIO_USER_OBJ_H
> >  +
> >  +void vfu_object_set_bus_irq(PCIBus *pci_bus);
> >  +
> >  +#endif
> >  diff --git a/hw/pci/msi.c b/hw/pci/msi.c
> >  index 47d2b0f33c..93f5e400cc 100644
> >  --- a/hw/pci/msi.c
> >  +++ b/hw/pci/msi.c
> >  @@ -51,6 +51,8 @@
> >  */
> >  bool msi_nonbroken;
> > 
> >  +static void pci_msi_notify(PCIDevice *dev, unsigned int vector);
> >  +
> >  /* If we get rid of cap allocator, we won't need this. */
> >  static inline uint8_t msi_cap_sizeof(uint16_t flags)
> >  {
> >  @@ -225,6 +227,8 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
> > dev->msi_cap = config_offset;
> > dev->cap_present |= QEMU_PCI_CAP_MSI;
> > 
> >  +dev->msi_notify = pci_msi_notify;
> > >>>
> > >>> Are you sure it's correct to skip the msi_is_masked() logic? I think the
> > >>> callback function should only override the behavior of
> > >>> msi_send_message(), not the entire msi_notify() function.
> > >>>
> > >>> The same applies to MSI-X.
> > >>
> > >> Hi Stefan,
> > >>
> > >> We noticed that the client is handling the masking and unmasking of MSIx
> > >> interrupts.
> > >>
> > >> Concerning MSIx, vfio_msix_vector_use() handles unmasking and
> > >> vfio_msix_vector_release() handles masking operations. The server 
> > >> triggers
> > >> an MSIx interrupt by signaling the eventfd associated with the vector. 
> > >> If the
> > vector
> > >> is unmasked, the interrupt bypasses the client/QEMU and takes this
> > >> path: “server -> KVM -> 

Re: [PATCH v1 0/9] Fix some qapi examples

2022-04-04 Thread Markus Armbruster
Victor Toso  writes:

> Hi,
>
> I did another iteration of adding the examples in the qapi documentation
> as unit tests in another project and found a few that could be updated.
>
> Feel free to cherry-pick them. Happy to rework it if needed.

Queued, thanks!




Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept

2022-04-04 Thread Emanuele Giuseppe Esposito



Am 04/04/2022 um 11:41 schrieb Paolo Bonzini:
> 
> 
> On Mon, Apr 4, 2022 at 11:25 AM Stefan Hajnoczi  > wrote:
> 
> - The new API doesn't stop more I/O requests from being submitted, it
>   just blocks the current coroutine so request processing is deferred.
> 
> 
> New I/O requests would not complete until the write-side critical
> section ends. However they would still be accepted: from the point of
> view of the guest, the "consumed" index of the virtio ring would move
> forward, unlike bdrv_drained_begin/end().
> 
> - In other words, is_external is a flow control API whereas the new API
>   queues up request coroutines without notifying the caller.
> 
> 
> Yes, I think this is the same I wrote above.
> 
> - The new API still needs to be combined with bdrv_drained_begin/end()
>   to ensure in-flight requests are done.
> 
> 
> I don't think so, because in-flight requests would take the lock for
> reading. The write side would not start until those in-flight requests
> release the lock.
> 
> - It's not obvious to me whether the new API obsoletes is_external.
> I think it probably doesn't.
> 
> 
> I agree that it doesn't. This new lock is only protecting ->parents and
> ->children. 

Side note: it will also be used to protect other fields, like
.aio_context I think. I haven't checked if there is something else we
might want to protect that is currently protected by AioContext lock.

At least, I think we are going to use the same lock, right?

Emanuele

> bdrv_drained_begin()/end() remains necessary, for example,
> when you need to send a request during the drained section. An example
> is block_resize.
> 
> In addition, bdrv_drained_begin()/end() ensures that the callback of
> blk_aio_*() functions has been invoked (see commit 46aaf2a566,
> "block-backend: Decrease in_flight only after callback", 2018-09-25). 
> This new lock would not ensure that.
> 
> As an aside, instead of is_external, QEMU could remove/add the ioeventfd
> handler in the blk->dev_ops->drained_begin and blk->dev_ops->drained_end
> callbacks respectively. But that's just a code cleanup.
> 
> Paolo




Re: [RFC PATCH 0/5] Removal of AioContext lock, bs->parents and ->children: proof of concept

2022-04-04 Thread Paolo Bonzini
On 4/4/22 11:51, Emanuele Giuseppe Esposito wrote:
>>
>> I agree that it doesn't. This new lock is only protecting ->parents and
>> ->children. 
> Side note: it will also be used to protect other fields, like
> .aio_context I think. I haven't checked if there is something else we
> might want to protect that is currently protected by AioContext lock.
> 
> At least, I think we are going to use the same lock, right?

I have no idea honestly.  It can make sense for anything that is changed
very rarely and read during requests.

.aio_context has the .detach/.attach callbacks and I wonder if there
should be any reason to access it outside the callbacks.  A lot of uses
of .aio_context (for example for aio_bh_new or
aio_bh_schedule_oneshot/replay_bh_schedule_oneshot_event) can, and
perhaps should, be changed to just qemu_get_current_aio_context().  For
multiqueue we probably want the same BlockDriverState to use the
AioContext corresponding to a virtio queue, rather than always the same one.

Paolo




Re: [PATCH v5 1/4] qapi/machine.json: Add cluster-id

2022-04-04 Thread Gavin Shan

Hi Daniel,

On 4/4/22 4:40 PM, Daniel P. Berrangé wrote:

On Mon, Apr 04, 2022 at 09:37:10AM +0100, Daniel P. Berrangé wrote:

On Sun, Apr 03, 2022 at 10:59:50PM +0800, Gavin Shan wrote:

This adds cluster-id in CPU instance properties, which will be used
by arm/virt machine. Besides, the cluster-id is also verified or
dumped in various spots:

   * hw/core/machine.c::machine_set_cpu_numa_node() to associate
 CPU with its NUMA node.

   * hw/core/machine.c::machine_numa_finish_cpu_init() to associate
 CPU with NUMA node when no default association is provided.

   * hw/core/machine-hmp-cmds.c::hmp_hotpluggable_cpus() to dump
 cluster-id.

Signed-off-by: Gavin Shan 
---
  hw/core/machine-hmp-cmds.c |  4 
  hw/core/machine.c  | 16 
  qapi/machine.json  |  6 --
  3 files changed, 24 insertions(+), 2 deletions(-)


Missing changes to hw/core/machine-smp.c similar to 'dies' in that
file.

When 'dies' was added we added a 'dies_supported' flag, so we could
reject use of 'dies' when it was not supported - which is everywhere
except i386 target.

We need the same for 'clusters_supported' machine property since
AFAICT only the arm 'virt' machine is getting supported in this
series.


Oh, actually I'm mixing up cluster-id and clusters - the latter is
already supported.



Yeah, @clusters_supported has been existing for a while.

Thanks,
Gavin




Re: [PATCH v5 2/4] hw/arm/virt: Consider SMP configuration in CPU topology

2022-04-04 Thread Gavin Shan

Hi Daniel,

On 4/4/22 4:39 PM, Daniel P. Berrangé wrote:

On Sun, Apr 03, 2022 at 10:59:51PM +0800, Gavin Shan wrote:

Currently, the SMP configuration isn't considered when the CPU
topology is populated. In this case, it's impossible to provide
the default CPU-to-NUMA mapping or association based on the socket
ID of the given CPU.

This takes account of SMP configuration when the CPU topology
is populated. The die ID for the given CPU isn't assigned since
it's not supported on arm/virt machine yet.

Signed-off-by: Gavin Shan 
---
  hw/arm/virt.c | 16 +++-
  1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index d2e5ecd234..3174526730 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2505,6 +2505,7 @@ static const CPUArchIdList 
*virt_possible_cpu_arch_ids(MachineState *ms)
  int n;
  unsigned int max_cpus = ms->smp.max_cpus;
  VirtMachineState *vms = VIRT_MACHINE(ms);
+MachineClass *mc = MACHINE_GET_CLASS(vms);
  
  if (ms->possible_cpus) {

  assert(ms->possible_cpus->len == max_cpus);
@@ -2518,8 +2519,21 @@ static const CPUArchIdList 
*virt_possible_cpu_arch_ids(MachineState *ms)
  ms->possible_cpus->cpus[n].type = ms->cpu_type;
  ms->possible_cpus->cpus[n].arch_id =
  virt_cpu_mp_affinity(vms, n);
+
+assert(!mc->smp_props.dies_supported);
+ms->possible_cpus->cpus[n].props.has_socket_id = true;
+ms->possible_cpus->cpus[n].props.socket_id =
+(n / (ms->smp.clusters * ms->smp.cores * ms->smp.threads)) %
+ms->smp.sockets;
+ms->possible_cpus->cpus[n].props.has_cluster_id = true;
+ms->possible_cpus->cpus[n].props.cluster_id =
+(n / (ms->smp.cores * ms->smp.threads)) % ms->smp.clusters;
+ms->possible_cpus->cpus[n].props.has_core_id = true;
+ms->possible_cpus->cpus[n].props.core_id =
+(n / ms->smp.threads) % ms->smp.cores;
  ms->possible_cpus->cpus[n].props.has_thread_id = true;
-ms->possible_cpus->cpus[n].props.thread_id = n;
+ms->possible_cpus->cpus[n].props.thread_id =
+n % ms->smp.threads;


Does this need to be conditionalized behind a machine property, so that
we don't change behaviour of existing machine type versions ?



I think we probably needn't do that because the default NUMA node
for the given CPU is returned based on the socket ID in next patch.
The socket ID is calculated in this patch. Otherwise, we will see
CPU topology broken warnings in Linux guest. I think we need to fix
this issue for all machine type versions.
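The index decomposition in the hunk quoted above can be checked with a small standalone sketch. The parameter names mirror the -smp values; this is not QEMU code, just the same arithmetic:

```c
/* Decompose a linear CPU index into topology IDs, mirroring the
 * calculation added to virt_possible_cpu_arch_ids() above. */
struct cpu_topo {
    unsigned socket_id;
    unsigned cluster_id;
    unsigned core_id;
    unsigned thread_id;
};

struct cpu_topo decompose_cpu_index(unsigned n, unsigned sockets,
                                    unsigned clusters, unsigned cores,
                                    unsigned threads)
{
    struct cpu_topo t;

    /* Innermost level varies fastest: thread, then core, cluster, socket. */
    t.socket_id  = (n / (clusters * cores * threads)) % sockets;
    t.cluster_id = (n / (cores * threads)) % clusters;
    t.core_id    = (n / threads) % cores;
    t.thread_id  = n % threads;
    return t;
}
```

For example, with -smp 16,sockets=2,clusters=2,cores=2,threads=2, CPU 13 lands on socket 1, cluster 1, core 0, thread 1.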

Thanks,
Gavin




Re: [PATCH 1/1] xlnx-bbram: hw/nvram: Fix uninitialized Error *

2022-04-04 Thread Francisco Iglesias
On [2022 Apr 01] Fri 12:06:31, Tong Ho wrote:
> This adds required initialization of Error * variable.
> 
> Signed-off-by: Tong Ho 

Reviewed-by: Francisco Iglesias 

> ---
>  hw/nvram/xlnx-bbram.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/nvram/xlnx-bbram.c b/hw/nvram/xlnx-bbram.c
> index b70828e5bf..6ed32adad9 100644
> --- a/hw/nvram/xlnx-bbram.c
> +++ b/hw/nvram/xlnx-bbram.c
> @@ -89,7 +89,7 @@ static bool bbram_pgm_enabled(XlnxBBRam *s)
>  
>  static void bbram_bdrv_error(XlnxBBRam *s, int rc, gchar *detail)
>  {
> -Error *errp;
> +Error *errp = NULL;
>  
>  error_setg_errno(&errp, -rc, "%s: BBRAM backstore %s failed.",
>   blk_name(s->blk), detail);
> -- 
> 2.25.1
> 
> 



Re: [PATCH v1 2/9] pci: add null-pointer check

2022-04-04 Thread Maxim Davydov



On 3/30/22 14:07, Vladimir Sementsov-Ogievskiy wrote:

29.03.2022 00:15, Maxim Davydov wrote:

A call to pci_bus_get_w64_range() can fail with a segmentation fault. For
example, this can happen during an attempt to get pci-hole64-end
immediately after initialization.


So, immediately after initialization, h->bus is NULL?

The significant bit is: is the value which we calculate without h->bus 
correct or not? That should be covered by the commit message.
For example, object_new_with_class() returns only an initialized object 
(after calling instance_init). It means that pci_root_bus_new() in 
q35_host_realize() hasn't been called for the returned object and pci->bus 
== NULL. So, if we then try to call q35_host_get_pci_hole64_end() it 
will fail with a segmentation fault in pci_for_each_device_under_bus() 
(d = bus->devices[devfn], but bus == NULL). Similarly for i440fx. I'm 
not sure that it's the correct behavior.
To reproduce this situation, run "{'execute' : 'query-init-properties'}" 
or qmp_query_init_properties() from the 8th patch of this series without 
applying the fixes for pci-host.
After this fix, the behavior is the same as if range_is_empty(&w64) 
== True, but without a SEGFAULT. Although, we can check the flag 
DeviceState.realized to detect an unrealized device.




Signed-off-by: Maxim Davydov 
---
  hw/pci-host/i440fx.c | 17 +++--
  hw/pci-host/q35.c    | 17 +++--
  2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/hw/pci-host/i440fx.c b/hw/pci-host/i440fx.c
index e08716142b..71a114e551 100644
--- a/hw/pci-host/i440fx.c
+++ b/hw/pci-host/i440fx.c
@@ -158,10 +158,12 @@ static uint64_t 
i440fx_pcihost_get_pci_hole64_start_value(Object *obj)

  PCIHostState *h = PCI_HOST_BRIDGE(obj);
  I440FXState *s = I440FX_PCI_HOST_BRIDGE(obj);
  Range w64;
-    uint64_t value;
+    uint64_t value = 0;
  -    pci_bus_get_w64_range(h->bus, &w64);
-    value = range_is_empty(&w64) ? 0 : range_lob(&w64);
+    if (h->bus) {
+    pci_bus_get_w64_range(h->bus, &w64);
+    value = range_is_empty(&w64) ? 0 : range_lob(&w64);
+    }
  if (!value && s->pci_hole64_fix) {
  value = pc_pci_hole64_start();
  }
@@ -191,10 +193,13 @@ static void 
i440fx_pcihost_get_pci_hole64_end(Object *obj, Visitor *v,

  I440FXState *s = I440FX_PCI_HOST_BRIDGE(obj);
  uint64_t hole64_start = 
i440fx_pcihost_get_pci_hole64_start_value(obj);

  Range w64;
-    uint64_t value, hole64_end;
+    uint64_t value = 0;
+    uint64_t hole64_end;
  -    pci_bus_get_w64_range(h->bus, &w64);
-    value = range_is_empty(&w64) ? 0 : range_upb(&w64) + 1;
+    if (h->bus) {
+    pci_bus_get_w64_range(h->bus, &w64);
+    value = range_is_empty(&w64) ? 0 : range_upb(&w64) + 1;
+    }
  hole64_end = ROUND_UP(hole64_start + s->pci_hole64_size, 1ULL 
<< 30);

  if (s->pci_hole64_fix && value < hole64_end) {
  value = hole64_end;
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index ab5a47aff5..d679fd85ef 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -124,10 +124,12 @@ static uint64_t 
q35_host_get_pci_hole64_start_value(Object *obj)

  PCIHostState *h = PCI_HOST_BRIDGE(obj);
  Q35PCIHost *s = Q35_HOST_DEVICE(obj);
  Range w64;
-    uint64_t value;
+    uint64_t value = 0;
  -    pci_bus_get_w64_range(h->bus, &w64);
-    value = range_is_empty(&w64) ? 0 : range_lob(&w64);
+    if (h->bus) {
+    pci_bus_get_w64_range(h->bus, &w64);
+    value = range_is_empty(&w64) ? 0 : range_lob(&w64);
+    }
  if (!value && s->pci_hole64_fix) {
  value = pc_pci_hole64_start();
  }
@@ -157,10 +159,13 @@ static void q35_host_get_pci_hole64_end(Object 
*obj, Visitor *v,

  Q35PCIHost *s = Q35_HOST_DEVICE(obj);
  uint64_t hole64_start = q35_host_get_pci_hole64_start_value(obj);
  Range w64;
-    uint64_t value, hole64_end;
+    uint64_t value = 0;
+    uint64_t hole64_end;
  -    pci_bus_get_w64_range(h->bus, &w64);
-    value = range_is_empty(&w64) ? 0 : range_upb(&w64) + 1;
+    if (h->bus) {
+    pci_bus_get_w64_range(h->bus, &w64);
+    value = range_is_empty(&w64) ? 0 : range_upb(&w64) + 1;
+    }
  hole64_end = ROUND_UP(hole64_start + s->mch.pci_hole64_size, 
1ULL << 30);

  if (s->pci_hole64_fix && value < hole64_end) {
  value = hole64_end;





--
Best regards,
Maxim Davydov




Re: [PATCH] docs/system/devices/can.rst: correct links to CTU CAN FD IP core documentation.

2022-04-04 Thread Francisco Iglesias
On [2022 Apr 02] Sat 22:45:23, Pavel Pisa wrote:
> Signed-off-by: Pavel Pisa 

Reviewed-by: Francisco Iglesias 

> ---
>  docs/system/devices/can.rst | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/system/devices/can.rst b/docs/system/devices/can.rst
> index 16d72c3ac3..fe37af8223 100644
> --- a/docs/system/devices/can.rst
> +++ b/docs/system/devices/can.rst
> @@ -182,7 +182,7 @@ Links to other resources
>   (5) `GNU/Linux, CAN and CANopen in Real-time Control Applications Slides 
> from LinuxDays 2017 (include updated RTLWS 2015 content) 
> `_
>   (6) `Linux SocketCAN utilities `_
>   (7) `CTU CAN FD project including core VHDL design, Linux driver, test 
> utilities etc. `_
> - (8) `CTU CAN FD Core Datasheet Documentation 
> `_
> - (9) `CTU CAN FD Core System Architecture Documentation 
> `_
> - (10) `CTU CAN FD Driver Documentation 
> `_
> + (8) `CTU CAN FD Core Datasheet Documentation 
> `_
> + (9) `CTU CAN FD Core System Architecture Documentation 
> `_
> + (10) `CTU CAN FD Driver Documentation 
> `_
>   (11) `Integration with PCIe interfacing for Intel/Altera Cyclone IV based 
> board `_
> -- 
> 2.20.1
> 
> 
> 



Re: [PATCH] multifd: Copy pages before compressing them with zlib

2022-04-04 Thread Dr. David Alan Gilbert
* Ilya Leoshkevich (i...@linux.ibm.com) wrote:
> zlib_send_prepare() compresses pages of a running VM. zlib does not
> make any thread-safety guarantees with respect to changing deflate()
> input concurrently with deflate() [1].
> 
> One can observe problems due to this with the IBM zEnterprise Data
> Compression accelerator capable zlib [2]. When the hardware
> acceleration is enabled, migration/multifd/tcp/zlib test fails
> intermittently [3] due to sliding window corruption.
> 
> At the moment this problem occurs only with this accelerator, since
> its architecture explicitly discourages concurrent accesses [4]:
> 
> Page 26-57, "Other Conditions":
> 
> As observed by this CPU, other CPUs, and channel
> programs, references to the parameter block, first,
> second, and third operands may be multiple-access
> references, accesses to these storage locations are
> not necessarily block-concurrent, and the sequence
> of these accesses or references is undefined.
> 
> Still, it might affect other platforms due to a future zlib update.
> Therefore, copy the page being compressed into a private buffer before
> passing it to zlib.

While this might work around the problem; your explanation doesn't quite
fit with the symptoms; or if they do, then you have a separate problem.

The live migration code relies on the fact that the source is running
and changing its memory as the data is transmitted; however it also
relies on the fact that if this happens the 'dirty' flag is set _after_
those changes causing another round of migration and retransmission of
the (now stable) data.

We don't expect the load of the data for the first page write to be
correct, consistent etc - we just rely on the retransmission to be
correct when the page is stable.

If your compressor hardware is doing something undefined during the
first case that's fine; as long as it works fine in the stable case
where the data isn't changing.

Adding the extra copy is going to slow everyone else down; and since
there's plenty of pthread locking in those multifd I'm expecting them
to get reasonably defined ordering and thus be safe from multi threading
problems (please correct us if we've actually done something wrong in
the locking there).

IMHO your accelerator when called from a zlib call needs to behave
the same as if it was the software implementation; i.e. if we've got
pthread calls in there that are enforcing ordering then that should be
fine; your accelerator implementation needs to add a barrier of some
type or an internal copy, not penalise everyone else.

Dave



> 
> [1] https://zlib.net/manual.html
> [2] https://github.com/madler/zlib/pull/410
> [3] https://lists.nongnu.org/archive/html/qemu-devel/2022-03/msg03988.html
> [4] http://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf
> 
> Signed-off-by: Ilya Leoshkevich 
> ---
>  migration/multifd-zlib.c | 35 ++-
>  1 file changed, 22 insertions(+), 13 deletions(-)
> 
> diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
> index 3a7ae44485..b6b22b7d1f 100644
> --- a/migration/multifd-zlib.c
> +++ b/migration/multifd-zlib.c
> @@ -27,6 +27,8 @@ struct zlib_data {
>  uint8_t *zbuff;
>  /* size of compressed buffer */
>  uint32_t zbuff_len;
> +/* uncompressed buffer */
> +uint8_t buf[];
>  };
>  
>  /* Multifd zlib compression */
> @@ -43,9 +45,18 @@ struct zlib_data {
>   */
>  static int zlib_send_setup(MultiFDSendParams *p, Error **errp)
>  {
> -struct zlib_data *z = g_new0(struct zlib_data, 1);
> -z_stream *zs = &z->zs;
> +/* This is the maximum size of the compressed buffer */
> +uint32_t zbuff_len = compressBound(MULTIFD_PACKET_SIZE);
> +size_t buf_len = qemu_target_page_size();
> +struct zlib_data *z;
> +z_stream *zs;
>  
> +z = g_try_malloc0(sizeof(struct zlib_data) + buf_len + zbuff_len);
> +if (!z) {
> +error_setg(errp, "multifd %u: out of memory for zlib_data", p->id);
> +return -1;
> +}
> +zs = &z->zs;
>  zs->zalloc = Z_NULL;
>  zs->zfree = Z_NULL;
>  zs->opaque = Z_NULL;
> @@ -54,15 +65,8 @@ static int zlib_send_setup(MultiFDSendParams *p, Error 
> **errp)
>  error_setg(errp, "multifd %u: deflate init failed", p->id);
>  return -1;
>  }
> -/* This is the maxium size of the compressed buffer */
> -z->zbuff_len = compressBound(MULTIFD_PACKET_SIZE);
> -z->zbuff = g_try_malloc(z->zbuff_len);
> -if (!z->zbuff) {
> -deflateEnd(&z->zs);
> -g_free(z);
> -error_setg(errp, "multifd %u: out of memory for zbuff", p->id);
> -return -1;
> -}
> +z->zbuff_len = zbuff_len;
> +z->zbuff = z->buf + buf_len;
>  p->data = z;
>  return 0;
>  }
> @@ -80,7 +84,6 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error 
> **errp)
>  struct zlib_data *z = p->data;
>  
>  deflateEnd(&z->zs);
> -g_free(z->zbuff);
>  z->zbuff = NULL;
>  g_free(p-

Re: [PATCH v1 3/9] mem: appropriate handling getting mem region

2022-04-04 Thread Maxim Davydov



On 3/30/22 14:27, Vladimir Sementsov-Ogievskiy wrote:

29.03.2022 00:15, Maxim Davydov wrote:
Attempt to get memory region if the device doesn't have hostmem may 
not be an error. This can happen immediately after initialization (getting
value without default one).

Signed-off-by: Maxim Davydov 
---
  hw/i386/sgx-epc.c | 5 -
  hw/mem/nvdimm.c   | 6 ++
  hw/mem/pc-dimm.c  | 5 +
  3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/i386/sgx-epc.c b/hw/i386/sgx-epc.c
index d664829d35..1a4c8acdcc 100644
--- a/hw/i386/sgx-epc.c
+++ b/hw/i386/sgx-epc.c
@@ -121,9 +121,12 @@ static MemoryRegion 
*sgx_epc_md_get_memory_region(MemoryDeviceState *md,

  {
  SGXEPCDevice *epc = SGX_EPC(md);
  HostMemoryBackend *hostmem;
+    DeviceState *dev = DEVICE(epc);
    if (!epc->hostmem) {
-    error_setg(errp, "'" SGX_EPC_MEMDEV_PROP "' property must be 
set");

+    if (dev->realized) {
+    error_setg(errp, "'" SGX_EPC_MEMDEV_PROP "' property 
must be set");

+    }
  return NULL;
  }


I can't judge, is it really an error or not.

But the way you change the logic is not correct, as you change the 
semantics:


Old semantics: on error return NULL and set errp, on success return 
non-NULL and not set errp


New semantics: on error return NULL and set errp, on success return 
anything (may be NULL) and not set errp.


Callers are not prepared for this. For example, look at 
memory_device_unplug:

it does

  mr = mdc->get_memory_region(md, &error_abort);

assume it returns NULL, which is not an error (so we don't crash on 
error_abort)


and then pass mr  to memory_region_del_subregion(), which in turn 
access mr->container, which will crash if mr is NULL.


Most probably the situation I describe is not possible, but I just 
want to illustrate the idea.


Moreover, in QEMU, functions which have an "Error **errp" argument and 
return a pointer are recommended to return NULL on failure and non-NULL 
on success. In other words, the return value of a function with an "Error 
**errp" argument should report success/failure information. And having 
NULL as a possible success return value is not recommended, as it's 
ambiguous and leads to bugs (see the big comment at the start of 
include/qapi/error.h).


So, if it's really needed to change the semantics in such a 
not-recommended way, you should check that all callers are OK with it 
and also describe the new semantics in a comment near the get_memory_region 
declaration. But better to deal with the returned error as it is. What is 
the exact problem you are trying to solve with this commit?
I tried to solve the problem with errors from requesting a MemoryRegion 
(via *md_get_memory_region()) called immediately after 
object_new_with_class(). But it does seem to change the semantics. 
Perhaps a better solution would be to ignore these errors or to add an 
exception to handle the object properties correctly.



  diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 7c7d81..61e77e5476 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -166,9 +166,15 @@ static MemoryRegion 
*nvdimm_md_get_memory_region(MemoryDeviceState *md,

   Error **errp)
  {
  NVDIMMDevice *nvdimm = NVDIMM(md);
+    PCDIMMDevice *dimm = PC_DIMM(nvdimm);
  Error *local_err = NULL;
    if (!nvdimm->nvdimm_mr) {
+    /* Not error if we try get memory region after init */
+    if (!dimm->hostmem) {
+    return NULL;
+    }
+
  nvdimm_prepare_memory_region(nvdimm, &local_err);
  if (local_err) {
  error_propagate(errp, local_err);
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index f27e1a11ba..6fd74de97f 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -240,6 +240,11 @@ static void 
pc_dimm_md_set_addr(MemoryDeviceState *md, uint64_t addr,
  static MemoryRegion *pc_dimm_md_get_memory_region(MemoryDeviceState 
*md,

    Error **errp)
  {
+    PCDIMMDevice *dimm = PC_DIMM(md);
+    /* Not error if we try get memory region after init */
+    if (!dimm->hostmem) {
+    return NULL;
+    }
  return pc_dimm_get_memory_region(PC_DIMM(md), errp);
  }




--
Best regards,
Maxim Davydov




Re: [PATCH v5 2/4] hw/arm/virt: Consider SMP configuration in CPU topology

2022-04-04 Thread Igor Mammedov
On Mon, 4 Apr 2022 18:48:00 +0800
Gavin Shan  wrote:

> Hi Daniel,
> 
> On 4/4/22 4:39 PM, Daniel P. Berrangé wrote:
> > On Sun, Apr 03, 2022 at 10:59:51PM +0800, Gavin Shan wrote:  
> >> Currently, the SMP configuration isn't considered when the CPU
> >> topology is populated. In this case, it's impossible to provide
> >> the default CPU-to-NUMA mapping or association based on the socket
> >> ID of the given CPU.
> >>
> >> This takes account of SMP configuration when the CPU topology
> >> is populated. The die ID for the given CPU isn't assigned since
> >> it's not supported on arm/virt machine yet.
> >>
> >> Signed-off-by: Gavin Shan 
> >> ---
> >>   hw/arm/virt.c | 16 +++-
> >>   1 file changed, 15 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> >> index d2e5ecd234..3174526730 100644
> >> --- a/hw/arm/virt.c
> >> +++ b/hw/arm/virt.c
> >> @@ -2505,6 +2505,7 @@ static const CPUArchIdList 
> >> *virt_possible_cpu_arch_ids(MachineState *ms)
> >>   int n;
> >>   unsigned int max_cpus = ms->smp.max_cpus;
> >>   VirtMachineState *vms = VIRT_MACHINE(ms);
> >> +MachineClass *mc = MACHINE_GET_CLASS(vms);
> >>   
> >>   if (ms->possible_cpus) {
> >>   assert(ms->possible_cpus->len == max_cpus);
> >> @@ -2518,8 +2519,21 @@ static const CPUArchIdList 
> >> *virt_possible_cpu_arch_ids(MachineState *ms)
> >>   ms->possible_cpus->cpus[n].type = ms->cpu_type;
> >>   ms->possible_cpus->cpus[n].arch_id =
> >>   virt_cpu_mp_affinity(vms, n);
> >> +
> >> +assert(!mc->smp_props.dies_supported);
> >> +ms->possible_cpus->cpus[n].props.has_socket_id = true;
> >> +ms->possible_cpus->cpus[n].props.socket_id =
> >> +(n / (ms->smp.clusters * ms->smp.cores * ms->smp.threads)) %
> >> +ms->smp.sockets;
> >> +ms->possible_cpus->cpus[n].props.has_cluster_id = true;
> >> +ms->possible_cpus->cpus[n].props.cluster_id =
> >> +(n / (ms->smp.cores * ms->smp.threads)) % ms->smp.clusters;
> >> +ms->possible_cpus->cpus[n].props.has_core_id = true;
> >> +ms->possible_cpus->cpus[n].props.core_id =
> >> +(n / ms->smp.threads) % ms->smp.cores;
> >>   ms->possible_cpus->cpus[n].props.has_thread_id = true;
> >> -ms->possible_cpus->cpus[n].props.thread_id = n;
> >> +ms->possible_cpus->cpus[n].props.thread_id =
> >> +n % ms->smp.threads;  
> > 
> > Does this need to be conditionalized behind a machine property, so that
> > we don't change behaviour of existing machine type versions ?
> >   
> 
> I think we probably needn't do that because the default NUMA node
> for the given CPU is returned based on the socket ID in next patch.
> The socket ID is calculated in this patch. Otherwise, we will see
> CPU topology broken warnings in Linux guest. I think we need to fix
> this issue for all machine type versions.

Agreed.
Also guest-wise it's an ACPI-only change, which is the 'firmware' part of QEMU,
and by default we don't version those changes unless we are pressed
into it (i.e. the same policy that goes for bundled firmware).

> Thanks,
> Gavin
> 




Re: [PATCH] multifd: Copy pages before compressing them with zlib

2022-04-04 Thread Ilya Leoshkevich
On Mon, 2022-04-04 at 12:20 +0100, Dr. David Alan Gilbert wrote:
> * Ilya Leoshkevich (i...@linux.ibm.com) wrote:
> > zlib_send_prepare() compresses pages of a running VM. zlib does not
> > make any thread-safety guarantees with respect to changing
> > deflate()
> > input concurrently with deflate() [1].
> > 
> > One can observe problems due to this with the IBM zEnterprise Data
> > Compression accelerator capable zlib [2]. When the hardware
> > acceleration is enabled, migration/multifd/tcp/zlib test fails
> > intermittently [3] due to sliding window corruption.
> > 
> > At the moment this problem occurs only with this accelerator, since
> > its architecture explicitly discourages concurrent accesses [4]:
> > 
> >     Page 26-57, "Other Conditions":
> > 
> >     As observed by this CPU, other CPUs, and channel
> >     programs, references to the parameter block, first,
> >     second, and third operands may be multiple-access
> >     references, accesses to these storage locations are
> >     not necessarily block-concurrent, and the sequence
> >     of these accesses or references is undefined.
> > 
> > Still, it might affect other platforms due to a future zlib update.
> > Therefore, copy the page being compressed into a private buffer
> > before
> > passing it to zlib.
> 
> While this might work around the problem; your explanation doesn't quite
> fit with the symptoms; or if they do, then you have a separate problem.
> 
> The live migration code relies on the fact that the source is running
> and changing its memory as the data is transmitted; however it also
> relies on the fact that if this happens the 'dirty' flag is set _after_
> those changes causing another round of migration and retransmission of
> the (now stable) data.
> 
> We don't expect the load of the data for the first page write to be
> correct, consistent etc - we just rely on the retransmission to be
> correct when the page is stable.
> 
> If your compressor hardware is doing something undefined during the
> first case that's fine; as long as it works fine in the stable case
> where the data isn't changing.
> 
> Adding the extra copy is going to slow everyone else down; and since
> there's plenty of pthread locking in those multifd I'm expecting them
> to get reasonably defined ordering and thus be safe from multi threading
> problems (please correct us if we've actually done something wrong in
> the locking there).
> 
> IMHO your accelerator when called from a zlib call needs to behave
> the same as if it was the software implementation; i.e. if we've got
> pthread calls in there that are enforcing ordering then that should be
> fine; your accelerator implementation needs to add a barrier of some
> type or an internal copy, not penalise everyone else.
> 
> Dave

The problem with the accelerator is that during the first case the
internal state might end up being corrupted (in particular: what goes
into the deflate stream differs from what goes into the sliding
window). This may affect the data integrity in the second case later
on.

I've been trying to think what to do with that, and of course doing an
internal copy is one option (a barrier won't suffice). However, I
realized that zlib API as documented doesn't guarantee that it's safe
to change input data concurrently with compression. On the other hand,
today's zlib is implemented in a way that tolerates this.

So the open question for me is, whether we should honor zlib
documentation (in which case, I would argue, QEMU needs to be changed)
or say that the behavior of today's zlib implementation is more
important (in which case accelerator code needs to change). I went with
the former for now, but the latter is of course doable as well.



Re: [PATCH 0/3] qapi-schema: support alternates with array type

2022-04-04 Thread Markus Armbruster
Paolo Bonzini  writes:

> As suggested in the review of the statistics subsystem.

Queued for 7.1, thanks!




Re: [PATCH v1 6/9] chardev: add appropriate getting address

2022-04-04 Thread Maxim Davydov



On 3/30/22 14:32, Vladimir Sementsov-Ogievskiy wrote:

29.03.2022 00:15, Maxim Davydov wrote:

An attempt to get the address after initialization shouldn't fail on an
assert in the automatically generated QAPI code. As a possible solution,
it can return the null type.


But at some point this address appears? Maybe we try to query it too
early, or we need some more initialization steps?
For example, query address after object_new_with_class(). Without the 
patch it triggers assert(). I tried to implement the same solution as in 
hw/ppc/spapr_drc.c:prop_get_fdt


Isn't it better to report failure when we try to query things that
are not yet initialized?
Yes, maybe it should set errp after visit_type_null. And it should be a 
common error for unrealized devices to fix the same problem with 
MemoryRegion, etc.




Signed-off-by: Maxim Davydov 
---
  chardev/char-socket.c | 9 +
  1 file changed, 9 insertions(+)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index fab2d791d4..f851e3346b 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -33,6 +33,7 @@
  #include "qapi/clone-visitor.h"
  #include "qapi/qapi-visit-sockets.h"
  #include "qemu/yank.h"
+#include "qapi/qmp/qnull.h"
    #include "chardev/char-io.h"
  #include "chardev/char-socket.h"
@@ -1509,6 +1510,14 @@ char_socket_get_addr(Object *obj, Visitor *v, const char *name,
  {
  SocketChardev *s = SOCKET_CHARDEV(obj);
  +    QNull *null = NULL;
+
+    /* Return NULL type if getting addr was called after init */
+    if (!s->addr) {
+    visit_type_null(v, NULL, &null, errp);
+    return;
+    }
+
  visit_type_SocketAddress(v, name, &s->addr, errp);
  }




--
Best regards,
Maxim Davydov




Re: [PATCH] multifd: Copy pages before compressing them with zlib

2022-04-04 Thread Daniel P . Berrangé
On Mon, Apr 04, 2022 at 12:20:14PM +0100, Dr. David Alan Gilbert wrote:
> * Ilya Leoshkevich (i...@linux.ibm.com) wrote:
> > zlib_send_prepare() compresses pages of a running VM. zlib does not
> > make any thread-safety guarantees with respect to changing deflate()
> > input concurrently with deflate() [1].
> > 
> > One can observe problems due to this with the IBM zEnterprise Data
> > Compression accelerator capable zlib [2]. When the hardware
> > acceleration is enabled, migration/multifd/tcp/zlib test fails
> > intermittently [3] due to sliding window corruption.
> > 
> > At the moment this problem occurs only with this accelerator, since
> > its architecture explicitly discourages concurrent accesses [4]:
> > 
> > Page 26-57, "Other Conditions":
> > 
> > As observed by this CPU, other CPUs, and channel
> > programs, references to the parameter block, first,
> > second, and third operands may be multiple-access
> > references, accesses to these storage locations are
> > not necessarily block-concurrent, and the sequence
> > of these accesses or references is undefined.
> > 
> > Still, it might affect other platforms due to a future zlib update.
> > Therefore, copy the page being compressed into a private buffer before
> > passing it to zlib.
> 
> While this might work around the problem; your explanation doesn't quite
> fit with the symptoms; or if they do, then you have a separate problem.
> 
> The live migration code relies on the fact that the source is running
> and changing its memory as the data is transmitted; however it also
> relies on the fact that if this happens the 'dirty' flag is set _after_
> those changes causing another round of migration and retransmission of
> the (now stable) data.
> 
> We don't expect the load of the data for the first page write to be
> correct, consistent etc - we just rely on the retransmission to be
> correct when the page is stable.
> 
> If your compressor hardware is doing something undefined during the
> first case that's fine; as long as it works fine in the stable case
> where the data isn't changing.
> 
> Adding the extra copy is going to slow everyone else down; and since
> there's plenty of pthread locking in those multifd I'm expecting them
> to get reasonably defined ordering and thus be safe from multi threading
> problems (please correct us if we've actually done something wrong in
> the locking there).
> 
> IMHO your accelerator when called from a zlib call needs to behave
> the same as if it was the software implementation; i.e. if we've got
> pthread calls in there that are enforcing ordering then that should be
> fine; your accelerator implementation needs to add a barrier of some
> type or an internal copy, not penalise everyone else.

It is reasonable to argue that QEMU is relying on undefined behaviour
when invoking zlib in this case, so it isn't clear that the accelerator
impl should be changed, rather than QEMU be changed to follow the zlib
API requirements. 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] multifd: Copy pages before compressing them with zlib

2022-04-04 Thread Juan Quintela
Daniel P. Berrangé  wrote:
> On Mon, Apr 04, 2022 at 12:20:14PM +0100, Dr. David Alan Gilbert wrote:
>> * Ilya Leoshkevich (i...@linux.ibm.com) wrote:
>> > zlib_send_prepare() compresses pages of a running VM. zlib does not
>> > make any thread-safety guarantees with respect to changing deflate()
>> > input concurrently with deflate() [1].
>> > 
>> > One can observe problems due to this with the IBM zEnterprise Data
>> > Compression accelerator capable zlib [2]. When the hardware
>> > acceleration is enabled, migration/multifd/tcp/zlib test fails
>> > intermittently [3] due to sliding window corruption.
>> > 
>> > At the moment this problem occurs only with this accelerator, since
>> > its architecture explicitly discourages concurrent accesses [4]:
>> > 
>> > Page 26-57, "Other Conditions":
>> > 
>> > As observed by this CPU, other CPUs, and channel
>> > programs, references to the parameter block, first,
>> > second, and third operands may be multiple-access
>> > references, accesses to these storage locations are
>> > not necessarily block-concurrent, and the sequence
>> > of these accesses or references is undefined.
>> > 
>> > Still, it might affect other platforms due to a future zlib update.
>> > Therefore, copy the page being compressed into a private buffer before
>> > passing it to zlib.
>> 
>> While this might work around the problem; your explanation doesn't quite
>> fit with the symptoms; or if they do, then you have a separate problem.
>> 
>> The live migration code relies on the fact that the source is running
>> and changing its memory as the data is transmitted; however it also
>> relies on the fact that if this happens the 'dirty' flag is set _after_
>> those changes causing another round of migration and retransmission of
>> the (now stable) data.
>> 
>> We don't expect the load of the data for the first page write to be
>> correct, consistent etc - we just rely on the retransmission to be
>> correct when the page is stable.
>> 
>> If your compressor hardware is doing something undefined during the
>> first case that's fine; as long as it works fine in the stable case
>> where the data isn't changing.
>> 
>> Adding the extra copy is going to slow everyone else down; and since
>> there's plenty of pthread locking in those multifd I'm expecting them
>> to get reasonably defined ordering and thus be safe from multi threading
>> problems (please correct us if we've actually done something wrong in
>> the locking there).
>> 
>> IMHO your accelerator when called from a zlib call needs to behave
>> the same as if it was the software implementation; i.e. if we've got
>> pthread calls in there that are enforcing ordering then that should be
>> fine; your accelerator implementation needs to add a barrier of some
>> type or an internal copy, not penalise everyone else.
>
> It is reasonable to argue that QEMU is relying on undefined behaviour
> when invoking zlib in this case, so it isn't clear that the accelerator
> impl should be changed, rather than QEMU be changed to follow the zlib
> API requirements. 

It works in all the other cases.  My vote, if we need that, is that we
add a zlib-sync or similar method.
zlib already means doing a copy; doing an extra copy would cost too much
in my opinion.

Once that we are here, is there such a requirement for zstd?  In my
testing, zstd was basically always better than zlib (no, I don't
remember the details).

Later, Juan.




[PULL 0/3] ppc queue

2022-04-04 Thread Cédric Le Goater
The following changes since commit bc6ec396d471d9e4aae7e2ff8b72e11da9a97665:

  Merge tag 'pull-request-2022-04-01' of https://gitlab.com/thuth/qemu into staging (2022-04-02 09:36:07 +0100)

are available in the Git repository at:

  https://github.com/legoater/qemu/ tags/pull-ppc-20220404

for you to fetch changes up to 0798da8df9fd917515c957ae918d6d979cf5f3fb:

  linux-user/ppc: Narrow type of ccr in save_user_regs (2022-04-04 08:49:06 +0200)


ppc-7.0 queue:

* Coverity fixes
* Fix for a memory leak issue


Daniel Henrique Barboza (1):
  hw/ppc: free env->tb_env in spapr_unrealize_vcpu()

Frederic Barrat (1):
  ppc/pnv: Fix number of registers in the PCIe controller on POWER9

Richard Henderson (1):
  linux-user/ppc: Narrow type of ccr in save_user_regs

 include/hw/pci-host/pnv_phb4.h | 2 +-
 include/hw/ppc/ppc.h   | 1 +
 hw/ppc/ppc.c   | 7 +++
 hw/ppc/spapr_cpu_core.c| 3 +++
 linux-user/ppc/signal.c| 2 +-
 5 files changed, 13 insertions(+), 2 deletions(-)



[PULL 2/3] ppc/pnv: Fix number of registers in the PCIe controller on POWER9

2022-04-04 Thread Cédric Le Goater
From: Frederic Barrat 

The spec defines 3 registers, even though only index 0 and 2 are valid
on POWER9. The same model is used on POWER10. Register 1 is defined
there but we currently don't use it in skiboot. So we can keep
reporting an error on write.

Reported by Coverity (CID 1487176).

Fixes: 4f9924c4d4cf ("ppc/pnv: Add models for POWER9 PHB4 PCIe Host bridge")
Suggested-by: Benjamin Herrenschmidt 
Signed-off-by: Frederic Barrat 
Reviewed-by: Daniel Henrique Barboza 
Message-Id: <20220401091925.770803-1-fbar...@linux.ibm.com>
Signed-off-by: Cédric Le Goater 
---
 include/hw/pci-host/pnv_phb4.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/hw/pci-host/pnv_phb4.h b/include/hw/pci-host/pnv_phb4.h
index b02ecdceaa4c..19dcbd6f8727 100644
--- a/include/hw/pci-host/pnv_phb4.h
+++ b/include/hw/pci-host/pnv_phb4.h
@@ -180,7 +180,7 @@ struct PnvPhb4PecState {
 MemoryRegion nest_regs_mr;
 
 /* PCI registers, excluding per-stack */
-#define PHB4_PEC_PCI_REGS_COUNT 0x2
+#define PHB4_PEC_PCI_REGS_COUNT 0x3
 uint64_t pci_regs[PHB4_PEC_PCI_REGS_COUNT];
 MemoryRegion pci_regs_mr;
 
-- 
2.34.1




[PULL 3/3] linux-user/ppc: Narrow type of ccr in save_user_regs

2022-04-04 Thread Cédric Le Goater
From: Richard Henderson 

Coverity warns that we shift a 32-bit value by N, and then
accumulate it into a 64-bit type (target_ulong on ppc64).

The ccr is always 8 * 4-bit fields, and thus is always a
32-bit quantity; narrow the type to avoid the warning.

Fixes: Coverity CID 1487223
Signed-off-by: Richard Henderson 
Reviewed-by: Cédric Le Goater 
Message-Id: <20220401191643.330393-1-richard.hender...@linaro.org>
Signed-off-by: Cédric Le Goater 
---
 linux-user/ppc/signal.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/ppc/signal.c b/linux-user/ppc/signal.c
index ec0b9c0df3da..ce5a4682cdfd 100644
--- a/linux-user/ppc/signal.c
+++ b/linux-user/ppc/signal.c
@@ -229,7 +229,7 @@ static void save_user_regs(CPUPPCState *env, struct target_mcontext *frame)
 {
 target_ulong msr = env->msr;
 int i;
-target_ulong ccr = 0;
+uint32_t ccr = 0;
 
 /* In general, the kernel attempts to be intelligent about what it
needs to save for Altivec/FP/SPE registers.  We don't care that
-- 
2.34.1




Re: [PATCH v2 5/7] block/block-copy: block_copy(): add timeout_ns parameter

2022-04-04 Thread Hanna Reitz

On 01.04.22 18:08, Vladimir Sementsov-Ogievskiy wrote:

01.04.2022 16:16, Hanna Reitz wrote:

On 01.04.22 11:19, Vladimir Sementsov-Ogievskiy wrote:

Add possibility to limit block_copy() call in time. To be used in the
next commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/block-copy.c | 26 +++---
  block/copy-before-write.c  |  2 +-
  include/block/block-copy.h |  2 +-
  3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index ec46775ea5..b47cb188dd 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c


[...]

@@ -894,12 +902,16 @@ int coroutine_fn block_copy(BlockCopyState *s, int64_t start, int64_t bytes,
  .max_workers = BLOCK_COPY_MAX_WORKERS,
  };
-    return block_copy_common(&call_state);
-}
+    ret = qemu_co_timeout(block_copy_async_co_entry, call_state, 
timeout_ns,

+  g_free);


A direct path for timeout_ns == 0 might still be nice to have.


+    if (ret < 0) {
+    /* Timeout. call_state will be freed by running coroutine. */


Maybe assert(ret == -ETIMEDOUT);?


OK




+    return ret;


If I’m right in understanding how qemu_co_timeout() works, 
block_copy_common() will continue to run here.  Shouldn’t we at least 
cancel it by setting call_state->cancelled to true?


Agree



(Besides this, I think that letting block_copy_common() running in 
the background should be OK.  I’m not sure what the implications are 
if we do cancel the call here, while on-cbw-error is 
break-guest-write, though.  Should be fine, I guess, because 
block_copy_common() will still correctly keep track of what it has 
successfully copied and what it hasn’t?)


Hmm. I now think that we should at least wait for such cancelled
background requests before block_copy_state_free in cbw_close(). But
in "[PATCH v5 00/45] Transactional block-graph modifying API" I want
to detach children from the CBW filter before calling .close(). So, a
possible solution is to wait for all cancelled requests in
.bdrv_co_drain_begin().


Or alternatively, maybe just increase bs->in_flight for the CBW filter
for each cancelled background request, and decrease it when the request
finishes? For this we should add a kind of callback to be called when
the timed-out coroutine entry finishes.


in_flight sounds good to me.  That would automatically work for 
draining, right?





[PULL 1/3] hw/ppc: free env->tb_env in spapr_unrealize_vcpu()

2022-04-04 Thread Cédric Le Goater
From: Daniel Henrique Barboza 

The timebase is allocated during spapr_realize_vcpu() and it's not
freed. This results in memory leaks when doing vcpu unplugs:

==636935==
==636935== 144 (96 direct, 48 indirect) bytes in 1 blocks are definitely lost in loss record 6,461 of 8,135
==636935==at 0x4897468: calloc (vg_replace_malloc.c:760)
==636935==by 0x5077213: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6400.4)
==636935==by 0x507757F: g_malloc0_n (in /usr/lib64/libglib-2.0.so.0.6400.4)
==636935==by 0x93C3FB: cpu_ppc_tb_init (ppc.c:1066)
==636935==by 0x97BC2B: spapr_realize_vcpu (spapr_cpu_core.c:268)
==636935==by 0x97C01F: spapr_cpu_core_realize (spapr_cpu_core.c:337)
==636935==by 0xD4626F: device_set_realized (qdev.c:531)
==636935==by 0xD55273: property_set_bool (object.c:2273)
==636935==by 0xD523DF: object_property_set (object.c:1408)
==636935==by 0xD588B7: object_property_set_qobject (qom-qobject.c:28)
==636935==by 0xD52897: object_property_set_bool (object.c:1477)
==636935==by 0xD4579B: qdev_realize (qdev.c:333)
==636935==

This patch adds a cpu_ppc_tb_free() helper in hw/ppc/ppc.c to allow us
to free the timebase. This leak is then solved by calling
cpu_ppc_tb_free() in spapr_unrealize_vcpu().

Fixes: 6f4b5c3ec590 ("spapr: CPU hot unplug support")
Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Cédric Le Goater 
Reviewed-by: David Gibson 
Message-Id: <20220329124545.529145-2-danielhb...@gmail.com>
Signed-off-by: Cédric Le Goater 
---
 include/hw/ppc/ppc.h| 1 +
 hw/ppc/ppc.c| 7 +++
 hw/ppc/spapr_cpu_core.c | 3 +++
 3 files changed, 11 insertions(+)

diff --git a/include/hw/ppc/ppc.h b/include/hw/ppc/ppc.h
index b0ba4bd6b978..364f165b4b56 100644
--- a/include/hw/ppc/ppc.h
+++ b/include/hw/ppc/ppc.h
@@ -54,6 +54,7 @@ struct ppc_tb_t {
 
 uint64_t cpu_ppc_get_tb(ppc_tb_t *tb_env, uint64_t vmclk, int64_t tb_offset);
 clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, uint32_t freq);
+void cpu_ppc_tb_free(CPUPPCState *env);
 void cpu_ppc_hdecr_init(CPUPPCState *env);
 void cpu_ppc_hdecr_exit(CPUPPCState *env);
 
diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index faa02d6710c9..fea70df45e69 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -1083,6 +1083,13 @@ clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, uint32_t freq)
 return &cpu_ppc_set_tb_clk;
 }
 
+void cpu_ppc_tb_free(CPUPPCState *env)
+{
+timer_free(env->tb_env->decr_timer);
+timer_free(env->tb_env->hdecr_timer);
+g_free(env->tb_env);
+}
+
 /* cpu_ppc_hdecr_init may be used if the timer is not used by HDEC emulation */
 void cpu_ppc_hdecr_init(CPUPPCState *env)
 {
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index ed847139602f..8a4861f45a2a 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -189,10 +189,13 @@ static const VMStateDescription vmstate_spapr_cpu_state = {
 
 static void spapr_unrealize_vcpu(PowerPCCPU *cpu, SpaprCpuCore *sc)
 {
+CPUPPCState *env = &cpu->env;
+
 if (!sc->pre_3_0_migration) {
 vmstate_unregister(NULL, &vmstate_spapr_cpu_state, cpu->machine_data);
 }
 spapr_irq_cpu_intc_destroy(SPAPR_MACHINE(qdev_get_machine()), cpu);
+cpu_ppc_tb_free(env);
 qdev_unrealize(DEVICE(cpu));
 }
 
-- 
2.34.1




Re: [PATCH 2/4] virtio-ccw: move vhost_ccw_scsi to a separate file

2022-04-04 Thread Cornelia Huck
On Mon, Mar 28 2022, Paolo Bonzini  wrote:

> Remove unnecessary use of #ifdef CONFIG_VHOST_SCSI, instead just use a
> separate file and a separate rule in meson.build.
>
> Signed-off-by: Paolo Bonzini 
> ---
>  hw/s390x/meson.build   |  1 +
>  hw/s390x/vhost-scsi-ccw.c  | 64 ++
>  hw/s390x/virtio-ccw-scsi.c | 47 
>  3 files changed, 65 insertions(+), 47 deletions(-)
>  create mode 100644 hw/s390x/vhost-scsi-ccw.c
>

> diff --git a/hw/s390x/vhost-scsi-ccw.c b/hw/s390x/vhost-scsi-ccw.c

As Eric already noted, please add an entry in MAINTAINERS under
virtio-ccw for this file.

> new file mode 100644
> index 00..b68d1c
> --- /dev/null
> +++ b/hw/s390x/vhost-scsi-ccw.c
> @@ -0,0 +1,64 @@
> +/*
> + * vhost ccw scsi implementation
> + *
> + * Copyright 2012, 2015 IBM Corp.
> + * Author(s): Cornelia Huck 

That old copyright notice gets copied around a lot; what I find funny
here is that you actually introduced the device in the first place :)
(commit ccf6916c843edd30ea4ecfaaac68faa865529c97)

(I believe we really can't do any better, and I probably did touch this
while still wearing my IBM hat.)

> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or (at
> + * your option) any later version. See the COPYING file in the top-level
> + * directory.
> + */




Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-04-04 Thread Quentin Perret
On Friday 01 Apr 2022 at 12:56:50 (-0700), Andy Lutomirski wrote:
> On Fri, Apr 1, 2022, at 7:59 AM, Quentin Perret wrote:
> > On Thursday 31 Mar 2022 at 09:04:56 (-0700), Andy Lutomirski wrote:
> 
> 
> > To answer your original question about memory 'conversion', the key
> > thing is that the pKVM hypervisor controls the stage-2 page-tables for
> > everyone in the system, all guests as well as the host. As such, a page
> > 'conversion' is nothing more than a permission change in the relevant
> > page-tables.
> >
> 
> So I can see two different ways to approach this.
> 
> One is that you split the whole address space in half and, just like SEV and 
> TDX, allocate one bit to indicate the shared/private status of a page.  This 
> makes it work a lot like SEV and TDX.
>
> The other is to have shared and private pages be distinguished only by their 
> hypercall history and the (protected) page tables.  This saves some address 
> space and some page table allocations, but it opens some cans of worms too.  
> In particular, the guest and the hypervisor need to coordinate, in a way that 
> the guest can trust, to ensure that the guest's idea of which pages are 
> private match the host's.  This model seems a bit harder to support nicely 
> with the private memory fd model, but not necessarily impossible.

Right. Perhaps one thing I should clarify as well: pKVM (as opposed to
TDX) has only _one_ page-table per guest, and it is controlled by the
hypervisor only. So the hypervisor needs to be involved for both shared
and private mappings. As such, shared pages have relatively similar
constraints when it comes to host mm stuff --  we can't migrate shared
pages or swap them out without getting the hypervisor involved.

> Also, what are you trying to accomplish by having the host userspace mmap 
> private pages?

What I would really like to have is non-destructive in-place conversions
of pages. mmap-ing the pages that have been shared back felt like a good
fit for the private=>shared conversion, but in fact I'm not all that
opinionated about the API as long as the behaviour and the performance
are there. Happy to look into alternatives.

FWIW, there are a couple of reasons why I'd like to have in-place
conversions:

 - one goal of pKVM is to migrate some things away from the Arm
   Trustzone environment (e.g. DRM and the likes) and into protected VMs
   instead. This will give Linux a fighting chance to defend itself
   against these things -- they currently have access to _all_ memory.
   And transitioning pages between Linux and Trustzone (donations and
   shares) is fast and non-destructive, so we really do not want pKVM to
   regress by requiring the hypervisor to memcpy things;

 - it can be very useful for protected VMs to do shared=>private
   conversions. Think of a VM receiving some data from the host in a
   shared buffer, and then it wants to operate on that buffer without
   risking leaking confidential information in a transient state. In
   that case the most logical thing to do is to convert the buffer back
   to private, do whatever needs to be done on that buffer (decrypting a
   frame, ...), and then share it back with the host to consume it;

 - similar to the previous point, a protected VM might want to
   temporarily turn a buffer private to avoid ToCToU issues;

 - once we're able to do device assignment to protected VMs, this might
   allow DMA-ing to a private buffer, and make it shared later w/o
   bouncing.

And there is probably more.

IIUC, the private fd proposal as it stands requires shared and private
pages to come from entirely distinct places. So it's not entirely clear
to me how any of the above could be supported without having the
hypervisor memcpy the data during conversions, which I really don't want
to do for performance reasons.

> Is the idea that multiple guest could share the same page until such time as 
> one of them tries to write to it?

That would certainly be possible to implement in the pKVM
environment with the right tracking, so I think it is worth considering
as a future goal.

Thanks,
Quentin



[PATCH v9 00/45] CXL 2.0 emulation Support

2022-04-04 Thread Jonathan Cameron via
CI passing both with the full series and at appropriate points
for a partial series merge if desired (at end of each section
tests are introduced)
https://gitlab.com/jic23/qemu/-/pipelines/508396913
Possible partial sets:
1-15 (end with the test of the pxb-cxl host bridge)
16-22 (end with the test for root port and type3 device)
23-39 (end with tests on x86 pc for CFMWS including BIOS table updates)
40-41 (arm64 virt support + simple test case)
42 (documentation - we could pull this forwards to before the arm support)
43-45 (switch support)

Note the gitlab branch also has additional patches on top of these
that will form the part of future postings (PCIe DOE, CDAT,
serial number support and improved fidelity of emulation)
Several people have asked about contributing additional features.
As those come in I'll apply them on top of this series and handle
rebases etc as necessary whilst we seek to get this first set
of patches upstream.

Changes since v8:
 Thanks to Adam Manzanares, Alison Schofield and Mark Cave-Ayland
 for review.
For reference v8 thread at:
https://lore.kernel.org/qemu-devel/20220318150635.24600-1-jonathan.came...@huawei.com/
 
 - Fix crash when no hostmem region provided (from CI)
 - Fix a mid series build bug (from chasing that CI issue)
 - (various patches) Switch the various struct cxl_dvsec_* to typedefs
   CXLDVSECDeviceGPF etc. This reduces line lengths in a patch to add
   write masks for PCI config space that will be part of a follow up
   to this series.
 - (various) Switch away from old style initializers and associated
   renames (Mark) 
 - (patch 2, various) Use sizeof() or local size variable rather than
   hard coding division by 4 or 8 when indexing into register arrays (Adam)
 - (patch 2) Add comment for strange write mask CXL_RAS_UNC_ERR_SEVERITY (Adam)
 - (patch 2) Fix wrong mask for COR_ERR
 - (patch 2) Add a comment explaining the less-than-obvious fact that we
   can use a contrived order of capabilities to allow a single number to
   represent which ones should be enabled. (Adam)
 - (patch 2) Wrong version number for RAS cap header (Adam)
 - (patch 2) Wrong space left for HDM decoders (Adam)
 - (patch 2) Fix field of cxl_dvsec_port_extensions to be
   alt_prefetch_limit_high (Adam)
 - (patch 2) Add CXLDVSECDeviceGPF (noticed as part of follow up series prep)
 - (patch 3) Improve docs around the large ASCI art figure (Adam)
 - (patch 3) Rename CXL_DEVICE_REGISTERS_* to CXL_DEVICE_STATUS_REGISTERS_*
   to match the specification (Adam)
 - (patch 3) Extra references to the specification (Adam)
 - (patch 3) Rename a few fields in CXL_DEV_BG_CMD_STS to more closely match
   the specification (Adam)
 - (patch 17) Drop stale ifdef (Mark)
 - (patch 19) Fix wrong value of part_info->nex_pmem so it now matches
   what the spec requires (Alison)
 - (patch 27) Fix docs to not mention OptsVisitor, to be more detailed
   on what sizes are accepted, provide more detail on what id means and
   update version number (Mark)
 - (patch 27) Use loc_save()/loc_pop() to improve printed error (Mark)
 - (patch 27) Rename config function (Mark)
 - (patch 32) Fix address_space cleanup and move other parts of
   instance_finalize() to pc->exit() to balance what is in pc->realize()
   (Mark)
 - (various) Minor typos and formatting cleanup observed whilst preparing
   series.
Some discussion occurred on allow for volatile memory support rather than
just PMEM. That is postponed to a future patch set. Also some discussion
on future work coordination.

Mark's suggestion of using PCI BDF for naming unfortunately doesn't
work as they are not constant (or indeed enumerated at all in some cases)

I'm resisting the urge to have this series continue to grow with
additional features on the basis it is already huge and what we have
here is useful + functional.

Updated background info:

Looking in particular for:
* Review of the PCI interactions
* x86 and ARM machine interactions (particularly the memory maps)
* Review of the interleaving approach - is the basic idea
  acceptable?
* Review of the command line interface.
* CXL related review welcome but much of that got reviewed
  in earlier versions and hasn't changed substantially.

TODOs:

* Volatile memory devices (easy but it's more code so left for now).
* Hotplug?  May not need much but it's not tested yet!
* More tests and tighter verification that values written to hardware
  are actually valid - stuff that real hardware would check.
* Testing, testing and more testing.  I have been running a basic
  set of ARM and x86 tests on this, but there is always room for
  more tests and greater automation.
* CFMWS flags as requested by Ben.
* Partitioning support - ability to change the balance of volatile
  and non volatile memory on demand.
* Trace points as suggested by Mark to help with debugging memory
  interleaving setup.

Why do we want QEMU emulation of CXL?

As Ben stated in V3, QEMU support has been critical to getting OS
software written given lack of ava

[PATCH v9 02/45] hw/cxl/component: Introduce CXL components (8.1.x, 8.2.5)

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

A CXL 2.0 component is any entity in the CXL topology. All components
have an analogous function in PCIe. Except for the CXL host bridge, all
have a PCIe config space that is accessible via the common PCIe
mechanisms. CXL components are enumerated via DVSEC fields in the
extended PCIe header space. CXL components will minimally implement some
subset of CXL.mem and CXL.cache registers defined in 8.2.5 of the CXL
2.0 specification. Two headers and a utility library are introduced to
support the minimum functionality needed to enumerate components.

The cxl_pci header manages bits associated with PCI, specifically the
DVSEC and related fields. The cxl_component.h variant has data
structures and APIs that are useful for drivers implementing any of the
CXL 2.0 components. The library takes care of making use of the DVSEC
bits and the CXL.[mem|cache] registers. Per spec, the registers are
little endian.

None of the mechanisms required to enumerate a CXL capable hostbridge
are introduced at this point.

Note that the CXL.mem and CXL.cache registers used are always 4B wide.
It's possible in the future that this constraint will not hold.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
Reviewed by: Adam Manzanares 
---
 hw/Kconfig |   1 +
 hw/cxl/Kconfig |   3 +
 hw/cxl/cxl-component-utils.c   | 225 +
 hw/cxl/meson.build |   4 +
 hw/meson.build |   1 +
 include/hw/cxl/cxl.h   |  16 +++
 include/hw/cxl/cxl_component.h | 197 +
 include/hw/cxl/cxl_pci.h   | 146 +
 8 files changed, 593 insertions(+)

diff --git a/hw/Kconfig b/hw/Kconfig
index ad20cce0a9..50e0952889 100644
--- a/hw/Kconfig
+++ b/hw/Kconfig
@@ -6,6 +6,7 @@ source audio/Kconfig
 source block/Kconfig
 source char/Kconfig
 source core/Kconfig
+source cxl/Kconfig
 source display/Kconfig
 source dma/Kconfig
 source gpio/Kconfig
diff --git a/hw/cxl/Kconfig b/hw/cxl/Kconfig
new file mode 100644
index 00..8e67519b16
--- /dev/null
+++ b/hw/cxl/Kconfig
@@ -0,0 +1,3 @@
+config CXL
+bool
+default y if PCI_EXPRESS
diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
new file mode 100644
index 00..22e52cef17
--- /dev/null
+++ b/hw/cxl/cxl-component-utils.c
@@ -0,0 +1,225 @@
+/*
+ * CXL Utility library for components
+ *
+ * Copyright(C) 2020 Intel Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/pci/pci.h"
+#include "hw/cxl/cxl.h"
+
+static uint64_t cxl_cache_mem_read_reg(void *opaque, hwaddr offset,
+   unsigned size)
+{
+CXLComponentState *cxl_cstate = opaque;
+ComponentRegisters *cregs = &cxl_cstate->crb;
+
+if (size == 8) {
+qemu_log_mask(LOG_UNIMP,
+  "CXL 8 byte cache mem registers not implemented\n");
+return 0;
+}
+
+if (cregs->special_ops && cregs->special_ops->read) {
+return cregs->special_ops->read(cxl_cstate, offset, size);
+} else {
+        return cregs->cache_mem_registers[offset / sizeof(*cregs->cache_mem_registers)];
+}
+}
+
+static void cxl_cache_mem_write_reg(void *opaque, hwaddr offset, uint64_t value,
+unsigned size)
+{
+CXLComponentState *cxl_cstate = opaque;
+ComponentRegisters *cregs = &cxl_cstate->crb;
+
+if (size == 8) {
+qemu_log_mask(LOG_UNIMP,
+  "CXL 8 byte cache mem registers not implemented\n");
+return;
+}
+if (cregs->special_ops && cregs->special_ops->write) {
+cregs->special_ops->write(cxl_cstate, offset, value, size);
+} else {
+        cregs->cache_mem_registers[offset / sizeof(*cregs->cache_mem_registers)] = value;
+}
+}
+
+/*
+ * 8.2.3
+ *   The access restrictions specified in Section 8.2.2 also apply to CXL 2.0
+ *   Component Registers.
+ *
+ * 8.2.2
+ *   • A 32 bit register shall be accessed as a 4 Bytes quantity. Partial
+ *   reads are not permitted.
+ *   • A 64 bit register shall be accessed as a 8 Bytes quantity. Partial
+ *   reads are not permitted.
+ *
+ * As of the spec defined today, only 4 byte registers exist.
+ */
+static const MemoryRegionOps cache_mem_ops = {
+.read = cxl_cache_mem_read_reg,
+.write = cxl_cache_mem_write_reg,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 4,
+.max_access_size = 8,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 4,
+.max_access_size = 8,
+},
+};
+
+void cxl_component_register_block_init(Object *obj,
+   CXLComponentState *cxl_cstate,
+   const char *type)
+{
+ComponentRegisters *

Re: [PATCH v8 04/46] hw/cxl/device: Introduce a CXL device (8.2.8)

2022-04-04 Thread Adam Manzanares
On Fri, Apr 01, 2022 at 02:30:34PM +0100, Jonathan Cameron wrote:
> On Thu, 31 Mar 2022 22:13:20 +
> Adam Manzanares  wrote:
> 
> > On Wed, Mar 30, 2022 at 06:48:48PM +0100, Jonathan Cameron wrote:
> > > On Tue, 29 Mar 2022 18:13:59 +
> > > Adam Manzanares  wrote:
> > >   
> > > > On Fri, Mar 18, 2022 at 03:05:53PM +, Jonathan Cameron wrote:  
> > > > > From: Ben Widawsky 
> > > > > 
> > > > > A CXL device is a type of CXL component. Conceptually, a CXL device
> > > > > would be a leaf node in a CXL topology. From an emulation perspective,
> > > > > CXL devices are the most complex and so the actual implementation is
> > > > > reserved for discrete commits.
> > > > > 
> > > > > This new device type is specifically catered towards the eventual
> > > > > implementation of a Type3 CXL.mem device, 8.2.8.5 in the CXL 2.0
> > > > > specification.
> > > > > 
> > > > > Signed-off-by: Ben Widawsky 
> > > > > Signed-off-by: Jonathan Cameron 
> > > > > Reviewed-by: Alex Bennée   
> > > 
> > > ...
> > >   
> > > > > diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> > > > > new file mode 100644
> > > > > index 00..b2416e45bf
> > > > > --- /dev/null
> > > > > +++ b/include/hw/cxl/cxl_device.h
> > > > > @@ -0,0 +1,165 @@
> > > > > +/*
> > > > > + * QEMU CXL Devices
> > > > > + *
> > > > > + * Copyright (c) 2020 Intel
> > > > > + *
> > > > > + * This work is licensed under the terms of the GNU GPL, version 2. 
> > > > > See the
> > > > > + * COPYING file in the top-level directory.
> > > > > + */
> > > > > +
> > > > > +#ifndef CXL_DEVICE_H
> > > > > +#define CXL_DEVICE_H
> > > > > +
> > > > > +#include "hw/register.h"
> > > > > +
> > > > > +/*
> > > > > + * The following is how a CXL device's MMIO space is laid out. The 
> > > > > only
> > > > > + * requirement from the spec is that the capabilities array and the 
> > > > > capability
> > > > > + * headers start at offset 0 and are contiguously packed. The 
> > > > > headers themselves
> > > > > + * provide offsets to the register fields. For this emulation, 
> > > > > registers will
> > > > > + * start at offset 0x80 (m == 0x80). No secondary mailbox is 
> > > > > implemented which
> > > > > + * means that n = m + sizeof(mailbox registers) + sizeof(device 
> > > > > registers).
> > > > 
> > > > What is n here, the start offset of the mailbox registers, this 
> > > > question is 
> > > > based on the figure below?  
> > > 
> > > I'll expand on this to say
> > > 
> > > means that the offset of the start of the mailbox payload (n) is given by
> > > n = m + sizeof
> > > 
> > > Which means the diagram below is wrong as should align with top
> > > of mailbox registers.
> > >   
> > > >   
> > > > > + *
> > > > > + * This is roughly described in 8.2.8 Figure 138 of the CXL 2.0 spec 
> > > > >  
> > > I'm going drop this comment as that figure appears unrelated to me.
> > >   
> > > > > + *
> > > > > + *   +-+
> > > > > + *   | |
> > > > > + *   |Memory Device Registers  |
> > > > > + *   | |
> > > > > + * n + PAYLOAD_SIZE_MAX  ---
> > > > > + *  ^| |
> > > > > + *  || |
> > > > > + *  || |
> > > > > + *  || |
> > > > > + *  || |
> > > > > + *  || Mailbox Payload |
> > > > > + *  || |
> > > > > + *  || |
> > > > > + *  || |
> > > > > + *  |---
> > > > > + *  ||   Mailbox Registers |
> > > > > + *  || |
> > > > > + *  n---
> > > > > + *  ^| |
> > > > > + *  ||Device Registers |
> > > > > + *  || |
> > > > > + *  m-->
> > > > > + *  ^|  Memory Device Capability Header|
> > > > > + *  |---
> > > > > + *  || Mailbox Capability Header   |
> > > > > + *  |-- 
> > > > > + *  || Device Capability Header|
> > > > > + *  |---
> > > > > + *  || 

[PATCH v9 01/45] hw/pci/cxl: Add a CXL component type (interface)

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

A CXL component is a hardware entity that implements CXL component
registers from the CXL 2.0 spec (8.2.3). Currently these represent 3
general types.
1. Host Bridge
2. Ports (root, upstream, downstream)
3. Devices (memory, other)

A CXL component can be conceptually thought of as a PCIe device with
extra functionality when enumerated and enabled. For this reason, CXL
does here, and will continue to add on to existing PCI code paths.

Host bridges will typically need to be handled specially and so they can
implement this newly introduced interface or not. All other components
should implement this interface. Implementing this interface allows the
core PCI code to treat these devices as special where appropriate.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
Reviewed by: Adam Manzanares 
---
 hw/pci/pci.c | 10 ++
 include/hw/pci/pci.h |  8 
 2 files changed, 18 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index dae9119bfe..a7f5c43587 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -201,6 +201,11 @@ static const TypeInfo pci_bus_info = {
 .class_init = pci_bus_class_init,
 };
 
+static const TypeInfo cxl_interface_info = {
+.name  = INTERFACE_CXL_DEVICE,
+.parent= TYPE_INTERFACE,
+};
+
 static const TypeInfo pcie_interface_info = {
 .name  = INTERFACE_PCIE_DEVICE,
 .parent= TYPE_INTERFACE,
@@ -2182,6 +2187,10 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
 pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
 }
 
+if (object_class_dynamic_cast(klass, INTERFACE_CXL_DEVICE)) {
+pci_dev->cap_present |= QEMU_PCIE_CAP_CXL;
+}
+
 pci_dev = do_pci_register_device(pci_dev,
  object_get_typename(OBJECT(qdev)),
  pci_dev->devfn, errp);
@@ -2938,6 +2947,7 @@ static void pci_register_types(void)
 type_register_static(&pci_bus_info);
 type_register_static(&pcie_bus_info);
 type_register_static(&conventional_pci_interface_info);
+type_register_static(&cxl_interface_info);
 type_register_static(&pcie_interface_info);
 type_register_static(&pci_device_type_info);
 }
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 3a32b8dd40..98f0d1b844 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -194,6 +194,8 @@ enum {
 QEMU_PCIE_LNKSTA_DLLLA = (1 << QEMU_PCIE_LNKSTA_DLLLA_BITNR),
 #define QEMU_PCIE_EXTCAP_INIT_BITNR 9
 QEMU_PCIE_EXTCAP_INIT = (1 << QEMU_PCIE_EXTCAP_INIT_BITNR),
+#define QEMU_PCIE_CXL_BITNR 10
+QEMU_PCIE_CAP_CXL = (1 << QEMU_PCIE_CXL_BITNR),
 };
 
 #define TYPE_PCI_DEVICE "pci-device"
@@ -201,6 +203,12 @@ typedef struct PCIDeviceClass PCIDeviceClass;
 DECLARE_OBJ_CHECKERS(PCIDevice, PCIDeviceClass,
  PCI_DEVICE, TYPE_PCI_DEVICE)
 
+/*
+ * Implemented by devices that can be plugged on CXL buses. In the spec, this is
+ * actually a "CXL Component", but we name it device to match the PCI naming.
+ */
+#define INTERFACE_CXL_DEVICE "cxl-device"
+
 /* Implemented by devices that can be plugged on PCI Express buses */
 #define INTERFACE_PCIE_DEVICE "pci-express-device"
 
-- 
2.32.0




[PATCH v9 06/45] hw/cxl/device: Implement basic mailbox (8.2.8.4)

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

This is the beginning of implementing mailbox support for CXL 2.0
devices. The implementation recognizes when the doorbell is rung,
handles the command/payload, clears the doorbell while returning error
codes and data.

Generally the mailbox mechanism is designed to permit communication
between the host OS and the firmware running on the device. For our
purposes, we emulate both the firmware, implemented primarily in
cxl-mailbox-utils.c, and the hardware.

No commands are implemented yet.
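
The doorbell-driven flow described above can be modelled in isolation.
This is a hedged sketch, not the patch's implementation: the handler
table, status values, and all names here are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the mailbox: the host writes a command, rings
 * the doorbell, the emulated "firmware" dispatches it through a handler
 * table, records a status, and clears the doorbell to signal completion. */
typedef int (*mbox_handler)(void);

static int cmd_noop(void) { return 0; }   /* illustrative handler */

static mbox_handler handlers[256];        /* indexed by opcode */
static int doorbell;
static int last_status = -1;

static void ring_doorbell(uint8_t opcode)
{
    doorbell = 1;
    if (handlers[opcode]) {
        last_status = handlers[opcode]();
    } else {
        last_status = 1;   /* stand-in for an "unsupported" return code */
    }
    doorbell = 0;          /* cleared doorbell tells the host we're done */
}
```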

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/cxl/cxl-device-utils.c   | 122 ++-
 hw/cxl/cxl-mailbox-utils.c  | 164 
 hw/cxl/meson.build  |   1 +
 include/hw/cxl/cxl.h|   3 +
 include/hw/cxl/cxl_device.h |  19 -
 5 files changed, 307 insertions(+), 2 deletions(-)

diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
index 241f9f82e3..f6c3e0f095 100644
--- a/hw/cxl/cxl-device-utils.c
+++ b/hw/cxl/cxl-device-utils.c
@@ -44,6 +44,108 @@ static uint64_t dev_reg_read(void *opaque, hwaddr offset, unsigned size)
 return 0;
 }
 
+static uint64_t mailbox_reg_read(void *opaque, hwaddr offset, unsigned size)
+{
+CXLDeviceState *cxl_dstate = opaque;
+
+switch (size) {
+case 1:
+return cxl_dstate->mbox_reg_state[offset];
+case 2:
+return cxl_dstate->mbox_reg_state16[offset / size];
+case 4:
+return cxl_dstate->mbox_reg_state32[offset / size];
+case 8:
+return cxl_dstate->mbox_reg_state64[offset / size];
+default:
+g_assert_not_reached();
+}
+}
+
+static void mailbox_mem_writel(uint32_t *reg_state, hwaddr offset,
+   uint64_t value)
+{
+switch (offset) {
+case A_CXL_DEV_MAILBOX_CTRL:
+/* fallthrough */
+case A_CXL_DEV_MAILBOX_CAP:
+/* RO register */
+break;
+default:
+qemu_log_mask(LOG_UNIMP,
+  "%s Unexpected 32-bit access to 0x%" PRIx64 " (WI)\n",
+  __func__, offset);
+return;
+}
+
+reg_state[offset / sizeof(*reg_state)] = value;
+}
+
+static void mailbox_mem_writeq(uint64_t *reg_state, hwaddr offset,
+   uint64_t value)
+{
+switch (offset) {
+case A_CXL_DEV_MAILBOX_CMD:
+break;
+case A_CXL_DEV_BG_CMD_STS:
+/* BG not supported */
+/* fallthrough */
+case A_CXL_DEV_MAILBOX_STS:
+/* Read only register, will get updated by the state machine */
+return;
+default:
+qemu_log_mask(LOG_UNIMP,
+  "%s Unexpected 64-bit access to 0x%" PRIx64 " (WI)\n",
+  __func__, offset);
+return;
+}
+
+
+reg_state[offset / sizeof(*reg_state)] = value;
+}
+
+static void mailbox_reg_write(void *opaque, hwaddr offset, uint64_t value,
+  unsigned size)
+{
+CXLDeviceState *cxl_dstate = opaque;
+
+if (offset >= A_CXL_DEV_CMD_PAYLOAD) {
+memcpy(cxl_dstate->mbox_reg_state + offset, &value, size);
+return;
+}
+
+switch (size) {
+case 4:
+mailbox_mem_writel(cxl_dstate->mbox_reg_state32, offset, value);
+break;
+case 8:
+mailbox_mem_writeq(cxl_dstate->mbox_reg_state64, offset, value);
+break;
+default:
+g_assert_not_reached();
+}
+
+if (ARRAY_FIELD_EX32(cxl_dstate->mbox_reg_state32, CXL_DEV_MAILBOX_CTRL,
+ DOORBELL)) {
+cxl_process_mailbox(cxl_dstate);
+}
+}
+
+static const MemoryRegionOps mailbox_ops = {
+.read = mailbox_reg_read,
+.write = mailbox_reg_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 1,
+.max_access_size = 8,
+},
+};
+
 static const MemoryRegionOps dev_ops = {
 .read = dev_reg_read,
 .write = NULL, /* status register is read only */
@@ -84,20 +186,33 @@ void cxl_device_register_block_init(Object *obj, CXLDeviceState *cxl_dstate)
   "cap-array", CXL_CAPS_SIZE);
 memory_region_init_io(&cxl_dstate->device, obj, &dev_ops, cxl_dstate,
   "device-status", CXL_DEVICE_STATUS_REGISTERS_LENGTH);
+memory_region_init_io(&cxl_dstate->mailbox, obj, &mailbox_ops, cxl_dstate,
+  "mailbox", CXL_MAILBOX_REGISTERS_LENGTH);
 
 memory_region_add_subregion(&cxl_dstate->device_registers, 0,
 &cxl_dstate->caps);
 memory_region_add_subregion(&cxl_dstate->device_registers,
 CXL_DEVICE_STATUS_REGISTERS_OFFSET,
 &cxl_dstate->device);
+memory_region_add_subregion(&cxl_dstate->device_registers,
+CXL_MAILBOX_REGI

[PATCH v9 03/45] MAINTAINERS: Add entry for Compute Express Link Emulation

2022-04-04 Thread Jonathan Cameron via
From: Jonathan Cameron 

The CXL emulation will be jointly maintained by Ben Widawsky
and Jonathan Cameron.  Broken out as a separate patch
to improve visibility.

Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index cc364afef7..1b09419977 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2544,6 +2544,13 @@ F: qapi/block*.json
 F: qapi/transaction.json
 T: git https://repo.or.cz/qemu/armbru.git block-next
 
+Compute Express Link
+M: Ben Widawsky 
+M: Jonathan Cameron 
+S: Supported
+F: hw/cxl/
+F: include/hw/cxl/
+
 Dirty Bitmaps
 M: Eric Blake 
 M: Vladimir Sementsov-Ogievskiy 
-- 
2.32.0




[PATCH v9 05/45] hw/cxl/device: Implement the CAP array (8.2.8.1-2)

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

This implements all device MMIO up to the first capability. That
includes the CXL Device Capabilities Array Register, as well as all of
the CXL Device Capability Header Registers. The latter are filled in as
they are implemented in the following patches.

Endianness and alignment are managed by softmmu memory core.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/cxl/cxl-device-utils.c   | 109 
 hw/cxl/meson.build  |   1 +
 include/hw/cxl/cxl_device.h |  31 +-
 3 files changed, 140 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
new file mode 100644
index 00..241f9f82e3
--- /dev/null
+++ b/hw/cxl/cxl-device-utils.c
@@ -0,0 +1,109 @@
+/*
+ * CXL Utility library for devices
+ *
+ * Copyright(C) 2020 Intel Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/cxl/cxl.h"
+
+/*
+ * Device registers have no restrictions per the spec, and so fall back to the
+ * default memory mapped register rules in 8.2:
+ *   Software shall use CXL.io Memory Read and Write to access memory mapped
+ *   register defined in this section. Unless otherwise specified, software
+ *   shall restrict the accesses width based on the following:
+ *   • A 32 bit register shall be accessed as a 1 Byte, 2 Bytes or 4 Bytes
+ * quantity.
+ *   • A 64 bit register shall be accessed as a 1 Byte, 2 Bytes, 4 Bytes or 8
+ * Bytes
+ *   • The address shall be a multiple of the access width, e.g. when
+ * accessing a register as a 4 Byte quantity, the address shall be
+ * multiple of 4.
+ *   • The accesses shall map to contiguous bytes. If these rules are not
+ * followed, the behavior is undefined
+ */
+
+static uint64_t caps_reg_read(void *opaque, hwaddr offset, unsigned size)
+{
+CXLDeviceState *cxl_dstate = opaque;
+
+if (size == 4) {
+        return cxl_dstate->caps_reg_state32[offset / sizeof(*cxl_dstate->caps_reg_state32)];
+} else {
+        return cxl_dstate->caps_reg_state64[offset / sizeof(*cxl_dstate->caps_reg_state64)];
+}
+}
+
+static uint64_t dev_reg_read(void *opaque, hwaddr offset, unsigned size)
+{
+return 0;
+}
+
+static const MemoryRegionOps dev_ops = {
+.read = dev_reg_read,
+.write = NULL, /* status register is read only */
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 1,
+.max_access_size = 8,
+},
+};
+
+static const MemoryRegionOps caps_ops = {
+.read = caps_reg_read,
+.write = NULL, /* caps registers are read only */
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 4,
+.max_access_size = 8,
+},
+};
+
+void cxl_device_register_block_init(Object *obj, CXLDeviceState *cxl_dstate)
+{
+/* This will be a BAR, so needs to be rounded up to pow2 for PCI spec */
+memory_region_init(&cxl_dstate->device_registers, obj, "device-registers",
+   pow2ceil(CXL_MMIO_SIZE));
+
+memory_region_init_io(&cxl_dstate->caps, obj, &caps_ops, cxl_dstate,
+  "cap-array", CXL_CAPS_SIZE);
+memory_region_init_io(&cxl_dstate->device, obj, &dev_ops, cxl_dstate,
+  "device-status", CXL_DEVICE_STATUS_REGISTERS_LENGTH);
+
+memory_region_add_subregion(&cxl_dstate->device_registers, 0,
+&cxl_dstate->caps);
+memory_region_add_subregion(&cxl_dstate->device_registers,
+CXL_DEVICE_STATUS_REGISTERS_OFFSET,
+&cxl_dstate->device);
+}
+
+static void device_reg_init_common(CXLDeviceState *cxl_dstate) { }
+
+void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)
+{
+uint64_t *cap_hdrs = cxl_dstate->caps_reg_state64;
+const int cap_count = 1;
+
+/* CXL Device Capabilities Array Register */
+ARRAY_FIELD_DP64(cap_hdrs, CXL_DEV_CAP_ARRAY, CAP_ID, 0);
+ARRAY_FIELD_DP64(cap_hdrs, CXL_DEV_CAP_ARRAY, CAP_VERSION, 1);
+ARRAY_FIELD_DP64(cap_hdrs, CXL_DEV_CAP_ARRAY, CAP_COUNT, cap_count);
+
+cxl_device_cap_init(cxl_dstate, DEVICE_STATUS, 1);
+device_reg_init_common(cxl_dstate);
+}
diff --git a/hw/cxl/meson.build b/hw/cxl/meson.build
index 3231b5de1e..dd7c6f8e5a 100644
--- a/hw/cxl/meson.build
+++ b/hw/cxl/meson.build
@@ -1,4 +1,5 @@
 softmmu_ss.add(when: 'CONFIG_CXL',
if_true: files(
'cxl-component-utils.c',
+   'cxl-device-utils.c',
))
diff --git a/include/hw/cxl/cxl_device.h b/include/

[PATCH v9 07/45] hw/cxl/device: Add memory device utilities

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

Memory devices implement extra capabilities on top of CXL devices. This
adds support for that.

A large part of memory devices is the mailbox/command interface. All of
the mailbox handling is done in the mailbox-utils library. Longer term,
new CXL devices that are being emulated may want to handle commands
differently, and therefore would need a mechanism to opt in/out of the
specific generic handlers. As such, this is considered sufficient for
now, but may need more depth in the future.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/cxl/cxl-device-utils.c   | 38 -
 include/hw/cxl/cxl_device.h | 21 +---
 2 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
index f6c3e0f095..687759b301 100644
--- a/hw/cxl/cxl-device-utils.c
+++ b/hw/cxl/cxl-device-utils.c
@@ -131,6 +131,31 @@ static void mailbox_reg_write(void *opaque, hwaddr offset, uint64_t value,
 }
 }
 
+static uint64_t mdev_reg_read(void *opaque, hwaddr offset, unsigned size)
+{
+uint64_t retval = 0;
+
+retval = FIELD_DP64(retval, CXL_MEM_DEV_STS, MEDIA_STATUS, 1);
+retval = FIELD_DP64(retval, CXL_MEM_DEV_STS, MBOX_READY, 1);
+
+return retval;
+}
+
+static const MemoryRegionOps mdev_ops = {
+.read = mdev_reg_read,
+.write = NULL, /* memory device register is read only */
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 8,
+.max_access_size = 8,
+},
+};
+
 static const MemoryRegionOps mailbox_ops = {
 .read = mailbox_reg_read,
 .write = mailbox_reg_write,
@@ -188,6 +213,9 @@ void cxl_device_register_block_init(Object *obj, CXLDeviceState *cxl_dstate)
   "device-status", CXL_DEVICE_STATUS_REGISTERS_LENGTH);
 memory_region_init_io(&cxl_dstate->mailbox, obj, &mailbox_ops, cxl_dstate,
   "mailbox", CXL_MAILBOX_REGISTERS_LENGTH);
+memory_region_init_io(&cxl_dstate->memory_device, obj, &mdev_ops,
+  cxl_dstate, "memory device caps",
+  CXL_MEMORY_DEVICE_REGISTERS_LENGTH);
 
 memory_region_add_subregion(&cxl_dstate->device_registers, 0,
 &cxl_dstate->caps);
@@ -197,6 +225,9 @@ void cxl_device_register_block_init(Object *obj, CXLDeviceState *cxl_dstate)
 memory_region_add_subregion(&cxl_dstate->device_registers,
 CXL_MAILBOX_REGISTERS_OFFSET,
 &cxl_dstate->mailbox);
+memory_region_add_subregion(&cxl_dstate->device_registers,
+CXL_MEMORY_DEVICE_REGISTERS_OFFSET,
+&cxl_dstate->memory_device);
 }
 
 static void device_reg_init_common(CXLDeviceState *cxl_dstate) { }
@@ -209,10 +240,12 @@ static void mailbox_reg_init_common(CXLDeviceState *cxl_dstate)
 cxl_dstate->payload_size = CXL_MAILBOX_MAX_PAYLOAD_SIZE;
 }
 
+static void memdev_reg_init_common(CXLDeviceState *cxl_dstate) { }
+
 void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)
 {
 uint64_t *cap_hdrs = cxl_dstate->caps_reg_state64;
-const int cap_count = 2;
+const int cap_count = 3;
 
 /* CXL Device Capabilities Array Register */
 ARRAY_FIELD_DP64(cap_hdrs, CXL_DEV_CAP_ARRAY, CAP_ID, 0);
@@ -225,5 +258,8 @@ void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)
 cxl_device_cap_init(cxl_dstate, MAILBOX, 2);
 mailbox_reg_init_common(cxl_dstate);
 
+cxl_device_cap_init(cxl_dstate, MEMORY_DEVICE, 0x4000);
+memdev_reg_init_common(cxl_dstate);
+
 assert(cxl_initialize_mailbox(cxl_dstate) == 0);
 }
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 35489f635a..954205653e 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -72,15 +72,20 @@
 #define CXL_MAILBOX_REGISTERS_LENGTH \
 (CXL_MAILBOX_REGISTERS_SIZE + CXL_MAILBOX_MAX_PAYLOAD_SIZE)
 
-#define CXL_MMIO_SIZE   \
-(CXL_DEVICE_CAP_REG_SIZE + CXL_DEVICE_STATUS_REGISTERS_LENGTH + \
- CXL_MAILBOX_REGISTERS_LENGTH)
+#define CXL_MEMORY_DEVICE_REGISTERS_OFFSET \
+(CXL_MAILBOX_REGISTERS_OFFSET + CXL_MAILBOX_REGISTERS_LENGTH)
+#define CXL_MEMORY_DEVICE_REGISTERS_LENGTH 0x8
+
+#define CXL_MMIO_SIZE   \
+(CXL_DEVICE_CAP_REG_SIZE + CXL_DEVICE_STATUS_REGISTERS_LENGTH + \
+ CXL_MAILBOX_REGISTERS_LENGTH + CXL_MEMORY_DEVICE_REGISTERS_LENGTH)
 
 typedef struct cxl_device_state {
 MemoryRegion device_registers;
 
 /* mmio for device capabilities array - 8.2.8.2 */
 MemoryRegion device;
+MemoryRegion memory_device;
 struct {
 MemoryRegion caps;
 union {
@@ -153,6

[PATCH v9 09/45] hw/cxl/device: Timestamp implementation (8.2.9.3)

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

Errata F4 to CXL 2.0 clarified the meaning of the timer as the
sum of the value set with the timestamp set command and the number
of nanoseconds since it was last set.
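
The errata-F4 arithmetic is small enough to sketch standalone. Names
here are illustrative; in the patch, cmd_timestamp_get and
cmd_timestamp_set do the equivalent using QEMU_CLOCK_VIRTUAL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the timestamp rule: reads return the host-set
 * value plus the nanoseconds elapsed since it was set; before any set,
 * the device reports 0. */
static struct {
    int set;
    uint64_t host_set;   /* value the host wrote */
    uint64_t last_set;   /* device clock when it was written */
} ts;

static void timestamp_set(uint64_t host_value, uint64_t now_ns)
{
    ts.set = 1;
    ts.host_set = host_value;
    ts.last_set = now_ns;
}

static uint64_t timestamp_get(uint64_t now_ns)
{
    if (!ts.set) {
        return 0;
    }
    return ts.host_set + (now_ns - ts.last_set);
}
```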

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/cxl/cxl-mailbox-utils.c  | 42 +
 include/hw/cxl/cxl_device.h |  6 ++
 2 files changed, 48 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index fb1f53f48e..4584aa31f7 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -44,6 +44,9 @@ enum {
 #define CLEAR_RECORDS   0x1
 #define GET_INTERRUPT_POLICY   0x2
 #define SET_INTERRUPT_POLICY   0x3
+TIMESTAMP   = 0x03,
+#define GET   0x0
+#define SET   0x1
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -106,9 +109,46 @@ DEFINE_MAILBOX_HANDLER_NOP(events_clear_records);
 DEFINE_MAILBOX_HANDLER_ZEROED(events_get_interrupt_policy, 4);
 DEFINE_MAILBOX_HANDLER_NOP(events_set_interrupt_policy);
 
+/* 8.2.9.3.1 */
+static ret_code cmd_timestamp_get(struct cxl_cmd *cmd,
+  CXLDeviceState *cxl_dstate,
+  uint16_t *len)
+{
+uint64_t time, delta;
+uint64_t final_time = 0;
+
+if (cxl_dstate->timestamp.set) {
+/* First find the delta from the last time the host set the time. */
+time = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+delta = time - cxl_dstate->timestamp.last_set;
+final_time = cxl_dstate->timestamp.host_set + delta;
+}
+
+/* Then adjust the actual time */
+stq_le_p(cmd->payload, final_time);
+*len = 8;
+
+return CXL_MBOX_SUCCESS;
+}
+
+/* 8.2.9.3.2 */
+static ret_code cmd_timestamp_set(struct cxl_cmd *cmd,
+  CXLDeviceState *cxl_dstate,
+  uint16_t *len)
+{
+cxl_dstate->timestamp.set = true;
+cxl_dstate->timestamp.last_set = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+
+cxl_dstate->timestamp.host_set = le64_to_cpu(*(uint64_t *)cmd->payload);
+
+*len = 0;
+return CXL_MBOX_SUCCESS;
+}
+
 static QemuUUID cel_uuid;
 
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
+#define IMMEDIATE_POLICY_CHANGE (1 << 3)
 #define IMMEDIATE_LOG_CHANGE (1 << 4)
 
 static struct cxl_cmd cxl_cmd_set[256][256] = {
@@ -120,6 +160,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 cmd_events_get_interrupt_policy, 0, 0 },
 [EVENTS][SET_INTERRUPT_POLICY] = { "EVENTS_SET_INTERRUPT_POLICY",
 cmd_events_set_interrupt_policy, 4, IMMEDIATE_CONFIG_CHANGE },
+[TIMESTAMP][GET] = { "TIMESTAMP_GET", cmd_timestamp_get, 0, 0 },
+    [TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set, 8, IMMEDIATE_POLICY_CHANGE },
 };
 
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 954205653e..797a22ddb4 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -111,6 +111,12 @@ typedef struct cxl_device_state {
 size_t cel_size;
 };
 
+struct {
+bool set;
+uint64_t last_set;
+uint64_t host_set;
+} timestamp;
+
 /* memory region for persistent memory, HDM */
 uint64_t pmem_size;
 } CXLDeviceState;
-- 
2.32.0




[PATCH v9 04/45] hw/cxl/device: Introduce a CXL device (8.2.8)

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

A CXL device is a type of CXL component. Conceptually, a CXL device
would be a leaf node in a CXL topology. From an emulation perspective,
CXL devices are the most complex and so the actual implementation is
reserved for discrete commits.

This new device type is specifically catered towards the eventual
implementation of a Type3 CXL.mem device, 8.2.8.5 in the CXL 2.0
specification.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
Reviewed by: Adam Manzanares 
---
 include/hw/cxl/cxl.h|   1 +
 include/hw/cxl/cxl_device.h | 166 
 2 files changed, 167 insertions(+)

diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
index 8c738c7a2b..b9d1ac3fad 100644
--- a/include/hw/cxl/cxl.h
+++ b/include/hw/cxl/cxl.h
@@ -12,5 +12,6 @@
 
 #include "cxl_pci.h"
 #include "cxl_component.h"
+#include "cxl_device.h"
 
 #endif
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
new file mode 100644
index 00..9513aaac77
--- /dev/null
+++ b/include/hw/cxl/cxl_device.h
@@ -0,0 +1,166 @@
+/*
+ * QEMU CXL Devices
+ *
+ * Copyright (c) 2020 Intel
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ */
+
+#ifndef CXL_DEVICE_H
+#define CXL_DEVICE_H
+
+#include "hw/register.h"
+
+/*
+ * The following is how a CXL device's Memory Device registers are laid out.
+ * The only requirement from the spec is that the capabilities array and the
+ * capability headers start at offset 0 and are contiguously packed. The headers
+ * themselves provide offsets to the register fields. For this emulation, the
+ * actual registers will start at offset 0x80 (m == 0x80). No secondary
+ * mailbox is implemented which means that the offset of the start of the
+ * mailbox payload (n) is given by
+ * n = m + sizeof(mailbox registers) + sizeof(device registers).
+ *
+ *   +-+
+ *   | |
+ *   |Memory Device Registers  |
+ *   | |
+ * n + PAYLOAD_SIZE_MAX  ---
+ *  ^| |
+ *  || |
+ *  || |
+ *  || |
+ *  || |
+ *  || Mailbox Payload |
+ *  || |
+ *  || |
+ *  || |
+ *  n---
+ *  ^|   Mailbox Registers |
+ *  || |
+ *  |---
+ *  || |
+ *  ||Device Registers |
+ *  || |
+ *  m-->
+ *  ^|  Memory Device Capability Header|
+ *  |---
+ *  || Mailbox Capability Header   |
+ *  |---
+ *  || Device Capability Header|
+ *  |---
+ *  || Device Cap Array Register   |
+ *  0+-+
+ *
+ */
+
+#define CXL_DEVICE_CAP_HDR1_OFFSET 0x10 /* Figure 138 */
+#define CXL_DEVICE_CAP_REG_SIZE 0x10 /* 8.2.8.2 */
+#define CXL_DEVICE_CAPS_MAX 4 /* 8.2.8.2.1 + 8.2.8.5 */
+
+#define CXL_DEVICE_STATUS_REGISTERS_OFFSET 0x80 /* Read comment above */
+#define CXL_DEVICE_STATUS_REGISTERS_LENGTH 0x8 /* 8.2.8.3.1 */
+
+#define CXL_MAILBOX_REGISTERS_OFFSET \
+(CXL_DEVICE_STATUS_REGISTERS_OFFSET + CXL_DEVICE_STATUS_REGISTERS_LENGTH)
+#define CXL_MAILBOX_REGISTERS_SIZE 0x20 /* 8.2.8.4, Figure 139 */
+#define CXL_MAILBOX_PAYLOAD_SHIFT 11
+#define CXL_MAILBOX_MAX_PAYLOAD_SIZE (1 << CXL_MAILBOX_PAYLOAD_SHIFT)
+#define CXL_MAILBOX_REGISTERS_LENGTH \
+(CXL_MAILBOX_REGISTERS_SIZE + CXL_MAILBOX_MAX_PAYLOAD_SIZE)
+
+typedef struct cxl_device_state {
+MemoryRegion device_registers;
+
+/* mmio for device capabilities array - 8.2.8.2 */
+MemoryRegion device;
+MemoryRegion caps;
+
+/* mmio for the mailbox registers 8.2.8.4 */
+MemoryRegion mailbox;
+
+/* memory region for persistent memory, HDM */
+uint64_t pmem_size;
+} CXLDeviceState;
+
+/* Initialize the register block for a device */
+void cxl_device_register_block_init(Object *obj, 

[PATCH v9 10/45] hw/cxl/device: Add log commands (8.2.9.4) + CEL

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

CXL specification provides for the ability to obtain logs from the
device. Logs are either spec defined, like the "Command Effects Log"
(CEL), or vendor specific. UUIDs are defined for all log types.

The CEL is a mechanism to provide information to the host about which
commands are supported. It is useful both to determine which spec'd
optional commands are supported, as well as provide a list of vendor
specified commands that might be used. The CEL is already created as
part of mailbox initialization, but here it is now exported to hosts
that use these log commands.
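
The offset/length validation that the Get Log handler performs can be
sketched in isolation (the function name is invented for illustration):

```c
#include <stdint.h>

/* Hypothetical sketch of the Get Log bounds check: reject any
 * (offset, length) window that reaches past the end of the log as
 * reported by Get Supported Logs. The uint64_t sum keeps a 32-bit
 * wraparound from letting a bad request slip through. */
static int validate_get_log(uint32_t offset, uint32_t length,
                            uint32_t log_size)
{
    if ((uint64_t)offset + length > log_size) {
        return -1;   /* maps to an invalid-input mailbox return code */
    }
    return 0;
}
```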

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/cxl/cxl-mailbox-utils.c | 69 ++
 1 file changed, 69 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 4584aa31f7..db473135c7 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -47,6 +47,9 @@ enum {
 TIMESTAMP   = 0x03,
 #define GET   0x0
 #define SET   0x1
+LOGS= 0x04,
+#define GET_SUPPORTED 0x0
+#define GET_LOG   0x1
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -147,6 +150,70 @@ static ret_code cmd_timestamp_set(struct cxl_cmd *cmd,
 
 static QemuUUID cel_uuid;
 
+/* 8.2.9.4.1 */
+static ret_code cmd_logs_get_supported(struct cxl_cmd *cmd,
+   CXLDeviceState *cxl_dstate,
+   uint16_t *len)
+{
+struct {
+uint16_t entries;
+uint8_t rsvd[6];
+struct {
+QemuUUID uuid;
+uint32_t size;
+} log_entries[1];
+} QEMU_PACKED *supported_logs = (void *)cmd->payload;
+QEMU_BUILD_BUG_ON(sizeof(*supported_logs) != 0x1c);
+
+supported_logs->entries = 1;
+supported_logs->log_entries[0].uuid = cel_uuid;
+supported_logs->log_entries[0].size = 4 * cxl_dstate->cel_size;
+
+*len = sizeof(*supported_logs);
+return CXL_MBOX_SUCCESS;
+}
+
+/* 8.2.9.4.2 */
+static ret_code cmd_logs_get_log(struct cxl_cmd *cmd,
+ CXLDeviceState *cxl_dstate,
+ uint16_t *len)
+{
+struct {
+QemuUUID uuid;
+uint32_t offset;
+uint32_t length;
+} QEMU_PACKED QEMU_ALIGNED(16) *get_log = (void *)cmd->payload;
+
+/*
+ * 8.2.9.4.2
+ *   The device shall return Invalid Parameter if the Offset or Length
+ *   fields attempt to access beyond the size of the log as reported by Get
+ *   Supported Logs.
+ *
+ * XXX: Spec is wrong, "Invalid Parameter" isn't a thing.
+ * XXX: Spec doesn't address handling of an incorrect UUID.
+ *
+ * The CEL buffer is large enough to fit all commands in the emulation, so
+ * the only possible failure would be if the mailbox itself isn't big
+ * enough.
+ */
+if (get_log->offset + get_log->length > cxl_dstate->payload_size) {
+return CXL_MBOX_INVALID_INPUT;
+}
+
+if (!qemu_uuid_is_equal(&get_log->uuid, &cel_uuid)) {
+return CXL_MBOX_UNSUPPORTED;
+}
+
+/* Store off everything to local variables so we can wipe out the payload 
*/
+*len = get_log->length;
+
+memmove(cmd->payload, cxl_dstate->cel_log + get_log->offset,
+   get_log->length);
+
+return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
 #define IMMEDIATE_LOG_CHANGE (1 << 4)
@@ -162,6 +229,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 cmd_events_set_interrupt_policy, 4, IMMEDIATE_CONFIG_CHANGE },
 [TIMESTAMP][GET] = { "TIMESTAMP_GET", cmd_timestamp_get, 0, 0 },
 [TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set, 8, 
IMMEDIATE_POLICY_CHANGE },
+[LOGS][GET_SUPPORTED] = { "LOGS_GET_SUPPORTED", cmd_logs_get_supported, 0, 
0 },
+[LOGS][GET_LOG] = { "LOGS_GET_LOG", cmd_logs_get_log, 0x18, 0 },
 };
 
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
-- 
2.32.0
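For context on the bounds check in cmd_logs_get_log above: the handler rejects any request whose offset plus length would read past the mailbox payload area. A minimal standalone sketch of that check (the function name is illustrative, not QEMU's; widening to 64-bit sidesteps the 32-bit overflow that the in-tree check could in principle hit):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Accept a Get Log request only if offset + length stays within the
 * mailbox payload area, mirroring the check in cmd_logs_get_log. */
bool log_request_in_bounds(uint32_t offset, uint32_t length,
                           uint64_t payload_size)
{
    return (uint64_t)offset + (uint64_t)length <= payload_size;
}
```

With the 2 KiB maximum payload used in this emulation, a 2048-byte read at offset 0 is accepted while a 16-byte read at offset 2040 is rejected.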




[PATCH v9 12/45] hw/pci/cxl: Create a CXL bus type

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

The easiest way to differentiate a CXL bus from a PCIe bus is using a
flag. A CXL bus, in hardware, is backward compatible with PCIE, and
therefore the code tries pretty hard to keep them in sync as much as
possible.

The other way to implement this would be to try to cast the bus to the
correct type. The flag approach is less code and is useful for debugging,
since the bus type can be determined by simply looking at the flags.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/pci-bridge/pci_expander_bridge.c | 9 -
 include/hw/pci/pci_bus.h| 7 +++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/pci-bridge/pci_expander_bridge.c 
b/hw/pci-bridge/pci_expander_bridge.c
index d4514227a8..a6caa1e7b5 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -24,7 +24,7 @@
 #include "hw/boards.h"
 #include "qom/object.h"
 
-enum BusType { PCI, PCIE };
+enum BusType { PCI, PCIE, CXL };
 
 #define TYPE_PXB_BUS "pxb-bus"
 typedef struct PXBBus PXBBus;
@@ -35,6 +35,10 @@ DECLARE_INSTANCE_CHECKER(PXBBus, PXB_BUS,
 DECLARE_INSTANCE_CHECKER(PXBBus, PXB_PCIE_BUS,
  TYPE_PXB_PCIE_BUS)
 
+#define TYPE_PXB_CXL_BUS "pxb-cxl-bus"
+DECLARE_INSTANCE_CHECKER(PXBBus, PXB_CXL_BUS,
+ TYPE_PXB_CXL_BUS)
+
 struct PXBBus {
 /*< private >*/
 PCIBus parent_obj;
@@ -251,6 +255,9 @@ static void pxb_dev_realize_common(PCIDevice *dev, enum 
BusType type,
 ds = qdev_new(TYPE_PXB_HOST);
 if (type == PCIE) {
 bus = pci_root_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_PCIE_BUS);
+} else if (type == CXL) {
+bus = pci_root_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_CXL_BUS);
+bus->flags |= PCI_BUS_CXL;
 } else {
 bus = pci_root_bus_new(ds, "pxb-internal", NULL, NULL, 0, 
TYPE_PXB_BUS);
 bds = qdev_new("pci-bridge");
diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
index 347440d42c..eb94e7e85c 100644
--- a/include/hw/pci/pci_bus.h
+++ b/include/hw/pci/pci_bus.h
@@ -24,6 +24,8 @@ enum PCIBusFlags {
 PCI_BUS_IS_ROOT = 0x0001,
 /* PCIe extended configuration space is accessible on this bus */
 PCI_BUS_EXTENDED_CONFIG_SPACE   = 0x0002,
+/* This is a CXL Type BUS */
+PCI_BUS_CXL = 0x0004,
 };
 
 struct PCIBus {
@@ -53,6 +55,11 @@ struct PCIBus {
 Notifier machine_done;
 };
 
+static inline bool pci_bus_is_cxl(PCIBus *bus)
+{
+return !!(bus->flags & PCI_BUS_CXL);
+}
+
 static inline bool pci_bus_is_root(PCIBus *bus)
 {
 return !!(bus->flags & PCI_BUS_IS_ROOT);
-- 
2.32.0
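To see how the flag-based check behaves, here is a self-contained sketch mirroring the pattern added above (the struct is reduced to just the flags field; this is illustrative, not the real PCIBus):

```c
#include <assert.h>
#include <stdbool.h>

enum PCIBusFlags {
    PCI_BUS_IS_ROOT               = 0x0001,
    PCI_BUS_EXTENDED_CONFIG_SPACE = 0x0002,
    PCI_BUS_CXL                   = 0x0004,
};

/* Reduced stand-in for PCIBus: only the flags field matters here. */
typedef struct PCIBus {
    unsigned flags;
} PCIBus;

static inline bool pci_bus_is_cxl(PCIBus *bus)
{
    return !!(bus->flags & PCI_BUS_CXL);
}
```

A CXL root bus carries both PCI_BUS_IS_ROOT and PCI_BUS_CXL, so pci_bus_is_cxl() distinguishes it from a plain PCIe root bus without any casting.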




[PATCH v9 08/45] hw/cxl/device: Add cheap EVENTS implementation (8.2.9.1)

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

Using the previously implemented stubbed helpers, it is now possible to
easily add the missing, required commands to the implementation.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/cxl/cxl-mailbox-utils.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 2557f41f61..fb1f53f48e 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -38,6 +38,14 @@
  *  a register interface that already deals with it.
  */
 
+enum {
+EVENTS  = 0x01,
+#define GET_RECORDS   0x0
+#define CLEAR_RECORDS   0x1
+#define GET_INTERRUPT_POLICY   0x2
+#define SET_INTERRUPT_POLICY   0x3
+};
+
 /* 8.2.8.4.5.1 Command Return Codes */
 typedef enum {
 CXL_MBOX_SUCCESS = 0x0,
@@ -93,9 +101,26 @@ struct cxl_cmd {
 return CXL_MBOX_SUCCESS;  \
 }
 
+DEFINE_MAILBOX_HANDLER_ZEROED(events_get_records, 0x20);
+DEFINE_MAILBOX_HANDLER_NOP(events_clear_records);
+DEFINE_MAILBOX_HANDLER_ZEROED(events_get_interrupt_policy, 4);
+DEFINE_MAILBOX_HANDLER_NOP(events_set_interrupt_policy);
+
 static QemuUUID cel_uuid;
 
-static struct cxl_cmd cxl_cmd_set[256][256] = {};
+#define IMMEDIATE_CONFIG_CHANGE (1 << 1)
+#define IMMEDIATE_LOG_CHANGE (1 << 4)
+
+static struct cxl_cmd cxl_cmd_set[256][256] = {
+[EVENTS][GET_RECORDS] = { "EVENTS_GET_RECORDS",
+cmd_events_get_records, 1, 0 },
+[EVENTS][CLEAR_RECORDS] = { "EVENTS_CLEAR_RECORDS",
+cmd_events_clear_records, ~0, IMMEDIATE_LOG_CHANGE },
+[EVENTS][GET_INTERRUPT_POLICY] = { "EVENTS_GET_INTERRUPT_POLICY",
+cmd_events_get_interrupt_policy, 0, 0 },
+[EVENTS][SET_INTERRUPT_POLICY] = { "EVENTS_SET_INTERRUPT_POLICY",
+cmd_events_set_interrupt_policy, 4, IMMEDIATE_CONFIG_CHANGE },
+};
 
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
 {
-- 
2.32.0




[PATCH v9 11/45] hw/pxb: Use a type for realizing expanders

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

This opens up the possibility for more types of expanders (other than
PCI and PCIe). We'll need this to create a CXL expander.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/pci-bridge/pci_expander_bridge.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/hw/pci-bridge/pci_expander_bridge.c 
b/hw/pci-bridge/pci_expander_bridge.c
index de932286b5..d4514227a8 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -24,6 +24,8 @@
 #include "hw/boards.h"
 #include "qom/object.h"
 
+enum BusType { PCI, PCIE };
+
 #define TYPE_PXB_BUS "pxb-bus"
 typedef struct PXBBus PXBBus;
 DECLARE_INSTANCE_CHECKER(PXBBus, PXB_BUS,
@@ -221,7 +223,8 @@ static gint pxb_compare(gconstpointer a, gconstpointer b)
0;
 }
 
-static void pxb_dev_realize_common(PCIDevice *dev, bool pcie, Error **errp)
+static void pxb_dev_realize_common(PCIDevice *dev, enum BusType type,
+   Error **errp)
 {
 PXBDev *pxb = convert_to_pxb(dev);
 DeviceState *ds, *bds = NULL;
@@ -246,7 +249,7 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool 
pcie, Error **errp)
 }
 
 ds = qdev_new(TYPE_PXB_HOST);
-if (pcie) {
+if (type == PCIE) {
 bus = pci_root_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_PCIE_BUS);
 } else {
 bus = pci_root_bus_new(ds, "pxb-internal", NULL, NULL, 0, 
TYPE_PXB_BUS);
@@ -295,7 +298,7 @@ static void pxb_dev_realize(PCIDevice *dev, Error **errp)
 return;
 }
 
-pxb_dev_realize_common(dev, false, errp);
+pxb_dev_realize_common(dev, PCI, errp);
 }
 
 static void pxb_dev_exitfn(PCIDevice *pci_dev)
@@ -348,7 +351,7 @@ static void pxb_pcie_dev_realize(PCIDevice *dev, Error 
**errp)
 return;
 }
 
-pxb_dev_realize_common(dev, true, errp);
+pxb_dev_realize_common(dev, PCIE, errp);
 }
 
 static void pxb_pcie_dev_class_init(ObjectClass *klass, void *data)
-- 
2.32.0




[PATCH v9 18/45] hw/cxl/device: Implement MMIO HDM decoding (8.2.5.12)

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

A device's volatile and persistent memory are known as Host-managed Device
Memory (HDM) regions. The mechanism by which the device is programmed to claim
the addresses associated with those regions is through dedicated logic
known as the HDM decoder. In order to allow the OS to properly program
the HDMs, the HDM decoders must be modeled.

There are two ways the HDM decoders can be implemented, the legacy
mechanism is through the PCIe DVSEC programming from CXL 1.1 (8.1.3.8),
and MMIO is found in 8.2.5.12 of the spec. For now, 8.1.3.8 is not
implemented.

Much of CXL device logic is implemented in cxl-utils. The HDM decoder
however is implemented directly by the device implementation.
Whilst the implementation currently does no validity checks on the
decoder set-up, future work will add sanity checking specific to
the type of CXL component.

Signed-off-by: Ben Widawsky 
Co-developed-by: Jonathan Cameron 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/mem/cxl_type3.c | 55 ++
 1 file changed, 55 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 329a6ea2a9..5c93fbbd9b 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -50,6 +50,48 @@ static void build_dvsecs(CXLType3Dev *ct3d)
GPF_DEVICE_DVSEC_REVID, dvsec);
 }
 
+static void hdm_decoder_commit(CXLType3Dev *ct3d, int which)
+{
+ComponentRegisters *cregs = &ct3d->cxl_cstate.crb;
+uint32_t *cache_mem = cregs->cache_mem_registers;
+
+assert(which == 0);
+
+/* TODO: Sanity checks that the decoder is possible */
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, COMMIT, 0);
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, ERR, 0);
+
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, COMMITTED, 1);
+}
+
+static void ct3d_reg_write(void *opaque, hwaddr offset, uint64_t value,
+   unsigned size)
+{
+CXLComponentState *cxl_cstate = opaque;
+ComponentRegisters *cregs = &cxl_cstate->crb;
+CXLType3Dev *ct3d = container_of(cxl_cstate, CXLType3Dev, cxl_cstate);
+uint32_t *cache_mem = cregs->cache_mem_registers;
+bool should_commit = false;
+int which_hdm = -1;
+
+assert(size == 4);
+g_assert(offset <= CXL2_COMPONENT_CM_REGION_SIZE);
+
+switch (offset) {
+case A_CXL_HDM_DECODER0_CTRL:
+should_commit = FIELD_EX32(value, CXL_HDM_DECODER0_CTRL, COMMIT);
+which_hdm = 0;
+break;
+default:
+break;
+}
+
+stl_le_p((uint8_t *)cache_mem + offset, value);
+if (should_commit) {
+hdm_decoder_commit(ct3d, which_hdm);
+}
+}
+
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
 MemoryRegion *mr;
@@ -93,6 +135,9 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
 ct3d->cxl_cstate.pdev = pci_dev;
 build_dvsecs(ct3d);
 
+regs->special_ops = g_new0(MemoryRegionOps, 1);
+regs->special_ops->write = ct3d_reg_write;
+
 cxl_component_register_block_init(OBJECT(pci_dev), cxl_cstate,
   TYPE_CXL_TYPE3);
 
@@ -107,6 +152,15 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
  &ct3d->cxl_dstate.device_registers);
 }
 
+static void ct3_exit(PCIDevice *pci_dev)
+{
+CXLType3Dev *ct3d = CXL_TYPE3(pci_dev);
+CXLComponentState *cxl_cstate = &ct3d->cxl_cstate;
+ComponentRegisters *regs = &cxl_cstate->crb;
+
+g_free(regs->special_ops);
+}
+
 static void ct3d_reset(DeviceState *dev)
 {
 CXLType3Dev *ct3d = CXL_TYPE3(dev);
@@ -128,6 +182,7 @@ static void ct3_class_init(ObjectClass *oc, void *data)
 PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);
 
 pc->realize = ct3_realize;
+pc->exit = ct3_exit;
 pc->class_id = PCI_CLASS_STORAGE_EXPRESS;
 pc->vendor_id = PCI_VENDOR_ID_INTEL;
 pc->device_id = 0xd93; /* LVF for now */
-- 
2.32.0




[PATCH v9 14/45] hw/pxb: Allow creation of a CXL PXB (host bridge)

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

This works like adding a typical pxb device, except the name is
'pxb-cxl' instead of 'pxb-pcie'. An example command line would be as
follows:
  -device pxb-cxl,id=cxl.0,bus="pcie.0",bus_nr=1

A CXL PXB is backward compatible with PCIe. What this means in practice
is that an operating system that is unaware of CXL should still be able
to enumerate this topology as if it were PCIe.

One can create multiple CXL PXB host bridges, but a host bridge can only
be connected to the main root bus. Host bridges cannot appear elsewhere
in the topology.

Note that as of this patch, the ACPI tables needed for the host bridge
(specifically, an ACPI object in _SB named ACPI0016 and the CEDT) aren't
created. So while this patch internally creates it, it cannot be
properly used by an operating system or other system software.

Also necessary is an exception in scripts/device-crash-test, similar to
that for the existing pxb, as both must be created on a PCI Express
host bus.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan.Cameron 
Reviewed-by: Alex Bennée 
---
 hw/pci-bridge/pci_expander_bridge.c | 86 -
 hw/pci/pci.c|  7 +++
 include/hw/pci/pci.h|  6 ++
 scripts/device-crash-test   |  1 +
 4 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/hw/pci-bridge/pci_expander_bridge.c 
b/hw/pci-bridge/pci_expander_bridge.c
index a6caa1e7b5..f762eb4a6e 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -17,6 +17,7 @@
 #include "hw/pci/pci_host.h"
 #include "hw/qdev-properties.h"
 #include "hw/pci/pci_bridge.h"
+#include "hw/cxl/cxl.h"
 #include "qemu/range.h"
 #include "qemu/error-report.h"
 #include "qemu/module.h"
@@ -56,6 +57,16 @@ DECLARE_INSTANCE_CHECKER(PXBDev, PXB_DEV,
 DECLARE_INSTANCE_CHECKER(PXBDev, PXB_PCIE_DEV,
  TYPE_PXB_PCIE_DEVICE)
 
+#define TYPE_PXB_CXL_DEVICE "pxb-cxl"
+DECLARE_INSTANCE_CHECKER(PXBDev, PXB_CXL_DEV,
+ TYPE_PXB_CXL_DEVICE)
+
+typedef struct CXLHost {
+PCIHostState parent_obj;
+
+CXLComponentState cxl_cstate;
+} CXLHost;
+
 struct PXBDev {
 /*< private >*/
 PCIDevice parent_obj;
@@ -68,6 +79,11 @@ struct PXBDev {
 
 static PXBDev *convert_to_pxb(PCIDevice *dev)
 {
+/* A CXL PXB's parent bus is PCIe, so the normal check won't work */
+if (object_dynamic_cast(OBJECT(dev), TYPE_PXB_CXL_DEVICE)) {
+return PXB_CXL_DEV(dev);
+}
+
 return pci_bus_is_express(pci_get_bus(dev))
 ? PXB_PCIE_DEV(dev) : PXB_DEV(dev);
 }
@@ -112,11 +128,20 @@ static const TypeInfo pxb_pcie_bus_info = {
 .class_init= pxb_bus_class_init,
 };
 
+static const TypeInfo pxb_cxl_bus_info = {
+.name  = TYPE_PXB_CXL_BUS,
+.parent= TYPE_CXL_BUS,
+.instance_size = sizeof(PXBBus),
+.class_init= pxb_bus_class_init,
+};
+
 static const char *pxb_host_root_bus_path(PCIHostState *host_bridge,
   PCIBus *rootbus)
 {
-PXBBus *bus = pci_bus_is_express(rootbus) ?
-  PXB_PCIE_BUS(rootbus) : PXB_BUS(rootbus);
+PXBBus *bus = pci_bus_is_cxl(rootbus) ?
+  PXB_CXL_BUS(rootbus) :
+  pci_bus_is_express(rootbus) ? PXB_PCIE_BUS(rootbus) :
+PXB_BUS(rootbus);
 
 snprintf(bus->bus_path, 8, ":%02x", pxb_bus_num(rootbus));
 return bus->bus_path;
@@ -218,6 +243,10 @@ static int pxb_map_irq_fn(PCIDevice *pci_dev, int pin)
 return pin - PCI_SLOT(pxb->devfn);
 }
 
+static void pxb_dev_reset(DeviceState *dev)
+{
+}
+
 static gint pxb_compare(gconstpointer a, gconstpointer b)
 {
 const PXBDev *pxb_a = a, *pxb_b = b;
@@ -389,13 +418,66 @@ static const TypeInfo pxb_pcie_dev_info = {
 },
 };
 
+static void pxb_cxl_dev_realize(PCIDevice *dev, Error **errp)
+{
+MachineState *ms = MACHINE(qdev_get_machine());
+
+/* A CXL PXB's parent bus is still PCIe */
+if (!pci_bus_is_express(pci_get_bus(dev))) {
+error_setg(errp, "pxb-cxl devices cannot reside on a PCI bus");
+return;
+}
+if (!ms->cxl_devices_state->is_enabled) {
+error_setg(errp, "Machine does not have cxl=on");
+return;
+}
+
+pxb_dev_realize_common(dev, CXL, errp);
+pxb_dev_reset(DEVICE(dev));
+}
+
+static void pxb_cxl_dev_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc   = DEVICE_CLASS(klass);
+PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+k->realize = pxb_cxl_dev_realize;
+k->exit= pxb_dev_exitfn;
+/*
+ * XXX: These types of bridges don't actually show up in the hierarchy so
+ * vendor, device, class, etc. ids are intentionally left out.
+ */
+
+dc->desc = "CXL Host Bridge";
+device_class_set_props(dc, pxb_dev_properties);
+set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
+
+/* Host bridges aren't hotplu

[PATCH v9 19/45] hw/cxl/device: Add some trivial commands

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

GET_FW_INFO and GET_PARTITION_INFO, for this emulation, are equivalent to
info already returned by the IDENTIFY command. To have a more robust
implementation, add those.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
 hw/cxl/cxl-mailbox-utils.c | 69 ++
 1 file changed, 69 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 4ae0561dfc..c8188d7087 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -10,6 +10,7 @@
 #include "qemu/osdep.h"
 #include "hw/cxl/cxl.h"
 #include "hw/pci/pci.h"
+#include "qemu/cutils.h"
 #include "qemu/log.h"
 #include "qemu/uuid.h"
 
@@ -44,6 +45,8 @@ enum {
 #define CLEAR_RECORDS   0x1
 #define GET_INTERRUPT_POLICY   0x2
 #define SET_INTERRUPT_POLICY   0x3
+FIRMWARE_UPDATE = 0x02,
+#define GET_INFO  0x0
 TIMESTAMP   = 0x03,
 #define GET   0x0
 #define SET   0x1
@@ -52,6 +55,8 @@ enum {
 #define GET_LOG   0x1
 IDENTIFY= 0x40,
 #define MEMORY_DEVICE 0x0
+CCLS= 0x41,
+#define GET_PARTITION_INFO 0x0
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -114,6 +119,39 @@ DEFINE_MAILBOX_HANDLER_NOP(events_clear_records);
 DEFINE_MAILBOX_HANDLER_ZEROED(events_get_interrupt_policy, 4);
 DEFINE_MAILBOX_HANDLER_NOP(events_set_interrupt_policy);
 
+/* 8.2.9.2.1 */
+static ret_code cmd_firmware_update_get_info(struct cxl_cmd *cmd,
+ CXLDeviceState *cxl_dstate,
+ uint16_t *len)
+{
+struct {
+uint8_t slots_supported;
+uint8_t slot_info;
+uint8_t caps;
+uint8_t rsvd[0xd];
+char fw_rev1[0x10];
+char fw_rev2[0x10];
+char fw_rev3[0x10];
+char fw_rev4[0x10];
+} QEMU_PACKED *fw_info;
+QEMU_BUILD_BUG_ON(sizeof(*fw_info) != 0x50);
+
+if (cxl_dstate->pmem_size < (256 << 20)) {
+return CXL_MBOX_INTERNAL_ERROR;
+}
+
+fw_info = (void *)cmd->payload;
+memset(fw_info, 0, sizeof(*fw_info));
+
+fw_info->slots_supported = 2;
+fw_info->slot_info = BIT(0) | BIT(3);
+fw_info->caps = 0;
+pstrcpy(fw_info->fw_rev1, sizeof(fw_info->fw_rev1), "BWFW VERSION 0");
+
+*len = sizeof(*fw_info);
+return CXL_MBOX_SUCCESS;
+}
+
 /* 8.2.9.3.1 */
 static ret_code cmd_timestamp_get(struct cxl_cmd *cmd,
   CXLDeviceState *cxl_dstate,
@@ -258,6 +296,33 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd 
*cmd,
 return CXL_MBOX_SUCCESS;
 }
 
+static ret_code cmd_ccls_get_partition_info(struct cxl_cmd *cmd,
+   CXLDeviceState *cxl_dstate,
+   uint16_t *len)
+{
+struct {
+uint64_t active_vmem;
+uint64_t active_pmem;
+uint64_t next_vmem;
+uint64_t next_pmem;
+} QEMU_PACKED *part_info = (void *)cmd->payload;
+QEMU_BUILD_BUG_ON(sizeof(*part_info) != 0x20);
+uint64_t size = cxl_dstate->pmem_size;
+
+if (!QEMU_IS_ALIGNED(size, 256 << 20)) {
+return CXL_MBOX_INTERNAL_ERROR;
+}
+
+/* PMEM only */
+part_info->active_vmem = 0;
+part_info->next_vmem = 0;
+part_info->active_pmem = size / (256 << 20);
+part_info->next_pmem = 0;
+
+*len = sizeof(*part_info);
+return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
 #define IMMEDIATE_LOG_CHANGE (1 << 4)
@@ -271,12 +336,16 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 cmd_events_get_interrupt_policy, 0, 0 },
 [EVENTS][SET_INTERRUPT_POLICY] = { "EVENTS_SET_INTERRUPT_POLICY",
 cmd_events_set_interrupt_policy, 4, IMMEDIATE_CONFIG_CHANGE },
+[FIRMWARE_UPDATE][GET_INFO] = { "FIRMWARE_UPDATE_GET_INFO",
+cmd_firmware_update_get_info, 0, 0 },
 [TIMESTAMP][GET] = { "TIMESTAMP_GET", cmd_timestamp_get, 0, 0 },
 [TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set, 8, 
IMMEDIATE_POLICY_CHANGE },
 [LOGS][GET_SUPPORTED] = { "LOGS_GET_SUPPORTED", cmd_logs_get_supported, 0, 
0 },
 [LOGS][GET_LOG] = { "LOGS_GET_LOG", cmd_logs_get_log, 0x18, 0 },
 [IDENTIFY][MEMORY_DEVICE] = { "IDENTIFY_MEMORY_DEVICE",
 cmd_identify_memory_device, 0, 0 },
+[CCLS][GET_PARTITION_INFO] = { "CCLS_GET_PARTITION_INFO",
+cmd_ccls_get_partition_info, 0, 0 },
 };
 
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
-- 
2.32.0
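A small aside on units: cmd_ccls_get_partition_info above reports capacities as multiples of 256 MiB, which is why the handler insists the device size is 256 MiB aligned. A sketch of that conversion (the macro and function names are illustrative, not QEMU's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAPACITY_MULTIPLIER (256ULL << 20) /* 256 MiB units */

/* True if a device size can be reported without losing capacity. */
bool size_is_reportable(uint64_t bytes)
{
    return (bytes % CAPACITY_MULTIPLIER) == 0;
}

/* Convert a byte size into the 256 MiB units used in the payload. */
uint64_t to_capacity_units(uint64_t bytes)
{
    return bytes / CAPACITY_MULTIPLIER;
}
```

A 1 GiB persistent-only device, for instance, reports four units of active pmem capacity.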




Re: [PATCH v1 7/9] colo-compare: safe finalization

2022-04-04 Thread Maxim Davydov
The main problem is that if we call object_new_with_class() and then 
object_unref(), it fails. First of all, this is because finalize 
expects that net/colo-compare.c:colo_compare_complete() has been 
called beforehand.


On 3/30/22 17:54, Vladimir Sementsov-Ogievskiy wrote:

29.03.2022 00:15, Maxim Davydov wrote:

Fixes some possible issues with finalization. For example, finalization
immediately after instance_init fails on the assert.

Signed-off-by: Maxim Davydov 
---
  net/colo-compare.c | 25 -
  1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 62554b5b3c..81d8de0aaa 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -1426,7 +1426,7 @@ static void colo_compare_finalize(Object *obj)
  break;
  }
  }
-    if (QTAILQ_EMPTY(&net_compares)) {
if colo_compare_active == false, event_mtx and event_complete_cond 
weren't initialized in colo_compare_complete()

+    if (QTAILQ_EMPTY(&net_compares) && colo_compare_active) {
  colo_compare_active = false;
  qemu_mutex_destroy(&event_mtx);
  qemu_cond_destroy(&event_complete_cond);
@@ -1442,19 +1442,26 @@ static void colo_compare_finalize(Object *obj)
    colo_compare_timer_del(s);
  -    qemu_bh_delete(s->event_bh);
s->event_bh wasn't allocated by colo_compare_iothread() in 
colo_compare_complete()

+    if (s->event_bh) {
+    qemu_bh_delete(s->event_bh);
+    }
  -    AioContext *ctx = iothread_get_aio_context(s->iothread);
-    aio_context_acquire(ctx);
-    AIO_WAIT_WHILE(ctx, !s->out_sendco.done);
-    if (s->notify_dev) {
-    AIO_WAIT_WHILE(ctx, !s->notify_sendco.done);
s->iothread == NULL after .instance_init (it can be detected in 
colo_compare_complete(), if it has been called)

+    if (s->iothread) {
+    AioContext *ctx = iothread_get_aio_context(s->iothread);
+    aio_context_acquire(ctx);
+    AIO_WAIT_WHILE(ctx, !s->out_sendco.done);
+    if (s->notify_dev) {
+    AIO_WAIT_WHILE(ctx, !s->notify_sendco.done);
+    }
+    aio_context_release(ctx);
  }
-    aio_context_release(ctx);
    /* Release all unhandled packets after compare thead exited */
  g_queue_foreach(&s->conn_list, colo_flush_packets, s);
-    AIO_WAIT_WHILE(NULL, !s->out_sendco.done);
In a normal situation, it flushes all packets and sets s->out_sendco.done 
= true via compare_chr_send (we wait for this event). But if s->conn_list 
isn't initialized, s->out_sendco.done == false and won't become true, so 
the wait never finishes.

+    /* Without colo_compare_complete done == false without packets */
+    if (!g_queue_is_empty(&s->out_sendco.send_list)) {
+    AIO_WAIT_WHILE(NULL, !s->out_sendco.done);
+    }


I think it would be good to add more description for this last change. 
It's not as obvious as the previous two changes.



g_queue_clear(&s->conn_list);
  g_queue_clear(&s->out_sendco.send_list);




--
Best regards,
Maxim Davydov




[PATCH v9 25/45] acpi/cxl: Create the CEDT (9.14.1)

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

The CXL Early Discovery Table is defined in the CXL 2.0 specification as
a way for the OS to get CXL specific information from the system
firmware.

CXL 2.0 specification adds an _HID, ACPI0016, for CXL capable host
bridges, with a _CID of PNP0A08 (PCIe host bridge). CXL aware software
is able to use this to initiate the proper _OSC method and get the _UID
which is referenced by the CEDT. Therefore the existence of an ACPI0016
device allows a CXL aware driver to perform the necessary actions. This
works both for a CXL capable OS and for a CXL unaware OS.

CEDT awareness requires more. The motivation for ACPI0017 is to provide
the possibility of having a Linux CXL module that can work on a legacy
Linux kernel. Linux core PCI/ACPI, which won't be built as a module,
will see the _CID of PNP0A08 and bind a driver to it. If we later load
a driver for ACPI0016, Linux won't be able to bind it to the hardware
because it has already bound the PNP0A08 driver. The ACPI0017 device is
an opportunity to have an object to which a driver can bind; that driver
can then walk the CXL topology and do everything that we would have
preferred to do with ACPI0016.

There is another motivation for an ACPI0017 device which isn't
implemented here. An operating system needs an attach point for a
non-volatile region provider that understands cross-hostbridge
interleaving. Since QEMU emulation doesn't support interleaving yet,
this is more important on the OS side, for now.

As of the CXL 2.0 spec, only one substructure is defined: the CXL Host
Bridge Structure (CHBS), which is primarily useful for telling the OS exactly
where the MMIO for the host bridge is.

Link: 
https://lore.kernel.org/linux-cxl/20210115034911.nkgpzc756d6qm...@intel.com/T/#t
Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/acpi/cxl.c   | 68 +
 hw/i386/acpi-build.c| 27 
 hw/pci-bridge/pci_expander_bridge.c | 17 
 include/hw/acpi/cxl.h   |  5 +++
 include/hw/pci/pci_bridge.h | 20 +
 5 files changed, 120 insertions(+), 17 deletions(-)

diff --git a/hw/acpi/cxl.c b/hw/acpi/cxl.c
index ca1f04f359..aa4af86a4c 100644
--- a/hw/acpi/cxl.c
+++ b/hw/acpi/cxl.c
@@ -18,7 +18,11 @@
  */
 
 #include "qemu/osdep.h"
+#include "hw/sysbus.h"
+#include "hw/pci/pci_bridge.h"
+#include "hw/pci/pci_host.h"
 #include "hw/cxl/cxl.h"
+#include "hw/mem/memory-device.h"
 #include "hw/acpi/acpi.h"
 #include "hw/acpi/aml-build.h"
 #include "hw/acpi/bios-linker-loader.h"
@@ -26,6 +30,70 @@
 #include "qapi/error.h"
 #include "qemu/uuid.h"
 
+static void cedt_build_chbs(GArray *table_data, PXBDev *cxl)
+{
+SysBusDevice *sbd = SYS_BUS_DEVICE(cxl->cxl.cxl_host_bridge);
+struct MemoryRegion *mr = sbd->mmio[0].memory;
+
+/* Type */
+build_append_int_noprefix(table_data, 0, 1);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 1);
+
+/* Record Length */
+build_append_int_noprefix(table_data, 32, 2);
+
+/* UID - currently equal to bus number */
+build_append_int_noprefix(table_data, cxl->bus_nr, 4);
+
+/* Version */
+build_append_int_noprefix(table_data, 1, 4);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 4);
+
+/* Base - subregion within a container that is in PA space */
+build_append_int_noprefix(table_data, mr->container->addr + mr->addr, 8);
+
+/* Length */
+build_append_int_noprefix(table_data, memory_region_size(mr), 8);
+}
+
+static int cxl_foreach_pxb_hb(Object *obj, void *opaque)
+{
+Aml *cedt = opaque;
+
+if (object_dynamic_cast(obj, TYPE_PXB_CXL_DEVICE)) {
+cedt_build_chbs(cedt->buf, PXB_CXL_DEV(obj));
+}
+
+return 0;
+}
+
+void cxl_build_cedt(MachineState *ms, GArray *table_offsets, GArray 
*table_data,
+BIOSLinker *linker, const char *oem_id,
+const char *oem_table_id)
+{
+Aml *cedt;
+AcpiTable table = { .sig = "CEDT", .rev = 1, .oem_id = oem_id,
+.oem_table_id = oem_table_id };
+
+acpi_add_table(table_offsets, table_data);
+acpi_table_begin(&table, table_data);
+cedt = init_aml_allocator();
+
+/* reserve space for CEDT header */
+
+object_child_foreach_recursive(object_get_root(), cxl_foreach_pxb_hb, 
cedt);
+
+/* copy AML table into ACPI tables blob and patch header there */
+g_array_append_vals(table_data, cedt->buf->data, cedt->buf->len);
+free_aml_allocator();
+
+acpi_table_end(linker, &table);
+}
+
 static Aml *__build_cxl_osc_method(void)
 {
 Aml *method, *if_uuid, *else_uuid, *if_arg1_not_1, *if_cxl, 
*if_caps_masked;
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 59ede8b2e9..c125939ed6 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -77,6 +77,7 @@
 #include "hw/acpi/ipmi.h"
 #include "hw/acpi/hmat.h"
 #include "hw/acpi/viot.h"
+#include "hw/acpi/
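As a cross-check on the record built field-by-field in cedt_build_chbs above, the same 32-byte CHBS layout can be written as a packed struct (illustrative only; QEMU emits the table byte-wise via build_append_int_noprefix rather than through a struct):

```c
#include <assert.h>
#include <stdint.h>

/* CXL Host Bridge Structure (CHBS), mirroring the fields appended in
 * cedt_build_chbs: type, reserved, record length (32), UID, version,
 * reserved, then base address and length of the register block. */
struct chbs {
    uint8_t  type;          /* 0 = CHBS */
    uint8_t  reserved1;
    uint16_t record_length; /* 32 bytes */
    uint32_t uid;           /* here: the host bridge bus number */
    uint32_t version;
    uint32_t reserved2;
    uint64_t base;          /* component register block base */
    uint64_t length;
} __attribute__((packed));
```

The packed size matching the Record Length field is a quick sanity check that no field was dropped or mis-sized.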

[PATCH v9 16/45] hw/cxl/rp: Add a root port

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

This adds just enough of a root port implementation to be able to
enumerate root ports (creating the required DVSEC entries). What's not
here yet is the MMIO or the ability to write some of the DVSEC entries.

This can be added on the QEMU command line by attaching a root port to a
specific CXL host bridge. For example:
  -device cxl-rp,id=rp0,bus="cxl.0",addr=0.0,chassis=4

Like the host bridge patch, the ACPI tables aren't generated at this
point and so system software cannot use it.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/pci-bridge/Kconfig  |   5 +
 hw/pci-bridge/cxl_root_port.c  | 231 +
 hw/pci-bridge/meson.build  |   1 +
 hw/pci-bridge/pcie_root_port.c |   6 +-
 hw/pci/pci.c   |   4 +-
 5 files changed, 245 insertions(+), 2 deletions(-)

diff --git a/hw/pci-bridge/Kconfig b/hw/pci-bridge/Kconfig
index f8df4315ba..02614f49aa 100644
--- a/hw/pci-bridge/Kconfig
+++ b/hw/pci-bridge/Kconfig
@@ -27,3 +27,8 @@ config DEC_PCI
 
 config SIMBA
 bool
+
+config CXL
+bool
+default y if PCI_EXPRESS && PXB
+depends on PCI_EXPRESS && MSI_NONBROKEN && PXB
diff --git a/hw/pci-bridge/cxl_root_port.c b/hw/pci-bridge/cxl_root_port.c
new file mode 100644
index 00..dfbf59ceb3
--- /dev/null
+++ b/hw/pci-bridge/cxl_root_port.c
@@ -0,0 +1,231 @@
+/*
+ * CXL 2.0 Root Port Implementation
+ *
+ * Copyright(C) 2020 Intel Corporation.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/range.h"
+#include "hw/pci/pci_bridge.h"
+#include "hw/pci/pcie_port.h"
+#include "hw/qdev-properties.h"
+#include "hw/sysbus.h"
+#include "qapi/error.h"
+#include "hw/cxl/cxl.h"
+
+#define CXL_ROOT_PORT_DID 0x7075
+
+/* Copied from the gen root port which we derive */
+#define GEN_PCIE_ROOT_PORT_AER_OFFSET 0x100
+#define GEN_PCIE_ROOT_PORT_ACS_OFFSET \
+(GEN_PCIE_ROOT_PORT_AER_OFFSET + PCI_ERR_SIZEOF)
+#define CXL_ROOT_PORT_DVSEC_OFFSET \
+(GEN_PCIE_ROOT_PORT_ACS_OFFSET + PCI_ACS_SIZEOF)
+
+typedef struct CXLRootPort {
+/*< private >*/
+PCIESlot parent_obj;
+
+CXLComponentState cxl_cstate;
+PCIResReserve res_reserve;
+} CXLRootPort;
+
+#define TYPE_CXL_ROOT_PORT "cxl-rp"
+DECLARE_INSTANCE_CHECKER(CXLRootPort, CXL_ROOT_PORT, TYPE_CXL_ROOT_PORT)
+
+static void latch_registers(CXLRootPort *crp)
+{
+uint32_t *reg_state = crp->cxl_cstate.crb.cache_mem_registers;
+
+cxl_component_register_init_common(reg_state, CXL2_ROOT_PORT);
+}
+
+static void build_dvsecs(CXLComponentState *cxl)
+{
+uint8_t *dvsec;
+
+dvsec = (uint8_t *)&(CXLDVSECPortExtensions){ 0 };
+cxl_component_create_dvsec(cxl, EXTENSIONS_PORT_DVSEC_LENGTH,
+   EXTENSIONS_PORT_DVSEC,
+   EXTENSIONS_PORT_DVSEC_REVID, dvsec);
+
+dvsec = (uint8_t *)&(CXLDVSECPortGPF){
+.rsvd= 0,
+.phase1_ctrl = 1, /* 1μs timeout */
+.phase2_ctrl = 1, /* 1μs timeout */
+};
+cxl_component_create_dvsec(cxl, GPF_PORT_DVSEC_LENGTH, GPF_PORT_DVSEC,
+   GPF_PORT_DVSEC_REVID, dvsec);
+
+dvsec = (uint8_t *)&(CXLDVSECPortFlexBus){
+.cap = 0x26, /* IO, Mem, non-MLD */
+.ctrl= 0x2,
+.status  = 0x26, /* same */
+.rcvd_mod_ts_data_phase1 = 0xef,
+};
+cxl_component_create_dvsec(cxl, PCIE_FLEXBUS_PORT_DVSEC_LENGTH_2_0,
+   PCIE_FLEXBUS_PORT_DVSEC,
+   PCIE_FLEXBUS_PORT_DVSEC_REVID_2_0, dvsec);
+
+dvsec = (uint8_t *)&(CXLDVSECRegisterLocator){
+.rsvd = 0,
+.reg0_base_lo = RBI_COMPONENT_REG | CXL_COMPONENT_REG_BAR_IDX,
+.reg0_base_hi = 0,
+};
+cxl_component_create_dvsec(cxl, REG_LOC_DVSEC_LENGTH, REG_LOC_DVSEC,
+   REG_LOC_DVSEC_REVID, dvsec);
+}
+
+static void cxl_rp_realize(DeviceState *dev, Error **errp)
+{
+PCIDevice *pci_dev = PCI_DEVICE(dev);
+PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(dev);
+CXLRootPort *crp   = CXL_ROOT_PORT(dev);
+CXLComponentState *cxl_cstate = &crp->cxl_cstate;
+ComponentRegisters *cregs = &cxl_cstate->crb;
+MemoryRegion *component

[PATCH v9 26/45] hw/cxl/component: Add utils for interleave parameter encoding/decoding

2022-04-04 Thread Jonathan Cameron via
From: Jonathan Cameron 

Both registers and the CFMWS entries in the CEDT use simple encodings
for the number of interleave ways and the interleave granularity.
Introduce simple conversion functions to/from the unencoded
number / size.  So far the interleave-ways decode has not been needed,
so it is not implemented.

Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/cxl/cxl-component-utils.c   | 34 ++
 include/hw/cxl/cxl_component.h |  8 
 2 files changed, 42 insertions(+)

diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
index 22e52cef17..1a1adbd4cb 100644
--- a/hw/cxl/cxl-component-utils.c
+++ b/hw/cxl/cxl-component-utils.c
@@ -9,6 +9,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/log.h"
+#include "qapi/error.h"
 #include "hw/pci/pci.h"
 #include "hw/cxl/cxl.h"
 
@@ -223,3 +224,36 @@ void cxl_component_create_dvsec(CXLComponentState *cxl, 
uint16_t length,
 range_init_nofail(&cxl->dvsecs[type], cxl->dvsec_offset, length);
 cxl->dvsec_offset += length;
 }
+
+uint8_t cxl_interleave_ways_enc(int iw, Error **errp)
+{
+switch (iw) {
+case 1: return 0x0;
+case 2: return 0x1;
+case 4: return 0x2;
+case 8: return 0x3;
+case 16: return 0x4;
+case 3: return 0x8;
+case 6: return 0x9;
+case 12: return 0xa;
+default:
+error_setg(errp, "Interleave ways: %d not supported", iw);
+return 0;
+}
+}
+
+uint8_t cxl_interleave_granularity_enc(uint64_t gran, Error **errp)
+{
+switch (gran) {
+case 256: return 0;
+case 512: return 1;
+case 1024: return 2;
+case 2048: return 3;
+case 4096: return 4;
+case 8192: return 5;
+case 16384: return 6;
+default:
+error_setg(errp, "Interleave granularity: %" PRIu64 " invalid", gran);
+return 0;
+}
+}
diff --git a/include/hw/cxl/cxl_component.h b/include/hw/cxl/cxl_component.h
index 5b15bd6c3f..b0f95d3484 100644
--- a/include/hw/cxl/cxl_component.h
+++ b/include/hw/cxl/cxl_component.h
@@ -194,4 +194,12 @@ void cxl_component_register_init_common(uint32_t 
*reg_state,
 void cxl_component_create_dvsec(CXLComponentState *cxl_cstate, uint16_t length,
 uint16_t type, uint8_t rev, uint8_t *body);
 
+uint8_t cxl_interleave_ways_enc(int iw, Error **errp);
+uint8_t cxl_interleave_granularity_enc(uint64_t gran, Error **errp);
+
+static inline hwaddr cxl_decode_ig(int ig)
+{
+return 1 << (ig + 8);
+}
+
 #endif
-- 
2.32.0




[PATCH v9 22/45] qtests/cxl: Add initial root port and CXL type3 tests

2022-04-04 Thread Jonathan Cameron via
At this stage we can boot configurations with host bridges,
root ports and type 3 memory devices, so add appropriate
tests.

Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 tests/qtest/cxl-test.c | 126 +
 1 file changed, 126 insertions(+)

diff --git a/tests/qtest/cxl-test.c b/tests/qtest/cxl-test.c
index 1006c8ae4e..5f0794e816 100644
--- a/tests/qtest/cxl-test.c
+++ b/tests/qtest/cxl-test.c
@@ -8,6 +8,54 @@
 #include "qemu/osdep.h"
 #include "libqtest-single.h"
 
+#define QEMU_PXB_CMD "-machine q35,cxl=on " \
+ "-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 "
+
+#define QEMU_2PXB_CMD "-machine q35,cxl=on " \
+  "-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 "  \
+  "-device pxb-cxl,id=cxl.1,bus=pcie.0,bus_nr=53 "
+
+#define QEMU_RP "-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 "
+
+/* Dual ports on first pxb */
+#define QEMU_2RP "-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 " \
+ "-device cxl-rp,id=rp1,bus=cxl.0,chassis=0,slot=1 "
+
+/* Dual ports on each of the pxb instances */
+#define QEMU_4RP "-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 " \
+ "-device cxl-rp,id=rp1,bus=cxl.0,chassis=0,slot=1 " \
+ "-device cxl-rp,id=rp2,bus=cxl.1,chassis=0,slot=2 " \
+ "-device cxl-rp,id=rp3,bus=cxl.1,chassis=0,slot=3 "
+
+#define QEMU_T3D "-object 
memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
+ "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M "  
  \
+ "-device 
cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 "
+
+#define QEMU_2T3D "-object 
memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M "\
+  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M " 
   \
+  "-device 
cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 " \
+  "-object 
memory-backend-file,id=cxl-mem1,mem-path=%s,size=256M "\
+  "-object memory-backend-file,id=lsa1,mem-path=%s,size=256M " 
   \
+  "-device 
cxl-type3,bus=rp1,memdev=cxl-mem1,lsa=lsa1,id=cxl-pmem1 "
+
+#define QEMU_4T3D "-object 
memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
+  "-object memory-backend-file,id=lsa0,mem-path=%s,size=256M " 
   \
+  "-device 
cxl-type3,bus=rp0,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 " \
+  "-object 
memory-backend-file,id=cxl-mem1,mem-path=%s,size=256M "\
+  "-object memory-backend-file,id=lsa1,mem-path=%s,size=256M " 
   \
+  "-device 
cxl-type3,bus=rp1,memdev=cxl-mem1,lsa=lsa1,id=cxl-pmem1 " \
+  "-object 
memory-backend-file,id=cxl-mem2,mem-path=%s,size=256M "\
+  "-object memory-backend-file,id=lsa2,mem-path=%s,size=256M " 
   \
+  "-device 
cxl-type3,bus=rp2,memdev=cxl-mem2,lsa=lsa2,id=cxl-pmem2 " \
+  "-object 
memory-backend-file,id=cxl-mem3,mem-path=%s,size=256M "\
+  "-object memory-backend-file,id=lsa3,mem-path=%s,size=256M " 
   \
+  "-device 
cxl-type3,bus=rp3,memdev=cxl-mem3,lsa=lsa3,id=cxl-pmem3 "
+
+static void cxl_basic_hb(void)
+{
+qtest_start("-machine q35,cxl=on");
+qtest_end();
+}
 
 static void cxl_basic_pxb(void)
 {
@@ -15,9 +63,87 @@ static void cxl_basic_pxb(void)
 qtest_end();
 }
 
+static void cxl_pxb_with_window(void)
+{
+qtest_start(QEMU_PXB_CMD);
+qtest_end();
+}
+
+static void cxl_2pxb_with_window(void)
+{
+qtest_start(QEMU_2PXB_CMD);
+qtest_end();
+}
+
+static void cxl_root_port(void)
+{
+qtest_start(QEMU_PXB_CMD QEMU_RP);
+qtest_end();
+}
+
+static void cxl_2root_port(void)
+{
+qtest_start(QEMU_PXB_CMD QEMU_2RP);
+qtest_end();
+}
+
+static void cxl_t3d(void)
+{
+g_autoptr(GString) cmdline = g_string_new(NULL);
+char template[] = "/tmp/cxl-test-XX";
+const char *tmpfs;
+
+tmpfs = mkdtemp(template);
+
+g_string_printf(cmdline, QEMU_PXB_CMD QEMU_RP QEMU_T3D, tmpfs, tmpfs);
+
+qtest_start(cmdline->str);
+qtest_end();
+}
+
+static void cxl_1pxb_2rp_2t3d(void)
+{
+g_autoptr(GString) cmdline = g_string_new(NULL);
+char template[] = "/tmp/cxl-test-XX";
+const char *tmpfs;
+
+tmpfs = mkdtemp(template);
+
+g_string_printf(cmdline, QEMU_PXB_CMD QEMU_2RP QEMU_2T3D,
+tmpfs, tmpfs, tmpfs, tmpfs);
+
+qtest_start(cmdline->str);
+qtest_end();
+}
+
+static void cxl_2pxb_4rp_4t3d(void)
+{
+g_autoptr(GString) cmdline = g_string_new(NULL);
+char template[] = "/tmp/cxl-test-XX";
+const char *tmpfs;
+
+tmpfs = mkdtemp(template);
+
+g_string_printf(cmdline, QEMU_2PXB_CMD QEMU_4RP QEMU_4T3D,
+tmpfs, tmpfs, tmpfs, tmpfs, tmpfs, tmpfs,
+tmpfs, tmpfs);
+
+qtest_start(cmdline->str);
+qtest_end();
+}
+
 int main

[PATCH v9 13/45] cxl: Machine level control on whether CXL support is enabled

2022-04-04 Thread Jonathan Cameron via
From: Jonathan Cameron 

There are going to be some potential overheads to CXL enablement,
for example the host bridge region reserved in memory maps.
Add a machine level control so that CXL is disabled by default.

Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/core/machine.c| 28 
 hw/i386/pc.c |  1 +
 include/hw/boards.h  |  2 ++
 include/hw/cxl/cxl.h |  4 
 4 files changed, 35 insertions(+)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index d856485cb4..6ff5dba64e 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -31,6 +31,7 @@
 #include "sysemu/qtest.h"
 #include "hw/pci/pci.h"
 #include "hw/mem/nvdimm.h"
+#include "hw/cxl/cxl.h"
 #include "migration/global_state.h"
 #include "migration/vmstate.h"
 #include "exec/confidential-guest-support.h"
@@ -545,6 +546,20 @@ static void machine_set_nvdimm_persistence(Object *obj, 
const char *value,
 nvdimms_state->persistence_string = g_strdup(value);
 }
 
+static bool machine_get_cxl(Object *obj, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+return ms->cxl_devices_state->is_enabled;
+}
+
+static void machine_set_cxl(Object *obj, bool value, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+ms->cxl_devices_state->is_enabled = value;
+}
+
 void machine_class_allow_dynamic_sysbus_dev(MachineClass *mc, const char *type)
 {
 QAPI_LIST_PREPEND(mc->allowed_dynamic_sysbus_devices, g_strdup(type));
@@ -777,6 +792,8 @@ static void machine_class_init(ObjectClass *oc, void *data)
 mc->default_ram_size = 128 * MiB;
 mc->rom_file_has_mr = true;
 
+/* Few machines support CXL, so default to off */
+mc->cxl_supported = false;
 /* numa node memory size aligned on 8MB by default.
  * On Linux, each node's border has to be 8MB aligned
  */
@@ -922,6 +939,16 @@ static void machine_initfn(Object *obj)
 "Valid values are cpu, mem-ctrl");
 }
 
+if (mc->cxl_supported) {
+Object *obj = OBJECT(ms);
+
+ms->cxl_devices_state = g_new0(CXLState, 1);
+object_property_add_bool(obj, "cxl", machine_get_cxl, machine_set_cxl);
+object_property_set_description(obj, "cxl",
+"Set on/off to enable/disable "
+"CXL instantiation");
+}
+
 if (mc->cpu_index_to_instance_props && mc->get_default_cpu_node_id) {
 ms->numa_state = g_new0(NumaState, 1);
 object_property_add_bool(obj, "hmat",
@@ -956,6 +983,7 @@ static void machine_finalize(Object *obj)
 g_free(ms->device_memory);
 g_free(ms->nvdimms_state);
 g_free(ms->numa_state);
+g_free(ms->cxl_devices_state);
 }
 
 bool machine_usb(MachineState *machine)
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index fd55fc725c..e2849fc741 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1758,6 +1758,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
 mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE;
 mc->nvdimm_supported = true;
 mc->smp_props.dies_supported = true;
+mc->cxl_supported = true;
 mc->default_ram_id = "pc.ram";
 
 object_class_property_add(oc, PC_MACHINE_MAX_RAM_BELOW_4G, "size",
diff --git a/include/hw/boards.h b/include/hw/boards.h
index c92ac8815c..680718dafc 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -269,6 +269,7 @@ struct MachineClass {
 bool ignore_boot_device_suffixes;
 bool smbus_no_migration_support;
 bool nvdimm_supported;
+bool cxl_supported;
 bool numa_mem_supported;
 bool auto_enable_numa;
 SMPCompatProps smp_props;
@@ -360,6 +361,7 @@ struct MachineState {
 CPUArchIdList *possible_cpus;
 CpuTopology smp;
 struct NVDIMMState *nvdimms_state;
+struct CXLState *cxl_devices_state;
 struct NumaState *numa_state;
 };
 
diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
index 554ad93b6b..31af92fd5e 100644
--- a/include/hw/cxl/cxl.h
+++ b/include/hw/cxl/cxl.h
@@ -17,4 +17,8 @@
 #define CXL_COMPONENT_REG_BAR_IDX 0
 #define CXL_DEVICE_REG_BAR_IDX 2
 
+typedef struct CXLState {
+bool is_enabled;
+} CXLState;
+
 #endif
-- 
2.32.0




[PATCH v9 28/45] acpi/cxl: Introduce CFMWS structures in CEDT

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

The CEDT CXL Fixed Memory Window Structures (CFMWS)
define regions of the host physical address map which
(via implementation-defined means) are configured such that they have
a particular interleave setup across one or more CXL Host Bridges.

Reported-by: Alison Schofield 
Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/acpi/cxl.c | 59 +++
 1 file changed, 59 insertions(+)

diff --git a/hw/acpi/cxl.c b/hw/acpi/cxl.c
index aa4af86a4c..31d5235136 100644
--- a/hw/acpi/cxl.c
+++ b/hw/acpi/cxl.c
@@ -60,6 +60,64 @@ static void cedt_build_chbs(GArray *table_data, PXBDev *cxl)
 build_append_int_noprefix(table_data, memory_region_size(mr), 8);
 }
 
+/*
+ * CFMWS entries in CXL 2.0 ECN: CEDT CFMWS & QTG _DSM.
+ * Interleave ways encoding in CXL 2.0 ECN: 3, 6, 12 and 16-way memory
+ * interleaving.
+ */
+static void cedt_build_cfmws(GArray *table_data, MachineState *ms)
+{
+CXLState *cxls = ms->cxl_devices_state;
+GList *it;
+
+for (it = cxls->fixed_windows; it; it = it->next) {
+CXLFixedWindow *fw = it->data;
+int i;
+
+/* Type */
+build_append_int_noprefix(table_data, 1, 1);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 1);
+
+/* Record Length */
+build_append_int_noprefix(table_data, 36 + 4 * fw->num_targets, 2);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 4);
+
+/* Base HPA */
+build_append_int_noprefix(table_data, fw->mr.addr, 8);
+
+/* Window Size */
+build_append_int_noprefix(table_data, fw->size, 8);
+
+/* Host Bridge Interleave Ways */
+build_append_int_noprefix(table_data, fw->enc_int_ways, 1);
+
+/* Host Bridge Interleave Arithmetic */
+build_append_int_noprefix(table_data, 0, 1);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 2);
+
+/* Host Bridge Interleave Granularity */
+build_append_int_noprefix(table_data, fw->enc_int_gran, 4);
+
+/* Window Restrictions */
+build_append_int_noprefix(table_data, 0x0f, 2); /* No restrictions */
+
+/* QTG ID */
+build_append_int_noprefix(table_data, 0, 2);
+
+/* Host Bridge List (list of UIDs - currently bus_nr) */
+for (i = 0; i < fw->num_targets; i++) {
+g_assert(fw->target_hbs[i]);
+build_append_int_noprefix(table_data, fw->target_hbs[i]->bus_nr, 
4);
+}
+}
+}
+
 static int cxl_foreach_pxb_hb(Object *obj, void *opaque)
 {
 Aml *cedt = opaque;
@@ -86,6 +144,7 @@ void cxl_build_cedt(MachineState *ms, GArray *table_offsets, 
GArray *table_data,
 /* reserve space for CEDT header */
 
 object_child_foreach_recursive(object_get_root(), cxl_foreach_pxb_hb, 
cedt);
+cedt_build_cfmws(cedt->buf, ms);
 
 /* copy AML table into ACPI tables blob and patch header there */
 g_array_append_vals(table_data, cedt->buf->data, cedt->buf->len);
-- 
2.32.0




[PATCH v9 23/45] hw/cxl/component: Implement host bridge MMIO (8.2.5, table 142)

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

CXL host bridges themselves may have MMIO. Since host bridges don't have
a BAR they are treated as special for MMIO.  This patch includes
i386/pc support.
Also hook up the device reset now that we have the MMIO
space in which the results are visible.

Note that we duplicate the PCI express case for the aml_build but
the implementations will diverge when the CXL specific _OSC is
introduced.

Signed-off-by: Ben Widawsky 
Co-developed-by: Jonathan Cameron 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/i386/acpi-build.c| 25 ++-
 hw/i386/pc.c| 27 +++-
 hw/pci-bridge/pci_expander_bridge.c | 65 +
 include/hw/cxl/cxl.h| 14 +++
 4 files changed, 121 insertions(+), 10 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index dcf6ece3d0..2d81b0f40c 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -28,6 +28,7 @@
 #include "qemu/bitmap.h"
 #include "qemu/error-report.h"
 #include "hw/pci/pci.h"
+#include "hw/cxl/cxl.h"
 #include "hw/core/cpu.h"
 #include "target/i386/cpu.h"
 #include "hw/misc/pvpanic.h"
@@ -1572,10 +1573,21 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 }
 
 scope = aml_scope("\\_SB");
-dev = aml_device("PC%.02X", bus_num);
+
+if (pci_bus_is_cxl(bus)) {
+dev = aml_device("CL%.02X", bus_num);
+} else {
+dev = aml_device("PC%.02X", bus_num);
+}
 aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
 aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
-if (pci_bus_is_express(bus)) {
+if (pci_bus_is_cxl(bus)) {
+aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
+aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
+
+/* Expander bridges do not have ACPI PCI Hot-plug enabled */
+aml_append(dev, build_q35_osc_method(true));
+} else if (pci_bus_is_express(bus)) {
 aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
 aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
 
@@ -1595,6 +1607,15 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 aml_append(dev, aml_name_decl("_CRS", crs));
 aml_append(scope, dev);
 aml_append(dsdt, scope);
+
+/* Handle the ranges for the PXB expanders */
+if (pci_bus_is_cxl(bus)) {
+MemoryRegion *mr = &machine->cxl_devices_state->host_mr;
+uint64_t base = mr->addr;
+
+crs_range_insert(crs_range_set.mem_ranges, base,
+ base + memory_region_size(mr) - 1);
+}
 }
 }
 
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index e2849fc741..da74f08f9e 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -75,6 +75,7 @@
 #include "acpi-build.h"
 #include "hw/mem/pc-dimm.h"
 #include "hw/mem/nvdimm.h"
+#include "hw/cxl/cxl.h"
 #include "qapi/error.h"
 #include "qapi/qapi-visit-common.h"
 #include "qapi/qapi-visit-machine.h"
@@ -813,6 +814,7 @@ void pc_memory_init(PCMachineState *pcms,
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 X86MachineState *x86ms = X86_MACHINE(pcms);
+hwaddr cxl_base;
 
 assert(machine->ram_size == x86ms->below_4g_mem_size +
 x86ms->above_4g_mem_size);
@@ -902,6 +904,26 @@ void pc_memory_init(PCMachineState *pcms,
 &machine->device_memory->mr);
 }
 
+if (machine->cxl_devices_state->is_enabled) {
+MemoryRegion *mr = &machine->cxl_devices_state->host_mr;
+hwaddr cxl_size = MiB;
+
+if (pcmc->has_reserved_memory && machine->device_memory->base) {
+cxl_base = machine->device_memory->base;
+if (!pcmc->broken_reserved_end) {
+cxl_base += memory_region_size(&machine->device_memory->mr);
+}
+} else if (pcms->sgx_epc.size != 0) {
+cxl_base = sgx_epc_above_4g_end(&pcms->sgx_epc);
+} else {
+cxl_base = 0x1ULL + x86ms->above_4g_mem_size;
+}
+
+e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
+memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size);
+memory_region_add_subregion(system_memory, cxl_base, mr);
+}
+
 /* Initialize PC system firmware */
 pc_system_firmware_init(pcms, rom_memory);
 
@@ -962,7 +984,10 @@ uint64_t pc_pci_hole64_start(void)
 X86MachineState *x86ms = X86_MACHINE(pcms);
 uint64_t hole64_start = 0;
 
-if (pcmc->has_reserved_memory && ms->device_memory->base) {
+if (ms->cxl_devices_state->host_mr.addr) {
+hole64_start = ms->cxl_devices_state->host_mr.addr +
+  

[PATCH v9 34/45] hw/cxl/component Add a dumb HDM decoder handler

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

Add a trivial handler for now to cover the root bridge,
where we could do some error checking in the future.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
 hw/cxl/cxl-component-utils.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
index 1a1adbd4cb..148f9f30d9 100644
--- a/hw/cxl/cxl-component-utils.c
+++ b/hw/cxl/cxl-component-utils.c
@@ -32,6 +32,31 @@ static uint64_t cxl_cache_mem_read_reg(void *opaque, hwaddr 
offset,
 }
 }
 
+static void dumb_hdm_handler(CXLComponentState *cxl_cstate, hwaddr offset,
+ uint32_t value)
+{
+ComponentRegisters *cregs = &cxl_cstate->crb;
+uint32_t *cache_mem = cregs->cache_mem_registers;
+bool should_commit = false;
+
+switch (offset) {
+case A_CXL_HDM_DECODER0_CTRL:
+should_commit = FIELD_EX32(value, CXL_HDM_DECODER0_CTRL, COMMIT);
+break;
+default:
+break;
+}
+
+memory_region_transaction_begin();
+stl_le_p((uint8_t *)cache_mem + offset, value);
+if (should_commit) {
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, COMMIT, 0);
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, ERR, 0);
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, COMMITTED, 1);
+}
+memory_region_transaction_commit();
+}
+
 static void cxl_cache_mem_write_reg(void *opaque, hwaddr offset, uint64_t 
value,
 unsigned size)
 {
@@ -45,6 +70,12 @@ static void cxl_cache_mem_write_reg(void *opaque, hwaddr 
offset, uint64_t value,
 }
 if (cregs->special_ops && cregs->special_ops->write) {
 cregs->special_ops->write(cxl_cstate, offset, value, size);
+return;
+}
+
+if (offset >= A_CXL_HDM_DECODER_CAPABILITY &&
+offset <= A_CXL_HDM_DECODER0_TARGET_LIST_HI) {
+dumb_hdm_handler(cxl_cstate, offset, value);
 } else {
 cregs->cache_mem_registers[offset / 
sizeof(*cregs->cache_mem_registers)] = value;
 }
-- 
2.32.0




[PATCH v9 15/45] qtest/cxl: Introduce initial test for pxb-cxl only.

2022-04-04 Thread Jonathan Cameron via
Initial test with just pxb-cxl.  Other tests will be added
alongside functionality.

Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
Tested-by: Alex Bennée 
---
 tests/qtest/cxl-test.c  | 23 +++
 tests/qtest/meson.build |  4 
 2 files changed, 27 insertions(+)

diff --git a/tests/qtest/cxl-test.c b/tests/qtest/cxl-test.c
new file mode 100644
index 00..1006c8ae4e
--- /dev/null
+++ b/tests/qtest/cxl-test.c
@@ -0,0 +1,23 @@
+/*
+ * QTest testcase for CXL
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest-single.h"
+
+
+static void cxl_basic_pxb(void)
+{
+qtest_start("-machine q35,cxl=on -device pxb-cxl,bus=pcie.0");
+qtest_end();
+}
+
+int main(int argc, char **argv)
+{
+g_test_init(&argc, &argv, NULL);
+qtest_add_func("/pci/cxl/basic_pxb", cxl_basic_pxb);
+return g_test_run();
+}
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index d25f82bb5a..6e1ad4dc9a 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -41,6 +41,9 @@ qtests_pci = \
   (config_all_devices.has_key('CONFIG_VGA') ? ['display-vga-test'] : []) + 
 \
   (config_all_devices.has_key('CONFIG_IVSHMEM_DEVICE') ? ['ivshmem-test'] : [])
 
+qtests_cxl = \
+  (config_all_devices.has_key('CONFIG_CXL') ? ['cxl-test'] : [])
+
 qtests_i386 = \
   (slirp.found() ? ['pxe-test', 'test-netfilter'] : []) + \
   (config_host.has_key('CONFIG_POSIX') ? ['test-filter-mirror'] : []) +
 \
@@ -75,6 +78,7 @@ qtests_i386 = \
slirp.found() ? ['virtio-net-failover'] : []) + 
 \
   (unpack_edk2_blobs ? ['bios-tables-test'] : []) +
 \
   qtests_pci + 
 \
+  qtests_cxl + 
 \
   ['fdc-test',
'ide-test',
'hd-geo-test',
-- 
2.32.0




[PATCH v9 30/45] pci/pcie_port: Add pci_find_port_by_pn()

2022-04-04 Thread Jonathan Cameron via
From: Jonathan Cameron 

Simple function to search a PCIBus to find a port by
its port number.

CXL interleave decoding uses the port number as a target
so it is necessary to locate the port when doing interleave
decoding.

Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/pci/pcie_port.c | 25 +
 include/hw/pci/pcie_port.h |  2 ++
 2 files changed, 27 insertions(+)

diff --git a/hw/pci/pcie_port.c b/hw/pci/pcie_port.c
index e95c1e5519..687e4e763a 100644
--- a/hw/pci/pcie_port.c
+++ b/hw/pci/pcie_port.c
@@ -136,6 +136,31 @@ static void pcie_port_class_init(ObjectClass *oc, void 
*data)
 device_class_set_props(dc, pcie_port_props);
 }
 
+PCIDevice *pcie_find_port_by_pn(PCIBus *bus, uint8_t pn)
+{
+int devfn;
+
+for (devfn = 0; devfn < ARRAY_SIZE(bus->devices); devfn++) {
+PCIDevice *d = bus->devices[devfn];
+PCIEPort *port;
+
+if (!d || !pci_is_express(d) || !d->exp.exp_cap) {
+continue;
+}
+
+if (!object_dynamic_cast(OBJECT(d), TYPE_PCIE_PORT)) {
+continue;
+}
+
+port = PCIE_PORT(d);
+if (port->port == pn) {
+return d;
+}
+}
+
+return NULL;
+}
+
 static const TypeInfo pcie_port_type_info = {
 .name = TYPE_PCIE_PORT,
 .parent = TYPE_PCI_BRIDGE,
diff --git a/include/hw/pci/pcie_port.h b/include/hw/pci/pcie_port.h
index e25b289ce8..7b8193061a 100644
--- a/include/hw/pci/pcie_port.h
+++ b/include/hw/pci/pcie_port.h
@@ -39,6 +39,8 @@ struct PCIEPort {
 
 void pcie_port_init_reg(PCIDevice *d);
 
+PCIDevice *pcie_find_port_by_pn(PCIBus *bus, uint8_t pn);
+
 #define TYPE_PCIE_SLOT "pcie-slot"
 OBJECT_DECLARE_SIMPLE_TYPE(PCIESlot, PCIE_SLOT)
 
-- 
2.32.0




[PATCH v9 43/45] pci-bridge/cxl_upstream: Add a CXL switch upstream port

2022-04-04 Thread Jonathan Cameron via
An initial simple upstream port emulation to allow the creation
of CXL switches. The Device ID has been allocated for this use.

Signed-off-by: Jonathan Cameron 
---
 hw/pci-bridge/cxl_upstream.c | 211 +++
 hw/pci-bridge/meson.build|   2 +-
 include/hw/cxl/cxl.h |   4 +
 3 files changed, 216 insertions(+), 1 deletion(-)

diff --git a/hw/pci-bridge/cxl_upstream.c b/hw/pci-bridge/cxl_upstream.c
new file mode 100644
index 00..5a06aeef67
--- /dev/null
+++ b/hw/pci-bridge/cxl_upstream.c
@@ -0,0 +1,211 @@
+/*
+ * Emulated CXL Switch Upstream Port
+ *
+ * Copyright (c) 2022 Huawei Technologies.
+ *
+ * Based on xio31130_upstream.c
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/pcie.h"
+#include "hw/pci/pcie_port.h"
+
+#define CXL_UPSTREAM_PORT_MSI_NR_VECTOR 1
+
+#define CXL_UPSTREAM_PORT_MSI_OFFSET 0x70
+#define CXL_UPSTREAM_PORT_PCIE_CAP_OFFSET 0x90
+#define CXL_UPSTREAM_PORT_AER_OFFSET 0x100
+#define CXL_UPSTREAM_PORT_DVSEC_OFFSET \
+(CXL_UPSTREAM_PORT_AER_OFFSET + PCI_ERR_SIZEOF)
+
+typedef struct CXLUpstreamPort {
+/*< private >*/
+PCIEPort parent_obj;
+
+/*< public >*/
+CXLComponentState cxl_cstate;
+} CXLUpstreamPort;
+
+CXLComponentState *cxl_usp_to_cstate(CXLUpstreamPort *usp)
+{
+return &usp->cxl_cstate;
+}
+
+static void cxl_usp_dvsec_write_config(PCIDevice *dev, uint32_t addr,
+   uint32_t val, int len)
+{
+CXLUpstreamPort *usp = CXL_USP(dev);
+
+if (range_contains(&usp->cxl_cstate.dvsecs[EXTENSIONS_PORT_DVSEC], addr)) {
+uint8_t *reg = &dev->config[addr];
+addr -= usp->cxl_cstate.dvsecs[EXTENSIONS_PORT_DVSEC].lob;
+if (addr == PORT_CONTROL_OFFSET) {
+if (pci_get_word(reg) & PORT_CONTROL_UNMASK_SBR) {
+/* unmask SBR */
+qemu_log_mask(LOG_UNIMP, "SBR mask control is not 
supported\n");
+}
+if (pci_get_word(reg) & PORT_CONTROL_ALT_MEMID_EN) {
+/* Alt Memory & ID Space Enable */
+qemu_log_mask(LOG_UNIMP,
+  "Alt Memory & ID space is not supported\n");
+}
+}
+}
+}
+
+static void cxl_usp_write_config(PCIDevice *d, uint32_t address,
+ uint32_t val, int len)
+{
+pci_bridge_write_config(d, address, val, len);
+pcie_cap_flr_write_config(d, address, val, len);
+pcie_aer_write_config(d, address, val, len);
+
+cxl_usp_dvsec_write_config(d, address, val, len);
+}
+
+static void latch_registers(CXLUpstreamPort *usp)
+{
+uint32_t *reg_state = usp->cxl_cstate.crb.cache_mem_registers;
+
+cxl_component_register_init_common(reg_state, CXL2_UPSTREAM_PORT);
+ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, TARGET_COUNT, 8);
+}
+
+static void cxl_usp_reset(DeviceState *qdev)
+{
+PCIDevice *d = PCI_DEVICE(qdev);
+CXLUpstreamPort *usp = CXL_USP(qdev);
+
+pci_bridge_reset(qdev);
+pcie_cap_deverr_reset(d);
+latch_registers(usp);
+}
+
+static void build_dvsecs(CXLComponentState *cxl)
+{
+uint8_t *dvsec;
+
+dvsec = (uint8_t *)&(CXLDVSECPortExtensions){
+.status = 0x1, /* Port Power Management Init Complete */
+};
+cxl_component_create_dvsec(cxl, EXTENSIONS_PORT_DVSEC_LENGTH,
+   EXTENSIONS_PORT_DVSEC,
+   EXTENSIONS_PORT_DVSEC_REVID, dvsec);
+dvsec = (uint8_t *)&(CXLDVSECPortFlexBus){
+.cap = 0x27, /* Cache, IO, Mem, non-MLD */
+.ctrl= 0x27, /* Cache, IO, Mem */
+.status  = 0x26, /* same */
+.rcvd_mod_ts_data_phase1 = 0xef, /* WTF? */
+};
+cxl_component_create_dvsec(cxl, PCIE_FLEXBUS_PORT_DVSEC_LENGTH_2_0,
+   PCIE_FLEXBUS_PORT_DVSEC,
+   PCIE_FLEXBUS_PORT_DVSEC_REVID_2_0, dvsec);
+
+dvsec = (uint8_t *)&(CXLDVSECRegisterLocator){
+.rsvd = 0,
+.reg0_base_lo = RBI_COMPONENT_REG | CXL_COMPONENT_REG_BAR_IDX,
+.reg0_base_hi = 0,
+};
+cxl_component_create_dvsec(cxl, REG_LOC_DVSEC_LENGTH, REG_LOC_DVSEC,
+   REG_LOC_DVSEC_REVID, dvsec);
+}
+
+static void cxl_usp_realize(PCIDevice *d, Error **errp)
+{
+PCIEPort *p = PCIE_PORT(d);
+CXLUpstreamPort *usp = CXL_USP(d);
+CXLComponentState *cxl_cstate = &usp->cxl_cstate;
+ComponentRegisters *cregs = &cxl_cstate->crb;
+MemoryRegion *component_bar = &cregs->component_registers;
+int rc;
+
+pci_bridge_initfn(d, TYPE_PCIE_BUS);
+pcie_port_init_reg(d);
+
+rc = msi_init(d, CXL_UPSTREAM_PORT_MSI_OFFSET,
+  CXL_UPSTREAM_PORT_MSI_NR_VECTOR, true, true, errp);
+if (rc) {
+assert(rc == -ENOTSUP);
+goto err_bridge;
+}
+
+rc = pcie_cap_init(d, CXL_UPST

[PATCH v9 35/45] i386/pc: Enable CXL fixed memory windows

2022-04-04 Thread Jonathan Cameron via
From: Jonathan Cameron 

Add the CFMWs memory regions to the memorymap and adjust the
PCI window to avoid hitting the same memory.

Signed-off-by: Jonathan Cameron 
---
 hw/i386/pc.c | 31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index da74f08f9e..48a86ac8a4 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -814,7 +814,7 @@ void pc_memory_init(PCMachineState *pcms,
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 X86MachineState *x86ms = X86_MACHINE(pcms);
-hwaddr cxl_base;
+hwaddr cxl_base, cxl_resv_end = 0;
 
 assert(machine->ram_size == x86ms->below_4g_mem_size +
 x86ms->above_4g_mem_size);
@@ -922,6 +922,24 @@ void pc_memory_init(PCMachineState *pcms,
 e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
 memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size);
 memory_region_add_subregion(system_memory, cxl_base, mr);
+cxl_resv_end = cxl_base + cxl_size;
+if (machine->cxl_devices_state->fixed_windows) {
+hwaddr cxl_fmw_base;
+GList *it;
+
+cxl_fmw_base = ROUND_UP(cxl_base + cxl_size, 256 * MiB);
+for (it = machine->cxl_devices_state->fixed_windows; it; it = 
it->next) {
+CXLFixedWindow *fw = it->data;
+
+fw->base = cxl_fmw_base;
+memory_region_init_io(&fw->mr, OBJECT(machine), &cfmws_ops, fw,
+  "cxl-fixed-memory-region", fw->size);
+memory_region_add_subregion(system_memory, fw->base, &fw->mr);
+e820_add_entry(fw->base, fw->size, E820_RESERVED);
+cxl_fmw_base += fw->size;
+cxl_resv_end = cxl_fmw_base;
+}
+}
 }
 
 /* Initialize PC system firmware */
@@ -951,6 +969,10 @@ void pc_memory_init(PCMachineState *pcms,
 if (!pcmc->broken_reserved_end) {
 res_mem_end += memory_region_size(&machine->device_memory->mr);
 }
+
+if (machine->cxl_devices_state->is_enabled) {
+res_mem_end = cxl_resv_end;
+}
 *val = cpu_to_le64(ROUND_UP(res_mem_end, 1 * GiB));
 fw_cfg_add_file(fw_cfg, "etc/reserved-memory-end", val, sizeof(*val));
 }
@@ -987,6 +1009,13 @@ uint64_t pc_pci_hole64_start(void)
 if (ms->cxl_devices_state->host_mr.addr) {
 hole64_start = ms->cxl_devices_state->host_mr.addr +
 memory_region_size(&ms->cxl_devices_state->host_mr);
+if (ms->cxl_devices_state->fixed_windows) {
+GList *it;
+for (it = ms->cxl_devices_state->fixed_windows; it; it = it->next) 
{
+CXLFixedWindow *fw = it->data;
+hole64_start = fw->mr.addr + memory_region_size(&fw->mr);
+}
+}
 } else if (pcmc->has_reserved_memory && ms->device_memory->base) {
 hole64_start = ms->device_memory->base;
 if (!pcmc->broken_reserved_end) {
-- 
2.32.0




[PATCH v9 17/45] hw/cxl/device: Add a memory device (8.2.8.5)

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

A CXL memory device (AKA Type 3) is a CXL component that contains some
combination of volatile and persistent memory. It also implements the
previously defined mailbox interface as well as the memory device
firmware interface.

Although the memory device is configured like a normal PCIe device, the
memory traffic is on an entirely separate bus conceptually (using the
same physical wires as PCIe, but a different protocol).

Once the CXL topology is fully configured and address decoders are
committed, the guest physical address for the memory device is part of
a larger window which is owned by the platform.  The creation of these
windows comes later in this series.

The following example will create a 256M device in a 512M window:
-object "memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M"
-device "cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0"

Note: Dropped PCDIMM info interfaces for now.  They can be added if
appropriate at a later date.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
 hw/cxl/cxl-mailbox-utils.c  |  46 +++
 hw/mem/Kconfig  |   5 ++
 hw/mem/cxl_type3.c  | 159 
 hw/mem/meson.build  |   1 +
 include/hw/cxl/cxl_device.h |  15 
 include/hw/cxl/cxl_pci.h|  21 +
 include/hw/pci/pci_ids.h|   1 +
 7 files changed, 248 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index db473135c7..4ae0561dfc 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -50,6 +50,8 @@ enum {
 LOGS= 0x04,
 #define GET_SUPPORTED 0x0
 #define GET_LOG   0x1
+IDENTIFY= 0x40,
+#define MEMORY_DEVICE 0x0
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -214,6 +216,48 @@ static ret_code cmd_logs_get_log(struct cxl_cmd *cmd,
 return CXL_MBOX_SUCCESS;
 }
 
+/* 8.2.9.5.1.1 */
+static ret_code cmd_identify_memory_device(struct cxl_cmd *cmd,
+   CXLDeviceState *cxl_dstate,
+   uint16_t *len)
+{
+struct {
+char fw_revision[0x10];
+uint64_t total_capacity;
+uint64_t volatile_capacity;
+uint64_t persistent_capacity;
+uint64_t partition_align;
+uint16_t info_event_log_size;
+uint16_t warning_event_log_size;
+uint16_t failure_event_log_size;
+uint16_t fatal_event_log_size;
+uint32_t lsa_size;
+uint8_t poison_list_max_mer[3];
+uint16_t inject_poison_limit;
+uint8_t poison_caps;
+uint8_t qos_telemetry_caps;
+} QEMU_PACKED *id;
+QEMU_BUILD_BUG_ON(sizeof(*id) != 0x43);
+
+uint64_t size = cxl_dstate->pmem_size;
+
+if (!QEMU_IS_ALIGNED(size, 256 << 20)) {
+return CXL_MBOX_INTERNAL_ERROR;
+}
+
+id = (void *)cmd->payload;
+memset(id, 0, sizeof(*id));
+
+/* PMEM only */
+snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
+
+id->total_capacity = size / (256 << 20);
+id->persistent_capacity = size / (256 << 20);
+
+*len = sizeof(*id);
+return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
 #define IMMEDIATE_LOG_CHANGE (1 << 4)
@@ -231,6 +275,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 [TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set, 8, 
IMMEDIATE_POLICY_CHANGE },
 [LOGS][GET_SUPPORTED] = { "LOGS_GET_SUPPORTED", cmd_logs_get_supported, 0, 
0 },
 [LOGS][GET_LOG] = { "LOGS_GET_LOG", cmd_logs_get_log, 0x18, 0 },
+[IDENTIFY][MEMORY_DEVICE] = { "IDENTIFY_MEMORY_DEVICE",
+cmd_identify_memory_device, 0, 0 },
 };
 
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
diff --git a/hw/mem/Kconfig b/hw/mem/Kconfig
index 03dbb3c7df..73c5ae8ad9 100644
--- a/hw/mem/Kconfig
+++ b/hw/mem/Kconfig
@@ -11,3 +11,8 @@ config NVDIMM
 
 config SPARSE_MEM
 bool
+
+config CXL_MEM_DEVICE
+bool
+default y if CXL
+select MEM_DEVICE
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
new file mode 100644
index 00..329a6ea2a9
--- /dev/null
+++ b/hw/mem/cxl_type3.c
@@ -0,0 +1,159 @@
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qemu/error-report.h"
+#include "hw/mem/memory-device.h"
+#include "hw/mem/pc-dimm.h"
+#include "hw/pci/pci.h"
+#include "hw/qdev-properties.h"
+#include "qapi/error.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "qemu/range.h"
+#include "qemu/rcu.h"
+#include "sysemu/hostmem.h"
+#include "hw/cxl/cxl.h"
+
+static void build_dvsecs(CXLType3Dev *ct3d)
+{
+CXLComponentState *cxl_cstate = &ct3d->cxl_cstate;
+uint8_t *dvsec;
+
+dvsec = (uint8_t *)&(CXLDVSECDevice){
+.cap = 0x1e,
+.ctrl = 0x6,
+.status2 = 0x2,
+.range1_size_hi = ct3d->hostmem->size >> 32,
+.range1_size_lo = (2 << 5) | (2 << 2) | 0x3 |
+(ct3d->hostmem->size & 0xF000),
+.range

[PATCH v9 31/45] CXL/cxl_component: Add cxl_get_hb_cstate()

2022-04-04 Thread Jonathan Cameron via
From: Jonathan Cameron 

Accessor to get hold of the cxl state for a CXL host bridge
without exposing the internals of the implementation.

Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/pci-bridge/pci_expander_bridge.c | 7 +++
 include/hw/cxl/cxl_component.h  | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/hw/pci-bridge/pci_expander_bridge.c 
b/hw/pci-bridge/pci_expander_bridge.c
index b4813b6851..963fa41a11 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -72,6 +72,13 @@ static GList *pxb_dev_list;
 
 #define TYPE_PXB_HOST "pxb-host"
 
+CXLComponentState *cxl_get_hb_cstate(PCIHostState *hb)
+{
+CXLHost *host = PXB_CXL_HOST(hb);
+
+return &host->cxl_cstate;
+}
+
 static int pxb_bus_num(PCIBus *bus)
 {
 PXBDev *pxb = convert_to_pxb(bus->parent_dev);
diff --git a/include/hw/cxl/cxl_component.h b/include/hw/cxl/cxl_component.h
index b0f95d3484..779a7b1a97 100644
--- a/include/hw/cxl/cxl_component.h
+++ b/include/hw/cxl/cxl_component.h
@@ -202,4 +202,6 @@ static inline hwaddr cxl_decode_ig(int ig)
 return 1 << (ig + 8);
 }
 
+CXLComponentState *cxl_get_hb_cstate(PCIHostState *hb);
+
 #endif
-- 
2.32.0




Re: [PATCH v1 8/9] qom: add command to print initial properties

2022-04-04 Thread Maxim Davydov



On 3/30/22 18:17, Vladimir Sementsov-Ogievskiy wrote:

29.03.2022 00:15, Maxim Davydov wrote:
The command "query-init-properties" is needed to get values of 
properties
after initialization (not only default value). It makes sense, for 
example,

when working with x86_64-cpu.
All machine types (and x-remote-object, because its init uses machime
type's infrastructure) should be skipped, because only the one 
instance can

be correctly initialized.

Signed-off-by: Maxim Davydov 
---
  qapi/qom.json  |  69 ++
  qom/qom-qmp-cmds.c | 121 +
  2 files changed, 190 insertions(+)

diff --git a/qapi/qom.json b/qapi/qom.json
index eeb5395ff3..1eedc441eb 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -949,3 +949,72 @@
  ##
  { 'command': 'object-del', 'data': {'id': 'str'},
    'allow-preconfig': true }
+
+##
+# @InitValue:
+#
+# Not all objects have default values but they have "initial" values.
+#
+# @name: property name
+#
+# @value: Current value (default or after initialization. It makes 
sence,

+# for example, for x86-cpus)
+#
+# Since: 7.0


7.1 (here and below)


+#
+##
+{ 'struct': 'InitValue',
+  'data': { 'name': 'str',
+    '*value': 'any' } }
+


[..]


diff --git a/qom/qom-qmp-cmds.c b/qom/qom-qmp-cmds.c
index 2d6f41ecc7..c1bb3f1f8b 100644
--- a/qom/qom-qmp-cmds.c
+++ b/qom/qom-qmp-cmds.c
@@ -27,6 +27,7 @@
  #include "qemu/cutils.h"
  #include "qom/object_interfaces.h"
  #include "qom/qom-qobject.h"
+#include "hw/boards.h"
    ObjectPropertyInfoList *qmp_qom_list(const char *path, Error **errp)
  {
@@ -235,3 +236,123 @@ void qmp_object_del(const char *id, Error **errp)
  {
  user_creatable_del(id, errp);
  }
+
+static void query_object_prop(InitValueList **props_list, 
ObjectProperty *prop,

+  Object *obj, Error **errp)
+{
+    InitValue *prop_info = NULL;
+
+    /* Skip inconsiderable properties */
+    if (strcmp(prop->name, "type") == 0 ||
+    strcmp(prop->name, "realized") == 0 ||
+    strcmp(prop->name, "hotpluggable") == 0 ||
+    strcmp(prop->name, "hotplugged") == 0 ||
+    strcmp(prop->name, "parent_bus") == 0) {
+    return;
+    }
+
+    prop_info = g_malloc0(sizeof(*prop_info));
+    prop_info->name = g_strdup(prop->name);
+    prop_info->value = NULL;
+    if (prop->defval) {
+    prop_info->value = qobject_ref(prop->defval);
+    } else if (prop->get) {
+    /*
+ * crash-information in x86-cpu uses errp to return current 
state.

+ * So, after requesting this property it returns GenericError:
+ * "No crash occured"
+ */
+    if (strcmp(prop->name, "crash-information") != 0) {
+    prop_info->value = object_property_get_qobject(obj, 
prop->name,

+ errp);
+    }
+    }


Hmmm. Should we instead call prop->get() when it is available, and
only if not use prop->defval?
Default properties are rarer and can sometimes give more information (if
the device developer thought that there should be a default value). And
I think that if prop->get() isn't available, prop->defval isn't either.



+    prop_info->has_value = !!prop_info->value;
+
+    QAPI_LIST_PREPEND(*props_list, prop_info);
+}
+
+typedef struct QIPData {
+    InitPropsList **dev_list;
+    Error **errp;
+} QIPData;
+
+static void query_init_properties_tramp(gpointer list_data, gpointer 
opaque)

+{
+    ObjectClass *k = list_data;
+    Object *obj;
+    ObjectClass *parent;
+    GHashTableIter iter;
+
+    QIPData *data = opaque;
+    ClassPropertiesList *class_props_list = NULL;
+    InitProps *dev_info;
+
+    /* Only one machine can be initialized correctly (it's already 
happened) */

+    if (object_class_dynamic_cast(k, TYPE_MACHINE)) {
+    return;
+    }
+
+    const char *klass_name = object_class_get_name(k);
+    /*
+ * Uses machine type infrastructure with notifiers. It causes 
immediate

+ * notify and SEGSEGV during remote_object_machine_done
+ */
+    if (strcmp(klass_name, "x-remote-object") == 0) {
+    return;
+    }
+
+    dev_info = g_malloc0(sizeof(*dev_info));
+    dev_info->name = g_strdup(klass_name);
+
+    obj = object_new_with_class(k);
+
+    /*
+ * Part of ObjectPropertyIterator infrastructure, but we need 
more precise

+ * control of current class to dump appropriate features
+ * This part was taken out from loop because first 
initialization differ

+ * from other reinitializations
+ */
+    parent = object_get_class(obj);


hmm.. obj = object_new_with_class(k); parent =
object_get_class(obj);.. Looks to me like parent should be equal to
k. Or the object_ API is rather unobvious.

I'll change it)



+    g_hash_table_iter_init(&iter, obj->properties);
+    const char *prop_owner_name = object_get_typename(obj);
+    do {
+    InitValueList *prop_list = NULL;
+    ClassProperties *class_data;
+
+    gpointer key, val;
+    while (g_hash_table_iter_ne

[PATCH v9 20/45] hw/cxl/device: Plumb real Label Storage Area (LSA) sizing

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

This should introduce no change. Subsequent work will make use of this
new class member.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
 hw/cxl/cxl-mailbox-utils.c  |  3 +++
 hw/mem/cxl_type3.c  |  9 +
 include/hw/cxl/cxl_device.h | 11 ++-
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index c8188d7087..492739aef3 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -277,6 +277,8 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd 
*cmd,
 } QEMU_PACKED *id;
 QEMU_BUILD_BUG_ON(sizeof(*id) != 0x43);
 
+CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
+CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
 uint64_t size = cxl_dstate->pmem_size;
 
 if (!QEMU_IS_ALIGNED(size, 256 << 20)) {
@@ -291,6 +293,7 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd 
*cmd,
 
 id->total_capacity = size / (256 << 20);
 id->persistent_capacity = size / (256 << 20);
+id->lsa_size = cvc->get_lsa_size(ct3d);
 
 *len = sizeof(*id);
 return CXL_MBOX_SUCCESS;
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 5c93fbbd9b..14d8b0c503 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -176,10 +176,16 @@ static Property ct3_props[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+static uint64_t get_lsa_size(CXLType3Dev *ct3d)
+{
+return 0;
+}
+
 static void ct3_class_init(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
 PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);
+CXLType3Class *cvc = CXL_TYPE3_CLASS(oc);
 
 pc->realize = ct3_realize;
 pc->exit = ct3_exit;
@@ -192,11 +198,14 @@ static void ct3_class_init(ObjectClass *oc, void *data)
 dc->desc = "CXL PMEM Device (Type 3)";
 dc->reset = ct3d_reset;
 device_class_set_props(dc, ct3_props);
+
+cvc->get_lsa_size = get_lsa_size;
 }
 
 static const TypeInfo ct3d_info = {
 .name = TYPE_CXL_TYPE3,
 .parent = TYPE_PCI_DEVICE,
+.class_size = sizeof(struct CXLType3Class),
 .class_init = ct3_class_init,
 .instance_size = sizeof(CXLType3Dev),
 .interfaces = (InterfaceInfo[]) {
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index d8da2c7b68..ea2571a69b 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -236,6 +236,7 @@ struct CXLType3Dev {
 
 /* Properties */
 HostMemoryBackend *hostmem;
+HostMemoryBackend *lsa;
 
 /* State */
 CXLComponentState cxl_cstate;
@@ -243,6 +244,14 @@ struct CXLType3Dev {
 };
 
 #define TYPE_CXL_TYPE3 "cxl-type3"
-OBJECT_DECLARE_SIMPLE_TYPE(CXLType3Dev, CXL_TYPE3)
+OBJECT_DECLARE_TYPE(CXLType3Dev, CXLType3Class, CXL_TYPE3)
+
+struct CXLType3Class {
+/* Private */
+PCIDeviceClass parent_class;
+
+/* public */
+uint64_t (*get_lsa_size)(CXLType3Dev *ct3d);
+};
 
 #endif
-- 
2.32.0




[PATCH v9 32/45] mem/cxl_type3: Add read and write functions for associated hostmem.

2022-04-04 Thread Jonathan Cameron via
From: Jonathan Cameron 

Once a read or write reaches a CXL type 3 device, the HDM decoders
on the device are used to establish the Device Physical Address
which should be accessed.  These functions perform the required maths
and then use a device-specific address space to access the
hostmem->mr to fulfil the actual operation.  Note that failed writes
are silent, but failed reads return poison.  Note this is based
loosely on:

https://lore.kernel.org/qemu-devel/20200817161853.593247-6-f4...@amsat.org/
[RFC PATCH 0/9] hw/misc: Add support for interleaved memory accesses

Only lightly tested so far.  More complex test cases yet to be written.

Signed-off-by: Jonathan Cameron 
---
 hw/mem/cxl_type3.c  | 91 +
 include/hw/cxl/cxl_device.h |  6 +++
 2 files changed, 97 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 9578e72576..53fd57579b 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -95,7 +95,9 @@ static void ct3d_reg_write(void *opaque, hwaddr offset, 
uint64_t value,
 
 static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
+DeviceState *ds = DEVICE(ct3d);
 MemoryRegion *mr;
+char *name;
 
 if (!ct3d->hostmem) {
 error_setg(errp, "memdev property must be set");
@@ -110,6 +112,15 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error 
**errp)
 memory_region_set_nonvolatile(mr, true);
 memory_region_set_enabled(mr, true);
 host_memory_backend_set_mapped(ct3d->hostmem, true);
+
+if (ds->id) {
+name = g_strdup_printf("cxl-type3-dpa-space:%s", ds->id);
+} else {
+name = g_strdup("cxl-type3-dpa-space");
+}
+address_space_init(&ct3d->hostmem_as, mr, name);
+g_free(name);
+
 ct3d->cxl_dstate.pmem_size = ct3d->hostmem->size;
 
 if (!ct3d->lsa) {
@@ -165,6 +176,86 @@ static void ct3_exit(PCIDevice *pci_dev)
 ComponentRegisters *regs = &cxl_cstate->crb;
 
 g_free(regs->special_ops);
+address_space_destroy(&ct3d->hostmem_as);
+}
+
+/* TODO: Support multiple HDM decoders and DPA skip */
+static bool cxl_type3_dpa(CXLType3Dev *ct3d, hwaddr host_addr, uint64_t *dpa)
+{
+uint32_t *cache_mem = ct3d->cxl_cstate.crb.cache_mem_registers;
+uint64_t decoder_base, decoder_size, hpa_offset;
+uint32_t hdm0_ctrl;
+int ig, iw;
+
+decoder_base = (((uint64_t)cache_mem[R_CXL_HDM_DECODER0_BASE_HI] << 32) |
+cache_mem[R_CXL_HDM_DECODER0_BASE_LO]);
+if ((uint64_t)host_addr < decoder_base) {
+return false;
+}
+
+hpa_offset = (uint64_t)host_addr - decoder_base;
+
+decoder_size = ((uint64_t)cache_mem[R_CXL_HDM_DECODER0_SIZE_HI] << 32) |
+cache_mem[R_CXL_HDM_DECODER0_SIZE_LO];
+if (hpa_offset >= decoder_size) {
+return false;
+}
+
+hdm0_ctrl = cache_mem[R_CXL_HDM_DECODER0_CTRL];
+iw = FIELD_EX32(hdm0_ctrl, CXL_HDM_DECODER0_CTRL, IW);
+ig = FIELD_EX32(hdm0_ctrl, CXL_HDM_DECODER0_CTRL, IG);
+
+*dpa = (MAKE_64BIT_MASK(0, 8 + ig) & hpa_offset) |
+((MAKE_64BIT_MASK(8 + ig + iw, 64 - 8 - ig - iw) & hpa_offset) >> iw);
+
+return true;
+}
+
+MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
+   unsigned size, MemTxAttrs attrs)
+{
+CXLType3Dev *ct3d = CXL_TYPE3(d);
+uint64_t dpa_offset;
+MemoryRegion *mr;
+
+/* TODO support volatile region */
+mr = host_memory_backend_get_memory(ct3d->hostmem);
+if (!mr) {
+return MEMTX_ERROR;
+}
+
+if (!cxl_type3_dpa(ct3d, host_addr, &dpa_offset)) {
+return MEMTX_ERROR;
+}
+
+if (dpa_offset > int128_get64(mr->size)) {
+return MEMTX_ERROR;
+}
+
+return address_space_read(&ct3d->hostmem_as, dpa_offset, attrs, data, 
size);
+}
+
+MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
+unsigned size, MemTxAttrs attrs)
+{
+CXLType3Dev *ct3d = CXL_TYPE3(d);
+uint64_t dpa_offset;
+MemoryRegion *mr;
+
+mr = host_memory_backend_get_memory(ct3d->hostmem);
+if (!mr) {
+return MEMTX_OK;
+}
+
+if (!cxl_type3_dpa(ct3d, host_addr, &dpa_offset)) {
+return MEMTX_OK;
+}
+
+if (dpa_offset > int128_get64(mr->size)) {
+return MEMTX_OK;
+}
+return address_space_write(&ct3d->hostmem_as, dpa_offset, attrs,
+   &data, size);
 }
 
 static void ct3d_reset(DeviceState *dev)
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 4285fbda08..1e141b6621 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -239,6 +239,7 @@ struct CXLType3Dev {
 HostMemoryBackend *lsa;
 
 /* State */
+AddressSpace hostmem_as;
 CXLComponentState cxl_cstate;
 CXLDeviceState cxl_dstate;
 };
@@ -259,4 +260,9 @@ struct CXLType3Class {
 uint64_t offset);
 };
 
+MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *da

[PATCH v9 38/45] tests/acpi: Add tables for CXL emulation.

2022-04-04 Thread Jonathan Cameron via
Tables that differ from normal Q35 tables when running the CXL test.

Signed-off-by: Jonathan Cameron 
---
 tests/data/acpi/q35/CEDT.cxl| Bin 0 -> 184 bytes
 tests/data/acpi/q35/DSDT.cxl| Bin 0 -> 9615 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |   2 --
 3 files changed, 2 deletions(-)

diff --git a/tests/data/acpi/q35/CEDT.cxl b/tests/data/acpi/q35/CEDT.cxl
index 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..b8fa06b00e65712e91e0a5ea0d9277e0146d1c00
 100644
GIT binary patch
literal 184
zcmZ>EbqU$Qz`(%x(aGQ0BUr&HBEVSz2pEB4AU23*U{GMV2P7eE5T6mshKVRJ@Sw=U
r)I#JL88kqeKtKSd14gp~1^Iy(qF)E31_T6{AT-z>kXmGQAh!SjnYIc6

literal 0
HcmV?d1

diff --git a/tests/data/acpi/q35/DSDT.cxl b/tests/data/acpi/q35/DSDT.cxl
index 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..c1206defed0154e9024702bba88453b2790a306d
 100644
GIT binary patch
literal 9615
zcmeHN&2JmW9i1g9X|-HQONzE^`9p-`^eaU|`6EeNq%pZmk+ejbCaE|94R9$bt!$^r
zB8h=GhEZ7o632(O3FDx*(k=t^*8T%U4YY^$W}qk?>Dm}&yW-l
z9}AR+<^E>ho8P?InSIL{dUdby)5jSzBDphev7XMoSas9*7>qGGr*EeeJI|V1UartG
z;*prqydLN0IONRKx4qnI!T9;6|B>&%@vd*Q1GaX@xwX~~-oD|lF#=s)3oMIHocwgF
zo@+I?U90MrGG?n-^6czA%QRcAIE$LCtXE@ZYqjLD)XGHbOx=y$yu@7Z++w#f*4a$V
zT28b4x8t8L96a^Wxi_+RpZn_%ZeFrt035@&iSOe99sH}lb=dDZA9OAl*ND!qEp}%=
z={4m$z}MTB&B-%p%(5*6N7`>)^X{jM>yV^!ZJ{
z-~XLBWzH4mlue;BZx*ZhhE!=l8>wn;6|0Rhvl+YhAkJdV>kh@UFXSs;x?1yE>D1G$
zdLzpMD)9prTJlX|oU8Hv7ka#(J!0&4{)otm$_qsV(;&
zuoz=#&DWl!)=+;px93asY>Rg>(l4MX)l%(j#PTiMS)O?+DuIM*Zl74rc>s%h6h-UN
zDw$@VwWnbC%x8vCFgDl*zK=wZt+{=)d}eirH8ZQROl#~2^-y#B*h;mrDC>@i`)z1g
z$C@e_Z${sYn&y!$Uh^^cOnHYh1~hte1m}MAew3L<9L{;X)^K-P6A$knuR34>Gt48*
zKo?aK5Bq4V>ed@Z{H|@8xHS~G=)2W44qm#sRnMQsEcl~s;l{-&aC4dbXI&%Khgd)5jaR*TdAhq2PK|rd{OO7o+bCGrTcP>~F
z%z$fr9N8GQeb!4vjq7w^x97ThIv1^pAUPIcQ>-2MH`W~isOrQMNa^WGP3NSp6QQcp(sWvyPD|H`P}LdNbjCHEaa|`uRVSW>
z%Kfx8owlwMp{jFA(>bN-oYHk7RCP{kI;SqMyPOlUe2n$Co-6QQazsp(8=I+MCigsRS2P3Nqpb5_@hP}P~zbfz?&DP1Q*Rp%Z}
z=N?Vx9$hCwRp(w!=Uz?cUR@_bRVUsgF#6fER4+^6Z>r|U$h>fFzn
z+3-oYpEGme!*0J|x(`EQdLedRW6o>Ld7X(+WggI&2Q=mZorzFo9@LlzHReH`iBM%8
z(wK)d<{_PlP-PzGOzAR*Ia5C44-2Mza3dt9yn_o`&IK*!f}Rth$Z5hrvxLsM+
zW@_LZi9-WLfV3irB9KX8paRD&$za5i?K6;6Kz9rjp_B##63>2YsMKVx?QYQ>l
zU^x>8szAv=1(fF`14Srx!axOlU^x>8szAv=1(Y+%
zKoLrvFi?TzOcu&LjgxD0RX>1(q{mpbC@>R6seC3>2Z%2?G^a&V+#~P%=;f
z=
z3{*fllMEE0)CmI>Sk8ojDo`>|0p(0GP=rz^3{+q_69%e4$v_2^Gs!>^N}VuJf#pmX
zr~)Me6;RG314Srx!axOM_lrzad5lWpfP=V!47^ngz0~JutBm+e#b;3XqDF%v2HBf}Afg)54RAItE
z6($*|!XyJ#m@rU<2?JG_WS|O@3{+vlKouqoRAG{VDoip^g$V;ym@rU1;PyD(5hxMRFC$v_d}Hpq~evTtFah-BZwKoQBlB?Cn$`<4t8A(o2fTd+{pwLARB
zYL9-9-X5o~Z1ehepNi72R9e-b^$w$2JDY{$p3Tw0rGsZOti7Dg)A0)0$y)hDOw|
z^s+L6cZ955^02X7LyJKsnq5!qwPxR&W|L948^iOP;Yp0ui_{EX2kKE1(3)&2(eg@l
zc8$)hEnYH1>ro5{x5neSR=rj?Zf=Hcp!8H8X3q^|$KuIX&Yn}U=XmxU+NWmErABAZ
zHMh&8T`Z+xxi8diMIIr&tE6{%dUqhbyGwdEOz#bdx3Hk~mDPJndXJ~~2GV=Gr1!$~
z`N8zDM@XMn(&u^l{6PBrF6r}O`lZ42V~>!2NlCxN(=QFAU)m-8QkcFln11{b(ifEU
z1)jbzkiM`>`a+n#IGBFo5z-fx^hKV&IFP=$OZpk`^!*uqHJV!L*sg{UL9U)hN=@~BdZ%4r*nFB
zc+(lGPL%1@IR_rDXK&Bc%Q9QMArKT=B2g@^v{%Elf^$)0A6X
zO&4E%BY5TV57V!`{Q9exUt53qb=EZ8>dJCBTBj_lV0>bKVjDJA_2EghbpDBL+0EN1{T0GbXYQ2)OE`q7TJ@8jJoywN*Zu4+el-rxl}2c0i!~U
z`s3%h9yCZaaw_XqOPS1KhMFNZj>b|6x3Tn6q-%9H**k6~lev&8j$`#cJK22f{8KTx
zwLwBj04=`{79&}}{O){b@B&E)tQjo34_#SV?)jHOY~1mZoed*k8-d{mtbJ$2{#nO2Zmpxp57q}$a>0XzxGCMaTZ5&z
zE$bh5Cp&Gck2D_#qpo+44)q+}_h9`7wXw}Ex6!KG&vZ`?!T89)?OV5^!BEHGB6yeA
zX=5=T6FZAk;Tl_~TMljO&Bf8;hU!9lF0YO5=JKR_rrxl3>E_X+WvG61UT|SV-vm}<
zu#}(|2Mf{7BU{&e`&k;(nE&`-dSG0eJsRDZ0p)BX5w}c+)dqSGO-*Cdv=JvUZ1cj!
z)B#MMLN(vYXO6LO#?wTiG3A_z(Ir0d!#S0Cnx(!2>>{I%*x3;jJ61|T)vfTBY6xff
z=y~yQ$sKi9JXDP5i1n%2%H=Bbg{?$6b)ObfH)%)s1~47)LbkK_3v_+8T4ko&sI4$M
zclo)CO<~ffYzkvnbmXqED2@N+Uz->HaI^f|=RTZwp8xX~yY3Y)e8s(D><%BJrtwW?
zYMt0U?s~CZ6WZXMKw{c#K1sY2q=!yUq5_w;olhEc=v}%!t!+Z5PEEo*8!Sjhw)lLm
zOx$W)TftjxE5g=-tFN`!@C%5ocb(2UK$Bu;%3~W;VC)oRQIP1YTalfTTv!s_DRJ@4
zxOQdDav1I4-Pm9(xY|bDH#Q6wY~1i^`u2SBlH!e9zP1DYWxxOYmwarpQ%t>*e$el?
z9*ny3OI+!NS0e1j6pLULiG9lcj$;%qr;nu!-co2R*k-D1{r|CqKQ#Q0jHOc;
zOFtetmVAtuUyK|}w{hjp&{(n=)(FK|Ix3z@hujxe@aGI==sd&lK1q+!p5d&_i`NXk
zqs{;5QF7ujLk>;xv)pC{Dp%eQ#h*Ch;z7aR57Xo6
zafAQs<8-WAtk%R0puf~StOR{N;$3sNuDkYK+t`Pv!#B?(ef@YVIUY0MdN@DPN}4e%
zf(Ii-C+P}_aK88Ot~R%yTsr59-vCo*^W{}o>M=s&k1cA8oiS&O-e{du6Wq_7;y4Y8
z=61ZE$%y~Ypi910&payv+<$`)q(zV64;-lQm^?X7Cr!MBFNQ>5Bck9TIm%K

[PATCH v9 40/45] hw/arm/virt: Basic CXL enablement on pci_expander_bridge instances pxb-cxl

2022-04-04 Thread Jonathan Cameron via
Code based on i386/pc enablement.
The memory layout places space for 16 host bridge register regions after
the GIC_REDIST2 in the extended memmap.
The CFMWs are placed above the extended memmap.

Only create the CEDT table if cxl=on is set for the machine.

Signed-off-by: Jonathan Cameron 
Signed-off-by: Ben Widawsky 
---
 hw/arm/virt-acpi-build.c | 33 +
 hw/arm/virt.c| 40 +++-
 include/hw/arm/virt.h|  1 +
 3 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 449fab0080..86a2f40437 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -39,9 +39,11 @@
 #include "hw/acpi/aml-build.h"
 #include "hw/acpi/utils.h"
 #include "hw/acpi/pci.h"
+#include "hw/acpi/cxl.h"
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/acpi/generic_event_device.h"
 #include "hw/acpi/tpm.h"
+#include "hw/cxl/cxl.h"
 #include "hw/pci/pcie_host.h"
 #include "hw/pci/pci.h"
 #include "hw/pci/pci_bus.h"
@@ -157,10 +159,29 @@ static void acpi_dsdt_add_virtio(Aml *scope,
 }
 }
 
+/* Uses local definition of AcpiBuildState so can't easily be common code */
+static void build_acpi0017(Aml *table)
+{
+Aml *dev, *scope, *method;
+
+scope =  aml_scope("_SB");
+dev = aml_device("CXLM");
+aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0017")));
+
+method = aml_method("_STA", 0, AML_NOTSERIALIZED);
+aml_append(method, aml_return(aml_int(0x01)));
+aml_append(dev, method);
+
+aml_append(scope, dev);
+aml_append(table, scope);
+}
+
 static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
   uint32_t irq, VirtMachineState *vms)
 {
 int ecam_id = VIRT_ECAM_ID(vms->highmem_ecam);
+bool cxl_present = false;
+PCIBus *bus = vms->bus;
 struct GPEXConfig cfg = {
 .mmio32 = memmap[VIRT_PCIE_MMIO],
 .pio= memmap[VIRT_PCIE_PIO],
@@ -174,6 +195,14 @@ static void acpi_dsdt_add_pci(Aml *scope, const 
MemMapEntry *memmap,
 }
 
 acpi_dsdt_add_gpex(scope, &cfg);
+QLIST_FOREACH(bus, &vms->bus->child, sibling) {
+if (pci_bus_is_cxl(bus)) {
+cxl_present = true;
+}
+}
+if (cxl_present) {
+build_acpi0017(scope);
+}
 }
 
 static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
@@ -991,6 +1020,10 @@ void virt_acpi_build(VirtMachineState *vms, 
AcpiBuildTables *tables)
vms->oem_table_id);
 }
 }
+if (ms->cxl_devices_state->is_enabled) {
+cxl_build_cedt(ms, table_offsets, tables_blob, tables->linker,
+   vms->oem_id, vms->oem_table_id);
+}
 
 if (ms->nvdimms_state->is_enabled) {
 nvdimm_build_acpi(table_offsets, tables_blob, tables->linker,
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 9969645c0b..9f81b166c0 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -78,6 +78,7 @@
 #include "hw/virtio/virtio-mem-pci.h"
 #include "hw/virtio/virtio-iommu.h"
 #include "hw/char/pl011.h"
+#include "hw/cxl/cxl.h"
 #include "qemu/guest-random.h"
 
 #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
@@ -178,6 +179,7 @@ static const MemMapEntry base_memmap[] = {
 static MemMapEntry extended_memmap[] = {
 /* Additional 64 MB redist region (can contain up to 512 redistributors) */
 [VIRT_HIGH_GIC_REDIST2] =   { 0x0, 64 * MiB },
+[VIRT_CXL_HOST] =   { 0x0, 64 * KiB * 16 }, /* 16 UID */
 [VIRT_HIGH_PCIE_ECAM] = { 0x0, 256 * MiB },
 /* Second PCIe window */
 [VIRT_HIGH_PCIE_MMIO] = { 0x0, 512 * GiB },
@@ -1508,6 +1510,17 @@ static void create_pcie(VirtMachineState *vms)
 }
 }
 
+static void create_cxl_host_reg_region(VirtMachineState *vms)
+{
+MemoryRegion *sysmem = get_system_memory();
+MachineState *ms = MACHINE(vms);
+MemoryRegion *mr = &ms->cxl_devices_state->host_mr;
+
+memory_region_init(mr, OBJECT(ms), "cxl_host_reg",
+   vms->memmap[VIRT_CXL_HOST].size);
+memory_region_add_subregion(sysmem, vms->memmap[VIRT_CXL_HOST].base, mr);
+}
+
 static void create_platform_bus(VirtMachineState *vms)
 {
 DeviceState *dev;
@@ -1670,7 +1683,7 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
 {
 MachineState *ms = MACHINE(vms);
-hwaddr base, device_memory_base, device_memory_size, memtop;
+hwaddr base, device_memory_base, device_memory_size, memtop, cxl_fmw_base;
 int i;
 
 vms->memmap = extended_memmap;
@@ -1762,6 +1775,20 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
 memory_region_init(&ms->device_memory->mr, OBJECT(vms),
"device-memory", device_memory_size);
 }
+
+if (ms->cxl_devices_state->fixed_windows) {
+GList *it;
+
+cxl_fmw_base = ROUND_UP(base, 256 * MiB);
+f

[PATCH v9 39/45] qtest/cxl: Add more complex test cases with CFMWs

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

Add CXL Fixed Memory Windows to the CXL tests.

Signed-off-by: Ben Widawsky 
Co-developed-by: Jonathan Cameron 
Signed-off-by: Jonathan Cameron 
---
 tests/qtest/cxl-test.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/cxl-test.c b/tests/qtest/cxl-test.c
index 5f0794e816..079011af6a 100644
--- a/tests/qtest/cxl-test.c
+++ b/tests/qtest/cxl-test.c
@@ -9,11 +9,13 @@
 #include "libqtest-single.h"
 
 #define QEMU_PXB_CMD "-machine q35,cxl=on " \
- "-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 "
+ "-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 "  \
+ "-cxl-fixed-memory-window targets.0=cxl.0,size=4G "
 
-#define QEMU_2PXB_CMD "-machine q35,cxl=on " \
+#define QEMU_2PXB_CMD "-machine q35,cxl=on "\
   "-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 "  \
-  "-device pxb-cxl,id=cxl.1,bus=pcie.0,bus_nr=53 "
+  "-device pxb-cxl,id=cxl.1,bus=pcie.0,bus_nr=53 " \
+  "-cxl-fixed-memory-window 
targets.0=cxl.0,targets.1=cxl.1,size=4G "
 
 #define QEMU_RP "-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 "
 
-- 
2.32.0




[PATCH v9 42/45] docs/cxl: Add initial Compute eXpress Link (CXL) documentation.

2022-04-04 Thread Jonathan Cameron via
Provide an introduction to the main components of a CXL system,
with detailed explanation of memory interleaving, example command
lines and kernel configuration.

This was a challenging document to write due to the need to extract
only that subset of CXL information which is relevant to either
users of QEMU emulation of CXL or to those interested in the
implementation.  Much of CXL is concerned with specific elements of
the protocol, management of memory pooling, etc., which is simply
not relevant to what is currently planned for CXL emulation
in QEMU.  All comments welcome.

Signed-off-by: Jonathan Cameron 
---
 docs/system/device-emulation.rst |   1 +
 docs/system/devices/cxl.rst  | 302 +++
 2 files changed, 303 insertions(+)

diff --git a/docs/system/device-emulation.rst b/docs/system/device-emulation.rst
index 0b3a3d73ad..2da2bd5d64 100644
--- a/docs/system/device-emulation.rst
+++ b/docs/system/device-emulation.rst
@@ -83,6 +83,7 @@ Emulated Devices
:maxdepth: 1
 
devices/can.rst
+   devices/cxl.rst
devices/ivshmem.rst
devices/net.rst
devices/nvme.rst
diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst
new file mode 100644
index 00..9293cbf01a
--- /dev/null
+++ b/docs/system/devices/cxl.rst
@@ -0,0 +1,302 @@
+Compute Express Link (CXL)
+==========================
+From the view of a single host, CXL is an interconnect standard that
+targets accelerators and memory devices attached to a CXL host.
+This description will focus on those aspects visible either to
+software running on a QEMU emulated host or to the internals of
+functional emulation. As such, it will skip over many of the
+electrical and protocol elements that would be more of interest
+for real hardware and will dominate more general introductions to CXL.
+It will also completely ignore the fabric management aspects of CXL
+by considering only a single host and a static configuration.
+
+CXL shares many concepts and much of the infrastructure of PCI Express,
+with CXL Host Bridges, which have CXL Root Ports which may be directly
+attached to CXL or PCI End Points. Alternatively there may be CXL Switches
+with CXL and PCI Endpoints attached below them.  In many cases additional
+control and capabilities are exposed via PCI Express interfaces.
+This sharing of interfaces and hence emulation code is reflected
+in how the devices are emulated in QEMU. In most cases the various
+CXL elements are built upon equivalent PCIe devices.
+
+CXL devices support the following interfaces:
+
+* Most conventional PCIe interfaces
+
+  - Configuration space access
+  - BAR mapped memory accesses used for registers and mailboxes.
+  - MSI/MSI-X
+  - AER
+  - DOE mailboxes
+  - IDE
+  - Many other PCI Express defined interfaces.
+
+* Memory operations
+
+  - Equivalent of accessing DRAM / NVDIMMs. Any access / feature
+supported by the host for normal memory should also work for
+CXL attached memory devices.
+
+* Cache operations. These are mostly irrelevant to QEMU emulation as
+  QEMU is not emulating a coherency protocol. Any emulation related
+  to these will be device specific and is out of the scope of this
+  document.
+
+CXL 2.0 Device Types
+
+--------------------
+
+**Type 1:** These support coherent caching of host memory.  An example
+might be a crypto accelerator.  They may also have device-private memory
+accessible via means such as PCI memory reads and writes to BARs.
+
+**Type 2:** These support coherent caching of host memory and host
+managed device memory (HDM) for which the coherency protocol is managed
+by the host. This is a complex topic, so for more information on CXL
+coherency see the CXL 2.0 specification.
+
+**Type 3 Memory devices:**  These devices act as a means of attaching
+additional memory (HDM) to a CXL host, including both volatile and
+persistent memory. The CXL topology may support interleaving across a
+number of Type 3 memory devices using HDM Decoders in the host, host
+bridge, switch upstream port and endpoints.
+
+Scope of CXL emulation in QEMU
+------------------------------
+The focus of CXL emulation is CXL revision 2.0 and later. Earlier CXL
+revisions defined a smaller set of features, leaving much of the control
+interface as implementation defined or device specific. This made
+generic emulation challenging, as host specific firmware was responsible
+for setup and the End Points were presented to operating systems as
+Root Complex Integrated End Points. CXL rev 2.0 looks a lot
+more like PCI Express, with fully specified discoverability
+of the CXL topology.
+
+CXL System components
+---------------------
+A CXL system is made up of a Host with a number of 'standard components',
+the control and capabilities of which are discoverable by system software
+using means described in the CXL 2.0 specification.
+
+CXL Fixed Memory Windows (CFMW)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A CFM

[PATCH v9 21/45] hw/cxl/device: Implement get/set Label Storage Area (LSA)

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

Implement get and set handlers for the Label Storage Area
used to hold data describing persistent memory configuration,
so that the same configuration can be restored after reboot.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
 hw/cxl/cxl-mailbox-utils.c  | 60 +
 hw/mem/cxl_type3.c  | 56 +-
 include/hw/cxl/cxl_device.h |  5 
 3 files changed, 120 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 492739aef3..bb66c765a5 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -57,6 +57,8 @@ enum {
 #define MEMORY_DEVICE 0x0
 CCLS= 0x41,
 #define GET_PARTITION_INFO 0x0
+#define GET_LSA   0x2
+#define SET_LSA   0x3
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -326,7 +328,62 @@ static ret_code cmd_ccls_get_partition_info(struct cxl_cmd 
*cmd,
 return CXL_MBOX_SUCCESS;
 }
 
+static ret_code cmd_ccls_get_lsa(struct cxl_cmd *cmd,
+ CXLDeviceState *cxl_dstate,
+ uint16_t *len)
+{
+struct {
+uint32_t offset;
+uint32_t length;
+} QEMU_PACKED *get_lsa;
+CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
+CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
+uint32_t offset, length;
+
+get_lsa = (void *)cmd->payload;
+offset = get_lsa->offset;
+length = get_lsa->length;
+
+if (offset + length > cvc->get_lsa_size(ct3d)) {
+*len = 0;
+return CXL_MBOX_INVALID_INPUT;
+}
+
+*len = cvc->get_lsa(ct3d, get_lsa, length, offset);
+return CXL_MBOX_SUCCESS;
+}
+
+static ret_code cmd_ccls_set_lsa(struct cxl_cmd *cmd,
+ CXLDeviceState *cxl_dstate,
+ uint16_t *len)
+{
+struct set_lsa_pl {
+uint32_t offset;
+uint32_t rsvd;
+uint8_t data[];
+} QEMU_PACKED;
+struct set_lsa_pl *set_lsa_payload = (void *)cmd->payload;
+CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
+CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
+const size_t hdr_len = offsetof(struct set_lsa_pl, data);
+uint16_t plen = *len;
+
+*len = 0;
+if (!plen) {
+return CXL_MBOX_SUCCESS;
+}
+
+if (set_lsa_payload->offset + plen > cvc->get_lsa_size(ct3d) + hdr_len) {
+return CXL_MBOX_INVALID_INPUT;
+}
+plen -= hdr_len;
+
+cvc->set_lsa(ct3d, set_lsa_payload->data, plen, set_lsa_payload->offset);
+return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
+#define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
 #define IMMEDIATE_LOG_CHANGE (1 << 4)
 
@@ -349,6 +406,9 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 cmd_identify_memory_device, 0, 0 },
 [CCLS][GET_PARTITION_INFO] = { "CCLS_GET_PARTITION_INFO",
 cmd_ccls_get_partition_info, 0, 0 },
+[CCLS][GET_LSA] = { "CCLS_GET_LSA", cmd_ccls_get_lsa, 0, 0 },
+[CCLS][SET_LSA] = { "CCLS_SET_LSA", cmd_ccls_set_lsa,
+~0, IMMEDIATE_CONFIG_CHANGE | IMMEDIATE_DATA_CHANGE },
 };
 
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 14d8b0c503..9578e72576 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -8,6 +8,7 @@
 #include "qapi/error.h"
 #include "qemu/log.h"
 #include "qemu/module.h"
+#include "qemu/pmem.h"
 #include "qemu/range.h"
 #include "qemu/rcu.h"
 #include "sysemu/hostmem.h"
@@ -111,6 +112,11 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error 
**errp)
 host_memory_backend_set_mapped(ct3d->hostmem, true);
 ct3d->cxl_dstate.pmem_size = ct3d->hostmem->size;
 
+if (!ct3d->lsa) {
+error_setg(errp, "lsa property must be set");
+return false;
+}
+
 return true;
 }
 
@@ -173,12 +179,58 @@ static void ct3d_reset(DeviceState *dev)
 static Property ct3_props[] = {
 DEFINE_PROP_LINK("memdev", CXLType3Dev, hostmem, TYPE_MEMORY_BACKEND,
  HostMemoryBackend *),
+DEFINE_PROP_LINK("lsa", CXLType3Dev, lsa, TYPE_MEMORY_BACKEND,
+ HostMemoryBackend *),
 DEFINE_PROP_END_OF_LIST(),
 };
 
 static uint64_t get_lsa_size(CXLType3Dev *ct3d)
 {
-return 0;
+MemoryRegion *mr;
+
+mr = host_memory_backend_get_memory(ct3d->lsa);
+return memory_region_size(mr);
+}
+
+static void validate_lsa_access(MemoryRegion *mr, uint64_t size,
+uint64_t offset)
+{
+assert(offset + size <= memory_region_size(mr));
+assert(offset + size > offset);
+}
+
+static uint64_t get_lsa(CXLType3Dev *ct3d, void *buf, uint64_t size,
+uint64_t offset)
+{
+MemoryRegion *mr;
+void *lsa;
+
+mr = host_memory_backend_get_memory(ct3d->lsa);
+validate_lsa_acc

[PATCH v9 45/45] docs/cxl: Add switch documentation

2022-04-04 Thread Jonathan Cameron via
Switches were already introduced; now that we support them, update
the documentation to provide an example in both diagram and QEMU
command line parameter forms.

Signed-off-by: Jonathan Cameron 
---
 docs/system/devices/cxl.rst | 88 -
 1 file changed, 86 insertions(+), 2 deletions(-)

diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst
index 9293cbf01a..abf7c1f243 100644
--- a/docs/system/devices/cxl.rst
+++ b/docs/system/devices/cxl.rst
@@ -118,8 +118,6 @@ and associated component register access via PCI bars.
 
 CXL Switch
 ~~
-Not yet implemented in QEMU.
-
 Here we consider a simple CXL switch with only a single
 virtual hierarchy. Whilst more complex devices exist, their
 visibility to a particular host is generally the same as for
@@ -137,6 +135,10 @@ BARs.  The Upstream Port has the configuration interfaces 
for
 the HDM decoders which route incoming memory accesses to the
 appropriate downstream port.
 
+A CXL switch is created in a similar fashion to PCI switches
+by creating an upstream port (cxl-upstream) and a number of
+downstream ports on the internal switch bus (cxl-downstream).
+
 CXL Memory Devices - Type 3
 ~~~
 CXL type 3 devices use a PCI class code and are intended to be supported
@@ -240,6 +242,62 @@ Notes:
 they will take the Host Physical Addresses of accesses and map
 them to their own local Device Physical Address Space (DPA).
 
+Example topology involving a switch::
+
+  |<--SYSTEM PHYSICAL ADDRESS MAP (1)->|
+  |__   __   __|
+  |   |  | |  | |  |   |
+  |   | CFMW 0   | |  CXL Fixed Memory Window 1   | | CFMW 1   |   |
+  |   | HB0 only | |  Configured to interleave memory | | HB1 only |   |
+  |   |  | |  memory accesses across HB0/HB1  | |  |   |
+  |   |x_| |__| |__|   |
+   | | | |
+   | | | |
+   | | |
+  Interleave Decoder | | |
+   Matches this HB   | | |
+   \_| |_/
+   __|__  _|___
+  | || |
+  | CXL HB 0|| CXL HB 1|
+  | HB IntLv Decoders   || HB IntLv Decoders   |
+  | PCI/CXL Root Bus 0c || PCI/CXL Root Bus 0d |
+  | || |
+  |___x_||_|
+  |  |  |   |
+  |
+   A HB 0 HDM Decoder
+   matches this Port
+   ___|___
+  |  Root Port 0  |
+  |  Appears in   |
+  |  PCI topology |
+  |  As 0c:00.0   |
+  |___x___|
+  |
+  |
+  \_
+|
+|
+---
+   |Switch 0  USP as PCI 0d:00.0   |
+   |USP has HDM decoder which direct traffic to|
+   |appropriate downstream port|
+   |Switch BUS appears as 0e   |
+   |x__|
+|  |   |  |
+|  |   |  |
+   _|_   __|__   __|_   __|___
+   (4)| x | | | || |  |
  | CXL Type3 0   | | CXL Type3 1 | | CXL Type3 2| | CXL Type 3 3 |
+  |   | | | || |  |
+  | PMEM0(Vol LSA)| | PMEM1 (...) | | PMEM2 (...)| | PMEM3 (...)  |
+  | Decoder to go | | | || |  |
+  | from host PA  | | PCI 10:00.0 | | PCI 11:00.0| | PCI 12:00.0  |
+  | to device PA  | | | || |  |
+  | PCI as 0f:00.0| | | || |  |
+  |___| |_| || |__|
+
 Example command lines
 -
 A very simple setup with just one directly attached CXL Type 3 device::
@@ -279,6 +337,32 @@ the CXL Type3 device directly attached (no switches).::
   -device cxl-type3,bus=root_port16,memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3 \
   -cxl-fixed-memory-window 
targets.0=cxl.1,targets.1=cxl.2,size=4G,interleave-granularity=8k
 
+An example of 4 devices below a switch suitable for 1, 2

[PATCH for-7.1 02/18] hw/intc/exynos4210_gic: Remove unused TYPE_EXYNOS4210_IRQ_GATE

2022-04-04 Thread Peter Maydell
Now we have removed the only use of TYPE_EXYNOS4210_IRQ_GATE we can
delete the device entirely.

Signed-off-by: Peter Maydell 
---
 hw/intc/exynos4210_gic.c | 107 ---
 1 file changed, 107 deletions(-)

diff --git a/hw/intc/exynos4210_gic.c b/hw/intc/exynos4210_gic.c
index bc73d1f1152..794f6b5ac72 100644
--- a/hw/intc/exynos4210_gic.c
+++ b/hw/intc/exynos4210_gic.c
@@ -373,110 +373,3 @@ static void exynos4210_gic_register_types(void)
 }
 
 type_init(exynos4210_gic_register_types)
-
-/* IRQ OR Gate struct.
- *
- * This device models an OR gate. There are n_in input qdev gpio lines and one
- * output sysbus IRQ line. The output IRQ level is formed as OR between all
- * gpio inputs.
- */
-
-#define TYPE_EXYNOS4210_IRQ_GATE "exynos4210.irq_gate"
-OBJECT_DECLARE_SIMPLE_TYPE(Exynos4210IRQGateState, EXYNOS4210_IRQ_GATE)
-
-struct Exynos4210IRQGateState {
-SysBusDevice parent_obj;
-
-uint32_t n_in;  /* inputs amount */
-uint32_t *level;/* input levels */
-qemu_irq out;   /* output IRQ */
-};
-
-static Property exynos4210_irq_gate_properties[] = {
-DEFINE_PROP_UINT32("n_in", Exynos4210IRQGateState, n_in, 1),
-DEFINE_PROP_END_OF_LIST(),
-};
-
-static const VMStateDescription vmstate_exynos4210_irq_gate = {
-.name = "exynos4210.irq_gate",
-.version_id = 2,
-.minimum_version_id = 2,
-.fields = (VMStateField[]) {
-VMSTATE_VBUFFER_UINT32(level, Exynos4210IRQGateState, 1, NULL, n_in),
-VMSTATE_END_OF_LIST()
-}
-};
-
-/* Process a change in IRQ input. */
-static void exynos4210_irq_gate_handler(void *opaque, int irq, int level)
-{
-Exynos4210IRQGateState *s = (Exynos4210IRQGateState *)opaque;
-uint32_t i;
-
-assert(irq < s->n_in);
-
-s->level[irq] = level;
-
-for (i = 0; i < s->n_in; i++) {
-if (s->level[i] >= 1) {
-qemu_irq_raise(s->out);
-return;
-}
-}
-
-qemu_irq_lower(s->out);
-}
-
-static void exynos4210_irq_gate_reset(DeviceState *d)
-{
-Exynos4210IRQGateState *s = EXYNOS4210_IRQ_GATE(d);
-
-memset(s->level, 0, s->n_in * sizeof(*s->level));
-}
-
-/*
- * IRQ Gate initialization.
- */
-static void exynos4210_irq_gate_init(Object *obj)
-{
-Exynos4210IRQGateState *s = EXYNOS4210_IRQ_GATE(obj);
-SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
-
-sysbus_init_irq(sbd, &s->out);
-}
-
-static void exynos4210_irq_gate_realize(DeviceState *dev, Error **errp)
-{
-Exynos4210IRQGateState *s = EXYNOS4210_IRQ_GATE(dev);
-
-/* Allocate general purpose input signals and connect a handler to each of
- * them */
-qdev_init_gpio_in(dev, exynos4210_irq_gate_handler, s->n_in);
-
-s->level = g_malloc0(s->n_in * sizeof(*s->level));
-}
-
-static void exynos4210_irq_gate_class_init(ObjectClass *klass, void *data)
-{
-DeviceClass *dc = DEVICE_CLASS(klass);
-
-dc->reset = exynos4210_irq_gate_reset;
-dc->vmsd = &vmstate_exynos4210_irq_gate;
-device_class_set_props(dc, exynos4210_irq_gate_properties);
-dc->realize = exynos4210_irq_gate_realize;
-}
-
-static const TypeInfo exynos4210_irq_gate_info = {
-.name  = TYPE_EXYNOS4210_IRQ_GATE,
-.parent= TYPE_SYS_BUS_DEVICE,
-.instance_size = sizeof(Exynos4210IRQGateState),
-.instance_init = exynos4210_irq_gate_init,
-.class_init= exynos4210_irq_gate_class_init,
-};
-
-static void exynos4210_irq_gate_register_types(void)
-{
-type_register_static(&exynos4210_irq_gate_info);
-}
-
-type_init(exynos4210_irq_gate_register_types)
-- 
2.25.1




[PATCH v9 44/45] pci-bridge/cxl_downstream: Add a CXL switch downstream port

2022-04-04 Thread Jonathan Cameron via
Emulation of a simple CXL Switch downstream port.
The Device ID has been allocated for this use.

Signed-off-by: Jonathan Cameron 
---
 hw/cxl/cxl-host.c  |  43 +-
 hw/pci-bridge/cxl_downstream.c | 244 +
 hw/pci-bridge/meson.build  |   2 +-
 3 files changed, 286 insertions(+), 3 deletions(-)

diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c
index 469b3c4ced..317f5a37ca 100644
--- a/hw/cxl/cxl-host.c
+++ b/hw/cxl/cxl-host.c
@@ -130,8 +130,9 @@ static bool cxl_hdm_find_target(uint32_t *cache_mem, hwaddr 
addr,
 
 static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr)
 {
-CXLComponentState *hb_cstate;
+CXLComponentState *hb_cstate, *usp_cstate;
 PCIHostState *hb;
+CXLUpstreamPort *usp;
 int rb_index;
 uint32_t *cache_mem;
 uint8_t target;
@@ -165,8 +166,46 @@ static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow 
*fw, hwaddr addr)
 }
 
 d = pci_bridge_get_sec_bus(PCI_BRIDGE(rp))->devices[0];
+if (!d) {
+return NULL;
+}
+
+if (object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3)) {
+return d;
+}
+
+/*
+ * Could also be a switch.  Note only one level of switching currently
+ * supported.
+ */
+if (!object_dynamic_cast(OBJECT(d), TYPE_CXL_USP)) {
+return NULL;
+}
+usp = CXL_USP(d);
+
+usp_cstate = cxl_usp_to_cstate(usp);
+if (!usp_cstate) {
+return NULL;
+}
+
+cache_mem = usp_cstate->crb.cache_mem_registers;
+
+target_found = cxl_hdm_find_target(cache_mem, addr, &target);
+if (!target_found) {
+return NULL;
+}
+
+d = pcie_find_port_by_pn(&PCI_BRIDGE(d)->sec_bus, target);
+if (!d) {
+return NULL;
+}
+
+d = pci_bridge_get_sec_bus(PCI_BRIDGE(d))->devices[0];
+if (!d) {
+return NULL;
+}
 
-if (!d || !object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3)) {
+if (!object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3)) {
 return NULL;
 }
 
diff --git a/hw/pci-bridge/cxl_downstream.c b/hw/pci-bridge/cxl_downstream.c
new file mode 100644
index 00..641593203e
--- /dev/null
+++ b/hw/pci-bridge/cxl_downstream.c
@@ -0,0 +1,244 @@
+/*
+ * Emulated CXL Switch Downstream Port
+ *
+ * Copyright (c) 2022 Huawei Technologies.
+ *
+ * Based on xio31130_downstream.c
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/pcie.h"
+#include "hw/pci/pcie_port.h"
+#include "qapi/error.h"
+
+typedef struct CXLDownStreamPort {
+/*< private >*/
+PCIESlot parent_obj;
+
+/*< public >*/
+CXLComponentState cxl_cstate;
+} CXLDownstreamPort;
+
+#define TYPE_CXL_DSP "cxl-downstream"
+DECLARE_INSTANCE_CHECKER(CXLDownstreamPort, CXL_DSP, TYPE_CXL_DSP)
+
+#define CXL_DOWNSTREAM_PORT_MSI_OFFSET 0x70
+#define CXL_DOWNSTREAM_PORT_MSI_NR_VECTOR 1
+#define CXL_DOWNSTREAM_PORT_EXP_OFFSET 0x90
+#define CXL_DOWNSTREAM_PORT_AER_OFFSET 0x100
+#define CXL_DOWNSTREAM_PORT_DVSEC_OFFSET\
+(CXL_DOWNSTREAM_PORT_AER_OFFSET + PCI_ERR_SIZEOF)
+
+static void latch_registers(CXLDownstreamPort *dsp)
+{
+uint32_t *reg_state = dsp->cxl_cstate.crb.cache_mem_registers;
+
+cxl_component_register_init_common(reg_state, CXL2_DOWNSTREAM_PORT);
+}
+
+/* TODO: Look at sharing this code across all CXL port types */
+static void cxl_dsp_dvsec_write_config(PCIDevice *dev, uint32_t addr,
+  uint32_t val, int len)
+{
+CXLDownstreamPort *dsp = CXL_DSP(dev);
+CXLComponentState *cxl_cstate = &dsp->cxl_cstate;
+
+if (range_contains(&cxl_cstate->dvsecs[EXTENSIONS_PORT_DVSEC], addr)) {
+uint8_t *reg = &dev->config[addr];
+addr -= cxl_cstate->dvsecs[EXTENSIONS_PORT_DVSEC].lob;
+if (addr == PORT_CONTROL_OFFSET) {
+if (pci_get_word(reg) & PORT_CONTROL_UNMASK_SBR) {
+/* unmask SBR */
+qemu_log_mask(LOG_UNIMP, "SBR mask control is not 
supported\n");
+}
+if (pci_get_word(reg) & PORT_CONTROL_ALT_MEMID_EN) {
+/* Alt Memory & ID Space Enable */
+qemu_log_mask(LOG_UNIMP,
+  "Alt Memory & ID space is not supported\n");
+
+}
+}
+}
+}
+
+static void cxl_dsp_config_write(PCIDevice *d, uint32_t address,
+ uint32_t val, int len)
+{
+uint16_t slt_ctl, slt_sta;
+
+pcie_cap_slot_get(d, &slt_ctl, &slt_sta);
+pci_bridge_write_config(d, address, val, len);
+pcie_cap_flr_write_config(d, address, val, len);
+pcie_cap_slot_write_config(d, slt_ctl, slt_sta, address, val, len);
+pcie_aer_write_config(d, address, val, len);
+
+cxl_dsp_dvsec_write_config(d, address, val, len);
+}
+
+static void cxl_dsp_reset(DeviceState *qdev)
+{
+PCIDevice *d = PCI_DEVICE(qdev);
+CXLDownstreamPort *dsp = CXL_DSP(qdev);
+
+pcie_cap_deverr_reset(d);
+pc

[PATCH v9 24/45] acpi/cxl: Add _OSC implementation (9.14.2)

2022-04-04 Thread Jonathan Cameron via
From: Ben Widawsky 

CXL 2.0 specification adds 2 new dwords to the existing _OSC definition
from PCIe. The new dwords are accessed with a new uuid. This
implementation supports what is in the specification.

iasl -d decodes the result of this patch as:

Name (SUPP, Zero)
Name (CTRL, Zero)
Name (SUPC, Zero)
Name (CTRC, Zero)
Method (_OSC, 4, NotSerialized)  // _OSC: Operating System Capabilities
{
CreateDWordField (Arg3, Zero, CDW1)
If (((Arg0 == ToUUID ("33db4d5b-1ff7-401c-9657-7441c03dd766") /* PCI Host 
Bridge Device */) || (Arg0 == ToUUID ("68f2d50b-c469-4d8a-bd3d-941a103fd3fc") 
/* Unknown UUID */)))
{
CreateDWordField (Arg3, 0x04, CDW2)
CreateDWordField (Arg3, 0x08, CDW3)
Local0 = CDW3 /* \_SB_.PC0C._OSC.CDW3 */
Local0 &= 0x1F
If ((Arg1 != One))
{
CDW1 |= 0x08
}

If ((CDW3 != Local0))
{
CDW1 |= 0x10
}

SUPP = CDW2 /* \_SB_.PC0C._OSC.CDW2 */
CTRL = CDW3 /* \_SB_.PC0C._OSC.CDW3 */
CDW3 = Local0
If ((Arg0 == ToUUID ("68f2d50b-c469-4d8a-bd3d-941a103fd3fc") /* Unknown 
UUID */))
{
CreateDWordField (Arg3, 0x0C, CDW4)
CreateDWordField (Arg3, 0x10, CDW5)
SUPC = CDW4 /* \_SB_.PC0C._OSC.CDW4 */
CTRC = CDW5 /* \_SB_.PC0C._OSC.CDW5 */
CDW5 |= One
}

Return (Arg3)
}
Else
{
CDW1 |= 0x04
Return (Arg3)
}

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
 hw/acpi/Kconfig   |   5 ++
 hw/acpi/cxl-stub.c|  12 
 hw/acpi/cxl.c | 130 ++
 hw/acpi/meson.build   |   4 +-
 hw/i386/acpi-build.c  |  15 +++--
 include/hw/acpi/cxl.h |  23 
 6 files changed, 183 insertions(+), 6 deletions(-)

diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index 19caebde6c..3703aca212 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -5,6 +5,7 @@ config ACPI_X86
 bool
 select ACPI
 select ACPI_NVDIMM
+select ACPI_CXL
 select ACPI_CPU_HOTPLUG
 select ACPI_MEMORY_HOTPLUG
 select ACPI_HMAT
@@ -66,3 +67,7 @@ config ACPI_ERST
 bool
 default y
 depends on ACPI && PCI
+
+config ACPI_CXL
+bool
+depends on ACPI
diff --git a/hw/acpi/cxl-stub.c b/hw/acpi/cxl-stub.c
new file mode 100644
index 00..15bc21076b
--- /dev/null
+++ b/hw/acpi/cxl-stub.c
@@ -0,0 +1,12 @@
+
+/*
+ * Stubs for ACPI platforms that don't support CXL
+ */
+#include "qemu/osdep.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/acpi/cxl.h"
+
+void build_cxl_osc_method(Aml *dev)
+{
+g_assert_not_reached();
+}
diff --git a/hw/acpi/cxl.c b/hw/acpi/cxl.c
new file mode 100644
index 00..ca1f04f359
--- /dev/null
+++ b/hw/acpi/cxl.c
@@ -0,0 +1,130 @@
+/*
+ * CXL ACPI Implementation
+ *
+ * Copyright(C) 2020 Intel Corporation.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include "qemu/osdep.h"
+#include "hw/cxl/cxl.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/acpi/bios-linker-loader.h"
+#include "hw/acpi/cxl.h"
+#include "qapi/error.h"
+#include "qemu/uuid.h"
+
+static Aml *__build_cxl_osc_method(void)
+{
+Aml *method, *if_uuid, *else_uuid, *if_arg1_not_1, *if_cxl, 
*if_caps_masked;
+Aml *a_ctrl = aml_local(0);
+Aml *a_cdw1 = aml_name("CDW1");
+
+method = aml_method("_OSC", 4, AML_NOTSERIALIZED);
+/* CDW1 is used for the return value so is present whether or not a match 
occurs */
+aml_append(method, aml_create_dword_field(aml_arg(3), aml_int(0), "CDW1"));
+
+/*
+ * Generate shared section between:
+ * CXL 2.0 - 9.14.2.1.4 and
+ * PCI Firmware Specification 3.0
+ * 4.5.1. _OSC Interface for PCI Host Bridge Devices
+ * The _OSC interface for a PCI/PCI-X/PCI Express hierarchy is
+ * identified by the Universal Unique IDentifier (UUID)
+ * 33DB4D5B-1FF7-401C-9657-7441C03DD766
+ * The _OSC interface for a CXL Host bridge is
+ * identified by the UUID 68F2D50B-C469-4D8A-BD3D-941A103FD3FC
+ * A CXL Host bridge is compatible with a PCI host bridge so
+ * for the shared section match both.
+ */
+if_uuid = aml_if(
+aml_lor(aml_equal(aml_arg(0),
+  aml_touuid("33DB4D5B-1FF7-401C-9657-7441C03DD766")),
+aml_equal(a

Re: [PATCH] [PATCH RFC v2] Implements Backend Program conventions for vhost-user-scsi

2022-04-04 Thread Stefan Hajnoczi
On Mon, 4 Apr 2022 at 15:51, Sakshi Kaushik  wrote:
> I am not able to find vhost-user-scsi inside build/contrib/vhost-user-scsi 
> despite running the 'make' command.

It is probably not being built because the dependencies are not
installed on your machine. Here are the contents of the
contrib/vhost-user-scsi/meson.build file:

  if libiscsi.found()
executable('vhost-user-scsi', files('vhost-user-scsi.c'),
   dependencies: [qemuutil, libiscsi, vhost_user],
   build_by_default: targetos == 'linux',
   install: false)
  endif

The build machine must be a Linux machine and it must have the
libiscsi-dev (Debian/Ubuntu), libiscsi-devel (Fedora/CentOS/RHEL), or
similarly-named package installed. You can run QEMU's ./configure
script and look at the output to see if it detected libiscsi.

Stefan



[PATCH for-7.1 01/18] hw/arm/exynos4210: Use TYPE_OR_IRQ instead of custom OR-gate device

2022-04-04 Thread Peter Maydell
The Exynos4210 SoC device currently uses a custom device
"exynos4210.irq_gate" to model the OR gate that feeds each CPU's IRQ
line.  We have a standard TYPE_OR_IRQ device for this now, so use
that instead.

(This is a migration compatibility break, but that is OK for this
machine type.)

Signed-off-by: Peter Maydell 
---
 include/hw/arm/exynos4210.h |  1 +
 hw/arm/exynos4210.c | 31 ---
 2 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index 60b9e126f55..3999034053e 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -102,6 +102,7 @@ struct Exynos4210State {
 MemoryRegion bootreg_mem;
 I2CBus *i2c_if[EXYNOS4210_I2C_NUMBER];
 qemu_or_irq pl330_irq_orgate[EXYNOS4210_NUM_DMA];
+qemu_or_irq cpu_irq_orgate[EXYNOS4210_NCPUS];
 };
 
 #define TYPE_EXYNOS4210_SOC "exynos4210"
diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 0299e81f853..dfc0a4eec25 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -205,7 +205,6 @@ static void exynos4210_realize(DeviceState *socdev, Error 
**errp)
 {
 Exynos4210State *s = EXYNOS4210_SOC(socdev);
 MemoryRegion *system_mem = get_system_memory();
-qemu_irq gate_irq[EXYNOS4210_NCPUS][EXYNOS4210_IRQ_GATE_NINPUTS];
 SysBusDevice *busdev;
 DeviceState *dev, *uart[4], *pl330[3];
 int i, n;
@@ -235,18 +234,13 @@ static void exynos4210_realize(DeviceState *socdev, Error 
**errp)
 
 /* IRQ Gate */
 for (i = 0; i < EXYNOS4210_NCPUS; i++) {
-dev = qdev_new("exynos4210.irq_gate");
-qdev_prop_set_uint32(dev, "n_in", EXYNOS4210_IRQ_GATE_NINPUTS);
-sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
-/* Get IRQ Gate input in gate_irq */
-for (n = 0; n < EXYNOS4210_IRQ_GATE_NINPUTS; n++) {
-gate_irq[i][n] = qdev_get_gpio_in(dev, n);
-}
-busdev = SYS_BUS_DEVICE(dev);
-
-/* Connect IRQ Gate output to CPU's IRQ line */
-sysbus_connect_irq(busdev, 0,
-   qdev_get_gpio_in(DEVICE(s->cpu[i]), ARM_CPU_IRQ));
+DeviceState *orgate = DEVICE(&s->cpu_irq_orgate[i]);
+object_property_set_int(OBJECT(orgate), "num-lines",
+EXYNOS4210_IRQ_GATE_NINPUTS,
+&error_abort);
+qdev_realize(orgate, NULL, &error_abort);
+qdev_connect_gpio_out(orgate, 0,
+  qdev_get_gpio_in(DEVICE(s->cpu[i]), 
ARM_CPU_IRQ));
 }
 
 /* Private memory region and Internal GIC */
@@ -256,7 +250,8 @@ static void exynos4210_realize(DeviceState *socdev, Error 
**errp)
 sysbus_realize_and_unref(busdev, &error_fatal);
 sysbus_mmio_map(busdev, 0, EXYNOS4210_SMP_PRIVATE_BASE_ADDR);
 for (n = 0; n < EXYNOS4210_NCPUS; n++) {
-sysbus_connect_irq(busdev, n, gate_irq[n][0]);
+sysbus_connect_irq(busdev, n,
+   qdev_get_gpio_in(DEVICE(&s->cpu_irq_orgate[n]), 0));
 }
 for (n = 0; n < EXYNOS4210_INT_GIC_NIRQ; n++) {
 s->irqs.int_gic_irq[n] = qdev_get_gpio_in(dev, n);
@@ -275,7 +270,8 @@ static void exynos4210_realize(DeviceState *socdev, Error 
**errp)
 /* Map Distributer interface */
 sysbus_mmio_map(busdev, 1, EXYNOS4210_EXT_GIC_DIST_BASE_ADDR);
 for (n = 0; n < EXYNOS4210_NCPUS; n++) {
-sysbus_connect_irq(busdev, n, gate_irq[n][1]);
+sysbus_connect_irq(busdev, n,
+   qdev_get_gpio_in(DEVICE(&s->cpu_irq_orgate[n]), 1));
 }
 for (n = 0; n < EXYNOS4210_EXT_GIC_NIRQ; n++) {
 s->irqs.ext_gic_irq[n] = qdev_get_gpio_in(dev, n);
@@ -488,6 +484,11 @@ static void exynos4210_init(Object *obj)
 object_initialize_child(obj, name, orgate, TYPE_OR_IRQ);
 g_free(name);
 }
+
+for (i = 0; i < ARRAY_SIZE(s->cpu_irq_orgate); i++) {
+g_autofree char *name = g_strdup_printf("cpu-irq-orgate%d", i);
+object_initialize_child(obj, name, &s->cpu_irq_orgate[i], TYPE_OR_IRQ);
+}
 }
 
 static void exynos4210_class_init(ObjectClass *klass, void *data)
-- 
2.25.1




[PATCH for-7.1 07/18] hw/arm/exynos4210: Move exynos4210_init_board_irqs() into exynos4210.c

2022-04-04 Thread Peter Maydell
The function exynos4210_init_board_irqs() currently lives in
exynos4210_gic.c, but it isn't really part of the exynos4210.gic
device -- it is a function that implements (some of) the wiring up of
interrupts between the SoC's GIC and combiner components.  This means
it fits better in exynos4210.c, which is the SoC-level code.  Move it
there. Similarly, exynos4210_git_irq() is used almost only in the
SoC-level code, so move it too.

Signed-off-by: Peter Maydell 
---
 include/hw/arm/exynos4210.h |   4 -
 hw/arm/exynos4210.c | 202 +++
 hw/intc/exynos4210_gic.c| 204 
 3 files changed, 202 insertions(+), 208 deletions(-)

diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index a9f186370ee..d83e96a091e 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -111,10 +111,6 @@ OBJECT_DECLARE_SIMPLE_TYPE(Exynos4210State, EXYNOS4210_SOC)
 void exynos4210_write_secondary(ARMCPU *cpu,
 const struct arm_boot_info *info);
 
-/* Initialize board IRQs.
- * These IRQs contain splitted Int/External Combiner and External Gic IRQs */
-void exynos4210_init_board_irqs(Exynos4210State *s);
-
 /* Get IRQ number from exynos4210 IRQ subsystem stub.
  * To identify IRQ source use internal combiner group and bit number
  *  grp - group number
diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 11e321d7830..742666ba779 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -101,6 +101,208 @@
 #define EXYNOS4210_PL330_BASE1_ADDR 0x1269
 #define EXYNOS4210_PL330_BASE2_ADDR 0x1285
 
+enum ExtGicId {
+EXT_GIC_ID_MDMA_LCD0 = 66,
+EXT_GIC_ID_PDMA0,
+EXT_GIC_ID_PDMA1,
+EXT_GIC_ID_TIMER0,
+EXT_GIC_ID_TIMER1,
+EXT_GIC_ID_TIMER2,
+EXT_GIC_ID_TIMER3,
+EXT_GIC_ID_TIMER4,
+EXT_GIC_ID_MCT_L0,
+EXT_GIC_ID_WDT,
+EXT_GIC_ID_RTC_ALARM,
+EXT_GIC_ID_RTC_TIC,
+EXT_GIC_ID_GPIO_XB,
+EXT_GIC_ID_GPIO_XA,
+EXT_GIC_ID_MCT_L1,
+EXT_GIC_ID_IEM_APC,
+EXT_GIC_ID_IEM_IEC,
+EXT_GIC_ID_NFC,
+EXT_GIC_ID_UART0,
+EXT_GIC_ID_UART1,
+EXT_GIC_ID_UART2,
+EXT_GIC_ID_UART3,
+EXT_GIC_ID_UART4,
+EXT_GIC_ID_MCT_G0,
+EXT_GIC_ID_I2C0,
+EXT_GIC_ID_I2C1,
+EXT_GIC_ID_I2C2,
+EXT_GIC_ID_I2C3,
+EXT_GIC_ID_I2C4,
+EXT_GIC_ID_I2C5,
+EXT_GIC_ID_I2C6,
+EXT_GIC_ID_I2C7,
+EXT_GIC_ID_SPI0,
+EXT_GIC_ID_SPI1,
+EXT_GIC_ID_SPI2,
+EXT_GIC_ID_MCT_G1,
+EXT_GIC_ID_USB_HOST,
+EXT_GIC_ID_USB_DEVICE,
+EXT_GIC_ID_MODEMIF,
+EXT_GIC_ID_HSMMC0,
+EXT_GIC_ID_HSMMC1,
+EXT_GIC_ID_HSMMC2,
+EXT_GIC_ID_HSMMC3,
+EXT_GIC_ID_SDMMC,
+EXT_GIC_ID_MIPI_CSI_4LANE,
+EXT_GIC_ID_MIPI_DSI_4LANE,
+EXT_GIC_ID_MIPI_CSI_2LANE,
+EXT_GIC_ID_MIPI_DSI_2LANE,
+EXT_GIC_ID_ONENAND_AUDI,
+EXT_GIC_ID_ROTATOR,
+EXT_GIC_ID_FIMC0,
+EXT_GIC_ID_FIMC1,
+EXT_GIC_ID_FIMC2,
+EXT_GIC_ID_FIMC3,
+EXT_GIC_ID_JPEG,
+EXT_GIC_ID_2D,
+EXT_GIC_ID_PCIe,
+EXT_GIC_ID_MIXER,
+EXT_GIC_ID_HDMI,
+EXT_GIC_ID_HDMI_I2C,
+EXT_GIC_ID_MFC,
+EXT_GIC_ID_TVENC,
+};
+
+enum ExtInt {
+EXT_GIC_ID_EXTINT0 = 48,
+EXT_GIC_ID_EXTINT1,
+EXT_GIC_ID_EXTINT2,
+EXT_GIC_ID_EXTINT3,
+EXT_GIC_ID_EXTINT4,
+EXT_GIC_ID_EXTINT5,
+EXT_GIC_ID_EXTINT6,
+EXT_GIC_ID_EXTINT7,
+EXT_GIC_ID_EXTINT8,
+EXT_GIC_ID_EXTINT9,
+EXT_GIC_ID_EXTINT10,
+EXT_GIC_ID_EXTINT11,
+EXT_GIC_ID_EXTINT12,
+EXT_GIC_ID_EXTINT13,
+EXT_GIC_ID_EXTINT14,
+EXT_GIC_ID_EXTINT15
+};
+
+/*
+ * External GIC sources which are not from External Interrupt Combiner or
+ * External Interrupts are starting from EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ,
+ * which is INTG16 in Internal Interrupt Combiner.
+ */
+
+static const uint32_t
+combiner_grp_to_gic_id[64 - EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][8] = {
+/* int combiner groups 16-19 */
+{ }, { }, { }, { },
+/* int combiner group 20 */
+{ 0, EXT_GIC_ID_MDMA_LCD0 },
+/* int combiner group 21 */
+{ EXT_GIC_ID_PDMA0, EXT_GIC_ID_PDMA1 },
+/* int combiner group 22 */
+{ EXT_GIC_ID_TIMER0, EXT_GIC_ID_TIMER1, EXT_GIC_ID_TIMER2,
+EXT_GIC_ID_TIMER3, EXT_GIC_ID_TIMER4 },
+/* int combiner group 23 */
+{ EXT_GIC_ID_RTC_ALARM, EXT_GIC_ID_RTC_TIC },
+/* int combiner group 24 */
+{ EXT_GIC_ID_GPIO_XB, EXT_GIC_ID_GPIO_XA },
+/* int combiner group 25 */
+{ EXT_GIC_ID_IEM_APC, EXT_GIC_ID_IEM_IEC },
+/* int combiner group 26 */
+{ EXT_GIC_ID_UART0, EXT_GIC_ID_UART1, EXT_GIC_ID_UART2, EXT_GIC_ID_UART3,
+EXT_GIC_ID_UART4 },
+/* int combiner group 27 */
+{ EXT_GIC_ID_I2C0, EXT_GIC_ID_I2C1, EXT_GIC_ID_I2C2, EXT_GIC_ID_I2C3,
+EXT_GIC_ID_I2C4, EXT_GIC_ID_I2C5, EXT_GIC_ID_I2C6,
+EXT_GIC_ID_I2C7 },
+/* int combiner group 28 */
+{ EXT_GIC_ID_SPI0, EXT_GIC_ID_SPI1, EXT_GIC_ID_SPI2 , EXT_GIC_ID_USB_HOST},
+/* int combiner 

[PATCH v9 27/45] hw/cxl/host: Add support for CXL Fixed Memory Windows.

2022-04-04 Thread Jonathan Cameron via
From: Jonathan Cameron 

The concept of these is introduced in [1] in terms of the
description of the CEDT ACPI table. The principle is more general:
unlike the routing applied once traffic hits the CXL root bridges,
the host system memory address routing is implementation defined and
effectively static once observable by standard / generic system
software.
Each CXL Fixed Memory Windows (CFMW) is a region of PA space
which has fixed system dependent routing configured so that
accesses can be routed to the CXL devices below a set of target
root bridges. The accesses may be interleaved across multiple
root bridges.

For QEMU we could have fully specified these regions in terms
of a base PA + size, but as the absolute address does not matter
it is simpler to let individual platforms place the memory regions.

Examples:
-cxl-fixed-memory-window targets.0=cxl.0,size=128G
-cxl-fixed-memory-window targets.0=cxl.1,size=128G
-cxl-fixed-memory-window 
targets.0=cxl.0,targets.1=cxl.1,size=256G,interleave-granularity=2k

Specifies
* 2x 128G regions not interleaved across root bridges, one for each of
  the root bridges with ids cxl.0 and cxl.1
* 256G region interleaved across root bridges with ids cxl.0 and cxl.1
with a 2k interleave granularity.
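
The target selection implied by the interleaved example above can be
sketched in a few lines. This is an illustrative model of the modulo
arithmetic only (address divided by granularity, modulo the number of
ways), not the actual QEMU implementation:

```c
#include <stdint.h>

/*
 * Illustrative interleave decode: which root bridge does a host
 * physical address route to? With a 2 KiB granularity and two
 * targets, consecutive 2 KiB chunks alternate between the bridges.
 */
static unsigned cfmw_target_index(uint64_t addr, uint64_t granularity,
                                  unsigned num_targets)
{
    return (addr / granularity) % num_targets;
}
```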

When system software enumerates the devices below a given root bridge
it can then decide which CFMW to use. If non-interleaved operation is
desired (or is all that is possible) it can use the appropriate CFMW
for the root bridge in question.  If there are suitable devices to
interleave across the two root bridges then it may use the third CFMW.

A number of other designs were considered but the following constraints
made it hard to adapt existing QEMU approaches to this particular problem.
1) The size must be known before a specific architecture / board brings
   up its PA memory map.  We need to set up an appropriate region.
2) Using links to the host bridges provides a clean command line interface
   but these links cannot be established until command line devices have
   been added.

Hence the two-step process used here: first establish the size,
interleave-ways and granularity, and cache the ids of the host bridges;
then, once they are available, find the actual host bridges so they can
be used later to support interleave decoding.
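
The two-step process can be illustrated with a toy sketch (all names
and the lookup table below are hypothetical, not QEMU APIs): stash the
target names during configuration, then resolve them once the named
objects exist:

```c
#include <string.h>

/* Toy two-phase setup, loosely mirroring the pattern above. */
#define MAX_TARGETS 4

static const char *stashed_names[MAX_TARGETS];
static int num_stashed;

/* Phase 1: links can't be resolved yet, so stash the name for now. */
static void stash_target(const char *name)
{
    stashed_names[num_stashed++] = name;
}

/* Stand-in object registry: nonzero id if the object exists. */
static int find_object(const char *name)
{
    if (strcmp(name, "cxl.0") == 0) {
        return 1;
    }
    if (strcmp(name, "cxl.1") == 0) {
        return 2;
    }
    return 0;
}

/* Phase 2: once the objects exist, resolve every stashed name. */
static int resolve_targets(int *ids)
{
    for (int i = 0; i < num_stashed; i++) {
        ids[i] = find_object(stashed_names[i]);
        if (ids[i] == 0) {
            return -1; /* named target was never created */
        }
    }
    return 0;
}
```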

[1] CXL 2.0 ECN: CEDT CFMWS & QTG DSM (computeexpresslink.org / specifications)

Signed-off-by: Jonathan Cameron 
---
 hw/cxl/cxl-host-stubs.c | 14 ++
 hw/cxl/cxl-host.c   | 94 +
 hw/cxl/meson.build  |  6 +++
 include/hw/cxl/cxl.h| 21 +
 qapi/machine.json   | 21 +
 qemu-options.hx | 38 +
 softmmu/vl.c| 47 +
 7 files changed, 241 insertions(+)

diff --git a/hw/cxl/cxl-host-stubs.c b/hw/cxl/cxl-host-stubs.c
new file mode 100644
index 00..f8fd278d5d
--- /dev/null
+++ b/hw/cxl/cxl-host-stubs.c
@@ -0,0 +1,14 @@
+/*
+ * CXL host parameter parsing routine stubs
+ *
+ * Copyright (c) 2022 Huawei
+ */
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/cxl/cxl.h"
+
+void cxl_fixed_memory_window_config(MachineState *ms,
+CXLFixedMemoryWindowOptions *object,
+Error **errp) {};
+
+void cxl_fixed_memory_window_link_targets(Error **errp) {};
diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c
new file mode 100644
index 00..ec5a75cbf5
--- /dev/null
+++ b/hw/cxl/cxl-host.c
@@ -0,0 +1,94 @@
+/*
+ * CXL host parameter parsing routines
+ *
+ * Copyright (c) 2022 Huawei
+ * Modeled loosely on the NUMA options handling in hw/core/numa.c
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qemu/bitmap.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "sysemu/qtest.h"
+#include "hw/boards.h"
+
+#include "qapi/qapi-visit-machine.h"
+#include "hw/cxl/cxl.h"
+
+void cxl_fixed_memory_window_config(MachineState *ms,
+CXLFixedMemoryWindowOptions *object,
+Error **errp)
+{
+CXLFixedWindow *fw = g_malloc0(sizeof(*fw));
+strList *target;
+int i;
+
+for (target = object->targets; target; target = target->next) {
+fw->num_targets++;
+}
+
+fw->enc_int_ways = cxl_interleave_ways_enc(fw->num_targets, errp);
+if (*errp) {
+return;
+}
+
+fw->targets = g_malloc0_n(fw->num_targets, sizeof(*fw->targets));
+for (i = 0, target = object->targets; target; i++, target = target->next) {
+/* This link cannot be resolved yet, so stash the name for now */
+fw->targets[i] = g_strdup(target->value);
+}
+
+if (object->size % (256 * MiB)) {
+error_setg(errp,
+   "Size of a CXL fixed memory window must be a multiple of 256MiB");
+return;
+}
+fw->size = object->size;
+
+if (object->has_interleave_granularity) {
+fw->enc_int_gran =
+cxl_interleave_granularity_enc(object->inte

[PATCH for-7.1 00/18] hw/arm: Make exynos4210 use TYPE_SPLIT_IRQ

2022-04-04 Thread Peter Maydell
The primary aim of this patchset is to make the exynos4210 code use
the TYPE_SPLIT_IRQ device instead of the old qemu_split_irq() function
(which we are trying to get rid of). However, the current code is
quite complicated and so we have to do a fair amount of refactoring
in order to be able to use TYPE_SPLIT_IRQ in a clean way.

The interrupt wiring on this SoC is complicated and interrupt
lines from devices may be wired up to multiple places:
 * a GIC device
 * an internal combiner
 * an external combiner
(a combiner is a fairly simple "OR multiple IRQ sources together
in groups with enable/disable and mask logic" piece of hardware).
In some cases an interrupt is wired up to more than one input
on each combiner.

The current code has a struct Exynos4210Irq where it keeps arrays of
qemu_irqs corresponding to the inputs of these devices, and it handles
the "wire interrupt lines to multiple places" logic in functions which
are called by the SoC device model but which live in the source files
with the combiner and GIC models. This series moves the logic to the
SoC device model's source file (because it is really part of the SoC
wiring, not part of the individual combiner or GIC devices) and makes
use of the TYPE_SPLIT_IRQ ability to provide more than 2 output lines
to simplify things so that each interrupt line connects to just one
splitter, whose outputs go to all the places they need to. In the
new setup, these splitter devices clearly belong to the SoC object,
and so they are created as QOM children of it. The Exynos4210Irq
struct ends up unneeded and is deleted.
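
As a rough illustration of what such a splitter does (a toy model,
not the QEMU TYPE_SPLIT_IRQ device or its API): one input line fans
out to a configurable number of outputs, each delivered via a
callback:

```c
/*
 * Toy model of an IRQ splitter: driving the single input forwards
 * the level to every configured output.
 */
#define MAX_LINES 8

typedef void (*irq_handler)(void *opaque, int level);

typedef struct ToySplitter {
    int num_lines;
    irq_handler out[MAX_LINES];
    void *opaque[MAX_LINES];
} ToySplitter;

/* Drive the splitter's single input: every output sees the level. */
static void toy_splitter_set(ToySplitter *s, int level)
{
    for (int i = 0; i < s->num_lines; i++) {
        s->out[i](s->opaque[i], level);
    }
}

/* Example sink that just records the last level it saw. */
static void record_level(void *opaque, int level)
{
    *(int *)opaque = level;
}
```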

I have also done some conversion of specific child devices of this SoC
from the old-style "call qemu_new()" to the new-style "embed the child
device struct in the parent state struct". I haven't done a complete
conversion, but only touched those devices where making the conversion
was useful for the TYPE_SPLIT_IRQ changes.

I don't have a datasheet for this SoC that describes all the external
combiner and external GIC wiring, so I have mostly kept the QEMU
behaviour the same as it is currently. In a few places, however, I
have fixed what seem to me to be fairly clearly bugs in the current
handling. (Largely these bugs weren't visible to the guest because
we weren't actually connecting up devices to the affected bits of
the interrupt line wiring.)

I've tested this with a simple Linux image, which I think is basically
the same one as the 'make check-acceptance' test. If anybody has
access to other test images that would be interesting.

thanks
-- PMM

Peter Maydell (18):
  hw/arm/exynos4210: Use TYPE_OR_IRQ instead of custom OR-gate device
  hw/intc/exynos4210_gic: Remove unused TYPE_EXYNOS4210_IRQ_GATE
  hw/arm/exynos4210: Put a9mpcore device into state struct
  hw/arm/exynos4210: Drop int_gic_irq[] from Exynos4210Irq struct
  hw/arm/exynos4210: Coalesce board_irqs and irq_table
  hw/arm/exynos4210: Fix code style nit in combiner_grp_to_gic_id[]
  hw/arm/exynos4210: Move exynos4210_init_board_irqs() into exynos4210.c
  hw/arm/exynos4210: Put external GIC into state struct
  hw/arm/exynos4210: Drop ext_gic_irq[] from Exynos4210Irq struct
  hw/arm/exynos4210: Move exynos4210_combiner_get_gpioin() into
exynos4210.c
  hw/arm/exynos4210: Delete unused macro definitions
  hw/arm/exynos4210: Use TYPE_SPLIT_IRQ in exynos4210_init_board_irqs()
  hw/arm/exynos4210: Fill in irq_table[] for internal-combiner-only IRQ
lines
  hw/arm/exynos4210: Connect MCT_G0 and MCT_G1 to both combiners
  hw/arm/exynos4210: Don't connect multiple lines to external GIC inputs
  hw/arm/exynos4210: Fold combiner splits into
exynos4210_init_board_irqs()
  hw/arm/exynos4210: Put combiners into state struct
  hw/arm/exynos4210: Drop Exynos4210Irq struct

 include/hw/arm/exynos4210.h   |  50 ++-
 include/hw/intc/exynos4210_combiner.h |  57 
 include/hw/intc/exynos4210_gic.h  |  43 +++
 hw/arm/exynos4210.c   | 430 +++---
 hw/intc/exynos4210_combiner.c | 108 +--
 hw/intc/exynos4210_gic.c  | 344 +
 MAINTAINERS   |   2 +-
 7 files changed, 508 insertions(+), 526 deletions(-)
 create mode 100644 include/hw/intc/exynos4210_combiner.h
 create mode 100644 include/hw/intc/exynos4210_gic.h

-- 
2.25.1




[PATCH v9 29/45] hw/pci-host/gpex-acpi: Add support for dsdt construction for pxb-cxl

2022-04-04 Thread Jonathan Cameron via
This adds code to instantiate the slightly extended ACPI root port
description in DSDT as per the CXL 2.0 specification.

Basically a cut and paste job from the i386/pc code.

Signed-off-by: Jonathan Cameron 
Signed-off-by: Ben Widawsky 
Reviewed-by: Alex Bennée 
---
 hw/arm/Kconfig  |  1 +
 hw/pci-host/gpex-acpi.c | 20 +---
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 97f3b38019..219262a8da 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -29,6 +29,7 @@ config ARM_VIRT
 select ACPI_APEI
 select ACPI_VIOT
 select VIRTIO_MEM_SUPPORTED
+select ACPI_CXL
 
 config CHEETAH
 bool
diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
index e7e162a00a..7c7316bc96 100644
--- a/hw/pci-host/gpex-acpi.c
+++ b/hw/pci-host/gpex-acpi.c
@@ -5,6 +5,7 @@
 #include "hw/pci/pci_bus.h"
 #include "hw/pci/pci_bridge.h"
 #include "hw/pci/pcie_host.h"
+#include "hw/acpi/cxl.h"
 
 static void acpi_dsdt_add_pci_route_table(Aml *dev, uint32_t irq)
 {
@@ -139,6 +140,7 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
 QLIST_FOREACH(bus, &bus->child, sibling) {
 uint8_t bus_num = pci_bus_num(bus);
 uint8_t numa_node = pci_bus_numa_node(bus);
+bool is_cxl = pci_bus_is_cxl(bus);
 
 if (!pci_bus_is_root(bus)) {
 continue;
@@ -154,8 +156,16 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
 }
 
 dev = aml_device("PC%.02X", bus_num);
-aml_append(dev, aml_name_decl("_HID", aml_string("PNP0A08")));
-aml_append(dev, aml_name_decl("_CID", aml_string("PNP0A03")));
+if (is_cxl) {
+struct Aml *pkg = aml_package(2);
+aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0016")));
+aml_append(pkg, aml_eisaid("PNP0A08"));
+aml_append(pkg, aml_eisaid("PNP0A03"));
+aml_append(dev, aml_name_decl("_CID", pkg));
+} else {
+aml_append(dev, aml_name_decl("_HID", aml_string("PNP0A08")));
+aml_append(dev, aml_name_decl("_CID", aml_string("PNP0A03")));
+}
 aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
 aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
 aml_append(dev, aml_name_decl("_STR", aml_unicode("pxb Device")));
@@ -175,7 +185,11 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
 cfg->pio.base, 0, 0, 0);
 aml_append(dev, aml_name_decl("_CRS", crs));
 
-acpi_dsdt_add_pci_osc(dev);
+if (is_cxl) {
+build_cxl_osc_method(dev);
+} else {
+acpi_dsdt_add_pci_osc(dev);
+}
 
 aml_append(scope, dev);
 }
-- 
2.32.0




[PATCH for-7.1 03/18] hw/arm/exynos4210: Put a9mpcore device into state struct

2022-04-04 Thread Peter Maydell
The exynos4210 SoC mostly creates its child devices as if it were
board code.  This includes the a9mpcore object.  Switch that to a
new-style "embedded in the state struct" creation, because in the
next commit we're going to want to refer to the object again further
down in the exynos4210_realize() function.

Signed-off-by: Peter Maydell 
---
I don't propose to try to do a full conversion of every child
device; I'm only going to do them where it makes a subsequent
commit a bit nicer, like this one.
---
 include/hw/arm/exynos4210.h |  2 ++
 hw/arm/exynos4210.c | 11 ++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index 3999034053e..215c039b414 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -26,6 +26,7 @@
 
 #include "hw/or-irq.h"
 #include "hw/sysbus.h"
+#include "hw/cpu/a9mpcore.h"
 #include "target/arm/cpu-qom.h"
 #include "qom/object.h"
 
@@ -103,6 +104,7 @@ struct Exynos4210State {
 I2CBus *i2c_if[EXYNOS4210_I2C_NUMBER];
 qemu_or_irq pl330_irq_orgate[EXYNOS4210_NUM_DMA];
 qemu_or_irq cpu_irq_orgate[EXYNOS4210_NCPUS];
+A9MPPrivState a9mpcore;
 };
 
 #define TYPE_EXYNOS4210_SOC "exynos4210"
diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index dfc0a4eec25..ef4d646eb91 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -244,17 +244,16 @@ static void exynos4210_realize(DeviceState *socdev, Error 
**errp)
 }
 
 /* Private memory region and Internal GIC */
-dev = qdev_new(TYPE_A9MPCORE_PRIV);
-qdev_prop_set_uint32(dev, "num-cpu", EXYNOS4210_NCPUS);
-busdev = SYS_BUS_DEVICE(dev);
-sysbus_realize_and_unref(busdev, &error_fatal);
+qdev_prop_set_uint32(DEVICE(&s->a9mpcore), "num-cpu", EXYNOS4210_NCPUS);
+busdev = SYS_BUS_DEVICE(&s->a9mpcore);
+sysbus_realize(busdev, &error_fatal);
 sysbus_mmio_map(busdev, 0, EXYNOS4210_SMP_PRIVATE_BASE_ADDR);
 for (n = 0; n < EXYNOS4210_NCPUS; n++) {
 sysbus_connect_irq(busdev, n,
qdev_get_gpio_in(DEVICE(&s->cpu_irq_orgate[n]), 0));
 }
 for (n = 0; n < EXYNOS4210_INT_GIC_NIRQ; n++) {
-s->irqs.int_gic_irq[n] = qdev_get_gpio_in(dev, n);
+s->irqs.int_gic_irq[n] = qdev_get_gpio_in(DEVICE(&s->a9mpcore), n);
 }
 
 /* Cache controller */
@@ -489,6 +488,8 @@ static void exynos4210_init(Object *obj)
 g_autofree char *name = g_strdup_printf("cpu-irq-orgate%d", i);
 object_initialize_child(obj, name, &s->cpu_irq_orgate[i], TYPE_OR_IRQ);
 }
+
+object_initialize_child(obj, "a9mpcore", &s->a9mpcore, TYPE_A9MPCORE_PRIV);
 }
 
 static void exynos4210_class_init(ObjectClass *klass, void *data)
-- 
2.25.1




[PATCH for-7.1 12/18] hw/arm/exynos4210: Use TYPE_SPLIT_IRQ in exynos4210_init_board_irqs()

2022-04-04 Thread Peter Maydell
In exynos4210_init_board_irqs(), use the TYPE_SPLIT_IRQ device
instead of qemu_irq_split().

Signed-off-by: Peter Maydell 
---
 include/hw/arm/exynos4210.h |  9 
 hw/arm/exynos4210.c | 41 +
 2 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index f0769a4045b..f58ee0f2686 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -28,6 +28,7 @@
 #include "hw/sysbus.h"
 #include "hw/cpu/a9mpcore.h"
 #include "hw/intc/exynos4210_gic.h"
+#include "hw/core/split-irq.h"
 #include "target/arm/cpu-qom.h"
 #include "qom/object.h"
 
@@ -71,6 +72,13 @@
 
 #define EXYNOS4210_NUM_DMA  3
 
+/*
+ * We need one splitter for every external combiner input, plus
+ * one for every non-zero entry in combiner_grp_to_gic_id[].
+ * We'll assert in exynos4210_init_board_irqs() if this is wrong.
+ */
+#define EXYNOS4210_NUM_SPLITTERS (EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ + 60)
+
 typedef struct Exynos4210Irq {
 qemu_irq int_combiner_irq[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
 qemu_irq ext_combiner_irq[EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ];
@@ -95,6 +103,7 @@ struct Exynos4210State {
 qemu_or_irq cpu_irq_orgate[EXYNOS4210_NCPUS];
 A9MPPrivState a9mpcore;
 Exynos4210GicState ext_gic;
+SplitIRQ splitter[EXYNOS4210_NUM_SPLITTERS];
 };
 
 #define TYPE_EXYNOS4210_SOC "exynos4210"
diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 86a9a0dae12..919821833b5 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -263,6 +263,8 @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
 uint32_t grp, bit, irq_id, n;
 Exynos4210Irq *is = &s->irqs;
 DeviceState *extgicdev = DEVICE(&s->ext_gic);
+int splitcount = 0;
+DeviceState *splitter;
 
 for (n = 0; n < EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ; n++) {
 irq_id = 0;
@@ -276,13 +278,19 @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
 /* MCT_G1 is passed to External and GIC */
 irq_id = EXT_GIC_ID_MCT_G1;
 }
+
+assert(splitcount < EXYNOS4210_NUM_SPLITTERS);
+splitter = DEVICE(&s->splitter[splitcount]);
+qdev_prop_set_uint16(splitter, "num-lines", 2);
+qdev_realize(splitter, NULL, &error_abort);
+splitcount++;
+s->irq_table[n] = qdev_get_gpio_in(splitter, 0);
+qdev_connect_gpio_out(splitter, 0, is->int_combiner_irq[n]);
 if (irq_id) {
-s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
- qdev_get_gpio_in(extgicdev,
-  irq_id - 32));
+qdev_connect_gpio_out(splitter, 1,
+  qdev_get_gpio_in(extgicdev, irq_id - 32));
 } else {
-s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
-is->ext_combiner_irq[n]);
+qdev_connect_gpio_out(splitter, 1, is->ext_combiner_irq[n]);
 }
 }
 for (; n < EXYNOS4210_MAX_INT_COMBINER_IN_IRQ; n++) {
@@ -293,11 +301,23 @@ static void exynos4210_init_board_irqs(Exynos4210State *s)
  EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][bit];
 
 if (irq_id) {
-s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
- qdev_get_gpio_in(extgicdev,
-  irq_id - 32));
+assert(splitcount < EXYNOS4210_NUM_SPLITTERS);
+splitter = DEVICE(&s->splitter[splitcount]);
+qdev_prop_set_uint16(splitter, "num-lines", 2);
+qdev_realize(splitter, NULL, &error_abort);
+splitcount++;
+s->irq_table[n] = qdev_get_gpio_in(splitter, 0);
+qdev_connect_gpio_out(splitter, 0, is->int_combiner_irq[n]);
+qdev_connect_gpio_out(splitter, 1,
+  qdev_get_gpio_in(extgicdev, irq_id - 32));
 }
 }
+/*
+ * We check this here to avoid a more obscure assert later when
+ * qdev_assert_realized_properly() checks that we realized every
+ * child object we initialized.
+ */
+assert(splitcount == EXYNOS4210_NUM_SPLITTERS);
 }
 
 /*
@@ -766,6 +786,11 @@ static void exynos4210_init(Object *obj)
 object_initialize_child(obj, name, &s->cpu_irq_orgate[i], TYPE_OR_IRQ);
 }
 
+for (i = 0; i < ARRAY_SIZE(s->splitter); i++) {
+g_autofree char *name = g_strdup_printf("irq-splitter%d", i);
+object_initialize_child(obj, name, &s->splitter[i], TYPE_SPLIT_IRQ);
+}
+
 object_initialize_child(obj, "a9mpcore", &s->a9mpcore, TYPE_A9MPCORE_PRIV);
 object_initialize_child(obj, "ext-gic", &s->ext_gic, TYPE_EXYNOS4210_GIC);
 }
-- 
2.25.1




[PATCH v9 33/45] cxl/cxl-host: Add memops for CFMWS region.

2022-04-04 Thread Jonathan Cameron via
From: Jonathan Cameron 

These memops perform interleave decoding, walking down the
CXL topology from CFMWS described host interleave
decoder via CXL host bridge HDM decoders, through the CXL
root ports and finally call CXL type 3 specific read and write
functions.

Note that, whilst functional, the current implementation does
not support:
* switches
* multiple HDM decoders at a given level.
* unaligned accesses across the interleave boundaries
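
The HDM decoder fields driving this walk are compactly encoded.
Assuming the standard CXL 2.0 encodings (a granularity encoding of 0
means 256 bytes, doubling per increment; ways are encoded as a power
of two), helpers like the cxl_decode_ig() used in this patch can be
sketched as:

```c
#include <stdint.h>

/*
 * Assumed CXL 2.0 HDM decoder field encodings:
 *   interleave granularity: 0 -> 256 B, each increment doubles it
 *   interleave ways:        encoded as log2 of the number of ways
 */
static uint64_t decode_interleave_granularity(unsigned ig_enc)
{
    return 256ULL << ig_enc;
}

static unsigned decode_interleave_ways(unsigned iw_enc)
{
    return 1u << iw_enc;
}
```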

Signed-off-by: Jonathan Cameron 
---
 hw/cxl/cxl-host-stubs.c |   2 +
 hw/cxl/cxl-host.c   | 128 
 include/hw/cxl/cxl.h|   2 +
 3 files changed, 132 insertions(+)

diff --git a/hw/cxl/cxl-host-stubs.c b/hw/cxl/cxl-host-stubs.c
index f8fd278d5d..24465a52ab 100644
--- a/hw/cxl/cxl-host-stubs.c
+++ b/hw/cxl/cxl-host-stubs.c
@@ -12,3 +12,5 @@ void cxl_fixed_memory_window_config(MachineState *ms,
 Error **errp) {};
 
 void cxl_fixed_memory_window_link_targets(Error **errp) {};
+
+const MemoryRegionOps cfmws_ops;
diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c
index ec5a75cbf5..469b3c4ced 100644
--- a/hw/cxl/cxl-host.c
+++ b/hw/cxl/cxl-host.c
@@ -15,6 +15,10 @@
 
 #include "qapi/qapi-visit-machine.h"
 #include "hw/cxl/cxl.h"
+#include "hw/pci/pci_bus.h"
+#include "hw/pci/pci_bridge.h"
+#include "hw/pci/pci_host.h"
+#include "hw/pci/pcie_port.h"
 
 void cxl_fixed_memory_window_config(MachineState *ms,
 CXLFixedMemoryWindowOptions *object,
@@ -92,3 +96,127 @@ void cxl_fixed_memory_window_link_targets(Error **errp)
 }
 }
 }
+
+/* TODO: support multiple HDM decoders */
+static bool cxl_hdm_find_target(uint32_t *cache_mem, hwaddr addr,
+uint8_t *target)
+{
+uint32_t ctrl;
+uint32_t ig_enc;
+uint32_t iw_enc;
+uint32_t target_reg;
+uint32_t target_idx;
+
+ctrl = cache_mem[R_CXL_HDM_DECODER0_CTRL];
+if (!FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED)) {
+return false;
+}
+
+ig_enc = FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, IG);
+iw_enc = FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, IW);
+target_idx = (addr / cxl_decode_ig(ig_enc)) % (1 << iw_enc);
+
+if (target_idx < 4) {
+target_reg = cache_mem[R_CXL_HDM_DECODER0_TARGET_LIST_LO];
+target_reg >>= target_idx * 8;
+} else {
+target_reg = cache_mem[R_CXL_HDM_DECODER0_TARGET_LIST_HI];
+target_reg >>= (target_idx - 4) * 8;
+}
+*target = target_reg & 0xff;
+
+return true;
+}
+
+static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr)
+{
+CXLComponentState *hb_cstate;
+PCIHostState *hb;
+int rb_index;
+uint32_t *cache_mem;
+uint8_t target;
+bool target_found;
+PCIDevice *rp, *d;
+
+/* Address is relative to memory region. Convert to HPA */
+addr += fw->base;
+
+rb_index = (addr / cxl_decode_ig(fw->enc_int_gran)) % fw->num_targets;
+hb = PCI_HOST_BRIDGE(fw->target_hbs[rb_index]->cxl.cxl_host_bridge);
+if (!hb || !hb->bus || !pci_bus_is_cxl(hb->bus)) {
+return NULL;
+}
+
+hb_cstate = cxl_get_hb_cstate(hb);
+if (!hb_cstate) {
+return NULL;
+}
+
+cache_mem = hb_cstate->crb.cache_mem_registers;
+
+target_found = cxl_hdm_find_target(cache_mem, addr, &target);
+if (!target_found) {
+return NULL;
+}
+
+rp = pcie_find_port_by_pn(hb->bus, target);
+if (!rp) {
+return NULL;
+}
+
+d = pci_bridge_get_sec_bus(PCI_BRIDGE(rp))->devices[0];
+
+if (!d || !object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3)) {
+return NULL;
+}
+
+return d;
+}
+
+static MemTxResult cxl_read_cfmws(void *opaque, hwaddr addr, uint64_t *data,
+  unsigned size, MemTxAttrs attrs)
+{
+CXLFixedWindow *fw = opaque;
+PCIDevice *d;
+
+d = cxl_cfmws_find_device(fw, addr);
+if (d == NULL) {
+*data = 0;
+/* Reads to invalid address return poison */
+return MEMTX_ERROR;
+}
+
+return cxl_type3_read(d, addr + fw->base, data, size, attrs);
+}
+
+static MemTxResult cxl_write_cfmws(void *opaque, hwaddr addr,
+   uint64_t data, unsigned size,
+   MemTxAttrs attrs)
+{
+CXLFixedWindow *fw = opaque;
+PCIDevice *d;
+
+d = cxl_cfmws_find_device(fw, addr);
+if (d == NULL) {
+/* Writes to invalid address are silent */
+return MEMTX_OK;
+}
+
+return cxl_type3_write(d, addr + fw->base, data, size, attrs);
+}
+
+const MemoryRegionOps cfmws_ops = {
+.read_with_attrs = cxl_read_cfmws,
+.write_with_attrs = cxl_write_cfmws,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = true,
+},
+.impl = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = true,
+},
+};
diff --

[PATCH for-7.1 11/18] hw/arm/exynos4210: Delete unused macro definitions

2022-04-04 Thread Peter Maydell
Delete a couple of #defines which are never used.

Signed-off-by: Peter Maydell 
---
 include/hw/arm/exynos4210.h | 4 
 1 file changed, 4 deletions(-)

diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index b564e3582bb..f0769a4045b 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -67,10 +67,6 @@
 #define EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ   \
 (EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ * 8)
 
-/* IRQs number for external and internal GIC */
-#define EXYNOS4210_EXT_GIC_NIRQ (160-32)
-#define EXYNOS4210_INT_GIC_NIRQ 64
-
 #define EXYNOS4210_I2C_NUMBER   9
 
 #define EXYNOS4210_NUM_DMA  3
-- 
2.25.1




[PATCH for-7.1 04/18] hw/arm/exynos4210: Drop int_gic_irq[] from Exynos4210Irq struct

2022-04-04 Thread Peter Maydell
The only time we use the int_gic_irq[] array in the Exynos4210Irq
struct is in the exynos4210_realize() function: we initialize it with
the GPIO inputs of the a9mpcore device, and then a bit later on we
connect those to the outputs of the internal combiner.  Now that the
a9mpcore object is easily accessible as s->a9mpcore we can make the
connection directly from one device to the other without going via
this array.

Signed-off-by: Peter Maydell 
---
 include/hw/arm/exynos4210.h | 1 -
 hw/arm/exynos4210.c | 6 ++
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index 215c039b414..923ce987627 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -82,7 +82,6 @@
 typedef struct Exynos4210Irq {
 qemu_irq int_combiner_irq[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
 qemu_irq ext_combiner_irq[EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ];
-qemu_irq int_gic_irq[EXYNOS4210_INT_GIC_NIRQ];
 qemu_irq ext_gic_irq[EXYNOS4210_EXT_GIC_NIRQ];
 qemu_irq board_irqs[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
 } Exynos4210Irq;
diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index ef4d646eb91..60fc5a2ffe7 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -252,9 +252,6 @@ static void exynos4210_realize(DeviceState *socdev, Error 
**errp)
 sysbus_connect_irq(busdev, n,
qdev_get_gpio_in(DEVICE(&s->cpu_irq_orgate[n]), 0));
 }
-for (n = 0; n < EXYNOS4210_INT_GIC_NIRQ; n++) {
-s->irqs.int_gic_irq[n] = qdev_get_gpio_in(DEVICE(&s->a9mpcore), n);
-}
 
 /* Cache controller */
 sysbus_create_simple("l2x0", EXYNOS4210_L2X0_BASE_ADDR, NULL);
@@ -281,7 +278,8 @@ static void exynos4210_realize(DeviceState *socdev, Error 
**errp)
 busdev = SYS_BUS_DEVICE(dev);
 sysbus_realize_and_unref(busdev, &error_fatal);
 for (n = 0; n < EXYNOS4210_MAX_INT_COMBINER_OUT_IRQ; n++) {
-sysbus_connect_irq(busdev, n, s->irqs.int_gic_irq[n]);
+sysbus_connect_irq(busdev, n,
+   qdev_get_gpio_in(DEVICE(&s->a9mpcore), n));
 }
 exynos4210_combiner_get_gpioin(&s->irqs, dev, 0);
 sysbus_mmio_map(busdev, 0, EXYNOS4210_INT_COMBINER_BASE_ADDR);
-- 
2.25.1




[PATCH for-7.1 16/18] hw/arm/exynos4210: Fold combiner splits into exynos4210_init_board_irqs()

2022-04-04 Thread Peter Maydell
At this point, the function exynos4210_init_board_irqs() splits input
IRQ lines to connect them to the input combiner, output combiner and
external GIC.  The function exynos4210_combiner_get_gpioin() splits
some of the combiner input lines further to connect them to multiple
different inputs on the combiner.

Because (unlike qemu_irq_split()) the TYPE_SPLIT_IRQ device has a
configurable number of outputs, we can do all this in one place, by
making exynos4210_init_board_irqs() add extra outputs to the splitter
device when it must be connected to more than one input on each
combiner.

We do this with a new data structure, the combinermap, which is an
array each of whose elements is a list of the interrupt IDs on the
combiner which must be tied together.  As we loop through each
interrupt ID, if we find that it is the first one in one of these
lists, we configure the splitter device with enough extra outputs and
wire them up to the other interrupt IDs in the list.

Conveniently, for all the cases where this is necessary, the
lowest-numbered interrupt ID in each group is in the range of the
external combiner, so we only need to code for this in the first of
the two loops in exynos4210_init_board_irqs().
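
The lookup described above can be sketched with a toy table (the IDs
below are hypothetical, not the real combinermap[] contents): each row
lists the combiner inputs that share one interrupt line, smallest ID
first, terminated by 0.

```c
#include <stddef.h>

#define ROWS 2
#define COLS 4

/* Hypothetical IDs only; the real table uses IRQNO(grp, bit). */
static const int toymap[ROWS][COLS] = {
    { 4, 12, 0 },
    { 5, 13, 0 },
};

/* Return the row whose first entry is irq, or NULL if there is none. */
static const int *combinermap_row(int irq)
{
    for (int i = 0; i < ROWS; i++) {
        if (toymap[i][0] == irq) {
            return toymap[i];
        }
    }
    return NULL;
}

/* Each entry after the first needs one extra splitter output. */
static int extra_outputs(const int *row)
{
    int n = 1;
    while (row[n] != 0) {
        n++;
    }
    return n - 1;
}
```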

The old code in exynos4210_combiner_get_gpioin() which is being
deleted here had several problems which don't exist in the new code
in its handling of the multi-core timer interrupts:
 (1) the case labels specified bits 4 ... 8, but bit '8' doesn't
 exist; these should have been 4 ... 7
 (2) it used the input irq[EXYNOS4210_COMBINER_GET_IRQ_NUM(1, bit + 4)]
 multiple times as the input of several different splitters,
 which isn't allowed
 (3) in an apparent cut-and-paste error, the cases for all the
 multi-core timer inputs used "bit + 4" even though the
 bit range for the case was (intended to be) 4 ... 7, which
 meant it was looking at non-existent bits 8 ... 11.
None of these exist in the new code.

Signed-off-by: Peter Maydell 
---
 include/hw/arm/exynos4210.h |   6 +-
 hw/arm/exynos4210.c | 178 +++-
 2 files changed, 119 insertions(+), 65 deletions(-)

diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index 7da3eddea5f..f24617f681d 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -74,10 +74,12 @@
 
 /*
  * We need one splitter for every external combiner input, plus
- * one for every non-zero entry in combiner_grp_to_gic_id[].
+ * one for every non-zero entry in combiner_grp_to_gic_id[],
+ * minus one for every external combiner ID in second or later
+ * places in a combinermap[] line.
  * We'll assert in exynos4210_init_board_irqs() if this is wrong.
  */
-#define EXYNOS4210_NUM_SPLITTERS (EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ + 54)
+#define EXYNOS4210_NUM_SPLITTERS (EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ + 38)
 
 typedef struct Exynos4210Irq {
 qemu_irq int_combiner_irq[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 39e334e0773..05b28cf5905 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -254,6 +254,76 @@ combiner_grp_to_gic_id[64 - 
EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][8] = {
 #define EXYNOS4210_COMBINER_GET_BIT_NUM(irq) \
 ((irq) - 8 * EXYNOS4210_COMBINER_GET_GRP_NUM(irq))
 
+/*
+ * Some interrupt lines go to multiple combiner inputs.
+ * This data structure defines those: each array element is
+ * a list of combiner inputs which are connected together;
+ * the one with the smallest interrupt ID value must be first.
+ * As with combiner_grp_to_gic_id[], we rely on (0, 0) not being
+ * wired to anything so we can use 0 as a terminator.
+ */
+#define IRQNO(G, B) EXYNOS4210_COMBINER_GET_IRQ_NUM(G, B)
+#define IRQNONE 0
+
+#define COMBINERMAP_SIZE 16
+
+static const int combinermap[COMBINERMAP_SIZE][6] = {
+/* MDNIE_LCD1 */
+{ IRQNO(0, 4), IRQNO(1, 0), IRQNONE },
+{ IRQNO(0, 5), IRQNO(1, 1), IRQNONE },
+{ IRQNO(0, 6), IRQNO(1, 2), IRQNONE },
+{ IRQNO(0, 7), IRQNO(1, 3), IRQNONE },
+/* TMU */
+{ IRQNO(2, 4), IRQNO(3, 4), IRQNONE },
+{ IRQNO(2, 5), IRQNO(3, 5), IRQNONE },
+{ IRQNO(2, 6), IRQNO(3, 6), IRQNONE },
+{ IRQNO(2, 7), IRQNO(3, 7), IRQNONE },
+/* LCD1 */
+{ IRQNO(11, 4), IRQNO(12, 0), IRQNONE },
+{ IRQNO(11, 5), IRQNO(12, 1), IRQNONE },
+{ IRQNO(11, 6), IRQNO(12, 2), IRQNONE },
+{ IRQNO(11, 7), IRQNO(12, 3), IRQNONE },
+/* Multi-core timer */
+{ IRQNO(1, 4), IRQNO(12, 4), IRQNO(35, 4), IRQNO(51, 4), IRQNO(53, 4), 
IRQNONE },
+{ IRQNO(1, 5), IRQNO(12, 5), IRQNO(35, 5), IRQNO(51, 5), IRQNO(53, 5), 
IRQNONE },
+{ IRQNO(1, 6), IRQNO(12, 6), IRQNO(35, 6), IRQNO(51, 6), IRQNO(53, 6), 
IRQNONE },
+{ IRQNO(1, 7), IRQNO(12, 7), IRQNO(35, 7), IRQNO(51, 7), IRQNO(53, 7), 
IRQNONE },
+};
+
+#undef IRQNO
+
+static const int *combinermap_entry(int irq)
+{
+/*
+ * If the interrupt number passed in is the first entry in some
+ * line of the combine

[PATCH v9 36/45] tests/acpi: q35: Allow addition of a CXL test.

2022-04-04 Thread Jonathan Cameron via
Add exceptions for the DSDT and the new CEDT tables
specific to a new CXL test in the following patch.

Signed-off-by: Jonathan Cameron 
---
 tests/data/acpi/q35/CEDT.cxl| 0
 tests/data/acpi/q35/DSDT.cxl| 0
 tests/qtest/bios-tables-test-allowed-diff.h | 2 ++
 3 files changed, 2 insertions(+)

diff --git a/tests/data/acpi/q35/CEDT.cxl b/tests/data/acpi/q35/CEDT.cxl
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/data/acpi/q35/DSDT.cxl b/tests/data/acpi/q35/DSDT.cxl
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..7c7f9fbc44 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,3 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/q35/DSDT.cxl",
+"tests/data/acpi/q35/CEDT.cxl",
-- 
2.32.0




[PATCH for-7.1 05/18] hw/arm/exynos4210: Coalesce board_irqs and irq_table

2022-04-04 Thread Peter Maydell
The exynos4210 code currently has two very similar arrays of IRQs:

 * board_irqs is a field of the Exynos4210Irq struct which is filled
   in by exynos4210_init_board_irqs() with the appropriate qemu_irqs
   for each IRQ the board/SoC can assert
 * irq_table is a set of qemu_irqs pointed to from the
   Exynos4210State struct.  It's allocated in exynos4210_init_irq,
   and the only behaviour these irqs have is that they pass on the
   level to the equivalent board_irqs[] irq

The extra indirection through irq_table is unnecessary, so coalesce
these into a single irq_table[] array as a direct field in
Exynos4210State which exynos4210_init_board_irqs() fills in.

Signed-off-by: Peter Maydell 
---
 include/hw/arm/exynos4210.h |  8 ++--
 hw/arm/exynos4210.c |  6 +-
 hw/intc/exynos4210_gic.c| 32 
 3 files changed, 11 insertions(+), 35 deletions(-)

diff --git a/include/hw/arm/exynos4210.h b/include/hw/arm/exynos4210.h
index 923ce987627..a9f186370ee 100644
--- a/include/hw/arm/exynos4210.h
+++ b/include/hw/arm/exynos4210.h
@@ -83,7 +83,6 @@ typedef struct Exynos4210Irq {
 qemu_irq int_combiner_irq[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
 qemu_irq ext_combiner_irq[EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ];
 qemu_irq ext_gic_irq[EXYNOS4210_EXT_GIC_NIRQ];
-qemu_irq board_irqs[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
 } Exynos4210Irq;
 
 struct Exynos4210State {
@@ -92,7 +91,7 @@ struct Exynos4210State {
 /*< public >*/
 ARMCPU *cpu[EXYNOS4210_NCPUS];
 Exynos4210Irq irqs;
-qemu_irq *irq_table;
+qemu_irq irq_table[EXYNOS4210_MAX_INT_COMBINER_IN_IRQ];
 
 MemoryRegion chipid_mem;
 MemoryRegion iram_mem;
@@ -112,12 +111,9 @@ OBJECT_DECLARE_SIMPLE_TYPE(Exynos4210State, EXYNOS4210_SOC)
 void exynos4210_write_secondary(ARMCPU *cpu,
 const struct arm_boot_info *info);
 
-/* Initialize exynos4210 IRQ subsystem stub */
-qemu_irq *exynos4210_init_irq(Exynos4210Irq *env);
-
 /* Initialize board IRQs.
  * These IRQs contain splitted Int/External Combiner and External Gic IRQs */
-void exynos4210_init_board_irqs(Exynos4210Irq *s);
+void exynos4210_init_board_irqs(Exynos4210State *s);
 
 /* Get IRQ number from exynos4210 IRQ subsystem stub.
  * To identify IRQ source use internal combiner group and bit number
diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 60fc5a2ffe7..11e321d7830 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -228,10 +228,6 @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
 qdev_realize(DEVICE(cpuobj), NULL, &error_fatal);
 }
 
-/*** IRQs ***/
-
-s->irq_table = exynos4210_init_irq(&s->irqs);
-
 /* IRQ Gate */
 for (i = 0; i < EXYNOS4210_NCPUS; i++) {
 DeviceState *orgate = DEVICE(&s->cpu_irq_orgate[i]);
@@ -296,7 +292,7 @@ static void exynos4210_realize(DeviceState *socdev, Error **errp)
 sysbus_mmio_map(busdev, 0, EXYNOS4210_EXT_COMBINER_BASE_ADDR);
 
 /* Initialize board IRQs. */
-exynos4210_init_board_irqs(&s->irqs);
+exynos4210_init_board_irqs(s);
 
 /*** Memory ***/
 
diff --git a/hw/intc/exynos4210_gic.c b/hw/intc/exynos4210_gic.c
index 794f6b5ac72..ec79b96f6d1 100644
--- a/hw/intc/exynos4210_gic.c
+++ b/hw/intc/exynos4210_gic.c
@@ -192,30 +192,14 @@ combiner_grp_to_gic_id[64-EXYNOS4210_MAX_EXT_COMBINER_OUT_IRQ][8] = {
 #define EXYNOS4210_GIC_CPU_REGION_SIZE  0x100
 #define EXYNOS4210_GIC_DIST_REGION_SIZE 0x1000
 
-static void exynos4210_irq_handler(void *opaque, int irq, int level)
-{
-Exynos4210Irq *s = (Exynos4210Irq *)opaque;
-
-/* Bypass */
-qemu_set_irq(s->board_irqs[irq], level);
-}
-
-/*
- * Initialize exynos4210 IRQ subsystem stub.
- */
-qemu_irq *exynos4210_init_irq(Exynos4210Irq *s)
-{
-return qemu_allocate_irqs(exynos4210_irq_handler, s,
-EXYNOS4210_MAX_INT_COMBINER_IN_IRQ);
-}
-
 /*
  * Initialize board IRQs.
  * These IRQs contain splitted Int/External Combiner and External Gic IRQs.
  */
-void exynos4210_init_board_irqs(Exynos4210Irq *s)
+void exynos4210_init_board_irqs(Exynos4210State *s)
 {
 uint32_t grp, bit, irq_id, n;
+Exynos4210Irq *is = &s->irqs;
 
 for (n = 0; n < EXYNOS4210_MAX_EXT_COMBINER_IN_IRQ; n++) {
 irq_id = 0;
@@ -230,11 +214,11 @@ void exynos4210_init_board_irqs(Exynos4210Irq *s)
 irq_id = EXT_GIC_ID_MCT_G1;
 }
 if (irq_id) {
-s->board_irqs[n] = qemu_irq_split(s->int_combiner_irq[n],
-s->ext_gic_irq[irq_id-32]);
+s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
+is->ext_gic_irq[irq_id - 32]);
 } else {
-s->board_irqs[n] = qemu_irq_split(s->int_combiner_irq[n],
-s->ext_combiner_irq[n]);
+s->irq_table[n] = qemu_irq_split(is->int_combiner_irq[n],
+is->ext_combiner_irq[n]);
 }
 }
 for (; n < EXYNOS4210_MAX_INT_COMBINER_IN_IRQ; n++) {
@@ -245,8 +229,8 @@
