date:20240301


On 18/01/2024 15.48, Zhao Liu wrote:

From: Zhao Liu 

Currently, -smp supports up to 7-levels topology hierarchy:
   -drawers/books/sockets/dies/clusters/cores/threads.

Though no machine supports all these 7 levels yet, these 7 levels have
the strict containment relationship and together form the generic CPU
topology representation of QEMU.

Also, note that the maxcpus is calculated by multiplying all 7 levels:

   maxcpus = drawers * books * sockets * dies * clusters *
 cores * threads.

To cover this code path, it is necessary to test the full topology case
(with all 7 levels). This also helps to avoid introducing new issues by
further expanding the CPU topology in the future.

Signed-off-by: Zhao Liu 
---
  tests/unit/test-smp-parse.c | 143 
  1 file changed, 143 insertions(+)


FWIW:
Acked-by: Thomas Huth

Re: [PATCH 8/8] tests/unit/test-smp-parse.c: Test smp_props.has_clusters


On 18/01/2024 15.48, Zhao Liu wrote:

From: Zhao Liu 

The smp_props.has_clusters in MachineClass is not a user configured
field, and it indicates if user specifies "clusters" in -smp.

After -smp parsing, other module could aware if the cluster level
is configured by user. This is used when the machine has only 1 cluster
since there's only 1 cluster by default.

Add the check to cover this field.

Signed-off-by: Zhao Liu 
---
  tests/unit/test-smp-parse.c | 20 ++--
  1 file changed, 14 insertions(+), 6 deletions(-)


Acked-by: Thomas Huth

[PULL 1/6] libqos/virtio.c: init all elems in qvring_indirect_desc_setup()

From: Daniel Henrique Barboza 

The loop isn't setting the values for the last element. Every other
element is being initialized with addr = 0, flags = VRING_DESC_F_NEXT
and next = i + 1. The last elem is never touched.

This became a problem when enabling a RISC-V 'virt' libqos machine in
the 'indirect' test of virti-blk-test.c. The 'flags' for the last
element will end up being an odd number (since we didn't touch it).
Being an odd number it will be mistaken by VRING_DESC_F_NEXT, which
happens to be 1.

Deep into hw/virt/virtio.c, in virtqueue_split_pop(), into
virtqueue_split_read_next_desc(), a check for VRING_DESC_F_NEXT will be
made to see if we're supposed to chain. The code will keep up chaining
in the last element because the uninitialized value happens to be odd.
We'll error out right after that because desc->next (which is also
uninitialized) will be >= max. A VIRTQUEUE_READ_DESC_ERROR will be
returned, with an error message like this in the stderr:

qemu-system-riscv64: Desc next is 49391

Since we never returned, we'll end up timing out at qvirtio_wait_used_elem():

ERROR:../tests/qtest/libqos/virtio.c:236:qvirtio_wait_used_elem:
assertion failed: (g_get_monotonic_time() - start_time <= timeout_us)

The root cause is using uninitialized values from guest_alloc() in
qvring_indirect_desc_setup(). There's no guarantee that the memory pages
retrieved will be zeroed, so we can't make assumptions. In fact, commit
5b4f72f5e8 ("tests/qtest: properly initialise the vring used idx") fixed a
similar problem stating "It is probably not wise to assume guest memory
is zeroed anyway". I concur.

Initialize all elems in qvring_indirect_desc_setup().

Fixes: f294b029aa ("libqos: Added indirect descriptor support to virtio 
implementation")
Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Alistair Francis 
Reviewed-by: Thomas Huth 
Message-ID: <20240217192607.32565-2-dbarb...@ventanamicro.com>
Signed-off-by: Thomas Huth 
---
 tests/qtest/libqos/virtio.c | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/tests/qtest/libqos/virtio.c b/tests/qtest/libqos/virtio.c
index 410513225f..4f39124eba 100644
--- a/tests/qtest/libqos/virtio.c
+++ b/tests/qtest/libqos/virtio.c
@@ -280,14 +280,27 @@ QVRingIndirectDesc *qvring_indirect_desc_setup(QTestState 
*qs, QVirtioDevice *d,
 indirect->elem = elem;
 indirect->desc = guest_alloc(alloc, sizeof(struct vring_desc) * elem);
 
-for (i = 0; i < elem - 1; ++i) {
+for (i = 0; i < elem; ++i) {
 /* indirect->desc[i].addr */
 qvirtio_writeq(d, qs, indirect->desc + (16 * i), 0);
-/* indirect->desc[i].flags */
-qvirtio_writew(d, qs, indirect->desc + (16 * i) + 12,
-   VRING_DESC_F_NEXT);
-/* indirect->desc[i].next */
-qvirtio_writew(d, qs, indirect->desc + (16 * i) + 14, i + 1);
+
+/*
+ * If it's not the last element of the ring, set
+ * the chain (VRING_DESC_F_NEXT) flag and
+ * desc->next. Clear the last element - there's
+ * no guarantee that guest_alloc() will do it.
+ */
+if (i != elem - 1) {
+/* indirect->desc[i].flags */
+qvirtio_writew(d, qs, indirect->desc + (16 * i) + 12,
+   VRING_DESC_F_NEXT);
+
+/* indirect->desc[i].next */
+qvirtio_writew(d, qs, indirect->desc + (16 * i) + 14, i + 1);
+} else {
+qvirtio_writew(d, qs, indirect->desc + (16 * i) + 12, 0);
+qvirtio_writew(d, qs, indirect->desc + (16 * i) + 14, 0);
+}
 }
 
 return indirect;
-- 
2.44.0

[PULL 2/6] libqos/virtio.c: fix 'avail_event' offset in qvring_init()

From: Daniel Henrique Barboza 

In qvring_init() we're writing vq->used->avail_event at "vq->used + 2 +
array_size".  The struct pointed by vq->used is, from virtio_ring.h
Linux header):

 *  // A ring of used descriptor heads with free-running index.
 *  __virtio16 used_flags;
 *  __virtio16 used_idx;
 *  struct vring_used_elem used[num];
 *  __virtio16 avail_event_idx;

So 'flags' is the word right at vq->used. 'idx' is vq->used + 2. We need
to skip 'used_idx' by adding + 2 bytes, and then sum the vector size, to
reach avail_event_idx. An example on how to properly access this field
can be found in qvirtqueue_kick():

avail_event = qvirtio_readw(d, qts, vq->used + 4 +
sizeof(struct vring_used_elem) * vq->size);

This error was detected when enabling the RISC-V 'virt' libqos machine.
The 'idx' test from vhost-user-blk-test.c errors out with a timeout in
qvirtio_wait_used_elem(). The timeout happens because when processing
the first element, 'avail_event' is read in qvirtqueue_kick() as non-zero
because we didn't initialize it properly (and the memory at that point
happened to be non-zero). 'idx' is 0.

All of this makes this condition fail because "idx - avail_event" will
overflow and be non-zero:

/* < 1 because we add elements to avail queue one by one */
if ((flags & VRING_USED_F_NO_NOTIFY) == 0 &&
(!vq->event || (uint16_t)(idx-avail_event) < 1)) {
d->bus->virtqueue_kick(d, vq);
}

As a result the virtqueue is never kicked and we'll timeout waiting for it.

Fixes: 1053587c3f ("libqos: Added EVENT_IDX support")
Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Alistair Francis 
Reviewed-by: Thomas Huth 
Message-ID: <20240217192607.32565-3-dbarb...@ventanamicro.com>
Signed-off-by: Thomas Huth 
---
 tests/qtest/libqos/virtio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/libqos/virtio.c b/tests/qtest/libqos/virtio.c
index 4f39124eba..82a6e122bf 100644
--- a/tests/qtest/libqos/virtio.c
+++ b/tests/qtest/libqos/virtio.c
@@ -265,7 +265,7 @@ void qvring_init(QTestState *qts, const QGuestAllocator 
*alloc, QVirtQueue *vq,
 /* vq->used->idx */
 qvirtio_writew(vq->vdev, qts, vq->used + 2, 0);
 /* vq->used->avail_event */
-qvirtio_writew(vq->vdev, qts, vq->used + 2 +
+qvirtio_writew(vq->vdev, qts, vq->used + 4 +
sizeof(struct vring_used_elem) * vq->size, 0);
 }
 
-- 
2.44.0

[PULL 6/6] chardev/char-socket: Fix TLS io channels sending too much data to the backend

Commit ffda5db65a ("io/channel-tls: fix handling of bigger read buffers")
changed the behavior of the TLS io channels to schedule a second reading
attempt if there is still incoming data pending. This caused a regression
with backends like the sclpconsole that check in their read function that
the sender does not try to write more bytes to it than the device can
currently handle.

The problem can be reproduced like this:

 1) In one terminal, do this:

  mkdir qemu-pki
  cd qemu-pki
  openssl genrsa 2048 > ca-key.pem
  openssl req -new -x509 -nodes -days 365000 -key ca-key.pem -out ca-cert.pem
  # enter some dummy value for the cert
  openssl genrsa 2048 > server-key.pem
  openssl req -new -x509 -nodes -days 365000 -key server-key.pem \
-out server-cert.pem
  # enter some other dummy values for the cert

  gnutls-serv --echo --x509cafile ca-cert.pem --x509keyfile server-key.pem \
  --x509certfile server-cert.pem -p 8338

 2) In another terminal, do this:

  wget 
https://download.fedoraproject.org/pub/fedora-secondary/releases/39/Cloud/s390x/images/Fedora-Cloud-Base-39-1.5.s390x.qcow2

  qemu-system-s390x -nographic -nodefaults \
-hda Fedora-Cloud-Base-39-1.5.s390x.qcow2 \
-object 
tls-creds-x509,id=tls0,endpoint=client,verify-peer=false,dir=$PWD/qemu-pki \
-chardev socket,id=tls_chardev,host=localhost,port=8338,tls-creds=tls0 \
-device sclpconsole,chardev=tls_chardev,id=tls_serial

QEMU then aborts after a second or two with:

  qemu-system-s390x: ../hw/char/sclpconsole.c:73: chr_read: Assertion
   `size <= SIZE_BUFFER_VT220 - scon->iov_data_len' failed.
 Aborted (core dumped)

It looks like the second read does not trigger the chr_can_read() function
to be called before the second read, which should normally always be done
before sending bytes to a character device to see how much it can handle,
so the s->max_size in tcp_chr_read() still contains the old value from the
previous read. Let's make sure that we use the up-to-date value by calling
tcp_chr_read_poll() again here.

Cc: qemu-sta...@nongnu.org
Fixes: ffda5db65a ("io/channel-tls: fix handling of bigger read buffers")
Buglink: https://issues.redhat.com/browse/RHEL-24614
Reviewed-by: "Daniel P. Berrangé" 
Message-ID: <20240229104339.42574-1-th...@redhat.com>
Reviewed-by: Antoine Damhet 
Tested-by: Antoine Damhet 
Reviewed-by: Marc-André Lureau 
Signed-off-by: Thomas Huth 
---
 chardev/char-socket.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 67e3334423..8a0406cc1e 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -496,9 +496,9 @@ static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition 
cond, void *opaque)
 s->max_size <= 0) {
 return TRUE;
 }
-len = sizeof(buf);
-if (len > s->max_size) {
-len = s->max_size;
+len = tcp_chr_read_poll(opaque);
+if (len > sizeof(buf)) {
+len = sizeof(buf);
 }
 size = tcp_chr_recv(chr, (void *)buf, len);
 if (size == 0 || (size == -1 && errno != EAGAIN)) {
-- 
2.44.0

[PULL 0/6] Misc fixes (libqos vring, Kconfig, TLS io channels, ...)

 Hi Peter!

The following changes since commit c0c6a0e3528b88aaad0b9d333e295707a195587b:

  Merge tag 'migration-next-pull-request' of https://gitlab.com/peterx/qemu 
into staging (2024-02-28 17:27:10 +)

are available in the Git repository at:

  https://gitlab.com/thuth/qemu.git tags/pull-request-2024-03-01

for you to fetch changes up to 462945cd22d2bcd233401ed3aa167d83a8e35b05:

  chardev/char-socket: Fix TLS io channels sending too much data to the backend 
(2024-03-01 08:27:33 +0100)


* Fix some bugs in the vring setup of libqos
* Fix GIC settings when using --without-default-devices
* Fix USB PCAP streams on Windows
* Remove temporary files from test-util-sockets
* Fix TLS io channels sending too much data to the backend


Benjamin David Lunt (1):
  hw/usb/bus.c: PCAP adding 0xA in Windows version

Daniel Henrique Barboza (2):
  libqos/virtio.c: init all elems in qvring_indirect_desc_setup()
  libqos/virtio.c: fix 'avail_event' offset in qvring_init()

Thomas Huth (3):
  hw/intc/Kconfig: Fix GIC settings when using "--without-default-devices"
  tests/unit/test-util-sockets: Remove temporary file after test
  chardev/char-socket: Fix TLS io channels sending too much data to the 
backend

 chardev/char-socket.c  |  6 +++---
 hw/usb/bus.c   |  5 +++--
 tests/qtest/libqos/virtio.c| 27 ---
 tests/unit/test-util-sockets.c |  1 +
 hw/intc/Kconfig| 12 ++--
 5 files changed, 33 insertions(+), 18 deletions(-)

[PULL 5/6] tests/unit/test-util-sockets: Remove temporary file after test

test-util-sockets leaves the temporary socket files around in the
temporary files folder. Let's better remove them at the end of the
testing.

Fixes: 4d3a329af5 ("tests/util-sockets: add abstract unix socket cases")
Message-ID: <20240226082728.249753-1-th...@redhat.com>
Reviewed-by: Marc-André Lureau 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Thomas Huth 
---
 tests/unit/test-util-sockets.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/unit/test-util-sockets.c b/tests/unit/test-util-sockets.c
index 63909ccb2b..4c9dd0b271 100644
--- a/tests/unit/test-util-sockets.c
+++ b/tests/unit/test-util-sockets.c
@@ -326,6 +326,7 @@ static void test_socket_unix_abstract(void)
 test_socket_unix_abstract_row(&matrix[i]);
 }
 
+unlink(addr.u.q_unix.path);
 g_free(addr.u.q_unix.path);
 }
 
-- 
2.44.0

[PULL 3/6] hw/intc/Kconfig: Fix GIC settings when using "--without-default-devices"

When using "--without-default-devices", the ARM_GICV3_TCG and ARM_GIC_KVM
settings currently get disabled, though the arm virt machine is only of
very limited use in that case. This also causes the migration-test to
fail in such builds. Let's make sure that we always keep the GIC switches
enabled in the --without-default-devices builds, too.

Message-ID: <20240221110059.152665-1-th...@redhat.com>
Tested-by: Fabiano Rosas 
Signed-off-by: Thomas Huth 
---
 hw/intc/Kconfig | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/intc/Kconfig b/hw/intc/Kconfig
index 97d550b06b..2b5b2d2301 100644
--- a/hw/intc/Kconfig
+++ b/hw/intc/Kconfig
@@ -12,10 +12,6 @@ config IOAPIC
 bool
 select I8259
 
-config ARM_GIC
-bool
-select MSI_NONBROKEN
-
 config OPENPIC
 bool
 select MSI_NONBROKEN
@@ -25,14 +21,18 @@ config APIC
 select MSI_NONBROKEN
 select I8259
 
+config ARM_GIC
+bool
+select ARM_GICV3_TCG if TCG
+select ARM_GIC_KVM if KVM
+select MSI_NONBROKEN
+
 config ARM_GICV3_TCG
 bool
-default y
 depends on ARM_GIC && TCG
 
 config ARM_GIC_KVM
 bool
-default y
 depends on ARM_GIC && KVM
 
 config XICS
-- 
2.44.0

[PULL 4/6] hw/usb/bus.c: PCAP adding 0xA in Windows version

From: Benjamin David Lunt 

Since Windows text files use CRLFs for all \n, the Windows version of QEMU
inserts a CR in the PCAP stream when a LF is encountered when using USB PCAP
files. This is due to the fact that the PCAP file is opened as TEXT instead
of BINARY.

To show an example, when using a very common protocol to USB disks, the BBB
protocol uses a 10-byte command packet. For example, the READ_CAPACITY(10)
command will have a command block length of 10 (0xA). When this 10-byte
command (part of the 31-byte CBW) is placed into the PCAP file, the Windows
file manager inserts a 0xD before the 0xA, turning the 31-byte CBW into a
32-byte CBW.

Actual CBW:
  0040 55 53 42 43 01 00 00 00 08 00 00 00 80 00 0a 25 USBC...%
  0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ...

PCAP CBW
  0040 55 53 42 43 01 00 00 00 08 00 00 00 80 00 0d 0a USBC
  0050 25 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 %..

I believe simply opening the PCAP file as BINARY instead of TEXT will fix
this issue.

Cc: qemu-sta...@nongnu.org
Resolves: https://bugs.launchpad.net/qemu/+bug/2054889
Signed-off-by: Benjamin David Lunt 
Message-ID: <000101da6823$ce1bbf80$6a533e80$@fysnet.net>
[thuth: Break long line to avoid checkpatch.pl error]
Signed-off-by: Thomas Huth 
---
 hw/usb/bus.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/usb/bus.c b/hw/usb/bus.c
index 796769fadb..bfab2807d7 100644
--- a/hw/usb/bus.c
+++ b/hw/usb/bus.c
@@ -260,13 +260,14 @@ static void usb_qdev_realize(DeviceState *qdev, Error 
**errp)
 }
 
 if (dev->pcap_filename) {
-int fd = qemu_open_old(dev->pcap_filename, O_CREAT | O_WRONLY | 
O_TRUNC, 0666);
+int fd = qemu_open_old(dev->pcap_filename,
+   O_CREAT | O_WRONLY | O_TRUNC | O_BINARY, 0666);
 if (fd < 0) {
 error_setg(errp, "open %s failed", dev->pcap_filename);
 usb_qdev_unrealize(qdev);
 return;
 }
-dev->pcap = fdopen(fd, "w");
+dev->pcap = fdopen(fd, "wb");
 usb_pcap_init(dev->pcap);
 }
 }
-- 
2.44.0

Re: [PATCH v6 00/23] migration: File based migration with multifd and mapped-ram

2024-03-01 Thread Daniel P . Berrangé

On Fri, Mar 01, 2024 at 09:50:32AM +0800, Peter Xu wrote:
> On Thu, Feb 29, 2024 at 12:29:54PM -0300, Fabiano Rosas wrote:
> > Based-on: 74aa0fb297 (migration: options incompatible with cpr) # 
> > peterx/migration-next
> > 
> > Hi,
> > 
> > In this v6:
> > 
> > - Minor fixes to 17/23 and 19/23
> 
> The whole set looks good to me now.  I plan to queue it before the
> direct-io stuff.  Any other comments / concerns from anyone?
> 
> Dan, would it be fine I queue the IO patches together?

Yes, that's fine, when the series is ready.


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH] hw/us/bus.c PCAP adding 0xA in Windows version


On 25/02/2024 20.49, benl...@fysnet.net wrote:

Since Windows text files use CRLFs for all \n, the Windows version of QEMU
inserts a CR in the PCAP stream when a LF is encountered when using USB PCAP
files. This is due to the fact that the PCAP file is opened as TEXT instead
of BINARY.

To show an example, when using a very common protocol to USB disks, the BBB
protocol uses a 10-byte command packet. For example, the READ_CAPACITY(10)
command will have a command block length of 10 (0xA). When this 10-byte
command (part of the 31-byte CBW) is placed into the PCAP file, the Windows
file manager inserts a 0xD before the 0xA, turning the 31-byte CBW into a
32-byte CBW.

Actual CBW:
   0040 55 53 42 43 01 00 00 00 08 00 00 00 80 00 0a 25 USBC...%
   0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ...

PCAP CBW
   0040 55 53 42 43 01 00 00 00 08 00 00 00 80 00 0d 0a USBC
   0050 25 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 %..

I believe simply opening the PCAP file as BINARY instead of TEXT will fix
this issue.

Resolves: https://bugs.launchpad.net/qemu/+bug/2054889
Signed-off-by: Benjamin David Lunt 
---
  hw/usb/bus.c | 2 +-
1 file changed, 2 insertions(+), 2 deletions(-)

diff -u a/hw/usb/bus.c b/hw/usb/bus.c
--- a/hw/usb/bus.c
+++ b/hw/usb/bus.c
@@ -273,13 +273,13 @@
  }
  
  if (dev->pcap_filename) {

-int fd = qemu_open_old(dev->pcap_filename, O_CREAT | O_WRONLY |
O_TRUNC, 0666);
+int fd = qemu_open_old(dev->pcap_filename, O_CREAT | O_WRONLY |
O_TRUNC | O_BINARY, 0666);


 Hi Benjamin!

Thanks for the patch! Since it's rather a trivial patch and our USB 
maintainer Gerd is pretty much busy with other stuff right now, I went ahead 
and put it in my current pull request.


Two things to notice: First, something along the way (likely your mail 
program) added a line break after the "O_WRONLY |" in the above two lines, 
so I had to undo that change manually before I was able to apply the patch 
... please try to use "git send-email" for sending patches, then such things 
don't happen.
And second, please use the scripts/checkpatch.pl script to check your 
patches - it was complaining about a line getting too long here, so I went 
ahead and fixed that, too (i.e. no need to resend, just a FYI for future 
patches).


 Thanks,
  Thomas


  if (fd < 0) {
  error_setg(errp, "open %s failed", dev->pcap_filename);
  usb_qdev_unrealize(qdev);
  return;
  }
-dev->pcap = fdopen(fd, "w");
+dev->pcap = fdopen(fd, "wb");
  usb_pcap_init(dev->pcap);
  }
  }

[PATCH v1 1/1] memory tier: acpi/hmat: create CPUless memory tiers after obtaining HMAT info

2024-03-01 Thread Ho-Ren (Jack) Chuang

* Introduce `mt_init_with_hmat()`
We defer memory tier initialization for those CPUless NUMA nodes
until acquiring HMAT info. `mt_init_with_hmat()` is introduced to
post-create CPUless memory tiers after obtaining HMAT info.
It iterates through each CPUless memory node, creating memory tiers if
necessary. Finally, it calculates demotion tables again at the end.

* Introduce `hmat_find_alloc_memory_type()`
Find or allocate a memory type in the `hmat_memory_types` list.

* Make `set_node_memory_tier()` more generic
This function can also be used for setting other memory types for a node.
To do so, a new argument is added to specify a memory type.

* Handle cases where there is no HMAT when creating memory tiers
If no HMAT is specified, it falls back to using `default_dram_type`.

* Change adist calculation code to use another new lock, mt_perf_lock.
Iterating through CPUlist nodes requires holding the `memory_tier_lock`.
However, `mt_calc_adistance()` will end up trying to acquire the same lock,
leading to a potential deadlock. Therefore, we propose introducing a
standalone `mt_perf_lock` to protect `default_dram_perf`. This approach not
only avoids deadlock but also prevents holding a large lock simultaneously.

Signed-off-by: Ho-Ren (Jack) Chuang 
Signed-off-by: Hao Xiang 
---
 drivers/acpi/numa/hmat.c |  3 ++
 include/linux/memory-tiers.h |  6 +++
 mm/memory-tiers.c| 76 
 3 files changed, 77 insertions(+), 8 deletions(-)

diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index d6b85f0f6082..9f57338b3cb5 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -1038,6 +1038,9 @@ static __init int hmat_init(void)
if (!hmat_set_default_dram_perf())
register_mt_adistance_algorithm(&hmat_adist_nb);
 
+   /* Post-create CPUless memory tiers after getting HMAT info */
+   mt_init_with_hmat();
+
return 0;
 out_put:
hmat_free_structures();
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 69e781900082..2f845e90c033 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -48,6 +48,7 @@ int mt_calc_adistance(int node, int *adist);
 int mt_set_default_dram_perf(int nid, struct access_coordinate *perf,
 const char *source);
 int mt_perf_to_adistance(struct access_coordinate *perf, int *adist);
+void mt_init_with_hmat(void);
 #ifdef CONFIG_MIGRATION
 int next_demotion_node(int node);
 void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
@@ -136,5 +137,10 @@ static inline int mt_perf_to_adistance(struct 
access_coordinate *perf, int *adis
 {
return -EIO;
 }
+
+static inline void mt_init_with_hmat(void)
+{
+
+}
 #endif /* CONFIG_NUMA */
 #endif  /* _LINUX_MEMORY_TIERS_H */
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 0537664620e5..7a0a579b3deb 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -35,7 +35,9 @@ struct node_memory_type_map {
 };
 
 static DEFINE_MUTEX(memory_tier_lock);
+static DEFINE_MUTEX(mt_perf_lock);
 static LIST_HEAD(memory_tiers);
+static LIST_HEAD(hmat_memory_types);
 static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
 struct memory_dev_type *default_dram_type;
 
@@ -502,7 +504,7 @@ static inline void __init_node_memory_type(int node, struct 
memory_dev_type *mem
}
 }
 
-static struct memory_tier *set_node_memory_tier(int node)
+static struct memory_tier *set_node_memory_tier(int node, struct 
memory_dev_type *new_memtype)
 {
struct memory_tier *memtier;
struct memory_dev_type *memtype;
@@ -514,7 +516,7 @@ static struct memory_tier *set_node_memory_tier(int node)
if (!node_state(node, N_MEMORY))
return ERR_PTR(-EINVAL);
 
-   __init_node_memory_type(node, default_dram_type);
+   __init_node_memory_type(node, new_memtype);
 
memtype = node_memory_types[node].memtype;
node_set(node, memtype->nodes);
@@ -623,6 +625,56 @@ void clear_node_memory_type(int node, struct 
memory_dev_type *memtype)
 }
 EXPORT_SYMBOL_GPL(clear_node_memory_type);
 
+static struct memory_dev_type *hmat_find_alloc_memory_type(int adist)
+{
+   bool found = false;
+   struct memory_dev_type *mtype;
+
+   list_for_each_entry(mtype, &hmat_memory_types, list) {
+   if (mtype->adistance == adist) {
+   found = true;
+   break;
+   }
+   }
+   if (!found) {
+   mtype = alloc_memory_type(adist);
+   if (!IS_ERR(mtype))
+   list_add(&mtype->list, &hmat_memory_types);
+   }
+   return mtype;
+}
+
+static void mt_create_with_hmat(int node)
+{
+   struct memory_dev_type *mtype = NULL;
+   int adist = MEMTIER_ADISTANCE_DRAM;
+
+   mt_calc_adistance(node, &adist);
+   if (adist != MEMTIER_ADISTANCE_DRAM) {
+   mtype = hmat_find_alloc_memory_type

[PATCH v1 0/1] Improved Memory Tier Creation for CPUless NUMA Nodes

2024-03-01 Thread Ho-Ren (Jack) Chuang

The memory tiering component in the kernel is functionally useless for
CPUless memory/non-DRAM devices like CXL1.1 type3 memory because the nodes
are lumped together in the DRAM tier.
https://lore.kernel.org/linux-mm/ph0pr08mb7955e9f08ccb64f23963b5c3a8...@ph0pr08mb7955.namprd08.prod.outlook.com/T/

This patchset automatically resolves the issues. It delays the initialization
of memory tiers for CPUless NUMA nodes until they obtain HMAT information
at boot time, eliminating the need for user intervention.
If no HMAT specified, it falls back to using `default_dram_type`.

Example usecase:
We have CXL memory on the host, and we create VMs with a new system memory
device backed by host CXL memory. We inject CXL memory performance attributes
through QEMU, and the guest now sees memory nodes with performance attributes
in HMAT. With this change, we enable the guest kernel to construct
the correct memory tiering for the memory nodes.

Ho-Ren (Jack) Chuang (1):
  memory tier: acpi/hmat: create CPUless memory tiers after obtaining
HMAT info

 drivers/acpi/numa/hmat.c |  3 ++
 include/linux/memory-tiers.h |  6 +++
 mm/memory-tiers.c| 76 
 3 files changed, 77 insertions(+), 8 deletions(-)

-- 
Hao Xiang and Ho-Ren (Jack) Chuang

Re: [PATCH v7 2/2] hw/acpi: Implement the SRAT GI affinity structure

2024-03-01 Thread Ankit Agrawal

>>
>> One more thing.  Right now we can't use Generic Initiators as
>> HMAT initiators.  That also wants fixing given that's their
>> normal usecase rather than what you are using them for so it
>> should 'work'.
>
> Something along the lines of this will do the job.

Thanks! Will incorporate the patch in the next posting.

>
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index 4173ef2afa..825cfe86bc 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -41,6 +41,7 @@ struct NodeInfo {
> struct HostMemoryBackend *node_memdev;
> bool present;
> bool has_cpu;
>+    bool has_gi;
> uint8_t lb_info_provided;
> uint16_t initiator;
> uint8_t distance[MAX_NODES];
> diff --git a/hw/acpi/acpi-generic-initiator.c 
> b/hw/acpi/acpi-generic-initiator.c
> index 9179590a42..8a67300320 100644
> --- a/hw/acpi/acpi-generic-initiator.c
> +++ b/hw/acpi/acpi-generic-initiator.c
> @@ -6,6 +6,7 @@
> #include "qemu/osdep.h"
> #include "hw/acpi/acpi-generic-initiator.h"
> #include "hw/pci/pci_device.h"
> +#include "hw/boards.h"
> #include "qapi/error.h"
> #include "qapi/qapi-builtin-visit.h"
> #include "qapi/visitor.h"
> @@ -58,6 +59,7 @@ static void acpi_generic_node_set_node(Object *obj, Visitor 
> *v,
>    const char *name, void *opaque,
>    Error **errp)
> {
> +    MachineState *ms = MACHINE(qdev_get_machine());
> AcpiGenericNode *gn = ACPI_GENERIC_NODE(obj);
> uint32_t value;
>
> @@ -72,6 +74,10 @@ static void acpi_generic_node_set_node(Object *obj, 
> Visitor *v,
> }
>
> gn->node = value;
> +
> +    if (object_dynamic_cast(obj, TYPE_ACPI_GENERIC_INITIATOR)) {
> +    ms->numa_state->nodes[gn->node].has_gi = true;
> +    }
> }
>
> static void acpi_generic_node_class_init(ObjectClass *oc, void *data)
> diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
> index b933ae3c06..9b1662b6b8 100644
> --- a/hw/acpi/hmat.c
> +++ b/hw/acpi/hmat.c
> @@ -225,7 +225,7 @@ static void hmat_build_table_structs(GArray *table_data, 
> NumaState *numa_state)
> }
>
> for (i = 0; i < numa_state->num_nodes; i++) {
> -    if (numa_state->nodes[i].has_cpu) {
> +    if (numa_state->nodes[i].has_cpu || numa_state->nodes[i].has_gi) {
> initiator_list[num_initiator++] = i;
> }
> }
> diff --git a/hw/core/numa.c b/hw/core/numa.c
> index f08956ddb0..58a32f1564 100644
> --- a/hw/core/numa.c
> +++ b/hw/core/numa.c
> @@ -229,7 +229,8 @@ void parse_numa_hmat_lb(NumaState *numa_state, 
> NumaHmatLBOptions *node,
>    node->target, numa_state->num_nodes);
> return;
> }
> -    if (!numa_info[node->initiator].has_cpu) {
> +    if (!numa_info[node->initiator].has_cpu &&
> +    !numa_info[node->initiator].has_gi) {
> error_setg(errp, "Invalid initiator=%d, it isn't an "
>    "initiator proximity domain", node->initiator);
> return;
>
> Jonathan
>
> >
> >
> >
>

Re: [PATCH v7 1/2] qom: new object to associate device to numa node

2024-03-01 Thread Ankit Agrawal


>> As for your suggestion of using acpi-dev as the arg to take both
>> pci-dev and acpi-dev.. Would that mean sending a pure pci device
>> (not the corner case you mentioned) through the acpi-dev argument
>> as well? Not sure if that would appropriate.
>
> Ah, looking up my description is unclear. I meant two optional parameters
> pci-dev or acpi-dev - which one was supplied would indicate the type
> of handle to be used.

Yes, that makes sense. But for now only have pci-dev until we have any
acpi-dev related code added? IIRC, this is what we discussed earlier.

Re: [PATCH 06/19] smbios: get rid of smbios_legacy global

2024-03-01 Thread Ani Sinha




> On 29-Feb-2024, at 19:59, Igor Mammedov  wrote:
> 
> On Thu, 29 Feb 2024 16:23:21 +0530
> Ani Sinha  wrote:
> 
>>> On 27-Feb-2024, at 21:17, Igor Mammedov  wrote:
>>> 
>>> clean up smbios_set_defaults() which is reused by legacy
>>> and non legacy machines from being aware of 'legacy' notion
>>> and need to turn it off. And push legacy handling up to
>>> PC machine code where it's relevant.
>>> 
>>> Signed-off-by: Igor Mammedov 

Reviewed-by: Ani Sinha 

>>> ---
>>> PS: I've moved/kept legacy smbios_entries to smbios_get_tables()
>>> but it at least is not visible to API users. To get rid of it
>>> as well, it would be necessary to change how '-smbios' CLI
>>> option is processed. Which is done later in the series.
>>> ---
>>> include/hw/firmware/smbios.h |  2 +-
>>> hw/arm/virt.c|  2 +-
>>> hw/i386/fw_cfg.c |  7 ---
>>> hw/loongarch/virt.c  |  2 +-
>>> hw/riscv/virt.c  |  2 +-
>>> hw/smbios/smbios.c   | 35 +++
>>> 6 files changed, 23 insertions(+), 27 deletions(-)
>>> 
>>> diff --git a/include/hw/firmware/smbios.h b/include/hw/firmware/smbios.h
>>> index a187fbbd3d..0818184834 100644
>>> --- a/include/hw/firmware/smbios.h
>>> +++ b/include/hw/firmware/smbios.h
>>> @@ -293,7 +293,7 @@ struct smbios_type_127 {
>>> void smbios_entry_add(QemuOpts *opts, Error **errp);
>>> void smbios_set_cpuid(uint32_t version, uint32_t features);
>>> void smbios_set_defaults(const char *manufacturer, const char *product,
>>> - const char *version, bool legacy_mode,
>>> + const char *version,
>>> bool uuid_encoded, SmbiosEntryPointType ep_type);
>>> void smbios_set_default_processor_family(uint16_t processor_family);
>>> uint8_t *smbios_get_table_legacy(uint32_t expected_t4_count, size_t 
>>> *length);
>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>> index 0af1943697..8588681f27 100644
>>> --- a/hw/arm/virt.c
>>> +++ b/hw/arm/virt.c
>>> @@ -1633,7 +1633,7 @@ static void virt_build_smbios(VirtMachineState *vms)
>>>}
>>> 
>>>smbios_set_defaults("QEMU", product,
>>> -vmc->smbios_old_sys_ver ? "1.0" : mc->name, false,
>>> +vmc->smbios_old_sys_ver ? "1.0" : mc->name,
>>>true, SMBIOS_ENTRY_POINT_TYPE_64);
>>> 
>>>/* build the array of physical mem area from base_memmap */
>>> diff --git a/hw/i386/fw_cfg.c b/hw/i386/fw_cfg.c
>>> index fcb4fb0769..c1e9c0fd9c 100644
>>> --- a/hw/i386/fw_cfg.c
>>> +++ b/hw/i386/fw_cfg.c
>>> @@ -63,15 +63,16 @@ void fw_cfg_build_smbios(PCMachineState *pcms, 
>>> FWCfgState *fw_cfg)
>>>if (pcmc->smbios_defaults) {
>>>/* These values are guest ABI, do not change */
>>>smbios_set_defaults("QEMU", mc->desc, mc->name,
>>> -pcmc->smbios_legacy_mode, 
>>> pcmc->smbios_uuid_encoded,
>>> +pcmc->smbios_uuid_encoded,
>>>pcms->smbios_entry_point_type);
>>>}
>>> 
>>>/* tell smbios about cpuid version and features */
>>>smbios_set_cpuid(cpu->env.cpuid_version, cpu->env.features[FEAT_1_EDX]);
>>> 
>>> -smbios_tables = smbios_get_table_legacy(ms->smp.cpus, 
>>> &smbios_tables_len);
>>> -if (smbios_tables) {
>>> +if (pcmc->smbios_legacy_mode) {
>>> +smbios_tables = smbios_get_table_legacy(ms->smp.cpus,
>>> +&smbios_tables_len);
>>>fw_cfg_add_bytes(fw_cfg, FW_CFG_SMBIOS_ENTRIES,
>>> smbios_tables, smbios_tables_len);
>>>return;
>>> diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
>>> index 0ad7d8c887..73fb3522ba 100644
>>> --- a/hw/loongarch/virt.c
>>> +++ b/hw/loongarch/virt.c
>>> @@ -320,7 +320,7 @@ static void virt_build_smbios(LoongArchMachineState 
>>> *lams)
>>>return;
>>>}
>>> 
>>> -smbios_set_defaults("QEMU", product, mc->name, false,
>>> +smbios_set_defaults("QEMU", product, mc->name,
>>>true, SMBIOS_ENTRY_POINT_TYPE_64);
>>> 
>>>smbios_get_tables(ms, NULL, 0, &smbios_tables, &smbios_tables_len,
>>> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
>>> index fd35c74781..e2c9529df2 100644
>>> --- a/hw/riscv/virt.c
>>> +++ b/hw/riscv/virt.c
>>> @@ -1235,7 +1235,7 @@ static void virt_build_smbios(RISCVVirtState *s)
>>>product = "KVM Virtual Machine";
>>>}
>>> 
>>> -smbios_set_defaults("QEMU", product, mc->name, false,
>>> +smbios_set_defaults("QEMU", product, mc->name,
>>>true, SMBIOS_ENTRY_POINT_TYPE_64);
>>> 
>>>if (riscv_is_32bit(&s->soc[0])) {
>>> diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
>>> index 15339d8dbe..c46fc93357 100644
>>> --- a/hw/smbios/smbios.c
>>> +++ b/hw/smbios/smbios.c
>>> @@ -54,7 +54,6 @@ struct smbios_table {
>>> 
>>> static uint8_t *smbios_entries;
>>> static size_t smbios_entries_len;
>>> -static bool smbios_lega

Re: [PATCH v6 00/23] migration: File based migration with multifd and mapped-ram

2024-03-01 Thread Peter Xu

On Thu, Feb 29, 2024 at 12:29:54PM -0300, Fabiano Rosas wrote:
> Based-on: 74aa0fb297 (migration: options incompatible with cpr) # 
> peterx/migration-next
> 
> Hi,
> 
> In this v6:
> 
> - Minor fixes to 17/23 and 19/23

Thanks both for confirming, queued now.

-- 
Peter Xu

Re: [PATCH v2 1/3] qtest: migration: Enhance qtest migration functions to support 'channels' argument

2024-03-01 Thread Het Gala




On 29/02/24 6:47 am, Fabiano Rosas wrote:

Het Gala  writes:


On 27/02/24 1:04 am, Het Gala wrote:


On 26/02/24 6:31 pm, Fabiano Rosas wrote:

Het Gala  writes:


On 24/02/24 1:42 am, Fabiano Rosas wrote:
this was the same first approach that I attempted. It won't work because

The final 'migrate' QAPI with channels string would look like

{ "execute": "migrate", "arguments": { "channels": "[ { "channel-type":
"main", "addr": { "transport": "socket", "type": "inet", "host":
"10.117.29.84", "port": "4000" }, "multifd-channels": 2 } ]" } }

instead of

{ "execute": "migrate", "arguments": { "channels": [ { "channel-type":
"main", "addr": { "transport": "socket", "type": "inet", "host":
"10.117.29.84", "port": "4000" }, "multifd-channels": 2 } ] } }

It would complain, that channels should be an *array* and not a string.

So, that's the reason parsing was required in qtest too.

I would be glad to hear if there are any ideas to convert /string ->
json object -> add it inside qdict along with uri/ ?


Isn't this what the various qobject_from_json do? How does it work with
the existing tests?

  qtest_qmp_assert_success(to, "{ 'execute': 'migrate-incoming',"
   "  'arguments': { "
   "  'channels': [ { 'channel-type': 'main',"
   "  'addr': { 'transport': 'socket',"
   "'type': 'inet',"
   "'host': '127.0.0.1',"
   "'port': '0' } } ] } }");

We can pass this^ string successfully to QMP somehow...

I think, here in qtest_qmp_assert_success, we actually can pass the
whole QMP command, and it just asserts that return key is present in
the response, though I am not very much familiar with qtest codebase
to verify how swiftly we can convert string into an actual QObject.

[...]


I tried with qobject_from_json type of utility functions and the error I
got was this :

migration-test: /rpmbuild/SOURCES/qemu/include/qapi/qmp/qobject.h:126:
qobject_type: Assertion `QTYPE_NONE < obj->base.type && obj->base.type <
QTYPE__MAX' failed.

And I suppose this was the case because, there are only limited types of
QTYPE available

typedefenumQType{
QTYPE_NONE,
QTYPE_QNULL,
QTYPE_QNUM,
QTYPE_QSTRING,
QTYPE_QDICT,
QTYPE_QLIST,
QTYPE_QBOOL,
QTYPE__MAX,
} QType;

And 'channels' is a mixture of QDICT and QLIST and hence it is not able
to easily convert from string to json.

Thoughts on this ?

I'm not sure what you tried. This works:

 g_assert(!qdict_haskey(args, "channels"));
 if (channels) {
 channels_obj = qobject_from_json(channels, errp);
 qdict_put_obj(args, "channels", channels_obj);
 }


Are you sure the above works ?

Sorry I want to get this doubt cleared first, Let's me take a step back 
and just focus on the above part and not focus on port issue for now.


Adding a validation test for uri and channels both used together

migration_test_add("/migration/validate_uri/channels/both_set",
   test_validate_uri_channels_both_set);
static void test_validate_uri_channels_both_set(void)
{

   QTestState *from, *to;

   MigrateCommon args = {
    .connect_channels = "'channels': [ { 'channel-type': 'main',"
    "  'addr': { 'transport': 'socket',"
    "    'type': 'inet',"
    "    'host': '127.0.0.1',"
    "    'port': '0' } } ]",
    .listen_uri = "tcp:127.0.0.1:0",
    .result = MIG_TEST_QMP_ERROR,

    };

    if (test_migrate_start(&from, &to, args.listen_uri, &args.start)) {
    return;
    }

    wait_for_serial("src_serial");

    if (args.result == MIG_TEST_QMP_ERROR) {
    migrate_qmp_fail(from, args.connect_uri, args.connect_channels, 
"{}");

    } else {
    migrate_qmp(from, args.connect_uri, args.connect_channels, "{}");
    }

    test_migrate_end(from, to, false);
}

In the migration-helpers.c file, inside migrate_qmp_fail function:

void migrate_qmp_fail(QTestState *who, const char *uri,
  const char *channels, const char *fmt, ...)
{
    [...]

    Error **errp = NULL;
    QObject *channels_obj = NULL;

    [...]

    g_assert(!qdict_haskey(args, "channels"));
    if (channels) {
    channels_obj = qobject_from_json(channels, errp);
    if (!channels_obj) {
    fprintf(stdout, "Everything is right till here also ?\n");
    goto cleanup;
    }
    qdict_put_obj(args, "channels", channels_obj);
    }

    err = qtest_qmp_assert_failure_ref(
    who, "{ 'execute': 'migrate', 'arguments': %p}", args);

   [...]

cleanup:
    qobject_unref(channels_obj);
}

When I ran the test alone, test passed, but instead of giving output as

{"error": {"class": "GenericError", "desc": "need either 'uri' or 
'channels' argument"}}


It just passed,

[PATCH] migration/multifd: Document two places for mapped-ram

2024-03-01 Thread peterx

From: Peter Xu 

Add two documentations for mapped-ram migration on two spots that may not
be extremely clear.

Signed-off-by: Peter Xu 
---
Based-on: <20240229153017.2221-1-faro...@suse.de>
---
 migration/multifd.c | 12 
 migration/ram.c |  8 +++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index b4e5a9dfcc..2942395ce2 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -709,6 +709,18 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams 
*p, Error **errp)
 {
 if (p->c) {
 migration_ioc_unregister_yank(p->c);
+/*
+ * An explicitly close() on the channel here is normally not
+ * required, but can be helpful for "file:" iochannels, where it
+ * will include an fdatasync() to make sure the data is flushed to
+ * the disk backend.
+ *
+ * The object_unref() cannot guarantee that because: (1) finalize()
+ * of the iochannel is only triggered on the last reference, and
+ * it's not guaranteed that we always hold the last refcount when
+ * reaching here, and, (2) even if finalize() is invoked, it only
+ * does a close(fd) without data flush.
+ */
 qio_channel_close(p->c, &error_abort);
 object_unref(OBJECT(p->c));
 p->c = NULL;
diff --git a/migration/ram.c b/migration/ram.c
index 1f1b5297cf..c79e3de521 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4258,7 +4258,13 @@ static int ram_load_precopy(QEMUFile *f)
 switch (flags & ~RAM_SAVE_FLAG_CONTINUE) {
 case RAM_SAVE_FLAG_MEM_SIZE:
 ret = parse_ramblocks(f, addr);
-
+/*
+ * For mapped-ram migration (to a file) using multifd, we sync
+ * once and for all here to make sure all tasks we queued to
+ * multifd threads are completed, so that all the ramblocks
+ * (including all the guest memory pages within) are fully
+ * loaded after this sync returns.
+ */
 if (migrate_mapped_ram()) {
 multifd_recv_sync_main();
 }
-- 
2.44.0

[PATCH v5 10/17] hw/loongarch: fdt adds cpu interrupt controller node

fdt adds cpu interrupt controller node,
we use 'loongson,cpu-interrupt-controller'.

See:
https://github.com/torvalds/linux/blob/v6.7/drivers/irqchip/irq-loongarch-cpu.c
https://lore.kernel.org/r/20221114113824.1880-2-liupei...@loongson.cn

Signed-off-by: Song Gao 
---
 hw/loongarch/virt.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 5f787338a2..a8374c35a4 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -81,7 +81,23 @@ static void virt_flash_map(LoongArchMachineState *lams,
 sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
 memory_region_add_subregion(sysmem, base,
 sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 
0));
+}
+
+static void fdt_add_cpuic_node(LoongArchMachineState *lams,
+   uint32_t *cpuintc_phandle)
+{
+MachineState *ms = MACHINE(lams);
+char *nodename;
 
+*cpuintc_phandle = qemu_fdt_alloc_phandle(ms->fdt);
+nodename = g_strdup_printf("/cpuic");
+qemu_fdt_add_subnode(ms->fdt, nodename);
+qemu_fdt_setprop_cell(ms->fdt, nodename, "phandle", *cpuintc_phandle);
+qemu_fdt_setprop_string(ms->fdt, nodename, "compatible",
+"loongson,cpu-interrupt-controller");
+qemu_fdt_setprop(ms->fdt, nodename, "interrupt-controller", NULL, 0);
+qemu_fdt_setprop_cell(ms->fdt, nodename, "#interrupt-cells", 1);
+g_free(nodename);
 }
 
 static void fdt_add_flash_node(LoongArchMachineState *lams)
@@ -492,6 +508,7 @@ static void loongarch_irq_init(LoongArchMachineState *lams)
 CPULoongArchState *env;
 CPUState *cpu_state;
 int cpu, pin, i, start, num;
+uint32_t cpuintc_phandle;
 
 /*
  * The connection of interrupts:
@@ -526,6 +543,9 @@ static void loongarch_irq_init(LoongArchMachineState *lams)
 memory_region_add_subregion(&lams->system_iocsr, MAIL_SEND_ADDR,
sysbus_mmio_get_region(SYS_BUS_DEVICE(ipi), 1));
 
+/* Add cpu interrupt-controller */
+fdt_add_cpuic_node(lams, &cpuintc_phandle);
+
 for (cpu = 0; cpu < ms->smp.cpus; cpu++) {
 cpu_state = qemu_get_cpu(cpu);
 cpudev = DEVICE(cpu_state);
-- 
2.25.1

[PATCH v5 08/17] hw/loongarch: Init efi_fdt table

Signed-off-by: Song Gao 
---
 include/hw/loongarch/boot.h |  4 
 hw/loongarch/boot.c | 11 +++
 2 files changed, 15 insertions(+)

diff --git a/include/hw/loongarch/boot.h b/include/hw/loongarch/boot.h
index ce47056608..bbe8c8dd5d 100644
--- a/include/hw/loongarch/boot.h
+++ b/include/hw/loongarch/boot.h
@@ -34,6 +34,10 @@ typedef struct {
 EFI_GUID(0x5568e427, 0x68fc, 0x4f3d,  0xac, 0x74, \
  0xca, 0x55, 0x52, 0x31, 0xcc, 0x68)
 
+#define DEVICE_TREE_GUID \
+EFI_GUID(0xb1b621d5, 0xf19c, 0x41a5,  0x83, 0x0b, \
+ 0xd9, 0x15, 0x2c, 0x69, 0xaa, 0xe0)
+
 struct efi_config_table {
 efi_guid_t guid;
 uint64_t *ptr;
diff --git a/hw/loongarch/boot.c b/hw/loongarch/boot.c
index 6f56d4fd91..fe3e640508 100644
--- a/hw/loongarch/boot.c
+++ b/hw/loongarch/boot.c
@@ -111,6 +111,16 @@ static void init_efi_initrd_table(struct efi_system_table 
*systab,
 initrd_table->size = initrd_size;
 }
 
+static void init_efi_fdt_table(struct efi_system_table *systab)
+{
+efi_guid_t tbl_guid = DEVICE_TREE_GUID;
+
+/* efi_configuration_table 3 */
+guidcpy(&systab->tables[2].guid, &tbl_guid);
+systab->tables[2].table = (void *)0x10; /* fdt_base 1MiB */
+systab->nr_tables = 3;
+}
+
 static void init_systab(struct loongarch_boot_info *info, void *p, void *start)
 {
 void *bp_tables_start;
@@ -136,6 +146,7 @@ static void init_systab(struct loongarch_boot_info *info, 
void *p, void *start)
   sizeof(efi_memory_desc_t) * memmap_entries, 64);
 init_efi_initrd_table(systab, p, start);
 p += ROUND_UP(sizeof(struct efi_initrd), 64);
+init_efi_fdt_table(systab);
 
 systab->tables = (struct efi_configuration_table *)(bp_tables_start - 
start);
 }
-- 
2.25.1

[PATCH v5 02/17] hw/loongarch: Add load initrd

we load initrd ramdisk after kernel_high address

Signed-off-by: Song Gao 
---
 hw/loongarch/boot.c | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/hw/loongarch/boot.c b/hw/loongarch/boot.c
index 0f2bc15fdf..3075c276d4 100644
--- a/hw/loongarch/boot.c
+++ b/hw/loongarch/boot.c
@@ -21,7 +21,8 @@ static uint64_t cpu_loongarch_virt_to_phys(void *opaque, 
uint64_t addr)
 
 static int64_t load_kernel_info(struct loongarch_boot_info *info)
 {
-uint64_t kernel_entry, kernel_low, kernel_high;
+uint64_t kernel_entry, kernel_low, kernel_high, initrd_size;
+ram_addr_t initrd_offset;
 ssize_t kernel_size;
 
 kernel_size = load_elf(info->kernel_filename, NULL,
@@ -36,6 +37,32 @@ static int64_t load_kernel_info(struct loongarch_boot_info 
*info)
  load_elf_strerror(kernel_size));
 exit(1);
 }
+
+if (info->initrd_filename) {
+initrd_size = get_image_size(info->initrd_filename);
+if (initrd_size > 0) {
+initrd_offset = ROUND_UP(kernel_high + 4 * kernel_size, 64 * KiB);
+
+if (initrd_offset + initrd_size > info->ram_size) {
+error_report("memory too small for initial ram disk '%s'",
+ info->initrd_filename);
+exit(1);
+}
+
+initrd_size = load_image_targphys(info->initrd_filename, 
initrd_offset,
+  info->ram_size - initrd_offset);
+}
+
+if (initrd_size == (target_ulong)-1) {
+error_report("could not load initial ram disk '%s'",
+ info->initrd_filename);
+exit(1);
+}
+} else {
+error_report("Need initrd!");
+exit(1);
+}
+
 return kernel_entry;
 }
 
-- 
2.25.1

[PATCH v5 07/17] hw/loongarch: Init efi_initrd table

Signed-off-by: Song Gao 
---
 include/hw/loongarch/boot.h |  9 +
 hw/loongarch/boot.c | 23 +--
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/include/hw/loongarch/boot.h b/include/hw/loongarch/boot.h
index 7ad25080c5..ce47056608 100644
--- a/include/hw/loongarch/boot.h
+++ b/include/hw/loongarch/boot.h
@@ -30,6 +30,10 @@ typedef struct {
 EFI_GUID(0x800f683f, 0xd08b, 0x423a,  0xa2, 0x93, \
  0x96, 0x5c, 0x3c, 0x6f, 0xe2, 0xb4)
 
+#define LINUX_EFI_INITRD_MEDIA_GUID \
+EFI_GUID(0x5568e427, 0x68fc, 0x4f3d,  0xac, 0x74, \
+ 0xca, 0x55, 0x52, 0x31, 0xcc, 0x68)
+
 struct efi_config_table {
 efi_guid_t guid;
 uint64_t *ptr;
@@ -83,6 +87,11 @@ struct efi_boot_memmap {
 efi_memory_desc_t map[32];
 };
 
+struct efi_initrd {
+uint64_t base;
+uint64_t size;
+};
+
 struct loongarch_boot_info {
 uint64_t ram_size;
 const char *kernel_filename;
diff --git a/hw/loongarch/boot.c b/hw/loongarch/boot.c
index c086070bc8..6f56d4fd91 100644
--- a/hw/loongarch/boot.c
+++ b/hw/loongarch/boot.c
@@ -14,6 +14,9 @@
 #include "qemu/error-report.h"
 #include "sysemu/reset.h"
 
+ram_addr_t initrd_offset;
+uint64_t initrd_size;
+
 static const unsigned int slave_boot_code[] = {
   /* Configure reset ebase. */
 0x0400302c,   /* csrwr  $r12,0xc*/
@@ -93,6 +96,21 @@ static void init_efi_boot_memmap(struct efi_system_table 
*systab,
 }
 }
 
+static void init_efi_initrd_table(struct efi_system_table *systab,
+  void *p, void *start)
+{
+efi_guid_t tbl_guid = LINUX_EFI_INITRD_MEDIA_GUID;
+struct efi_initrd *initrd_table  = p;
+
+/* efi_configuration_table 2 */
+guidcpy(&systab->tables[1].guid, &tbl_guid);
+systab->tables[1].table = (struct efi_configuration_table *)(p - start);
+systab->nr_tables = 2;
+
+initrd_table->base = initrd_offset;
+initrd_table->size = initrd_size;
+}
+
 static void init_systab(struct loongarch_boot_info *info, void *p, void *start)
 {
 void *bp_tables_start;
@@ -116,6 +134,8 @@ static void init_systab(struct loongarch_boot_info *info, 
void *p, void *start)
 init_efi_boot_memmap(systab, p, start);
 p += ROUND_UP(sizeof(struct efi_boot_memmap) +
   sizeof(efi_memory_desc_t) * memmap_entries, 64);
+init_efi_initrd_table(systab, p, start);
+p += ROUND_UP(sizeof(struct efi_initrd), 64);
 
 systab->tables = (struct efi_configuration_table *)(bp_tables_start - 
start);
 }
@@ -137,8 +157,7 @@ static uint64_t cpu_loongarch_virt_to_phys(void *opaque, 
uint64_t addr)
 
 static int64_t load_kernel_info(struct loongarch_boot_info *info)
 {
-uint64_t kernel_entry, kernel_low, kernel_high, initrd_size;
-ram_addr_t initrd_offset;
+uint64_t kernel_entry, kernel_low, kernel_high;
 ssize_t kernel_size;
 
 kernel_size = load_elf(info->kernel_filename, NULL,
-- 
2.25.1

[PATCH v5 00/17] Add boot LoongArch elf kernel with FDT

Hi, All

We already support boot efi kernel with bios, but not support boot elf kernel.
This series adds boot elf kernel with FDT.

'LoongArch supports ACPI and FDT. The information that needs to be passed
 to the kernel includes the memmap, the initrd, the command line, optionally
 the ACPI/FDT tables, and so on'  see [1].

Patch 2-8 : Create efi system table, and three efi configuration table
boot_memmap, initd, FDT.
Patch 9-17 : Fixes FDT problems.

Test:
  - Start kernel
See [2] start_kernel.sh
  - Start qcow2
See [2] start_qcow2.sh

V5:
  - Rebase;

V4:
  - patch 3 change slave_boot_code[] to const, and 'static void *p ' to
'void *p';
  - patch 4 fixes build error;
  - patch 10-13, add project and commit link.

V3:
  - Load initrd at  kernel_high + 4 * kernel_size;
  - Load 'boot_rom' at [0 - 1M], the 'boot_rom' includes
slave_boot_code, cmdline_buf and systab_tables;
  - R-b and rebase.

V2:
  - FDT pcie node adds cells 'msi-map';


[1]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/arch/loongarch/booting.rst?h=v6.7-rc4

[2]: https://github.com/gaosong-loongson/loongarch-binary/releases

Please review!

Thanks.
Song Gao

Song Gao (17):
  hw/loongarch: Move boot fucntions to boot.c
  hw/loongarch: Add load initrd
  hw/loongarch: Add slave cpu boot_code
  hw/loongarch: Add init_cmdline
  hw/loongarch: Init efi_system_table
  hw/loongarch: Init efi_boot_memmap table
  hw/loongarch: Init efi_initrd table
  hw/loongarch: Init efi_fdt table
  hw/loongarch: Fix fdt memory node wrong 'reg'
  hw/loongarch: fdt adds cpu interrupt controller node
  hw/loongarch: fdt adds Extend I/O Interrupt Controller
  hw/loongarch: fdt adds pch_pic Controller
  hw/loongarch: fdt adds pch_msi Controller
  hw/loongarch: fdt adds pcie irq_map node
  hw/loongarch: fdt remove unused irqchip node
  hw/loongarch: Add cells missing from uart node
  hw/loongarch: Add cells missing from rtc node

 include/hw/intc/loongarch_extioi.h |   1 +
 include/hw/loongarch/boot.h| 109 +
 include/hw/loongarch/virt.h|  14 ++
 include/hw/pci-host/ls7a.h |   2 +
 target/loongarch/cpu.h |   2 +
 hw/loongarch/boot.c| 330 ++
 hw/loongarch/virt.c| 364 -
 hw/loongarch/meson.build   |   1 +
 8 files changed, 661 insertions(+), 162 deletions(-)
 create mode 100644 include/hw/loongarch/boot.h
 create mode 100644 hw/loongarch/boot.c

-- 
2.25.1

[PATCH v5 16/17] hw/loongarch: Add cells missing from uart node

uart node need interrupts and interrupt-parent cells.

Signed-off-by: Song Gao 
---
 hw/loongarch/virt.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index e2185d7bb4..5d92b2f1aa 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -209,7 +209,8 @@ static void fdt_add_rtc_node(LoongArchMachineState *lams)
 g_free(nodename);
 }
 
-static void fdt_add_uart_node(LoongArchMachineState *lams)
+static void fdt_add_uart_node(LoongArchMachineState *lams,
+  uint32_t *pch_pic_phandle)
 {
 char *nodename;
 hwaddr base = VIRT_UART_BASE;
@@ -222,6 +223,10 @@ static void fdt_add_uart_node(LoongArchMachineState *lams)
 qemu_fdt_setprop_cells(ms->fdt, nodename, "reg", 0x0, base, 0x0, size);
 qemu_fdt_setprop_cell(ms->fdt, nodename, "clock-frequency", 1);
 qemu_fdt_setprop_string(ms->fdt, "/chosen", "stdout-path", nodename);
+qemu_fdt_setprop_cells(ms->fdt, nodename, "interrupts",
+   VIRT_UART_IRQ - VIRT_GSI_BASE, 0x4);
+qemu_fdt_setprop_cell(ms->fdt, nodename, "interrupt-parent",
+  *pch_pic_phandle);
 g_free(nodename);
 }
 
@@ -593,7 +598,7 @@ static void loongarch_devices_init(DeviceState *pch_pic,
qdev_get_gpio_in(pch_pic,
 VIRT_UART_IRQ - VIRT_GSI_BASE),
115200, serial_hd(0), DEVICE_LITTLE_ENDIAN);
-fdt_add_uart_node(lams);
+fdt_add_uart_node(lams, pch_pic_phandle);
 
 /* Network init */
 pci_init_nic_devices(pci_bus, mc->default_nic);
-- 
2.25.1

[PATCH v5 17/17] hw/loongarch: Add cells missing from rtc node

rtc node need interrupts and interrupt-parent cells.

Signed-off-by: Song Gao 
---
 hw/loongarch/virt.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 5d92b2f1aa..6810f78ebd 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -195,7 +195,8 @@ static void fdt_add_flash_node(LoongArchMachineState *lams)
 g_free(nodename);
 }
 
-static void fdt_add_rtc_node(LoongArchMachineState *lams)
+static void fdt_add_rtc_node(LoongArchMachineState *lams,
+ uint32_t *pch_pic_phandle)
 {
 char *nodename;
 hwaddr base = VIRT_RTC_REG_BASE;
@@ -204,8 +205,13 @@ static void fdt_add_rtc_node(LoongArchMachineState *lams)
 
 nodename = g_strdup_printf("/rtc@%" PRIx64, base);
 qemu_fdt_add_subnode(ms->fdt, nodename);
-qemu_fdt_setprop_string(ms->fdt, nodename, "compatible", 
"loongson,ls7a-rtc");
+qemu_fdt_setprop_string(ms->fdt, nodename, "compatible",
+"loongson,ls7a-rtc");
 qemu_fdt_setprop_sized_cells(ms->fdt, nodename, "reg", 2, base, 2, size);
+qemu_fdt_setprop_cells(ms->fdt, nodename, "interrupts",
+   VIRT_RTC_IRQ - VIRT_GSI_BASE , 0x4);
+qemu_fdt_setprop_cell(ms->fdt, nodename, "interrupt-parent",
+  *pch_pic_phandle);
 g_free(nodename);
 }
 
@@ -611,7 +617,7 @@ static void loongarch_devices_init(DeviceState *pch_pic,
 sysbus_create_simple("ls7a_rtc", VIRT_RTC_REG_BASE,
  qdev_get_gpio_in(pch_pic,
  VIRT_RTC_IRQ - VIRT_GSI_BASE));
-fdt_add_rtc_node(lams);
+fdt_add_rtc_node(lams, pch_pic_phandle);
 
 /* acpi ged */
 lams->acpi_ged = create_acpi_ged(pch_pic, lams);
-- 
2.25.1

[PATCH v5 09/17] hw/loongarch: Fix fdt memory node wrong 'reg'

The right fdt memory node like [1], not [2]

  [1]
memory@0 {
device_type = "memory";
reg = <0x00 0x00 0x00 0x1000>;
};
  [2]
memory@0 {
device_type = "memory";
reg = <0x02 0x00 0x02 0x1000>;
};

Reviewed-by: Bibo Mao 
Signed-off-by: Song Gao 
---
 hw/loongarch/virt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 47e040628f..5f787338a2 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -290,7 +290,7 @@ static void fdt_add_memory_node(MachineState *ms,
 char *nodename = g_strdup_printf("/memory@%" PRIx64, base);
 
 qemu_fdt_add_subnode(ms->fdt, nodename);
-qemu_fdt_setprop_cells(ms->fdt, nodename, "reg", 2, base, 2, size);
+qemu_fdt_setprop_cells(ms->fdt, nodename, "reg", 0, base, 0, size);
 qemu_fdt_setprop_string(ms->fdt, nodename, "device_type", "memory");
 
 if (ms->numa_state && ms->numa_state->num_nodes) {
-- 
2.25.1

[PATCH v5 04/17] hw/loongarch: Add init_cmdline

Add init_cmline and set boot_info->a0, a1

Signed-off-by: Song Gao 
---
 include/hw/loongarch/virt.h |  2 ++
 target/loongarch/cpu.h  |  2 ++
 hw/loongarch/boot.c | 19 +++
 3 files changed, 23 insertions(+)

diff --git a/include/hw/loongarch/virt.h b/include/hw/loongarch/virt.h
index 6e6a5e487a..94ff15ad66 100644
--- a/include/hw/loongarch/virt.h
+++ b/include/hw/loongarch/virt.h
@@ -31,6 +31,8 @@
 #define VIRT_GED_MEM_ADDR   (VIRT_GED_EVT_ADDR + ACPI_GED_EVT_SEL_LEN)
 #define VIRT_GED_REG_ADDR   (VIRT_GED_MEM_ADDR + MEMORY_HOTPLUG_IO_LEN)
 
+#define COMMAND_LINE_SIZE   512
+
 struct LoongArchMachineState {
 /*< private >*/
 MachineState parent_obj;
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index ec37579fd6..ce02ef3979 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -361,6 +361,8 @@ typedef struct CPUArchState {
 uint32_t mp_state;
 /* Store ipistate to access from this struct */
 DeviceState *ipistate;
+
+struct loongarch_boot_info *boot_info;
 #endif
 } CPULoongArchState;
 
diff --git a/hw/loongarch/boot.c b/hw/loongarch/boot.c
index 2f398260af..897d4636b6 100644
--- a/hw/loongarch/boot.c
+++ b/hw/loongarch/boot.c
@@ -62,6 +62,16 @@ static const unsigned int slave_boot_code[] = {
 0x4c20,   /* jirl   $r0,$r1,0   */
 };
 
+static void init_cmdline(struct loongarch_boot_info *info, void *p, void 
*start)
+{
+hwaddr cmdline_addr = (hwaddr)p - (hwaddr)start;
+
+info->a0 = 1;
+info->a1 = cmdline_addr;
+
+memcpy(p, info->kernel_cmdline, COMMAND_LINE_SIZE);
+}
+
 static uint64_t cpu_loongarch_virt_to_phys(void *opaque, uint64_t addr)
 {
 return addr & MAKE_64BIT_MASK(0, TARGET_PHYS_ADDR_SPACE_BITS);
@@ -121,6 +131,10 @@ static void reset_load_elf(void *opaque)
 
 cpu_reset(CPU(cpu));
 if (env->load_elf) {
+   if (cpu == LOONGARCH_CPU(first_cpu)) {
+env->gpr[4] = env->boot_info->a0;
+env->gpr[5] = env->boot_info->a1;
+}
 cpu_set_pc(CPU(cpu), env->elf_address);
 }
 }
@@ -160,8 +174,13 @@ static void loongarch_firmware_boot(LoongArchMachineState 
*lams,
 
 static void init_boot_rom(struct loongarch_boot_info *info, void *p)
 {
+void *start = p;
+
 memcpy(p, &slave_boot_code, sizeof(slave_boot_code));
 p += sizeof(slave_boot_code);
+
+init_cmdline(info, p, start);
+p += COMMAND_LINE_SIZE;
 }
 
 static void loongarch_direct_kernel_boot(struct loongarch_boot_info *info)
-- 
2.25.1

[PATCH v5 06/17] hw/loongarch: Init efi_boot_memmap table

Signed-off-by: Song Gao 
---
 include/hw/loongarch/boot.h | 27 +
 include/hw/loongarch/virt.h | 10 ++
 hw/loongarch/boot.c | 39 +
 hw/loongarch/virt.c | 11 ++-
 4 files changed, 78 insertions(+), 9 deletions(-)

diff --git a/include/hw/loongarch/boot.h b/include/hw/loongarch/boot.h
index aa12a60ffb..7ad25080c5 100644
--- a/include/hw/loongarch/boot.h
+++ b/include/hw/loongarch/boot.h
@@ -21,6 +21,15 @@ typedef struct {
 uint8_t b[16];
 } efi_guid_t __attribute__((aligned(8)));
 
+#define EFI_GUID(a, b, c, d...) (efi_guid_t){ {
\
+(a) & 0xff, ((a) >> 8) & 0xff, ((a) >> 16) & 0xff, ((a) >> 24) & 0xff, 
\
+(b) & 0xff, ((b) >> 8) & 0xff, 
\
+(c) & 0xff, ((c) >> 8) & 0xff, d } }
+
+#define LINUX_EFI_BOOT_MEMMAP_GUID \
+EFI_GUID(0x800f683f, 0xd08b, 0x423a,  0xa2, 0x93, \
+ 0x96, 0x5c, 0x3c, 0x6f, 0xe2, 0xb4)
+
 struct efi_config_table {
 efi_guid_t guid;
 uint64_t *ptr;
@@ -56,6 +65,24 @@ struct efi_system_table {
 struct efi_configuration_table *tables;
 };
 
+typedef struct {
+uint32_t type;
+uint32_t pad;
+uint64_t phys_addr;
+uint64_t virt_addr;
+uint64_t num_pages;
+uint64_t attribute;
+} efi_memory_desc_t;
+
+struct efi_boot_memmap {
+uint64_t map_size;
+uint64_t desc_size;
+uint32_t desc_ver;
+uint64_t map_key;
+uint64_t buff_size;
+efi_memory_desc_t map[32];
+};
+
 struct loongarch_boot_info {
 uint64_t ram_size;
 const char *kernel_filename;
diff --git a/include/hw/loongarch/virt.h b/include/hw/loongarch/virt.h
index 94ff15ad66..d937586dbb 100644
--- a/include/hw/loongarch/virt.h
+++ b/include/hw/loongarch/virt.h
@@ -33,6 +33,16 @@
 
 #define COMMAND_LINE_SIZE   512
 
+extern struct memmap_entry *memmap_table;
+extern unsigned memmap_entries;
+
+struct memmap_entry {
+uint64_t address;
+uint64_t length;
+uint32_t type;
+uint32_t reserved;
+};
+
 struct LoongArchMachineState {
 /*< private >*/
 MachineState parent_obj;
diff --git a/hw/loongarch/boot.c b/hw/loongarch/boot.c
index 769212fce2..c086070bc8 100644
--- a/hw/loongarch/boot.c
+++ b/hw/loongarch/boot.c
@@ -62,8 +62,40 @@ static const unsigned int slave_boot_code[] = {
 0x4c20,   /* jirl   $r0,$r1,0   */
 };
 
+static inline void *guidcpy(void *dst, const void *src)
+{
+return memcpy(dst, src, sizeof(efi_guid_t));
+}
+
+static void init_efi_boot_memmap(struct efi_system_table *systab,
+ void *p, void *start)
+{
+unsigned i;
+struct efi_boot_memmap *boot_memmap = p;
+efi_guid_t tbl_guid = LINUX_EFI_BOOT_MEMMAP_GUID;
+
+/* efi_configuration_table 1 */
+guidcpy(&systab->tables[0].guid, &tbl_guid);
+systab->tables[0].table = (struct efi_configuration_table *)(p - start);
+systab->nr_tables = 1;
+
+boot_memmap->desc_size = sizeof(efi_memory_desc_t);
+boot_memmap->desc_ver = 1;
+boot_memmap->map_size = 0;
+
+efi_memory_desc_t *map = p + sizeof(struct efi_boot_memmap);
+for (i = 0; i < memmap_entries; i++) {
+map = (void *)boot_memmap + sizeof(*map);
+map[i].type = memmap_table[i].type;
+map[i].phys_addr = memmap_table[i].address;
+map[i].num_pages = memmap_table[i].length >> 16; /* 64KB align*/
+p += sizeof(efi_memory_desc_t);
+}
+}
+
 static void init_systab(struct loongarch_boot_info *info, void *p, void *start)
 {
+void *bp_tables_start;
 struct efi_system_table *systab = p;
 
 info->a2 = (uint64_t)p - (uint64_t)start;
@@ -79,6 +111,13 @@ static void init_systab(struct loongarch_boot_info *info, 
void *p, void *start)
 p += ROUND_UP(sizeof(struct efi_system_table), 64);
 
 systab->tables = p;
+bp_tables_start = p;
+
+init_efi_boot_memmap(systab, p, start);
+p += ROUND_UP(sizeof(struct efi_boot_memmap) +
+  sizeof(efi_memory_desc_t) * memmap_entries, 64);
+
+systab->tables = (struct efi_configuration_table *)(bp_tables_start - 
start);
 }
 
 static void init_cmdline(struct loongarch_boot_info *info, void *p, void 
*start)
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index d929e4f9bf..47e040628f 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -342,15 +342,8 @@ static void virt_powerdown_req(Notifier *notifier, void 
*opaque)
 acpi_send_event(s->acpi_ged, ACPI_POWER_DOWN_STATUS);
 }
 
-struct memmap_entry {
-uint64_t address;
-uint64_t length;
-uint32_t type;
-uint32_t reserved;
-};
-
-static struct memmap_entry *memmap_table;
-static unsigned memmap_entries;
+struct memmap_entry *memmap_table;
+unsigned memmap_entries;
 
 static void memmap_add_entry(uint64_t address, uint64_t length, uint32_t type)
 {
-- 
2.25.1

[PATCH v5 03/17] hw/loongarch: Add slave cpu boot_code

Signed-off-by: Song Gao 
---
 hw/loongarch/boot.c | 70 -
 1 file changed, 69 insertions(+), 1 deletion(-)

diff --git a/hw/loongarch/boot.c b/hw/loongarch/boot.c
index 3075c276d4..2f398260af 100644
--- a/hw/loongarch/boot.c
+++ b/hw/loongarch/boot.c
@@ -14,6 +14,54 @@
 #include "qemu/error-report.h"
 #include "sysemu/reset.h"
 
+static const unsigned int slave_boot_code[] = {
+  /* Configure reset ebase. */
+0x0400302c,   /* csrwr  $r12,0xc*/
+
+  /* Disable interrupt. */
+0x0380100c,   /* ori$r12,$r0,0x4*/
+0x04000180,   /* csrxchg$r0,$r12,0x0*/
+
+  /* Clear mailbox. */
+0x142d,   /* lu12i.w$r13,1(0x1) */
+0x038081ad,   /* ori$r13,$r13,0x20  */
+0x06481da0,   /* iocsrwr.d  $r0,$r13*/
+
+  /* Enable IPI interrupt.  */
+0x142c,   /* lu12i.w$r12,1(0x1) */
+0x0400118c,   /* csrxchg$r12,$r12,0x4   */
+0x02fffc0c,   /* addi.d $r12,$r0,-1(0xfff)  */
+0x142d,   /* lu12i.w$r13,1(0x1) */
+0x038011ad,   /* ori$r13,$r13,0x4   */
+0x064819ac,   /* iocsrwr.w  $r12,$r13   */
+0x142d,   /* lu12i.w$r13,1(0x1) */
+0x038081ad,   /* ori$r13,$r13,0x20  */
+
+  /* Wait for wakeup  <.L11>:   */
+0x06488000,   /* idle   0x0 */
+0x0340,   /* andi   $r0,$r0,0x0 */
+0x064809ac,   /* iocsrrd.w  $r12,$r13   */
+0x43fff59f,   /* beqz   $r12,-12(0x74) # 48 <.L11> */
+
+  /* Read and clear IPI interrupt.  */
+0x142d,   /* lu12i.w$r13,1(0x1) */
+0x064809ac,   /* iocsrrd.w  $r12,$r13   */
+0x142d,   /* lu12i.w$r13,1(0x1) */
+0x038031ad,   /* ori$r13,$r13,0xc   */
+0x064819ac,   /* iocsrwr.w  $r12,$r13   */
+
+  /* Disable  IPI interrupt.*/
+0x142c,   /* lu12i.w$r12,1(0x1) */
+0x04001180,   /* csrxchg$r0,$r12,0x4*/
+
+  /* Read mail buf and jump to specified entry */
+0x142d,   /* lu12i.w$r13,1(0x1) */
+0x038081ad,   /* ori$r13,$r13,0x20  */
+0x06480dac,   /* iocsrrd.d  $r12,$r13   */
+0x00150181,   /* move   $r1,$r12*/
+0x4c20,   /* jirl   $r0,$r1,0   */
+};
+
 static uint64_t cpu_loongarch_virt_to_phys(void *opaque, uint64_t addr)
 {
 return addr & MAKE_64BIT_MASK(0, TARGET_PHYS_ADDR_SPACE_BITS);
@@ -110,8 +158,15 @@ static void loongarch_firmware_boot(LoongArchMachineState 
*lams,
 fw_cfg_add_kernel_info(info, lams->fw_cfg);
 }
 
+static void init_boot_rom(struct loongarch_boot_info *info, void *p)
+{
+memcpy(p, &slave_boot_code, sizeof(slave_boot_code));
+p += sizeof(slave_boot_code);
+}
+
 static void loongarch_direct_kernel_boot(struct loongarch_boot_info *info)
 {
+void  *p, *bp;
 int64_t kernel_addr = 0;
 LoongArchCPU *lacpu;
 CPUState *cs;
@@ -123,11 +178,24 @@ static void loongarch_direct_kernel_boot(struct 
loongarch_boot_info *info)
 exit(1);
 }
 
+/* Load 'boot_rom' at [0 - 1MiB] */
+p = g_malloc0(1 * MiB);
+bp = p;
+init_boot_rom(info, p);
+rom_add_blob_fixed("boot_rom", bp, 1 * MiB, 0);
+
 CPU_FOREACH(cs) {
 lacpu = LOONGARCH_CPU(cs);
 lacpu->env.load_elf = true;
-lacpu->env.elf_address = kernel_addr;
+if (cs == first_cpu) {
+lacpu->env.elf_address = kernel_addr;
+} else {
+lacpu->env.elf_address = 0;
+}
+lacpu->env.boot_info = info;
 }
+
+g_free(bp);
 }
 
 void loongarch_load_kernel(MachineState *ms, struct loongarch_boot_info *info)
-- 
2.25.1

[PATCH v5 15/17] hw/loongarch: fdt remove unused irqchip node

Signed-off-by: Song Gao 
---
 hw/loongarch/virt.c | 31 +--
 1 file changed, 1 insertion(+), 30 deletions(-)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index ea73a80628..e2185d7bb4 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -410,34 +410,6 @@ static void fdt_add_pcie_node(const LoongArchMachineState 
*lams,
 g_free(nodename);
 }
 
-static void fdt_add_irqchip_node(LoongArchMachineState *lams)
-{
-MachineState *ms = MACHINE(lams);
-char *nodename;
-uint32_t irqchip_phandle;
-
-irqchip_phandle = qemu_fdt_alloc_phandle(ms->fdt);
-qemu_fdt_setprop_cell(ms->fdt, "/", "interrupt-parent", irqchip_phandle);
-
-nodename = g_strdup_printf("/intc@%lx", VIRT_IOAPIC_REG_BASE);
-qemu_fdt_add_subnode(ms->fdt, nodename);
-qemu_fdt_setprop_cell(ms->fdt, nodename, "#interrupt-cells", 3);
-qemu_fdt_setprop(ms->fdt, nodename, "interrupt-controller", NULL, 0);
-qemu_fdt_setprop_cell(ms->fdt, nodename, "#address-cells", 0x2);
-qemu_fdt_setprop_cell(ms->fdt, nodename, "#size-cells", 0x2);
-qemu_fdt_setprop(ms->fdt, nodename, "ranges", NULL, 0);
-
-qemu_fdt_setprop_string(ms->fdt, nodename, "compatible",
-"loongarch,ls7a");
-
-qemu_fdt_setprop_sized_cells(ms->fdt, nodename, "reg",
- 2, VIRT_IOAPIC_REG_BASE,
- 2, PCH_PIC_ROUTE_ENTRY_OFFSET);
-
-qemu_fdt_setprop_cell(ms->fdt, nodename, "phandle", irqchip_phandle);
-g_free(nodename);
-}
-
 static void fdt_add_memory_node(MachineState *ms,
 uint64_t base, uint64_t size, int node_id)
 {
@@ -959,8 +931,7 @@ static void loongarch_init(MachineState *machine)
 
 /* Initialize the IO interrupt subsystem */
 loongarch_irq_init(lams);
-fdt_add_irqchip_node(lams);
-platform_bus_add_all_fdt_nodes(machine->fdt, "/intc",
+platform_bus_add_all_fdt_nodes(machine->fdt, "/platic",
VIRT_PLATFORM_BUS_BASEADDRESS,
VIRT_PLATFORM_BUS_SIZE,
VIRT_PLATFORM_BUS_IRQ);
-- 
2.25.1

[PATCH v5 01/17] hw/loongarch: Move boot fucntions to boot.c

Move some boot functions to boot.c and struct
loongarch_boot_info into struct LoongArchMachineState.

Signed-off-by: Song Gao 
---
 include/hw/loongarch/boot.h |  21 ++
 include/hw/loongarch/virt.h |   2 +
 hw/loongarch/boot.c | 125 
 hw/loongarch/virt.c | 123 +++
 hw/loongarch/meson.build|   1 +
 5 files changed, 157 insertions(+), 115 deletions(-)
 create mode 100644 include/hw/loongarch/boot.h
 create mode 100644 hw/loongarch/boot.c

diff --git a/include/hw/loongarch/boot.h b/include/hw/loongarch/boot.h
new file mode 100644
index 00..3275c1e295
--- /dev/null
+++ b/include/hw/loongarch/boot.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Definitions for LoongArch boot.
+ *
+ * Copyright (C) 2023 Loongson Technology Corporation Limited
+ */
+
+#ifndef HW_LOONGARCH_BOOT_H
+#define HW_LOONGARCH_BOOT_H
+
+struct loongarch_boot_info {
+uint64_t ram_size;
+const char *kernel_filename;
+const char *kernel_cmdline;
+const char *initrd_filename;
+uint64_t a0, a1, a2;
+};
+
+void loongarch_load_kernel(MachineState *ms, struct loongarch_boot_info *info);
+
+#endif /* HW_LOONGARCH_BOOT_H */
diff --git a/include/hw/loongarch/virt.h b/include/hw/loongarch/virt.h
index 6ef9a92394..6e6a5e487a 100644
--- a/include/hw/loongarch/virt.h
+++ b/include/hw/loongarch/virt.h
@@ -13,6 +13,7 @@
 #include "qemu/queue.h"
 #include "hw/intc/loongarch_ipi.h"
 #include "hw/block/flash.h"
+#include "hw/loongarch/boot.h"
 
 #define LOONGARCH_MAX_CPUS  256
 
@@ -53,6 +54,7 @@ struct LoongArchMachineState {
 MemoryRegion system_iocsr;
 MemoryRegion iocsr_mem;
 AddressSpace as_iocsr;
+struct loongarch_boot_info bootinfo;
 };
 
 #define TYPE_LOONGARCH_MACHINE  MACHINE_TYPE_NAME("virt")
diff --git a/hw/loongarch/boot.c b/hw/loongarch/boot.c
new file mode 100644
index 00..0f2bc15fdf
--- /dev/null
+++ b/hw/loongarch/boot.c
@@ -0,0 +1,125 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * LoongArch boot helper functions.
+ *
+ * Copyright (c) 2023 Loongson Technology Corporation Limited
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "target/loongarch/cpu.h"
+#include "hw/loongarch/virt.h"
+#include "hw/loader.h"
+#include "elf.h"
+#include "qemu/error-report.h"
+#include "sysemu/reset.h"
+
+static uint64_t cpu_loongarch_virt_to_phys(void *opaque, uint64_t addr)
+{
+return addr & MAKE_64BIT_MASK(0, TARGET_PHYS_ADDR_SPACE_BITS);
+}
+
+static int64_t load_kernel_info(struct loongarch_boot_info *info)
+{
+uint64_t kernel_entry, kernel_low, kernel_high;
+ssize_t kernel_size;
+
+kernel_size = load_elf(info->kernel_filename, NULL,
+   cpu_loongarch_virt_to_phys, NULL,
+   &kernel_entry, &kernel_low,
+   &kernel_high, NULL, 0,
+   EM_LOONGARCH, 1, 0);
+
+if (kernel_size < 0) {
+error_report("could not load kernel '%s': %s",
+ info->kernel_filename,
+ load_elf_strerror(kernel_size));
+exit(1);
+}
+return kernel_entry;
+}
+
+static void reset_load_elf(void *opaque)
+{
+LoongArchCPU *cpu = opaque;
+CPULoongArchState *env = &cpu->env;
+
+cpu_reset(CPU(cpu));
+if (env->load_elf) {
+cpu_set_pc(CPU(cpu), env->elf_address);
+}
+}
+
+static void fw_cfg_add_kernel_info(struct loongarch_boot_info *info,
+   FWCfgState *fw_cfg)
+{
+/*
+ * Expose the kernel, the command line, and the initrd in fw_cfg.
+ * We don't process them here at all, it's all left to the
+ * firmware.
+ */
+load_image_to_fw_cfg(fw_cfg,
+ FW_CFG_KERNEL_SIZE, FW_CFG_KERNEL_DATA,
+ info->kernel_filename,
+ false);
+
+if (info->initrd_filename) {
+load_image_to_fw_cfg(fw_cfg,
+ FW_CFG_INITRD_SIZE, FW_CFG_INITRD_DATA,
+ info->initrd_filename, false);
+}
+
+if (info->kernel_cmdline) {
+fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
+   strlen(info->kernel_cmdline) + 1);
+fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA,
+  info->kernel_cmdline);
+}
+}
+
+static void loongarch_firmware_boot(LoongArchMachineState *lams,
+struct loongarch_boot_info *info)
+{
+fw_cfg_add_kernel_info(info, lams->fw_cfg);
+}
+
+static void loongarch_direct_kernel_boot(struct loongarch_boot_info *info)
+{
+int64_t kernel_addr = 0;
+LoongArchCPU *lacpu;
+CPUState *cs;
+
+if (info->kernel_filename) {
+kernel_addr = load_kernel_info(info);
+} else {
+error_report("Need kernel filename\n");
+exit(1);
+}
+
+CPU_FOREACH(cs) {
+lacpu = LOONGARCH_CPU(cs);
+lacpu-

Re: [PATCH v5 06/12] tests/plugin/mem: migrate to new per_vcpu API


On 2/29/24 5:46 PM, Alex Bennée wrote:

Pierrick Bouvier  writes:


On 2/29/24 11:08 AM, Alex Bennée wrote:

Pierrick Bouvier  writes:


On 2/29/24 2:08 AM, Alex Bennée wrote:

Luc Michel  writes:


Hi Pierrick,







My bad. Other plugins enable only inline when both are supplied, so I
missed this here.
I added an explicit error when user enable callback and inline at the
same time on this plugin.

We could default to inline unless we need the more features of
callbacks?



This remark applies for all plugins in general. Now we have safe and
correct inline operations, callbacks are not needed in some case.

Would that be acceptable for you that we delay this "cleanup" to
existing plugins after soft freeze? So we can focus on merging current
API changes needed.


As long as we fix the double reporting bug sure.



Yes, fixed in v6.

Re: [PATCH v6 06/12] tests/plugin/mem: migrate to new per_vcpu API


On 2/29/24 10:08 PM, Alex Bennée wrote:

Pierrick Bouvier  writes:


Reviewed-by: Luc Michel 
Signed-off-by: Pierrick Bouvier 
---
  tests/plugin/mem.c | 46 +++---
  1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/tests/plugin/mem.c b/tests/plugin/mem.c
index 44e91065ba7..b650dddcce1 100644
--- a/tests/plugin/mem.c
+++ b/tests/plugin/mem.c
@@ -16,9 +16,14 @@
  
  QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
  
-static uint64_t inline_mem_count;

-static uint64_t cb_mem_count;
-static uint64_t io_count;
+typedef struct {
+uint64_t mem_count;
+uint64_t io_count;
+} CPUCount;
+
+static struct qemu_plugin_scoreboard *counts;
+static qemu_plugin_u64 mem_count;
+static qemu_plugin_u64 io_count;
  static bool do_inline, do_callback;
  static bool do_haddr;
  static enum qemu_plugin_mem_rw rw = QEMU_PLUGIN_MEM_RW;
@@ -27,16 +32,16 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
  {
  g_autoptr(GString) out = g_string_new("");
  
-if (do_inline) {

-g_string_printf(out, "inline mem accesses: %" PRIu64 "\n", 
inline_mem_count);
-}
-if (do_callback) {
-g_string_append_printf(out, "callback mem accesses: %" PRIu64 "\n", 
cb_mem_count);
+if (do_inline || do_callback) {
+g_string_printf(out, "mem accesses: %" PRIu64 "\n",
+qemu_plugin_u64_sum(mem_count));
  }
  if (do_haddr) {
-g_string_append_printf(out, "io accesses: %" PRIu64 "\n", io_count);
+g_string_append_printf(out, "io accesses: %" PRIu64 "\n",
+   qemu_plugin_u64_sum(io_count));
  }
  qemu_plugin_outs(out->str);
+qemu_plugin_scoreboard_free(counts);
  }
  
  static void vcpu_mem(unsigned int cpu_index, qemu_plugin_meminfo_t meminfo,

@@ -46,12 +51,12 @@ static void vcpu_mem(unsigned int cpu_index, 
qemu_plugin_meminfo_t meminfo,
  struct qemu_plugin_hwaddr *hwaddr;
  hwaddr = qemu_plugin_get_hwaddr(meminfo, vaddr);
  if (qemu_plugin_hwaddr_is_io(hwaddr)) {
-io_count++;
+qemu_plugin_u64_add(io_count, cpu_index, 1);
  } else {
-cb_mem_count++;
+qemu_plugin_u64_add(mem_count, cpu_index, 1);
  }
  } else {
-cb_mem_count++;
+qemu_plugin_u64_add(mem_count, cpu_index, 1);
  }
  }
  
@@ -64,9 +69,10 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)

  struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
  
  if (do_inline) {

-qemu_plugin_register_vcpu_mem_inline(insn, rw,
- QEMU_PLUGIN_INLINE_ADD_U64,
- &inline_mem_count, 1);
+qemu_plugin_register_vcpu_mem_inline_per_vcpu(
+insn, rw,
+QEMU_PLUGIN_INLINE_ADD_U64,
+mem_count, 1);
  }
  if (do_callback) {
  qemu_plugin_register_vcpu_mem_cb(insn, vcpu_mem,
@@ -117,6 +123,16 @@ QEMU_PLUGIN_EXPORT int 
qemu_plugin_install(qemu_plugin_id_t id,
  }
  }
  
+if (do_inline && do_callback) {

+fprintf(stderr,
+"can't enable inline and callback counting at the same 
time\n");
+return -1;
+}
+
+counts = qemu_plugin_scoreboard_new(sizeof(CPUCount));
+mem_count = qemu_plugin_scoreboard_u64_in_struct(
+counts, CPUCount, mem_count);
+io_count = qemu_plugin_scoreboard_u64_in_struct(counts, CPUCount, 
io_count);
  qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
  qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
  return 0;


You need the following to keep check-tcg working:

modified   tests/tcg/Makefile.target
@@ -168,7 +168,7 @@ RUN_TESTS+=$(EXTRA_RUNS)
  
  # Some plugins need additional arguments above the default to fully

  # exercise things. We can define them on a per-test basis here.
-run-plugin-%-with-libmem.so: 
PLUGIN_ARGS=$(COMMA)inline=true$(COMMA)callback=true
+run-plugin-%-with-libmem.so: PLUGIN_ARGS=$(COMMA)inline=true
  
  ifeq ($(filter %-softmmu, $(TARGET)),)

  run-%: %



Missed that, will fix in next version. Thanks!

Re: [PATCH v5 00/17] Add boot LoongArch elf kernel with FDT

2024-03-01 Thread gaosong


Hi,

If there are no new comments, I'll add this series to the loongarch-next 
branch next week.


Thanks.
Song Gao

在 2024/3/1 下午5:38, Song Gao 写道:

Hi, All

We already support boot efi kernel with bios, but not support boot elf kernel.
This series adds boot elf kernel with FDT.

'LoongArch supports ACPI and FDT. The information that needs to be passed
  to the kernel includes the memmap, the initrd, the command line, optionally
  the ACPI/FDT tables, and so on'  see [1].

Patch 2-8 : Create efi system table, and three efi configuration table
 boot_memmap, initd, FDT.
Patch 9-17 : Fixes FDT problems.

Test:
   - Start kernel
 See [2] start_kernel.sh
   - Start qcow2
 See [2] start_qcow2.sh

V5:
   - Rebase;

V4:
   - patch 3 change slave_boot_code[] to const, and 'static void *p ' to
 'void *p';
   - patch 4 fixes build error;
   - patch 10-13, add project and commit link.

V3:
   - Load initrd at  kernel_high + 4 * kernel_size;
   - Load 'boot_rom' at [0 - 1M], the 'boot_rom' includes
 slave_boot_code, cmdline_buf and systab_tables;
   - R-b and rebase.

V2:
   - FDT pcie node adds cells 'msi-map';


[1]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/arch/loongarch/booting.rst?h=v6.7-rc4

[2]: https://github.com/gaosong-loongson/loongarch-binary/releases

Please review!

Thanks.
Song Gao

Song Gao (17):
   hw/loongarch: Move boot fucntions to boot.c
   hw/loongarch: Add load initrd
   hw/loongarch: Add slave cpu boot_code
   hw/loongarch: Add init_cmdline
   hw/loongarch: Init efi_system_table
   hw/loongarch: Init efi_boot_memmap table
   hw/loongarch: Init efi_initrd table
   hw/loongarch: Init efi_fdt table
   hw/loongarch: Fix fdt memory node wrong 'reg'
   hw/loongarch: fdt adds cpu interrupt controller node
   hw/loongarch: fdt adds Extend I/O Interrupt Controller
   hw/loongarch: fdt adds pch_pic Controller
   hw/loongarch: fdt adds pch_msi Controller
   hw/loongarch: fdt adds pcie irq_map node
   hw/loongarch: fdt remove unused irqchip node
   hw/loongarch: Add cells missing from uart node
   hw/loongarch: Add cells missing from rtc node

  include/hw/intc/loongarch_extioi.h |   1 +
  include/hw/loongarch/boot.h| 109 +
  include/hw/loongarch/virt.h|  14 ++
  include/hw/pci-host/ls7a.h |   2 +
  target/loongarch/cpu.h |   2 +
  hw/loongarch/boot.c| 330 ++
  hw/loongarch/virt.c| 364 -
  hw/loongarch/meson.build   |   1 +
  8 files changed, 661 insertions(+), 162 deletions(-)
  create mode 100644 include/hw/loongarch/boot.h
  create mode 100644 hw/loongarch/boot.c

[PATCH v5 14/17] hw/loongarch: fdt adds pcie irq_map node

Signed-off-by: Song Gao 
---
 hw/loongarch/virt.c | 73 ++---
 1 file changed, 69 insertions(+), 4 deletions(-)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index e1fe1ea97f..ea73a80628 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -316,7 +316,62 @@ static void fdt_add_fw_cfg_node(const 
LoongArchMachineState *lams)
 g_free(nodename);
 }
 
-static void fdt_add_pcie_node(const LoongArchMachineState *lams)
+static void fdt_add_pcie_irq_map_node(const LoongArchMachineState *lams,
+  char *nodename,
+  uint32_t *pch_pic_phandle)
+{
+int pin, dev;
+uint32_t irq_map_stride = 0;
+uint32_t full_irq_map[GPEX_NUM_IRQS *GPEX_NUM_IRQS * 10] = {};
+uint32_t *irq_map = full_irq_map;
+const MachineState *ms = MACHINE(lams);
+
+/* This code creates a standard swizzle of interrupts such that
+ * each device's first interrupt is based on it's PCI_SLOT number.
+ * (See pci_swizzle_map_irq_fn())
+ *
+ * We only need one entry per interrupt in the table (not one per
+ * possible slot) seeing the interrupt-map-mask will allow the table
+ * to wrap to any number of devices.
+ */
+
+for (dev = 0; dev < GPEX_NUM_IRQS; dev++) {
+int devfn = dev * 0x8;
+
+for (pin = 0; pin  < GPEX_NUM_IRQS; pin++) {
+int irq_nr = 16 + ((pin + PCI_SLOT(devfn)) % GPEX_NUM_IRQS);
+int i = 0;
+
+/* Fill PCI address cells */
+irq_map[i] = cpu_to_be32(devfn << 8);
+i += 3;
+
+/* Fill PCI Interrupt cells */
+irq_map[i] = cpu_to_be32(pin + 1);
+i += 1;
+
+/* Fill interrupt controller phandle and cells */
+irq_map[i++] = cpu_to_be32(*pch_pic_phandle);
+irq_map[i++] = cpu_to_be32(irq_nr);
+
+if (!irq_map_stride) {
+irq_map_stride = i;
+}
+irq_map += irq_map_stride;
+}
+}
+
+
+qemu_fdt_setprop(ms->fdt, nodename, "interrupt-map", full_irq_map,
+ GPEX_NUM_IRQS * GPEX_NUM_IRQS *
+ irq_map_stride * sizeof(uint32_t));
+qemu_fdt_setprop_cells(ms->fdt, nodename, "interrupt-map-mask",
+ 0x1800, 0, 0, 0x7);
+}
+
+static void fdt_add_pcie_node(const LoongArchMachineState *lams,
+  uint32_t *pch_pic_phandle,
+  uint32_t *pch_msi_phandle)
 {
 char *nodename;
 hwaddr base_mmio = VIRT_PCI_MEM_BASE;
@@ -347,6 +402,11 @@ static void fdt_add_pcie_node(const LoongArchMachineState 
*lams)
  2, base_pio, 2, size_pio,
  1, FDT_PCI_RANGE_MMIO, 2, base_mmio,
  2, base_mmio, 2, size_mmio);
+qemu_fdt_setprop_cells(ms->fdt, nodename, "msi-map",
+   0, *pch_msi_phandle, 0, 0x1);
+
+fdt_add_pcie_irq_map_node(lams, nodename, pch_pic_phandle);
+
 g_free(nodename);
 }
 
@@ -505,7 +565,10 @@ static DeviceState *create_platform_bus(DeviceState 
*pch_pic)
 return dev;
 }
 
-static void loongarch_devices_init(DeviceState *pch_pic, LoongArchMachineState 
*lams)
+static void loongarch_devices_init(DeviceState *pch_pic,
+   LoongArchMachineState *lams,
+   uint32_t *pch_pic_phandle,
+   uint32_t *pch_msi_phandle)
 {
 MachineClass *mc = MACHINE_GET_CLASS(lams);
 DeviceState *gpex_dev;
@@ -551,6 +614,9 @@ static void loongarch_devices_init(DeviceState *pch_pic, 
LoongArchMachineState *
 gpex_set_irq_num(GPEX_HOST(gpex_dev), i, 16 + i);
 }
 
+/* Add pcie node */
+fdt_add_pcie_node(lams, pch_pic_phandle, pch_msi_phandle);
+
 serial_mm_init(get_system_memory(), VIRT_UART_BASE, 0,
qdev_get_gpio_in(pch_pic,
 VIRT_UART_IRQ - VIRT_GSI_BASE),
@@ -697,7 +763,7 @@ static void loongarch_irq_init(LoongArchMachineState *lams)
 /* Add PCH MSI node */
 fdt_add_pch_msi_node(lams, &eiointc_phandle, &pch_msi_phandle);
 
-loongarch_devices_init(pch_pic, lams);
+loongarch_devices_init(pch_pic, lams, &pch_pic_phandle, &pch_msi_phandle);
 }
 
 static void loongarch_firmware_init(LoongArchMachineState *lams)
@@ -904,7 +970,6 @@ static void loongarch_init(MachineState *machine)
 lams->powerdown_notifier.notify = virt_powerdown_req;
 qemu_register_powerdown_notifier(&lams->powerdown_notifier);
 
-fdt_add_pcie_node(lams);
 /*
  * Since lowmem region starts from 0 and Linux kernel legacy start address
  * at 2 MiB, FDT base address is located at 1 MiB to avoid NULL pointer
-- 
2.25.1

[PATCH v5 05/17] hw/loongarch: Init efi_system_table

Add init_systab and set boot_info->a2

Signed-off-by: Song Gao 
---
 include/hw/loongarch/boot.h | 48 +
 hw/loongarch/boot.c | 22 +
 2 files changed, 70 insertions(+)

diff --git a/include/hw/loongarch/boot.h b/include/hw/loongarch/boot.h
index 3275c1e295..aa12a60ffb 100644
--- a/include/hw/loongarch/boot.h
+++ b/include/hw/loongarch/boot.h
@@ -8,6 +8,54 @@
 #ifndef HW_LOONGARCH_BOOT_H
 #define HW_LOONGARCH_BOOT_H
 
+/* UEFI 2.10 */
+#define EFI_SYSTEM_TABLE_SIGNATURE   0x5453595320494249
+#define EFI_2_100_SYSTEM_TABLE_REVISION  ((2<<16) | (100))
+#define EFI_SPECIFICATION_VERSIONEFI_SYSTEM_TABLE_REVISION
+#define EFI_SYSTEM_TABLE_REVISIONEFI_2_100_SYSTEM_TABLE_REVISION
+
+#define FW_VERSION 0x1
+#define FW_PATCHLEVEL 0x0
+
+typedef struct {
+uint8_t b[16];
+} efi_guid_t __attribute__((aligned(8)));
+
+struct efi_config_table {
+efi_guid_t guid;
+uint64_t *ptr;
+const char name[16];
+};
+
+typedef struct {
+uint64_t signature;
+uint32_t revision;
+uint32_t headersize;
+uint32_t crc32;
+uint32_t reserved;
+} efi_table_hdr_t;
+
+struct efi_configuration_table {
+efi_guid_t guid;
+void *table;
+};
+
+struct efi_system_table {
+efi_table_hdr_t hdr;
+uint64_t fw_vendor;/* physical addr of CHAR16 vendor string */
+uint32_t fw_revision;
+uint64_t con_in_handle;
+uint64_t *con_in;
+uint64_t con_out_handle;
+uint64_t *con_out;
+uint64_t stderr_handle;
+uint64_t stderr;
+uint64_t *runtime;
+uint64_t *boottime;
+uint64_t nr_tables;
+struct efi_configuration_table *tables;
+};
+
 struct loongarch_boot_info {
 uint64_t ram_size;
 const char *kernel_filename;
diff --git a/hw/loongarch/boot.c b/hw/loongarch/boot.c
index 897d4636b6..769212fce2 100644
--- a/hw/loongarch/boot.c
+++ b/hw/loongarch/boot.c
@@ -62,6 +62,25 @@ static const unsigned int slave_boot_code[] = {
 0x4c20,   /* jirl   $r0,$r1,0   */
 };
 
+static void init_systab(struct loongarch_boot_info *info, void *p, void *start)
+{
+struct efi_system_table *systab = p;
+
+info->a2 = (uint64_t)p - (uint64_t)start;
+
+systab->hdr.signature = EFI_SYSTEM_TABLE_SIGNATURE;
+systab->hdr.revision = EFI_SPECIFICATION_VERSION;
+systab->hdr.revision = sizeof(struct efi_system_table),
+systab->fw_revision = FW_VERSION << 16 | FW_PATCHLEVEL << 8;
+systab->runtime = 0;
+systab->boottime = 0;
+systab->nr_tables = 0;
+
+p += ROUND_UP(sizeof(struct efi_system_table), 64);
+
+systab->tables = p;
+}
+
 static void init_cmdline(struct loongarch_boot_info *info, void *p, void 
*start)
 {
 hwaddr cmdline_addr = (hwaddr)p - (hwaddr)start;
@@ -134,6 +153,7 @@ static void reset_load_elf(void *opaque)
if (cpu == LOONGARCH_CPU(first_cpu)) {
 env->gpr[4] = env->boot_info->a0;
 env->gpr[5] = env->boot_info->a1;
+env->gpr[6] = env->boot_info->a2;
 }
 cpu_set_pc(CPU(cpu), env->elf_address);
 }
@@ -181,6 +201,8 @@ static void init_boot_rom(struct loongarch_boot_info *info, 
void *p)
 
 init_cmdline(info, p, start);
 p += COMMAND_LINE_SIZE;
+
+init_systab(info, p, start);
 }
 
 static void loongarch_direct_kernel_boot(struct loongarch_boot_info *info)
-- 
2.25.1

[PATCH v5 12/17] hw/loongarch: fdt adds pch_pic Controller

fdt adds pch pic controller, we use 'loongson,pch-pic-1.0'

See:
https://github.com/torvalds/linux/blob/v6.7/drivers/irqchip/irq-loongson-pch-pic.c
https://lore.kernel.org/r/20200528152757.1028711-4-jiaxun.y...@flygoat.com

Signed-off-by: Song Gao 
---
 include/hw/pci-host/ls7a.h |  1 +
 hw/loongarch/virt.c| 30 +-
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/include/hw/pci-host/ls7a.h b/include/hw/pci-host/ls7a.h
index e753449593..fe260f0183 100644
--- a/include/hw/pci-host/ls7a.h
+++ b/include/hw/pci-host/ls7a.h
@@ -24,6 +24,7 @@
 #define VIRT_PCH_REG_BASE0x1000UL
 #define VIRT_IOAPIC_REG_BASE (VIRT_PCH_REG_BASE)
 #define VIRT_PCH_MSI_ADDR_LOW0x2FF0UL
+#define VIRT_PCH_REG_SIZE0x400
 
 /*
  * GSI_BASE is hard-coded with 64 in linux kernel, else kernel fails to boot
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 4091a14080..87908d1f79 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -125,6 +125,31 @@ static void fdt_add_eiointc_node(LoongArchMachineState 
*lams,
 g_free(nodename);
 }
 
+static void fdt_add_pch_pic_node(LoongArchMachineState *lams,
+ uint32_t *eiointc_phandle,
+ uint32_t *pch_pic_phandle)
+{
+MachineState *ms = MACHINE(lams);
+char *nodename;
+hwaddr pch_pic_base = VIRT_PCH_REG_BASE;
+hwaddr pch_pic_size = VIRT_PCH_REG_SIZE;
+
+*pch_pic_phandle = qemu_fdt_alloc_phandle(ms->fdt);
+nodename = g_strdup_printf("/platic@%" PRIx64, pch_pic_base);
+qemu_fdt_add_subnode(ms->fdt, nodename);
+qemu_fdt_setprop_cell(ms->fdt,  nodename, "phandle", *pch_pic_phandle);
+qemu_fdt_setprop_string(ms->fdt, nodename, "compatible",
+"loongson,pch-pic-1.0");
+qemu_fdt_setprop_cells(ms->fdt, nodename, "reg", 0,
+   pch_pic_base, 0, pch_pic_size);
+qemu_fdt_setprop(ms->fdt, nodename, "interrupt-controller", NULL, 0);
+qemu_fdt_setprop_cell(ms->fdt, nodename, "#interrupt-cells", 2);
+qemu_fdt_setprop_cell(ms->fdt, nodename, "interrupt-parent",
+  *eiointc_phandle);
+qemu_fdt_setprop_cell(ms->fdt, nodename, "loongson,pic-base-vec", 0);
+g_free(nodename);
+}
+
 static void fdt_add_flash_node(LoongArchMachineState *lams)
 {
 MachineState *ms = MACHINE(lams);
@@ -533,7 +558,7 @@ static void loongarch_irq_init(LoongArchMachineState *lams)
 CPULoongArchState *env;
 CPUState *cpu_state;
 int cpu, pin, i, start, num;
-uint32_t cpuintc_phandle, eiointc_phandle;
+uint32_t cpuintc_phandle, eiointc_phandle, pch_pic_phandle;
 
 /*
  * The connection of interrupts:
@@ -624,6 +649,9 @@ static void loongarch_irq_init(LoongArchMachineState *lams)
 qdev_connect_gpio_out(DEVICE(d), i, qdev_get_gpio_in(extioi, i));
 }
 
+/* Add PCH PIC node */
+fdt_add_pch_pic_node(lams, &eiointc_phandle, &pch_pic_phandle);
+
 pch_msi = qdev_new(TYPE_LOONGARCH_PCH_MSI);
 start   =  num;
 num = EXTIOI_IRQS - start;
-- 
2.25.1

[PATCH v5 13/17] hw/loongarch: fdt adds pch_msi Controller

fdt adds pch msi controller, we use 'loongson,pch-msi-1.0'.

See:
https://github.com/torvalds/linux/blob/v6.7/drivers/irqchip/irq-loongson-pch-msi.c
https://lore.kernel.org/r/20200528152757.1028711-6-jiaxun.y...@flygoat.com

Signed-off-by: Song Gao 
---
 include/hw/pci-host/ls7a.h |  1 +
 hw/loongarch/virt.c| 33 -
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/include/hw/pci-host/ls7a.h b/include/hw/pci-host/ls7a.h
index fe260f0183..cd7c9ec7bc 100644
--- a/include/hw/pci-host/ls7a.h
+++ b/include/hw/pci-host/ls7a.h
@@ -25,6 +25,7 @@
 #define VIRT_IOAPIC_REG_BASE (VIRT_PCH_REG_BASE)
 #define VIRT_PCH_MSI_ADDR_LOW0x2FF0UL
 #define VIRT_PCH_REG_SIZE0x400
+#define VIRT_PCH_MSI_SIZE0x8
 
 /*
  * GSI_BASE is hard-coded with 64 in linux kernel, else kernel fails to boot
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 87908d1f79..e1fe1ea97f 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -150,6 +150,34 @@ static void fdt_add_pch_pic_node(LoongArchMachineState 
*lams,
 g_free(nodename);
 }
 
+static void fdt_add_pch_msi_node(LoongArchMachineState *lams,
+ uint32_t *eiointc_phandle,
+ uint32_t *pch_msi_phandle)
+{
+MachineState *ms = MACHINE(lams);
+char *nodename;
+hwaddr pch_msi_base = VIRT_PCH_MSI_ADDR_LOW;
+hwaddr pch_msi_size = VIRT_PCH_MSI_SIZE;
+
+*pch_msi_phandle = qemu_fdt_alloc_phandle(ms->fdt);
+nodename = g_strdup_printf("/msi@%" PRIx64, pch_msi_base);
+qemu_fdt_add_subnode(ms->fdt, nodename);
+qemu_fdt_setprop_cell(ms->fdt, nodename, "phandle", *pch_msi_phandle);
+qemu_fdt_setprop_string(ms->fdt, nodename, "compatible",
+"loongson,pch-msi-1.0");
+qemu_fdt_setprop_cells(ms->fdt, nodename, "reg",
+   0, pch_msi_base,
+   0, pch_msi_size);
+qemu_fdt_setprop(ms->fdt, nodename, "interrupt-controller", NULL, 0);
+qemu_fdt_setprop_cell(ms->fdt, nodename, "interrupt-parent",
+  *eiointc_phandle);
+qemu_fdt_setprop_cell(ms->fdt, nodename, "loongson,msi-base-vec",
+  VIRT_PCH_PIC_IRQ_NUM);
+qemu_fdt_setprop_cell(ms->fdt, nodename, "loongson,msi-num-vecs",
+  EXTIOI_IRQS - VIRT_PCH_PIC_IRQ_NUM);
+g_free(nodename);
+}
+
 static void fdt_add_flash_node(LoongArchMachineState *lams)
 {
 MachineState *ms = MACHINE(lams);
@@ -558,7 +586,7 @@ static void loongarch_irq_init(LoongArchMachineState *lams)
 CPULoongArchState *env;
 CPUState *cpu_state;
 int cpu, pin, i, start, num;
-uint32_t cpuintc_phandle, eiointc_phandle, pch_pic_phandle;
+uint32_t cpuintc_phandle, eiointc_phandle, pch_pic_phandle, 
pch_msi_phandle;
 
 /*
  * The connection of interrupts:
@@ -666,6 +694,9 @@ static void loongarch_irq_init(LoongArchMachineState *lams)
   qdev_get_gpio_in(extioi, i + start));
 }
 
+/* Add PCH MSI node */
+fdt_add_pch_msi_node(lams, &eiointc_phandle, &pch_msi_phandle);
+
 loongarch_devices_init(pch_pic, lams);
 }
 
-- 
2.25.1

[PATCH v5 11/17] hw/loongarch: fdt adds Extend I/O Interrupt Controller

fdt adds Extend I/O Interrupt Controller,
we use 'loongson,ls2k2000-eiointc'.

See:
https://github.com/torvalds/linux/blob/v6.7/drivers/irqchip/irq-loongson-eiointc.c
https://lore.kernel.org/r/764e02d924094580ac0f1d15535f4b98308705c6.1683279769.git.zhoubin...@loongson.cn

Signed-off-by: Song Gao 
---
 include/hw/intc/loongarch_extioi.h |  1 +
 hw/loongarch/virt.c| 30 +-
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/include/hw/intc/loongarch_extioi.h 
b/include/hw/intc/loongarch_extioi.h
index a0a46b888c..410c6e1121 100644
--- a/include/hw/intc/loongarch_extioi.h
+++ b/include/hw/intc/loongarch_extioi.h
@@ -39,6 +39,7 @@
 #define EXTIOI_COREISR_END   (0xB20 - APIC_OFFSET)
 #define EXTIOI_COREMAP_START (0xC00 - APIC_OFFSET)
 #define EXTIOI_COREMAP_END   (0xD00 - APIC_OFFSET)
+#define EXTIOI_SIZE  0x800
 
 typedef struct ExtIOICore {
 uint32_t coreisr[EXTIOI_IRQS_GROUP_COUNT];
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index a8374c35a4..4091a14080 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -100,6 +100,31 @@ static void fdt_add_cpuic_node(LoongArchMachineState *lams,
 g_free(nodename);
 }
 
+static void fdt_add_eiointc_node(LoongArchMachineState *lams,
+  uint32_t *cpuintc_phandle,
+  uint32_t *eiointc_phandle)
+{
+MachineState *ms = MACHINE(lams);
+char *nodename;
+hwaddr extioi_base = APIC_BASE;
+hwaddr extioi_size = EXTIOI_SIZE;
+
+*eiointc_phandle = qemu_fdt_alloc_phandle(ms->fdt);
+nodename = g_strdup_printf("/eiointc@%" PRIx64, extioi_base);
+qemu_fdt_add_subnode(ms->fdt, nodename);
+qemu_fdt_setprop_cell(ms->fdt, nodename, "phandle", *eiointc_phandle);
+qemu_fdt_setprop_string(ms->fdt, nodename, "compatible",
+"loongson,ls2k2000-eiointc");
+qemu_fdt_setprop(ms->fdt, nodename, "interrupt-controller", NULL, 0);
+qemu_fdt_setprop_cell(ms->fdt, nodename, "#interrupt-cells", 1);
+qemu_fdt_setprop_cell(ms->fdt, nodename, "interrupt-parent",
+  *cpuintc_phandle);
+qemu_fdt_setprop_cell(ms->fdt, nodename, "interrupts", 3);
+qemu_fdt_setprop_cells(ms->fdt, nodename, "reg", 0x0,
+   extioi_base, 0x0, extioi_size);
+g_free(nodename);
+}
+
 static void fdt_add_flash_node(LoongArchMachineState *lams)
 {
 MachineState *ms = MACHINE(lams);
@@ -508,7 +533,7 @@ static void loongarch_irq_init(LoongArchMachineState *lams)
 CPULoongArchState *env;
 CPUState *cpu_state;
 int cpu, pin, i, start, num;
-uint32_t cpuintc_phandle;
+uint32_t cpuintc_phandle, eiointc_phandle;
 
 /*
  * The connection of interrupts:
@@ -577,6 +602,9 @@ static void loongarch_irq_init(LoongArchMachineState *lams)
 }
 }
 
+/* Add Extend I/O Interrupt Controller node */
+fdt_add_eiointc_node(lams, &cpuintc_phandle, &eiointc_phandle);
+
 pch_pic = qdev_new(TYPE_LOONGARCH_PCH_PIC);
 num = VIRT_PCH_PIC_IRQ_NUM;
 qdev_prop_set_uint32(pch_pic, "pch_pic_irq_num", num);
-- 
2.25.1

Re: [PATCH v5 08/12] tests/plugin/bb: migrate to new per_vcpu API


On 2/29/24 6:21 PM, Alex Bennée wrote:

Pierrick Bouvier  writes:


Signed-off-by: Pierrick Bouvier 


I did notice there is a discrepancy between what libisns and libb
report. The libb looks like an overcount so I wonder if there are some
instructions we are not picking up but I can't see where that would be.

   ➜  ./qemu-hppa -plugin ./tests/plugin/libinsn.so -plugin 
./tests/plugin/libbb.so,inline=true -d plugin  
./tests/tcg/hppa-linux-user/sha512
   1..10
   ok 1 - do_test(&tests[i])
   ok 2 - do_test(&tests[i])
   ok 3 - do_test(&tests[i])
   ok 4 - do_test(&tests[i])
   ok 5 - do_test(&tests[i])
   ok 6 - do_test(&tests[i])
   ok 7 - do_test(&tests[i])
   ok 8 - do_test(&tests[i])
   ok 9 - do_test(&tests[i])
   ok 10 - do_test(&tests[i])
   CPU0: bb's: 54282, insns: 775697
   Total: bb's: 54282, insns: 775697
   cpu 0 insns: 774827
   total insns: 774827

Although weirdly maybe only an hppa thing. Richard?



Do you observe the exact same number if you run only one of the plugin?

bb count number of instructions in an executed block, while insn 
effectively count every instructions ran.
Maybe there is hppa specifity that makes some tb exit in the middle, 
thus executing less instructions than expected from bb count.


I don't know how to reproduce this test. Did you run it from a specific 
docker env?



   ➜  ./qemu-aarch64 -plugin ./tests/plugin/libinsn.so -plugin 
./tests/plugin/libbb.so,inline=true -d plugin  
./tests/tcg/aarch64-linux-user/sha512
   1..10
   ok 1 - do_test(&tests[i])
   ok 2 - do_test(&tests[i])
   ok 3 - do_test(&tests[i])
   ok 4 - do_test(&tests[i])
   ok 5 - do_test(&tests[i])
   ok 6 - do_test(&tests[i])
   ok 7 - do_test(&tests[i])
   ok 8 - do_test(&tests[i])
   ok 9 - do_test(&tests[i])
   ok 10 - do_test(&tests[i])
   CPU0: bb's: 41513, insns: 302671
   Total: bb's: 41513, insns: 302671
   cpu 0 insns: 302671
   total insns: 302671

Anyway:

Reviewed-by: Alex Bennée

[PATCH 0/1] kvm: add support for guest physical bits

2024-03-01 Thread Gerd Hoffmann

The matching kernel bits are here:
https://lore.kernel.org/kvm/20240301101410.356007-1-kra...@redhat.com/T/

Gerd Hoffmann (1):
  kvm: add support for guest physical bits

 target/i386/cpu.h | 1 +
 target/i386/cpu.c | 1 +
 target/i386/kvm/kvm.c | 8 
 3 files changed, 10 insertions(+)

-- 
2.44.0

[PATCH 1/1] kvm: add support for guest physical bits

2024-03-01 Thread Gerd Hoffmann

query kvm for supported guest physical address bits using
KVM_CAP_VM_GPA_BITS.  Expose the value to the guest via cpuid
(leaf 0x8008, eax, bits 16-23).

Signed-off-by: Gerd Hoffmann 
---
 target/i386/cpu.h | 1 +
 target/i386/cpu.c | 1 +
 target/i386/kvm/kvm.c | 8 
 3 files changed, 10 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 952174bb6f52..d427218827f6 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2026,6 +2026,7 @@ struct ArchCPU {
 
 /* Number of physical address bits supported */
 uint32_t phys_bits;
+uint32_t guest_phys_bits;
 
 /* in order to simplify APIC support, we leave this pointer to the
user */
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 2666ef380891..1a6cfc75951e 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6570,6 +6570,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
 /* 64 bit processor */
  *eax |= (cpu_x86_virtual_addr_width(env) << 8);
+ *eax |= (cpu->guest_phys_bits << 16);
 }
 *ebx = env->features[FEAT_8000_0008_EBX];
 if (cs->nr_cores * cs->nr_threads > 1) {
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 42970ab046fa..e06c9d66bb01 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1716,6 +1716,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
 X86CPU *cpu = X86_CPU(cs);
 CPUX86State *env = &cpu->env;
 uint32_t limit, i, j, cpuid_i;
+uint32_t guest_phys_bits;
 uint32_t unused;
 struct kvm_cpuid_entry2 *c;
 uint32_t signature[3];
@@ -1751,6 +1752,13 @@ int kvm_arch_init_vcpu(CPUState *cs)
 
 env->apic_bus_freq = KVM_APIC_BUS_FREQUENCY;
 
+guest_phys_bits = kvm_check_extension(cs->kvm_state, KVM_CAP_VM_GPA_BITS);
+if (guest_phys_bits &&
+(cpu->guest_phys_bits == 0 ||
+ cpu->guest_phys_bits > guest_phys_bits)) {
+cpu->guest_phys_bits = guest_phys_bits;
+}
+
 /*
  * kvm_hyperv_expand_features() is called here for the second time in case
  * KVM_CAP_SYS_HYPERV_CPUID is not supported. While we can't possibly 
handle
-- 
2.44.0

Re: [PATCH v2 0/5] Add device DM163 (led driver, matrix colors shield & display)

On Wed, 28 Feb 2024 at 12:06, Thomas Huth  wrote:
>
> On 28/02/2024 12.31, Inès Varhol wrote:
> > TLDR: how can I provide a test or an example?
> >
> > I've tested the display by running custom executables and
> > comparing to the result on the real board, but I don't
> > know how to test it using a QTest.

> I'd suggest to try to take a screenshot via QMP or HMP. If you need a third
> party binary to get something on the screen, it's maybe best to write an
> avocado test instead of a qtest. See for example
> tests/avocado/machine_m68k_nextcube.py - it takes a screenshot and then
> converts the console messages from it via Tessaract to text and looks for
> the expected strings there. Alternatively, have a look at
> tests/avocado/machine_s390_ccw_virtio.py - it boots a Linux in the guest
> that writes a string into the /dev/fb0 framebuffer, then takes a screendump
> and then tries to find the string in the file. The latter could maybe also
> be done via a qtest if you know how to set up the framebuffer without a
> guest kernel.

The other extremely popular approach for testing display devices
is "I looked at the screen while I was doing the development,
and it was fine" :-)

-- PMM

Re: [PULL 26/29] contrib/plugins: extend execlog to track register changes

Zhao Liu  writes:

> Hi Alex,
>
> I hit the following warnings (with "./configure --enable-werror"):
>
> /qemu/contrib/plugins/execlog.c: In function ‘registers_init’:
> /qemu/contrib/plugins/execlog.c:330:17: warning: ‘g_pattern_match_string’ is 
> deprecated: Use 'g_pattern_spec_match_string' instead 
> [-Wdeprecated-declarations]
>   330 | if (g_pattern_match_string(pat, rd->name) ||
>   | ^~
> In file included from /usr/include/glib-2.0/glib.h:65,
>  from /qemu/contrib/plugins/execlog.c:9:
> /usr/include/glib-2.0/glib/gpattern.h:55:15: note: declared here
>55 | gboolean  g_pattern_match_string   (GPatternSpec *pspec,
>   |   ^~
> /qemu/contrib/plugins/execlog.c:331:21: warning: ‘g_pattern_match_string’ is 
> deprecated: Use 'g_pattern_spec_match_string' instead 
> [-Wdeprecated-declarations]
>   331 | g_pattern_match_string(pat, rd_lower)) {
>   | ^~
> In file included from /usr/include/glib-2.0/glib.h:65,
>  from /qemu/contrib/plugins/execlog.c:9:
> /usr/include/glib-2.0/glib/gpattern.h:55:15: note: declared here
>55 | gboolean  g_pattern_match_string   (GPatternSpec *pspec,
>   |   ^~
> /qemu/contrib/plugins/execlog.c:339:63: warning: passing argument 2 of 
> ‘g_ptr_array_add’ discards ‘const’ qualifier from pointer target type 
> [-Wdiscarded-qualifiers]
>   339 | g_ptr_array_add(all_reg_names,
> reg->name);

Hmm I missed that. Not sure what the neatest solution is in this case -
g_ptr_array_new() doesn't have a destroy func so we shouldn't ever
attempt to free it. We can manually re-add the const qualifier at the
other end for completeness and I guess comment and cast?


>   |~~~^~
> In file included from /usr/include/glib-2.0/glib.h:31,
>  from /qemu/contrib/plugins/execlog.c:9:
> /usr/include/glib-2.0/glib/garray.h:192:62: note: expected ‘gpointer’ {aka 
> ‘void *’} but argument is of type ‘const char *’
>   192 |gpointer  data);
>   |~~^~~~
>
> In addition, I checked my glic version:
>
> $ldd --version
> ldd (Ubuntu GLIBC 2.35-0ubuntu3.5) 2.35
>
> I think it's v2.35. Are these three warning reports valid?

It's the glib (not glibc) version that matters here.
g_pattern_match_string was deprecated in 2.70 when the suggested
alternative was added. However our baseline for glib is still:

  # When bumping glib minimum version, please check also whether to increase
  # the _WIN32_WINNT setting in osdep.h according to the value from glib
  glib_req_ver = '>=2.56.0'
  glib_pc = dependency('glib-2.0', version: glib_req_ver, required: true,
  method: 'pkg-config')

The usual solution for this is to throw in a compat wrapper in
glib-compat.h:

--8<---cut here---start->8---
modified   include/glib-compat.h
@@ -105,6 +105,24 @@ static inline gpointer g_memdup2_qemu(gconstpointer mem, 
gsize byte_size)
 }
 #define g_memdup2(m, s) g_memdup2_qemu(m, s)
 
+/*
+ * g_pattern_match_string has been deprecated in Glib since 2.70 and
+ * will complain about it if you try to use it. Fortunately the
+ * signature of both functions is the same making it easy to work
+ * around.
+ */
+static inline
+gboolean g_pattern_spec_match_string_qemu(GPatternSpec *pspec,
+  const gchar *string)
+{
+#if GLIB_CHECK_VERSION(2, 70, 0)
+return g_pattern_spec_match_string(pspec, string);
+#else
+return g_pattern_match_string(pspec, string);
+#endif
+};
+#define g_pattern_spec_match_string(p, s) g_pattern_spec_match_string_qemu(p, 
s)
+
 #if defined(G_OS_UNIX)
 /*
  * Note: The fallback implementation is not MT-safe, and it returns a copy of
modified   contrib/plugins/execlog.c
@@ -334,8 +334,8 @@ static void registers_init(int vcpu_index)
 for (int p = 0; p < rmatches->len; p++) {
 g_autoptr(GPatternSpec) pat = 
g_pattern_spec_new(rmatches->pdata[p]);
 g_autofree gchar *rd_lower = g_utf8_strdown(rd->name, -1);
-if (g_pattern_match_string(pat, rd->name) ||
-g_pattern_match_string(pat, rd_lower)) {
+if (g_pattern_spec_match_string(pat, rd->name) ||
+g_pattern_spec_match_string(pat, rd_lower)) {
 Register *reg = init_vcpu_register(vcpu_index, rd);
 g_ptr_array_add(registers, reg);
--8<---cut here---end--->8---

but I hesitated to add it for this case as plugins shouldn't assume they
have access to QEMU's internals. Maybe the glib-compat.h header could be
treated as a special case.

>
> I also noticed in target

Re: [PATCH v5 08/12] tests/plugin/bb: migrate to new per_vcpu API

Pierrick Bouvier  writes:

> On 2/29/24 6:21 PM, Alex Bennée wrote:
>> Pierrick Bouvier  writes:
>> 
>>> Signed-off-by: Pierrick Bouvier 
>> I did notice there is a discrepancy between what libisns and libb
>> report. The libb looks like an overcount so I wonder if there are some
>> instructions we are not picking up but I can't see where that would be.
>>➜  ./qemu-hppa -plugin ./tests/plugin/libinsn.so -plugin
>> ./tests/plugin/libbb.so,inline=true -d plugin
>> ./tests/tcg/hppa-linux-user/sha512
>>1..10
>>ok 1 - do_test(&tests[i])
>>ok 2 - do_test(&tests[i])
>>ok 3 - do_test(&tests[i])
>>ok 4 - do_test(&tests[i])
>>ok 5 - do_test(&tests[i])
>>ok 6 - do_test(&tests[i])
>>ok 7 - do_test(&tests[i])
>>ok 8 - do_test(&tests[i])
>>ok 9 - do_test(&tests[i])
>>ok 10 - do_test(&tests[i])
>>CPU0: bb's: 54282, insns: 775697
>>Total: bb's: 54282, insns: 775697
>>cpu 0 insns: 774827
>>total insns: 774827
>> Although weirdly maybe only an hppa thing. Richard?
>> 
>
> Do you observe the exact same number if you run only one of the plugin?
>
> bb count number of instructions in an executed block, while insn
> effectively count every instructions ran.
> Maybe there is hppa specifity that makes some tb exit in the middle,
> thus executing less instructions than expected from bb count.

Almost certainly - I just wasn't sure what would do that on straight
line code. Probably some funky aspect of HPPA I'm not aware off ;-)

>
> I don't know how to reproduce this test. Did you run it from a
> specific docker env?

If you have docker enabled the "make check-tcg" will build and use a
container to build the test cases. If you are on debian you just need:

apt install gcc-hppa-linux-gnu libc6-dev-hppa-cross

and re-run configure.

>
>>➜  ./qemu-aarch64 -plugin ./tests/plugin/libinsn.so -plugin 
>> ./tests/plugin/libbb.so,inline=true -d plugin  
>> ./tests/tcg/aarch64-linux-user/sha512
>>1..10
>>ok 1 - do_test(&tests[i])
>>ok 2 - do_test(&tests[i])
>>ok 3 - do_test(&tests[i])
>>ok 4 - do_test(&tests[i])
>>ok 5 - do_test(&tests[i])
>>ok 6 - do_test(&tests[i])
>>ok 7 - do_test(&tests[i])
>>ok 8 - do_test(&tests[i])
>>ok 9 - do_test(&tests[i])
>>ok 10 - do_test(&tests[i])
>>CPU0: bb's: 41513, insns: 302671
>>Total: bb's: 41513, insns: 302671
>>cpu 0 insns: 302671
>>total insns: 302671
>> Anyway:
>> Reviewed-by: Alex Bennée 
>> 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH V4 10/14] migration: stop vm for cpr

2024-03-01 Thread Cédric Le Goater


On 3/1/24 02:28, Peter Xu wrote:

On Thu, Feb 29, 2024 at 10:21:14AM -0500, Steven Sistare wrote:

On 2/25/2024 9:08 PM, Peter Xu wrote:

On Thu, Feb 22, 2024 at 09:28:36AM -0800, Steve Sistare wrote:

When migration for cpr is initiated, stop the vm and set state
RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
possibility of ram and device state being out of sync, and guarantees
that a guest in the suspended state remains suspended, because qmp_cont
rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.

Signed-off-by: Steve Sistare 


Reviewed-by: Peter Xu 

cpr-reboot mode keeps changing behavior.

Could we declare it "experimental" until it's solid?  Maybe a patch to
document this?

Normally IMHO we shouldn't merge a feature if it's not complete, however
cpr-reboot is so special that the mode itself is already merged in 8.2
before I started to merge patches, and it keeps changing things.  I don't
know what else we can do here besides declaring it experimental and not
declare it a stable feature.


Hi Peter, the planned/committed functionality for cpr-reboot changed only once, 
in:
 migration: stop vm for cpr

Suspension to support vfio is an enhancement which adds to the basic 
functionality,
it does not change it.  This was planned all along, but submitted as a separate


If VFIO used to migrate without suspension and now it won't, it's a
behavior change?


series to manage complexity, as I outlined in my qemu community presentation,
which I emailed you at the time.

Other "changes" that arose during review were just clarifications and 
explanations.

So, I don't think cpr-reboot deserves to be condemned to experimental limbo.


IMHO it's not about a feature being condemned, it's about a kindful
heads-up to the users that one needs to take risk on using it until it
becomes stable, it also makes developers easier because of no limitation on
behavior change.  If all the changes are landing, then there's no need for
such a patch.

If so, please propose the planned complete document patch. I had a feeling
that cpr is still not fully understood by even many developers on the list.
It'll be great that such document will contain all the details one needs to
know on the feature, etc. meaning of the name cpr-reboot (what is "cpr"),
when to use it, how to use it, how it differs from "normal" mode
(etc. lifted limitations on some devices that used to block migration?),
what is enforced (vm stop, suspension, etc.) and what is optionally offered
(VFIO, shared-mem, etc.), the expected behaviors, etc.

When send it, please copy relevant people (Alex & Cedric for VFIO, while
Markus could also be a good candidate considering the QMP involvement).


I second that. If we could have a file under docs/, it would be great.

Thanks,

C.

Re: [PATCH v5 08/12] tests/plugin/bb: migrate to new per_vcpu API


On 3/1/24 2:26 PM, Alex Bennée wrote:

Pierrick Bouvier  writes:


On 2/29/24 6:21 PM, Alex Bennée wrote:

Pierrick Bouvier  writes:


Signed-off-by: Pierrick Bouvier 

I did notice there is a discrepancy between what libisns and libb
report. The libb looks like an overcount so I wonder if there are some
instructions we are not picking up but I can't see where that would be.
➜  ./qemu-hppa -plugin ./tests/plugin/libinsn.so -plugin
./tests/plugin/libbb.so,inline=true -d plugin
./tests/tcg/hppa-linux-user/sha512
1..10
ok 1 - do_test(&tests[i])
ok 2 - do_test(&tests[i])
ok 3 - do_test(&tests[i])
ok 4 - do_test(&tests[i])
ok 5 - do_test(&tests[i])
ok 6 - do_test(&tests[i])
ok 7 - do_test(&tests[i])
ok 8 - do_test(&tests[i])
ok 9 - do_test(&tests[i])
ok 10 - do_test(&tests[i])
CPU0: bb's: 54282, insns: 775697
Total: bb's: 54282, insns: 775697
cpu 0 insns: 774827
total insns: 774827
Although weirdly maybe only an hppa thing. Richard?



Do you observe the exact same number if you run only one of the plugin?

bb count number of instructions in an executed block, while insn
effectively count every instructions ran.
Maybe there is hppa specifity that makes some tb exit in the middle,
thus executing less instructions than expected from bb count.


Almost certainly - I just wasn't sure what would do that on straight
line code. Probably some funky aspect of HPPA I'm not aware off ;-)



I don't know how to reproduce this test. Did you run it from a
specific docker env?


If you have docker enabled the "make check-tcg" will build and use a
container to build the test cases. If you are on debian you just need:

 apt install gcc-hppa-linux-gnu libc6-dev-hppa-cross

and re-run configure.



Thanks.

The difference observed predates this series, so there should definitely 
be something specific to this arch.





➜  ./qemu-aarch64 -plugin ./tests/plugin/libinsn.so -plugin 
./tests/plugin/libbb.so,inline=true -d plugin  
./tests/tcg/aarch64-linux-user/sha512
1..10
ok 1 - do_test(&tests[i])
ok 2 - do_test(&tests[i])
ok 3 - do_test(&tests[i])
ok 4 - do_test(&tests[i])
ok 5 - do_test(&tests[i])
ok 6 - do_test(&tests[i])
ok 7 - do_test(&tests[i])
ok 8 - do_test(&tests[i])
ok 9 - do_test(&tests[i])
ok 10 - do_test(&tests[i])
CPU0: bb's: 41513, insns: 302671
Total: bb's: 41513, insns: 302671
cpu 0 insns: 302671
total insns: 302671
Anyway:
Reviewed-by: Alex Bennée

Re: [PATCH v2 0/5] Add device DM163 (led driver, matrix colors shield & display)


On 01/03/2024 11.21, Peter Maydell wrote:

On Wed, 28 Feb 2024 at 12:06, Thomas Huth  wrote:


On 28/02/2024 12.31, Inès Varhol wrote:

TLDR: how can I provide a test or an example?

I've tested the display by running custom executables and
comparing to the result on the real board, but I don't
know how to test it using a QTest.



I'd suggest to try to take a screenshot via QMP or HMP. If you need a third
party binary to get something on the screen, it's maybe best to write an
avocado test instead of a qtest. See for example
tests/avocado/machine_m68k_nextcube.py - it takes a screenshot and then
converts the console messages from it via Tessaract to text and looks for
the expected strings there. Alternatively, have a look at
tests/avocado/machine_s390_ccw_virtio.py - it boots a Linux in the guest
that writes a string into the /dev/fb0 framebuffer, then takes a screendump
and then tries to find the string in the file. The latter could maybe also
be done via a qtest if you know how to set up the framebuffer without a
guest kernel.


The other extremely popular approach for testing display devices
is "I looked at the screen while I was doing the development,
and it was fine" :-)


Well, but that doesn't help that much against regressions that can occur in 
the course of time...


 Thomas

Re: [PATCH v3 2/3] tools: build qemu-vmsr-helper

2024-03-01 Thread Anthony Harivel

Hi Paolo,

> > > +static void compute_default_paths(void)
> > > +{
> > > +socket_path = g_build_filename("/run", "qemu-vmsr-helper.sock", 
> > > NULL);
> > > +pidfile = g_build_filename("/run", "qemu-vmsr-helper.pid", NULL);
> > > +}
> >
> > We shouldn't be hardcoding /run, we need to honour --prefix and
> > --localstatedir args given to configure.  /var/run is a symlink
> > to /run so the end result ends up the same AFAIK
>
> Indeed; just copy from scsi/qemu-pr-helper.c.
>

When I copy the same as compute-default() of scsi/qemu-pr-helper.c, the 
helper is listening to "/var/local/run/qemu-vmsr-helper.sock".

Problem is /var/local/run is 700 while /run is 755.

So I would need to adjust the qemu-vmsr-helper.service to give 
the --socket=PATH of qemu-vmsr-helper.socket (i.e 
 /run/qemu-vmsr-helper.sock)

Problem is when I do that , I fall into the case: 
"Unix socket can't be set when using socket activation"

So I'm a bit confused what to do about that.

> > You never answered my question from the previous posting of this
> >
> > This check is merely validating the the thread ID in the message
> > is a child of the process ID connected to the socket. Any process
> > on the entire host can satisfy this requirement.
> >
> > I don't see what is limiting this to only QEMU as claimed by the
> > commit message, unless you're expecting the UNIX socket permissions
> > to be such that only processes under the qemu:qemu user:group pair
> > can access to the socket ? That would be a libvirt based permissions
> > assumption though.
>
> Yes, this is why the systemd socket uses 600, like
> contrib/systemd/qemu-pr-helper.socket. The socket can be passed via
> SCM_RIGHTS by libvirt, or its permissions can be changed (e.g. 660 and
> root:kvm would make sense on a Debian system), or a separate helper
> can be started by libvirt.
>
> Either way, the policy is left to the user rather than embedding it in
> the provided systemd unit.
>

During this patch test, when I run by hand my VM (without libvirt),
the vmsr helper systemd service/socket was like that: 
[Service]
ExecStart=/usr/bin/qemu-vmsr-helper
User=root
Group=root

and 

[Socket]
ListenStream=/run/qemu-vmsr-helper.sock
SocketUser=qemu
SocketGroup=qemu
SocketMode=0660

And it seems to works. So I'm not sure 100% what to do in my patch.

Should I follow the pr-helper systemd files anyway ?

Regards,
Anthony

Re: [QEMU][PATCH v3 1/7] softmmu: physmem: Split ram_block_add()

Vikram Garhwal  writes:

> Extract ram block list update to a new function ram_block_add_list(). This is
> done to support grant mappings which adds a memory region for granted memory 
> and
> updates the ram_block list.
>
> Signed-off-by: Juergen Gross 
> Signed-off-by: Vikram Garhwal 
> Reviewed-by: Stefano Stabellini 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH v7 0/3] string list functions

2024-03-01 Thread Markus Armbruster

Steven Sistare  writes:

> All the changes look good - steve

Thanks!

I can stick this into my next PR, ETA early next week.

[PATCH] replay: Improve error messages about configuration conflicts

2024-03-01 Thread Markus Armbruster

Improve

   Record/replay feature is not supported for '-rtc base=localtime'
   Record/replay feature is not supported for 'smp'
   Record/replay feature is not supported for '-snapshot'

to

   Record/replay is not supported with -rtc base=localtime
   Record/replay is not supported with multiple CPUs
   Record/replay is not supported with -snapshot

Signed-off-by: Markus Armbruster 
---
 replay/replay.c | 2 +-
 system/vl.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/replay/replay.c b/replay/replay.c
index 3fd241a4fc..a2c576c16e 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -511,7 +511,7 @@ void replay_add_blocker(const char *feature)
 {
 Error *reason = NULL;
 
-error_setg(&reason, "Record/replay feature is not supported for '%s'",
+error_setg(&reason, "Record/replay is not supported with %s",
feature);
 replay_blockers = g_slist_prepend(replay_blockers, reason);
 }
diff --git a/system/vl.c b/system/vl.c
index e480afd7a0..cc03a17c09 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -1932,7 +1932,7 @@ static void qemu_apply_machine_options(QDict *qdict)
 }
 
 if (current_machine->smp.cpus > 1) {
-replay_add_blocker("smp");
+replay_add_blocker("multiple CPUs");
 }
 }
 
-- 
2.43.0

Re: [PATCH] migration/multifd: Document two places for mapped-ram

2024-03-01 Thread Fabiano Rosas

pet...@redhat.com writes:

> From: Peter Xu 
>
> Add two documentations for mapped-ram migration on two spots that may not
> be extremely clear.
>
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas

Re: [PATCH v2 1/3] qtest: migration: Enhance qtest migration functions to support 'channels' argument

2024-03-01 Thread Het Gala




On 01/03/24 2:19 pm, Het Gala wrote:


On 29/02/24 6:47 am, Fabiano Rosas wrote:

Het Gala  writes:


On 27/02/24 1:04 am, Het Gala wrote:


On 26/02/24 6:31 pm, Fabiano Rosas wrote:

Het Gala  writes:


On 24/02/24 1:42 am, Fabiano Rosas wrote:
this was the same first approach that I attempted. It won't work 
because


The final 'migrate' QAPI with channels string would look like


[...]

I'm not sure what you tried. This works:

 g_assert(!qdict_haskey(args, "channels"));
 if (channels) {
 channels_obj = qobject_from_json(channels, errp);
 qdict_put_obj(args, "channels", channels_obj);
 }


Are you sure the above works ?


Sorry, please ignore the below doubt. I understood what silly mistakes I 
was doing. you were right, qobject_from_json() works fine and converts 
string to json object.


Will follow your latest reply and send out the patch really soon. Thank 
you for unblocking me here quickly :)



[...]


Regards,

Het Gala

Re: [PATCH v2] hw/nvme/ns: Add NVMe NGUID property

2024-03-01 Thread Klaus Jensen

On Feb 22 17:50, Nabih Estefan wrote:
> From: Roque Arcudia Hernandez 
> 
> This patch adds a way to specify an NGUID for a given NVMe Namespace using a
> string of hexadecimal digits with an optional '-' separator to group bytes. 
> For
> instance:
> 
> -device nvme-ns,nguid="e9accd3b83904e13167cf0593437f57d"
> 
> If provided, the NGUID will be part of the Namespace Identification Descriptor
> list and the Identify Namespace data.
> 
> Signed-off-by: Roque Arcudia Hernandez 
> Signed-off-by: Nabih Estefan 
> Reviewed-by: Klaus Jensen 

Thanks, applied to nvme-next!


signature.asc
Description: PGP signature

Re: [PATCH v5 09/12] contrib/plugins/hotblocks: migrate to new per_vcpu API

2024-03-01 Thread Luc Michel

On 14:33 Thu 29 Feb , Alex Bennée wrote:
> Caution: This message originated from an External Source. Use proper caution 
> when opening attachments, clicking links, or responding.
> 
> 
> Luc Michel  writes:
> 
> > On 15:09 Tue 27 Feb , Pierrick Bouvier wrote:
> >> On 2/27/24 2:54 PM, Luc Michel wrote:
> >> > Hi Pierrick,
> >> >
> >> > On 13:14 Mon 26 Feb , Pierrick Bouvier wrote:
> >> > > Signed-off-by: Pierrick Bouvier 
> >> > > ---
> >> > >   contrib/plugins/hotblocks.c | 50 
> >> > > ++---
> >> > >   1 file changed, 30 insertions(+), 20 deletions(-)
> >> > >
> >> > > diff --git a/contrib/plugins/hotblocks.c b/contrib/plugins/hotblocks.c
> >> > > index 4de1b134944..02bc5078bdd 100644
> >> > > --- a/contrib/plugins/hotblocks.c
> >> > > +++ b/contrib/plugins/hotblocks.c
> >> > > @@ -34,8 +34,8 @@ static guint64 limit = 20;
> >> > >*/
> >> > >   typedef struct {
> >> > >   uint64_t start_addr;
> >> > > -uint64_t exec_count;
> >> > > -int  trans_count;
> >> > > +struct qemu_plugin_scoreboard *exec_count;
> >> > > +int trans_count;
> >> > >   unsigned long insns;
> >> > >   } ExecCount;
> >> > >
> >> > > @@ -43,7 +43,17 @@ static gint cmp_exec_count(gconstpointer a, 
> >> > > gconstpointer b)
> >> > >   {
> >> > >   ExecCount *ea = (ExecCount *) a;
> >> > >   ExecCount *eb = (ExecCount *) b;
> >> > > -return ea->exec_count > eb->exec_count ? -1 : 1;
> >> > > +uint64_t count_a =
> >> > > +
> >> > > qemu_plugin_u64_sum(qemu_plugin_scoreboard_u64(ea->exec_count));
> >> > > +uint64_t count_b =
> >> > > +
> >> > > qemu_plugin_u64_sum(qemu_plugin_scoreboard_u64(eb->exec_count));
> >> > > +return count_a > count_b ? -1 : 1;
> >> > > +}
> >> > > +
> >> > > +static void exec_count_free(gpointer key, gpointer value, gpointer 
> >> > > user_data)
> >> > > +{
> >> > > +ExecCount *cnt = value;
> >> > > +qemu_plugin_scoreboard_free(cnt->exec_count);
> >> > >   }
> >> > >
> >> > >   static void plugin_exit(qemu_plugin_id_t id, void *p)
> >> > > @@ -52,7 +62,6 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
> >> > >   GList *counts, *it;
> >> > >   int i;
> >> > >
> >> > > -g_mutex_lock(&lock);
> >> >
> >> > I encountered cases before where the vCPUs continue executing while
> >> > plugin_exit is called. This can happen e.g., when QEMU calls exit(3)
> >> > from one CPU thread. Others will continue to run at the same time the
> >> > atexit callbacks are called.
> >> >
> >> > This also means that you can't really free the resources as you do at
> >> > the end of plugin_exit.
> >> >
> >>
> >> Interesting...
> >>
> >> The current documentation [1] mentions it's the right place to free
> >> resources, and from what I saw, existing plugins assume this too (see
> >> contrib/plugins/cache.c for instance).
> >>
> >> There is probably a bug related to the case you mention, and I'll try to
> >> reproduce this, and see if we can have a proper fix.
> >>
> >> I'm not sure that removing cleanup code from existing plugins is the
> >> right thing to do at the moment, even though there is an existing issue
> >> with some programs.
> >
> > Yes absolutely. The problem is on the QEMU side. FWIW I tried to fix one
> > of those exit cases (semihosted exit syscall) some time ago:
> > https://lore.kernel.org/qemu-devel/20220621125916.25257-1-lmic...@kalray.eu/
> 
> The plugin exit handler should flush all instrumented code:
> 
>   /*
>* Handle exit from linux-user. Unlike the normal atexit() mechanism
>* we need to handle the clean-up manually as it's possible threads
>* are still running. We need to remove all callbacks from code
>* generation, flush the current translations and then we can safely
>* trigger the exit callbacks.
>*/
> 
>   void qemu_plugin_user_exit(void)
>   {
>   enum qemu_plugin_event ev;
>   CPUState *cpu;
> 
>   /*
>* Locking order: we must acquire locks in an order that is consistent
>* with the one in fork_start(). That is:
>* - start_exclusive(), which acquires qemu_cpu_list_lock,
>*   must be called before acquiring plugin.lock.
>* - tb_flush(), which acquires mmap_lock(), must be called
>*   while plugin.lock is not held.
>*/
>   start_exclusive();
> 
>   qemu_rec_mutex_lock(&plugin.lock);
>   /* un-register all callbacks except the final AT_EXIT one */
>   for (ev = 0; ev < QEMU_PLUGIN_EV_MAX; ev++) {
>   if (ev != QEMU_PLUGIN_EV_ATEXIT) {
>   struct qemu_plugin_cb *cb, *next;
> 
>   QLIST_FOREACH_SAFE_RCU(cb, &plugin.cb_lists[ev], entry, next) {
>   plugin_unregister_cb__locked(cb->ctx, ev);
>   }
>   }
>   }
>   CPU_FOREACH(cpu) {
>   qemu_plugin_disable_mem_helpers(cpu);
>   }
>   qemu_rec_mutex_unlock(&plugin.lock);
> 
>   tb_flush(current_cpu);
>   end_exclusive();
> 
>

Re: [PULL v2 1/1] loongarch: Change the UEFI loading mode to loongarch


On 29/2/24 12:38, Song Gao wrote:

From: Xianglai Li 

The UEFI loading mode in loongarch is very different
from that in other architectures:loongarch's UEFI code
is in rom, while other architectures' UEFI code is in flash.

loongarch UEFI can be loaded as follows:
-machine virt,pflash=pflash0-format
-bios ./QEMU_EFI.fd

Other architectures load UEFI using the following methods:
-machine virt,pflash0=pflash0-format,pflash1=pflash1-format

loongarch's UEFI loading method makes qemu and libvirt incompatible
when using NVRAM, and the cost of loongarch's current loading method
far outweighs the benefits, so we decided to use the same UEFI loading
scheme as other architectures.


FYI I'm still trying to find a way to avoid that, planning to discuss
more with libvirt folks. Well, maybe it is a waste of my time and I
should just stop worrying / caring about this long standing issue.


Cc: Andrea Bolognani 
Cc: maob...@loongson.cn
Cc: Philippe Mathieu-Daudé 
Cc: Song Gao 
Cc: zhaotian...@loongson.cn
Signed-off-by: Xianglai Li 
Tested-by: Andrea Bolognani 
Reviewed-by: Song Gao 
Message-Id: 
<0bd892aa9b88e0f4cc904cb70efd0251fc1cde29.1708336919.git.lixiang...@loongson.cn>
Signed-off-by: Song Gao 
---
  hw/loongarch/acpi-build.c   |  29 +--
  hw/loongarch/virt.c | 101 ++--
  include/hw/loongarch/virt.h |  10 ++--
  3 files changed, 107 insertions(+), 33 deletions(-)

Re: [PATCH v2 3/5] linux-user: Add strace for shmat


On 28/2/24 21:25, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  linux-user/strace.c| 23 +++
  linux-user/strace.list |  2 +-
  2 files changed, 24 insertions(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 1/3] qtest: migration: Enhance qtest migration functions to support 'channels' argument

2024-03-01 Thread Fabiano Rosas

Het Gala  writes:

> On 01/03/24 2:19 pm, Het Gala wrote:
>>
>> On 29/02/24 6:47 am, Fabiano Rosas wrote:
>>> Het Gala  writes:
>>>
 On 27/02/24 1:04 am, Het Gala wrote:
>
> On 26/02/24 6:31 pm, Fabiano Rosas wrote:
>> Het Gala  writes:
>>
>>> On 24/02/24 1:42 am, Fabiano Rosas wrote:
>>> this was the same first approach that I attempted. It won't work 
>>> because
>>>
>>> The final 'migrate' QAPI with channels string would look like
>>>
> [...]
>>> I'm not sure what you tried. This works:
>>>
>>>  g_assert(!qdict_haskey(args, "channels"));
>>>  if (channels) {
>>>  channels_obj = qobject_from_json(channels, errp);
>>>  qdict_put_obj(args, "channels", channels_obj);
>>>  }
>>
>> Are you sure the above works ?
>
> Sorry, please ignore the below doubt. I understood what silly mistakes I 
> was doing. you were right, qobject_from_json() works fine and converts 
> string to json object.
>
> Will follow your latest reply and send out the patch really soon. Thank 
> you for unblocking me here quickly :)

Just please don't make it all one patch. Each of those steps could be a
separate patch.

[RFC 1/8] virtio/virtio-pci: Handle extra notification data

Add support to virtio-pci devices for handling the extra data sent
from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
transport feature has been negotiated.

The extra data that's passed to the virtio-pci device when this
feature is enabled varies depending on the device's virtqueue
layout.

In a split virtqueue layout, this data includes:
 - upper 16 bits: last_avail_idx
 - lower 16 bits: virtqueue index

In a packed virtqueue layout, this data includes:
 - upper 16 bits: 1-bit wrap counter & 15-bit last_avail_idx
 - lower 16 bits: virtqueue index

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio-pci.c | 13 ++---
 hw/virtio/virtio.c | 13 +
 include/hw/virtio/virtio.h |  1 +
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 1a7039fb0c..c7c577b177 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
 {
 VirtIOPCIProxy *proxy = opaque;
 VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
-uint16_t vector;
+uint16_t vector, vq_idx;
 hwaddr pa;
 
 switch (addr) {
@@ -408,8 +408,15 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
 vdev->queue_sel = val;
 break;
 case VIRTIO_PCI_QUEUE_NOTIFY:
-if (val < VIRTIO_QUEUE_MAX) {
-virtio_queue_notify(vdev, val);
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+vq_idx = val & 0x;
+virtio_set_notification_data(vdev, vq_idx, val);
+} else {
+vq_idx = val;
+}
+
+if (vq_idx < VIRTIO_QUEUE_MAX) {
+virtio_queue_notify(vdev, vq_idx);
 }
 break;
 case VIRTIO_PCI_STATUS:
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index d229755eae..a61f69b7fd 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2052,6 +2052,19 @@ int virtio_set_status(VirtIODevice *vdev, uint8_t val)
 return 0;
 }
 
+void virtio_set_notification_data(VirtIODevice *vdev, uint16_t i, uint32_t 
data)
+{
+VirtQueue *vq = &vdev->vq[i];
+
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+vq->last_avail_wrap_counter = (data >> 31) & 0x1;
+vq->last_avail_idx = (data >> 16) & 0x7FFF;
+} else {
+vq->last_avail_idx = (data >> 16) & 0x;
+}
+vq->shadow_avail_idx = vq->last_avail_idx;
+}
+
 static enum virtio_device_endian virtio_default_endian(void)
 {
 if (target_words_bigendian()) {
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index c8f72850bc..c92d8afc42 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -345,6 +345,7 @@ void virtio_queue_reset(VirtIODevice *vdev, uint32_t 
queue_index);
 void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index);
 void virtio_update_irq(VirtIODevice *vdev);
 int virtio_set_features(VirtIODevice *vdev, uint64_t val);
+void virtio_set_notification_data(VirtIODevice *vdev, uint16_t i, uint32_t 
data);
 
 /* Base devices.  */
 typedef struct VirtIOBlkConf VirtIOBlkConf;
-- 
2.39.3

[RFC 5/8] virtio-ccw: Handle extra notification data

Add support to virtio-ccw devices for handling the extra data sent from
the driver to the device when the VIRTIO_F_NOTIFICATION_DATA transport
feature has been negotiated.

The extra data that's passed to the virtio-ccw device when this feature
is enabled varies depending on the device's virtqueue layout.

That data passed to the virtio-ccw device is in the same format as the
data passed to virtio-pci devices.

Signed-off-by: Jonah Palmer 
---
 hw/s390x/s390-virtio-ccw.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 62804cc228..b8e193956c 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -140,9 +140,11 @@ static void subsystem_reset(void)
 static int virtio_ccw_hcall_notify(const uint64_t *args)
 {
 uint64_t subch_id = args[0];
-uint64_t queue = args[1];
+uint64_t data = args[1];
 SubchDev *sch;
+VirtIODevice *vdev;
 int cssid, ssid, schid, m;
+uint16_t vq_idx;
 
 if (ioinst_disassemble_sch_ident(subch_id, &m, &cssid, &ssid, &schid)) {
 return -EINVAL;
@@ -151,12 +153,20 @@ static int virtio_ccw_hcall_notify(const uint64_t *args)
 if (!sch || !css_subch_visible(sch)) {
 return -EINVAL;
 }
-if (queue >= VIRTIO_QUEUE_MAX) {
+
+vdev = virtio_ccw_get_vdev(sch);
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+vq_idx = data & 0x;
+virtio_set_notification_data(vdev, vq_idx, data);
+} else {
+vq_idx = data;
+}
+
+if (vq_idx >= VIRTIO_QUEUE_MAX) {
 return -EINVAL;
 }
-virtio_queue_notify(virtio_ccw_get_vdev(sch), queue);
+virtio_queue_notify(vdev, vq_idx);
 return 0;
-
 }
 
 static int virtio_ccw_hcall_early_printk(const uint64_t *args)
-- 
2.39.3

[RFC 8/8] virtio: Add VIRTIO_F_NOTIFICATION_DATA property definition

Extend the virtio device property definitions to include the
VIRTIO_F_NOTIFICATION_DATA feature.

The default state of this feature is disabled, allowing it to be
explicitly enabled where it's supported.

Signed-off-by: Jonah Palmer 
---
 include/hw/virtio/virtio.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index c92d8afc42..5772737dde 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -369,7 +369,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
 DEFINE_PROP_BIT64("packed", _state, _field, \
   VIRTIO_F_RING_PACKED, false), \
 DEFINE_PROP_BIT64("queue_reset", _state, _field, \
-  VIRTIO_F_RING_RESET, true)
+  VIRTIO_F_RING_RESET, true), \
+DEFINE_PROP_BIT64("notification_data", _state, _field, \
+  VIRTIO_F_NOTIFICATION_DATA, false)
 
 hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
 bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
-- 
2.39.3

[RFC 7/8] vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits

Add support for the VIRTIO_F_NOTIFICATION_DATA feature across a variety
of vhost devices.

The inclusion of VIRTIO_F_NOTIFICATION_DATA in the feature bits arrays
for these devices ensures that the backend is capable of offering and
providing support for this feature, and that it can be disabled if the
backend does not support it.

Signed-off-by: Jonah Palmer 
---
 hw/block/vhost-user-blk.c| 1 +
 hw/net/vhost_net.c   | 2 ++
 hw/scsi/vhost-scsi.c | 1 +
 hw/scsi/vhost-user-scsi.c| 1 +
 hw/virtio/vhost-user-fs.c| 2 +-
 hw/virtio/vhost-user-vsock.c | 1 +
 net/vhost-vdpa.c | 1 +
 7 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 6a856ad51a..983c0657da 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -51,6 +51,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index e8e1661646..bb1f975b39 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -48,6 +48,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
 VIRTIO_NET_F_HASH_REPORT,
 VHOST_INVALID_FEATURE_BIT
 };
@@ -55,6 +56,7 @@ static const int kernel_feature_bits[] = {
 /* Features supported by others. */
 static const int user_feature_bits[] = {
 VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_F_NOTIFICATION_DATA,
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 58a00336c2..b8048f18e9 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index a63b1f4948..0b050805a8 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index cca2cd41be..ae48cc1c96 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -33,7 +33,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
-
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-vsock.c b/hw/virtio/vhost-user-vsock.c
index 9431b9792c..802b44a07d 100644
--- a/hw/virtio/vhost-user-vsock.c
+++ b/hw/virtio/vhost-user-vsock.c
@@ -21,6 +21,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 3726ee5d67..2827d29ce7 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -62,6 +62,7 @@ const int vdpa_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
 VIRTIO_F_VERSION_1,
+VIRTIO_F_NOTIFICATION_DATA,
 VIRTIO_NET_F_CSUM,
 VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
 VIRTIO_NET_F_CTRL_MAC_ADDR,
-- 
2.39.3

[RFC 0/8] virtio,vhost: Add VIRTIO_F_NOTIFICATION_DATA support

The goal of these patches are to add support to a variety of virtio and
vhost devices for the VIRTIO_F_NOTIFICATION_DATA transport feature. This
feature indicates that a driver will pass extra data (instead of just a
virtqueue's index) when notifying the corresponding device.

The data passed in by the driver when this feature is enabled varies in
format depending on if the device is using a split or packed virtqueue
layout:

 Split VQ
  - Upper 16 bits: last_avail_idx
  - Lower 16 bits: virtqueue index

 Packed VQ
  - Upper 16 bits: 1-bit wrap counter & 15-bit last_avail_idx
  - Lower 16 bits: virtqueue index

Also, due to the limitations of ioeventfd not being able to carry the
extra provided by the driver, ioeventfd is left disabled for any devices
using this feature.

A significant aspect of this effort has been to maintain compatibility
across different backends. As such, the feature is offered by backend
devices only when supported, with fallback mechanisms where backend
support is absent.

Jonah Palmer (8):
  virtio/virtio-pci: Handle extra notification data
  virtio-pci: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
  virtio-mmio: Handle extra notification data
  virtio-mmio: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
  virtio-ccw: Handle extra notification data
  virtio-ccw: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
  vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits
  virtio: Add VIRTIO_F_NOTIFICATION_DATA property definition

 hw/block/vhost-user-blk.c|  1 +
 hw/net/vhost_net.c   |  2 ++
 hw/s390x/s390-virtio-ccw.c   | 18 ++
 hw/s390x/virtio-ccw.c|  6 --
 hw/scsi/vhost-scsi.c |  1 +
 hw/scsi/vhost-user-scsi.c|  1 +
 hw/virtio/vhost-user-fs.c|  2 +-
 hw/virtio/vhost-user-vsock.c |  1 +
 hw/virtio/virtio-mmio.c  | 18 ++
 hw/virtio/virtio-pci.c   | 19 ++-
 hw/virtio/virtio.c   | 13 +
 include/hw/virtio/virtio.h   |  5 -
 net/vhost-vdpa.c |  1 +
 13 files changed, 71 insertions(+), 17 deletions(-)

-- 
2.39.3

[RFC 6/8] virtio-ccw: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA

Prevent ioeventfd from being enabled/disabled when a virtio-ccw device
has negotiated the VIRTIO_F_NOTIFICATION_DATA transport feature.

Due to the ioeventfd not being able to carry the extra data associated
with this feature, the ioeventfd should be left in a disabled state for
emulated virtio-ccw devices using this feature.

Signed-off-by: Jonah Palmer 
---
 hw/s390x/virtio-ccw.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index b4676909dd..936ba78fda 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -530,14 +530,16 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 if (ret) {
 break;
 }
-if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+if (!(status & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
 virtio_ccw_stop_ioeventfd(dev);
 }
 if (virtio_set_status(vdev, status) == 0) {
 if (vdev->status == 0) {
 virtio_ccw_reset_virtio(dev);
 }
-if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, 
VIRTIO_F_NOTIFICATION_DATA)) {
 virtio_ccw_start_ioeventfd(dev);
 }
 sch->curr_status.scsw.count = ccw.count - sizeof(status);
-- 
2.39.3

[RFC 4/8] virtio-mmio: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA

Prevent ioeventfd from being enabled/disabled when a virtio-mmio device
has negotiated the VIRTIO_F_NOTIFICATION_DATA transport feature.

Due to ioeventfd not being able to carry the extra data associated with
this feature, the ioeventfd should be left in a disabled state for
emulated virtio-mmio devices using this feature.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio-mmio.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
index 2bac77460e..fc780a03b2 100644
--- a/hw/virtio/virtio-mmio.c
+++ b/hw/virtio/virtio-mmio.c
@@ -424,7 +424,8 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, 
uint64_t value,
 virtio_update_irq(vdev);
 break;
 case VIRTIO_MMIO_STATUS:
-if (!(value & VIRTIO_CONFIG_S_DRIVER_OK)) {
+if (!(value & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
 virtio_mmio_stop_ioeventfd(proxy);
 }
 
@@ -436,7 +437,8 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, 
uint64_t value,
 
 virtio_set_status(vdev, value & 0xff);
 
-if (value & VIRTIO_CONFIG_S_DRIVER_OK) {
+if ((value & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
 virtio_mmio_start_ioeventfd(proxy);
 }
 
-- 
2.39.3

[RFC 3/8] virtio-mmio: Handle extra notification data

Add support to virtio-mmio devices for handling the extra data sent from
the driver to the device when the VIRTIO_F_NOTIFICATION_DATA transport
feature has been negotiated.

The extra data that's passed to the virtio-mmio device when this feature
is enabled varies depending on the device's virtqueue layout.

The data passed to the virtio-mmio device is in the same format as the
data passed to virtio-pci devices.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio-mmio.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
index 22f9fbcf5a..2bac77460e 100644
--- a/hw/virtio/virtio-mmio.c
+++ b/hw/virtio/virtio-mmio.c
@@ -248,6 +248,7 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, 
uint64_t value,
 {
 VirtIOMMIOProxy *proxy = (VirtIOMMIOProxy *)opaque;
 VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
+uint16_t vq_idx;
 
 trace_virtio_mmio_write_offset(offset, value);
 
@@ -407,8 +408,15 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, 
uint64_t value,
 }
 break;
 case VIRTIO_MMIO_QUEUE_NOTIFY:
-if (value < VIRTIO_QUEUE_MAX) {
-virtio_queue_notify(vdev, value);
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+vq_idx = value & 0x;
+virtio_set_notification_data(vdev, vq_idx, value);
+} else {
+vq_idx = value;
+}
+
+if (vq_idx < VIRTIO_QUEUE_MAX) {
+virtio_queue_notify(vdev, vq_idx);
 }
 break;
 case VIRTIO_MMIO_INTERRUPT_ACK:
-- 
2.39.3

[RFC 2/8] virtio-pci: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA

Prevent ioeventfd from being enabled/disabled when a virtio-pci
device has negotiated the VIRTIO_F_NOTIFICATION_DATA transport
feature.

Due to ioeventfd not being able to carry the extra data associated with
this feature, the ioeventfd should be left in a disabled state for
emulated virtio-pci devices using this feature.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio-pci.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index c7c577b177..fd9717a0f5 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -420,13 +420,15 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
 }
 break;
 case VIRTIO_PCI_STATUS:
-if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
+if (!(val & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
 virtio_pci_stop_ioeventfd(proxy);
 }
 
 virtio_set_status(vdev, val & 0xFF);
 
-if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
+if ((val & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
 virtio_pci_start_ioeventfd(proxy);
 }
 
-- 
2.39.3

Re: [PATCH 00/19] Workaround Windows failing to find 64bit SMBIOS entry point with SeaBIOS

2024-03-01 Thread Fiona Ebner

Am 29.02.24 um 14:18 schrieb Fiona Ebner:
> Am 27.02.24 um 16:47 schrieb Igor Mammedov:
>> Windows (10) bootloader when running on top of SeaBIOS, fails to find
>> 
>> SMBIOSv3 entry point. Tracing it shows that it looks for v2 anchor markers   
>> 
>> only and not v3. Tricking it into believing that entry point is found
>> 
>> lets Windows successfully locate and parse SMBIOSv3 tables. Whether it   
>> 
>> will be fixed on Windows side is not clear so here goes a workaround.
>> 
>>  
>> 
>> Idea is to try build v2 tables if QEMU configuration permits,
>> 
>> and fallback to v3 tables otherwise. That will mask Windows issue
>> 
>> form majority of users.  
>> 
>> However if VM configuration can't be described (typically large VMs) 
>> 
>> by v2 tables, QEMU will use SMBIOSv3 and Windows will hit the issue  
>> 
>> again. In this case complain to Microsoft and/or use UEFI instead of 
>> 
>> SeaBIOS (requires reinstall).
>> 
>>  
>> 
>> Default compat setting of smbios-entry-point-type after series   
>> 
>> for pc/q35 machines: 
>> 
>>   * 9.0-newer: 'auto'
>> 
>>   * 8.1-8.2: '64'
>> 
>>   * 8.0-older: '32'  
>> 
>>  
>> 
>> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2008
>> 
> 
> Thank you! I'm happy to confirm that this series works around the issue :)
> 

While I still didn't do any in-depth testing (don't have enough
knowledge for that anyways), I played around a bit more now, check that
nothing obvious breaks also with a Linux VM and also ran a successful
'make check'.

If that is enough, feel free to add:

Tested-by: Fiona Ebner 

Best Regards,
Fiona

Re: [QEMU][PATCH v3 2/7] xen: add pseudo RAM region for grant mappings

Vikram Garhwal  writes:

> From: Juergen Gross 
>
> Add a memory region which can be used to automatically map granted
> memory. It is starting at 0x8000ULL in order to be able to
> distinguish it from normal RAM.

Is the Xen memory map for HVM guests documented anywhere? I couldn't
find anything googling or on the Xen wiki. I'm guessing this is going to
be shared across all 64 bit HVM arches in Xen?

Anyway:

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [RFC 0/4] mirror: implement incremental and bitmap modes

On 29.02.24 17:50, Fiona Ebner wrote:

Am 29.02.24 um 13:00 schrieb Vladimir Sementsov-Ogievskiy:

But anyway, this all could be simply achieved with
bitmap-copying/merging API, if we allow to pass user-given bitmap to the
mirror as working bitmap.

I see, I'll drop the 'bitmap-mode' in the next version if nobody
complains :)

Good. It's a golden rule: never make public interfaces which you don't
actually need for production. I myself sometimes violate it and spend
extra time on developing features, which we later have to just drop as
"not needed downstream, no sense in upstreaming".

Just wondering which new mode I should allow for the @MirrorSyncMode
then? The documentation states:

# @incremental: only copy data described by the dirty bitmap.
# (since: 2.4)
#
# @bitmap: only copy data described by the dirty bitmap. (since: 4.2)
# Behavior on completion is determined by the BitmapSyncMode.

For backup, do_backup_common() just maps @incremental to @bitmap +
@bitmap-mode == @on-success.

Using @bitmap for mirror would lead to being at odds with the
documentation, because it mentions the BitmapSyncMode, which mirror
won't have.

Using @incremental for mirror would be consistent with the
documentation, but behave a bit differently from backup.

Opinions?

Good question.

As we already understood, (block-)job-api needs some spring-cleaning.
Unfortunately I don't have much time on it, but still I decided to start from
finally depreacting block-job-* API and moving to job-*.. Probably
bitmap/bitmap-mode/sync APIs also need some optimization, keeping in mind new
block-dirty-bitmap-merge api.

So, what I could advice in this situation for newc interfaces:

1. be minimalistic
2. add `x-` prefix when unsure

So, following these two rules, what about x-bitmap field, which may be combined
only with 'full' mode, and do what you need?

About documentation: actually, I never liked that we use for backup job "MirrorSyncMode".
Now it looks more like "BackupSyncMode", having two values supported only by backup.

I'm also unsure how mode=full&bitmap=some_bitmap differs from
mode=bitmap&bitmap=some_bitmap..

So, I'd suggest simply rename MirrorSyncMode to BackupSyncMode, and add separate MirrorSyncMode with only
"full", "top" and "none" values.

--
Best regards,
Vladimir

Re: [PULL 0/6] Misc fixes (libqos vring, Kconfig, TLS io channels, ...)

On Fri, 1 Mar 2024 at 08:09, Thomas Huth  wrote:
>
>  Hi Peter!
>
> The following changes since commit c0c6a0e3528b88aaad0b9d333e295707a195587b:
>
>   Merge tag 'migration-next-pull-request' of https://gitlab.com/peterx/qemu 
> into staging (2024-02-28 17:27:10 +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/thuth/qemu.git tags/pull-request-2024-03-01
>
> for you to fetch changes up to 462945cd22d2bcd233401ed3aa167d83a8e35b05:
>
>   chardev/char-socket: Fix TLS io channels sending too much data to the 
> backend (2024-03-01 08:27:33 +0100)
>
> 
> * Fix some bugs in the vring setup of libqos
> * Fix GIC settings when using --without-default-devices
> * Fix USB PCAP streams on Windows
> * Remove temporary files from test-util-sockets
> * Fix TLS io channels sending too much data to the backend
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0
for any user-visible changes.

-- PMM

Re: [PULL v2 0/1] loongarch-to-apply queue

On Thu, 29 Feb 2024 at 11:38, Song Gao  wrote:
>
> The following changes since commit bfe8020c814a30479a4241aaa78b63960655962b:
>
>   Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging 
> (2024-02-28 14:23:21 +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/gaosong/qemu.git tags/pull-loongarch-20240229
>
> for you to fetch changes up to c6e9847fc4becba561c631c4505e3b05d4926184:
>
>   loongarch: Change the UEFI loading mode to loongarch (2024-02-29 19:32:45 
> +0800)
>
> 
> pull-loongarch-20240229
>
> V2: fix build error on mipsel
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0
for any user-visible changes.

-- PMM

[PATCH] target/riscv: move ratified/frozen exts to non-experimental

2024-03-01 Thread Daniel Henrique Barboza

smaia and ssaia were ratified in August 25th 2023 [1].

zvfh and zvfhmin were ratified in August 2nd 2023 [2].

zfbfmin and zvfbf(min|wma) are frozen and moved to public review since
Dec 16th 2023 [3].

zaamo and zalrsc are both marked as "Frozen" since January 24th 2024
[4].

[1] https://jira.riscv.org/browse/RVS-438
[2] https://jira.riscv.org/browse/RVS-871
[3] https://jira.riscv.org/browse/RVS-704
[4] https://jira.riscv.org/browse/RVS-1995

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index fd0c7efdda..f5d30510ef 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1463,17 +1463,26 @@ const RISCVCPUMultiExtConfig riscv_cpu_extensions[] = {
 MULTI_EXT_CFG_BOOL("zihintntl", ext_zihintntl, true),
 MULTI_EXT_CFG_BOOL("zihintpause", ext_zihintpause, true),
 MULTI_EXT_CFG_BOOL("zacas", ext_zacas, false),
+MULTI_EXT_CFG_BOOL("zaamo", ext_zaamo, false),
+MULTI_EXT_CFG_BOOL("zalrsc", ext_zalrsc, false),
 MULTI_EXT_CFG_BOOL("zawrs", ext_zawrs, true),
 MULTI_EXT_CFG_BOOL("zfa", ext_zfa, true),
+MULTI_EXT_CFG_BOOL("zfbfmin", ext_zfbfmin, false),
 MULTI_EXT_CFG_BOOL("zfh", ext_zfh, false),
 MULTI_EXT_CFG_BOOL("zfhmin", ext_zfhmin, false),
 MULTI_EXT_CFG_BOOL("zve32f", ext_zve32f, false),
 MULTI_EXT_CFG_BOOL("zve64f", ext_zve64f, false),
 MULTI_EXT_CFG_BOOL("zve64d", ext_zve64d, false),
+MULTI_EXT_CFG_BOOL("zvfbfmin", ext_zvfbfmin, false),
+MULTI_EXT_CFG_BOOL("zvfbfwma", ext_zvfbfwma, false),
+MULTI_EXT_CFG_BOOL("zvfh", ext_zvfh, false),
+MULTI_EXT_CFG_BOOL("zvfhmin", ext_zvfhmin, false),
 MULTI_EXT_CFG_BOOL("sstc", ext_sstc, true),
 
+MULTI_EXT_CFG_BOOL("smaia", ext_smaia, false),
 MULTI_EXT_CFG_BOOL("smepmp", ext_smepmp, false),
 MULTI_EXT_CFG_BOOL("smstateen", ext_smstateen, false),
+MULTI_EXT_CFG_BOOL("ssaia", ext_ssaia, false),
 MULTI_EXT_CFG_BOOL("svade", ext_svade, false),
 MULTI_EXT_CFG_BOOL("svadu", ext_svadu, true),
 MULTI_EXT_CFG_BOOL("svinval", ext_svinval, false),
@@ -1561,19 +1570,6 @@ const RISCVCPUMultiExtConfig riscv_cpu_vendor_exts[] = {
 
 /* These are experimental so mark with 'x-' */
 const RISCVCPUMultiExtConfig riscv_cpu_experimental_exts[] = {
-MULTI_EXT_CFG_BOOL("x-smaia", ext_smaia, false),
-MULTI_EXT_CFG_BOOL("x-ssaia", ext_ssaia, false),
-
-MULTI_EXT_CFG_BOOL("x-zaamo", ext_zaamo, false),
-MULTI_EXT_CFG_BOOL("x-zalrsc", ext_zalrsc, false),
-
-MULTI_EXT_CFG_BOOL("x-zvfh", ext_zvfh, false),
-MULTI_EXT_CFG_BOOL("x-zvfhmin", ext_zvfhmin, false),
-
-MULTI_EXT_CFG_BOOL("x-zfbfmin", ext_zfbfmin, false),
-MULTI_EXT_CFG_BOOL("x-zvfbfmin", ext_zvfbfmin, false),
-MULTI_EXT_CFG_BOOL("x-zvfbfwma", ext_zvfbfwma, false),
-
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.43.2

Re: [PATCH v2 06/21] migration: Add Error** argument to .save_setup() handler


On 29.02.24 16:21, Markus Armbruster wrote:

Thomas Huth  writes:


On 29/02/2024 08.20, Vladimir Sementsov-Ogievskiy wrote:

On 29.02.24 09:32, Markus Armbruster wrote:


[...]


Anti-pattern: fail without setting an error.  There might be more
elsewhere in the series.

qapi/error.h's big comment:

   * - On success, the function should not touch *errp.  On failure, it
   *   should set a new error, e.g. with error_setg(errp, ...), or
   *   propagate an existing one, e.g. with error_propagate(errp, ...).
   *
   * - Whenever practical, also return a value that indicates success /
   *   failure.  This can make the error checking more concise, and can
   *   avoid useless error object creation and destruction.  Note that
   *   we still have many functions returning void.  We recommend
   *   • bool-valued functions return true on success / false on failure,
   *   • pointer-valued functions return non-null / null pointer, and
   *   • integer-valued functions return non-negative / negative.

     }
     qemu_put_be64(f, STATTR_FLAG_EOS);
     return 0;
     }

When adding Error **errp to a function, you must also add code to set an
error on failure to every failure path.  Adding it in a later patch in
the same series can be okay,


Personally, I'd prefer not doing so. Creating wrong commits and fixing them in 
same series - better to merge all fixes into bad commit:)


I agree - that might create issues with bisecting later. Please fix it in this 
patch here already!


Depends on the wrongness, really.


In my understanding, unset errp on failure path is wrong enough.

For example, simple pattern

Error *local_err = NULL;

int ret = foo(&local_err);
if (ret < 0) {
error_report_err(local_err);
return;
}

will just crash.

When I write code, I expect that "errp is set" === "ret < 0", for all functions 
returning negative integer on failure.

Also, we have enough code that relying on errp for failure checking:
$ git grep 'if (local_err)' | wc -l
373

Of course, most of this should be checking failure of functions that return 
void, but who knows.



We don't want broken intermediate states, no argument.

But intermediate states that are merely unclean can be acceptable.

For instance, my commit a30ecde6e79 (net: Permit incremental conversion
of init functions to Error) added such Error ** parameters to a somewhat
tangled nest of functions, along with FIXME comments where errors
weren't set.  The next few commits fixed most, but not all of them.
Later commits fixed some more.  The one in tap-win32.c is still there
today.

This was acceptable, because it improved things from "bad error
reporting" to "okay error reporting in most cases, unclean and bad error
reporting in a few cases marked FIXME", with "a few" over time going
down to the one I can't test, and nobody else seems to care about.



--
Best regards,
Vladimir

Re: [RFC 0/4] mirror: implement incremental and bitmap modes

2024-03-01 Thread Fiona Ebner

Am 01.03.24 um 15:14 schrieb Vladimir Sementsov-Ogievskiy:
> 
> As we already understood, (block-)job-api needs some spring-cleaning.
> Unfortunately I don't have much time on it, but still I decided to start
> from finally depreacting block-job-* API and moving to job-*.. Probably
> bitmap/bitmap-mode/sync APIs also need some optimization, keeping in
> mind new block-dirty-bitmap-merge api.
> 
> So, what I could advice in this situation for newc interfaces:
> 
> 1. be minimalistic
> 2. add `x-` prefix when unsure
> 
> So, following these two rules, what about x-bitmap field, which may be
> combined only with 'full' mode, and do what you need?
> 

AFAIU, it should rather be marked as @unstable in QAPI [0]? Then it
doesn't need to be renamed if it becomes stable later.

> About documentation: actually, I never liked that we use for backup job
> "MirrorSyncMode". Now it looks more like "BackupSyncMode", having two
> values supported only by backup.
> 
> I'm also unsure how mode=full&bitmap=some_bitmap differs from
> mode=bitmap&bitmap=some_bitmap..
> 

With the current patches, it was an error to specify @bitmap for other
modes than 'incremental' and 'bitmap'.

> So, I'd suggest simply rename MirrorSyncMode to BackupSyncMode, and add
> separate MirrorSyncMode with only "full", "top" and "none" values.
> 

Sounds good to me!

[0]:
https://gitlab.com/qemu-project/qemu/-/commit/a3c45b3e62962f99338716b1347cfb0d427cea44

Best Regards,
Fiona

Re: [PATCH v2 06/21] migration: Add Error** argument to .save_setup() handler


On 01.03.24 17:44, Vladimir Sementsov-Ogievskiy wrote:

On 29.02.24 16:21, Markus Armbruster wrote:

Thomas Huth  writes:


On 29/02/2024 08.20, Vladimir Sementsov-Ogievskiy wrote:

On 29.02.24 09:32, Markus Armbruster wrote:


[...]


Anti-pattern: fail without setting an error.  There might be more
elsewhere in the series.

qapi/error.h's big comment:

   * - On success, the function should not touch *errp.  On failure, it
   *   should set a new error, e.g. with error_setg(errp, ...), or
   *   propagate an existing one, e.g. with error_propagate(errp, ...).
   *
   * - Whenever practical, also return a value that indicates success /
   *   failure.  This can make the error checking more concise, and can
   *   avoid useless error object creation and destruction.  Note that
   *   we still have many functions returning void.  We recommend
   *   • bool-valued functions return true on success / false on failure,
   *   • pointer-valued functions return non-null / null pointer, and
   *   • integer-valued functions return non-negative / negative.

 }
 qemu_put_be64(f, STATTR_FLAG_EOS);
 return 0;
 }

When adding Error **errp to a function, you must also add code to set an
error on failure to every failure path.  Adding it in a later patch in
the same series can be okay,


Personally, I'd prefer not doing so. Creating wrong commits and fixing them in 
same series - better to merge all fixes into bad commit:)


I agree - that might create issues with bisecting later. Please fix it in this 
patch here already!


Depends on the wrongness, really.


In my understanding, unset errp on failure path is wrong enough.

For example, simple pattern

Error *local_err = NULL;

int ret = foo(&local_err);
if (ret < 0) {
     error_report_err(local_err);
     return;
}

will just crash.

When I write code, I expect that "errp is set" === "ret < 0", for all functions 
returning negative integer on failure.

Also, we have enough code that relying on errp for failure checking:
$ git grep 'if (local_err)' | wc -l
373

Of course, most of this should be checking failure of functions that return 
void, but who knows.



We don't want broken intermediate states, no argument.

But intermediate states that are merely unclean can be acceptable.

For instance, my commit a30ecde6e79 (net: Permit incremental conversion
of init functions to Error) added such Error ** parameters to a somewhat
tangled nest of functions, along with FIXME comments where errors
weren't set.  The next few commits fixed most, but not all of them.
Later commits fixed some more.  The one in tap-win32.c is still there
today.

This was acceptable, because it improved things from "bad error
reporting" to "okay error reporting in most cases, unclean and bad error
reporting in a few cases marked FIXME", with "a few" over time going
down to the one I can't test, and nobody else seems to care about.



You may be sure, that functions you modify are never used in conditions I've described above. But 
you can't guarantee that this will not change in future. Of course, if someone will create new call 
of the function, he should look (at least once) at that function  and see "FIXME", but 
better not rely on this. Also, if someone will add a call to another function that calls function 
with bad error reporting, most probably he will not see the "FIXME"...

Formally, you should add such FIXME to all functions with errp, that may do 
(nested) calls to a function with FIXME

--
Best regards,
Vladimir

[PATCH] Fix unexpected Illegal instruction error on RISC-V.

2024-03-01 Thread SiHuaN

Avoid right-shifting by a negative number of bits when lmul is 8.

Signed-off-by: SiHuaN 
---
 target/riscv/vector_helper.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 84cec73eb2..f0158ea237 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -53,10 +53,11 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, 
target_ulong s1,
  * VLEN * LMUL >= SEW
  * VLEN >> (8 - lmul) >= sew
  * (vlenb << 3) >> (8 - lmul) >= sew
+ * Considering that lmul may be 8, the following form cannot be used.
  * vlenb >> (8 - 3 - lmul) >= sew
  */
 if (vlmul == 4 ||
-cpu->cfg.vlenb >> (8 - 3 - vlmul) < sew) {
+(cpu->cfg.vlenb << 3) >> (8 - vlmul) < sew) {
 vill = true;
 }
 }
-- 
2.44.0

Re: [RFC 0/4] mirror: implement incremental and bitmap modes

On 01.03.24 17:52, Fiona Ebner wrote:

Am 01.03.24 um 15:14 schrieb Vladimir Sementsov-Ogievskiy:

As we already understood, (block-)job-api needs some spring-cleaning.
Unfortunately I don't have much time on it, but still I decided to start
from finally depreacting block-job-* API and moving to job-*.. Probably
bitmap/bitmap-mode/sync APIs also need some optimization, keeping in
mind new block-dirty-bitmap-merge api.

So, what I could advice in this situation for newc interfaces:

1. be minimalistic
2. add `x-` prefix when unsure

So, following these two rules, what about x-bitmap field, which may be
combined only with 'full' mode, and do what you need?

AFAIU, it should rather be marked as @unstable in QAPI [0]? Then it
doesn't need to be renamed if it becomes stable later.

Right, unstable feature is needed, using "x-" is optional.

Recent discussion about it was in my "vhost-user-blk: live resize additional
APIs" series:

https://patchew.org/QEMU/20231006202045.1161543-1-vsement...@yandex-team.ru/20231006202045.1161543-5-vsement...@yandex-team.ru/

Following it, I think it's OK to not care anymore with "x-" prefixes, and rely
on unstable feature.

About documentation: actually, I never liked that we use for backup job
"MirrorSyncMode". Now it looks more like "BackupSyncMode", having two
values supported only by backup.

I'm also unsure how mode=full&bitmap=some_bitmap differs from
mode=bitmap&bitmap=some_bitmap..

With the current patches, it was an error to specify @bitmap for other
modes than 'incremental' and 'bitmap'.

Current documentation says:
# @bitmap: The name of a dirty bitmap to use. Must be present if sync
# is "bitmap" or "incremental". Can be present if sync is "full"
# or "top". Must not be present otherwise.
# (Since 2.4 (drive-backup), 3.1 (blockdev-backup))

So, I'd suggest simply rename MirrorSyncMode to BackupSyncMode, and add
separate MirrorSyncMode with only "full", "top" and "none" values.

Sounds good to me!

[0]:
https://gitlab.com/qemu-project/qemu/-/commit/a3c45b3e62962f99338716b1347cfb0d427cea44

Best Regards,
Fiona

--
Best regards,
Vladimir

Re: [RFC 0/4] mirror: implement incremental and bitmap modes

2024-03-01 Thread Fiona Ebner

Am 01.03.24 um 16:02 schrieb Vladimir Sementsov-Ogievskiy:
> On 01.03.24 17:52, Fiona Ebner wrote:
>> Am 01.03.24 um 15:14 schrieb Vladimir Sementsov-Ogievskiy:
>>>
>>> As we already understood, (block-)job-api needs some spring-cleaning.
>>> Unfortunately I don't have much time on it, but still I decided to start
>>> from finally depreacting block-job-* API and moving to job-*.. Probably
>>> bitmap/bitmap-mode/sync APIs also need some optimization, keeping in
>>> mind new block-dirty-bitmap-merge api.
>>>
>>> So, what I could advice in this situation for newc interfaces:
>>>
>>> 1. be minimalistic
>>> 2. add `x-` prefix when unsure
>>>
>>> So, following these two rules, what about x-bitmap field, which may be
>>> combined only with 'full' mode, and do what you need?
>>>
>>
>> AFAIU, it should rather be marked as @unstable in QAPI [0]? Then it
>> doesn't need to be renamed if it becomes stable later.
> 
> Right, unstable feature is needed, using "x-" is optional.
> 
> Recent discussion about it was in my "vhost-user-blk: live resize
> additional APIs" series:
> 
> https://patchew.org/QEMU/20231006202045.1161543-1-vsement...@yandex-team.ru/20231006202045.1161543-5-vsement...@yandex-team.ru/
> 
> Following it, I think it's OK to not care anymore with "x-" prefixes,
> and rely on unstable feature.
> 

Thanks for the confirmation! I'll go without the prefix in the name then.

>>
>>> About documentation: actually, I never liked that we use for backup job
>>> "MirrorSyncMode". Now it looks more like "BackupSyncMode", having two
>>> values supported only by backup.
>>>
>>> I'm also unsure how mode=full&bitmap=some_bitmap differs from
>>> mode=bitmap&bitmap=some_bitmap..
>>>
>>
>> With the current patches, it was an error to specify @bitmap for other
>> modes than 'incremental' and 'bitmap'.
> 
> Current documentation says:
>   # @bitmap: The name of a dirty bitmap to use.  Must be present if sync
>   # is "bitmap" or "incremental". Can be present if sync is "full"
>   # or "top".  Must not be present otherwise.
>   # (Since 2.4 (drive-backup), 3.1 (blockdev-backup))
> 
> 

This is for backup. The documentation (and behavior) for @bitmap added
by these patches for mirror is different ;)

Best Regards,
Fiona

Re: QNX VM hang on Qemu

2024-03-01 Thread Clément Chigot

Hi Faiq,

On Fri, Feb 23, 2024 at 3:55 PM Faiq Ali Sayed  wrote:
>
> So as far as my understanding, we provide these binaries using Qemu command 
> as depicted in the example you provided and there is no way I found to put 
> them into a single image.
> Regarding the overlapping space, I don't have much idea but I think we could 
> provide a starting address separately to these images something like 
> addr=0x0010.

Where is this 0x0010 address coming from ? Could you confirm with
"readelf -h" that this is the entry point of your image ?

Alternatively and that's what we used locally, qemu is able to guess
the entry point of an image, when passed from -kernel. Therefore, our
command simply looks like:
  |  $ qemu-system-aarch64 -M xlnx-zcu102 -m 4G -no-reboot -nographic
-kernel qnx.img

I'm not the one having built the qnx.img we're using. But it looks
pretty standard at the first look, made with "mkifs" and the kernel
specs from zcu102 evaluation kit.

Hope it helps,
Clément

> So as per your suggestion, I compared my images and I found that the image 
> does not show a virtual disk, and other commands like mkdir, do not have 
> these binaries.
> So these binaries are not included at the time of image creation and I don't 
> exactly know that how can we add these binaries into the QNX image.
>
> The Image that is currently installed in real hardware does not have a 
> debugging symbol, so I can't use GDB  to debug that.
> Now I am looking for a way to create the correct QNX OS image for Qemu.
>
> Any lead in this regard will be really helpful :)
>

Re: [PULL 26/29] contrib/plugins: extend execlog to track register changes

2024-03-01 Thread Zhao Liu

On Fri, Mar 01, 2024 at 10:22:08AM +, Alex Bennée wrote:
> Date: Fri, 01 Mar 2024 10:22:08 +
> From: Alex Bennée 
> Subject: Re: [PULL 26/29] contrib/plugins: extend execlog to track register
>  changes
> 
> Zhao Liu  writes:
> 
> > Hi Alex,
> >
> > I hit the following warnings (with "./configure --enable-werror"):
> >
> > /qemu/contrib/plugins/execlog.c: In function ‘registers_init’:
> > /qemu/contrib/plugins/execlog.c:330:17: warning: ‘g_pattern_match_string’ 
> > is deprecated: Use 'g_pattern_spec_match_string' instead 
> > [-Wdeprecated-declarations]
> >   330 | if (g_pattern_match_string(pat, rd->name) ||
> >   | ^~
> > In file included from /usr/include/glib-2.0/glib.h:65,
> >  from /qemu/contrib/plugins/execlog.c:9:
> > /usr/include/glib-2.0/glib/gpattern.h:55:15: note: declared here
> >55 | gboolean  g_pattern_match_string   (GPatternSpec *pspec,
> >   |   ^~
> > /qemu/contrib/plugins/execlog.c:331:21: warning: ‘g_pattern_match_string’ 
> > is deprecated: Use 'g_pattern_spec_match_string' instead 
> > [-Wdeprecated-declarations]
> >   331 | g_pattern_match_string(pat, rd_lower)) {
> >   | ^~
> > In file included from /usr/include/glib-2.0/glib.h:65,
> >  from /qemu/contrib/plugins/execlog.c:9:
> > /usr/include/glib-2.0/glib/gpattern.h:55:15: note: declared here
> >55 | gboolean  g_pattern_match_string   (GPatternSpec *pspec,
> >   |   ^~
> > /qemu/contrib/plugins/execlog.c:339:63: warning: passing argument 2 of 
> > ‘g_ptr_array_add’ discards ‘const’ qualifier from pointer target type 
> > [-Wdiscarded-qualifiers]
> >   339 | g_ptr_array_add(all_reg_names,
> > reg->name);
> 
> Hmm I missed that. Not sure what the neatest solution is in this case -
> g_ptr_array_new() doesn't have a destroy func so we shouldn't ever
> attempt to free it. We can manually re-add the const qualifier at the
> other end for completeness and I guess comment and cast?

I find other palces use 2 ways:
  * Use g_strdup() to create a copy (e.g., net/net.c,
add_nic_model_help()). But I'm not sure if this is OK since you said
we shouldn't attempt to free it. May I ask if the free issue you
mentioned will affect the use of g_strdup() here?
  * Another way is the forced conversion to gpointer (also e.g., in
net/net.c, qemu_get_nic_models()).

Which way do you like? ;-)

> 
> 
> >   |~~~^~
> > In file included from /usr/include/glib-2.0/glib.h:31,
> >  from /qemu/contrib/plugins/execlog.c:9:
> > /usr/include/glib-2.0/glib/garray.h:192:62: note: expected ‘gpointer’ {aka 
> > ‘void *’} but argument is of type ‘const char *’
> >   192 |gpointer  data);
> >   |~~^~~~
> >
> > In addition, I checked my glic version:
> >
> > $ldd --version
> > ldd (Ubuntu GLIBC 2.35-0ubuntu3.5) 2.35
> >
> > I think it's v2.35. Are these three warning reports valid?
> 
> It's the glib (not glibc) version that matters here.
> g_pattern_match_string was deprecated in 2.70 when the suggested
> alternative was added. However our baseline for glib is still:
> 
>   # When bumping glib minimum version, please check also whether to increase
>   # the _WIN32_WINNT setting in osdep.h according to the value from glib
>   glib_req_ver = '>=2.56.0'
>   glib_pc = dependency('glib-2.0', version: glib_req_ver, required: true,
>   method: 'pkg-config')
> 
> The usual solution for this is to throw in a compat wrapper in
> glib-compat.h:
> 
> --8<---cut here---start->8---
> modified   include/glib-compat.h
> @@ -105,6 +105,24 @@ static inline gpointer g_memdup2_qemu(gconstpointer mem, 
> gsize byte_size)
>  }
>  #define g_memdup2(m, s) g_memdup2_qemu(m, s)
>  
> +/*
> + * g_pattern_match_string has been deprecated in Glib since 2.70 and
> + * will complain about it if you try to use it. Fortunately the
> + * signature of both functions is the same making it easy to work
> + * around.
> + */
> +static inline
> +gboolean g_pattern_spec_match_string_qemu(GPatternSpec *pspec,
> +  const gchar *string)
> +{
> +#if GLIB_CHECK_VERSION(2, 70, 0)
> +return g_pattern_spec_match_string(pspec, string);
> +#else
> +return g_pattern_match_string(pspec, string);
> +#endif
> +};
> +#define g_pattern_spec_match_string(p, s) 
> g_pattern_spec_match_string_qemu(p, s)
> +
>  #if defined(G_OS_UNIX)
>  /*
>   * Note: The fallback implementation is not MT-safe, and it returns a copy of
> modified   contrib/plugins/execlog.c
> @@ -334,8 +334,8 @@ static void registers_init(int vcpu_index)
>  for (int p = 0; p < rmatches-

[PATCH v9 1/1] hw: add some convenient trace-events for pcie and shpc hotplug

Add trace-events that may help to debug problems with hotplugging.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 hw/pci/pcie.c   | 56 +
 hw/pci/shpc.c   | 46 +
 hw/pci/trace-events |  6 +
 3 files changed, 108 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 6db0cf69cd..f56079acf5 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -28,6 +28,7 @@
 #include "hw/pci/pcie_regs.h"
 #include "hw/pci/pcie_port.h"
 #include "qemu/range.h"
+#include "trace.h"
 
 //#define DEBUG_PCIE
 #ifdef DEBUG_PCIE
@@ -45,6 +46,23 @@ static bool pcie_sltctl_powered_off(uint16_t sltctl)
 && (sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_OFF;
 }
 
+static const char *pcie_led_state_to_str(uint16_t value)
+{
+switch (value) {
+case PCI_EXP_SLTCTL_PWR_IND_ON:
+case PCI_EXP_SLTCTL_ATTN_IND_ON:
+return "on";
+case PCI_EXP_SLTCTL_PWR_IND_BLINK:
+case PCI_EXP_SLTCTL_ATTN_IND_BLINK:
+return "blink";
+case PCI_EXP_SLTCTL_PWR_IND_OFF:
+case PCI_EXP_SLTCTL_ATTN_IND_OFF:
+return "off";
+default:
+return "invalid";
+}
+}
+
 /***
  * pci express capability helper functions
  */
@@ -735,6 +753,28 @@ void pcie_cap_slot_get(PCIDevice *dev, uint16_t *slt_ctl, 
uint16_t *slt_sta)
 *slt_sta = pci_get_word(exp_cap + PCI_EXP_SLTSTA);
 }
 
+static void find_child_fn(PCIBus *bus, PCIDevice *dev, void *opaque)
+{
+PCIDevice **child = opaque;
+
+if (!*child) {
+*child = dev;
+}
+}
+
+/*
+ * Returns the plugged device or first function of multifunction plugged device
+ */
+static PCIDevice *pcie_cap_slot_find_child(PCIDevice *dev)
+{
+PCIBus *sec_bus = pci_bridge_get_sec_bus(PCI_BRIDGE(dev));
+PCIDevice *child = NULL;
+
+pci_for_each_device(sec_bus, pci_bus_num(sec_bus), find_child_fn, &child);
+
+return child;
+}
+
 void pcie_cap_slot_write_config(PCIDevice *dev,
 uint16_t old_slt_ctl, uint16_t old_slt_sta,
 uint32_t addr, uint32_t val, int len)
@@ -779,6 +819,22 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
 sltsta);
 }
 
+if (trace_event_get_state_backends(TRACE_PCIE_CAP_SLOT_WRITE_CONFIG)) {
+DeviceState *parent = DEVICE(dev);
+DeviceState *child = DEVICE(pcie_cap_slot_find_child(dev));
+
+trace_pcie_cap_slot_write_config(
+parent->canonical_path,
+child ? child->canonical_path : "no-child",
+(sltsta & PCI_EXP_SLTSTA_PDS) ? "present" : "not present",
+pcie_led_state_to_str(old_slt_ctl & PCI_EXP_SLTCTL_PIC),
+pcie_led_state_to_str(val & PCI_EXP_SLTCTL_PIC),
+pcie_led_state_to_str(old_slt_ctl & PCI_EXP_SLTCTL_AIC),
+pcie_led_state_to_str(val & PCI_EXP_SLTCTL_AIC),
+(old_slt_ctl & PCI_EXP_SLTCTL_PWR_OFF) ? "off" : "on",
+(val & PCI_EXP_SLTCTL_PWR_OFF) ? "off" : "on");
+}
+
 /*
  * If the slot is populated, power indicator is off and power
  * controller is off, it is safe to detach the devices.
diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c
index d2a5eea69e..aac6f2d034 100644
--- a/hw/pci/shpc.c
+++ b/hw/pci/shpc.c
@@ -8,6 +8,7 @@
 #include "hw/pci/pci.h"
 #include "hw/pci/pci_bus.h"
 #include "hw/pci/msi.h"
+#include "trace.h"
 
 /* TODO: model power only and disabled slot states. */
 /* TODO: handle SERR and wakeups */
@@ -123,6 +124,34 @@
 #define SHPC_PCI_TO_IDX(pci_slot) ((pci_slot) - 1)
 #define SHPC_IDX_TO_PHYSICAL(slot) ((slot) + 1)
 
+static const char *shpc_led_state_to_str(uint8_t value)
+{
+switch (value) {
+case SHPC_LED_ON:
+return "on";
+case SHPC_LED_BLINK:
+return "blink";
+case SHPC_LED_OFF:
+return "off";
+default:
+return "invalid";
+}
+}
+
+static const char *shpc_slot_state_to_str(uint8_t value)
+{
+switch (value) {
+case SHPC_STATE_PWRONLY:
+return "power-only";
+case SHPC_STATE_ENABLED:
+return "enabled";
+case SHPC_STATE_DISABLED:
+return "disabled";
+default:
+return "invalid";
+}
+}
+
 static uint8_t shpc_get_status(SHPCDevice *shpc, int slot, uint16_t msk)
 {
 uint8_t *status = shpc->config + SHPC_SLOT_STATUS(slot);
@@ -302,6 +331,23 @@ static void shpc_slot_command(PCIDevice *d, uint8_t target,
 shpc_set_status(shpc, slot, state, SHPC_SLOT_STATE_MASK);
 }
 
+if (trace_event_get_state_backends(TRACE_SHPC_SLOT_COMMAND)) {
+DeviceState *parent = DEVICE(d);
+int pci_slot = SHPC_IDX_TO_PCI(slot);
+DeviceState *child =
+DEVICE(shpc->sec_bus->devices[PCI_DEVFN(pci_slot, 0)]);
+
+trace_shpc_slot_command(
+parent->canonical_path, pci_slot,
+child ? child->canonical_path : "no-child",
+

[PATCH v9 0/1] pci hotplug tracking

v9: I was convinced, that adding new qapi interfaces here is
questionable. So, to conclude, let's still add two convenient
trace-points, which may help to debug hotplug problems.

Vladimir Sementsov-Ogievskiy (1):
  hw: add some convenient trace-events for pcie and shpc hotplug

 hw/pci/pcie.c   | 56 +
 hw/pci/shpc.c   | 46 +
 hw/pci/trace-events |  6 +
 3 files changed, 108 insertions(+)

-- 
2.34.1

Re: [RFC 0/4] mirror: implement incremental and bitmap modes

On 01.03.24 18:14, Fiona Ebner wrote:

Am 01.03.24 um 16:02 schrieb Vladimir Sementsov-Ogievskiy:

On 01.03.24 17:52, Fiona Ebner wrote:

Am 01.03.24 um 15:14 schrieb Vladimir Sementsov-Ogievskiy:

As we already understood, (block-)job-api needs some spring-cleaning.
Unfortunately I don't have much time on it, but still I decided to start
from finally depreacting block-job-* API and moving to job-*.. Probably
bitmap/bitmap-mode/sync APIs also need some optimization, keeping in
mind new block-dirty-bitmap-merge api.

So, what I could advice in this situation for newc interfaces:

1. be minimalistic
2. add `x-` prefix when unsure

So, following these two rules, what about x-bitmap field, which may be
combined only with 'full' mode, and do what you need?

AFAIU, it should rather be marked as @unstable in QAPI [0]? Then it
doesn't need to be renamed if it becomes stable later.

Right, unstable feature is needed, using "x-" is optional.

Recent discussion about it was in my "vhost-user-blk: live resize
additional APIs" series:

https://patchew.org/QEMU/20231006202045.1161543-1-vsement...@yandex-team.ru/20231006202045.1161543-5-vsement...@yandex-team.ru/

Following it, I think it's OK to not care anymore with "x-" prefixes,
and rely on unstable feature.

Thanks for the confirmation! I'll go without the prefix in the name then.

About documentation: actually, I never liked that we use for backup job
"MirrorSyncMode". Now it looks more like "BackupSyncMode", having two
values supported only by backup.

I'm also unsure how mode=full&bitmap=some_bitmap differs from
mode=bitmap&bitmap=some_bitmap..

With the current patches, it was an error to specify @bitmap for other
modes than 'incremental' and 'bitmap'.

This is for backup. The documentation (and behavior) for @bitmap added
by these patches for mirror is different ;)

I meant backup in "I'm also unsure", just as one more point not consider
backup-bitmap-API as a prototype for mirror-bitmap-API.

Best Regards,
Fiona

--
Best regards,
Vladimir

Re: [Spam][PATCH] replay: Improve error messages about configuration conflicts

2024-03-01 Thread Pavel Dovgalyuk


Reviewed-by: Pavel Dovgalyuk 

On 01.03.2024 15:06, Markus Armbruster wrote:

Improve

Record/replay feature is not supported for '-rtc base=localtime'
Record/replay feature is not supported for 'smp'
Record/replay feature is not supported for '-snapshot'

to

Record/replay is not supported with -rtc base=localtime
Record/replay is not supported with multiple CPUs
Record/replay is not supported with -snapshot

Signed-off-by: Markus Armbruster 
---
  replay/replay.c | 2 +-
  system/vl.c | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/replay/replay.c b/replay/replay.c
index 3fd241a4fc..a2c576c16e 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -511,7 +511,7 @@ void replay_add_blocker(const char *feature)
  {
  Error *reason = NULL;
  
-error_setg(&reason, "Record/replay feature is not supported for '%s'",

+error_setg(&reason, "Record/replay is not supported with %s",
 feature);
  replay_blockers = g_slist_prepend(replay_blockers, reason);
  }
diff --git a/system/vl.c b/system/vl.c
index e480afd7a0..cc03a17c09 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -1932,7 +1932,7 @@ static void qemu_apply_machine_options(QDict *qdict)
  }
  
  if (current_machine->smp.cpus > 1) {

-replay_add_blocker("smp");
+replay_add_blocker("multiple CPUs");
  }
  }

Re: [PULL 2/2] aspeed: fix hardcode boot address 0


Hi Jamin,

On 27/2/24 13:52, Cédric Le Goater wrote:

From: Jamin Lin 

In the previous design of ASPEED SOCs QEMU model, it set the boot
address at "0" which was the hardcode setting for ast10x0, ast2600,
ast2500 and ast2400.

According to the design of ast2700, it has a bootmcu(riscv-32) which
is used for executing SPL and initialize DRAM and copy u-boot image
from SPI/Flash to DRAM at address 0x4 at SPL boot stage.
Then, CPUs(cortex-a35) execute u-boot, kernel and rofs.

Currently, qemu not support emulate two CPU architectures
at the same machine. Therefore, qemu will only support
to emulate CPU(cortex-a35) side for ast2700 and the boot
address is "0x4 ".

Fixed hardcode boot address "0" for future models using
a different mapping address.

Signed-off-by: Troy Lee 
Signed-off-by: Jamin Lin 
Reviewed-by: Cédric Le Goater 


Tip for the email workflow: when someone provide a R-b tag for
a patch, please carry it on in your next iterations.

https://lore.kernel.org/qemu-devel/09f9ca34-4e0c-4ada-b808-643a8c578...@linaro.org/

See 
https://www.qemu.org/docs/master/devel/submitting-a-patch.html#proper-use-of-reviewed-by-tags-can-aid-review



Signed-off-by: Cédric Le Goater 
---
  include/hw/arm/aspeed_soc.h | 2 --
  hw/arm/aspeed.c | 4 +++-
  hw/arm/aspeed_ast2400.c | 4 ++--
  hw/arm/aspeed_ast2600.c | 2 +-
  4 files changed, 6 insertions(+), 6 deletions(-)

Re: [PATCH] Fix unexpected Illegal instruction error on RISC-V.


Hi SiHuaN,

On 1/3/24 15:55, SiHuaN wrote:

Avoid right-shifting by a negative number of bits when lmul is 8.


FYI Demin posted a similar patch, see:
https://lore.kernel.org/qemu-devel/20240225174114.5298-1-demin@starfivetech.com/


Signed-off-by: SiHuaN 
---
  target/riscv/vector_helper.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 84cec73eb2..f0158ea237 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -53,10 +53,11 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, 
target_ulong s1,
   * VLEN * LMUL >= SEW
   * VLEN >> (8 - lmul) >= sew
   * (vlenb << 3) >> (8 - lmul) >= sew
+ * Considering that lmul may be 8, the following form cannot be used.
   * vlenb >> (8 - 3 - lmul) >= sew
   */
  if (vlmul == 4 ||
-cpu->cfg.vlenb >> (8 - 3 - vlmul) < sew) {
+(cpu->cfg.vlenb << 3) >> (8 - vlmul) < sew) {
  vill = true;
  }
  }

Re: [PATCH v9 1/1] hw: add some convenient trace-events for pcie and shpc hotplug


On 1/3/24 16:41, Vladimir Sementsov-Ogievskiy wrote:

Add trace-events that may help to debug problems with hotplugging.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  hw/pci/pcie.c   | 56 +
  hw/pci/shpc.c   | 46 +
  hw/pci/trace-events |  6 +
  3 files changed, 108 insertions(+)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH] replay: Improve error messages about configuration conflicts


On 1/3/24 13:06, Markus Armbruster wrote:

Improve

Record/replay feature is not supported for '-rtc base=localtime'
Record/replay feature is not supported for 'smp'
Record/replay feature is not supported for '-snapshot'

to

Record/replay is not supported with -rtc base=localtime
Record/replay is not supported with multiple CPUs
Record/replay is not supported with -snapshot

Signed-off-by: Markus Armbruster 
---
  replay/replay.c | 2 +-
  system/vl.c | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 1/6] libqos/virtio.c: init all elems in qvring_indirect_desc_setup()

2024-03-01 Thread Michael Tokarev


17.02.2024 22:26, Daniel Henrique Barboza:


diff --git a/tests/qtest/libqos/virtio.c b/tests/qtest/libqos/virtio.c
index 410513225f..4f39124eba 100644
--- a/tests/qtest/libqos/virtio.c
+++ b/tests/qtest/libqos/virtio.c
@@ -280,14 +280,27 @@ QVRingIndirectDesc *qvring_indirect_desc_setup(QTestState 
*qs, QVirtioDevice *d,
  indirect->elem = elem;
  indirect->desc = guest_alloc(alloc, sizeof(struct vring_desc) * elem);
  
-for (i = 0; i < elem - 1; ++i) {

+for (i = 0; i < elem; ++i) {
  /* indirect->desc[i].addr */
  qvirtio_writeq(d, qs, indirect->desc + (16 * i), 0);
-/* indirect->desc[i].flags */
-qvirtio_writew(d, qs, indirect->desc + (16 * i) + 12,
-   VRING_DESC_F_NEXT);
-/* indirect->desc[i].next */
-qvirtio_writew(d, qs, indirect->desc + (16 * i) + 14, i + 1);
+
+/*
+ * If it's not the last element of the ring, set
+ * the chain (VRING_DESC_F_NEXT) flag and
+ * desc->next. Clear the last element - there's
+ * no guarantee that guest_alloc() will do it.
+ */
+if (i != elem - 1) {
+/* indirect->desc[i].flags */
+qvirtio_writew(d, qs, indirect->desc + (16 * i) + 12,
+   VRING_DESC_F_NEXT);
+
+/* indirect->desc[i].next */
+qvirtio_writew(d, qs, indirect->desc + (16 * i) + 14, i + 1);
+} else {
+qvirtio_writew(d, qs, indirect->desc + (16 * i) + 12, 0);
+qvirtio_writew(d, qs, indirect->desc + (16 * i) + 14, 0);
+}
  }


In my view it would be cleaner to add 2 extra function calls after this
loop for the i==elem-1 case:

for (i = 0; i < elem - 1; ++i) {
/* indirect->desc[i].addr */
qvirtio_writeq(d, qs, indirect->desc + (16 * i), 0);
/* indirect->desc[i].flags */
qvirtio_writew(d, qs, indirect->desc + (16 * i) + 12,
   VRING_DESC_F_NEXT);
/* indirect->desc[i].next */
qvirtio_writew(d, qs, indirect->desc + (16 * i) + 14, i + 1);

}

 /* clear last element */
 qvirtio_writeq(d, qs, indirect->desc + (16 * i), 0);
 qvirtio_writew(d, qs, indirect->desc + (16 * i) + 12, 0);
 qvirtio_writew(d, qs, indirect->desc + (16 * i) + 14, 0);


FWIW, -- it's too late to change it now I think.

/mjt

Re: [PATCH v7 1/2] qom: new object to associate device to numa node

2024-03-01 Thread Alex Williamson

On Fri, 1 Mar 2024 08:33:42 +
Ankit Agrawal  wrote:

> >> As for your suggestion of using acpi-dev as the arg to take both
> >> pci-dev and acpi-dev.. Would that mean sending a pure pci device
> >> (not the corner case you mentioned) through the acpi-dev argument
> >> as well? Not sure if that would appropriate.  
> >
> > Ah, looking up my description is unclear. I meant two optional parameters
> > pci-dev or acpi-dev - which one was supplied would indicate the type
> > of handle to be used.  
> 
> Yes, that makes sense. But for now only have pci-dev until we have any
> acpi-dev related code added? IIRC, this is what we discussed earlier.

Yep, I think we acknowledged that this device supports either PCI or
ACPI devices and we only currently have a use case for PCI devices, so
that's what's implemented and what the interface expects.  A separate
ACPI device interface could be added later or derived from updating the
interface to accept a generic device object.  Thanks,

Alex

Re: Re: [PATCH] Fix unexpected Illegal instruction error on RISC-V.

2024-03-01 Thread 李永泰

Hi Philippe,

Thanks for the heads up. Sorry I didn't check for this before sending out my 
patch.
I'll track this in Demin's thread.


> -原始邮件-
> 发件人: "Philippe Mathieu-Daudé" 
> 发送时间: 2024-03-01 23:51:03 (星期五)
> 收件人: SiHuaN , qemu-devel@nongnu.org
> 抄送: "demin.han" , qemu-riscv 
, "Daniel Henrique Barboza" 
> 主题: Re: [PATCH] Fix unexpected Illegal instruction error on RISC-V.
> 
> Hi SiHuaN,
> 
> On 1/3/24 15:55, SiHuaN wrote:
> > Avoid right-shifting by a negative number of bits when lmul is 8.
> 
> FYI Demin posted a similar patch, see:
> 
https://lore.kernel.org/qemu-devel/20240225174114.5298-1-demin@starfivetech.com/
> 
> > Signed-off-by: SiHuaN 
> > ---
> >   target/riscv/vector_helper.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/target/riscv/vector_helper.c 
b/target/riscv/vector_helper.c
> > index 84cec73eb2..f0158ea237 100644
> > --- a/target/riscv/vector_helper.c
> > +++ b/target/riscv/vector_helper.c
> > @@ -53,10 +53,11 @@ target_ulong HELPER(vsetvl)(CPURISCVState *env, 
target_ulong s1,
> >* VLEN * LMUL >= SEW
> >* VLEN >> (8 - lmul) >= sew
> >* (vlenb << 3) >> (8 - lmul) >= sew
> > + * Considering that lmul may be 8, the following form cannot 
be used.
> >* vlenb >> (8 - 3 - lmul) >= sew
> >*/
> >   if (vlmul == 4 ||
> > -cpu->cfg.vlenb >> (8 - 3 - vlmul) < sew) {
> > +(cpu->cfg.vlenb << 3) >> (8 - vlmul) < 
sew) {
> >   vill = true;
> >   }
> >   }

Re: [PATCH v1 0/5] hw/ppc: SPI model

2024-03-01 Thread Chalapathi V


Hello,

I would greatly appreciate the review comments/suggestions on PATCH V1.

Thank You and Regards,

Chalapathi

On 07-02-2024 21:38, Chalapathi V wrote:

Hello,

In this series of patchset, SPI controller and responder models
for Power10 processor are modelled.

Serial peripheral interface provides full-duplex synchronous serial
communication between single controller and multiple responder devices.

The current configuration supports a single SPI controller reside on the
SPI bus. In p10, SPI controller device model supports a connection to a
single SPI responder such as SPI seeproms, TPM, flash device and an ADC
controller.

SPI controller model is divided into configuration unit, sequencer FSM
and shifter engine. All SPI function control is mapped into the SPI register
space to enable full control by firmware.

SPI configuration component is modelled which contains all SPI configuration
and status registers as well as the hold registers for data to be sent or
having been received.
Shift engine performs serialization and de-serialization according to the
control by the sequencer and according to the setup defined in the
configuration registers.
Sequencer implements the main control logic and
FSM to handle data transmit and data receive control of the shift engine.

Microchip's 25CSM04 SEEPROM device is modelled and connected to SPI bus
"spi_bus2" of SPI controller "PIB_SPIC[2]".

Patches overview in V1.
PATCH1: Create a SPI responder model which includes responder methods
 and SPI bus implementation.
PATCH2: Create a SPI controller model and implement configuration unit
 to model SCOM registers.
PATCH3: SPI controller model: implement sequencer FSM and shift engine.
PATCH4: create SPI SEEPROM model.
PATCH5: Connect SPI controllers to p10 chip and create keystore seeprom
 device on spi_bus2 of PIB_SPIC[2].

Thank You,
Chalapathi

Chalapathi V (5):
   hw/ppc: SPI responder model
   hw/ppc: SPI controller model - registers implementation
   hw/ppc: SPI controller model - sequencer and shifter
   hw/ppc: SPI SEEPROM model
   hw/ppc: SPI controller wiring to P10 chip and create seeprom device

  include/hw/ppc/pnv_chip.h   |4 +
  include/hw/ppc/pnv_spi_controller.h |  101 ++
  include/hw/ppc/pnv_spi_responder.h  |  109 ++
  include/hw/ppc/pnv_spi_seeprom.h|   70 ++
  include/hw/ppc/pnv_xscom.h  |3 +
  hw/ppc/pnv.c|   32 +
  hw/ppc/pnv_spi_controller.c | 1609 +++
  hw/ppc/pnv_spi_responder.c  |  166 +++
  hw/ppc/pnv_spi_seeprom.c|  989 
  hw/ppc/meson.build  |3 +
  10 files changed, 3086 insertions(+)
  create mode 100644 include/hw/ppc/pnv_spi_controller.h
  create mode 100644 include/hw/ppc/pnv_spi_responder.h
  create mode 100644 include/hw/ppc/pnv_spi_seeprom.h
  create mode 100644 hw/ppc/pnv_spi_controller.c
  create mode 100644 hw/ppc/pnv_spi_responder.c
  create mode 100644 hw/ppc/pnv_spi_seeprom.c

Re: [PULL 26/29] contrib/plugins: extend execlog to track register changes

Zhao Liu  writes:

> On Fri, Mar 01, 2024 at 10:22:08AM +, Alex Bennée wrote:
>> Date: Fri, 01 Mar 2024 10:22:08 +
>> From: Alex Bennée 
>> Subject: Re: [PULL 26/29] contrib/plugins: extend execlog to track register
>>  changes
>> 
>> Zhao Liu  writes:
>> 
>> > Hi Alex,
>> >
>> > I hit the following warnings (with "./configure --enable-werror"):
>> >
>> > /qemu/contrib/plugins/execlog.c: In function ‘registers_init’:
>> > /qemu/contrib/plugins/execlog.c:330:17: warning: ‘g_pattern_match_string’ 
>> > is deprecated: Use 'g_pattern_spec_match_string' instead 
>> > [-Wdeprecated-declarations]
>> >   330 | if (g_pattern_match_string(pat, rd->name) ||
>> >   | ^~
>> > In file included from /usr/include/glib-2.0/glib.h:65,
>> >  from /qemu/contrib/plugins/execlog.c:9:
>> > /usr/include/glib-2.0/glib/gpattern.h:55:15: note: declared here
>> >55 | gboolean  g_pattern_match_string   (GPatternSpec *pspec,
>> >   |   ^~
>> > /qemu/contrib/plugins/execlog.c:331:21: warning: ‘g_pattern_match_string’ 
>> > is deprecated: Use 'g_pattern_spec_match_string' instead 
>> > [-Wdeprecated-declarations]
>> >   331 | g_pattern_match_string(pat, rd_lower)) {
>> >   | ^~
>> > In file included from /usr/include/glib-2.0/glib.h:65,
>> >  from /qemu/contrib/plugins/execlog.c:9:
>> > /usr/include/glib-2.0/glib/gpattern.h:55:15: note: declared here
>> >55 | gboolean  g_pattern_match_string   (GPatternSpec *pspec,
>> >   |   ^~
>> > /qemu/contrib/plugins/execlog.c:339:63: warning: passing argument 2 of 
>> > ‘g_ptr_array_add’ discards ‘const’ qualifier from pointer target type 
>> > [-Wdiscarded-qualifiers]
>> >   339 | g_ptr_array_add(all_reg_names,
>> > reg->name);
>> 
>> Hmm I missed that. Not sure what the neatest solution is in this case -
>> g_ptr_array_new() doesn't have a destroy func so we shouldn't ever
>> attempt to free it. We can manually re-add the const qualifier at the
>> other end for completeness and I guess comment and cast?
>
> I find other palces use 2 ways:
>   * Use g_strdup() to create a copy (e.g., net/net.c,
> add_nic_model_help()). But I'm not sure if this is OK since you said
> we shouldn't attempt to free it. May I ask if the free issue you
> mentioned will affect the use of g_strdup() here?

If we g_strdup we have to remember to free later and ensure the
g_ptr_array has a free func.

>   * Another way is the forced conversion to gpointer (also e.g., in
> net/net.c, qemu_get_nic_models()).

I think this is ok, but with a comment on all_reg_names just to remind
any potential users that the strings are const and allocated by QEMU so
are not to be modified or freed.

>
> Which way do you like? ;-)
>

>> 
>> but I hesitated to add it for this case as plugins shouldn't assume they
>> have access to QEMU's internals. Maybe the glib-compat.h header could be
>> treated as a special case.
>
> Thanks! This works on my side!
>
> I support to fix the compatibility as the above, after all it's always
> confusing when we allow users to use newer glib and see warnings at
> compile time!
>
>> >
>> > I also noticed in target/arm/helper.c, there's another
>> > g_pattern_match_string() but I haven't met the warning.
>> 
>> Hmm that's weird. I suspect glib suppresses the warnings with:
>> 
>>   /* Ask for warnings for anything that was marked deprecated in
>>* the defined version, or before. It is a candidate for rewrite.
>>*/
>>   #define GLIB_VERSION_MIN_REQUIRED GLIB_VERSION_2_56
>>
>
> I'm not too familiar with the QEMU build framework, but based on this, a
> natural question is, can this rule be applied to plugins code as well?
> If so, this would also avoid warning.

I think the simplest solution would be to add:

  #include "glib-compat.h"

into qemu-plugin.h so plugins have the same deprecation rules as the
QEMU they come from. I checked with Michael on IRC and Debian currently
doesn't package any header files but if anyone starts shipping a qemu-dev
package we'll need to make sure we include glib-compat.h in the same
directory and qemu-plugin.h.

>
> Thanks,
> Zhao

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH v1 0/5] hw/ppc: SPI model

2024-03-01 Thread Cédric Le Goater


Chalapathi,

On 3/1/24 17:17, Chalapathi V wrote:

Hello,

I would greatly appreciate the review comments/suggestions on PATCH V1.

Thank You and Regards,


I didn't forget but I lacked the time in this release cycle. Sorry
about that.

I have one quick comment though. There are already a few models
implementing SPI controllers in QEMU. When I skimmed through the
patches, I was surprised to see no use of the available SSI
framework. Doesn't the current framework fit your needs ?

Please take a look at files in : include/hw/ssi/* hw/ssi/*

Same comment for the Serial EEPROM model. This device is generic
and not POWER specific. It should be possible to attach the device
model on other machines and different SPI bus provided by QEMU.
This doesn't mean rewriting everything, but the "HW" interface
probably needs to be reworked. It would make it easier to write
unit test (see ests/qtest/) and ease the review also.

Thanks,

C.





Chalapathi

On 07-02-2024 21:38, Chalapathi V wrote:

Hello,

In this series of patchset, SPI controller and responder models
for Power10 processor are modelled.

Serial peripheral interface provides full-duplex synchronous serial
communication between single controller and multiple responder devices.

The current configuration supports a single SPI controller reside on the
SPI bus. In p10, SPI controller device model supports a connection to a
single SPI responder such as SPI seeproms, TPM, flash device and an ADC
controller.

SPI controller model is divided into configuration unit, sequencer FSM
and shifter engine. All SPI function control is mapped into the SPI register
space to enable full control by firmware.

SPI configuration component is modelled which contains all SPI configuration
and status registers as well as the hold registers for data to be sent or
having been received.
Shift engine performs serialization and de-serialization according to the
control by the sequencer and according to the setup defined in the
configuration registers.
Sequencer implements the main control logic and
FSM to handle data transmit and data receive control of the shift engine.

Microchip's 25CSM04 SEEPROM device is modelled and connected to SPI bus
"spi_bus2" of SPI controller "PIB_SPIC[2]".

Patches overview in V1.
PATCH1: Create a SPI responder model which includes responder methods
 and SPI bus implementation.
PATCH2: Create a SPI controller model and implement configuration unit
 to model SCOM registers.
PATCH3: SPI controller model: implement sequencer FSM and shift engine.
PATCH4: create SPI SEEPROM model.
PATCH5: Connect SPI controllers to p10 chip and create keystore seeprom
 device on spi_bus2 of PIB_SPIC[2].

Thank You,
Chalapathi

Chalapathi V (5):
   hw/ppc: SPI responder model
   hw/ppc: SPI controller model - registers implementation
   hw/ppc: SPI controller model - sequencer and shifter
   hw/ppc: SPI SEEPROM model
   hw/ppc: SPI controller wiring to P10 chip and create seeprom device

  include/hw/ppc/pnv_chip.h   |    4 +
  include/hw/ppc/pnv_spi_controller.h |  101 ++
  include/hw/ppc/pnv_spi_responder.h  |  109 ++
  include/hw/ppc/pnv_spi_seeprom.h    |   70 ++
  include/hw/ppc/pnv_xscom.h  |    3 +
  hw/ppc/pnv.c    |   32 +
  hw/ppc/pnv_spi_controller.c | 1609 +++
  hw/ppc/pnv_spi_responder.c  |  166 +++
  hw/ppc/pnv_spi_seeprom.c    |  989 
  hw/ppc/meson.build  |    3 +
  10 files changed, 3086 insertions(+)
  create mode 100644 include/hw/ppc/pnv_spi_controller.h
  create mode 100644 include/hw/ppc/pnv_spi_responder.h
  create mode 100644 include/hw/ppc/pnv_spi_seeprom.h
  create mode 100644 hw/ppc/pnv_spi_controller.c
  create mode 100644 hw/ppc/pnv_spi_responder.c
  create mode 100644 hw/ppc/pnv_spi_seeprom.c

Re: [PATCH 3/9] backends/confidential-guest-support: Add functions to support IGVM

2024-03-01 Thread Daniel P . Berrangé

On Tue, Feb 27, 2024 at 02:50:09PM +, Roy Hopkins wrote:
> In preparation for supporting the processing of IGVM files to configure
> guests, this adds a set of functions to ConfidentialGuestSupport
> allowing configuration of secure virtual machines that can be
> implemented for each supported isolation platform type such as Intel TDX
> or AMD SEV-SNP. These functions will be called by IGVM processing code
> in subsequent patches.
> 
> This commit provides a default implementation of the functions that
> either perform no action or generate a warning or error when they are
> called. Targets that support ConfidentalGuestSupport should override
> these implementations.
> 
> Signed-off-by: Roy Hopkins 
> ---
>  backends/confidential-guest-support.c | 26 
>  include/exec/confidential-guest-support.h | 76 +++
>  2 files changed, 102 insertions(+)
> 
> diff --git a/backends/confidential-guest-support.c 
> b/backends/confidential-guest-support.c
> index da436fb736..42628be8d7 100644
> --- a/backends/confidential-guest-support.c
> +++ b/backends/confidential-guest-support.c
> @@ -14,6 +14,7 @@
>  #include "qemu/osdep.h"
>  
>  #include "exec/confidential-guest-support.h"
> +#include "qemu/error-report.h"
>  
>  OBJECT_DEFINE_ABSTRACT_TYPE(ConfidentialGuestSupport,
>  confidential_guest_support,
> @@ -45,8 +46,33 @@ static void 
> confidential_guest_support_class_init(ObjectClass *oc, void *data)
>  #endif
>  }
>  
> +static int check_support(ConfidentialGuestPlatformType platform,
> + uint16_t platform_version, uint8_t highest_vtl,
> + uint64_t shared_gpa_boundary)
> +{
> +/* Default: no support. */
> +return 0;
> +}
> +
> +static int set_guest_state(hwaddr gpa, uint8_t *ptr, uint64_t len,
> + ConfidentialGuestPageType memory_type,
> + uint16_t cpu_index)
> +{
> +warn_report("Confidential guest memory not supported");
> +return -1;
> +}
> +
> +static int get_mem_map_entry(int index, ConfidentialGuestMemoryMapEntry 
> *entry)
> +{
> +return 1;
> +}

IIUC, all these can reports errors, and as such I think
they should have an 'Error **errp' parameter, so we can
report precise errors in these methods, rather than
less useful generic errors in the caller.

The above 'warn_report' ought to be an error too, since
it is returning an failure code (-1)

> +
>  static void confidential_guest_support_init(Object *obj)
>  {
> +ConfidentialGuestSupport *cgs = CONFIDENTIAL_GUEST_SUPPORT(obj);
> +cgs->check_support = check_support;
> +cgs->set_guest_state = set_guest_state;
> +cgs->get_mem_map_entry = get_mem_map_entry;
>  }
>  
>  static void confidential_guest_support_finalize(Object *obj)
> diff --git a/include/exec/confidential-guest-support.h 
> b/include/exec/confidential-guest-support.h
> index b08ad8de4d..c43a1a32f1 100644
> --- a/include/exec/confidential-guest-support.h
> +++ b/include/exec/confidential-guest-support.h
> @@ -21,10 +21,46 @@
>  #ifndef CONFIG_USER_ONLY
>  
>  #include "qom/object.h"
> +#include "exec/hwaddr.h"
> +
> +#if defined(CONFIG_IGVM)
> +#include "igvm/igvm.h"
> +#endif
>  
>  #define TYPE_CONFIDENTIAL_GUEST_SUPPORT "confidential-guest-support"
>  OBJECT_DECLARE_SIMPLE_TYPE(ConfidentialGuestSupport, 
> CONFIDENTIAL_GUEST_SUPPORT)
>  
> +typedef enum ConfidentialGuestPlatformType {
> +CGS_PLATFORM_SEV,
> +CGS_PLATFORM_SEV_ES,
> +CGS_PLATFORM_SEV_SNP,
> +CGS_PLATFORM_TDX,

QEMU only has support for SEV & SEV_ES today. We should leave the
others until we actually get an impl of SEV-SNP/TDX in QEMU that
supports those platforms.

> +} ConfidentialGuestPlatformType;
> +
> +typedef enum ConfidentialGuestMemoryType {
> +CGS_MEM_RAM,
> +CGS_MEM_RESERVED,
> +CGS_MEM_ACPI,
> +CGS_MEM_NVS,
> +CGS_MEM_UNUSABLE,
> +} ConfidentialGuestMemoryType;
> +
> +typedef struct ConfidentialGuestMemoryMapEntry {
> +uint64_t gpa;
> +uint64_t size;
> +ConfidentialGuestMemoryType type;
> +} ConfidentialGuestMemoryMapEntry;
> +
> +typedef enum ConfidentialGuestPageType {
> +CGS_PAGE_TYPE_NORMAL,
> +CGS_PAGE_TYPE_VMSA,
> +CGS_PAGE_TYPE_ZERO,
> +CGS_PAGE_TYPE_UNMEASURED,
> +CGS_PAGE_TYPE_SECRETS,
> +CGS_PAGE_TYPE_CPUID,
> +CGS_PAGE_TYPE_REQUIRED_MEMORY,
> +} ConfidentialGuestPageType;
> +
>  struct ConfidentialGuestSupport {
>  Object parent;
>  
> @@ -60,6 +96,46 @@ struct ConfidentialGuestSupport {
>   */
>  char *igvm_filename;
>  #endif
> +
> +/*
> + * The following virtual methods need to be implemented by systems that
> + * support confidential guests that can be configured with IGVM and are
> + * used during processing of the IGVM file with process_igvm().
> + */
> +
> +/*
> + * Check for to see if this confidential guest supports a particular
> + * platform or configuration
> + */
> +in

Re: [PATCH] hw/misc: zynq_slcr: set SLC_RST bit in REBOOT_STATUS register