[RFC PATCH v3 0/8] crypto,io,migration: Add support to gnutls_bye()

2025-02-07 Thread Fabiano Rosas
(cover-letter update I forgot on v2:)

This series now contains the two approaches we've been discussing to
avoid the TLS termination error on the multifd_recv threads.

The source machine now ends the TLS session with gnutls_bye() and the
destination will consider a premature termination an error. The only
exception is the src <9.1 case where there's a compatibility issue, in
which case setting multifd-clean-tls-termination=false will cause the
destination to (always) ignore a premature termination error.

changes in v3:

Reordered the patches to have the io/crypto stuff at the start and the
compat property before the code that breaks compat.

Commit message improvements.

Turned an assert into a warning for when gnutls_bye() fails but the
migration succeeded (should never happen).

Other minor fixes asked by Daniel.

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1661172595

v2:
https://lore.kernel.org/r/20250207142758.6936-1-faro...@suse.de

v1:
https://lore.kernel.org/r/20250206175824.22664-1-faro...@suse.de

Hi,

We've been discussing a way to stop multifd recv threads from getting
an error at the end of migration when the source threads close the
iochannel without ending the TLS session.

The original issue was introduced by commit 1d457daf86
("migration/multifd: Further remove the SYNC on complete") which
altered the synchronization of the source and destination in a manner
that causes the destination to already be waiting at recv() when the
source closes the connection.

One approach would be to issue gnutls_bye() at the source after all
the data has been sent. The destination would then gracefully exit
when it gets EOF.

Aside from stopping the recv thread from seeing an error, this also
creates a contract that all connections should be closed only after
the TLS session is ended. This helps to avoid masking a legitimate
issue where the connection is closed prematurely.

Fabiano Rosas (8):
  crypto: Allow gracefully ending the TLS session
  io: tls: Add qio_channel_tls_bye
  crypto: Remove qcrypto_tls_session_get_handshake_status
  io: Add flags argument to qio_channel_readv_full_all_eof
  io: Add a read flag for relaxed EOF
  migration/multifd: Terminate the TLS connection
  migration/multifd: Add a compat property for TLS termination
  migration: Check migration error after loadvm

 crypto/tlssession.c | 96 ++---
 hw/core/machine.c   |  1 +
 hw/remote/mpqemu-link.c |  2 +-
 include/crypto/tlssession.h | 46 --
 include/io/channel-tls.h| 12 
 include/io/channel.h|  3 +
 io/channel-tls.c| 92 ++-
 io/channel.c|  9 ++-
 io/trace-events |  5 ++
 migration/migration.h   | 33 ++
 migration/multifd.c | 53 +++-
 migration/multifd.h |  2 +
 migration/options.c |  2 +
 migration/savevm.c  |  6 +-
 migration/tls.c |  5 ++
 migration/tls.h |  2 +-
 tests/unit/test-crypto-tlssession.c | 12 ++--
 17 files changed, 305 insertions(+), 76 deletions(-)

-- 
2.35.3




[RFC PATCH v3 3/8] crypto: Remove qcrypto_tls_session_get_handshake_status

2025-02-07 Thread Fabiano Rosas
The correct way of calling qcrypto_tls_session_handshake() requires
calling qcrypto_tls_session_get_handshake_status() right after it, so
there's no reason to keep them as separate methods.

Refactor qcrypto_tls_session_handshake() to report the status in its
own return value and alter the callers accordingly.

No functional change.

Suggested-by: Daniel P. Berrangé 
Reviewed-by: Daniel P. Berrangé 
Acked-by: Daniel P. Berrangé 
Signed-off-by: Fabiano Rosas 
---
 crypto/tlssession.c | 63 +++--
 include/crypto/tlssession.h | 32 ---
 io/channel-tls.c|  7 ++--
 tests/unit/test-crypto-tlssession.c | 12 ++
 4 files changed, 39 insertions(+), 75 deletions(-)

diff --git a/crypto/tlssession.c b/crypto/tlssession.c
index d769d7a304..6d8f8df623 100644
--- a/crypto/tlssession.c
+++ b/crypto/tlssession.c
@@ -546,45 +546,35 @@ qcrypto_tls_session_handshake(QCryptoTLSSession *session,
   Error **errp)
 {
 int ret = gnutls_handshake(session->handle);
-if (ret == 0) {
+if (!ret) {
 session->handshakeComplete = true;
-} else {
-if (ret == GNUTLS_E_INTERRUPTED ||
-ret == GNUTLS_E_AGAIN) {
-ret = 1;
-} else {
-if (session->rerr || session->werr) {
-error_setg(errp, "TLS handshake failed: %s: %s",
-   gnutls_strerror(ret),
-   error_get_pretty(session->rerr ?
-session->rerr : session->werr));
-} else {
-error_setg(errp, "TLS handshake failed: %s",
-   gnutls_strerror(ret));
-}
-ret = -1;
-}
-}
-error_free(session->rerr);
-error_free(session->werr);
-session->rerr = session->werr = NULL;
-
-return ret;
-}
-
-
-QCryptoTLSSessionHandshakeStatus
-qcrypto_tls_session_get_handshake_status(QCryptoTLSSession *session)
-{
-if (session->handshakeComplete) {
 return QCRYPTO_TLS_HANDSHAKE_COMPLETE;
-} else if (gnutls_record_get_direction(session->handle) == 0) {
-return QCRYPTO_TLS_HANDSHAKE_RECVING;
+}
+
+if (ret == GNUTLS_E_INTERRUPTED || ret == GNUTLS_E_AGAIN) {
+int direction = gnutls_record_get_direction(session->handle);
+return direction ? QCRYPTO_TLS_HANDSHAKE_SENDING :
+QCRYPTO_TLS_HANDSHAKE_RECVING;
+}
+
+if (session->rerr || session->werr) {
+error_setg(errp, "TLS handshake failed: %s: %s",
+   gnutls_strerror(ret),
+   error_get_pretty(session->rerr ?
+session->rerr : session->werr));
 } else {
-return QCRYPTO_TLS_HANDSHAKE_SENDING;
+error_setg(errp, "TLS handshake failed: %s",
+   gnutls_strerror(ret));
 }
+
+error_free(session->rerr);
+error_free(session->werr);
+session->rerr = session->werr = NULL;
+
+return -1;
 }
 
+
 int
 qcrypto_tls_session_bye(QCryptoTLSSession *session, Error **errp)
 {
@@ -726,13 +716,6 @@ qcrypto_tls_session_handshake(QCryptoTLSSession *sess,
 }
 
 
-QCryptoTLSSessionHandshakeStatus
-qcrypto_tls_session_get_handshake_status(QCryptoTLSSession *sess)
-{
-return QCRYPTO_TLS_HANDSHAKE_COMPLETE;
-}
-
-
 int
 qcrypto_tls_session_bye(QCryptoTLSSession *session, Error **errp)
 {
diff --git a/include/crypto/tlssession.h b/include/crypto/tlssession.h
index c0f64ce989..d77ae0d423 100644
--- a/include/crypto/tlssession.h
+++ b/include/crypto/tlssession.h
@@ -75,12 +75,14 @@
  *  GINT_TO_POINTER(fd));
  *
  *while (1) {
- *   if (qcrypto_tls_session_handshake(sess, errp) < 0) {
+ *   int ret = qcrypto_tls_session_handshake(sess, errp);
+ *
+ *   if (ret < 0) {
  *   qcrypto_tls_session_free(sess);
  *   return -1;
  *   }
  *
- *   switch(qcrypto_tls_session_get_handshake_status(sess)) {
+ *   switch(ret) {
  *   case QCRYPTO_TLS_HANDSHAKE_COMPLETE:
 *   if (qcrypto_tls_session_check_credentials(sess, errp) < 0) {
  *   qcrypto_tls_session_free(sess);
@@ -170,7 +172,7 @@ G_DEFINE_AUTOPTR_CLEANUP_FUNC(QCryptoTLSSession, qcrypto_tls_session_free)
  *
  * Validate the peer's credentials after a successful
  * TLS handshake. It is an error to call this before
- * qcrypto_tls_session_get_handshake_status() returns
+ * qcrypto_tls_session_handshake() returns
  * QCRYPTO_TLS_HANDSHAKE_COMPLETE
  *
  * Returns 0 if the credentials validated, -1 on error
@@ -226,7 +228,7 @@ void qcrypto_tls_session_set_callbacks(QCryptoTLSSession *sess,
  * registered with qcrypto_tls_session_set_callbacks()
  *
  * It is an error to call this before
- * qcrypto_tls_session_get_handshake_status() returns
+ * qcrypto_tls_session_handshake() returns
  * QCRYPTO_TLS_HANDSHAKE_COMPLETE
  *
  * Returns: the number of bytes sent,
@@ -256,7 +258,7 @@ s

[RFC PATCH v3 7/8] migration/multifd: Add a compat property for TLS termination

2025-02-07 Thread Fabiano Rosas
We're currently changing the way the source multifd migration handles
the shutdown of the multifd channels when TLS is in use to perform a
clean termination by calling gnutls_bye().

Older src QEMUs will always close the channel without terminating the
TLS session. New dst QEMUs treat an unclean termination as an error.

Add multifd_clean_tls_termination (default true), which can be set to
false on the destination whenever a src QEMU <= 9.2 is in use.

(Note that the compat property is only strictly necessary for src
QEMUs older than 9.1. Due to synchronization coincidences, src QEMUs
9.1 and 9.2 can put the destination in a condition where it doesn't
see the unclean termination. Still, make the property more inclusive
to facilitate potential backports.)
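For illustration only (the property spelling is taken from the hw_compat entry below; the rest of the command line is a placeholder), a destination receiving from an older source might disable the strict check like this:

```shell
# Hypothetical destination invocation: relax TLS termination checking
# when the migration source runs QEMU <= 9.2.
qemu-system-x86_64 \
    -incoming tcp:0:4444 \
    -global migration.multifd-clean-tls-termination=false
```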

Signed-off-by: Fabiano Rosas 
---
 hw/core/machine.c |  1 +
 migration/migration.h | 33 +
 migration/multifd.c   | 15 +--
 migration/multifd.h   |  2 ++
 migration/options.c   |  2 ++
 5 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 254cc20c4c..02cff735b3 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -42,6 +42,7 @@ GlobalProperty hw_compat_9_2[] = {
 { "virtio-balloon-pci-transitional", "vectors", "0" },
 { "virtio-balloon-pci-non-transitional", "vectors", "0" },
 { "virtio-mem-pci", "vectors", "0" },
+{ "migration", "multifd-clean-tls-termination", "false" },
 };
 const size_t hw_compat_9_2_len = G_N_ELEMENTS(hw_compat_9_2);
 
diff --git a/migration/migration.h b/migration/migration.h
index 4c1fafc2b5..77def0b437 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -443,6 +443,39 @@ struct MigrationState {
  * Default value is false. (since 8.1)
  */
 bool multifd_flush_after_each_section;
+
+/*
+ * This variable only makes sense when set on the machine that is
+ * the destination of a multifd migration with TLS enabled. It
+ * affects the behavior of the last send->recv iteration with
+ * regards to termination of the TLS session.
+ *
+ * When set:
+ *
+ * - the destination QEMU instance can expect to never get a
+ *   GNUTLS_E_PREMATURE_TERMINATION error. Manifested as the error
+ *   message: "The TLS connection was non-properly terminated".
+ *
+ * When clear:
+ *
+ * - the destination QEMU instance can expect to see a
+ *   GNUTLS_E_PREMATURE_TERMINATION error in any multifd channel
+ *   whenever the last recv() call of that channel happens after
+ *   the source QEMU instance has already issued shutdown() on the
+ *   channel.
+ *
+ *   Commit 637280aeb2 (since 9.1) introduced a side effect that
+ *   causes the destination instance to not be affected by the
+ *   premature termination, while commit 1d457daf86 (since 10.0)
+ *   causes the premature termination condition to be once again
+ *   reachable.
+ *
+ * NOTE: Regardless of the state of this option, a premature
+ * termination of the TLS connection might happen due to error at
+ * any moment prior to the last send->recv iteration.
+ */
+bool multifd_clean_tls_termination;
+
 /*
  * This decides the size of guest memory chunk that will be used
  * to track dirty bitmap clearing.  The size of memory chunk will
diff --git a/migration/multifd.c b/migration/multifd.c
index 0296758c08..8045197be8 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1151,6 +1151,7 @@ void multifd_recv_sync_main(void)
 
 static void *multifd_recv_thread(void *opaque)
 {
+MigrationState *s = migrate_get_current();
 MultiFDRecvParams *p = opaque;
 Error *local_err = NULL;
 bool use_packets = multifd_use_packets();
@@ -1159,18 +1160,28 @@ static void *multifd_recv_thread(void *opaque)
 trace_multifd_recv_thread_start(p->id);
 rcu_register_thread();
 
+if (!s->multifd_clean_tls_termination) {
+p->read_flags = QIO_CHANNEL_READ_FLAG_RELAXED_EOF;
+}
+
 while (true) {
 uint32_t flags = 0;
 bool has_data = false;
 p->normal_num = 0;
 
+
 if (use_packets) {
+struct iovec iov = {
+.iov_base = (void *)p->packet,
+.iov_len = p->packet_len
+};
+
 if (multifd_recv_should_exit()) {
 break;
 }
 
-ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
-   p->packet_len, &local_err);
+ret = qio_channel_readv_full_all_eof(p->c, &iov, 1, NULL, NULL,
+ p->read_flags, &local_err);
 if (!ret) {
 /* EOF */
 assert(!local_err);
diff --git a/migration/multifd.h b/migration/multifd.h
index bd785b9873..cf408ff721 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -244,6 +244,8 @@ typedef struct {
 uint32_t zer

[RFC PATCH v3 4/8] io: Add flags argument to qio_channel_readv_full_all_eof

2025-02-07 Thread Fabiano Rosas
We want to pass flags into qio_channel_tls_readv() but
qio_channel_readv_full_all_eof() doesn't take a flags argument.

No functional change.

Signed-off-by: Fabiano Rosas 
---
 hw/remote/mpqemu-link.c | 2 +-
 include/io/channel.h| 2 ++
 io/channel.c| 9 ++---
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
index e25f97680d..49885a1db6 100644
--- a/hw/remote/mpqemu-link.c
+++ b/hw/remote/mpqemu-link.c
@@ -110,7 +110,7 @@ static ssize_t mpqemu_read(QIOChannel *ioc, void *buf, size_t len, int **fds,
 bql_unlock();
 }
 
-ret = qio_channel_readv_full_all_eof(ioc, &iov, 1, fds, nfds, errp);
+ret = qio_channel_readv_full_all_eof(ioc, &iov, 1, fds, nfds, 0, errp);
 
 if (drop_bql && !iothread && !qemu_in_coroutine()) {
 bql_lock();
diff --git a/include/io/channel.h b/include/io/channel.h
index bdf0bca92a..58940eead5 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -885,6 +885,7 @@ void qio_channel_set_aio_fd_handler(QIOChannel *ioc,
  * @niov: the length of the @iov array
  * @fds: an array of file handles to read
  * @nfds: number of file handles in @fds
+ * @flags: read flags (QIO_CHANNEL_READ_FLAG_*)
  * @errp: pointer to a NULL-initialized error object
  *
  *
@@ -903,6 +904,7 @@ int coroutine_mixed_fn qio_channel_readv_full_all_eof(QIOChannel *ioc,
   const struct iovec *iov,
   size_t niov,
   int **fds, size_t *nfds,
+  int flags,
   Error **errp);
 
 /**
diff --git a/io/channel.c b/io/channel.c
index e3f17c24a0..ebd9322765 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -115,7 +115,8 @@ int coroutine_mixed_fn qio_channel_readv_all_eof(QIOChannel 
*ioc,
  size_t niov,
  Error **errp)
 {
-return qio_channel_readv_full_all_eof(ioc, iov, niov, NULL, NULL, errp);
+return qio_channel_readv_full_all_eof(ioc, iov, niov, NULL, NULL, 0,
+  errp);
 }
 
 int coroutine_mixed_fn qio_channel_readv_all(QIOChannel *ioc,
@@ -130,6 +131,7 @@ int coroutine_mixed_fn qio_channel_readv_full_all_eof(QIOChannel *ioc,
   const struct iovec *iov,
   size_t niov,
   int **fds, size_t *nfds,
+  int flags,
   Error **errp)
 {
 int ret = -1;
@@ -155,7 +157,7 @@ int coroutine_mixed_fn qio_channel_readv_full_all_eof(QIOChannel *ioc,
 while ((nlocal_iov > 0) || local_fds) {
 ssize_t len;
 len = qio_channel_readv_full(ioc, local_iov, nlocal_iov, local_fds,
- local_nfds, 0, errp);
+ local_nfds, flags, errp);
 if (len == QIO_CHANNEL_ERR_BLOCK) {
 if (qemu_in_coroutine()) {
 qio_channel_yield(ioc, G_IO_IN);
@@ -222,7 +224,8 @@ int coroutine_mixed_fn qio_channel_readv_full_all(QIOChannel *ioc,
   int **fds, size_t *nfds,
   Error **errp)
 {
-int ret = qio_channel_readv_full_all_eof(ioc, iov, niov, fds, nfds, errp);
+int ret = qio_channel_readv_full_all_eof(ioc, iov, niov, fds, nfds, 0,
+ errp);
 
 if (ret == 0) {
 error_setg(errp, "Unexpected end-of-file before all data were read");
-- 
2.35.3




[RFC PATCH v3 5/8] io: Add a read flag for relaxed EOF

2025-02-07 Thread Fabiano Rosas
Add a read flag that can inform a channel that it's ok to receive an
EOF at any moment. Channels that have some form of strict EOF
tracking, such as TLS session termination, may choose to ignore EOF
errors with the use of this flag.

This is being added for compatibility with older migration streams
that do not include a TLS termination step.

Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Fabiano Rosas 
---
 include/io/channel.h | 1 +
 io/channel-tls.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/include/io/channel.h b/include/io/channel.h
index 58940eead5..62b657109c 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -35,6 +35,7 @@ OBJECT_DECLARE_TYPE(QIOChannel, QIOChannelClass,
 #define QIO_CHANNEL_WRITE_FLAG_ZERO_COPY 0x1
 
 #define QIO_CHANNEL_READ_FLAG_MSG_PEEK 0x1
+#define QIO_CHANNEL_READ_FLAG_RELAXED_EOF 0x2
 
 typedef enum QIOChannelFeature QIOChannelFeature;
 
diff --git a/io/channel-tls.c b/io/channel-tls.c
index ecde6b57bf..caf8301a9e 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -359,6 +359,7 @@ static ssize_t qio_channel_tls_readv(QIOChannel *ioc,
 tioc->session,
 iov[i].iov_base,
 iov[i].iov_len,
+flags & QIO_CHANNEL_READ_FLAG_RELAXED_EOF ||
 qatomic_load_acquire(&tioc->shutdown) & QIO_CHANNEL_SHUTDOWN_READ,
 errp);
 if (ret == QCRYPTO_TLS_SESSION_ERR_BLOCK) {
-- 
2.35.3




[RFC PATCH v3 2/8] io: tls: Add qio_channel_tls_bye

2025-02-07 Thread Fabiano Rosas
Add a task dispatcher for gnutls_bye(), similar to
qio_channel_tls_handshake_task(). The gnutls_bye() call might be
interrupted, so it needs to be rescheduled.

The migration code will make use of this to help the migration
destination identify a premature EOF. Once the session termination is
in place, any EOF that happens before the source issued gnutls_bye()
will be considered an error.

Reviewed-by: Daniel P. Berrangé 
Acked-by: Daniel P. Berrangé 
Signed-off-by: Fabiano Rosas 
---
 include/io/channel-tls.h | 12 ++
 io/channel-tls.c | 84 
 io/trace-events  |  5 +++
 3 files changed, 101 insertions(+)

diff --git a/include/io/channel-tls.h b/include/io/channel-tls.h
index 26c67f17e2..7e9023570d 100644
--- a/include/io/channel-tls.h
+++ b/include/io/channel-tls.h
@@ -49,8 +49,20 @@ struct QIOChannelTLS {
 QCryptoTLSSession *session;
 QIOChannelShutdown shutdown;
 guint hs_ioc_tag;
+guint bye_ioc_tag;
 };
 
+/**
+ * qio_channel_tls_bye:
+ * @ioc: the TLS channel object
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Perform the TLS session termination. This method will return
+ * immediately and the termination will continue in the background,
+ * provided the main loop is running.
+ */
+void qio_channel_tls_bye(QIOChannelTLS *ioc, Error **errp);
+
 /**
  * qio_channel_tls_new_server:
  * @master: the underlying channel object
diff --git a/io/channel-tls.c b/io/channel-tls.c
index aab630e5ae..517ce190a4 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -247,6 +247,85 @@ void qio_channel_tls_handshake(QIOChannelTLS *ioc,
 qio_channel_tls_handshake_task(ioc, task, context);
 }
 
+static gboolean qio_channel_tls_bye_io(QIOChannel *ioc, GIOCondition condition,
+   gpointer user_data);
+
+static void qio_channel_tls_bye_task(QIOChannelTLS *ioc, QIOTask *task,
+ GMainContext *context)
+{
+GIOCondition condition;
+QIOChannelTLSData *data;
+int status;
+Error *err = NULL;
+
+status = qcrypto_tls_session_bye(ioc->session, &err);
+
+if (status < 0) {
+trace_qio_channel_tls_bye_fail(ioc);
+qio_task_set_error(task, err);
+qio_task_complete(task);
+return;
+}
+
+if (status == QCRYPTO_TLS_BYE_COMPLETE) {
+qio_task_complete(task);
+return;
+}
+
+data = g_new0(typeof(*data), 1);
+data->task = task;
+data->context = context;
+
+if (context) {
+g_main_context_ref(context);
+}
+
+if (status == QCRYPTO_TLS_BYE_SENDING) {
+condition = G_IO_OUT;
+} else {
+condition = G_IO_IN;
+}
+
+trace_qio_channel_tls_bye_pending(ioc, status);
+ioc->bye_ioc_tag = qio_channel_add_watch_full(ioc->master, condition,
+  qio_channel_tls_bye_io,
+  data, NULL, context);
+}
+
+
+static gboolean qio_channel_tls_bye_io(QIOChannel *ioc, GIOCondition condition,
+   gpointer user_data)
+{
+QIOChannelTLSData *data = user_data;
+QIOTask *task = data->task;
+GMainContext *context = data->context;
+QIOChannelTLS *tioc = QIO_CHANNEL_TLS(qio_task_get_source(task));
+
+tioc->bye_ioc_tag = 0;
+g_free(data);
+qio_channel_tls_bye_task(tioc, task, context);
+
+if (context) {
+g_main_context_unref(context);
+}
+
+return FALSE;
+}
+
+static void propagate_error(QIOTask *task, gpointer opaque)
+{
+qio_task_propagate_error(task, opaque);
+}
+
+void qio_channel_tls_bye(QIOChannelTLS *ioc, Error **errp)
+{
+QIOTask *task;
+
+task = qio_task_new(OBJECT(ioc), propagate_error, errp, NULL);
+
+trace_qio_channel_tls_bye_start(ioc);
+qio_channel_tls_bye_task(ioc, task, NULL);
+}
 
 static void qio_channel_tls_init(Object *obj G_GNUC_UNUSED)
 {
@@ -379,6 +458,11 @@ static int qio_channel_tls_close(QIOChannel *ioc,
 g_clear_handle_id(&tioc->hs_ioc_tag, g_source_remove);
 }
 
+if (tioc->bye_ioc_tag) {
+trace_qio_channel_tls_bye_cancel(ioc);
+g_clear_handle_id(&tioc->bye_ioc_tag, g_source_remove);
+}
+
 return qio_channel_close(tioc->master, errp);
 }
 
diff --git a/io/trace-events b/io/trace-events
index d4c0f84a9a..dc3a63ba1f 100644
--- a/io/trace-events
+++ b/io/trace-events
@@ -44,6 +44,11 @@ qio_channel_tls_handshake_pending(void *ioc, int status) "TLS handshake pending
 qio_channel_tls_handshake_fail(void *ioc) "TLS handshake fail ioc=%p"
 qio_channel_tls_handshake_complete(void *ioc) "TLS handshake complete ioc=%p"
 qio_channel_tls_handshake_cancel(void *ioc) "TLS handshake cancel ioc=%p"
+qio_channel_tls_bye_start(void *ioc) "TLS termination start ioc=%p"
+qio_channel_tls_bye_pending(void *ioc, int status) "TLS termination pending ioc=%p status=%d"
+qio_channel_tls_bye_fail(void *ioc) "TLS termination fail ioc=%p"
+q

[RFC PATCH v3 6/8] migration/multifd: Terminate the TLS connection

2025-02-07 Thread Fabiano Rosas
The multifd recv side has been getting a TLS error of
GNUTLS_E_PREMATURE_TERMINATION at the end of migration when the send
side closes the sockets without ending the TLS session. This has been
masked by the code not checking the migration error after loadvm.

Start ending the TLS session at multifd_send_shutdown() so the recv
side always sees a clean termination (EOF) and we can start to
differentiate that from an actual premature termination that might
happen in the middle of the migration.

There's nothing to be done if a previous migration error has already
broken the connection, so add a comment explaining it and ignore any
errors coming from gnutls_bye().

This doesn't break compat with older recv-side QEMUs because EOF has
always caused the recv thread to exit cleanly.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/multifd.c | 38 +-
 migration/tls.c |  5 +
 migration/tls.h |  2 +-
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index ab73d6d984..0296758c08 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -490,6 +490,36 @@ void multifd_send_shutdown(void)
 return;
 }
 
+for (i = 0; i < migrate_multifd_channels(); i++) {
+MultiFDSendParams *p = &multifd_send_state->params[i];
+
+/* thread_created implies the TLS handshake has succeeded */
+if (p->tls_thread_created && p->thread_created) {
+Error *local_err = NULL;
+/*
+ * The destination expects the TLS session to always be
+ * properly terminated. This helps to detect a premature
+ * termination in the middle of the stream.  Note that
+ * older QEMUs always break the connection on the source
+ * and the destination always sees
+ * GNUTLS_E_PREMATURE_TERMINATION.
+ */
+migration_tls_channel_end(p->c, &local_err);
+
+/*
+ * The above can return an error in case the migration has
+ * already failed. If the migration succeeded, errors are
+ * not expected but there's no need to kill the source.
+ */
+if (local_err && !migration_has_failed(migrate_get_current())) {
+warn_report(
+"multifd_send_%d: Failed to terminate TLS connection: %s",
+p->id, error_get_pretty(local_err));
+break;
+}
+}
+}
+
 multifd_send_terminate_threads();
 
 for (i = 0; i < migrate_multifd_channels(); i++) {
@@ -1141,7 +1171,13 @@ static void *multifd_recv_thread(void *opaque)
 
 ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
p->packet_len, &local_err);
-if (ret == 0 || ret == -1) {   /* 0: EOF  -1: Error */
+if (!ret) {
+/* EOF */
+assert(!local_err);
+break;
+}
+
+if (ret == -1) {
 break;
 }
 
diff --git a/migration/tls.c b/migration/tls.c
index fa03d9136c..5cbf952383 100644
--- a/migration/tls.c
+++ b/migration/tls.c
@@ -156,6 +156,11 @@ void migration_tls_channel_connect(MigrationState *s,
   NULL);
 }
 
+void migration_tls_channel_end(QIOChannel *ioc, Error **errp)
+{
+qio_channel_tls_bye(QIO_CHANNEL_TLS(ioc), errp);
+}
+
 bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc)
 {
 if (!migrate_tls()) {
diff --git a/migration/tls.h b/migration/tls.h
index 5797d153cb..58b25e1228 100644
--- a/migration/tls.h
+++ b/migration/tls.h
@@ -36,7 +36,7 @@ void migration_tls_channel_connect(MigrationState *s,
QIOChannel *ioc,
const char *hostname,
Error **errp);
-
+void migration_tls_channel_end(QIOChannel *ioc, Error **errp);
 /* Whether the QIO channel requires further TLS handshake? */
 bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc);
 
-- 
2.35.3




[RFC PATCH v3 1/8] crypto: Allow gracefully ending the TLS session

2025-02-07 Thread Fabiano Rosas
QEMU's TLS session code provides no way to call gnutls_bye() to
terminate a TLS session. Callers of qcrypto_tls_session_read() can
choose to ignore a GNUTLS_E_PREMATURE_TERMINATION error by setting the
gracefulTermination argument.

The QIOChannelTLS code ignores the premature termination error
whenever shutdown() has already been issued. This turned out not to be
enough for the migration code because shutdown() might not have been
issued before the connection is terminated.

Add support for calling gnutls_bye() in the tlssession layer so users
of QIOChannelTLS can clearly identify the end of a TLS session.

Reviewed-by: Daniel P. Berrangé 
Acked-by: Daniel P. Berrangé 
Signed-off-by: Fabiano Rosas 
---
 crypto/tlssession.c | 41 +
 include/crypto/tlssession.h | 22 
 2 files changed, 63 insertions(+)

diff --git a/crypto/tlssession.c b/crypto/tlssession.c
index 77286e23f4..d769d7a304 100644
--- a/crypto/tlssession.c
+++ b/crypto/tlssession.c
@@ -585,6 +585,40 @@ qcrypto_tls_session_get_handshake_status(QCryptoTLSSession *session)
 }
 }
 
+int
+qcrypto_tls_session_bye(QCryptoTLSSession *session, Error **errp)
+{
+int ret;
+
+if (!session->handshakeComplete) {
+return 0;
+}
+
+ret = gnutls_bye(session->handle, GNUTLS_SHUT_WR);
+
+if (!ret) {
+return QCRYPTO_TLS_BYE_COMPLETE;
+}
+
+if (ret == GNUTLS_E_INTERRUPTED || ret == GNUTLS_E_AGAIN) {
+int direction = gnutls_record_get_direction(session->handle);
+return direction ? QCRYPTO_TLS_BYE_SENDING : QCRYPTO_TLS_BYE_RECVING;
+}
+
+if (session->rerr || session->werr) {
+error_setg(errp, "TLS termination failed: %s: %s", gnutls_strerror(ret),
+   error_get_pretty(session->rerr ?
+session->rerr : session->werr));
+} else {
+error_setg(errp, "TLS termination failed: %s", gnutls_strerror(ret));
+}
+
+error_free(session->rerr);
+error_free(session->werr);
+session->rerr = session->werr = NULL;
+
+return -1;
+}
 
 int
 qcrypto_tls_session_get_key_size(QCryptoTLSSession *session,
@@ -699,6 +733,13 @@ qcrypto_tls_session_get_handshake_status(QCryptoTLSSession *sess)
 }
 
 
+int
+qcrypto_tls_session_bye(QCryptoTLSSession *session, Error **errp)
+{
+return QCRYPTO_TLS_BYE_COMPLETE;
+}
+
+
 int
 qcrypto_tls_session_get_key_size(QCryptoTLSSession *sess,
  Error **errp)
diff --git a/include/crypto/tlssession.h b/include/crypto/tlssession.h
index f694a5c3c5..c0f64ce989 100644
--- a/include/crypto/tlssession.h
+++ b/include/crypto/tlssession.h
@@ -323,6 +323,28 @@ typedef enum {
 QCryptoTLSSessionHandshakeStatus
 qcrypto_tls_session_get_handshake_status(QCryptoTLSSession *sess);
 
+typedef enum {
+QCRYPTO_TLS_BYE_COMPLETE,
+QCRYPTO_TLS_BYE_SENDING,
+QCRYPTO_TLS_BYE_RECVING,
+} QCryptoTLSSessionByeStatus;
+
+/**
+ * qcrypto_tls_session_bye:
+ * @session: the TLS session object
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Start, or continue, a TLS termination sequence. If the underlying
+ * data channel is non-blocking, then this method may return control
+ * before the termination is complete. The return value will indicate
+ * whether the termination has completed, or is waiting to send or
+ * receive data. In the latter cases, the caller should setup an event
+ * loop watch and call this method again once the underlying data
+ * channel is ready to read or write again.
+ */
+int
+qcrypto_tls_session_bye(QCryptoTLSSession *session, Error **errp);
+
 /**
  * qcrypto_tls_session_get_key_size:
  * @sess: the TLS session object
-- 
2.35.3




[RFC PATCH v3 8/8] migration: Check migration error after loadvm

2025-02-07 Thread Fabiano Rosas
We're currently only checking the QEMUFile error after
qemu_loadvm_state(). This was causing a TLS termination error from
multifd recv threads to be ignored.

Start checking the migration error as well to avoid missing further
errors.

Regarding compatibility with the TLS termination error that was
previously being ignored: if a QEMU <= 9.2 is being used as the
migration source, the recently added migration property
multifd-clean-tls-termination needs to be set to false on the
*destination* machine.

Signed-off-by: Fabiano Rosas 
---
 migration/savevm.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index bc375db282..4046faf009 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2940,7 +2940,11 @@ int qemu_loadvm_state(QEMUFile *f)
 
 /* When reaching here, it must be precopy */
 if (ret == 0) {
-ret = qemu_file_get_error(f);
+if (migrate_has_error(migrate_get_current())) {
+ret = -EINVAL;
+} else {
+ret = qemu_file_get_error(f);
+}
 }
 
 /*
-- 
2.35.3




Re: [PATCH] migration: use parameters.mode in cpr_state_save

2025-02-07 Thread Steven Sistare

On 2/5/2025 4:52 PM, Steven Sistare wrote:

On 2/5/2025 4:28 PM, Peter Xu wrote:

On Wed, Feb 05, 2025 at 12:54:01PM -0800, Steve Sistare wrote:

qmp_migrate guarantees that cpr_channel is not null for
MIG_MODE_CPR_TRANSFER when cpr_state_save is called:

 qmp_migrate()
 if (s->parameters.mode == MIG_MODE_CPR_TRANSFER && !cpr_channel) {
 return;
 }
 cpr_state_save(cpr_channel)

but cpr_state_save checks for mode differently before using channel,
and Coverity cannot infer that they are equivalent in outgoing QEMU,
and warns that channel may be NULL:

 cpr_state_save(channel)
 MigMode mode = migrate_mode();
 if (mode == MIG_MODE_CPR_TRANSFER) {
 f = cpr_transfer_output(channel, errp);

To make Coverity happy, use parameters.mode in cpr_state_save.

Resolves: Coverity CID 1590980
Reported-by: Peter Maydell 
Signed-off-by: Steve Sistare 
---
  migration/cpr.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/cpr.c b/migration/cpr.c
index 584b0b9..7f20bd5 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -8,6 +8,7 @@
  #include "qemu/osdep.h"
  #include "qapi/error.h"
  #include "migration/cpr.h"
+#include "migration/migration.h"
  #include "migration/misc.h"
  #include "migration/options.h"
  #include "migration/qemu-file.h"
@@ -132,7 +133,7 @@ int cpr_state_save(MigrationChannel *channel, Error **errp)
  {
  int ret;
  QEMUFile *f;
-    MigMode mode = migrate_mode();
+    MigMode mode = migrate_get_current()->parameters.mode;


Are we sure this can make coverity happy?


It should, based on Peter Maydell's analysis, but I would appreciate
if he could apply and test the fix.


Another more straightforward change is caching migrate mode in
qmp_migrate() and also check that before invoking cpr_state_save().


Surely anyone would consider my one-line change to be straightforward.



Given that Coverity complains about channel, and not mode, this is the
most direct fix:


diff --git a/migration/cpr.c b/migration/cpr.c
index 59644e8..224b6ff 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -160,6 +160,7 @@ int cpr_state_save(MigrationChannel *channel, Error **errp)
 trace_cpr_state_save(MigMode_str(mode));

 if (mode == MIG_MODE_CPR_TRANSFER) {
+g_assert(channel);
 f = cpr_transfer_output(channel, errp);
 } else {
 return 0;
---

- Steve






[PATCH v4 7/9] target/*: Remove TARGET_LONG_BITS from cpu-param.h

2025-02-07 Thread Richard Henderson
This is now handled by the configs/targets/*.mak fragment.

Reviewed-by: Thomas Huth 
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/alpha/cpu-param.h  | 2 --
 target/arm/cpu-param.h| 2 --
 target/avr/cpu-param.h| 1 -
 target/hexagon/cpu-param.h| 1 -
 target/hppa/cpu-param.h   | 2 --
 target/i386/cpu-param.h   | 2 --
 target/loongarch/cpu-param.h  | 1 -
 target/m68k/cpu-param.h   | 1 -
 target/microblaze/cpu-param.h | 2 --
 target/mips/cpu-param.h   | 5 -
 target/openrisc/cpu-param.h   | 1 -
 target/ppc/cpu-param.h| 2 --
 target/riscv/cpu-param.h  | 2 --
 target/rx/cpu-param.h | 1 -
 target/s390x/cpu-param.h  | 1 -
 target/sh4/cpu-param.h| 1 -
 target/sparc/cpu-param.h  | 2 --
 target/tricore/cpu-param.h| 1 -
 target/xtensa/cpu-param.h | 1 -
 19 files changed, 31 deletions(-)

diff --git a/target/alpha/cpu-param.h b/target/alpha/cpu-param.h
index c21ddf1afd..ff06e41497 100644
--- a/target/alpha/cpu-param.h
+++ b/target/alpha/cpu-param.h
@@ -8,8 +8,6 @@
 #ifndef ALPHA_CPU_PARAM_H
 #define ALPHA_CPU_PARAM_H
 
-#define TARGET_LONG_BITS 64
-
 /* ??? EV4 has 34 phys addr bits, EV5 has 40, EV6 has 44.  */
 #define TARGET_PHYS_ADDR_SPACE_BITS  44
 
diff --git a/target/arm/cpu-param.h b/target/arm/cpu-param.h
index bed29613c8..896b35bd6d 100644
--- a/target/arm/cpu-param.h
+++ b/target/arm/cpu-param.h
@@ -9,11 +9,9 @@
 #define ARM_CPU_PARAM_H
 
 #ifdef TARGET_AARCH64
-# define TARGET_LONG_BITS 64
 # define TARGET_PHYS_ADDR_SPACE_BITS  52
 # define TARGET_VIRT_ADDR_SPACE_BITS  52
 #else
-# define TARGET_LONG_BITS 32
 # define TARGET_PHYS_ADDR_SPACE_BITS  40
 # define TARGET_VIRT_ADDR_SPACE_BITS  32
 #endif
diff --git a/target/avr/cpu-param.h b/target/avr/cpu-param.h
index 93c2f470d0..81f3f49ee1 100644
--- a/target/avr/cpu-param.h
+++ b/target/avr/cpu-param.h
@@ -21,7 +21,6 @@
 #ifndef AVR_CPU_PARAM_H
 #define AVR_CPU_PARAM_H
 
-#define TARGET_LONG_BITS 32
 /*
  * TARGET_PAGE_BITS cannot be more than 8 bits because
  * 1.  all IO registers occupy [0x .. 0x00ff] address range, and they
diff --git a/target/hexagon/cpu-param.h b/target/hexagon/cpu-param.h
index 71b4a9b83e..45ee7b4640 100644
--- a/target/hexagon/cpu-param.h
+++ b/target/hexagon/cpu-param.h
@@ -19,7 +19,6 @@
 #define HEXAGON_CPU_PARAM_H
 
 #define TARGET_PAGE_BITS 16 /* 64K pages */
-#define TARGET_LONG_BITS 32
 
 #define TARGET_PHYS_ADDR_SPACE_BITS 36
 #define TARGET_VIRT_ADDR_SPACE_BITS 32
diff --git a/target/hppa/cpu-param.h b/target/hppa/cpu-param.h
index ef3200f0f3..7ed6b5741e 100644
--- a/target/hppa/cpu-param.h
+++ b/target/hppa/cpu-param.h
@@ -8,8 +8,6 @@
 #ifndef HPPA_CPU_PARAM_H
 #define HPPA_CPU_PARAM_H
 
-#define TARGET_LONG_BITS  64
-
 #if defined(CONFIG_USER_ONLY) && defined(TARGET_ABI32)
 # define TARGET_PHYS_ADDR_SPACE_BITS  32
 # define TARGET_VIRT_ADDR_SPACE_BITS  32
diff --git a/target/i386/cpu-param.h b/target/i386/cpu-param.h
index 8c75abe141..b0e884c5d7 100644
--- a/target/i386/cpu-param.h
+++ b/target/i386/cpu-param.h
@@ -9,7 +9,6 @@
 #define I386_CPU_PARAM_H
 
 #ifdef TARGET_X86_64
-# define TARGET_LONG_BITS 64
 # define TARGET_PHYS_ADDR_SPACE_BITS  52
 /*
  * ??? This is really 48 bits, sign-extended, but the only thing
@@ -18,7 +17,6 @@
  */
 # define TARGET_VIRT_ADDR_SPACE_BITS  47
 #else
-# define TARGET_LONG_BITS 32
 # define TARGET_PHYS_ADDR_SPACE_BITS  36
 # define TARGET_VIRT_ADDR_SPACE_BITS  32
 #endif
diff --git a/target/loongarch/cpu-param.h b/target/loongarch/cpu-param.h
index db5ad1c69f..52437946e5 100644
--- a/target/loongarch/cpu-param.h
+++ b/target/loongarch/cpu-param.h
@@ -8,7 +8,6 @@
 #ifndef LOONGARCH_CPU_PARAM_H
 #define LOONGARCH_CPU_PARAM_H
 
-#define TARGET_LONG_BITS 64
 #define TARGET_PHYS_ADDR_SPACE_BITS 48
 #define TARGET_VIRT_ADDR_SPACE_BITS 48
 
diff --git a/target/m68k/cpu-param.h b/target/m68k/cpu-param.h
index 5bbe623ba7..7afbf6d302 100644
--- a/target/m68k/cpu-param.h
+++ b/target/m68k/cpu-param.h
@@ -8,7 +8,6 @@
 #ifndef M68K_CPU_PARAM_H
 #define M68K_CPU_PARAM_H
 
-#define TARGET_LONG_BITS 32
 /*
  * Coldfire Linux uses 8k pages
  * and m68k linux uses 4k pages
diff --git a/target/microblaze/cpu-param.h b/target/microblaze/cpu-param.h
index 00efb509e3..c866ec6c14 100644
--- a/target/microblaze/cpu-param.h
+++ b/target/microblaze/cpu-param.h
@@ -17,11 +17,9 @@
  * of address space.
  */
 #ifdef CONFIG_USER_ONLY
-#define TARGET_LONG_BITS 32
 #define TARGET_PHYS_ADDR_SPACE_BITS 32
 #define TARGET_VIRT_ADDR_SPACE_BITS 32
 #else
-#define TARGET_LONG_BITS 64
 #define TARGET_PHYS_ADDR_SPACE_BITS 64
 #define TARGET_VIRT_ADDR_SPACE_BITS 64
 #endif
diff --git a/target/mips/cpu-param.h b/target/mips/cpu-param.h
index f3a37e2dbe..11b3ac0ac6 100644
--- a/target/mips/cpu-param.h
+++ b/target/mips/cpu-param.h
@@ -7,11 +7,6 @@
 #ifndef MIPS_CPU_PARAM_H
 #define MIPS_CPU_PARAM_H
 
-#ifdef TARGET_MIP

[PATCH v4 2/9] meson: Disallow 64-bit on 32-bit KVM emulation

2025-02-07 Thread Richard Henderson
Require a 64-bit host binary to spawn a 64-bit guest.

Reviewed-by: Thomas Huth 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 meson.build | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/meson.build b/meson.build
index e50a103f8a..1af8aeb194 100644
--- a/meson.build
+++ b/meson.build
@@ -277,21 +277,27 @@ else
   host_arch = cpu
 endif
 
-if cpu in ['x86', 'x86_64']
+if cpu == 'x86'
+  kvm_targets = ['i386-softmmu']
+elif cpu == 'x86_64'
   kvm_targets = ['i386-softmmu', 'x86_64-softmmu']
 elif cpu == 'aarch64'
   kvm_targets = ['aarch64-softmmu']
 elif cpu == 's390x'
   kvm_targets = ['s390x-softmmu']
-elif cpu in ['ppc', 'ppc64']
+elif cpu == 'ppc'
+  kvm_targets = ['ppc-softmmu']
+elif cpu == 'ppc64'
   kvm_targets = ['ppc-softmmu', 'ppc64-softmmu']
-elif cpu in ['mips', 'mips64']
+elif cpu == 'mips'
+  kvm_targets = ['mips-softmmu', 'mipsel-softmmu']
+elif cpu == 'mips64'
   kvm_targets = ['mips-softmmu', 'mipsel-softmmu', 'mips64-softmmu', 
'mips64el-softmmu']
-elif cpu in ['riscv32']
+elif cpu == 'riscv32'
   kvm_targets = ['riscv32-softmmu']
-elif cpu in ['riscv64']
+elif cpu == 'riscv64'
   kvm_targets = ['riscv64-softmmu']
-elif cpu in ['loongarch64']
+elif cpu == 'loongarch64'
   kvm_targets = ['loongarch64-softmmu']
 else
   kvm_targets = []
-- 
2.43.0




[PATCH v4 6/9] configure: Define TARGET_LONG_BITS in configs/targets/*.mak

2025-02-07 Thread Richard Henderson
Define TARGET_LONG_BITS in each target's configure fragment.
Do this without removing the define in target/*/cpu-param.h
so that errors are caught like so:

In file included from .../src/include/exec/cpu-defs.h:26,
 from ../src/target/hppa/cpu.h:24,
 from ../src/linux-user/qemu.h:4,
 from ../src/linux-user/hppa/cpu_loop.c:21:
../src/target/hppa/cpu-param.h:11: error: "TARGET_LONG_BITS" redefined [-Werror]
   11 | #define TARGET_LONG_BITS  64
  |
In file included from .../src/include/qemu/osdep.h:36,
 from ../src/linux-user/hppa/cpu_loop.c:20:
./hppa-linux-user-config-target.h:32: note: this is the location of the 
previous definition
   32 | #define TARGET_LONG_BITS 32
  |
cc1: all warnings being treated as errors

Reviewed-by: Thomas Huth 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 configs/targets/aarch64-bsd-user.mak| 1 +
 configs/targets/aarch64-linux-user.mak  | 1 +
 configs/targets/aarch64-softmmu.mak | 1 +
 configs/targets/aarch64_be-linux-user.mak   | 1 +
 configs/targets/alpha-linux-user.mak| 1 +
 configs/targets/alpha-softmmu.mak   | 1 +
 configs/targets/arm-bsd-user.mak| 1 +
 configs/targets/arm-linux-user.mak  | 1 +
 configs/targets/arm-softmmu.mak | 1 +
 configs/targets/armeb-linux-user.mak| 1 +
 configs/targets/avr-softmmu.mak | 1 +
 configs/targets/hexagon-linux-user.mak  | 1 +
 configs/targets/hppa-linux-user.mak | 2 ++
 configs/targets/hppa-softmmu.mak| 1 +
 configs/targets/i386-bsd-user.mak   | 1 +
 configs/targets/i386-linux-user.mak | 1 +
 configs/targets/i386-softmmu.mak| 1 +
 configs/targets/loongarch64-linux-user.mak  | 1 +
 configs/targets/loongarch64-softmmu.mak | 1 +
 configs/targets/m68k-linux-user.mak | 1 +
 configs/targets/m68k-softmmu.mak| 1 +
 configs/targets/microblaze-linux-user.mak   | 1 +
 configs/targets/microblaze-softmmu.mak  | 3 +++
 configs/targets/microblazeel-linux-user.mak | 1 +
 configs/targets/microblazeel-softmmu.mak| 3 +++
 configs/targets/mips-linux-user.mak | 1 +
 configs/targets/mips-softmmu.mak| 1 +
 configs/targets/mips64-linux-user.mak   | 1 +
 configs/targets/mips64-softmmu.mak  | 1 +
 configs/targets/mips64el-linux-user.mak | 1 +
 configs/targets/mips64el-softmmu.mak| 1 +
 configs/targets/mipsel-linux-user.mak   | 1 +
 configs/targets/mipsel-softmmu.mak  | 1 +
 configs/targets/mipsn32-linux-user.mak  | 1 +
 configs/targets/mipsn32el-linux-user.mak| 1 +
 configs/targets/or1k-linux-user.mak | 1 +
 configs/targets/or1k-softmmu.mak| 1 +
 configs/targets/ppc-linux-user.mak  | 1 +
 configs/targets/ppc-softmmu.mak | 1 +
 configs/targets/ppc64-linux-user.mak| 1 +
 configs/targets/ppc64-softmmu.mak   | 1 +
 configs/targets/ppc64le-linux-user.mak  | 1 +
 configs/targets/riscv32-linux-user.mak  | 1 +
 configs/targets/riscv32-softmmu.mak | 1 +
 configs/targets/riscv64-bsd-user.mak| 1 +
 configs/targets/riscv64-linux-user.mak  | 1 +
 configs/targets/riscv64-softmmu.mak | 1 +
 configs/targets/rx-softmmu.mak  | 1 +
 configs/targets/s390x-linux-user.mak| 1 +
 configs/targets/s390x-softmmu.mak   | 1 +
 configs/targets/sh4-linux-user.mak  | 1 +
 configs/targets/sh4-softmmu.mak | 1 +
 configs/targets/sh4eb-linux-user.mak| 1 +
 configs/targets/sh4eb-softmmu.mak   | 1 +
 configs/targets/sparc-linux-user.mak| 1 +
 configs/targets/sparc-softmmu.mak   | 1 +
 configs/targets/sparc32plus-linux-user.mak  | 1 +
 configs/targets/sparc64-linux-user.mak  | 1 +
 configs/targets/sparc64-softmmu.mak | 1 +
 configs/targets/tricore-softmmu.mak | 1 +
 configs/targets/x86_64-bsd-user.mak | 1 +
 configs/targets/x86_64-linux-user.mak   | 1 +
 configs/targets/x86_64-softmmu.mak  | 1 +
 configs/targets/xtensa-linux-user.mak   | 1 +
 configs/targets/xtensa-softmmu.mak  | 1 +
 configs/targets/xtensaeb-linux-user.mak | 1 +
 configs/targets/xtensaeb-softmmu.mak| 1 +
 67 files changed, 72 insertions(+)

diff --git a/configs/targets/aarch64-bsd-user.mak 
b/configs/targets/aarch64-bsd-user.mak
index 8aaa5d8c80..f99c73377a 100644
--- a/configs/targets/aarch64-bsd-user.mak
+++ b/configs/targets/aarch64-bsd-user.mak
@@ -1,3 +1,4 @@
 TARGET_ARCH=aarch64
 TARGET_BASE_ARCH=arm
 TARGET_XML_FILES= gdb-xml/aarch64-core.xml gdb-xml/aarch64-fpu.xml 
gdb-xml/aarch64-pauth.xml
+TARGET_LONG_BITS=64
diff --git a/configs/targets/aarch64-linux-user.mak 
b/configs/targets/aarch64-linux-user.mak
index 4c6570f56a..b779ac3b4a 100644
--- a/configs/targets/aarch64-linux-user.mak
+++ b/configs/targets/aarch64-linux-user.mak
@@ -6,3 +6,4 @@ CONFIG_

Re: [PATCH v5 5/5] tests/qtest/migration: consolidate set capabilities

2025-02-07 Thread Fabiano Rosas
Prasad Pandit  writes:

> From: Prasad Pandit 
>
> Migration capabilities are set in multiple '.start_hook'
> functions for various tests. Instead, consolidate setting
> capabilities in 'set_migration_capabilities()' function
> which is called from various 'test_*_common()' functions.
> While simplifying the capabilities setting, it helps
> to declutter the test sources.
>
> Suggested-by: Fabiano Rosas 
> Signed-off-by: Prasad Pandit 
> ---
>  tests/qtest/migration/compression-tests.c |  7 +--
>  tests/qtest/migration/cpr-tests.c |  4 +-
>  tests/qtest/migration/file-tests.c| 44 +---
>  tests/qtest/migration/framework.c | 63 ---
>  tests/qtest/migration/framework.h |  8 ++-
>  tests/qtest/migration/postcopy-tests.c| 10 ++--
>  tests/qtest/migration/precopy-tests.c | 19 +++

Isn't there a 16-channel multifd setup in this file? I don't see it in
this patch.

>  tests/qtest/migration/tls-tests.c | 11 ++--
>  8 files changed, 79 insertions(+), 87 deletions(-)
>
> diff --git a/tests/qtest/migration/compression-tests.c 
> b/tests/qtest/migration/compression-tests.c
> index 3252ba2f73..13a2b2d74f 100644
> --- a/tests/qtest/migration/compression-tests.c
> +++ b/tests/qtest/migration/compression-tests.c
> @@ -43,7 +43,7 @@ static void test_multifd_tcp_zstd(void)
>  static void test_multifd_postcopy_tcp_zstd(void)
>  {
>  MigrateCommon args = {
> -.postcopy_ram = true,
> +.caps[MIGRATION_CAPABILITY_POSTCOPY_RAM] = true,
>  .listen_uri = "defer",
>  .start_hook = migrate_hook_start_precopy_tcp_multifd_zstd,
>  };
> @@ -114,10 +114,6 @@ migrate_hook_start_xbzrle(QTestState *from,
>QTestState *to)
>  {
>  migrate_set_parameter_int(from, "xbzrle-cache-size", 33554432);
> -
> -migrate_set_capability(from, "xbzrle", true);
> -migrate_set_capability(to, "xbzrle", true);
> -
>  return NULL;
>  }
>  
> @@ -129,6 +125,7 @@ static void test_precopy_unix_xbzrle(void)
>  .listen_uri = uri,
>  .start_hook = migrate_hook_start_xbzrle,
>  .iterations = 2,
> +.caps[MIGRATION_CAPABILITY_XBZRLE] = true,
>  /*
>   * XBZRLE needs pages to be modified when doing the 2nd+ round
>   * iteration to have real data pushed to the stream.
> diff --git a/tests/qtest/migration/cpr-tests.c 
> b/tests/qtest/migration/cpr-tests.c
> index 44ce89aa5b..818fa95133 100644
> --- a/tests/qtest/migration/cpr-tests.c
> +++ b/tests/qtest/migration/cpr-tests.c
> @@ -24,9 +24,6 @@ static void *migrate_hook_start_mode_reboot(QTestState 
> *from, QTestState *to)
>  migrate_set_parameter_str(from, "mode", "cpr-reboot");
>  migrate_set_parameter_str(to, "mode", "cpr-reboot");
>  
> -migrate_set_capability(from, "x-ignore-shared", true);
> -migrate_set_capability(to, "x-ignore-shared", true);
> -
>  return NULL;
>  }
>  
> @@ -39,6 +36,7 @@ static void test_mode_reboot(void)
>  .connect_uri = uri,
>  .listen_uri = "defer",
>  .start_hook = migrate_hook_start_mode_reboot,
> +.caps[MIGRATION_CAPABILITY_X_IGNORE_SHARED] = true,
>  };
>  
>  test_file_common(&args, true);
> diff --git a/tests/qtest/migration/file-tests.c 
> b/tests/qtest/migration/file-tests.c
> index 84225c8c33..bc551949f9 100644
> --- a/tests/qtest/migration/file-tests.c
> +++ b/tests/qtest/migration/file-tests.c
> @@ -107,15 +107,6 @@ static void test_precopy_file_offset_bad(void)
>  test_file_common(&args, false);
>  }
>  
> -static void *migrate_hook_start_mapped_ram(QTestState *from,
> -   QTestState *to)
> -{
> -migrate_set_capability(from, "mapped-ram", true);
> -migrate_set_capability(to, "mapped-ram", true);
> -
> -return NULL;
> -}
> -
>  static void test_precopy_file_mapped_ram_live(void)
>  {
>  g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
> @@ -123,7 +114,7 @@ static void test_precopy_file_mapped_ram_live(void)
>  MigrateCommon args = {
>  .connect_uri = uri,
>  .listen_uri = "defer",
> -.start_hook = migrate_hook_start_mapped_ram,
> +.caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true,
>  };
>  
>  test_file_common(&args, false);
> @@ -136,26 +127,12 @@ static void test_precopy_file_mapped_ram(void)
>  MigrateCommon args = {
>  .connect_uri = uri,
>  .listen_uri = "defer",
> -.start_hook = migrate_hook_start_mapped_ram,
> +.caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true,
>  };
>  
>  test_file_common(&args, true);
>  }
>  
> -static void *migrate_hook_start_multifd_mapped_ram(QTestState *from,
> -   QTestState *to)
> -{
> -migrate_hook_start_mapped_ram(from, to);
> -
> -migrate_set_parameter_int(from, "multifd-channels", 4);
> -migrate_set_parameter_int(to, "multifd-channels", 4);
> -
> -migrate

[PULL 1/6] hw/char: Add emulation of Diva GSP PCI management boards

2025-02-07 Thread deller
From: Helge Deller 

The Diva GSP ("Guardian Service Processor") PCI boards are Remote
Management cards for PA-RISC machines.  They come with built-in 16550A
UARTs for serial consoles and modem functionalities, as well as a
mailbox-like memory area for hardware auto-reboot functionality.

Latest generation HP PA-RISC server machines use those Diva cards
for console output.

Signed-off-by: Helge Deller 
---
 MAINTAINERS |   1 +
 hw/char/Kconfig |   3 +
 hw/char/diva-gsp.c  | 297 
 hw/char/meson.build |   1 +
 4 files changed, 302 insertions(+)
 create mode 100644 hw/char/diva-gsp.c

diff --git a/MAINTAINERS b/MAINTAINERS
index bf737eb6db..e09a8d2791 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1193,6 +1193,7 @@ M: Richard Henderson 
 M: Helge Deller 
 S: Maintained
 F: configs/devices/hppa-softmmu/default.mak
+F: hw/char/diva-gsp.c
 F: hw/display/artist.c
 F: hw/hppa/
 F: hw/input/lasips2.c
diff --git a/hw/char/Kconfig b/hw/char/Kconfig
index 1dc20ee4c2..3f702565e6 100644
--- a/hw/char/Kconfig
+++ b/hw/char/Kconfig
@@ -66,6 +66,9 @@ config RENESAS_SCI
 config AVR_USART
 bool
 
+config DIVA_GSP
+bool
+
 config MCHP_PFSOC_MMUART
 bool
 select SERIAL
diff --git a/hw/char/diva-gsp.c b/hw/char/diva-gsp.c
new file mode 100644
index 00..ecec1f7bb1
--- /dev/null
+++ b/hw/char/diva-gsp.c
@@ -0,0 +1,297 @@
+/*
+ * HP Diva GSP controller
+ *
+ * The Diva PCI boards are Remote Management cards for PA-RISC machines.
+ * They come with built-in 16550A multi UARTs for serial consoles
+ * and a mailbox-like memory area for hardware auto-reboot functionality.
+ * GSP stands for "Guardian Service Processor". Later products were marketed
+ * "Management Processor" (MP).
+ *
+ * Diva cards are multifunctional cards. The first part, the aux port,
+ * is not usable on physical machines, but we still try to mimic it here.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Copyright (c) 2025 Helge Deller 
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "hw/char/serial.h"
+#include "hw/irq.h"
+#include "hw/pci/pci_device.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-properties-system.h"
+#include "migration/vmstate.h"
+
+#define PCI_DEVICE_ID_HP_DIVA   0x1048
+/* various DIVA GSP cards: */
+#define PCI_DEVICE_ID_HP_DIVA_TOSCA10x1049
+#define PCI_DEVICE_ID_HP_DIVA_TOSCA20x104A
+#define PCI_DEVICE_ID_HP_DIVA_MAESTRO   0x104B
+#define PCI_DEVICE_ID_HP_REO_IOC0x10f1
+#define PCI_DEVICE_ID_HP_DIVA_HALFDOME  0x1223
+#define PCI_DEVICE_ID_HP_DIVA_KEYSTONE  0x1226
+#define PCI_DEVICE_ID_HP_DIVA_POWERBAR  0x1227
+#define PCI_DEVICE_ID_HP_DIVA_EVEREST   0x1282
+#define PCI_DEVICE_ID_HP_DIVA_AUX   0x1290
+#define PCI_DEVICE_ID_HP_DIVA_RMP3  0x1301
+#define PCI_DEVICE_ID_HP_DIVA_HURRICANE 0x132a
+
+
+#define PCI_SERIAL_MAX_PORTS 4
+
+typedef struct PCIDivaSerialState {
+PCIDevicedev;
+MemoryRegion membar;/* for serial ports */
+MemoryRegion mailboxbar;/* for hardware mailbox */
+uint32_t subvendor;
+uint32_t ports;
+char *name[PCI_SERIAL_MAX_PORTS];
+SerialState  state[PCI_SERIAL_MAX_PORTS];
+uint32_t level[PCI_SERIAL_MAX_PORTS];
+qemu_irq *irqs;
+uint8_t  prog_if;
+bool disable;
+} PCIDivaSerialState;
+
+static void diva_pci_exit(PCIDevice *dev)
+{
+PCIDivaSerialState *pci = DO_UPCAST(PCIDivaSerialState, dev, dev);
+SerialState *s;
+int i;
+
+for (i = 0; i < pci->ports; i++) {
+s = pci->state + i;
+qdev_unrealize(DEVICE(s));
+memory_region_del_subregion(&pci->membar, &s->io);
+g_free(pci->name[i]);
+}
+qemu_free_irqs(pci->irqs, pci->ports);
+}
+
+static void multi_serial_irq_mux(void *opaque, int n, int level)
+{
+PCIDivaSerialState *pci = opaque;
+int i, pending = 0;
+
+pci->level[n] = level;
+for (i = 0; i < pci->ports; i++) {
+if (pci->level[i]) {
+pending = 1;
+}
+}
+pci_set_irq(&pci->dev, pending);
+}
+
+struct diva_info {
+unsigned int nports:4; /* number of serial ports */
+unsigned int omask:12; /* offset mask: BIT(1) -> offset 8 */
+};
+
+static struct diva_info diva_get_diva_info(PCIDeviceClass *pc)
+{
+switch (pc->subsystem_id) {
+case PCI_DEVICE_ID_HP_DIVA_POWERBAR:
+case PCI_DEVICE_ID_HP_DIVA_HURRICANE:
+return (struct diva_info) { .nports = 1,
+.omask = BIT(0) };
+case PCI_DEVICE_ID_HP_DIVA_TOSCA2:
+return (struct diva_info) { .nports = 2,
+.omask = BIT(0) | BIT(1) };
+case PCI_DEVICE_ID_HP_DIVA_TOSCA1:
+case PCI_DEVICE_ID_HP_DIVA_HALFDOME:
+case PCI_DEVICE_ID_HP_DIVA_KEYSTONE:
+return (struct diva_info) { .nports = 3,
+.omask = BIT(0) | BIT(1) | BIT(2) };
+case PCI_DEVICE_ID_HP_DIVA_EVEREST: /* e.g. in rp3410 */
+return (struct diva_info) {

[PULL 2/6] hw/hppa: Wire up Diva GSP card

2025-02-07 Thread deller
From: Helge Deller 

Until now we used a standard serial-pci device to emulate a HP serial
console.  This worked nicely with 32-bit Linux and 32-bit HP-UX, but
64-bit HP-UX crashes with it and expects either a Diva GSP card or a real
64-bit capable PCI graphics card (which we don't have yet).
In order to continue with 64-bit HP-UX, switch over to the recently
added Diva GSP card emulation.

Signed-off-by: Helge Deller 
---
 hw/hppa/Kconfig   |  1 +
 hw/hppa/machine.c | 31 +++
 2 files changed, 12 insertions(+), 20 deletions(-)

diff --git a/hw/hppa/Kconfig b/hw/hppa/Kconfig
index 9312c4294a..cab21045de 100644
--- a/hw/hppa/Kconfig
+++ b/hw/hppa/Kconfig
@@ -11,6 +11,7 @@ config HPPA_B160L
 select LASI
 select SERIAL_MM
 select SERIAL_PCI
+select DIVA_GSP
 select ISA_BUS
 select I8259
 select IDE_CMD646
diff --git a/hw/hppa/machine.c b/hw/hppa/machine.c
index b6135d9526..9c98b4c229 100644
--- a/hw/hppa/machine.c
+++ b/hw/hppa/machine.c
@@ -383,26 +383,17 @@ static void machine_HP_common_init_tail(MachineState 
*machine, PCIBus *pci_bus,
 
 pci_init_nic_devices(pci_bus, mc->default_nic);
 
-/* BMC board: HP Powerbar SP2 Diva (with console only) */
-pci_dev = pci_new(-1, "pci-serial");
-if (!lasi_dev) {
-/* bind default keyboard/serial to Diva card */
-qdev_prop_set_chr(DEVICE(pci_dev), "chardev", serial_hd(0));
-}
-qdev_prop_set_uint8(DEVICE(pci_dev), "prog_if", 0);
-pci_realize_and_unref(pci_dev, pci_bus, &error_fatal);
-pci_config_set_vendor_id(pci_dev->config, PCI_VENDOR_ID_HP);
-pci_config_set_device_id(pci_dev->config, 0x1048);
-pci_set_word(&pci_dev->config[PCI_SUBSYSTEM_VENDOR_ID], PCI_VENDOR_ID_HP);
-pci_set_word(&pci_dev->config[PCI_SUBSYSTEM_ID], 0x1227); /* Powerbar */
-
-/* create a second serial PCI card when running Astro */
-if (serial_hd(1) && !lasi_dev) {
-pci_dev = pci_new(-1, "pci-serial-4x");
-qdev_prop_set_chr(DEVICE(pci_dev), "chardev1", serial_hd(1));
-qdev_prop_set_chr(DEVICE(pci_dev), "chardev2", serial_hd(2));
-qdev_prop_set_chr(DEVICE(pci_dev), "chardev3", serial_hd(3));
-qdev_prop_set_chr(DEVICE(pci_dev), "chardev4", serial_hd(4));
+/* BMC board: HP Diva GSP */
+dev = qdev_new("diva-gsp");
+if (!object_property_get_bool(OBJECT(dev), "disable", NULL)) {
+pci_dev = pci_new_multifunction(PCI_DEVFN(2, 0), "diva-gsp");
+if (!lasi_dev) {
+/* bind default keyboard/serial to Diva card */
+qdev_prop_set_chr(DEVICE(pci_dev), "chardev1", serial_hd(0));
+qdev_prop_set_chr(DEVICE(pci_dev), "chardev2", serial_hd(1));
+qdev_prop_set_chr(DEVICE(pci_dev), "chardev3", serial_hd(2));
+qdev_prop_set_chr(DEVICE(pci_dev), "chardev4", serial_hd(3));
+}
 pci_realize_and_unref(pci_dev, pci_bus, &error_fatal);
 }
 
-- 
2.47.0




Re: [PATCH 1/2] i386/xen: Move KVM_XEN_HVM_CONFIG ioctl to kvm_xen_init_vcpu()

2025-02-07 Thread David Woodhouse
On 7 February 2025 15:37:40 GMT, Sean Christopherson  wrote:
>On Fri, Feb 07, 2025, David Woodhouse wrote:
>> From: David Woodhouse 
>> 
>> At the time kvm_xen_init() is called, hyperv_enabled() doesn't yet work, so
>> the correct MSR index to use for the hypercall page isn't known.
>> 
>> Rather than setting it to the default and then shifting it later for the
>> Hyper-V case with a confusing second call to kvm_init_xen(), just do it
>> once in kvm_xen_init_vcpu().
>
>Is it possible the funky double-init is deliberate, to ensure that Xen is
>configured in KVM during VM setup?  I looked through KVM and didn't see any
>obvious dependencies, but that doesn't mean a whole lot.

I am fairly sure there are no such dependencies. It was just this way because 
shifting the MSR to accommodate Hyper-V (and making kvm_xen_init() idempotent 
in order to do so) was an afterthought. In retrospect, I should have done it 
this way from the start. It's cleaner. And you don't require as much caffeine 
to understand it :)



[PULL 6/6] target/hppa: Update SeaBIOS-hppa

2025-02-07 Thread deller
From: Helge Deller 

Update to the latest SeaBIOS-hppa, which sets up the
LMMIO range for the internal artist graphics card.

Signed-off-by: Helge Deller 
---
 roms/seabios-hppa | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/roms/seabios-hppa b/roms/seabios-hppa
index 1c516b4813..3391c58096 16
--- a/roms/seabios-hppa
+++ b/roms/seabios-hppa
@@ -1 +1 @@
-Subproject commit 1c516b481339f511d83a4afba9a48d1ac904e93e
+Subproject commit 3391c580960febcb9fa8f686f9666adaa462c349
-- 
2.47.0




[PULL 4/6] hw/hppa: Avoid creation of artist if disabled on command line

2025-02-07 Thread deller
From: Helge Deller 

Do not create the artist graphics card if the user disabled it
with "-global artist.disable=true" on the command line.

Signed-off-by: Helge Deller 
---
 hw/hppa/machine.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/hw/hppa/machine.c b/hw/hppa/machine.c
index 9c98b4c229..c5f247633e 100644
--- a/hw/hppa/machine.c
+++ b/hw/hppa/machine.c
@@ -366,12 +366,15 @@ static void machine_HP_common_init_tail(MachineState 
*machine, PCIBus *pci_bus,
 
 /* Graphics setup. */
 if (machine->enable_graphics && vga_interface_type != VGA_NONE) {
-vga_interface_created = true;
 dev = qdev_new("artist");
 s = SYS_BUS_DEVICE(dev);
-sysbus_realize_and_unref(s, &error_fatal);
-sysbus_mmio_map(s, 0, translate(NULL, LASI_GFX_HPA));
-sysbus_mmio_map(s, 1, translate(NULL, ARTIST_FB_ADDR));
+bool disabled = object_property_get_bool(OBJECT(dev), "disable", NULL);
+if (!disabled) {
+sysbus_realize_and_unref(s, &error_fatal);
+vga_interface_created = true;
+sysbus_mmio_map(s, 0, translate(NULL, LASI_GFX_HPA));
+sysbus_mmio_map(s, 1, translate(NULL, ARTIST_FB_ADDR));
+}
 }
 
 /* Network setup. */
-- 
2.47.0




[PULL 5/6] hw/pci-host/astro: Add LMMIO range support

2025-02-07 Thread deller
From: Helge Deller 

Each Astro on 64-bit machines supports up to four LMMIO regions.
Those regions are used by graphics cards and other PCI devices that
need to map huge memory areas. The LMMIO regions are configured and
set up by SeaBIOS-hppa and then used as-is by the operating systems
(Linux, HP-UX).

With this addition it is now possible to add other PCI graphics
cards on the command line, e.g. with "-device ati-vga".

Signed-off-by: Helge Deller 
---
 hw/pci-host/astro.c | 52 +
 include/hw/pci-host/astro.h |  6 ++---
 2 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/hw/pci-host/astro.c b/hw/pci-host/astro.c
index 62e9c8acbf..039cc3ad01 100644
--- a/hw/pci-host/astro.c
+++ b/hw/pci-host/astro.c
@@ -521,6 +521,53 @@ static ElroyState *elroy_init(int num)
  * Astro Runway chip.
  */
 
+static void adjust_LMMIO_DIRECT_mapping(AstroState *s, unsigned int reg_index)
+{
+MemoryRegion *lmmio_alias;
+unsigned int lmmio_index, map_route;
+hwaddr map_addr;
+uint32_t map_size;
+struct ElroyState *elroy;
+
+/* pointer to LMMIO_DIRECT entry */
+lmmio_index = reg_index / 3;
+lmmio_alias = &s->lmmio_direct[lmmio_index];
+
+map_addr  = s->ioc_ranges[3 * lmmio_index + 0];
+map_size  = s->ioc_ranges[3 * lmmio_index + 1];
+map_route = s->ioc_ranges[3 * lmmio_index + 2];
+
+/* find elroy to which this address is routed */
+map_route &= (ELROY_NUM - 1);
+elroy = s->elroy[map_route];
+
+if (lmmio_alias->enabled) {
+memory_region_set_enabled(lmmio_alias, false);
+}
+
+map_addr = F_EXTEND(map_addr);
+map_addr &= TARGET_PAGE_MASK;
+map_size = (~map_size) + 1;
+map_size &= TARGET_PAGE_MASK;
+
+/* exit if disabled or zero map size */
+if (!(map_addr & 1) || !map_size) {
+return;
+}
+
+if (!memory_region_size(lmmio_alias)) {
+memory_region_init_alias(lmmio_alias, OBJECT(elroy),
+"pci-lmmmio-alias", &elroy->pci_mmio,
+(uint32_t) map_addr, map_size);
+memory_region_add_subregion(get_system_memory(), map_addr,
+ lmmio_alias);
+} else {
+memory_region_set_alias_offset(lmmio_alias, map_addr);
+memory_region_set_size(lmmio_alias, map_size);
+memory_region_set_enabled(lmmio_alias, true);
+}
+}
+
 static MemTxResult astro_chip_read_with_attrs(void *opaque, hwaddr addr,
  uint64_t *data, unsigned size,
  MemTxAttrs attrs)
@@ -628,6 +675,11 @@ static MemTxResult astro_chip_write_with_attrs(void 
*opaque, hwaddr addr,
 break;
 case 0x0300 ... 0x03d8 - 1: /* LMMIO_DIRECT0_BASE... */
 put_val_in_arrary(s->ioc_ranges, 0x300, addr, size, val);
+unsigned int index = (addr - 0x300) / 8;
+/* check if one of the 4 LMMIO_DIRECT regs, each using 3 entries. */
+if (index < LMMIO_DIRECT_RANGES * 3) {
+adjust_LMMIO_DIRECT_mapping(s, index);
+}
 break;
 case 0x10200:
 case 0x10220:
diff --git a/include/hw/pci-host/astro.h b/include/hw/pci-host/astro.h
index e2966917cd..832125a05a 100644
--- a/include/hw/pci-host/astro.h
+++ b/include/hw/pci-host/astro.h
@@ -24,6 +24,8 @@ OBJECT_DECLARE_SIMPLE_TYPE(ElroyState, ELROY_PCI_HOST_BRIDGE)
 #define LMMIO_DIST_BASE_ADDR  0xf400ULL
 #define LMMIO_DIST_BASE_SIZE   0x400ULL
 
+#define LMMIO_DIRECT_RANGES 4
+
 #define IOS_DIST_BASE_ADDR  0xfffee0ULL
 #define IOS_DIST_BASE_SIZE   0x1ULL
 
@@ -83,9 +85,7 @@ struct AstroState {
 struct ElroyState *elroy[ELROY_NUM];
 
 MemoryRegion this_mem;
-
-MemoryRegion pci_mmio;
-MemoryRegion pci_io;
+MemoryRegion lmmio_direct[LMMIO_DIRECT_RANGES];
 
 IOMMUMemoryRegion iommu;
 AddressSpace iommu_as;
-- 
2.47.0




[PATCH v4 3/9] meson: Disallow 64-bit on 32-bit Xen emulation

2025-02-07 Thread Richard Henderson
Require a 64-bit host binary to spawn a 64-bit guest.

Reviewed-by: Thomas Huth 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 meson.build | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index 1af8aeb194..911955cfa8 100644
--- a/meson.build
+++ b/meson.build
@@ -304,9 +304,14 @@ else
 endif
 accelerator_targets = { 'CONFIG_KVM': kvm_targets }
 
-if cpu in ['x86', 'x86_64']
+if cpu == 'x86'
+  xen_targets = ['i386-softmmu']
+elif cpu == 'x86_64'
   xen_targets = ['i386-softmmu', 'x86_64-softmmu']
-elif cpu in ['arm', 'aarch64']
+elif cpu == 'arm'
+  # i386 emulator provides xenpv machine type for multiple architectures
+  xen_targets = ['i386-softmmu']
+elif cpu == 'aarch64'
   # i386 emulator provides xenpv machine type for multiple architectures
   xen_targets = ['i386-softmmu', 'x86_64-softmmu', 'aarch64-softmmu']
 else
-- 
2.43.0




[PATCH v4 4/9] meson: Disallow 64-bit on 32-bit HVF/NVMM/WHPX emulation

2025-02-07 Thread Richard Henderson
Require a 64-bit host binary to spawn a 64-bit guest.

For HVF this is trivially true because macOS 11 dropped
support for 32-bit applications entirely.

For NVMM, NetBSD only enables nvmm on x86_64:
  
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dev/nvmm/Makefile?rev=1.1.6.2;content-type=text%2Fplain

For WHPX, we have already dropped support for 32-bit Windows.

Signed-off-by: Richard Henderson 
---
 meson.build | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/meson.build b/meson.build
index 911955cfa8..85317cd63f 100644
--- a/meson.build
+++ b/meson.build
@@ -319,13 +319,11 @@ else
 endif
 accelerator_targets += { 'CONFIG_XEN': xen_targets }
 
-if cpu in ['aarch64']
+if cpu == 'aarch64'
   accelerator_targets += {
 'CONFIG_HVF': ['aarch64-softmmu']
   }
-endif
-
-if cpu in ['x86', 'x86_64']
+elif cpu == 'x86_64'
   accelerator_targets += {
 'CONFIG_HVF': ['x86_64-softmmu'],
 'CONFIG_NVMM': ['i386-softmmu', 'x86_64-softmmu'],
-- 
2.43.0




[PATCH v4 8/9] meson: Disallow 64-bit on 32-bit emulation

2025-02-07 Thread Richard Henderson
For system mode, we can rarely support the amount of RAM that
the guest requires. TCG emulation is restricted to round-robin
mode, which solves many of the atomicity issues, but not those
associated with virtio.  In any case, round-robin does nothing
to help the speed of emulation.

For user mode, most emulation does not succeed at all.  Most
of the time we cannot even load 64-bit non-PIE binaries due
to lack of a 64-bit address space.  Threads are run in
parallel, not round-robin, which means that atomicity
is not handled.

Reviewed-by: Thomas Huth 
Signed-off-by: Richard Henderson 
---
 meson.build | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/meson.build b/meson.build
index 85317cd63f..ec51827f40 100644
--- a/meson.build
+++ b/meson.build
@@ -3185,6 +3185,9 @@ if host_os == 'windows'
   endif
 endif
 
+# Detect host pointer size for the target configuration loop.
+host_long_bits = cc.sizeof('void *') * 8
+
 
 # Target configuration #
 
@@ -3277,8 +3280,14 @@ foreach target : target_dirs
 }
   endif
 
+  config_target += keyval.load('configs/targets' / target + '.mak')
+
   target_kconfig = []
   foreach sym: accelerators
+# Disallow 64-bit on 32-bit emulation and virtualization
+if host_long_bits < config_target['TARGET_LONG_BITS'].to_int()
+  continue
+endif
 if sym == 'CONFIG_TCG' or target in accelerator_targets.get(sym, [])
   config_target += { sym: 'y' }
   config_all_accel += { sym: 'y' }
@@ -3292,9 +3301,6 @@ foreach target : target_dirs
 error('No accelerator available for target @0@'.format(target))
   endif
 
-  config_target += keyval.load('configs/targets' / target + '.mak')
-  config_target += { 'TARGET_' + config_target['TARGET_ARCH'].to_upper(): 'y' }
-
   if 'TARGET_NEED_FDT' in config_target and not fdt.found()
 if default_targets
   warning('Disabling ' + target + ' due to missing libfdt')
@@ -3307,6 +3313,7 @@ foreach target : target_dirs
   actual_target_dirs += target
 
   # Add default keys
+  config_target += { 'TARGET_' + config_target['TARGET_ARCH'].to_upper(): 'y' }
   if 'TARGET_BASE_ARCH' not in config_target
 config_target += {'TARGET_BASE_ARCH': config_target['TARGET_ARCH']}
   endif
-- 
2.43.0




[PATCH v4 5/9] gitlab-ci: Replace aarch64 with arm in cross-i686-tci build

2025-02-07 Thread Richard Henderson
Configuration of a 64-bit guest on a 32-bit host will shortly
be denied.  Use a 32-bit guest instead.

Reviewed-by: Thomas Huth 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 .gitlab-ci.d/crossbuilds.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.gitlab-ci.d/crossbuilds.yml b/.gitlab-ci.d/crossbuilds.yml
index 95dfc39224..7ae0f966f1 100644
--- a/.gitlab-ci.d/crossbuilds.yml
+++ b/.gitlab-ci.d/crossbuilds.yml
@@ -61,7 +61,7 @@ cross-i686-tci:
   variables:
 IMAGE: debian-i686-cross
 ACCEL: tcg-interpreter
-EXTRA_CONFIGURE_OPTS: --target-list=i386-softmmu,i386-linux-user,aarch64-softmmu,aarch64-linux-user,ppc-softmmu,ppc-linux-user --disable-plugins --disable-kvm
+EXTRA_CONFIGURE_OPTS: --target-list=i386-softmmu,i386-linux-user,arm-softmmu,arm-linux-user,ppc-softmmu,ppc-linux-user --disable-plugins --disable-kvm
 # Force tests to run with reduced parallelism, to see whether this
 # reduces the flakiness of this CI job. The CI
 # environment by default shows us 8 CPUs and so we
-- 
2.43.0




[PULL 0/6] Hppa system for v10 diva artist patches

2025-02-07 Thread deller
From: Helge Deller 

The following changes since commit 6fccaa2fba391815308a746d68f7fa197bc93586:

  Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into 
staging (2025-02-02 11:09:10 -0500)

are available in the Git repository at:

  https://github.com/hdeller/qemu-hppa.git 
tags/hppa-system-for-v10-diva-artist-pull-request

for you to fetch changes up to 3dc340f8a7cfbc30193ab269aad1a5a6026365a6:

  target/hppa: Update SeaBIOS-hppa (2025-02-04 23:07:05 +0100)


HPPA graphics and serial console enhancements

A small series of patches which enhance the graphics output on 64-bit hppa
machines. It allows disabling the artist graphics card and introduces drivers
for the Diva GSP (remote management) cards, which are used in later 64-bit
machines and which we now use for serial console output.

The LMMIO regions of the Astro chip are now supported too, which is important
for supporting other graphics cards, such as an ATI PCI card, with a 64-bit
Linux kernel.



Helge Deller (6):
  hw/char: Add emulation of Diva GSP PCI management boards
  hw/hppa: Wire up Diva GSP card
  artist: Allow disabling artist on command line
  hw/hppa: Avoid creation of artist if disabled on command line
  hw/pci-host/astro: Add LMMIO range support
  target/hppa: Update SeaBIOS-hppa

 MAINTAINERS |   1 +
 hw/char/Kconfig |   3 +
 hw/char/diva-gsp.c  | 297 
 hw/char/meson.build |   1 +
 hw/display/artist.c |   9 +-
 hw/hppa/Kconfig |   1 +
 hw/hppa/machine.c   |  42 +++--
 hw/pci-host/astro.c |  52 +++
 include/hw/pci-host/astro.h |   6 +-
 roms/seabios-hppa   |   2 +-
 10 files changed, 383 insertions(+), 31 deletions(-)
 create mode 100644 hw/char/diva-gsp.c

-- 
2.47.0




[PATCH v4 0/9] meson: Deprecate 32-bit host support

2025-02-07 Thread Richard Henderson
v1: 20250128004254.33442-1-richard.hender...@linaro.org
v2: 20250203031821.741477-1-richard.hender...@linaro.org
v3: 20250204215359.1238808-1-richard.hender...@linaro.org

For v4, tidy NVMM/WHPX per Thomas' review.
Drop two more stub patches which were intended to be dropped with v3.


r~


Richard Henderson (9):
  meson: Drop tcg as a module
  meson: Disallow 64-bit on 32-bit KVM emulation
  meson: Disallow 64-bit on 32-bit Xen emulation
  meson: Disallow 64-bit on 32-bit HVF/NVMM/WHPX emulation
  gitlab-ci: Replace aarch64 with arm in cross-i686-tci build
  configure: Define TARGET_LONG_BITS in configs/targets/*.mak
  target/*: Remove TARGET_LONG_BITS from cpu-param.h
  meson: Disallow 64-bit on 32-bit emulation
  meson: Deprecate 32-bit host support

 target/alpha/cpu-param.h|  2 -
 target/arm/cpu-param.h  |  2 -
 target/avr/cpu-param.h  |  1 -
 target/hexagon/cpu-param.h  |  1 -
 target/hppa/cpu-param.h |  2 -
 target/i386/cpu-param.h |  2 -
 target/loongarch/cpu-param.h|  1 -
 target/m68k/cpu-param.h |  1 -
 target/microblaze/cpu-param.h   |  2 -
 target/mips/cpu-param.h |  5 --
 target/openrisc/cpu-param.h |  1 -
 target/ppc/cpu-param.h  |  2 -
 target/riscv/cpu-param.h|  2 -
 target/rx/cpu-param.h   |  1 -
 target/s390x/cpu-param.h|  1 -
 target/sh4/cpu-param.h  |  1 -
 target/sparc/cpu-param.h|  2 -
 target/tricore/cpu-param.h  |  1 -
 target/xtensa/cpu-param.h   |  1 -
 .gitlab-ci.d/crossbuilds.yml|  2 +-
 accel/tcg/meson.build   | 11 ++--
 configs/targets/aarch64-bsd-user.mak|  1 +
 configs/targets/aarch64-linux-user.mak  |  1 +
 configs/targets/aarch64-softmmu.mak |  1 +
 configs/targets/aarch64_be-linux-user.mak   |  1 +
 configs/targets/alpha-linux-user.mak|  1 +
 configs/targets/alpha-softmmu.mak   |  1 +
 configs/targets/arm-bsd-user.mak|  1 +
 configs/targets/arm-linux-user.mak  |  1 +
 configs/targets/arm-softmmu.mak |  1 +
 configs/targets/armeb-linux-user.mak|  1 +
 configs/targets/avr-softmmu.mak |  1 +
 configs/targets/hexagon-linux-user.mak  |  1 +
 configs/targets/hppa-linux-user.mak |  2 +
 configs/targets/hppa-softmmu.mak|  1 +
 configs/targets/i386-bsd-user.mak   |  1 +
 configs/targets/i386-linux-user.mak |  1 +
 configs/targets/i386-softmmu.mak|  1 +
 configs/targets/loongarch64-linux-user.mak  |  1 +
 configs/targets/loongarch64-softmmu.mak |  1 +
 configs/targets/m68k-linux-user.mak |  1 +
 configs/targets/m68k-softmmu.mak|  1 +
 configs/targets/microblaze-linux-user.mak   |  1 +
 configs/targets/microblaze-softmmu.mak  |  3 +
 configs/targets/microblazeel-linux-user.mak |  1 +
 configs/targets/microblazeel-softmmu.mak|  3 +
 configs/targets/mips-linux-user.mak |  1 +
 configs/targets/mips-softmmu.mak|  1 +
 configs/targets/mips64-linux-user.mak   |  1 +
 configs/targets/mips64-softmmu.mak  |  1 +
 configs/targets/mips64el-linux-user.mak |  1 +
 configs/targets/mips64el-softmmu.mak|  1 +
 configs/targets/mipsel-linux-user.mak   |  1 +
 configs/targets/mipsel-softmmu.mak  |  1 +
 configs/targets/mipsn32-linux-user.mak  |  1 +
 configs/targets/mipsn32el-linux-user.mak|  1 +
 configs/targets/or1k-linux-user.mak |  1 +
 configs/targets/or1k-softmmu.mak|  1 +
 configs/targets/ppc-linux-user.mak  |  1 +
 configs/targets/ppc-softmmu.mak |  1 +
 configs/targets/ppc64-linux-user.mak|  1 +
 configs/targets/ppc64-softmmu.mak   |  1 +
 configs/targets/ppc64le-linux-user.mak  |  1 +
 configs/targets/riscv32-linux-user.mak  |  1 +
 configs/targets/riscv32-softmmu.mak |  1 +
 configs/targets/riscv64-bsd-user.mak|  1 +
 configs/targets/riscv64-linux-user.mak  |  1 +
 configs/targets/riscv64-softmmu.mak |  1 +
 configs/targets/rx-softmmu.mak  |  1 +
 configs/targets/s390x-linux-user.mak|  1 +
 configs/targets/s390x-softmmu.mak   |  1 +
 configs/targets/sh4-linux-user.mak  |  1 +
 configs/targets/sh4-softmmu.mak |  1 +
 configs/targets/sh4eb-linux-user.mak|  1 +
 configs/targets/sh4eb-softmmu.mak   |  1 +
 configs/targets/sparc-linux-user.mak|  1 +
 configs/targets/sparc-softmmu.mak   |  1 +
 configs/targets/sparc32plus-linux-user.mak  |  1 +
 configs/targets/sparc64-linux-user.mak  |  1 +
 configs/targets/sparc64-softmmu.mak |  1 +
 configs/targets/tricore-softmmu.mak |  1 +
 configs/targets/

[PATCH v4 9/9] meson: Deprecate 32-bit host support

2025-02-07 Thread Richard Henderson
We deprecated i686 system mode support for qemu 8.0.  However, to
make real cleanups to TCG we need to deprecate all 32-bit hosts.

Reviewed-by: Thomas Huth 
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 docs/about/deprecated.rst | 7 +++
 meson.build   | 8 +++-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 4a3c302962..7c61d0ba16 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -204,6 +204,13 @@ is going to be so much slower it wouldn't make sense for 
any serious
 instrumentation. Due to implementation differences there will also be
 anomalies in things like memory instrumentation.
 
+32-bit host operating systems (since 10.0)
+''''''''''''''''''''''''''''''''''''''''''
+
+Keeping 32-bit host support alive is a substantial burden for the
+QEMU project.  Thus QEMU will in future drop the support for all
+32-bit host systems.
+
 System emulator CPUs
 
 
diff --git a/meson.build b/meson.build
index ec51827f40..387490d922 100644
--- a/meson.build
+++ b/meson.build
@@ -4841,14 +4841,12 @@ if host_arch == 'unknown'
 message('configure has succeeded and you can continue to build, but')
 message('QEMU will use a slow interpreter to emulate the target CPU.')
   endif
-elif host_arch == 'mips'
+elif host_long_bits < 64
   message()
   warning('DEPRECATED HOST CPU')
   message()
-  message('Support for CPU host architecture ' + cpu + ' is going to be')
-  message('dropped as soon as the QEMU project stops supporting Debian 12')
-  message('("Bookworm"). Going forward, the QEMU project will not guarantee')
-  message('that QEMU will compile or work on this host CPU.')
+  message('Support for 32-bit CPU host architecture ' + cpu + ' is going')
+  message('to be dropped in a future QEMU release.')
 endif
 
 if not supported_oses.contains(host_os)
-- 
2.43.0




[PULL 3/6] artist: Allow disabling artist on command line

2025-02-07 Thread deller
From: Helge Deller 

Allow users to disable the artist graphics card on the command line
with the option "-global artist.disable=true".
This change allows other graphics cards to be used under Linux, e.g.
by adding "-device ati-vga".

Signed-off-by: Helge Deller 
---
 hw/display/artist.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/display/artist.c b/hw/display/artist.c
index 8b719b11ed..f24c1d83dd 100644
--- a/hw/display/artist.c
+++ b/hw/display/artist.c
@@ -48,6 +48,7 @@ struct ARTISTState {
 
 struct vram_buffer vram_buffer[16];
 
+bool disable;
 uint16_t width;
 uint16_t height;
 uint16_t depth;
@@ -1211,8 +1212,8 @@ static uint64_t artist_reg_read(void *opaque, hwaddr 
addr, unsigned size)
 break;
 
 case 0x380004:
-/* 0x0200 Buserror */
-val = 0x6dc20006;
+/* magic number detected by SeaBIOS-hppa */
+val = s->disable ? 0 : 0x6dc20006;
 break;
 
 default:
@@ -1432,7 +1433,7 @@ static int vmstate_artist_post_load(void *opaque, int 
version_id)
 
 static const VMStateDescription vmstate_artist = {
 .name = "artist",
-.version_id = 2,
+.version_id = 3,
 .minimum_version_id = 2,
 .post_load = vmstate_artist_post_load,
 .fields = (const VMStateField[]) {
@@ -1470,6 +1471,7 @@ static const VMStateDescription vmstate_artist = {
 VMSTATE_UINT32(font_write1, ARTISTState),
 VMSTATE_UINT32(font_write2, ARTISTState),
 VMSTATE_UINT32(font_write_pos_y, ARTISTState),
+VMSTATE_BOOL(disable, ARTISTState),
 VMSTATE_END_OF_LIST()
 }
 };
@@ -1478,6 +1480,7 @@ static const Property artist_properties[] = {
 DEFINE_PROP_UINT16("width",ARTISTState, width, 1280),
 DEFINE_PROP_UINT16("height",   ARTISTState, height, 1024),
 DEFINE_PROP_UINT16("depth",ARTISTState, depth, 8),
+DEFINE_PROP_BOOL("disable",ARTISTState, disable, false),
 };
 
 static void artist_reset(DeviceState *qdev)
-- 
2.47.0




Re: [PATCH v4 4/4] qapi: expose all schema features to code

2025-02-07 Thread John Snow
On Fri, Feb 7, 2025, 6:57 AM Markus Armbruster  wrote:

> Daniel P. Berrangé  writes:
>
> > This replaces use of the constants from the QapiSpecialFeatures
> > enum, with constants from the auto-generate QapiFeatures enum
> > in qapi-features.h
> >
> > The 'deprecated' and 'unstable' features still have a little bit of
> > special handling, being force defined to be the 1st + 2nd features
> > in the enum, regardless of whether they're used in the schema. This
> > retains compatibility with common code that references the features
> > via the QapiSpecialFeatures constants.
> >
> > Signed-off-by: Daniel P. Berrangé 
>
> Daniel, feel free to ignore this at least for now.  I'm trying to learn
> some typing lore from John.
>
> v3 made mypy unhappy.  I asked John for advice, and also posted a
> solution involving ValuesView I hacked up myself.  Daniel took it for
> v4.
>
> John suggested to use List.
>
> I now wonder whether we could use Iterable.
>
> I'll show the three solutions inline.
>
> John, thoughts?
>

ValuesView works just fine. It accurately describes what that function
returns. I only avoided it in my fixup because it's a more obscure type and
generally list is easier to work with, being a first-class built-in primitive
type in the language.

(read as: I didn't have to consult any docs to fix it up using List and I'm
lazy.)

Your solution describes precisely the type being returned (always good) and
avoids any re-copying of data.

Do be aware that by caching the values view object in another object you
are keeping a "live reference" to the dict's values, which I think can
change if the source dict changes. I doubt it matters, but you should know
about that.
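
A minimal illustration of that live-view behaviour, in plain Python and
independent of the QAPI generator code:

```python
d = {"a": 1}
view = d.values()             # live view into the dict, not a copy
snapshot = list(d.values())   # detached copy taken now

d["b"] = 2                    # mutate the source dict afterwards

assert sorted(view) == [1, 2]  # the view reflects the later insertion
assert snapshot == [1]         # the list copy does not
```

So caching the ValuesView keeps the return value in sync with the schema's
feature dict, while returning a list freezes it at call time.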

The only design consideration you have now is what type you actually want
to return and why. I think it barely matters, and I'm always going to opt
for whatever is the least annoying for the patch author so I don't have to
bore/torture them with python minutiae.

As long as the tests pass (my first three patches in the dan-fixup branch I
posted based on v3) I'm more than content.


> [...]
>
> > diff --git a/scripts/qapi/features.py b/scripts/qapi/features.py
> > new file mode 100644
> > index 00..be3e5d03ff
> > --- /dev/null
> > +++ b/scripts/qapi/features.py
> > @@ -0,0 +1,51 @@
> > +"""
> > +QAPI features generator
> > +
> > +Copyright 2024 Red Hat
> > +
> > +This work is licensed under the terms of the GNU GPL, version 2.
> > +# See the COPYING file in the top-level directory.
> > +"""
> > +
> > +from typing import Dict, ValuesView
> > +
> > +from .common import c_enum_const, c_name
> > +from .gen import QAPISchemaMonolithicCVisitor
> > +from .schema import (
> > +QAPISchema,
> > +QAPISchemaFeature,
> > +)
> > +
> > +
> > +class QAPISchemaGenFeatureVisitor(QAPISchemaMonolithicCVisitor):
> > +
> > +def __init__(self, prefix: str):
> > +super().__init__(
> > +prefix, 'qapi-features',
> > +' * Schema-defined QAPI features',
> > +__doc__)
> > +
> > +self.features: ValuesView[QAPISchemaFeature]
>
> This is the ValuesView solution.
>
> The List solution:
>
>self.features: List[QAPISchemaFeature] = []
>
> The Iterable solution:
>
>self.features: Iterable[QAPISchemaFeature]
>
> [...]
>
>
> > diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py
> > index e97c978d38..7f70969c09 100644
> > --- a/scripts/qapi/schema.py
> > +++ b/scripts/qapi/schema.py
>
> [...]
>
> > @@ -1147,6 +1161,9 @@ def __init__(self, fname: str):
> >  self._def_exprs(exprs)
> >  self.check()
> >
> > +def features(self) -> ValuesView[QAPISchemaFeature]:
> > +return self._feature_dict.values()
>
> This is the ValuesView solution.
>
> The List solution:
>
>def features(self) -> List[QAPISchemaFeature]:
>return list(self._feature_dict.values())
>
> The Iterable solution:
>
>def features(self) -> Iterable[QAPISchemaFeature]:
>return self._feature_dict.values()
>
>
> > +
> >  def _def_entity(self, ent: QAPISchemaEntity) -> None:
> >  self._entity_list.append(ent)
> >
>
> [...]
>
>


Re: [PATCH] target/arm/helper: Fix timer interrupt masking when HCR_EL2.E2H == 0

2025-02-07 Thread Alex Bennée
Richard Henderson  writes:

> On 2/7/25 07:45, Peter Maydell wrote:
>> This is where things go wrong -- icount_start_warp_timer()
>> notices that all CPU threads are currently idle, and
>> decides it needs to warp the timer forwards to the
>> next deadline, which is at the end of time -- INT64_MAX.
>> But once timer_mod_ns() returns, the generic timer code
>> is going to raise an interrupt (this goes through the GIC
>> code and comes back into the CPU which calls cpu_interrupt()),
>> so we don't want to warp the timer at all. The clock should
>> stay exactly at the value it has and the CPU is going to
>> have more work to do.
>> How is this supposed to work? Shouldn't we only be doing
>> the "start moving the icount forward to the next deadline"
>> once we've completed all the "run timers and AIO stuff" that
>> icount_handle_deadline() triggers, not randomly in the middle
>> of that when this timer callback or some other one might do
>> something to trigger an interrupt?
>
> I don't understand timer warping at all.  And you're right, it doesn't
> seem like this should happen outside of a specific point in the main
> loop.

This has come up before - and the conclusion was we don't know what
sleep=on/off is meant to mean. If the processor is asleep and there are
no timers to fire then nothing will happen.

It was off-list though:

  Subject: Re: qemu-system-aarch64 & icount behavior
  Date: Wed, 22 Jul 2020 at 11:21
  From: Kumar Gala 
  Subject: Fwd: qemu-system-aarch64 & icount behavior
  Message-ID: 

  Date: Fri, 24 Jul 2020 17:25:51 +0100
  From: Peter Maydell 

>> ... But I don't think there's any reason why
>> timer callbacks should be obliged to reprogram their timers
>> last, and in any case you can imagine scenarios where there
>> are multiple timer callbacks for different timers and it's
>> only the second timer that raises an interrupt...
>
> Agreed.
>
>
> r~

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



[PATCH v3 0/2] KVM: SEV: Add support for the ALLOWED_SEV_FEATURES feature

2025-02-07 Thread Kim Phillips
AMD EPYC 5th generation processors have introduced a feature that allows
the hypervisor to control the SEV_FEATURES that are set for, or by, a
guest [1].  ALLOWED_SEV_FEATURES can be used by the hypervisor to enforce
that SEV-ES and SEV-SNP guests cannot enable features that the
hypervisor does not want to be enabled.

Patch 1/2 adds support to detect the feature.

Patch 2/2 configures the ALLOWED_SEV_FEATURES field in the VMCB
according to the features the hypervisor supports.

[1] Section 15.36.20 "Allowed SEV Features", AMD64 Architecture
Programmer's Manual, Pub. 24593 Rev. 3.42 - March 2024:
https://bugzilla.kernel.org/attachment.cgi?id=306250

Based on 6.14-rc1.

v3:
 - Assign allowed_sev_features based on user-provided vmsa_features mask (Sean)
 - Users now have to explicitly opt in with a qemu "allowed-sev-features=on"
   switch.
 - Rebased on top of 6.14-rc1 and reworked authorship chain (tglx)

v2: https://lore.kernel.org/lkml/2024081938.2192109-1-kim.phill...@amd.com/
 - Added some SEV_FEATURES require to be explicitly allowed by
   ALLOWED_SEV_FEATURES wording (Sean).
 - Added Nikunj's Reviewed-by.

v1: https://lore.kernel.org/lkml/20240802015732.3192877-3-kim.phill...@amd.com/

Kim Phillips (1):
  KVM: SEV: Configure "ALLOWED_SEV_FEATURES" VMCB Field

Kishon Vijay Abraham I (1):
  x86/cpufeatures: Add "Allowed SEV Features" Feature

 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/svm.h |  5 -
 arch/x86/kvm/svm/sev.c | 17 +
 3 files changed, 22 insertions(+), 1 deletion(-)

-- 
2.43.0




[PATCH v3 1/2] x86/cpufeatures: Add "Allowed SEV Features" Feature

2025-02-07 Thread Kim Phillips
From: Kishon Vijay Abraham I 

Add CPU feature detection for "Allowed SEV Features" to allow the
Hypervisor to enforce that SEV-ES and SEV-SNP guest VMs cannot
enable features (via SEV_FEATURES) that the Hypervisor does not
support or wish to be enabled.

Signed-off-by: Kishon Vijay Abraham I 
Signed-off-by: Kim Phillips 
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 508c0dad116b..a80a4164d110 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -453,6 +453,7 @@
 #define X86_FEATURE_DEBUG_SWAP (19*32+14) /* "debug_swap" SEV-ES full 
debug state swap support */
 #define X86_FEATURE_RMPREAD(19*32+21) /* RMPREAD instruction */
 #define X86_FEATURE_SEGMENTED_RMP  (19*32+23) /* Segmented RMP support */
+#define X86_FEATURE_ALLOWED_SEV_FEATURES (19*32+27) /* Allowed SEV Features */
 #define X86_FEATURE_SVSM   (19*32+28) /* "svsm" SVSM present */
 #define X86_FEATURE_HV_INUSE_WR_ALLOWED(19*32+30) /* Allow Write to 
in-use hypervisor-owned pages */
 
-- 
2.43.0




[PATCH v3 2/2] KVM: SEV: Configure "ALLOWED_SEV_FEATURES" VMCB Field

2025-02-07 Thread Kim Phillips
AMD EPYC 5th generation processors have introduced a feature that allows
the hypervisor to control the SEV_FEATURES that are set for, or by, a
guest [1].  ALLOWED_SEV_FEATURES can be used by the hypervisor to enforce
that SEV-ES and SEV-SNP guests cannot enable features that the
hypervisor does not want to be enabled.

When ALLOWED_SEV_FEATURES is enabled, a VMRUN will fail if any
non-reserved bits are 1 in SEV_FEATURES but are 0 in
ALLOWED_SEV_FEATURES.

Some SEV_FEATURES - currently PmcVirtualization and SecureAvic
(see Appendix B, Table B-4) - require an opt-in via ALLOWED_SEV_FEATURES,
i.e. are off-by-default, whereas all other features are effectively
on-by-default, but still honor ALLOWED_SEV_FEATURES.

[1] Section 15.36.20 "Allowed SEV Features", AMD64 Architecture
Programmer's Manual, Pub. 24593 Rev. 3.42 - March 2024:
https://bugzilla.kernel.org/attachment.cgi?id=306250
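
The VMRUN rule above amounts to a simple mask check; a hedged sketch of the
semantics (illustrative helper and example bit values only, not the kernel
code):

```python
def vmrun_permitted(sev_features, allowed_sev_features):
    # VMRUN fails if any non-reserved bit is 1 in SEV_FEATURES
    # while 0 in ALLOWED_SEV_FEATURES.
    return (sev_features & ~allowed_sev_features) == 0

DEBUG_SWAP = 1 << 5   # SVM_SEV_FEAT_DEBUG_SWAP
ALLOWED    = 1 << 63  # SVM_SEV_FEAT_ALLOWED_SEV_FEATURES

assert vmrun_permitted(DEBUG_SWAP, DEBUG_SWAP | ALLOWED)
assert not vmrun_permitted(DEBUG_SWAP, ALLOWED)  # requested but not allowed
```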

Co-developed-by: Kishon Vijay Abraham I 
Signed-off-by: Kishon Vijay Abraham I 
Signed-off-by: Kim Phillips 
---
 arch/x86/include/asm/svm.h |  5 -
 arch/x86/kvm/svm/sev.c | 17 +
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index e2fac21471f5..6d94a727cc1a 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -158,7 +158,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
u64 avic_physical_id;   /* Offset 0xf8 */
u8 reserved_7[8];
u64 vmsa_pa;/* Used for an SEV-ES guest */
-   u8 reserved_8[720];
+   u8 reserved_8[40];
+   u64 allowed_sev_features;   /* Offset 0x138 */
+   u8 reserved_9[672];
/*
 * Offset 0x3e0, 32 bytes reserved
 * for use by hypervisor/software.
@@ -289,6 +291,7 @@ static_assert((X2AVIC_MAX_PHYSICAL_ID & 
AVIC_PHYSICAL_MAX_INDEX_MASK) == X2AVIC_
 #define SVM_SEV_FEAT_RESTRICTED_INJECTION  BIT(3)
 #define SVM_SEV_FEAT_ALTERNATE_INJECTION   BIT(4)
 #define SVM_SEV_FEAT_DEBUG_SWAPBIT(5)
+#define SVM_SEV_FEAT_ALLOWED_SEV_FEATURES  BIT_ULL(63)
 
 #define SVM_SEV_FEAT_INT_INJ_MODES \
(SVM_SEV_FEAT_RESTRICTED_INJECTION |\
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a2a794c32050..a9e16792cac0 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -894,9 +894,19 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
return 0;
 }
 
+static u64 allowed_sev_features(struct kvm_sev_info *sev)
+{
+   if (cpu_feature_enabled(X86_FEATURE_ALLOWED_SEV_FEATURES) &&
+   (sev->vmsa_features & SVM_SEV_FEAT_ALLOWED_SEV_FEATURES))
+   return sev->vmsa_features;
+
+   return 0;
+}
+
 static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu,
int *error)
 {
+   struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
struct sev_data_launch_update_vmsa vmsa;
struct vcpu_svm *svm = to_svm(vcpu);
int ret;
@@ -906,6 +916,8 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct 
kvm_vcpu *vcpu,
return -EINVAL;
}
 
+   svm->vmcb->control.allowed_sev_features = allowed_sev_features(sev);
+
/* Perform some pre-encryption checks against the VMSA */
ret = sev_es_sync_vmsa(svm);
if (ret)
@@ -2447,6 +2459,8 @@ static int snp_launch_update_vmsa(struct kvm *kvm, struct 
kvm_sev_cmd *argp)
struct vcpu_svm *svm = to_svm(vcpu);
u64 pfn = __pa(svm->sev_es.vmsa) >> PAGE_SHIFT;
 
+   svm->vmcb->control.allowed_sev_features = 
allowed_sev_features(sev);
+
ret = sev_es_sync_vmsa(svm);
if (ret)
return ret;
@@ -3069,6 +3083,9 @@ void __init sev_hardware_setup(void)
sev_supported_vmsa_features = 0;
if (sev_es_debug_swap_enabled)
sev_supported_vmsa_features |= SVM_SEV_FEAT_DEBUG_SWAP;
+
+   if (sev_es_enabled && 
cpu_feature_enabled(X86_FEATURE_ALLOWED_SEV_FEATURES))
+   sev_supported_vmsa_features |= 
SVM_SEV_FEAT_ALLOWED_SEV_FEATURES;
 }
 
 void sev_hardware_unsetup(void)
-- 
2.43.0




[RFC] target/i386: sev: Add cmdline option to enable the Allowed SEV Features feature

2025-02-07 Thread Kim Phillips
The Allowed SEV Features feature lets the host kernel control which SEV
features the guest is permitted to enable [1].

This has to be explicitly opted into by the user because it could
break existing VMs if it were set automatically.

Currently, both the PmcVirtualization and SecureAvic features
require the Allowed SEV Features feature to be set.

Based on a similar patch written for Secure TSC [2].

[1] Section 15.36.20 "Allowed SEV Features", AMD64 Architecture
Programmer's Manual, Pub. 24593 Rev. 3.42 - March 2024:
https://bugzilla.kernel.org/attachment.cgi?id=306250

[2] https://github.com/qemu/qemu/commit/4b2288dc6025ba32519ee8d202ca72d565cbbab7

Signed-off-by: Kim Phillips 
---
 qapi/qom.json |  6 -
 target/i386/sev.c | 60 +++
 target/i386/sev.h |  2 ++
 3 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/qapi/qom.json b/qapi/qom.json
index 28ce24cd8d..113b44ad74 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -948,13 +948,17 @@
 # designated guest firmware page for measured boot with -kernel
 # (default: false) (since 6.2)
 #
+# @allowed-sev-features: true if the "Allowed SEV Features" feature
+# is to be enabled in an SEV-ES or SEV-SNP guest. (default: false)
+#
 # Since: 9.1
 ##
 { 'struct': 'SevCommonProperties',
   'data': { '*sev-device': 'str',
 '*cbitpos': 'uint32',
 'reduced-phys-bits': 'uint32',
-'*kernel-hashes': 'bool' } }
+'*kernel-hashes': 'bool',
+'*allowed-sev-features': 'bool' } }
 
 ##
 # @SevGuestProperties:
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 0e1dbb6959..85ad73f9a0 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -98,6 +98,7 @@ struct SevCommonState {
 uint32_t cbitpos;
 uint32_t reduced_phys_bits;
 bool kernel_hashes;
+uint64_t vmsa_features;
 
 /* runtime state */
 uint8_t api_major;
@@ -411,6 +412,33 @@ sev_get_reduced_phys_bits(void)
 return sev_common ? sev_common->reduced_phys_bits : 0;
 }
 
+static __u64
+sev_supported_vmsa_features(void)
+{
+uint64_t supported_vmsa_features = 0;
+struct kvm_device_attr attr = {
+.group = KVM_X86_GRP_SEV,
+.attr = KVM_X86_SEV_VMSA_FEATURES,
+.addr = (unsigned long) &supported_vmsa_features
+};
+
+bool sys_attr = kvm_check_extension(kvm_state, KVM_CAP_SYS_ATTRIBUTES);
+if (!sys_attr) {
+return 0;
+}
+
+int rc = kvm_ioctl(kvm_state, KVM_GET_DEVICE_ATTR, &attr);
+if (rc < 0) {
+if (rc != -ENXIO) {
+warn_report("KVM_GET_DEVICE_ATTR(0, KVM_X86_SEV_VMSA_FEATURES) "
+"error: %d", rc);
+}
+return 0;
+}
+
+return supported_vmsa_features;
+}
+
 static SevInfo *sev_get_info(void)
 {
 SevInfo *info;
@@ -1524,6 +1552,20 @@ static int sev_common_kvm_init(ConfidentialGuestSupport 
*cgs, Error **errp)
 case KVM_X86_SNP_VM: {
 struct kvm_sev_init args = { 0 };
 
+if (sev_es_enabled()) {
+__u64 vmsa_features, supported_vmsa_features;
+
+supported_vmsa_features = sev_supported_vmsa_features();
+vmsa_features = sev_common->vmsa_features;
+if ((vmsa_features & supported_vmsa_features) != vmsa_features) {
+error_setg(errp, "%s: requested sev feature mask (0x%llx) "
+   "contains bits not supported by the host kernel "
+   " (0x%llx)", __func__, vmsa_features,
+   supported_vmsa_features);
+return -1;
+}
+args.vmsa_features = vmsa_features;
+}
 ret = sev_ioctl(sev_common->sev_fd, KVM_SEV_INIT2, &args, &fw_error);
 break;
 }
@@ -2044,6 +2086,19 @@ static void sev_common_set_kernel_hashes(Object *obj, 
bool value, Error **errp)
 SEV_COMMON(obj)->kernel_hashes = value;
 }
 
+static bool
+sev_snp_guest_get_allowed_sev_features(Object *obj, Error **errp)
+{
+return SEV_COMMON(obj)->vmsa_features & SEV_VMSA_ALLOWED_SEV_FEATURES;
+}
+
+static void
+sev_snp_guest_set_allowed_sev_features(Object *obj, bool value, Error **errp)
+{
+if (value)
+SEV_COMMON(obj)->vmsa_features |= SEV_VMSA_ALLOWED_SEV_FEATURES;
+}
+
 static void
 sev_common_class_init(ObjectClass *oc, void *data)
 {
@@ -2061,6 +2116,11 @@ sev_common_class_init(ObjectClass *oc, void *data)
sev_common_set_kernel_hashes);
 object_class_property_set_description(oc, "kernel-hashes",
 "add kernel hashes to guest firmware for measured Linux boot");
+object_class_property_add_bool(oc, "allowed-sev-features",
+   sev_snp_guest_get_allowed_sev_features,
+   sev_snp_guest_set_allowed_sev_features);
+object_class_property_set_description(oc, "allowed-sev-features",
+"Enable the Allowed SEV Features feat

Re: [PATCH v4 00/15] vfio: VFIO migration support with vIOMMU

2025-02-07 Thread Zhangfei Gao
On Wed, Jan 22, 2025 at 12:43 AM Joao Martins  wrote:
>
> On 07/01/2025 06:55, Zhangfei Gao wrote:
> > Hi, Joao
> >
> > On Fri, Jun 23, 2023 at 5:51 AM Joao Martins  
> > wrote:
> >>
> >> Hey,
> >>
> >> This series introduces support for vIOMMU with VFIO device migration,
> >> particularly related to how we do the dirty page tracking.
> >>
> >> Today vIOMMUs serve two purposes: 1) enable interrupt remapping 2)
> >> provide dma translation services for guests to provide some form of
> >> guest kernel managed DMA e.g. for nested virt based usage; (1) is especially
> >> required for big VMs with VFs with more than 255 vcpus. We tackle both
> >> and remove the migration blocker when vIOMMU is present provided the
> >> conditions are met. I have both use-cases here in one series, but I am 
> >> happy
> >> to tackle them in separate series.
> >>
> >> As I found out we don't necessarily need to expose the whole vIOMMU
> >> functionality in order to just support interrupt remapping. x86 IOMMUs
> >> on Windows Server 2018[2] and Linux >=5.10, with qemu 7.1+ (or really
> >> Linux guests with commit c40c10 and since qemu commit 8646d9c773d8)
> >> can instantiate a IOMMU just for interrupt remapping without needing to
> >> be advertised/support DMA translation. AMD IOMMU in theory can provide
> >> the same, but Linux doesn't quite support the IR-only part there yet,
> >> only intel-iommu.
> >>
> >> The series is organized as following:
> >>
> >> Patches 1-5: Today we can't gather vIOMMU details before the guest
> >> establishes their first DMA mapping via the vIOMMU. So these first four
> >> patches add a way for vIOMMUs to be asked of their properties at start
> >> of day. I choose the least churn possible way for now (as opposed to a
> >> treewide conversion) and allow easy conversion a posteriori. As
> >> suggested by Peter Xu[7], I have resurrected Yi's patches[5][6] which
> >> allows us to fetch PCI backing vIOMMU attributes, without necessarily
> >> tying the caller (VFIO or anyone else) to an IOMMU MR like I
> >> was doing in v3.
> >>
> >> Patches 6-8: Handle configs with vIOMMU interrupt remapping but without
> >> DMA translation allowed. Today the 'dma-translation' attribute is
> >> x86-iommu only, but the way this series is structured nothing stops
> >> other vIOMMUs from supporting it too as long as they use
> >> pci_setup_iommu_ops() and the necessary IOMMU MR get_attr attributes
> >> are handled. The blocker is thus relaxed when vIOMMUs are able to
> >> toggle/report the DMA_TRANSLATION attribute. With the patches up to this set,
> >> we've then tackled item (1) of the second paragraph.
> >
> > I'm not understanding how the device page table is handled.
> >
> > Does this mean that after live-migration, the page table built by the vIOMMU
> > will be re-built in the target guest via pci_setup_iommu_ops?
>
> AFAIU it is supposed to be done post loading the vIOMMU vmstate, when enabling
> the vIOMMU related MRs. And when walking the different 'emulated' address spaces
> it will replay all mappings (and skip non-present parts of the address space).
>
> The trick in making this work largely depends on the individual vIOMMU
> implementation (and this emulated vIOMMU stuff shouldn't be confused with IOMMU
> nesting btw!). In the intel case (and AMD will be similar) the root table pointer
> that's part of the vmstate has all the device pagetables, which is just guest
> memory that gets migrated over and is enough to resolve VT-d/IVRS page walks.
>
> The somewhat hard to follow part is that when it replays, it walks the whole
> DMAR memory region and only notifies IOMMU MR listeners if there's a present PTE,
> skipping it otherwise. So at the end of the enabling of MRs the IOTLB gets
> reconstructed. Though you would have to try to understand the flow with the
> vIOMMU you are using.
>
> The replay in intel-iommu is triggered by more or less this stack trace for a
> present PTE:
>
> vfio_iommu_map_notify
> memory_region_notify_iommu_one
> vtd_replay_hook
> vtd_page_walk_one
> vtd_page_walk_level
> vtd_page_walk_level
> vtd_page_walk_level
> vtd_page_walk
> vtd_iommu_replay
> memory_region_iommu_replay
> vfio_listener_region_add
> address_space_update_topology_pass
> address_space_set_flatview
> memory_region_transaction_commit
> vtd_switch_address_space
> vtd_switch_address_space_all
> vtd_post_load
> vmstate_load_state
> vmstate_load
> qemu_loadvm_section_start_full
> qemu_loadvm_state_main
> qemu_loadvm_state
> process_incoming_migration_co

Thanks Joao for the info

Sorry, some more questions,

When the src boots up, the guest kernel will send commands to qemu.
qemu will consume these commands and trigger:
smmuv3_cmdq_consume
smmu_realloc_veventq
smmuv3_cmdq_consume
smmuv3_cmdq_consume SMMU_CMD_CFGI_STE
smmuv3_install_nested_ste
iommufd_backend_alloc_hwpt
host_iommu_device_iommufd_attach_hwpt

After live-migration, the dst does not get these commands, so it does
not call smmuv3_install_nested_ste etc.
so the dma page 

Re: [PATCH 03/15] arm/cpu: Store aa64isar0 into the idregs arrays

2025-02-07 Thread Richard Henderson

On 2/7/25 03:02, Cornelia Huck wrote:

-t = cpu->isar.id_aa64zfr0;
+t = GET_IDREG(idregs, ID_AA64ZFR0);
  t = FIELD_DP64(t, ID_AA64ZFR0, SVEVER, 1);
  t = FIELD_DP64(t, ID_AA64ZFR0, AES, 2);   /* FEAT_SVE_PMULL128 */
  t = FIELD_DP64(t, ID_AA64ZFR0, BITPERM, 1);   /* FEAT_SVE_BitPerm */
@@ -1252,7 +1262,7 @@ void aarch64_max_tcg_initfn(Object *obj)
  t = FIELD_DP64(t, ID_AA64ZFR0, I8MM, 1);  /* FEAT_I8MM */
  t = FIELD_DP64(t, ID_AA64ZFR0, F32MM, 1); /* FEAT_F32MM */
  t = FIELD_DP64(t, ID_AA64ZFR0, F64MM, 1); /* FEAT_F64MM */
-cpu->isar.id_aa64zfr0 = t;
+SET_IDREG(idregs, ID_AA64ZFR0, t);


This doesn't belong to this patch.


r~



Re: [PATCH 01/15] arm/cpu: Add sysreg definitions in cpu-sysregs.h

2025-02-07 Thread Richard Henderson

On 2/7/25 03:02, Cornelia Huck wrote:

diff --git a/target/arm/cpu-sysregs.h b/target/arm/cpu-sysregs.h
new file mode 100644
index ..de09ebae91a5
--- /dev/null
+++ b/target/arm/cpu-sysregs.h

...

+static const uint32_t id_register_sysreg[NUM_ID_IDX] = {


You can't place the data into a header like this.


diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 2213c277348d..4bbce34e268d 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -30,6 +30,7 @@
  #include "qapi/qapi-types-common.h"
  #include "target/arm/multiprocessing.h"
  #include "target/arm/gtimer.h"
+#include "target/arm/cpu-sysregs.h"


The data will be replicated into *every* user of cpu.h.


+static inline uint64_t _get_idreg(uint64_t *idregs, uint32_t index)
+{
+return idregs[index];
+}
+
+static inline void _set_idreg(uint64_t *idregs, uint32_t index, uint64_t value)
+{
+idregs[index] = value;
+}


No leading underscores -- this is not a freestanding environment like the 
kernel.
We must respect the system implementation namespace.


+/* REG is ID_XXX */
+#define FIELD_DP64_IDREG(ARRAY, REG, FIELD, VALUE)  \
+{ \
+uint64_t regval = _get_idreg((uint64_t *)ARRAY, REG ## _EL1_IDX);   \
+regval = FIELD_DP64(regval, REG, FIELD, VALUE); \
+_set_idreg((uint64_t *)ARRAY, REG ## _EL1_IDX, regval); \
+}
+
+#define FIELD_DP32_IDREG(ARRAY, REG, FIELD, VALUE)  \
+{ \
+uint64_t regval = _get_idreg((uint64_t *)ARRAY, REG ## _EL1_IDX);   \
+regval = FIELD_DP32(regval, REG, FIELD, VALUE);   \
+_set_idreg((uint64_t *)ARRAY, REG ## _EL1_IDX, regval); \
+}
+
+#define FIELD_EX64_IDREG(ARRAY, REG, FIELD) \
+FIELD_EX64(_get_idreg((uint64_t *)ARRAY, REG ## _EL1_IDX), REG, FIELD)  \
+
+#define FIELD_EX32_IDREG(ARRAY, REG, FIELD) \
+FIELD_EX32(_get_idreg((uint64_t *)ARRAY, REG ## _EL1_IDX), REG, FIELD)  \
+
+#define FIELD_SEX64_IDREG(ARRAY, REG, FIELD) \
+FIELD_SEX64(_get_idreg((uint64_t *)ARRAY, REG ## _EL1_IDX), REG, FIELD)  \
+
+#define SET_IDREG(ARRAY, REG, VALUE)\
+_set_idreg((uint64_t *)ARRAY, REG ## _EL1_IDX, VALUE)
+
+#define GET_IDREG(ARRAY, REG)   \
+_get_idreg((uint64_t *)ARRAY, REG ## _EL1_IDX)


The casts look wrong, and seem very likely to hide bugs.
The macros should be written to be type-safe.

Perhaps like this:

#define FIELD_EX64_IDREG(ISAR, REG, FIELD) \
({ const ARMISARegisters *i_ = (ISAR); \
   FIELD_EX64(i_->idregs[REG ## _EL1_IDX], REG, FIELD); })


r~



Re: [PATCH 02/15] arm/kvm: add accessors for storing host features into idregs

2025-02-07 Thread Richard Henderson

On 2/7/25 03:02, Cornelia Huck wrote:

+/* read a 32b sysreg value and store it in the idregs */
+static int get_host_cpu_reg32(int fd, ARMHostCPUFeatures *ahcf, ARMSysRegs 
sysreg)
+{
+int index = get_sysreg_idx(sysreg);
+uint64_t *reg;
+int ret;
+
+if (index < 0) {
+return -ERANGE;
+}
+reg = &ahcf->isar.idregs[index];
+ret = read_sys_reg32(fd, (uint32_t *)reg, 
idregs_sysreg_to_kvm_reg(sysreg));
+return ret;
+}


I'm not keen on the casting.

If we want to retain read_sys_reg32 at all, then

uint32_t tmp;
ret = read_sys_reg32(fd, &tmp, idregs_sysreg_to_kvm_reg(sysreg));
if (ret == 0) {
ahcf->isar.idregs[index] = tmp;
}
return ret;

That said, read_sys_reg32 does exactly the opposite, using a uint64_t temporary. 
Therefore I would say that we should simply use read_sys_reg64.



r~



[PATCH v4 1/9] meson: Drop tcg as a module

2025-02-07 Thread Richard Henderson
This reverts commit dae0ec159f9 ("accel: build tcg modular").
The attempt was only enabled for x86, only modularized a small
portion of tcg, and in more than 3 years there have been no
follow-ups to improve the situation.

Reviewed-by: Thomas Huth 
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 accel/tcg/meson.build | 11 ---
 meson.build   | 18 +-
 2 files changed, 5 insertions(+), 24 deletions(-)

diff --git a/accel/tcg/meson.build b/accel/tcg/meson.build
index aef80de967..69f4808ac4 100644
--- a/accel/tcg/meson.build
+++ b/accel/tcg/meson.build
@@ -21,16 +21,13 @@ specific_ss.add_all(when: 'CONFIG_TCG', if_true: 
tcg_specific_ss)
 specific_ss.add(when: ['CONFIG_SYSTEM_ONLY', 'CONFIG_TCG'], if_true: files(
   'cputlb.c',
   'watchpoint.c',
+  'tcg-accel-ops.c',
+  'tcg-accel-ops-mttcg.c',
+  'tcg-accel-ops-icount.c',
+  'tcg-accel-ops-rr.c',
 ))
 
 system_ss.add(when: ['CONFIG_TCG'], if_true: files(
   'icount-common.c',
   'monitor.c',
 ))
-
-tcg_module_ss.add(when: ['CONFIG_SYSTEM_ONLY', 'CONFIG_TCG'], if_true: files(
-  'tcg-accel-ops.c',
-  'tcg-accel-ops-mttcg.c',
-  'tcg-accel-ops-icount.c',
-  'tcg-accel-ops-rr.c',
-))
diff --git a/meson.build b/meson.build
index 131b2225ab..e50a103f8a 100644
--- a/meson.build
+++ b/meson.build
@@ -322,12 +322,6 @@ if cpu in ['x86', 'x86_64']
   }
 endif
 
-modular_tcg = []
-# Darwin does not support references to thread-local variables in modules
-if host_os != 'darwin'
-  modular_tcg = ['i386-softmmu', 'x86_64-softmmu']
-endif
-
 ##
 # Compiler flags #
 ##
@@ -3279,11 +3273,6 @@ foreach target : target_dirs
 if sym == 'CONFIG_TCG' or target in accelerator_targets.get(sym, [])
   config_target += { sym: 'y' }
   config_all_accel += { sym: 'y' }
-  if target in modular_tcg
-config_target += { 'CONFIG_TCG_MODULAR': 'y' }
-  else
-config_target += { 'CONFIG_TCG_BUILTIN': 'y' }
-  endif
   target_kconfig += [ sym + '=y' ]
 endif
   endforeach
@@ -3642,7 +3631,6 @@ util_ss = ss.source_set()
 
 # accel modules
 qtest_module_ss = ss.source_set()
-tcg_module_ss = ss.source_set()
 
 modules = {}
 target_modules = {}
@@ -3803,11 +3791,7 @@ subdir('tests/qtest/libqos')
 subdir('tests/qtest/fuzz')
 
 # accel modules
-tcg_real_module_ss = ss.source_set()
-tcg_real_module_ss.add_all(when: 'CONFIG_TCG_MODULAR', if_true: tcg_module_ss)
-specific_ss.add_all(when: 'CONFIG_TCG_BUILTIN', if_true: tcg_module_ss)
-target_modules += { 'accel' : { 'qtest': qtest_module_ss,
-'tcg': tcg_real_module_ss }}
+target_modules += { 'accel' : { 'qtest': qtest_module_ss }}
 
 ##
 # Internal static_libraries and dependencies #
-- 
2.43.0




Re: [PATCH 4/5] hw/arm/smmuv3: Move reset to exit phase

2025-02-07 Thread Peter Maydell
On Fri, 7 Feb 2025 at 17:48, Peter Xu  wrote:
>
> On Fri, Feb 07, 2025 at 04:58:39PM +, Peter Maydell wrote:
> > (I wonder if we ought to suggest quiescing outstanding
> > DMA in the enter phase? But it's probably easier to fix
> > the iommus like this series does than try to get every
> > dma-capable pci device to do something different.)
>
> I wonder if we should provide some generic helper to register vIOMMU reset
> callbacks, so that we can be sure any vIOMMU model impl will register
> at the exit() phase only, and do nothing during the initial two phases.  Then
> we can put some rich comment on that helper explaining why.
>
> Looks like it means the qemu reset model in the future can be a combination
> of the device tree (which resets depth-first) and the three-phase model.  We
> will start to use different approaches to solve different problems.

The tree of QOM devices (i.e. the one based on the qbus buses
and rooted at the sysbus) resets depth-first, but it does so in
three phases: first we traverse everything doing 'enter'; then
we traverse everything doing 'hold'; then we traverse everything
doing 'exit'. There *used* to be an awkward mix of some things
being three-phase and some not, but we have now got rid of all
of those so a system reset does a single three-phase reset run
which resets everything.

-- PMM



Re: [RFC PATCH v2 3/8] migration/multifd: Terminate the TLS connection

2025-02-07 Thread Fabiano Rosas
Peter Xu  writes:

> On Fri, Feb 07, 2025 at 11:27:53AM -0300, Fabiano Rosas wrote:
>> The multifd recv side has been getting a TLS error of
>> GNUTLS_E_PREMATURE_TERMINATION at the end of migration when the send
>> side closes the sockets without ending the TLS session. This has been
>> masked by the code not checking the migration error after loadvm.
>> 
>> Start ending the TLS session at multifd_send_shutdown() so the recv
>> side always sees a clean termination (EOF) and we can start to
>> differentiate that from an actual premature termination that might
>> possibly happen in the middle of the migration.
>> 
>> There's nothing to be done if a previous migration error has already
>> broken the connection, so add a comment explaining it and ignore any
>> errors coming from gnutls_bye().
>> 
>> This doesn't break compat with older recv-side QEMUs because EOF has
>> always caused the recv thread to exit cleanly.
>> 
>> Signed-off-by: Fabiano Rosas 
>
> Reviewed-by: Peter Xu 
>
> One trivial comment..
>
>> ---
>>  migration/multifd.c | 34 +-
>>  migration/tls.c |  5 +
>>  migration/tls.h |  2 +-
>>  3 files changed, 39 insertions(+), 2 deletions(-)
>> 
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index ab73d6d984..b57cad3bb1 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -490,6 +490,32 @@ void multifd_send_shutdown(void)
>>  return;
>>  }
>>  
>> +for (i = 0; i < migrate_multifd_channels(); i++) {
>> +MultiFDSendParams *p = &multifd_send_state->params[i];
>> +
>> +/* thread_created implies the TLS handshake has succeeded */
>> +if (p->tls_thread_created && p->thread_created) {
>> +Error *local_err = NULL;
>> +/*
>> + * The destination expects the TLS session to always be
>> + * properly terminated. This helps to detect a premature
>> + * termination in the middle of the stream.  Note that
>> + * older QEMUs always break the connection on the source
>> + * and the destination always sees
>> + * GNUTLS_E_PREMATURE_TERMINATION.
>> + */
>> +migration_tls_channel_end(p->c, &local_err);
>> +
>> +if (local_err) {
>> +/*
>> + * The above can fail with broken pipe due to a
>> + * previous migration error, ignore the error.
>> + */
>> +assert(migration_has_failed(migrate_get_current()));
>
> Considering this is still the src, do we want to be softer on this with an
> error_report?
>
> Logically !migration_has_failed() means it succeeded, so we can throw the src
> qemu away now, that shouldn't be a huge deal. More of a thinking-out-loud kind
> of comment..  Your call.
>

Maybe even a warning? If at this point migration succeeded, it's probably
best to let cleanup carry on.

>> +}
>> +}
>> +}
>> +
>>  multifd_send_terminate_threads();
>>  
>>  for (i = 0; i < migrate_multifd_channels(); i++) {
>> @@ -1141,7 +1167,13 @@ static void *multifd_recv_thread(void *opaque)
>>  
>>  ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
>> p->packet_len, &local_err);
>> -if (ret == 0 || ret == -1) {   /* 0: EOF  -1: Error */
>> +if (!ret) {
>> +/* EOF */
>> +assert(!local_err);
>> +break;
>> +}
>> +
>> +if (ret == -1) {
>>  break;
>>  }
>>  
>> diff --git a/migration/tls.c b/migration/tls.c
>> index fa03d9136c..5cbf952383 100644
>> --- a/migration/tls.c
>> +++ b/migration/tls.c
>> @@ -156,6 +156,11 @@ void migration_tls_channel_connect(MigrationState *s,
>>NULL);
>>  }
>>  
>> +void migration_tls_channel_end(QIOChannel *ioc, Error **errp)
>> +{
>> +qio_channel_tls_bye(QIO_CHANNEL_TLS(ioc), errp);
>> +}
>> +
>>  bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc)
>>  {
>>  if (!migrate_tls()) {
>> diff --git a/migration/tls.h b/migration/tls.h
>> index 5797d153cb..58b25e1228 100644
>> --- a/migration/tls.h
>> +++ b/migration/tls.h
>> @@ -36,7 +36,7 @@ void migration_tls_channel_connect(MigrationState *s,
>> QIOChannel *ioc,
>> const char *hostname,
>> Error **errp);
>> -
>> +void migration_tls_channel_end(QIOChannel *ioc, Error **errp);
>>  /* Whether the QIO channel requires further TLS handshake? */
>>  bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc);
>>  
>> -- 
>> 2.35.3
>> 



Re: [RFC PATCH v2 8/8] migration/multifd: Add a compat property for TLS termination

2025-02-07 Thread Fabiano Rosas
Peter Xu  writes:

> On Fri, Feb 07, 2025 at 11:27:58AM -0300, Fabiano Rosas wrote:
>> We're currently changing the way the source multifd migration handles
>> the shutdown of the multifd channels when TLS is in use to perform a
>> clean termination by calling gnutls_bye().
>> 
>> Older src QEMUs will always close the channel without terminating the
>> TLS session. New dst QEMUs treat an unclean termination as an
>> error. Due to synchronization conditions, src QEMUs 9.1 and 9.2 are an
>> exception and can put the destination in a condition where it ignores
>> the unclean termination. For src QEMUs older than 9.1, we'll need a
>> compat property on the destination to inform that the src does not
>> terminate the TLS session.
>> 
>> Add multifd_clean_tls_termination (default true) that can be switched
>> on the destination whenever a src QEMU <9.1 is in use.
>
> Patch looks good.  Though did you forget to add the compat entry?
>

Indeed.

> I suggest we add it for all pre-9.2, in case someone backports the recent
> changes and it re-exposes the issue again in any distro stable releases.
>
> IMHO it doesn't hurt us much to always be cautious on 9.1 and 9.2 too by
> loosening the termination a bit.
>

Ok, I'll put it in hw_compat_9_2.

>> 
>> Signed-off-by: Fabiano Rosas 
>> ---
>>  migration/migration.h | 33 +
>>  migration/multifd.c   |  8 +++-
>>  migration/multifd.h   |  2 ++
>>  migration/options.c   |  2 ++
>>  4 files changed, 44 insertions(+), 1 deletion(-)
>> 
>> diff --git a/migration/migration.h b/migration/migration.h
>> index 4c1fafc2b5..77def0b437 100644
>> --- a/migration/migration.h
>> +++ b/migration/migration.h
>> @@ -443,6 +443,39 @@ struct MigrationState {
>>   * Default value is false. (since 8.1)
>>   */
>>  bool multifd_flush_after_each_section;
>> +
>> +/*
>> + * This variable only makes sense when set on the machine that is
>> + * the destination of a multifd migration with TLS enabled. It
>> + * affects the behavior of the last send->recv iteration with
>> + * regards to termination of the TLS session.
>> + *
>> + * When set:
>> + *
>> + * - the destination QEMU instance can expect to never get a
>> + *   GNUTLS_E_PREMATURE_TERMINATION error. Manifested as the error
>> + *   message: "The TLS connection was non-properly terminated".
>> + *
>> + * When clear:
>> + *
>> + * - the destination QEMU instance can expect to see a
>> + *   GNUTLS_E_PREMATURE_TERMINATION error in any multifd channel
>> + *   whenever the last recv() call of that channel happens after
>> + *   the source QEMU instance has already issued shutdown() on the
>> + *   channel.
>> + *
>> + *   Commit 637280aeb2 (since 9.1) introduced a side effect that
>> + *   causes the destination instance to not be affected by the
>> + *   premature termination, while commit 1d457daf86 (since 10.0)
>> + *   causes the premature termination condition to be once again
>> + *   reachable.
>> + *
>> + * NOTE: Regardless of the state of this option, a premature
>> + * termination of the TLS connection might happen due to error at
>> + * any moment prior to the last send->recv iteration.
>> + */
>> +bool multifd_clean_tls_termination;
>> +
>>  /*
>>   * This decides the size of guest memory chunk that will be used
>>   * to track dirty bitmap clearing.  The size of memory chunk will
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index b4f82b0893..4342399818 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -1147,6 +1147,7 @@ void multifd_recv_sync_main(void)
>>  
>>  static void *multifd_recv_thread(void *opaque)
>>  {
>> +MigrationState *s = migrate_get_current();
>>  MultiFDRecvParams *p = opaque;
>>  Error *local_err = NULL;
>>  bool use_packets = multifd_use_packets();
>> @@ -1155,6 +1156,10 @@ static void *multifd_recv_thread(void *opaque)
>>  trace_multifd_recv_thread_start(p->id);
>>  rcu_register_thread();
>>  
>> +if (!s->multifd_clean_tls_termination) {
>> +p->read_flags = QIO_CHANNEL_READ_FLAG_RELAXED_EOF;
>> +}
>> +
>>  while (true) {
>>  uint32_t flags = 0;
>>  bool has_data = false;
>> @@ -1166,7 +1171,8 @@ static void *multifd_recv_thread(void *opaque)
>>  }
>>  
>>  ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
>> -   p->packet_len, 0, &local_err);
>> +   p->packet_len, p->read_flags,
>> +   &local_err);
>>  if (!ret) {
>>  /* EOF */
>>  assert(!local_err);
>> diff --git a/migration/multifd.h b/migration/multifd.h
>> index bd785b9873..cf408ff721 100644
>> --- a/migration/multifd.h
>> +++ b/migration/multifd.h
>> @@ -244,6 +244,8 @@ typedef struct {
>>  uint32_t zero_n

Re: [PATCH 02/15] arm/kvm: add accessors for storing host features into idregs

2025-02-07 Thread Richard Henderson

On 2/7/25 03:02, Cornelia Huck wrote:

+/* read a 32b sysreg value and store it in the idregs */
+static int get_host_cpu_reg32(int fd, ARMHostCPUFeatures *ahcf, ARMSysRegs 
sysreg)
+{
+int index = get_sysreg_idx(sysreg);
+uint64_t *reg;
+int ret;
+
+if (index < 0) {
+return -ERANGE;
+}
+reg = &ahcf->isar.idregs[index];
+ret = read_sys_reg32(fd, (uint32_t *)reg, 
idregs_sysreg_to_kvm_reg(sysreg));
+return ret;
+}
+
+/* read a 64b sysreg value and store it in the idregs */
+static int get_host_cpu_reg64(int fd, ARMHostCPUFeatures *ahcf, ARMSysRegs 
sysreg)
+{
+int index = get_sysreg_idx(sysreg);


Why pass the ARMSysRegs value instead of the ARMIDRegisterIdx value?

You save yourself a linear search over the id_register_sysreg array, and you can't use 
this interface with a sysreg that doesn't have an index anyway -- ERANGE is a new failure 
mode.



r~



Re: [PATCH 15/15] arm/cpu: Add generated files

2025-02-07 Thread Richard Henderson

On 2/7/25 03:02, Cornelia Huck wrote:

And switch to using the generated definitions.

Generated against Linux 6.14-rc1.

Signed-off-by: Cornelia Huck
---
  target/arm/cpu-sysreg-properties.c | 716 -
  target/arm/cpu-sysregs.h   | 116 +
  target/arm/cpu-sysregs.h.inc   | 164 +++
  3 files changed, 860 insertions(+), 136 deletions(-)
  create mode 100644 target/arm/cpu-sysregs.h.inc


Why are we committing generated files and not generating them at build-time?


r~



Re: [RFC PATCH v2 1/8] crypto: Allow gracefully ending the TLS session

2025-02-07 Thread Peter Xu
On Fri, Feb 07, 2025 at 11:27:51AM -0300, Fabiano Rosas wrote:
> QEMU's TLS session code provides no way to call gnutls_bye() to
> terminate a TLS session. Callers of qcrypto_tls_session_read() can
> choose to ignore a GNUTLS_E_PREMATURE_TERMINATION error by setting the
> gracefulTermination argument.
> 
> The QIOChannelTLS ignores the premature termination error whenever
> shutdown() has already been issued. This is not enough anymore for the
> migration code due to changes [1] in the synchronization between
> migration source and destination.

This sentence seems to say commit [1] changed something about the TLS
condition, but IMHO fundamentally the issue is the multifd recv thread model,
which relies on blocking readv() rather than being request-based (like what
the src multifd does).

Now the src uses either shutdown() or close() to kick dest multifd recv threads
out of readv().  That has nothing to do with what we do during complete()
with those sync messages.. referencing it is ok, but we'll also need to
reference the other commit to make clear that pre-9.0 can also be prone to
this.  To me, it's more important to mention the root cause in the multifd
recv thread model, which requires explicit TLS terminations.

> 
> Add support for calling gnutls_bye() in the tlssession layer so users
> of QIOChannelTLS can clearly identify the end of a TLS session.
> 
> 1- 1d457daf86 ("migration/multifd: Further remove the SYNC on complete")
> 
> Signed-off-by: Fabiano Rosas 

-- 
Peter Xu




Re: [PATCH 5/5] hw/vfio/common: Add a trace point in vfio_reset_handler

2025-02-07 Thread Cédric Le Goater

On 2/6/25 15:21, Eric Auger wrote:

To ease the debug of reset sequence, let's add a trace point
in vfio_reset_handler()

Signed-off-by: Eric Auger 



Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  hw/vfio/common.c | 1 +
  hw/vfio/trace-events | 1 +
  2 files changed, 2 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f7499a9b74..173fb3a997 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1386,6 +1386,7 @@ void vfio_reset_handler(void *opaque)
  {
  VFIODevice *vbasedev;
  
+trace_vfio_reset_handler();

  QLIST_FOREACH(vbasedev, &vfio_device_list, global_next) {
  if (vbasedev->dev->realized) {
  vbasedev->ops->vfio_compute_needs_reset(vbasedev);
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index cab1cf1de0..c5385e1a4f 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -120,6 +120,7 @@ vfio_get_dev_region(const char *name, int index, uint32_t 
type, uint32_t subtype
  vfio_legacy_dma_unmap_overflow_workaround(void) ""
  vfio_get_dirty_bitmap(uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start, uint64_t dirty_pages) 
"iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64" 
dirty_pages=%"PRIu64
  vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 
0x%"PRIx64" - 0x%"PRIx64
+vfio_reset_handler(void) ""
  
  # platform.c

  vfio_platform_realize(char *name, char *compat) "vfio device %s, compat = %s"





Re: [PATCH] target/arm/helper: Fix timer interrupt masking when HCR_EL2.E2H == 0

2025-02-07 Thread Richard Henderson

On 2/7/25 07:45, Peter Maydell wrote:

This is where things go wrong -- icount_start_warp_timer()
notices that all CPU threads are currently idle, and
decides it needs to warp the timer forwards to the
next deadline, which is at the end of time -- INT64_MAX.

But once timer_mod_ns() returns, the generic timer code
is going to raise an interrupt (this goes through the GIC
code and comes back into the CPU which calls cpu_interrupt()),
so we don't want to warp the timer at all. The clock should
stay exactly at the value it has and the CPU is going to
have more work to do.

How is this supposed to work? Shouldn't we only be doing
the "start moving the icount forward to the next deadline"
once we've completed all the "run timers and AIO stuff" that
icount_handle_deadline() triggers, not randomly in the middle
of that when this timer callback or some other one might do
something to trigger an interrupt?


I don't understand timer warping at all.  And you're right, it doesn't seem like this 
should happen outside of a specific point in the main loop.



... But I don't think there's any reason why
timer callbacks should be obliged to reprogram their timers
last, and in any case you can imagine scenarios where there
are multiple timer callbacks for different timers and it's
only the second timer that raises an interrupt...


Agreed.


r~



Re: [PATCH 0/5] Fix vIOMMU reset order

2025-02-07 Thread Peter Xu
On Fri, Feb 07, 2025 at 05:06:20PM +, Peter Maydell wrote:
> On Fri, 7 Feb 2025 at 16:54, Peter Xu  wrote:
> >
> > On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote:
> > > This is a follow-up of Peter's attempt to fix the fact that
> > > vIOMMUs are likely to be reset before the device they protect:
> > >
> > > [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
> > > https://lore.kernel.org/all/20240117091559.144730-1-pet...@redhat.com/
> > >
> > > This is especially observed with virtio devices when a qmp system_reset
> > > command is sent but also with VFIO devices.
> > >
> > > This series puts the vIOMMU reset in the 3-phase exit callback.
> > >
> > > This scheme was tested successful with virtio-devices and some
> > > VFIO devices. Nevertheless not all the topologies have been
> > > tested yet.
> >
> > Eric,
> >
> > It's great to know that we seem to be able to fix everything in such small
> > changeset!
> >
> > I would like to double check two things with you here:
> >
> >   - For VFIO's reset hook, looks like we have landed more changes so that
> > vfio's reset function is now a TYPE_LEGACY_RESET, and it always does the
> > reset during the "hold" phase only (via legacy_reset_hold()).  That part
> > will make sure vIOMMU (if switching to exit()-only reset) will order
> > properly with VFIO.  Is my understanding correct here?
> 
> Yes, we now do a reset of the whole system as a three-phase setup,
> and the old pre-three-phase reset APIs like qemu_register_reset() and
> device_class_set_legacy_reset() all happen during the "hold" phase.
> 
> >   - Is it possible if some PCIe devices that will provide its own
> > phase.exit(), would it matter on the order of PCIe device's
> > phase.exit() and vIOMMU's phase.exit() (if vIOMMUs switch to use
> > exit()-only approach like this one)?
> 
> It's certainly possible for a PCIe device to implement
> a three-phase reset which does things in the exit phase. However
> I think I would say that such a device which didn't cancel all
> outstanding DMA operations during either 'enter' or 'hold'
> phases would be broken. If it did some other things during
> the 'exit' phase I don't think the ordering of those vs the
> iommu 'exit' handling should matter.

Yes, this sounds fair.

> 
> (To some extent the splitting into three phases is trying
> to set up a consistent model as outlined in docs/devel/reset.rst
> and to some extent it's just a convenient way to get a basic
> "this reset thing I need to do must happen after some other
> device has done its reset things" which you can achieve
> by ad-hoc putting them in different phases. Ideally we get
> mostly the former and a little pragmatic dose of the latter,
> but the consistent model is not very solidly nailed down
> so I have a feeling the proportions may not be quite as
> lopsided as we'd like :-) )

Yes, it's a good move that we can have other ways to fix all the problems
without major surgery, and it also looks solid and clean if we have plan to
fix any outlier PCIe devices.

If there will be a repost after all, not sure if Eric would like to add
some of above discussions into either some commit messages or cover letter.
Or some comment in the code might be even better.

Thanks!

-- 
Peter Xu




Re: [PATCH 0/5] Fix vIOMMU reset order

2025-02-07 Thread Cédric Le Goater

On 2/7/25 17:54, Peter Xu wrote:

On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote:

This is a follow-up of Peter's attempt to fix the fact that
vIOMMUs are likely to be reset before the device they protect:

[PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
https://lore.kernel.org/all/20240117091559.144730-1-pet...@redhat.com/

This is especially observed with virtio devices when a qmp system_reset
command is sent but also with VFIO devices.

This series puts the vIOMMU reset in the 3-phase exit callback.

This scheme was tested successful with virtio-devices and some
VFIO devices. Nevertheless not all the topologies have been
tested yet.


Eric,

It's great to know that we seem to be able to fix everything in such small
changeset!

I would like to double check two things with you here:

   - For VFIO's reset hook, looks like we have landed more changes so that
 vfio's reset function is now a TYPE_LEGACY_RESET, and it always does the
 reset during the "hold" phase only (via legacy_reset_hold()).  That part
 will make sure vIOMMU (if switching to exit()-only reset) will order
 properly with VFIO.  Is my understanding correct here?



Eric,

We were still seeing DMA errors from VFIO devices:

  VFIO_MAP_DMA failed: Bad address

with this series at shutdown (machine or OS) when using an intel_iommu
device. We could see that the vIOMMU was reset while the device DMAs
were still alive. Do you know why now?

Thanks,

C.




   - Is it possible if some PCIe devices that will provide its own
 phase.exit(), would it matter on the order of PCIe device's
 phase.exit() and vIOMMU's phase.exit() (if vIOMMUs switch to use
 exit()-only approach like this one)?

PS: it would be great to attach such information in either cover letter or
commit message.  But definitely not a request to repost the patchset, if
Michael would have Message-ID when merge that'll be far enough to help
anyone find this discussion again.

Thanks!






Re: [PATCH v3 4/4] qapi: expose all schema features to code

2025-02-07 Thread John Snow
On Fri, Feb 7, 2025, 5:30 AM Markus Armbruster  wrote:

> John Snow  writes:
>
> > On Fri, Jan 31, 2025 at 8:18 AM Markus Armbruster 
> wrote:
> >
> >> Cc: John Snow for Python typing expertise.
> >>
> >> Daniel P. Berrangé  writes:
> >>
> >> > This replaces use of the constants from the QapiSpecialFeatures
> >> > enum, with constants from the auto-generate QapiFeatures enum
> >> > in qapi-features.h
> >> >
> >> > The 'deprecated' and 'unstable' features still have a little bit of
> >> > special handling, being force defined to be the 1st + 2nd features
> >> > in the enum, regardless of whether they're used in the schema. This
> >> > retains compatibility with common code that references the features
> >> > via the QapiSpecialFeatures constants.
> >> >
> >> > Signed-off-by: Daniel P. Berrangé 
>
> [...]
>
> >> > diff --git a/scripts/qapi/features.py b/scripts/qapi/features.py
> >> > new file mode 100644
> >> > index 00..f32f9fe5f4
> >> > --- /dev/null
> >> > +++ b/scripts/qapi/features.py
> >> > @@ -0,0 +1,51 @@
> >> > +"""
> >> > +QAPI features generator
> >> > +
> >> > +Copyright 2024 Red Hat
> >> > +
> >> > +This work is licensed under the terms of the GNU GPL, version 2.
> >> > +# See the COPYING file in the top-level directory.
> >> > +"""
> >> > +
> >> > +from typing import Dict
> >> > +
> >> > +from .common import c_enum_const, c_name
> >> > +from .gen import QAPISchemaMonolithicCVisitor
> >> > +from .schema import (
> >> > +QAPISchema,
> >> > +QAPISchemaFeature,
> >> > +)
> >> > +
> >> > +
> >> > +class QAPISchemaGenFeatureVisitor(QAPISchemaMonolithicCVisitor):
> >> > +
> >> > +def __init__(self, prefix: str):
> >> > +super().__init__(
> >> > +prefix, 'qapi-features',
> >> > +' * Schema-defined QAPI features',
> >> > +__doc__)
> >> > +
> >> > +self.features: Dict[str, QAPISchemaFeature] = {}
> >> > +
> >> > +def visit_begin(self, schema: QAPISchema) -> None:
> >> > +self.features = schema.features()
> >>
> >> Inconsistent type hints:
> >>
> >> $ mypy --config-file scripts/qapi/mypy.ini scripts/qapi-gen.py
> >> scripts/qapi/schema.py:1164: error: Incompatible return value type
> >> (got "dict_values[str, QAPISchemaFeature]", expected
> >> "List[QAPISchemaFeature]")  [return-value]
> >> scripts/qapi/features.py:31: error: Incompatible types in assignment
> >> (expression has type "List[QAPISchemaFeature]", variable has type
> >> "Dict[str, QAPISchemaFeature]")  [assignment]
> >>
> >> We've been working towards having the build run mypy, but we're not
> >> there, yet.  Sorry for the inconvenience!
> >>
> >> schema.features() returns .values(), i.e. a view object.
> >>
> >> I guess the type hint should be ValuesView[QAPISchemaFeature], both for
> >> the type of attribute .features above, and for the return type of
> >> method .features() below.  John?
> >>
> >
> > It's probably easiest to just use list(...) in the return and then use
> > List[T] anywhere it matters. The values view type is "kind of, but not
> > actually a list" because it isn't mutable. It is, however, an
> > Iterable/Sequence. You can either convert it to a list or make the typing
> > more abstract.
> >
> > (Rule of thumb: return types should be as specific as possible, input
> types
> > should be as abstract as possible.)
>
> Converting a view to a list makes a copy, right?
>
> I'm not asking because that would be terrible.  I just like to
> understand things.
>
> I'd like to move further discussion to Daniel's v4.


'Kay, but let me answer your direct questions here, sorry. I'll switch to
v4 afterwards.

Yeah, list(iterable) just builds a list from the iterable, so it uses the
iterable, immutable "view" into the dict values to build a list. The dict
view is a "live" object attached to the dict, while the list is a static,
mutable list fixed at the time of copy.

(you could type it more accurately, but that can be annoying, so you can
just convert it to something "normal" like a list or tuple.)
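The view-vs-list distinction John describes can be shown with a tiny standalone example (plain Python, not QAPI code):

```python
# A dict values view is "live": it reflects later changes to the dict.
# list() materializes a static (shallow) copy at the moment of the call.
features = {"deprecated": 1, "unstable": 2}

view = features.values()            # live, immutable view object
snapshot = list(features.values())  # static, mutable list (a copy)

features["special"] = 3

print(len(view))       # the view sees the new entry: 3
print(len(snapshot))   # the list was fixed at copy time: 2

snapshot.append(99)    # fine: lists are mutable
# view.append(99)      # AttributeError: views have no append
```

So yes, `list(view)` makes a copy — but only of the references, which is cheap for a schema's feature list.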


> > I apologize for this format of relaying patches as it is against the
> blood
> > oath I swore as a maintainer, but it's late in my day, forgive me:
> > https://gitlab.com/jsnow/qemu/-/commits/dan-fixup
> >
> > That branch has two things in it:
> >
> > (1) patches to make the python/ tests check the qapi module. This means
> the
> > "make check-minreqs" test you can run from python/ will be run by the
> > GitLab pipelines. You can also run "make check-tox" manually, or run the
> > optional python-tox test from the pipeline dashboard.
>
> These are:
>
> dd9e47f0a8 qapi: update pylintrc config
> dfc6344f32 python: add qapi static analysis tests
> 1f89bf53ed qapi: delete un-needed static analysis configs
>
> Will you post them for review & merging?
>

Yep! Just wanted to offer them as part of this fixup/review to make it easy
for the both of you to run tests consistently. I know it's been a real PITA
to lint qapi, and I found a really simple way t

Re: [PATCH 4/5] hw/arm/smmuv3: Move reset to exit phase

2025-02-07 Thread Peter Xu
On Fri, Feb 07, 2025 at 04:58:39PM +, Peter Maydell wrote:
> (I wonder if we ought to suggest quiescing outstanding
> DMA in the enter phase? But it's probably easier to fix
> the iommus like this series does than try to get every
> dma-capable pci device to do something different.)

I wonder if we should provide some generic helper to register vIOMMU reset
callbacks, so that we can be sure any vIOMMU model implementation registers
at the exit() phase only, and does nothing during the initial two phases.
Then we can put a rich comment on that helper explaining why.

Looks like it means the QEMU reset model in the future can be a combination
of the device tree model (which resets depth-first) and the three-phase
model.  We will start to use different approaches to solve different problems.

Maybe after we settle our minds, we should update the reset documentation,
e.g. for device emulation developers, we need to be clear on where to
quiesce DMA, and that it must not happen at exit().  All devices and
all IOMMU implementations need to follow the rules to make it work as planned.

Thanks,

-- 
Peter Xu




Re: [RFC PATCH v2 1/8] crypto: Allow gracefully ending the TLS session

2025-02-07 Thread Fabiano Rosas
Peter Xu  writes:

> On Fri, Feb 07, 2025 at 11:27:51AM -0300, Fabiano Rosas wrote:
>> QEMU's TLS session code provides no way to call gnutls_bye() to
>> terminate a TLS session. Callers of qcrypto_tls_session_read() can
>> choose to ignore a GNUTLS_E_PREMATURE_TERMINATION error by setting the
>> gracefulTermination argument.
>> 
>> The QIOChannelTLS ignores the premature termination error whenever
>> shutdown() has already been issued. This is not enough anymore for the
>> migration code due to changes [1] in the synchronization between
>> migration source and destination.
>
> This sentence seems to say commit [1] changed something on the tls
> condition, but IMHO fundamentally the issue is multifd recv thread model
> that relies on blocking readv() rather than request-based (like what src
> multifd does).
>
> Now src uses either shutdown() or close() to kick dest multifd recv threads
> out from readv().  That has nothing to do with what we do during complete()
> with those sync messages.. referencing it is ok, but we'll need to
> reference also the other commit to be clear pre-9.0 can also be prone to
> this.  To me, it's more important to mention the root cause on the multifd
> recv thread model, which requires explicit tls terminations.
>

I didn't want to go into too much detail in a commit for crypto/. The
motivation for *this* patch is just: migration needs it. What about:

 The QIOChannelTLS ignores the premature termination error whenever
 shutdown() has already been issued. This was found to be not enough for
 the migration code because shutdown() might not have been issued before
 the connection is terminated.


>> 
>> Add support for calling gnutls_bye() in the tlssession layer so users
>> of QIOChannelTLS can clearly identify the end of a TLS session.
>> 
>> 1- 1d457daf86 ("migration/multifd: Further remove the SYNC on complete")
>> 
>> Signed-off-by: Fabiano Rosas 



Re: [RFC PATCH v2 3/8] migration/multifd: Terminate the TLS connection

2025-02-07 Thread Peter Xu
On Fri, Feb 07, 2025 at 11:27:53AM -0300, Fabiano Rosas wrote:
> The multifd recv side has been getting a TLS error of
> GNUTLS_E_PREMATURE_TERMINATION at the end of migration when the send
> side closes the sockets without ending the TLS session. This has been
> masked by the code not checking the migration error after loadvm.
> 
> Start ending the TLS session at multifd_send_shutdown() so the recv
> side always sees a clean termination (EOF) and we can start to
> differentiate that from an actual premature termination that might
> possibly happen in the middle of the migration.
> 
> There's nothing to be done if a previous migration error has already
> broken the connection, so add a comment explaining it and ignore any
> errors coming from gnutls_bye().
> 
> This doesn't break compat with older recv-side QEMUs because EOF has
> always caused the recv thread to exit cleanly.
> 
> Signed-off-by: Fabiano Rosas 

Reviewed-by: Peter Xu 

One trivial comment..

> ---
>  migration/multifd.c | 34 +-
>  migration/tls.c |  5 +
>  migration/tls.h |  2 +-
>  3 files changed, 39 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index ab73d6d984..b57cad3bb1 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -490,6 +490,32 @@ void multifd_send_shutdown(void)
>  return;
>  }
>  
> +for (i = 0; i < migrate_multifd_channels(); i++) {
> +MultiFDSendParams *p = &multifd_send_state->params[i];
> +
> +/* thread_created implies the TLS handshake has succeeded */
> +if (p->tls_thread_created && p->thread_created) {
> +Error *local_err = NULL;
> +/*
> + * The destination expects the TLS session to always be
> + * properly terminated. This helps to detect a premature
> + * termination in the middle of the stream.  Note that
> + * older QEMUs always break the connection on the source
> + * and the destination always sees
> + * GNUTLS_E_PREMATURE_TERMINATION.
> + */
> +migration_tls_channel_end(p->c, &local_err);
> +
> +if (local_err) {
> +/*
> + * The above can fail with broken pipe due to a
> + * previous migration error, ignore the error.
> + */
> +assert(migration_has_failed(migrate_get_current()));

Considering this is still src, do we want to be softer on this by
error_report?

Logically !migration_has_failed() means it succeeded, so we can throw src
qemu away now, that shouldn't be a huge deal. More of a thinking-out-loud
kind of comment..  Your call.

> +}
> +}
> +}
> +
>  multifd_send_terminate_threads();
>  
>  for (i = 0; i < migrate_multifd_channels(); i++) {
> @@ -1141,7 +1167,13 @@ static void *multifd_recv_thread(void *opaque)
>  
>  ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
> p->packet_len, &local_err);
> -if (ret == 0 || ret == -1) {   /* 0: EOF  -1: Error */
> +if (!ret) {
> +/* EOF */
> +assert(!local_err);
> +break;
> +}
> +
> +if (ret == -1) {
>  break;
>  }
>  
> diff --git a/migration/tls.c b/migration/tls.c
> index fa03d9136c..5cbf952383 100644
> --- a/migration/tls.c
> +++ b/migration/tls.c
> @@ -156,6 +156,11 @@ void migration_tls_channel_connect(MigrationState *s,
>NULL);
>  }
>  
> +void migration_tls_channel_end(QIOChannel *ioc, Error **errp)
> +{
> +qio_channel_tls_bye(QIO_CHANNEL_TLS(ioc), errp);
> +}
> +
>  bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc)
>  {
>  if (!migrate_tls()) {
> diff --git a/migration/tls.h b/migration/tls.h
> index 5797d153cb..58b25e1228 100644
> --- a/migration/tls.h
> +++ b/migration/tls.h
> @@ -36,7 +36,7 @@ void migration_tls_channel_connect(MigrationState *s,
> QIOChannel *ioc,
> const char *hostname,
> Error **errp);
> -
> +void migration_tls_channel_end(QIOChannel *ioc, Error **errp);
>  /* Whether the QIO channel requires further TLS handshake? */
>  bool migrate_channel_requires_tls_upgrade(QIOChannel *ioc);
>  
> -- 
> 2.35.3
> 

-- 
Peter Xu




Re: [RFC PATCH v2 4/8] migration: Check migration error after loadvm

2025-02-07 Thread Peter Xu
On Fri, Feb 07, 2025 at 11:27:54AM -0300, Fabiano Rosas wrote:
> We're currently only checking the QEMUFile error after
> qemu_loadvm_state(). Check the migration error as well to avoid
> missing errors that might be set by the multifd recv thread.
> 
> This doesn't break compat between 9.2 and 10.0 because 9.2 still has
> the multifd recv threads stuck at sync when the source channel shuts
> down. I.e. it doesn't have commit 1d457daf86 ("migration/multifd:
> Further remove the SYNC on complete"). QEMU versions with that commit
> will have compat broken with versions containing this commit. This is
> not an issue because both will be present in 10.0, but development
> trees might see a migration error.
> 
> Signed-off-by: Fabiano Rosas 
> ---
>  migration/savevm.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index bc375db282..4046faf009 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2940,7 +2940,11 @@ int qemu_loadvm_state(QEMUFile *f)
>  
>  /* When reaching here, it must be precopy */
>  if (ret == 0) {
> -ret = qemu_file_get_error(f);
> +if (migrate_has_error(migrate_get_current())) {
> +ret = -EINVAL;
> +} else {
> +ret = qemu_file_get_error(f);
> +}
>  }

IIUC this one needs to be after the patch that allows premature
terminations from old QEMUs?

-- 
Peter Xu




Re: [PATCH v7 3/6] accel/kvm: Report the loss of a large memory page

2025-02-07 Thread William Roche

On 2/5/25 18:07, Peter Xu wrote:

On Wed, Feb 05, 2025 at 05:27:13PM +0100, William Roche wrote:

[...]
The HMP command "info ramblock" is implemented with the ram_block_format()
function which returns a message buffer built with a string for each
ramblock (protected by the RCU_READ_LOCK_GUARD). Our new function copies a
struct with the necessary information.

Relying on the buffer format to retrieve the information doesn't seem
reasonable, and more importantly, this buffer doesn't provide all the needed
data, like fd and fd_offset.

I would say that ram_block_format() and qemu_ram_block_info_from_addr()
serve 2 different goals.

(a reimplementation of ram_block_format() with an adapted version of
qemu_ram_block_info_from_addr() taking the extra information needed could be
doable for example, but may not be worth doing for now)


IIUC the admin should be aware of fd_offset, because the admin needs to
specify the start offset of FDs in qemu command lines, or in
Libvirt.  But yes, we can always add fd_offset into ram_block_format() if
it's helpful.

Besides, the existing issues on this patch:

   - From the outcome of this patch, it introduces one ramblock API (which is
 ok to me, so far) to do some error_report()s.  It looks pretty much like
 debugging rather than something serious (e.g. reporting via QMP queries,
 QMP events, etc.).  From a debug POV, I still don't see why this is
 needed.. as discussed above.


The reason why I want to inform the user of a large memory failure more 
specifically than of a standard-sized page loss is the significant 
behavior difference: our current implementation can 
transparently handle many situations without necessarily leading the VM 
to a crash. But when it comes to large pages, there is no mechanism to 
inform the VM of a large memory loss, and usually this situation leads 
the VM to crash; it can also generate some weird situations, like qemu 
itself crashing or a loop of errors, for example.


So having a message informing of such a memory loss can help to 
understand a more radical VM or qemu behavior -- it increases the 
diagnosability of our code.


To verify that a SIGBUS appeared because of a large page loss, we 
currently need to check the targeted memory block backend's page_size.
We should usually get this information from the SIGBUS siginfo data 
(with the si_addr_lsb field giving an indication of the page size), but a 
KVM weakness, with a hardcoded si_addr_lsb=PAGE_SHIFT value in the SIGBUS 
siginfo returned from the kernel, prevents that: see the 
kvm_send_hwpoison_signal() function.


So I first wrote a small API addition called 
qemu_ram_pagesize_from_addr() to retrieve only this page_size value from 
the impacted address; and later on, this function turned into the richer 
qemu_ram_block_info_from_addr() function to have the generated messages 
match the existing memory messages as rightly requested by David.


So the main reason is a KVM "weakness" with kvm_send_hwpoison_signal(), 
and the second reason is to have richer error messages.





   - From merge POV, this patch isn't a pure memory change, so I'll need to
 get ack from other maintainers, at least that should be how it works..


I agree :)



I feel like when hwpoison becomes a serious topic, we need some more
serious reporting facility than error reports.  So we could have this
as a separate topic to be revisited.  It might speed up your prior patches
by not being blocked on this.


I explained why I think that error messages are important, but I don't 
want to get blocked on fixing the hugepage memory recovery because of that.


If you think that not displaying a specific message for large page loss 
can help to get the recovery fixed, then I can change my proposal to do so.


Early next week, I'll send a simplified version of my first 3 patches 
without these specific messages and without the preallocation handling in 
all remap cases, so you can evaluate this possibility.


Thanks again for your feedback,
William.




Re: [RFC PATCH v2 8/8] migration/multifd: Add a compat property for TLS termination

2025-02-07 Thread Peter Xu
On Fri, Feb 07, 2025 at 11:27:58AM -0300, Fabiano Rosas wrote:
> We're currently changing the way the source multifd migration handles
> the shutdown of the multifd channels when TLS is in use to perform a
> clean termination by calling gnutls_bye().
> 
> Older src QEMUs will always close the channel without terminating the
> TLS session. New dst QEMUs treat an unclean termination as an
> error. Due to synchronization conditions, src QEMUs 9.1 and 9.2 are an
> exception and can put the destination in a condition where it ignores
> the unclean termination. For src QEMUs older than 9.1, we'll need a
> compat property on the destination to inform that the src does not
> terminate the TLS session.
> 
> Add multifd_clean_tls_termination (default true) that can be switched
> on the destination whenever a src QEMU <9.1 is in use.

Patch looks good.  Though did you forget to add the compat entry?

I suggest we add it for all pre-9.2 machine types, in case anyone backports
the recent changes and re-exposes the issue in any distro stable branches.

IMHO it doesn't hurt us much to always be cautious on 9.1 and 9.2 too by
loosening the termination check a bit.
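For reference, such a compat entry would follow the usual GlobalProperty pattern in hw/core/machine.c. This is a hypothetical sketch — it assumes the property ends up named multifd-clean-tls-termination and that the array name matches the release the change lands in:

```c
/* Hypothetical: pre-9.2 (or pre-10.0) machine types keep the old
 * unclean-termination behavior by flipping the property off. */
GlobalProperty hw_compat_9_2[] = {
    /* ... existing entries ... */
    { "migration", "multifd-clean-tls-termination", "false" },
};
```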

> 
> Signed-off-by: Fabiano Rosas 
> ---
>  migration/migration.h | 33 +
>  migration/multifd.c   |  8 +++-
>  migration/multifd.h   |  2 ++
>  migration/options.c   |  2 ++
>  4 files changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.h b/migration/migration.h
> index 4c1fafc2b5..77def0b437 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -443,6 +443,39 @@ struct MigrationState {
>   * Default value is false. (since 8.1)
>   */
>  bool multifd_flush_after_each_section;
> +
> +/*
> + * This variable only makes sense when set on the machine that is
> + * the destination of a multifd migration with TLS enabled. It
> + * affects the behavior of the last send->recv iteration with
> + * regards to termination of the TLS session.
> + *
> + * When set:
> + *
> + * - the destination QEMU instance can expect to never get a
> + *   GNUTLS_E_PREMATURE_TERMINATION error. Manifested as the error
> + *   message: "The TLS connection was non-properly terminated".
> + *
> + * When clear:
> + *
> + * - the destination QEMU instance can expect to see a
> + *   GNUTLS_E_PREMATURE_TERMINATION error in any multifd channel
> + *   whenever the last recv() call of that channel happens after
> + *   the source QEMU instance has already issued shutdown() on the
> + *   channel.
> + *
> + *   Commit 637280aeb2 (since 9.1) introduced a side effect that
> + *   causes the destination instance to not be affected by the
> + *   premature termination, while commit 1d457daf86 (since 10.0)
> + *   causes the premature termination condition to be once again
> + *   reachable.
> + *
> + * NOTE: Regardless of the state of this option, a premature
> + * termination of the TLS connection might happen due to error at
> + * any moment prior to the last send->recv iteration.
> + */
> +bool multifd_clean_tls_termination;
> +
>  /*
>   * This decides the size of guest memory chunk that will be used
>   * to track dirty bitmap clearing.  The size of memory chunk will
> diff --git a/migration/multifd.c b/migration/multifd.c
> index b4f82b0893..4342399818 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -1147,6 +1147,7 @@ void multifd_recv_sync_main(void)
>  
>  static void *multifd_recv_thread(void *opaque)
>  {
> +MigrationState *s = migrate_get_current();
>  MultiFDRecvParams *p = opaque;
>  Error *local_err = NULL;
>  bool use_packets = multifd_use_packets();
> @@ -1155,6 +1156,10 @@ static void *multifd_recv_thread(void *opaque)
>  trace_multifd_recv_thread_start(p->id);
>  rcu_register_thread();
>  
> +if (!s->multifd_clean_tls_termination) {
> +p->read_flags = QIO_CHANNEL_READ_FLAG_RELAXED_EOF;
> +}
> +
>  while (true) {
>  uint32_t flags = 0;
>  bool has_data = false;
> @@ -1166,7 +1171,8 @@ static void *multifd_recv_thread(void *opaque)
>  }
>  
>  ret = qio_channel_read_all_eof(p->c, (void *)p->packet,
> -   p->packet_len, 0, &local_err);
> +   p->packet_len, p->read_flags,
> +   &local_err);
>  if (!ret) {
>  /* EOF */
>  assert(!local_err);
> diff --git a/migration/multifd.h b/migration/multifd.h
> index bd785b9873..cf408ff721 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -244,6 +244,8 @@ typedef struct {
>  uint32_t zero_num;
>  /* used for de-compression methods */
>  void *compress_data;
> +/* Flags for the QIOChannel */
> +int read_flags;
>  } MultiFDRecvParams;
>  
>  typedef struct {
>

Re: [PATCH v2 01/17] tests/docker: replicate the check-rust-tools-nightly CI job

2025-02-07 Thread Richard Henderson

On 2/7/25 07:30, Alex Bennée wrote:

This allows people to run the test locally:

   make docker-test-rust@fedora-rust-nightly

Signed-off-by: Alex Bennée

---
v2
   - update MAINTAINERS
---
  MAINTAINERS   |  1 +
  tests/docker/Makefile.include |  3 +++
  tests/docker/test-rust| 21 +
  3 files changed, 25 insertions(+)
  create mode 100755 tests/docker/test-rust


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 4/4] vfio/igd: sync GPU generation with i915 kernel driver

2025-02-07 Thread Tomita Moeko


On 2/6/25 20:13, Corvin Köhne wrote:
> From: Corvin Köhne 
> 
> We're currently missing some GPU IDs already supported by the i915
> kernel driver. Additionally, we've treated IvyBridge as gen 6 in the
> past. According to i915 it's gen 7 [1]. It shouldn't cause any issues
> yet because we treat gen 6 and gen 7 the same way. Nevertheless, we
> should use the correct generation to avoid any confusion.
> 
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/i915/i915_pci.c?h=v6.13#n330
> 
> Signed-off-by: Corvin Köhne 
> ---
>  hw/vfio/igd.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
> index e5d7006ce2..7bbf018efc 100644
> --- a/hw/vfio/igd.c
> +++ b/hw/vfio/igd.c
> @@ -64,7 +64,7 @@ struct igd_device {
>  
>  static const struct igd_device igd_devices[] = {
>  INTEL_SNB_IDS(IGD_DEVICE, 6),
> -INTEL_IVB_IDS(IGD_DEVICE, 6),
> +INTEL_IVB_IDS(IGD_DEVICE, 7),
>  INTEL_HSW_IDS(IGD_DEVICE, 7),
>  INTEL_VLV_IDS(IGD_DEVICE, 7),
>  INTEL_BDW_IDS(IGD_DEVICE, 8),
> @@ -73,8 +73,10 @@ static const struct igd_device igd_devices[] = {
>  INTEL_BXT_IDS(IGD_DEVICE, 9),
>  INTEL_KBL_IDS(IGD_DEVICE, 9),
>  INTEL_CFL_IDS(IGD_DEVICE, 9),
> +INTEL_WHL_IDS(IGD_DEVICE, 9),
>  INTEL_CML_IDS(IGD_DEVICE, 9),
>  INTEL_GLK_IDS(IGD_DEVICE, 9),
> +INTEL_CNL_IDS(IGD_DEVICE, 9),
>  INTEL_ICL_IDS(IGD_DEVICE, 11),
>  INTEL_EHL_IDS(IGD_DEVICE, 11),
>  INTEL_JSL_IDS(IGD_DEVICE, 11),
> @@ -86,6 +88,8 @@ static const struct igd_device igd_devices[] = {
>  INTEL_RPLS_IDS(IGD_DEVICE, 12),
>  INTEL_RPLU_IDS(IGD_DEVICE, 12),
>  INTEL_RPLP_IDS(IGD_DEVICE, 12),
> +INTEL_ARL_IDS(IGD_DEVICE, 12),
> +INTEL_MTL_IDS(IGD_DEVICE, 12),

According to the i915 driver [1], DSM becomes a part of BAR 2 in MTL/ARL.
All accesses to DSM from the CPU should go via the BAR, I think. BARs are
mapped from guest address space to host address space by QEMU during
passthrough, as common behavior, just like with normal discrete GPUs.

Though IGD takes a memory region as DSM, it should be reserved by
firmware and not directly accessible by the host either, like GTT memory,
since arch/x86/kernel/early-quirks.c no longer reserves DSM for MTL/
ARL.

Applying the BDSM quirk would bring issues to MTL/ARL. Probably there
are no special workarounds needed for MTL and later IGD devices. But
Intel hasn't made the MTL/ARL/LNL datasheet publicly available yet, so
I cannot confirm it :( If Intel really decided not to use BDSM on
MTL+, we can just have a fixed ID list for IGD devices.

[1] 
https://github.com/torvalds/linux/blob/69b8923f5003664e3ffef102e7edfa2abdcf/drivers/gpu/drm/i915/gem/i915_gem_stolen.c#L918

>  };
>  
>  /*




Re: [RFC PATCH v2 1/8] crypto: Allow gracefully ending the TLS session

2025-02-07 Thread Peter Xu
On Fri, Feb 07, 2025 at 02:55:57PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Fri, Feb 07, 2025 at 11:27:51AM -0300, Fabiano Rosas wrote:
> >> QEMU's TLS session code provides no way to call gnutls_bye() to
> >> terminate a TLS session. Callers of qcrypto_tls_session_read() can
> >> choose to ignore a GNUTLS_E_PREMATURE_TERMINATION error by setting the
> >> gracefulTermination argument.
> >> 
> >> The QIOChannelTLS ignores the premature termination error whenever
> >> shutdown() has already been issued. This is not enough anymore for the
> >> migration code due to changes [1] in the synchronization between
> >> migration source and destination.
> >
> > This sentence seems to say commit [1] changed something on the tls
> > condition, but IMHO fundamentally the issue is multifd recv thread model
> > that relies on blocking readv() rather than request-based (like what src
> > multifd does).
> >
> > Now src uses either shutdown() or close() to kick dest multifd recv threads
> > out from readv().  That has nothing to do with what we do during complete()
> > with those sync messages.. referencing it is ok, but we'll need to
> > reference also the other commit to be clear pre-9.0 can also be prone to
> > this.  To me, it's more important to mention the root cause on the multifd
> > recv thread model, which requires explicit tls terminations.
> >
> 
> I didn't want to go into too much detail in a commit for crypto/. The

You already did so by referencing a multifd commit that changes how
complete() works!

> motivation for *this* patch is just: migration needs it. What about:
> 
>  The QIOChannelTLS ignores the premature termination error whenever
>  shutdown() has already been issued. This was found to be not enough for
>  the migration code because shutdown() might not have been issued before
>  the connection is terminated.

Looks good to me, thanks.

-- 
Peter Xu




Re: [PATCH v4 17/33] migration/multifd: Make MultiFDSendData a struct

2025-02-07 Thread Maciej S. Szmigiero

On 7.02.2025 15:36, Fabiano Rosas wrote:

"Maciej S. Szmigiero"  writes:


From: Peter Xu 

The newly introduced device state buffer can be used for either storing
VFIO's read() raw data, but already also possible to store generic device
states.  After noticing that device states may not easily provide a max
buffer size (also the fact that RAM MultiFDPages_t after all also want to
have flexibility on managing offset[] array), it may not be a good idea to
stick with union on MultiFDSendData.. as it won't play well with such
flexibility.

Switch MultiFDSendData to a struct.

It won't consume a lot more space in reality, after all the real buffers
were already dynamically allocated, so it's so far only about the two
structs (pages, device_state) that will be duplicated, but they're small.

With this, we can remove the pretty hard to understand alloc size logic.
Because now we can allocate offset[] together with the SendData, and
properly free it when the SendData is freed.

Signed-off-by: Peter Xu 
[MSS: Make sure to clear possible device state payload before freeing
MultiFDSendData, remove placeholders for other patches not included]
Signed-off-by: Maciej S. Szmigiero 
---
  migration/multifd-device-state.c |  5 -
  migration/multifd-nocomp.c   | 13 ++---
  migration/multifd.c  | 25 +++--
  migration/multifd.h  | 15 +--
  4 files changed, 22 insertions(+), 36 deletions(-)

diff --git a/migration/multifd-device-state.c b/migration/multifd-device-state.c
index 2207bea9bf8a..d1674b432ff2 100644
--- a/migration/multifd-device-state.c
+++ b/migration/multifd-device-state.c
@@ -16,11 +16,6 @@ static QemuMutex queue_job_mutex;
  
  static MultiFDSendData *device_state_send;
  
-size_t multifd_device_state_payload_size(void)

-{
-return sizeof(MultiFDDeviceState_t);
-}
-
  void multifd_device_state_send_setup(void)
  {
  qemu_mutex_init(&queue_job_mutex);
diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
index c00804652383..ffe75256c9fb 100644
--- a/migration/multifd-nocomp.c
+++ b/migration/multifd-nocomp.c
@@ -25,15 +25,14 @@
  
  static MultiFDSendData *multifd_ram_send;
  
-size_t multifd_ram_payload_size(void)

+void multifd_ram_payload_alloc(MultiFDPages_t *pages)
  {
-uint32_t n = multifd_ram_page_count();
+pages->offset = g_new0(ram_addr_t, multifd_ram_page_count());
+}
  
-/*

- * We keep an array of page offsets at the end of MultiFDPages_t,
- * add space for it in the allocation.
- */
-return sizeof(MultiFDPages_t) + n * sizeof(ram_addr_t);
+void multifd_ram_payload_free(MultiFDPages_t *pages)
+{
+g_clear_pointer(&pages->offset, g_free);
  }
  
  void multifd_ram_save_setup(void)

diff --git a/migration/multifd.c b/migration/multifd.c
index 61b061a33d35..0b61b8192231 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -105,26 +105,12 @@ struct {
  
  MultiFDSendData *multifd_send_data_alloc(void)

  {
-size_t max_payload_size, size_minus_payload;
+MultiFDSendData *new = g_new0(MultiFDSendData, 1);
  
-/*

- * MultiFDPages_t has a flexible array at the end, account for it
- * when allocating MultiFDSendData. Use max() in case other types
- * added to the union in the future are larger than
- * (MultiFDPages_t + flex array).
- */
-max_payload_size = MAX(multifd_ram_payload_size(),
-   multifd_device_state_payload_size());
-max_payload_size = MAX(max_payload_size, sizeof(MultiFDPayload));
-
-/*
- * Account for any holes the compiler might insert. We can't pack
- * the structure because that misaligns the members and triggers
- * Waddress-of-packed-member.
- */
-size_minus_payload = sizeof(MultiFDSendData) - sizeof(MultiFDPayload);
+multifd_ram_payload_alloc(&new->u.ram);
+/* Device state allocates its payload on-demand */
  
-return g_malloc0(size_minus_payload + max_payload_size);

+return new;
  }
  
  void multifd_send_data_clear(MultiFDSendData *data)

@@ -151,8 +137,11 @@ void multifd_send_data_free(MultiFDSendData *data)
  return;
  }
  
+/* This also free's device state payload */

  multifd_send_data_clear(data);
  
+multifd_ram_payload_free(&data->u.ram);

+


Shouldn't this be added to the switch statement at
multifd_send_data_clear() instead?


I think the intention is that RAM pages are allocated at MultiFDSendData
instance allocation time and stay allocated for its entire lifetime -
because we know RAM pages packet data size upfront and also that's what
multifd send threads will be mostly sending.

In contrast with RAM, device state allocates its payload on demand, since
its size is unknown and can vary between each multifd_queue_device_state()
invocation. This payload is freed after it gets sent by a multifd send
thread.

There's even a comment about this in multifd_send_data_alloc():

multifd_ram_payload_alloc(&new->u.ram);

Re: [RFC PATCH v2 0/8] crypto,io,migration: Add support to gnutls_bye()

2025-02-07 Thread Maciej S. Szmigiero

On 7.02.2025 15:27, Fabiano Rosas wrote:

v2:

Added the premature_ok logic;
Added compat property for QEMU <9.1;
Refactored the existing handshake code;

CI run:
https://gitlab.com/farosas/qemu/-/pipelines/1660800456

v1:
https://lore.kernel.org/r/20250206175824.22664-1-faro...@suse.de

Hi,

We've been discussing a way to stop multifd recv threads from getting
an error at the end of migration when the source threads close the
iochannel without ending the TLS session.

The original issue was introduced by commit 1d457daf86
("migration/multifd: Further remove the SYNC on complete") which
altered the synchronization of the source and destination in a manner
that causes the destination to already be waiting at recv() when the
source closes the connection.

One approach would be to issue gnutls_bye() at the source after all
the data has been sent. The destination would then gracefully exit
when it gets EOF.

Aside from stopping the recv thread from seeing an error, this also
creates a contract that all connections should be closed only after
the TLS session is ended. This helps to avoid masking a legitimate
issue where the connection is closed prematurely.



I've rebased my patch set on top of this version and can confirm
it works too (with respect to VFIO migration and QEMU tests).

The updated series is available at its usual place.

Thanks,
Maciej




[PATCH 2/3] hw/loongarch/virt: Rename function prefix name

2025-02-07 Thread Bibo Mao
Rename functions with the loongarch_ prefix to use the virt_ prefix in
file virt-acpi-build.c.

Signed-off-by: Bibo Mao 
---
 hw/loongarch/virt-acpi-build.c | 6 +++---
 hw/loongarch/virt.c| 2 +-
 include/hw/loongarch/virt.h| 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/loongarch/virt-acpi-build.c b/hw/loongarch/virt-acpi-build.c
index fdd62acf7e..9ca88d63ae 100644
--- a/hw/loongarch/virt-acpi-build.c
+++ b/hw/loongarch/virt-acpi-build.c
@@ -656,7 +656,7 @@ static const VMStateDescription vmstate_acpi_build = {
 },
 };
 
-static bool loongarch_is_acpi_enabled(LoongArchVirtMachineState *lvms)
+static bool virt_is_acpi_enabled(LoongArchVirtMachineState *lvms)
 {
 if (lvms->acpi == ON_OFF_AUTO_OFF) {
 return false;
@@ -664,7 +664,7 @@ static bool loongarch_is_acpi_enabled(LoongArchVirtMachineState *lvms)
 return true;
 }
 
-void loongarch_acpi_setup(LoongArchVirtMachineState *lvms)
+void virt_acpi_setup(LoongArchVirtMachineState *lvms)
 {
 AcpiBuildTables tables;
 AcpiBuildState *build_state;
@@ -674,7 +674,7 @@ void loongarch_acpi_setup(LoongArchVirtMachineState *lvms)
 return;
 }
 
-if (!loongarch_is_acpi_enabled(lvms)) {
+if (!virt_is_acpi_enabled(lvms)) {
 ACPI_BUILD_DPRINTF("ACPI disabled. Bailing out.\n");
 return;
 }
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 63fa0f4e32..82d840d93f 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -686,7 +686,7 @@ static void virt_done(Notifier *notifier, void *data)
 LoongArchVirtMachineState *lvms = container_of(notifier,
   LoongArchVirtMachineState, machine_done);
 virt_build_smbios(lvms);
-loongarch_acpi_setup(lvms);
+virt_acpi_setup(lvms);
 virt_fdt_setup(lvms);
 }
 
diff --git a/include/hw/loongarch/virt.h b/include/hw/loongarch/virt.h
index 9ba47793ef..062f63d874 100644
--- a/include/hw/loongarch/virt.h
+++ b/include/hw/loongarch/virt.h
@@ -64,5 +64,5 @@ struct LoongArchVirtMachineState {
 
 #define TYPE_LOONGARCH_VIRT_MACHINE  MACHINE_TYPE_NAME("virt")
 OBJECT_DECLARE_SIMPLE_TYPE(LoongArchVirtMachineState, LOONGARCH_VIRT_MACHINE)
-void loongarch_acpi_setup(LoongArchVirtMachineState *lvms);
+void virt_acpi_setup(LoongArchVirtMachineState *lvms);
 #endif
-- 
2.39.3




[PATCH 3/3] hw/loongarch/virt: Add separate file for fdt building

2025-02-07 Thread Bibo Mao
Similar to virt-acpi-build.c, add a new file virt-fdt-build.c and move
the functions related to FDT table building into it.

This is only code movement; there is no functional change.

Signed-off-by: Bibo Mao 
---
 hw/loongarch/meson.build  |   4 +-
 hw/loongarch/virt-fdt-build.c | 535 ++
 hw/loongarch/virt.c   | 524 -
 include/hw/loongarch/virt.h   |   1 +
 4 files changed, 539 insertions(+), 525 deletions(-)
 create mode 100644 hw/loongarch/virt-fdt-build.c

diff --git a/hw/loongarch/meson.build b/hw/loongarch/meson.build
index 3f020de7dc..d494d1e283 100644
--- a/hw/loongarch/meson.build
+++ b/hw/loongarch/meson.build
@@ -3,7 +3,9 @@ loongarch_ss.add(files(
 'boot.c',
 ))
 common_ss.add(when: 'CONFIG_LOONGARCH_VIRT', if_true: files('fw_cfg.c'))
-loongarch_ss.add(when: 'CONFIG_LOONGARCH_VIRT', if_true: files('virt.c'))
+loongarch_ss.add(when: 'CONFIG_LOONGARCH_VIRT', if_true: files(
+  'virt-fdt-build.c',
+  'virt.c'))
 loongarch_ss.add(when: 'CONFIG_ACPI', if_true: files('virt-acpi-build.c'))
 
 hw_arch += {'loongarch': loongarch_ss}
diff --git a/hw/loongarch/virt-fdt-build.c b/hw/loongarch/virt-fdt-build.c
new file mode 100644
index 00..dbc269afba
--- /dev/null
+++ b/hw/loongarch/virt-fdt-build.c
@@ -0,0 +1,535 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2025 Loongson Technology Corporation Limited
+ */
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qemu/guest-random.h"
+#include 
+#include "hw/acpi/generic_event_device.h"
+#include "hw/core/sysbus-fdt.h"
+#include "hw/intc/loongarch_extioi.h"
+#include "hw/loader.h"
+#include "hw/loongarch/virt.h"
+#include "hw/pci-host/gpex.h"
+#include "hw/pci-host/ls7a.h"
+#include "system/device_tree.h"
+#include "system/reset.h"
+#include "target/loongarch/cpu.h"
+
+static void create_fdt(LoongArchVirtMachineState *lvms)
+{
+MachineState *ms = MACHINE(lvms);
+uint8_t rng_seed[32];
+
+ms->fdt = create_device_tree(&lvms->fdt_size);
+if (!ms->fdt) {
+error_report("create_device_tree() failed");
+exit(1);
+}
+
+/* Header */
+qemu_fdt_setprop_string(ms->fdt, "/", "compatible",
+"linux,dummy-loongson3");
+qemu_fdt_setprop_cell(ms->fdt, "/", "#address-cells", 0x2);
+qemu_fdt_setprop_cell(ms->fdt, "/", "#size-cells", 0x2);
+qemu_fdt_add_subnode(ms->fdt, "/chosen");
+
+/* Pass seed to RNG */
+qemu_guest_getrandom_nofail(rng_seed, sizeof(rng_seed));
+qemu_fdt_setprop(ms->fdt, "/chosen", "rng-seed", rng_seed, sizeof(rng_seed));
+}
+
+static void fdt_add_cpu_nodes(const LoongArchVirtMachineState *lvms)
+{
+int num;
+MachineState *ms = MACHINE(lvms);
+MachineClass *mc = MACHINE_GET_CLASS(ms);
+const CPUArchIdList *possible_cpus;
+LoongArchCPU *cpu;
+CPUState *cs;
+char *nodename, *map_path;
+
+qemu_fdt_add_subnode(ms->fdt, "/cpus");
+qemu_fdt_setprop_cell(ms->fdt, "/cpus", "#address-cells", 0x1);
+qemu_fdt_setprop_cell(ms->fdt, "/cpus", "#size-cells", 0x0);
+
+/* cpu nodes */
+possible_cpus = mc->possible_cpu_arch_ids(ms);
+for (num = 0; num < possible_cpus->len; num++) {
+cs = possible_cpus->cpus[num].cpu;
+if (cs == NULL) {
+continue;
+}
+
+nodename = g_strdup_printf("/cpus/cpu@%d", num);
+cpu = LOONGARCH_CPU(cs);
+
+qemu_fdt_add_subnode(ms->fdt, nodename);
+qemu_fdt_setprop_string(ms->fdt, nodename, "device_type", "cpu");
+qemu_fdt_setprop_string(ms->fdt, nodename, "compatible",
+cpu->dtb_compatible);
+if (possible_cpus->cpus[num].props.has_node_id) {
+qemu_fdt_setprop_cell(ms->fdt, nodename, "numa-node-id",
+possible_cpus->cpus[num].props.node_id);
+}
+qemu_fdt_setprop_cell(ms->fdt, nodename, "reg", num);
+qemu_fdt_setprop_cell(ms->fdt, nodename, "phandle",
+  qemu_fdt_alloc_phandle(ms->fdt));
+g_free(nodename);
+}
+
+/*cpu map */
+qemu_fdt_add_subnode(ms->fdt, "/cpus/cpu-map");
+for (num = 0; num < possible_cpus->len; num++) {
+cs = possible_cpus->cpus[num].cpu;
+if (cs == NULL) {
+continue;
+}
+
+nodename = g_strdup_printf("/cpus/cpu@%d", num);
+if (ms->smp.threads > 1) {
+map_path = g_strdup_printf(
+"/cpus/cpu-map/socket%d/core%d/thread%d",
+num / (ms->smp.cores * ms->smp.threads),
+(num / ms->smp.threads) % ms->smp.cores,
+num % ms->smp.threads);
+} else {
+map_path = g_strdup_printf(
+"/cpus/cpu-map/socket%d/core%d",
+num / ms->smp.cores,
+num % ms->smp.cores);
+}
+qemu_fdt_add_path(ms->fdt, map_path);
+qemu_fdt_setprop_phandle(ms->fdt, map_pa

[PATCH 0/3] hw/loongarch/virt: Code cleanup

2025-02-07 Thread Bibo Mao
This is a code cleanup of the LoongArch virt machine type. A separate file
is added for FDT table building, and file acpi-build is renamed to
virt-acpi-build.

It is only code movement and function renaming; there is no functional
change.

Bibo Mao (3):
  hw/loongarch/virt: Rename filename acpi-build with virt-acpi-build
  hw/loongarch/virt: Rename function prefix name
  hw/loongarch/virt: Add separate file for fdt building

 hw/loongarch/meson.build  |   6 +-
 .../{acpi-build.c => virt-acpi-build.c}   |   6 +-
 hw/loongarch/virt-fdt-build.c | 535 ++
 hw/loongarch/virt.c   | 526 +
 include/hw/loongarch/virt.h   |   3 +-
 5 files changed, 545 insertions(+), 531 deletions(-)
 rename hw/loongarch/{acpi-build.c => virt-acpi-build.c} (99%)
 create mode 100644 hw/loongarch/virt-fdt-build.c


base-commit: 131c58469f6fb68c89b38fee6aba8bbb20c7f4bf
-- 
2.39.3




[PATCH 1/3] hw/loongarch/virt: Rename filename acpi-build with virt-acpi-build

2025-02-07 Thread Bibo Mao
File acpi-build.c is related to the virt machine type; rename it to
virt-acpi-build.c.

Signed-off-by: Bibo Mao 
---
 hw/loongarch/meson.build | 2 +-
 hw/loongarch/{acpi-build.c => virt-acpi-build.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename hw/loongarch/{acpi-build.c => virt-acpi-build.c} (100%)

diff --git a/hw/loongarch/meson.build b/hw/loongarch/meson.build
index 005f017e21..3f020de7dc 100644
--- a/hw/loongarch/meson.build
+++ b/hw/loongarch/meson.build
@@ -4,6 +4,6 @@ loongarch_ss.add(files(
 ))
 common_ss.add(when: 'CONFIG_LOONGARCH_VIRT', if_true: files('fw_cfg.c'))
 loongarch_ss.add(when: 'CONFIG_LOONGARCH_VIRT', if_true: files('virt.c'))
-loongarch_ss.add(when: 'CONFIG_ACPI', if_true: files('acpi-build.c'))
+loongarch_ss.add(when: 'CONFIG_ACPI', if_true: files('virt-acpi-build.c'))
 
 hw_arch += {'loongarch': loongarch_ss}
diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/virt-acpi-build.c
similarity index 100%
rename from hw/loongarch/acpi-build.c
rename to hw/loongarch/virt-acpi-build.c
-- 
2.39.3




[PATCH v3] hw/arm/virt: Support larger highmem MMIO regions

2025-02-07 Thread Matthew R. Ochs
The MMIO region size required to support virtualized environments with
large PCI BAR regions can exceed the hardcoded limit configured in QEMU.
For example, a VM with multiple NVIDIA Grace-Hopper GPUs passed through
requires more MMIO memory than the amount provided by VIRT_HIGH_PCIE_MMIO
(currently 512GB). Instead of updating VIRT_HIGH_PCIE_MMIO, introduce a
new parameter, highmem-mmio-size, that specifies the MMIO size required
to support the VM configuration.

Example usage with 1TB MMIO region size:
-machine virt,gic-version=3,highmem-mmio-size=1T

Signed-off-by: Matthew R. Ochs 
Reviewed-by: Gavin Shan 
---
v3: - Updated highmem-mmio-size description
v2: - Add unit suffix to example in commit message
- Use existing "high memory region" terminology
- Resolve minor braces nit

 docs/system/arm/virt.rst |  4 
 hw/arm/virt.c| 38 ++
 2 files changed, 42 insertions(+)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index e67e7f0f7c50..20b14c22b659 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -138,6 +138,10 @@ highmem-mmio
   Set ``on``/``off`` to enable/disable the high memory region for PCI MMIO.
   The default is ``on``.
 
+highmem-mmio-size
+  Set the high memory region size for PCI MMIO. Must be a power-of-2 and
+  greater than or equal to the default size.
+
 gic-version
   Specify the version of the Generic Interrupt Controller (GIC) to provide.
   Valid values are:
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 49eb0355ef0c..d8d62df43f04 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2773,6 +2773,36 @@ static void virt_set_highmem_mmio(Object *obj, bool value, Error **errp)
 vms->highmem_mmio = value;
 }
 
+static void virt_get_highmem_mmio_size(Object *obj, Visitor *v, const char *name,
+  void *opaque, Error **errp)
+{
+uint64_t size = extended_memmap[VIRT_HIGH_PCIE_MMIO].size;
+
+visit_type_size(v, name, &size, errp);
+}
+
+static void virt_set_highmem_mmio_size(Object *obj, Visitor *v, const char *name,
+  void *opaque, Error **errp)
+{
+uint64_t size;
+
+if (!visit_type_size(v, name, &size, errp)) {
+return;
+}
+
+if (!is_power_of_2(size)) {
+error_setg(errp, "highmem_mmio_size is not a power-of-2");
+return;
+}
+
+if (size < extended_memmap[VIRT_HIGH_PCIE_MMIO].size) {
+error_setg(errp, "highmem_mmio_size is less than the default (%lu)",
+   extended_memmap[VIRT_HIGH_PCIE_MMIO].size);
+return;
+}
+
+extended_memmap[VIRT_HIGH_PCIE_MMIO].size = size;
+}
 
 static bool virt_get_its(Object *obj, Error **errp)
 {
@@ -3446,6 +3476,14 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
   "Set on/off to enable/disable high "
   "memory region for PCI MMIO");
 
+object_class_property_add(oc, "highmem-mmio-size", "size",
+   virt_get_highmem_mmio_size,
+   virt_set_highmem_mmio_size,
+   NULL, NULL);
+object_class_property_set_description(oc, "highmem-mmio-size",
+  "Set the high memory region size "
+  "for PCI MMIO");
+
 object_class_property_add_str(oc, "gic-version", virt_get_gic_version,
   virt_set_gic_version);
 object_class_property_set_description(oc, "gic-version",
-- 
2.46.0




[PATCH v4 1/2] s390x/pci: add support for guests that request direct mapping

2025-02-07 Thread Matthew Rosato
When receiving a guest mpcifc(4) or mpcifc(6) instruction without the T
bit set, treat this as a request to perform direct mapping instead of
address translation.  In order to facilitate this, pin the entirety of
guest memory into the host iommu.

Pinning for the direct mapping case is handled via vfio and its memory
listener.  Additionally, ram discard settings are inherited from vfio:
coordinated discards (e.g. virtio-mem) are allowed while uncoordinated
discards (e.g. virtio-balloon) are disabled.

Subsequent guest DMA operations are all expected to be of the format
guest_phys+sdma, allowing them to be used as lookup into the host
iommu table.

Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-pci-bus.c | 38 +++--
 hw/s390x/s390-pci-inst.c| 13 +--
 hw/s390x/s390-pci-vfio.c| 23 
 hw/s390x/s390-virtio-ccw.c  |  5 +
 include/hw/s390x/s390-pci-bus.h |  4 
 5 files changed, 75 insertions(+), 8 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index eead269cc2..81e5843c81 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -18,6 +18,8 @@
 #include "hw/s390x/s390-pci-inst.h"
 #include "hw/s390x/s390-pci-kvm.h"
 #include "hw/s390x/s390-pci-vfio.h"
+#include "hw/s390x/s390-virtio-ccw.h"
+#include "hw/boards.h"
 #include "hw/pci/pci_bus.h"
 #include "hw/qdev-properties.h"
 #include "hw/pci/pci_bridge.h"
@@ -720,16 +722,45 @@ void s390_pci_iommu_enable(S390PCIIOMMU *iommu)
  TYPE_S390_IOMMU_MEMORY_REGION, OBJECT(&iommu->mr),
  name, iommu->pal + 1);
 iommu->enabled = true;
+iommu->direct_map = false;
 memory_region_add_subregion(&iommu->mr, 0, 
MEMORY_REGION(&iommu->iommu_mr));
 g_free(name);
 }
 
+void s390_pci_iommu_direct_map_enable(S390PCIIOMMU *iommu)
+{
+MachineState *ms = MACHINE(qdev_get_machine());
+S390CcwMachineState *s390ms = S390_CCW_MACHINE(ms);
+
+/*
+ * For direct-mapping we must map the entire guest address space.  Rather
+ * than using an iommu, create a memory region alias that maps GPA X to
+ * IOVA X + SDMA.  VFIO will handle pinning via its memory listener.
+ */
+g_autofree char *name = g_strdup_printf("iommu-dm-s390-%04x",
+iommu->pbdev->uid);
+memory_region_init_alias(&iommu->dm_mr, OBJECT(&iommu->mr), name,
+ get_system_memory(), 0,
+ s390_get_memory_limit(s390ms));
+iommu->enabled = true;
+iommu->direct_map = true;
+memory_region_add_subregion(&iommu->mr, iommu->pbdev->zpci_fn.sdma,
+&iommu->dm_mr);
+}
+
 void s390_pci_iommu_disable(S390PCIIOMMU *iommu)
 {
 iommu->enabled = false;
 g_hash_table_remove_all(iommu->iotlb);
-memory_region_del_subregion(&iommu->mr, MEMORY_REGION(&iommu->iommu_mr));
-object_unparent(OBJECT(&iommu->iommu_mr));
+if (iommu->direct_map) {
+memory_region_del_subregion(&iommu->mr, &iommu->dm_mr);
+iommu->direct_map = false;
+object_unparent(OBJECT(&iommu->dm_mr));
+} else {
+memory_region_del_subregion(&iommu->mr,
+MEMORY_REGION(&iommu->iommu_mr));
+object_unparent(OBJECT(&iommu->iommu_mr));
+}
 }
 
 static void s390_pci_iommu_free(S390pciState *s, PCIBus *bus, int32_t devfn)
@@ -1130,6 +1161,7 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 /* Always intercept emulated devices */
 pbdev->interp = false;
 pbdev->forwarding_assist = false;
+pbdev->rtr_avail = false;
 }
 
 if (s390_pci_msix_init(pbdev) && !pbdev->interp) {
@@ -1488,6 +1520,8 @@ static const Property s390_pci_device_properties[] = {
 DEFINE_PROP_BOOL("interpret", S390PCIBusDevice, interp, true),
 DEFINE_PROP_BOOL("forwarding-assist", S390PCIBusDevice, forwarding_assist,
  true),
+DEFINE_PROP_BOOL("relaxed-translation", S390PCIBusDevice, rtr_avail,
+ true),
 };
 
 static const VMStateDescription s390_pci_device_vmstate = {
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index e386d75d58..8cdeb6cb7f 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -16,6 +16,7 @@
 #include "exec/memory.h"
 #include "qemu/error-report.h"
 #include "system/hw_accel.h"
+#include "hw/boards.h"
 #include "hw/pci/pci_device.h"
 #include "hw/s390x/s390-pci-inst.h"
 #include "hw/s390x/s390-pci-bus.h"
@@ -1008,17 +1009,25 @@ static int reg_ioat(CPUS390XState *env, S390PCIBusDevice *pbdev, ZpciFib fib,
 }
 
 /* currently we only support designation type 1 with translation */
-if (!(dt == ZPCI_IOTA_RTTO && t)) {
+if (t && dt != ZPCI_IOTA_RTTO) {
 error_report("unsupported ioat dt %d t %d", dt, t);
 s390_program_interrupt(env, PGM_O

[PATCH v4 0/2] s390x/pci: relax I/O address translation requirement

2025-02-07 Thread Matthew Rosato
This series introduces the concept of the relaxed translation requirement
for s390x guests in order to allow bypass of the guest IOMMU for more
efficient PCI passthrough.

With this series, QEMU can indicate to the guest that an IOMMU is not
strictly required for a zPCI device.  This would subsequently allow a
guest linux to use iommu.passthrough=1 and bypass their guest IOMMU for
PCI devices.

When this occurs, QEMU will note the behavior via an intercepted MPCIFC
instruction and will fill the host iommu with mappings of the entire
guest address space in response.

There is a kernel series that adds the relevant behavior needed to
exploit this new feature from within a s390x linux guest.  Most
recent version of that is at [1].

[1]: https://lore.kernel.org/linux-s390/20250207205335.473946-1-mjros...@linux.ibm.com/

Changes for v4:
- use get_system_memory() instead of ms->ram
- rename rtr_allowed to rtr_avail
- turn off rtr_avail for emulated devices so the MPCIFC fence properly
  rejects an attempt at direct mapping (we only advertise via CLP
  for passthrough devices)
- turn off rtr_avail for passthrough ISM devices
- various minor changes

Changes for v3:
- use s390_get_memory_limit
- advertise full aperture for relaxed-translation-capable devices

Changes for v2:
- Add relax-translation property, fence for older machines
- Add a new MPCIFC failure case when direct-mapping requested but
  the relax-translation property is set to off.
- For direct mapping, use a memory alias to handle the SMDA offset and
  then just let vfio handle the pinning of memory.

Matthew Rosato (2):
  s390x/pci: add support for guests that request direct mapping
  s390x/pci: indicate QEMU supports relaxed translation for passthrough

 hw/s390x/s390-pci-bus.c | 38 +++--
 hw/s390x/s390-pci-inst.c| 13 +--
 hw/s390x/s390-pci-vfio.c| 28 +++-
 hw/s390x/s390-virtio-ccw.c  |  5 +
 include/hw/s390x/s390-pci-bus.h |  4 
 include/hw/s390x/s390-pci-clp.h |  1 +
 6 files changed, 80 insertions(+), 9 deletions(-)

-- 
2.48.1




[PATCH v4 2/2] s390x/pci: indicate QEMU supports relaxed translation for passthrough

2025-02-07 Thread Matthew Rosato
Specifying this bit in the guest CLP response indicates that the guest
can optionally choose to skip translation and instead use
identity-mapped operations.

Signed-off-by: Matthew Rosato 
---
 hw/s390x/s390-pci-vfio.c| 5 -
 include/hw/s390x/s390-pci-clp.h | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/s390-pci-vfio.c b/hw/s390x/s390-pci-vfio.c
index 443e222912..6236ac7f1e 100644
--- a/hw/s390x/s390-pci-vfio.c
+++ b/hw/s390x/s390-pci-vfio.c
@@ -238,8 +238,11 @@ static void s390_pci_read_group(S390PCIBusDevice *pbdev,
 pbdev->pci_group = s390_group_create(pbdev->zpci_fn.pfgid, start_gid);
 
 resgrp = &pbdev->pci_group->zpci_group;
+if (pbdev->rtr_avail) {
+resgrp->fr |= CLP_RSP_QPCIG_MASK_RTR;
+}
 if (cap->flags & VFIO_DEVICE_INFO_ZPCI_FLAG_REFRESH) {
-resgrp->fr = 1;
+resgrp->fr |= CLP_RSP_QPCIG_MASK_REFRESH;
 }
 resgrp->dasm = cap->dasm;
 resgrp->msia = cap->msi_addr;
diff --git a/include/hw/s390x/s390-pci-clp.h b/include/hw/s390x/s390-pci-clp.h
index 03b7f9ba5f..6a635d693b 100644
--- a/include/hw/s390x/s390-pci-clp.h
+++ b/include/hw/s390x/s390-pci-clp.h
@@ -158,6 +158,7 @@ typedef struct ClpRspQueryPciGrp {
 #define CLP_RSP_QPCIG_MASK_NOI 0xfff
 uint16_t i;
 uint8_t version;
+#define CLP_RSP_QPCIG_MASK_RTR 0x20
 #define CLP_RSP_QPCIG_MASK_FRAME   0x2
 #define CLP_RSP_QPCIG_MASK_REFRESH 0x1
 uint8_t fr;
-- 
2.48.1




[PATCH v5 0/4] virtio: Convert feature properties to OnOffAuto

2025-02-07 Thread Akihiko Odaki
This series was spun off from:
"[PATCH 0/3] virtio-net: Convert feature properties to OnOffAuto"
(https://patchew.org/QEMU/20240714-auto-v3-0-e27401aab...@daynix.com/)

Some features are not always available with vhost. Legacy features are
not available with vp_vdpa in particular. virtio devices used to disable
them when not available even if the corresponding properties were
explicitly set to "on".

QEMU already has OnOffAuto type, which includes the "auto" value to let
it automatically decide the effective value. Convert feature properties
to OnOffAuto and set them "auto" by default to utilize it. This allows
QEMU to report an error if they are set "on" and the corresponding
features are not available.

Signed-off-by: Akihiko Odaki 
---
Changes in v5:
- Covered QAPI more than just qdev.
- Expanded the description of patch
  "qapi: Accept bool for OnOffAuto and OnOffSplit".
- Rebased.
- Link to v4: https://lore.kernel.org/r/20250108-virtio-v4-0-cbf0aa04c...@daynix.com

Changes in v4:
- Added patch "qapi: Do not consume a value if failed".
- Link to v3: https://lore.kernel.org/r/20250104-virtio-v3-0-63ef70e9d...@daynix.com

Changes in v3:
- Rebased.
- Link to v2: https://lore.kernel.org/r/20241022-virtio-v2-0-b2394236e...@daynix.com

Changes in v2:
- Expanded the message of patch "qdev-properties: Accept bool for
  OnOffAuto".
- Link to v1: https://lore.kernel.org/r/20241014-virtio-v1-0-e9ddf7a81...@daynix.com

---
Akihiko Odaki (4):
  qapi: Do not consume a value if failed
  qapi: Accept bool for OnOffAuto and OnOffSplit
  qdev-properties: Add DEFINE_PROP_ON_OFF_AUTO_BIT64()
  virtio: Convert feature properties to OnOffAuto

 include/hw/qdev-properties.h |  18 
 include/hw/virtio/virtio.h   |  38 +---
 hw/core/machine.c|   1 +
 hw/core/qdev-properties.c|  83 +-
 hw/virtio/virtio-bus.c   |  14 +-
 hw/virtio/virtio.c   |   4 +-
 qapi/qobject-input-visitor.c | 103 +--
 scripts/qapi/visit.py|  24 ++
 8 files changed, 229 insertions(+), 56 deletions(-)
---
base-commit: 7433709a147706ad7d1956b15669279933d0f82b
change-id: 20241013-virtio-164ea3f295c3

Best regards,
-- 
Akihiko Odaki 




[PATCH v5 1/4] qapi: Do not consume a value if failed

2025-02-07 Thread Akihiko Odaki
Do not consume a value if interpreting one failed so that we can
reinterpret the value with a different type.

Signed-off-by: Akihiko Odaki 
---
 qapi/qobject-input-visitor.c | 103 +--
 1 file changed, 69 insertions(+), 34 deletions(-)

diff --git a/qapi/qobject-input-visitor.c b/qapi/qobject-input-visitor.c
index f110a804b2ae0f3f75122775ddbc5ec7cc5de230..799c1c9bd6bde0676d6b028b485de13cb4884395 100644
--- a/qapi/qobject-input-visitor.c
+++ b/qapi/qobject-input-visitor.c
@@ -116,9 +116,8 @@ static const char *full_name(QObjectInputVisitor *qiv, const char *name)
 return full_name_nth(qiv, name, 0);
 }
 
-static QObject *qobject_input_try_get_object(QObjectInputVisitor *qiv,
- const char *name,
- bool consume)
+static QObject *qobject_input_try_get_object(const QObjectInputVisitor *qiv,
+ const char *name)
 {
 StackObject *tos;
 QObject *qobj;
@@ -138,34 +137,19 @@ static QObject *qobject_input_try_get_object(QObjectInputVisitor *qiv,
 if (qobject_type(qobj) == QTYPE_QDICT) {
 assert(name);
 ret = qdict_get(qobject_to(QDict, qobj), name);
-if (tos->h && consume && ret) {
-bool removed = g_hash_table_remove(tos->h, name);
-assert(removed);
-}
 } else {
 assert(qobject_type(qobj) == QTYPE_QLIST);
 assert(!name);
-if (tos->entry) {
-ret = qlist_entry_obj(tos->entry);
-if (consume) {
-tos->entry = qlist_next(tos->entry);
-}
-} else {
-ret = NULL;
-}
-if (consume) {
-tos->index++;
-}
+ret = tos->entry ? qlist_entry_obj(tos->entry) : NULL;
 }
 
 return ret;
 }
 
 static QObject *qobject_input_get_object(QObjectInputVisitor *qiv,
- const char *name,
- bool consume, Error **errp)
+ const char *name, Error **errp)
 {
-QObject *obj = qobject_input_try_get_object(qiv, name, consume);
+QObject *obj = qobject_input_try_get_object(qiv, name);
 
 if (!obj) {
 error_setg(errp, QERR_MISSING_PARAMETER, full_name(qiv, name));
@@ -173,6 +157,38 @@ static QObject *qobject_input_get_object(QObjectInputVisitor *qiv,
 return obj;
 }
 
+static void qobject_input_consume_object(QObjectInputVisitor *qiv,
+ const char *name)
+{
+StackObject *tos;
+QObject *qobj;
+
+if (QSLIST_EMPTY(&qiv->stack)) {
+/* Starting at root, name is ignored. */
+return;
+}
+
+/* We are in a container; find the next element. */
+tos = QSLIST_FIRST(&qiv->stack);
+qobj = tos->obj;
+assert(qobj);
+
+if (qobject_type(qobj) == QTYPE_QDICT) {
+assert(name);
+if (tos->h) {
+bool removed = g_hash_table_remove(tos->h, name);
+assert(removed);
+}
+} else {
+assert(qobject_type(qobj) == QTYPE_QLIST);
+assert(!name);
+if (tos->entry) {
+tos->entry = qlist_next(tos->entry);
+}
+tos->index++;
+}
+}
+
 static const char *qobject_input_get_keyval(QObjectInputVisitor *qiv,
 const char *name,
 Error **errp)
@@ -180,7 +196,7 @@ static const char *qobject_input_get_keyval(QObjectInputVisitor *qiv,
 QObject *qobj;
 QString *qstr;
 
-qobj = qobject_input_get_object(qiv, name, true, errp);
+qobj = qobject_input_get_object(qiv, name, errp);
 if (!qobj) {
 return NULL;
 }
@@ -233,6 +249,7 @@ static const QListEntry *qobject_input_push(QObjectInputVisitor *qiv,
 tos->index = -1;
 }
 
+qobject_input_consume_object(qiv, name);
 QSLIST_INSERT_HEAD(&qiv->stack, tos, node);
 return tos->entry;
 }
@@ -279,7 +296,7 @@ static bool qobject_input_start_struct(Visitor *v, const char *name, void **obj,
size_t size, Error **errp)
 {
 QObjectInputVisitor *qiv = to_qiv(v);
-QObject *qobj = qobject_input_get_object(qiv, name, true, errp);
+QObject *qobj = qobject_input_get_object(qiv, name, errp);
 
 if (obj) {
 *obj = NULL;
@@ -316,7 +333,7 @@ static bool qobject_input_start_list(Visitor *v, const char *name,
  Error **errp)
 {
 QObjectInputVisitor *qiv = to_qiv(v);
-QObject *qobj = qobject_input_get_object(qiv, name, true, errp);
+QObject *qobj = qobject_input_get_object(qiv, name, errp);
 const QListEntry *entry;
 
 if (list) {
@@ -382,7 +399,7 @@ static bool qobject_input_start_alternate(Visitor *v, const char *name,
   Error **errp)
 {
 QObjectInputVisitor 

[PATCH v5 3/4] qdev-properties: Add DEFINE_PROP_ON_OFF_AUTO_BIT64()

2025-02-07 Thread Akihiko Odaki
DEFINE_PROP_ON_OFF_AUTO_BIT64() corresponds to DEFINE_PROP_ON_OFF_AUTO()
as DEFINE_PROP_BIT64() corresponds to DEFINE_PROP_BOOL(). The difference
is that DEFINE_PROP_ON_OFF_AUTO_BIT64() exposes OnOffAuto instead of
bool.

Signed-off-by: Akihiko Odaki 
---
 include/hw/qdev-properties.h | 18 
 hw/core/qdev-properties.c| 66 +++-
 2 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index bf27375a3ccdb238ef3327dd85d3d0a1431cbfbf..0d161325e8dc92d0e0e5aa9a1e2dd734f7a55cae 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -43,11 +43,22 @@ struct PropertyInfo {
 ObjectPropertyRelease *release;
 };
 
+/**
+ * struct OnOffAutoBit64 - OnOffAuto storage with 64 elements.
+ * @on_bits: Bitmap of elements with "on".
+ * @auto_bits: Bitmap of elements with "auto".
+ */
+typedef struct OnOffAutoBit64 {
+uint64_t on_bits;
+uint64_t auto_bits;
+} OnOffAutoBit64;
+
 
 /*** qdev-properties.c ***/
 
 extern const PropertyInfo qdev_prop_bit;
 extern const PropertyInfo qdev_prop_bit64;
+extern const PropertyInfo qdev_prop_on_off_auto_bit64;
 extern const PropertyInfo qdev_prop_bool;
 extern const PropertyInfo qdev_prop_enum;
 extern const PropertyInfo qdev_prop_uint8;
@@ -100,6 +111,13 @@ extern const PropertyInfo qdev_prop_link;
 .set_default = true,  \
 .defval.u  = (bool)_defval)
 
+#define DEFINE_PROP_ON_OFF_AUTO_BIT64(_name, _state, _field, _bit, _defval) \
+DEFINE_PROP(_name, _state, _field, qdev_prop_on_off_auto_bit64, \
+OnOffAutoBit64, \
+.bitnr= (_bit), \
+.set_default = true,\
+.defval.i = (OnOffAuto)_defval)
+
 #define DEFINE_PROP_BOOL(_name, _state, _field, _defval) \
 DEFINE_PROP(_name, _state, _field, qdev_prop_bool, bool, \
 .set_default = true, \
diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 073902431213c5be47197cb0d993d60cc2562501..cfab7b97091ad704b7f43d6ba6fcd8937ca5dfe3 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -188,7 +188,8 @@ const PropertyInfo qdev_prop_bit = {
 
 static uint64_t qdev_get_prop_mask64(const Property *prop)
 {
-assert(prop->info == &qdev_prop_bit64);
+assert(prop->info == &qdev_prop_bit64 ||
+   prop->info == &qdev_prop_on_off_auto_bit64);
 return 0x1ull << prop->bitnr;
 }
 
@@ -233,6 +234,69 @@ const PropertyInfo qdev_prop_bit64 = {
 .set_default_value = set_default_value_bool,
 };
 
+static void prop_get_on_off_auto_bit64(Object *obj, Visitor *v,
+   const char *name, void *opaque,
+   Error **errp)
+{
+Property *prop = opaque;
+OnOffAutoBit64 *p = object_field_prop_ptr(obj, prop);
+OnOffAuto value;
+uint64_t mask = qdev_get_prop_mask64(prop);
+
+if (p->auto_bits & mask) {
+value = ON_OFF_AUTO_AUTO;
+} else if (p->on_bits & mask) {
+value = ON_OFF_AUTO_ON;
+} else {
+value = ON_OFF_AUTO_OFF;
+}
+
+visit_type_OnOffAuto(v, name, &value, errp);
+}
+
+static void prop_set_on_off_auto_bit64(Object *obj, Visitor *v,
+   const char *name, void *opaque,
+   Error **errp)
+{
+Property *prop = opaque;
+OnOffAutoBit64 *p = object_field_prop_ptr(obj, prop);
+OnOffAuto value;
+uint64_t mask = qdev_get_prop_mask64(prop);
+
+if (!visit_type_OnOffAuto(v, name, &value, errp)) {
+return;
+}
+
+switch (value) {
+case ON_OFF_AUTO_AUTO:
+p->on_bits &= ~mask;
+p->auto_bits |= mask;
+break;
+
+case ON_OFF_AUTO_ON:
+p->on_bits |= mask;
+p->auto_bits &= ~mask;
+break;
+
+case ON_OFF_AUTO_OFF:
+p->on_bits &= ~mask;
+p->auto_bits &= ~mask;
+break;
+
+case ON_OFF_AUTO__MAX:
+g_assert_not_reached();
+}
+}
+
+const PropertyInfo qdev_prop_on_off_auto_bit64 = {
+.name  = "OnOffAuto",
+.description = "on/off/auto",
+.enum_table = &OnOffAuto_lookup,
+.get = prop_get_on_off_auto_bit64,
+.set = prop_set_on_off_auto_bit64,
+.set_default_value = qdev_propinfo_set_default_value_enum,
+};
+
 /* --- bool --- */
 
 static void get_bool(Object *obj, Visitor *v, const char *name, void *opaque,

-- 
2.48.1




[PATCH v5 4/4] virtio: Convert feature properties to OnOffAuto

2025-02-07 Thread Akihiko Odaki
Some features are not always available with vhost. Legacy features are
not available with vp_vdpa in particular. virtio devices used to disable
them when not available even if the corresponding properties were
explicitly set to "on".

QEMU already has the OnOffAuto type, which includes an "auto" value to let
QEMU automatically decide the effective value. Convert feature properties
to OnOffAuto and set them to "auto" by default to utilize it. This allows
QEMU to report an error if they are set to "on" and the corresponding
features are not available.

Signed-off-by: Akihiko Odaki 
---
 include/hw/virtio/virtio.h | 38 +-
 hw/core/machine.c  |  1 +
 hw/virtio/virtio-bus.c | 14 --
 hw/virtio/virtio.c |  4 +++-
 4 files changed, 37 insertions(+), 20 deletions(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 638691028050d2599592d8c7e95c75ac3913fbdd..b854c2cb1d04da0a35165289c28f87e8cb869df6 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -113,7 +113,8 @@ struct VirtIODevice
 uint16_t queue_sel;
 /**
  * These fields represent a set of VirtIO features at various
- * levels of the stack. @host_features indicates the complete
+ * levels of the stack. @requested_features indicates the feature
+ * set the user requested. @host_features indicates the complete
  * feature set the VirtIO device can offer to the driver.
  * @guest_features indicates which features the VirtIO driver has
  * selected by writing to the feature register. Finally
@@ -121,6 +122,7 @@ struct VirtIODevice
  * backend (e.g. vhost) and could potentially be a subset of the
  * total feature set offered by QEMU.
  */
+OnOffAutoBit64 requested_features;
 uint64_t host_features;
 uint64_t guest_features;
 uint64_t backend_features;
@@ -149,6 +151,7 @@ struct VirtIODevice
 bool started;
 bool start_on_kick; /* when virtio 1.0 feature has not been negotiated */
 bool disable_legacy_check;
+bool force_features_auto;
 bool vhost_started;
 VMChangeStateEntry *vmstate;
 char *bus_name;
@@ -376,22 +379,23 @@ typedef struct VirtIOSCSIConf VirtIOSCSIConf;
 typedef struct VirtIORNGConf VirtIORNGConf;
 
 #define DEFINE_VIRTIO_COMMON_FEATURES(_state, _field) \
-DEFINE_PROP_BIT64("indirect_desc", _state, _field,\
-  VIRTIO_RING_F_INDIRECT_DESC, true), \
-DEFINE_PROP_BIT64("event_idx", _state, _field,\
-  VIRTIO_RING_F_EVENT_IDX, true), \
-DEFINE_PROP_BIT64("notify_on_empty", _state, _field,  \
-  VIRTIO_F_NOTIFY_ON_EMPTY, true), \
-DEFINE_PROP_BIT64("any_layout", _state, _field, \
-  VIRTIO_F_ANY_LAYOUT, true), \
-DEFINE_PROP_BIT64("iommu_platform", _state, _field, \
-  VIRTIO_F_IOMMU_PLATFORM, false), \
-DEFINE_PROP_BIT64("packed", _state, _field, \
-  VIRTIO_F_RING_PACKED, false), \
-DEFINE_PROP_BIT64("queue_reset", _state, _field, \
-  VIRTIO_F_RING_RESET, true), \
-DEFINE_PROP_BIT64("in_order", _state, _field, \
-  VIRTIO_F_IN_ORDER, false)
+DEFINE_PROP_ON_OFF_AUTO_BIT64("indirect_desc", _state, _field, \
+  VIRTIO_RING_F_INDIRECT_DESC, \
+  ON_OFF_AUTO_AUTO), \
+DEFINE_PROP_ON_OFF_AUTO_BIT64("event_idx", _state, _field, \
+  VIRTIO_RING_F_EVENT_IDX, ON_OFF_AUTO_AUTO), \
+DEFINE_PROP_ON_OFF_AUTO_BIT64("notify_on_empty", _state, _field, \
+  VIRTIO_F_NOTIFY_ON_EMPTY, ON_OFF_AUTO_AUTO), \
+DEFINE_PROP_ON_OFF_AUTO_BIT64("any_layout", _state, _field, \
+  VIRTIO_F_ANY_LAYOUT, ON_OFF_AUTO_AUTO), \
+DEFINE_PROP_ON_OFF_AUTO_BIT64("iommu_platform", _state, _field, \
+  VIRTIO_F_IOMMU_PLATFORM, ON_OFF_AUTO_OFF), \
+DEFINE_PROP_ON_OFF_AUTO_BIT64("packed", _state, _field, \
+  VIRTIO_F_RING_PACKED, ON_OFF_AUTO_OFF), \
+DEFINE_PROP_ON_OFF_AUTO_BIT64("queue_reset", _state, _field, \
+  VIRTIO_F_RING_RESET, ON_OFF_AUTO_AUTO), \
+DEFINE_PROP_ON_OFF_AUTO_BIT64("in_order", _state, _field, \
+  VIRTIO_F_IN_ORDER, ON_OFF_AUTO_OFF)
 
 hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
 bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
diff --git a/hw/core/machine.c b/hw/core/machine.c
index c23b39949649054ac59d2a9b497f34e1b7bd8d6c..0de04baa61735ff02f797f778c626ef690625ce3 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -38,6 +38,7 @@
 
 GlobalProperty hw_compat_9_2[] = {
 {"arm-cpu", "backcompat-pauth-default-use-qarma5", "true"},
+{ TYPE_VIRTIO_DEVICE, "x-force-features-auto", "on" },
 };
 const size_t hw_compat_9_

[PATCH v5 2/4] qapi: Accept bool for OnOffAuto and OnOffSplit

2025-02-07 Thread Akihiko Odaki
bool has representations of "on" and "off" different from
OnOffAuto/OnOffSplit:
- The command line syntax accepts on/yes/true/y and off/no/false/n for
  bool but only on and off for OnOffAuto.
- JSON uses true/false for bool but "on" and "off" for
  OnOffAuto/OnOffSplit.

This inconsistency causes some problems:
- Users need to take the underlying type into consideration to determine
  what literal to specify, increasing cognitive loads for human users
  and complexity for programs invoking QEMU.
- Converting an existing bool property to OnOffAuto/OnOffSplit will
  break compatibility.

Fix these problems by accepting bool literals for OnOffAuto/OnOffSplit.
This change is specific to OnOffAuto/OnOffSplit; types added in the
future may be defined as an alternate of bool and enum to avoid the
mentioned problems in the first place.

Signed-off-by: Akihiko Odaki 
---
 hw/core/qdev-properties.c | 17 -
 scripts/qapi/visit.py | 24 
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 434a76f5036edd2091a9c79525b8e102582637be..073902431213c5be47197cb0d993d60cc2562501 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -2,6 +2,7 @@
 #include "hw/qdev-properties.h"
 #include "qapi/error.h"
 #include "qapi/qapi-types-misc.h"
+#include "qapi/qapi-visit-common.h"
 #include "qapi/qmp/qlist.h"
 #include "qemu/ctype.h"
 #include "qemu/error-report.h"
@@ -493,12 +494,26 @@ const PropertyInfo qdev_prop_string = {
 
 /* --- on/off/auto --- */
 
+static void set_on_off_auto(Object *obj, Visitor *v, const char *name,
+void *opaque, Error **errp)
+{
+Property *prop = opaque;
+int *ptr = object_field_prop_ptr(obj, prop);
+OnOffAuto value;
+
+if (!visit_type_OnOffAuto(v, name, &value, errp)) {
+return;
+}
+
+*ptr = value;
+}
+
 const PropertyInfo qdev_prop_on_off_auto = {
 .name = "OnOffAuto",
 .description = "on/off/auto",
 .enum_table = &OnOffAuto_lookup,
 .get = qdev_propinfo_get_enum,
-.set = qdev_propinfo_set_enum,
+.set = set_on_off_auto,
 .set_default_value = qdev_propinfo_set_default_value_enum,
 };
 
diff --git a/scripts/qapi/visit.py b/scripts/qapi/visit.py
index 12f92e429f6bafc091f74af88c1b837d08c7f733..221373b165aa95bceb4eb50a557edf0e5b4c01f7 100644
--- a/scripts/qapi/visit.py
+++ b/scripts/qapi/visit.py
@@ -209,6 +209,29 @@ def gen_visit_list(name: str, element_type: QAPISchemaType) -> str:
 
 
 def gen_visit_enum(name: str) -> str:
+if name in ('OnOffAuto', 'OnOffSplit'):
+return mcgen('''
+
+bool visit_type_%(c_name)s(Visitor *v, const char *name,
+ %(c_name)s *obj, Error **errp)
+{
+bool b;
+int i;
+
+if (v->type == VISITOR_INPUT && visit_type_bool(v, name, &b, NULL)) {
+*obj = b ? %(on)s : %(off)s;
+return true;
+}
+
+b = visit_type_enum(v, name, &i, &%(c_name)s_lookup, errp);
+*obj = i;
+
+return b;
+}
+''',
+ c_name=c_name(name),
 on=c_enum_const(name, 'on'), off=c_enum_const(name, 'off'))
+
 return mcgen('''
 
 bool visit_type_%(c_name)s(Visitor *v, const char *name,
@@ -359,6 +382,7 @@ def _begin_user_module(self, name: str) -> None:
 self._genc.preamble_add(mcgen('''
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include "qapi/visitor-impl.h"
 #include "%(visit)s.h"
 ''',
   visit=visit))

-- 
2.48.1




Re: [PATCH 04/10] rust: add bindings for gpio_{in|out} initialization

2025-02-07 Thread Zhao Liu
On Wed, Jan 29, 2025 at 11:59:04AM +0100, Paolo Bonzini wrote:
> Date: Wed, 29 Jan 2025 11:59:04 +0100
> From: Paolo Bonzini 
> Subject: Re: [PATCH 04/10] rust: add bindings for gpio_{in|out}
>  initialization
> 
> 
> 
> On Sat, Jan 25, 2025 at 1:32 PM Zhao Liu  wrote:
> > +fn init_gpio_in<F: for<'a> FnCall<(&'a Self::Target, u32, u32)>>(&self, num_lines: u32, _f: F) {
> > +unsafe extern "C" fn rust_irq_handler<T, F: for<'a> FnCall<(&'a T, u32, u32)>>(
> > +opaque: *mut c_void,
> > +line: c_int,
> > +level: c_int,
> > +) {
> > +// SAFETY: the opaque was passed as a reference to `T`
> > +F::call((unsafe { &*(opaque.cast::<T>()) }, line as u32, level as u32))
> > +}
> > +
> > +let gpio_in_cb: unsafe extern "C" fn(*mut c_void, c_int, c_int) =
> > +rust_irq_handler::<Self::Target, F>;
> 
> Please add "let _: () = F::ASSERT_IS_SOME;", which is added by the
> qdev_init_clock_in() patch.
> 

Okay.

I would add `assert!(F::is_some());` at the beginning of init_gpio_in().

There's a difference from the original C version:

On the C side, the qdev_get_gpio_in() family can accept a NULL handler, but
there's no such case in current QEMU:

* qdev_get_gpio_in
* qdev_init_gpio_in_named
* qdev_init_gpio_in_named_with_opaque

And from a code-logic point of view, creating an input GPIO line that does
nothing on input also sounds unusual.

So, for simplicity, in the Rust version I make the handler non-optional.





Re: [PATCH 01/12] target/riscv: Source vector registers cannot overlap mask register

2025-02-07 Thread Max Chou

Reviewed-by: Max Chou 


On 2025/1/26 3:20 PM, Anton Blanchard wrote:

Add the relevant ISA paragraphs explaining why source (and destination)
registers cannot overlap the mask register.

Signed-off-by: Anton Blanchard 
---
  target/riscv/insn_trans/trans_rvv.c.inc | 29 ++---
  1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index b9883a5d32..20b1cb127b 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -100,10 +100,33 @@ static bool require_scale_rvfmin(DisasContext *s)
  }
  }
  
-/* Destination vector register group cannot overlap source mask register. */
-static bool require_vm(int vm, int vd)
+/*
+ * Source and destination vector register groups cannot overlap source mask
+ * register:
+ *
+ * A vector register cannot be used to provide source operands with more than
+ * one EEW for a single instruction. A mask register source is considered to
+ * have EEW=1 for this constraint. An encoding that would result in the same
+ * vector register being read with two or more different EEWs, including when
+ * the vector register appears at different positions within two or more vector
+ * register groups, is reserved.
+ * (Section 5.2)
+ *
+ * A destination vector register group can overlap a source vector
+ * register group only if one of the following holds:
+ *  1. The destination EEW equals the source EEW.
+ *  2. The destination EEW is smaller than the source EEW and the overlap
+ * is in the lowest-numbered part of the source register group.
+ *  3. The destination EEW is greater than the source EEW, the source EMUL
+ * is at least 1, and the overlap is in the highest-numbered part of
+ * the destination register group.
+ * For the purpose of determining register group overlap constraints, mask
+ * elements have EEW=1.
+ * (Section 5.2)
+ */
+static bool require_vm(int vm, int v)
  {
-return (vm != 0 || vd != 0);
+return (vm != 0 || v != 0);
  }
  
  static bool require_nf(int vd, int nf, int lmul)





Re: [PATCH 3/4] vfio/igd: use PCI ID defines to detect IGD gen

2025-02-07 Thread Corvin Köhne
On Fri, 2025-02-07 at 08:47 +0100, Corvin Köhne wrote:
> On Thu, 2025-02-06 at 14:26 -0700, Alex Williamson wrote:
> > On Thu,  6 Feb 2025 13:13:39 +0100
> > Corvin Köhne  wrote:
> > 
> > > From: Corvin Köhne 
> > > 
> > > We've recently imported the PCI ID list of known Intel GPU devices from
> > > Linux. It allows us to properly match GPUs to their generation without
> > > maintaining our own list of PCI IDs.
> > > 
> > > Signed-off-by: Corvin Köhne 
> > > ---
> > >  hw/vfio/igd.c | 77 ---
> > >  1 file changed, 42 insertions(+), 35 deletions(-)
> > > 
> > > diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
> > > index 0740a5dd8c..e5d7006ce2 100644
> > > --- a/hw/vfio/igd.c
> > > +++ b/hw/vfio/igd.c
> > > @@ -18,6 +18,7 @@
> > >  #include "hw/hw.h"
> > >  #include "hw/nvram/fw_cfg.h"
> > >  #include "pci.h"
> > > +#include "standard-headers/drm/intel/pciids.h"
> > >  #include "trace.h"
> > >  
> > >  /*
> > > @@ -51,6 +52,42 @@
> > >   * headless setup is desired, the OpRegion gets in the way of that.
> > >   */
> > >  
> > > +struct igd_device {
> > > +    const uint32_t device_id;
> > > +    const int gen;
> > > +};
> > > +
> > > +#define IGD_DEVICE(_id, _gen) { \
> > > +    .device_id = (_id), \
> > > +    .gen = (_gen), \
> > > +}
> > > +
> > > +static const struct igd_device igd_devices[] = {
> > > +    INTEL_SNB_IDS(IGD_DEVICE, 6),
> > > +    INTEL_IVB_IDS(IGD_DEVICE, 6),
> > > +    INTEL_HSW_IDS(IGD_DEVICE, 7),
> > > +    INTEL_VLV_IDS(IGD_DEVICE, 7),
> > > +    INTEL_BDW_IDS(IGD_DEVICE, 8),
> > > +    INTEL_CHV_IDS(IGD_DEVICE, 8),
> > > +    INTEL_SKL_IDS(IGD_DEVICE, 9),
> > > +    INTEL_BXT_IDS(IGD_DEVICE, 9),
> > > +    INTEL_KBL_IDS(IGD_DEVICE, 9),
> > > +    INTEL_CFL_IDS(IGD_DEVICE, 9),
> > > +    INTEL_CML_IDS(IGD_DEVICE, 9),
> > > +    INTEL_GLK_IDS(IGD_DEVICE, 9),
> > > +    INTEL_ICL_IDS(IGD_DEVICE, 11),
> > > +    INTEL_EHL_IDS(IGD_DEVICE, 11),
> > > +    INTEL_JSL_IDS(IGD_DEVICE, 11),
> > > +    INTEL_TGL_IDS(IGD_DEVICE, 12),
> > > +    INTEL_RKL_IDS(IGD_DEVICE, 12),
> > > +    INTEL_ADLS_IDS(IGD_DEVICE, 12),
> > > +    INTEL_ADLP_IDS(IGD_DEVICE, 12),
> > > +    INTEL_ADLN_IDS(IGD_DEVICE, 12),
> > > +    INTEL_RPLS_IDS(IGD_DEVICE, 12),
> > > +    INTEL_RPLU_IDS(IGD_DEVICE, 12),
> > > +    INTEL_RPLP_IDS(IGD_DEVICE, 12),
> > > +};
> > 
> > I agree with Connie's comment on the ordering and content of the first
> > two patches.
> > 
> > For these last two, I wish these actually made it substantially easier
> > to synchronize with upstream.  Based on the next patch, I think it
> > still requires manually tracking/parsing internal code in the i915
> > driver to extract generation information.
> > 
> > Is it possible that we could split the above into a separate file
> > that's auto-generated from a script?  For example maybe some scripting
> > and C code that can instantiate the pciidlist array from i915_pci.c and
> > regurgitate it into a device-id/generation table?  Thanks,
> > 
> > Alex
> > 
> 
> Hi Alex,
> 
> I took a closer look into i915 and it seems hard to parse. Upstream maintains a
> description for each generation, e.g. on AlderLake P [1] the generation is
> defined in the .info field of a struct, the .info field itself is defined
> somewhere else [2] and sets the .__runtime_defaults.ip.ver by another C macro
> [3]. Other platforms like GeminiLake set the .ip.ver directly in their
> description struct [4].
> 
> Nevertheless, we may not need this PCI ID mapping at all in the future. It looks
> like Intel added a new register to their GPU starting with MeteorLake [5]. We
> can read it to obtain the GPU generation [6]. I don't have a MeteorLake system
> available yet, so I can't test it. On my TigerLake system, the register returns
> zero. When it works as expected, we could refactor the igd_gen function to
> something like:
> 
> static int igd_gen(VFIOPCIDevice *vdev) {
>   uint32_t gmd_id = vfio_region_read(&vdev->bars[0].region, GMD_ID_DISPLAY, 4);
>   if (gmd_id != 0) {
>     return (gmd_id & GMD_ID_ARCH_MASK) >> GMD_ID_ARCH_SHIFT;
>   }
> 
>   // Fallback to PCI ID mapping.
>   ... 
> }
> 
> [1]
> https://elixir.bootlin.com/linux/v6.13.1/source/drivers/gpu/drm/i915/display/intel_display_device.c#L1171
> [2]
> https://elixir.bootlin.com/linux/v6.13.1/source/drivers/gpu/drm/i915/display/intel_display_device.c#L1128
> [3]
> https://elixir.bootlin.com/linux/v6.13.1/source/drivers/gpu/drm/i915/display/intel_display_device.c#L1120
> [4]
> https://elixir.bootlin.com/linux/v6.13.1/source/drivers/gpu/drm/i915/display/intel_display_device.c#L829
> [5]
> https://elixir.bootlin.com/linux/v6.13.1/source/drivers/gpu/drm/i915/display/intel_display_device.c#L1326-L1330
> [6]
> https://elixir.bootlin.com/linux/v6.13.1/source/drivers/gpu/drm/i915/display/intel_display_device.c#L1432
> 
> 

I missed that upstream maintains a second list [1]. Nevertheless, it still
looks hard to parse.

[1]
https://elixir.bootlin.com/linux/v6.13.1/source/drivers/gp

Re: [PATCH v2 13/18] hw/arm/fsl-imx8mp: Implement gneral purpose timers

2025-02-07 Thread Bernhard Beschow



Am 6. Februar 2025 17:29:16 UTC schrieb Peter Maydell 
:
>On Tue, 4 Feb 2025 at 09:21, Bernhard Beschow  wrote:
>>
>> Signed-off-by: Bernhard Beschow 
>> ---
>>  docs/system/arm/imx8mp-evk.rst |  1 +
>>  include/hw/arm/fsl-imx8mp.h| 11 +++
>>  include/hw/timer/imx_gpt.h |  1 +
>>  hw/arm/fsl-imx8mp.c| 53 ++
>>  hw/timer/imx_gpt.c | 25 
>>  hw/arm/Kconfig |  1 +
>>  6 files changed, 92 insertions(+)
>
>Typo in the subject: "general". Otherwise

Will be fixed in v3.
>
>Reviewed-by: Peter Maydell 

Thanks,
Bernhard

>
>thanks
>-- PMM



Re: [PATCH 02/12] target/riscv: handle vrgather mask and source overlap

2025-02-07 Thread Max Chou

Hi Anton,

You might need to extend this patch or provide a new patch to handle the
checking of source operands with different EEWs for the vrgatherei16.vv
instruction (when SEW is not 16).

Thanks,
Max


On 2025/1/26 3:20 PM, Anton Blanchard wrote:

Signed-off-by: Anton Blanchard 
---
  target/riscv/insn_trans/trans_rvv.c.inc | 11 ---
  1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index 20b1cb127b..c66cd95bdb 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -3453,7 +3453,9 @@ static bool vrgather_vv_check(DisasContext *s, arg_rmrr *a)
 require_align(a->rs1, s->lmul) &&
 require_align(a->rs2, s->lmul) &&
 (a->rd != a->rs2 && a->rd != a->rs1) &&
-   require_vm(a->vm, a->rd);
+   require_vm(a->vm, a->rd) &&
+   require_vm(a->vm, a->rs1) &&
+   require_vm(a->vm, a->rs2);
  }
  
  static bool vrgatherei16_vv_check(DisasContext *s, arg_rmrr *a)

@@ -3470,7 +3472,9 @@ static bool vrgatherei16_vv_check(DisasContext *s, arg_rmrr *a)
a->rs1, 1 << MAX(emul, 0)) &&
 !is_overlapped(a->rd, 1 << MAX(s->lmul, 0),
a->rs2, 1 << MAX(s->lmul, 0)) &&
-   require_vm(a->vm, a->rd);
+   require_vm(a->vm, a->rd) &&
+   require_vm(a->vm, a->rs1) &&
+   require_vm(a->vm, a->rs2);
  }
  
  GEN_OPIVV_TRANS(vrgather_vv, vrgather_vv_check)

@@ -3483,7 +3487,8 @@ static bool vrgather_vx_check(DisasContext *s, arg_rmrr *a)
 require_align(a->rd, s->lmul) &&
 require_align(a->rs2, s->lmul) &&
 (a->rd != a->rs2) &&
-   require_vm(a->vm, a->rd);
+   require_vm(a->vm, a->rd) &&
+   require_vm(a->vm, a->rs2);
  }
  
  /* vrgather.vx vd, vs2, rs1, vm # vd[i] = (x[rs1] >= VLMAX) ? 0 : vs2[rs1] */





Re: [PATCH 03/12] target/riscv: handle vadd.vx form mask and source overlap

2025-02-07 Thread Max Chou

Hi Anton,

I think that the commit message could be improved for better clarity.
The vext_check_ss function affects more RVV instructions than the 
vadd.vx instruction alone.
(PS: perhaps using the category (OPIVX/OPFVF/etc.) to describe the 
affected RVV instructions would be more helpful.)

Additionally, the patch 04/07/08/09/10 also have the same issue.

Thanks,
Max


On 2025/1/26 3:20 PM, Anton Blanchard wrote:

Signed-off-by: Anton Blanchard 
---
  target/riscv/insn_trans/trans_rvv.c.inc | 1 +
  1 file changed, 1 insertion(+)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index c66cd95bdb..bc2780497e 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -382,6 +382,7 @@ static bool vext_check_ld_index(DisasContext *s, int vd, int vs2,
  static bool vext_check_ss(DisasContext *s, int vd, int vs, int vm)
  {
  return require_vm(vm, vd) &&
+   require_vm(vm, vs) &&
 require_align(vd, s->lmul) &&
 require_align(vs, s->lmul);
  }





Re: [PATCH 05/12] target/riscv: handle vslide1down.vx form mask and source overlap

2025-02-07 Thread Max Chou

Hi Anton,

The vext_check_slide function affects all of the
vslide[up|down].v[x|i]/vfslide1[up|down].vf/vslide1[up|down].vx
instructions, not the vslide1down.vx instruction alone.
Therefore, it would be more appropriate to update the commit message to 
provide clearer information.
(PS: perhaps using "vector slide instructions" instead of naming only the
vslide1down.vx instruction would be better.)

The patch 06 also has the same issue.

Thanks,
Max


On 2025/1/26 3:20 PM, Anton Blanchard wrote:

Signed-off-by: Anton Blanchard 
---
  target/riscv/insn_trans/trans_rvv.c.inc | 1 +
  1 file changed, 1 insertion(+)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index f5ba1c4280..a873536eea 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -609,6 +609,7 @@ static bool vext_check_slide(DisasContext *s, int vd, int vs2,
  {
  bool ret = require_align(vs2, s->lmul) &&
 require_align(vd, s->lmul) &&
+   require_vm(vm, vs2) &&
 require_vm(vm, vd);
  if (is_over) {
  ret &= (vd != vs2);





Re: [PATCH] target/tricore: Inline TARGET_LONG_BITS in decode_rr_logical_shift()

2025-02-07 Thread Bastian Koppelmann
On Thu, Feb 06, 2025 at 06:32:58PM +0100, Philippe Mathieu-Daudé wrote:
> We only support 32-bit TriCore architecture.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  target/tricore/translate.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Bastian Koppelmann 

Cheers,
Bastian



Re: [PATCH v5 5/5] tests/qtest/migration: consolidate set capabilities

2025-02-07 Thread Prasad Pandit
On Fri, 7 Feb 2025 at 04:14, Peter Xu  wrote:
> Would you mind reorder the two test patches, to avoid removing the lines
> added by previous patch?

* Both ways they are the same in the end, no? Anyway, will do.

Thank you.
---
  - Prasad




Re: [PATCH 10/12] target/riscv: handle vwadd.wv form vs1 and vs2 overlap

2025-02-07 Thread Max Chou

Hi Anton,

This patch violates some coding style rules of QEMU.
You can verify the coding style by running the checkpatch.pl script in 
the QEMU repository.
(ref: https://www.qemu.org/docs/master/devel/submitting-a-patch.html#use-the-qemu-coding-style)

The patch 12 also has the same issue.

Thanks,
Max


On 2025/1/26 3:20 PM, Anton Blanchard wrote:

For 2*SEW = 2*SEW op SEW instructions, vs2 and vs1 cannot overlap,
because that would mean a register is read with two different SEW
settings.

Signed-off-by: Anton Blanchard 
---
  target/riscv/insn_trans/trans_rvv.c.inc | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index 2309d9abd0..312d8b1b81 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -549,7 +549,8 @@ static bool vext_check_dds(DisasContext *s, int vd, int vs1, int vs2, int vm)
  {
  return vext_check_ds(s, vd, vs1, vm) &&
 require_vm(vm, vs2) &&
-   require_align(vs2, s->lmul + 1);
+   require_align(vs2, s->lmul + 1) &&
+   !is_overlapped(vs2, 1 << MAX(s->lmul+1, 0), vs1, 1 << MAX(s->lmul, 0));
  }
  
  static bool vext_check_sd(DisasContext *s, int vd, int vs, int vm)





Re: [PATCH 11/12] target/riscv: Add CHECK arg to GEN_OPFVF_WIDEN_TRANS

2025-02-07 Thread Max Chou

Reviewed-by: Max Chou 


On 2025/1/26 3:20 PM, Anton Blanchard wrote:

Signed-off-by: Anton Blanchard 
---
  target/riscv/insn_trans/trans_rvv.c.inc | 18 +-
  1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index 312d8b1b81..2741f8bd8e 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -2410,10 +2410,10 @@ static bool opfvf_widen_check(DisasContext *s, arg_rmrr *a)
  }
  
  /* OPFVF with WIDEN */

-#define GEN_OPFVF_WIDEN_TRANS(NAME)  \
+#define GEN_OPFVF_WIDEN_TRANS(NAME, CHECK)   \
  static bool trans_##NAME(DisasContext *s, arg_rmrr *a)   \
  {\
-if (opfvf_widen_check(s, a)) {   \
+if (CHECK(s, a)) {   \
  uint32_t data = 0;   \
  static gen_helper_opfvf *const fns[2] = {\
  gen_helper_##NAME##_h, gen_helper_##NAME##_w,\
@@ -2429,8 +2429,8 @@ static bool trans_##NAME(DisasContext *s, arg_rmrr *a)   \
  return false;\
  }
  
-GEN_OPFVF_WIDEN_TRANS(vfwadd_vf)
-GEN_OPFVF_WIDEN_TRANS(vfwsub_vf)
+GEN_OPFVF_WIDEN_TRANS(vfwadd_vf, opfvf_widen_check)
+GEN_OPFVF_WIDEN_TRANS(vfwsub_vf, opfvf_widen_check)
  
  static bool opfwv_widen_check(DisasContext *s, arg_rmrr *a)
  {
@@ -2512,7 +2512,7 @@ GEN_OPFVF_TRANS(vfrdiv_vf,  opfvf_check)
  
  /* Vector Widening Floating-Point Multiply */

  GEN_OPFVV_WIDEN_TRANS(vfwmul_vv, opfvv_widen_check)
-GEN_OPFVF_WIDEN_TRANS(vfwmul_vf)
+GEN_OPFVF_WIDEN_TRANS(vfwmul_vf, opfvf_widen_check)
  
  /* Vector Single-Width Floating-Point Fused Multiply-Add Instructions */

  GEN_OPFVV_TRANS(vfmacc_vv, opfvv_check)
@@ -2537,10 +2537,10 @@ GEN_OPFVV_WIDEN_TRANS(vfwmacc_vv, opfvv_widen_check)
  GEN_OPFVV_WIDEN_TRANS(vfwnmacc_vv, opfvv_widen_check)
  GEN_OPFVV_WIDEN_TRANS(vfwmsac_vv, opfvv_widen_check)
  GEN_OPFVV_WIDEN_TRANS(vfwnmsac_vv, opfvv_widen_check)
-GEN_OPFVF_WIDEN_TRANS(vfwmacc_vf)
-GEN_OPFVF_WIDEN_TRANS(vfwnmacc_vf)
-GEN_OPFVF_WIDEN_TRANS(vfwmsac_vf)
-GEN_OPFVF_WIDEN_TRANS(vfwnmsac_vf)
+GEN_OPFVF_WIDEN_TRANS(vfwmacc_vf, opfvf_widen_check)
+GEN_OPFVF_WIDEN_TRANS(vfwnmacc_vf, opfvf_widen_check)
+GEN_OPFVF_WIDEN_TRANS(vfwmsac_vf, opfvf_widen_check)
+GEN_OPFVF_WIDEN_TRANS(vfwnmsac_vf, opfvf_widen_check)
  
  /* Vector Floating-Point Square-Root Instruction */
  





Re: [PATCH 2/7] target/i386/kvm: introduce 'pmu-cap-disabled' to set KVM_PMU_CAP_DISABLE

2025-02-07 Thread Mi, Dapeng


On 11/21/2024 6:06 PM, Mi, Dapeng wrote:
> On 11/8/2024 7:44 AM, dongli.zh...@oracle.com wrote:
>> Hi Zhao,
>>
>>
>> On 11/6/24 11:52 PM, Zhao Liu wrote:
>>> (+Dapang & Zide)
>>>
>>> Hi Dongli,
>>>
>>> On Mon, Nov 04, 2024 at 01:40:17AM -0800, Dongli Zhang wrote:
 Date: Mon,  4 Nov 2024 01:40:17 -0800
 From: Dongli Zhang 
 Subject: [PATCH 2/7] target/i386/kvm: introduce 'pmu-cap-disabled' to set
  KVM_PMU_CAP_DISABLE
 X-Mailer: git-send-email 2.43.5

 The AMD PMU virtualization is not disabled when configuring
 "-cpu host,-pmu" in the QEMU command line on an AMD server. Neither
 "-cpu host,-pmu" nor "-cpu EPYC" effectively disables AMD PMU
 virtualization in such an environment.

 As a result, VM logs typically show:

 [0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.

 whereas the expected logs should be:

 [0.596381] Performance Events: PMU not available due to virtualization, using software events only.
 [0.600972] NMI watchdog: Perf NMI watchdog permanently disabled

 This discrepancy occurs because AMD PMU does not use CPUID to determine
 whether PMU virtualization is supported.
>>> Intel platform doesn't have this issue since Linux kernel fails to check
>>> the CPU family & model when "-cpu *,-pmu" option clears PMU version.
>>>
>>> The difference between Intel and AMD platforms, however, is that it seems
>>> Intel hardly ever reaches the “...due virtualization” message, but
>>> instead reports an error because it recognizes a mismatched family/model.
>>>
>>> This may be a drawback of the PMU driver's print message, but the result
>>> is the same, it prevents the PMU driver from enabling.
>>>
>>> So, please mention that KVM_PMU_CAP_DISABLE doesn't change the PMU
>>> behavior on Intel platform because current "pmu" property works as
>>> expected.
>> Sure. I will mention this in v2.
>>
 To address this, we introduce a new property, 'pmu-cap-disabled', for KVM
 acceleration. This property sets KVM_PMU_CAP_DISABLE if
 KVM_CAP_PMU_CAPABILITY is supported. Note that this feature currently
 supports only x86 hosts, as KVM_CAP_PMU_CAPABILITY is used exclusively for
 x86 systems.

 Signed-off-by: Dongli Zhang 
 ---
 Another previous solution to re-use '-cpu host,-pmu':
 https://urldefense.com/v3/__https://lore.kernel.org/all/20221119122901.2469-1-dongli.zh...@oracle.com/__;!!ACWV5N9M2RV99hQ!Nm8Db-mwBoMIwKkRqzC9kgNi5uZ7SCIf43zUBn92Ar_NEbLXq-ZkrDDvpvDQ4cnS2i4VyKAp6CRVE12bRkMF$
  
>>> IMO, I prefer the previous version. This VM-level KVM property is
>>> difficult to integrate with the existing CPU properties. Pls refer later
>>> comments for reasons.
>>>
  accel/kvm/kvm-all.c|  1 +
  include/sysemu/kvm_int.h   |  1 +
  qemu-options.hx|  9 ++-
  target/i386/cpu.c  |  2 +-
  target/i386/kvm/kvm.c  | 52 ++
  target/i386/kvm/kvm_i386.h |  2 ++
  6 files changed, 65 insertions(+), 2 deletions(-)

 diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
 index 801cff16a5..8b5ba45cf7 100644
 --- a/accel/kvm/kvm-all.c
 +++ b/accel/kvm/kvm-all.c
 @@ -3933,6 +3933,7 @@ static void kvm_accel_instance_init(Object *obj)
  s->xen_evtchn_max_pirq = 256;
  s->device = NULL;
  s->msr_energy.enable = false;
 +s->pmu_cap_disabled = false;
  }
>>> The CPU property "pmu" also defaults to "false"...but:
>>>
>>>  * max CPU would override this and try to enable PMU by default in
>>>max_x86_cpu_initfn().
>>>
>>>  * Other named CPU models keep the default setting to avoid affecting
>>>the migration.
>>>
>>> The pmu_cap_disabled and “pmu” property look unbound and unassociated,
>>> so this can cause the conflict when they are not synchronized. For
>>> example,
>>>
>>> -cpu host -accel kvm,pmu-cap-disabled=on
>>>
>>> The above options will fail to launch a VM (on Intel platform).
>>>
>>> Ideally, the “pmu” property and pmu-cap-disabled should be bound to each
>>> other and be consistent. But it's not easy because:
>>>  - There is no proper way to have pmu_cap_disabled set different default
>>>values (e.g., "false" for max CPU and "true" for named CPU models)
>>>based on different CPU models.
>>>  - And, no proper place to check the consistency of pmu_cap_disabled and
>>>enable_pmu.
>>>
>>> Therefore, I prefer your previous approach, to reuse current CPU "pmu"
>>> property.
>> Thank you very much for the suggestion and reasons.
>>
>> I am going to follow your suggestion to switch back to the previous solution 
>> in v2.
> +1.
>
>  I also prefer to leverage the existing "+/-pmu" option instead of adding
> a new option. More options mean more complexity. When they are
> inconsistent, which has higher priority? All of these are issues.
>
> Although KVM_CAP_PMU_CAPABILITY is a VM-level PMU capability, b

Re: [PATCH 04/10] rust: add bindings for gpio_{in|out} initialization

2025-02-07 Thread Paolo Bonzini
On Fri, Feb 7, 2025, 09:24 Zhao Liu  wrote:

> > Please add "let _: () = F::ASSERT_IS_SOME;", which is added by the
> > qdev_init_clock_in() patch.
> >
>
> Okay.
>
> I would add `assert!(F::is_some());` at the beginning of init_gpio_in().
>

Use the "let" so that it's caught at compile time.

There's a difference from the original C version:
>
> On the C side, the qdev_get_gpio_in() family can accept a NULL handler, but
> there's no such case in current QEMU:
>
> * qdev_get_gpio_in
> * qdev_init_gpio_in_named
> * qdev_init_gpio_in_named_with_opaque
>
> And from a code-logic point of view, creating an input GPIO line that does
> nothing on input also sounds unusual.
>

Wouldn't it then crash in qemu_set_irq?

Paolo

So, for simplicity, in the Rust version I make the handler non-optional.
>
>
>


[PATCH 01/15] arm/cpu: Add sysreg definitions in cpu-sysregs.h

2025-02-07 Thread Cornelia Huck
From: Eric Auger 

This new header contains macros that define aarch64 registers.
In a subsequent patch, this will be replaced by a more exhaustive
version that will be generated from the Linux arch/arm64/tools/sysreg
file. Those macros are sufficient to migrate the storage of those
ID regs from named fields in the isar struct to an array cell.

[CH: reworked to use different structures]
[CH: moved accessors from the patches first using them to here,
 dropped interaction with writable registers, which will happen
 later]
Signed-off-by: Eric Auger 
Signed-off-by: Cornelia Huck 
---
 target/arm/cpu-sysregs.h | 131 +++
 target/arm/cpu.h |  42 +
 2 files changed, 173 insertions(+)
 create mode 100644 target/arm/cpu-sysregs.h

diff --git a/target/arm/cpu-sysregs.h b/target/arm/cpu-sysregs.h
new file mode 100644
index ..de09ebae91a5
--- /dev/null
+++ b/target/arm/cpu-sysregs.h
@@ -0,0 +1,131 @@
+#ifndef ARM_CPU_SYSREGS_H
+#define ARM_CPU_SYSREGS_H
+
+/*
+ * Following is similar to the coprocessor regs encodings, but with an argument
+ * ordering that matches the ARM ARM. We also reuse the various CP_REG_ defines
+ * that actually are the same as the equivalent KVM_REG_ values.
+ */
+#define ENCODE_ID_REG(op0, op1, crn, crm, op2)  \
+(((op0) << CP_REG_ARM64_SYSREG_OP0_SHIFT) | \
+ ((op1) << CP_REG_ARM64_SYSREG_OP1_SHIFT) | \
+ ((crn) << CP_REG_ARM64_SYSREG_CRN_SHIFT) | \
+ ((crm) << CP_REG_ARM64_SYSREG_CRM_SHIFT) | \
+ ((op2) << CP_REG_ARM64_SYSREG_OP2_SHIFT))
+
+typedef enum ARMIDRegisterIdx {
+ID_AA64PFR0_EL1_IDX,
+ID_AA64PFR1_EL1_IDX,
+ID_AA64SMFR0_EL1_IDX,
+ID_AA64DFR0_EL1_IDX,
+ID_AA64DFR1_EL1_IDX,
+ID_AA64ISAR0_EL1_IDX,
+ID_AA64ISAR1_EL1_IDX,
+ID_AA64ISAR2_EL1_IDX,
+ID_AA64MMFR0_EL1_IDX,
+ID_AA64MMFR1_EL1_IDX,
+ID_AA64MMFR2_EL1_IDX,
+ID_AA64MMFR3_EL1_IDX,
+ID_PFR0_EL1_IDX,
+ID_PFR1_EL1_IDX,
+ID_DFR0_EL1_IDX,
+ID_MMFR0_EL1_IDX,
+ID_MMFR1_EL1_IDX,
+ID_MMFR2_EL1_IDX,
+ID_MMFR3_EL1_IDX,
+ID_ISAR0_EL1_IDX,
+ID_ISAR1_EL1_IDX,
+ID_ISAR2_EL1_IDX,
+ID_ISAR3_EL1_IDX,
+ID_ISAR4_EL1_IDX,
+ID_ISAR5_EL1_IDX,
+ID_MMFR4_EL1_IDX,
+ID_ISAR6_EL1_IDX,
+MVFR0_EL1_IDX,
+MVFR1_EL1_IDX,
+MVFR2_EL1_IDX,
+ID_PFR2_EL1_IDX,
+ID_DFR1_EL1_IDX,
+ID_MMFR5_EL1_IDX,
+ID_AA64ZFR0_EL1_IDX,
+CTR_EL0_IDX,
+NUM_ID_IDX,
+} ARMIDRegisterIdx;
+
+typedef enum ARMSysRegs {
+SYS_ID_AA64PFR0_EL1 = ENCODE_ID_REG(3, 0, 0, 4, 0),
+SYS_ID_AA64PFR1_EL1 = ENCODE_ID_REG(3, 0, 0, 4, 1),
+SYS_ID_AA64SMFR0_EL1 = ENCODE_ID_REG(3, 0, 0, 4, 5),
+SYS_ID_AA64DFR0_EL1 = ENCODE_ID_REG(3, 0, 0, 5, 0),
+SYS_ID_AA64DFR1_EL1 = ENCODE_ID_REG(3, 0, 0, 5, 1),
+SYS_ID_AA64ISAR0_EL1 = ENCODE_ID_REG(3, 0, 0, 6, 0),
+SYS_ID_AA64ISAR1_EL1 = ENCODE_ID_REG(3, 0, 0, 6, 1),
+SYS_ID_AA64ISAR2_EL1 = ENCODE_ID_REG(3, 0, 0, 6, 2),
+SYS_ID_AA64MMFR0_EL1 = ENCODE_ID_REG(3, 0, 0, 7, 0),
+SYS_ID_AA64MMFR1_EL1 = ENCODE_ID_REG(3, 0, 0, 7, 1),
+SYS_ID_AA64MMFR2_EL1 = ENCODE_ID_REG(3, 0, 0, 7, 2),
+SYS_ID_AA64MMFR3_EL1 = ENCODE_ID_REG(3, 0, 0, 7, 3),
+SYS_ID_PFR0_EL1 = ENCODE_ID_REG(3, 0, 0, 1, 0),
+SYS_ID_PFR1_EL1 = ENCODE_ID_REG(3, 0, 0, 1, 1),
+SYS_ID_DFR0_EL1 = ENCODE_ID_REG(3, 0, 0, 1, 2),
+SYS_ID_MMFR0_EL1 = ENCODE_ID_REG(3, 0, 0, 1, 4),
+SYS_ID_MMFR1_EL1 = ENCODE_ID_REG(3, 0, 0, 1, 5),
+SYS_ID_MMFR2_EL1 = ENCODE_ID_REG(3, 0, 0, 1, 6),
+SYS_ID_MMFR3_EL1 = ENCODE_ID_REG(3, 0, 0, 1, 7),
+SYS_ID_ISAR0_EL1 = ENCODE_ID_REG(3, 0, 0, 2, 0),
+SYS_ID_ISAR1_EL1 = ENCODE_ID_REG(3, 0, 0, 2, 1),
+SYS_ID_ISAR2_EL1 = ENCODE_ID_REG(3, 0, 0, 2, 2),
+SYS_ID_ISAR3_EL1 = ENCODE_ID_REG(3, 0, 0, 2, 3),
+SYS_ID_ISAR4_EL1 = ENCODE_ID_REG(3, 0, 0, 2, 4),
+SYS_ID_ISAR5_EL1 = ENCODE_ID_REG(3, 0, 0, 2, 5),
+SYS_ID_MMFR4_EL1 = ENCODE_ID_REG(3, 0, 0, 2, 6),
+SYS_ID_ISAR6_EL1 = ENCODE_ID_REG(3, 0, 0, 2, 7),
+SYS_MVFR0_EL1 = ENCODE_ID_REG(3, 0, 0, 3, 0),
+SYS_MVFR1_EL1 = ENCODE_ID_REG(3, 0, 0, 3, 1),
+SYS_MVFR2_EL1 = ENCODE_ID_REG(3, 0, 0, 3, 2),
+SYS_ID_PFR2_EL1 = ENCODE_ID_REG(3, 0, 0, 3, 4),
+SYS_ID_DFR1_EL1 = ENCODE_ID_REG(3, 0, 0, 3, 5),
+SYS_ID_MMFR5_EL1 = ENCODE_ID_REG(3, 0, 0, 3, 6),
+SYS_ID_AA64ZFR0_EL1 = ENCODE_ID_REG(3, 0, 0, 4, 4),
+SYS_CTR_EL0 = ENCODE_ID_REG(3, 3, 0, 0, 1),
+} ARMSysRegs;
+
+static const uint32_t id_register_sysreg[NUM_ID_IDX] = {
+[ID_AA64PFR0_EL1_IDX] = SYS_ID_AA64PFR0_EL1,
+[ID_AA64PFR1_EL1_IDX] = SYS_ID_AA64PFR1_EL1,
+[ID_AA64SMFR0_EL1_IDX] = SYS_ID_AA64SMFR0_EL1,
+[ID_AA64DFR0_EL1_IDX] = SYS_ID_AA64DFR0_EL1,
+[ID_AA64DFR1_EL1_IDX] = SYS_ID_AA64DFR1_EL1,
+[ID_AA64ISAR0_EL1_IDX] = SYS_ID_AA64ISAR0_EL1,
+[ID_AA64ISAR1_EL1_IDX] = SYS_ID_AA64ISAR1_EL1,
+[ID_AA64ISAR2_EL1_IDX] = SYS_ID_AA64ISAR2_EL1,
+[ID_AA64MMFR0_EL1_IDX] = SYS_ID_AA64MMFR0_EL1,
+

[PATCH 09/15] arm/cpu: Store id_isar0-7 into the idregs array

2025-02-07 Thread Cornelia Huck
From: Eric Auger 

Signed-off-by: Eric Auger 
Signed-off-by: Cornelia Huck 
---
 hw/intc/armv7m_nvic.c |  12 ++--
 target/arm/cpu-features.h |  36 +-
 target/arm/cpu.c  |  24 +++
 target/arm/cpu.h  |   7 --
 target/arm/cpu64.c|  28 
 target/arm/helper.c   |  14 ++--
 target/arm/kvm.c  |  22 +++---
 target/arm/tcg/cpu-v7m.c  |  90 +---
 target/arm/tcg/cpu32.c| 143 --
 target/arm/tcg/cpu64.c| 108 ++--
 10 files changed, 243 insertions(+), 241 deletions(-)

diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
index 5fd076098243..0e3174dc30db 100644
--- a/hw/intc/armv7m_nvic.c
+++ b/hw/intc/armv7m_nvic.c
@@ -1303,32 +1303,32 @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs)
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_isar0;
+return GET_IDREG(&cpu->isar.idregs, ID_ISAR0);
 case 0xd64: /* ISAR1.  */
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_isar1;
+return GET_IDREG(&cpu->isar.idregs, ID_ISAR1);
 case 0xd68: /* ISAR2.  */
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_isar2;
+return GET_IDREG(&cpu->isar.idregs, ID_ISAR2);
 case 0xd6c: /* ISAR3.  */
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_isar3;
+return GET_IDREG(&cpu->isar.idregs, ID_ISAR3);
 case 0xd70: /* ISAR4.  */
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_isar4;
+return GET_IDREG(&cpu->isar.idregs, ID_ISAR4);
 case 0xd74: /* ISAR5.  */
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_isar5;
+return GET_IDREG(&cpu->isar.idregs, ID_ISAR5);
 case 0xd78: /* CLIDR */
 return cpu->clidr;
 case 0xd7c: /* CTR */
diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index 6224c7ec6356..b0d181996865 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -45,93 +45,93 @@
  */
 static inline bool isar_feature_aa32_thumb_div(const ARMISARegisters *id)
 {
-return FIELD_EX32(id->id_isar0, ID_ISAR0, DIVIDE) != 0;
+return FIELD_EX32_IDREG(&id->idregs, ID_ISAR0, DIVIDE) != 0;
 }
 
 static inline bool isar_feature_aa32_arm_div(const ARMISARegisters *id)
 {
-return FIELD_EX32(id->id_isar0, ID_ISAR0, DIVIDE) > 1;
+return FIELD_EX32_IDREG(&id->idregs, ID_ISAR0, DIVIDE) > 1;
 }
 
 static inline bool isar_feature_aa32_lob(const ARMISARegisters *id)
 {
 /* (M-profile) low-overhead loops and branch future */
-return FIELD_EX32(id->id_isar0, ID_ISAR0, CMPBRANCH) >= 3;
+return FIELD_EX32_IDREG(&id->idregs, ID_ISAR0, CMPBRANCH) >= 3;
 }
 
 static inline bool isar_feature_aa32_jazelle(const ARMISARegisters *id)
 {
-return FIELD_EX32(id->id_isar1, ID_ISAR1, JAZELLE) != 0;
+return FIELD_EX32_IDREG(&id->idregs, ID_ISAR1, JAZELLE) != 0;
 }
 
 static inline bool isar_feature_aa32_aes(const ARMISARegisters *id)
 {
-return FIELD_EX32(id->id_isar5, ID_ISAR5, AES) != 0;
+return FIELD_EX32_IDREG(&id->idregs, ID_ISAR5, AES) != 0;
 }
 
 static inline bool isar_feature_aa32_pmull(const ARMISARegisters *id)
 {
-return FIELD_EX32(id->id_isar5, ID_ISAR5, AES) > 1;
+return FIELD_EX32_IDREG(&id->idregs, ID_ISAR5, AES) > 1;
 }
 
 static inline bool isar_feature_aa32_sha1(const ARMISARegisters *id)
 {
-return FIELD_EX32(id->id_isar5, ID_ISAR5, SHA1) != 0;
+return FIELD_EX32_IDREG(&id->idregs, ID_ISAR5, SHA1) != 0;
 }
 
 static inline bool isar_feature_aa32_sha2(const ARMISARegisters *id)
 {
-return FIELD_EX32(id->id_isar5, ID_ISAR5, SHA2) != 0;
+return FIELD_EX32_IDREG(&id->idregs, ID_ISAR5, SHA2) != 0;
 }
 
 static inline bool isar_feature_aa32_crc32(const ARMISARegisters *id)
 {
-return FIELD_EX32(id->id_isar5, ID_ISAR5, CRC32) != 0;
+return FIELD_EX32_IDREG(&id->idregs, ID_ISAR5, CRC32) != 0;
 }
 
 static inline bool isar_feature_aa32_rdm(const ARMISARegisters *id)
 {
-return FIELD_EX32(id->id_isar5, ID_ISAR5, RDM) != 0;
+return FIELD_EX32_IDREG(&id->idregs, ID_ISAR5, RDM) != 0;
 }
 
 static inline bool isar_feature_aa32_vcma(const ARMISARegisters *id)
 {
-return FIELD_EX32(id->id_isar5, ID_ISAR5, VCMA) != 0;
+return FIELD_EX32_IDREG(&id->idregs, ID_ISAR5, VCMA) != 0;
 }
 
 static inline bool isar_feature_aa32_jscvt(const ARMISARegisters *id)
 {
-return FIELD_EX32(id->id_isar6, ID_ISAR6, JSCVT) != 0;
+return FIELD_EX32_IDREG(&id->idregs, ID_ISAR6, JSCVT) != 0;
 }
 
 static inline bool isar_feature_aa32_dp(c

[PATCH 07/15] arm/cpu: Store aa64dfr0/1 into the idregs array

2025-02-07 Thread Cornelia Huck
From: Eric Auger 

Signed-off-by: Eric Auger 
Signed-off-by: Cornelia Huck 
---
 target/arm/cpu-features.h | 16 
 target/arm/cpu.c  | 15 +--
 target/arm/cpu.h  |  2 --
 target/arm/cpu64.c|  4 ++--
 target/arm/helper.c   |  4 ++--
 target/arm/internals.h|  6 +++---
 target/arm/kvm.c  |  6 ++
 target/arm/tcg/cpu64.c| 33 +
 8 files changed, 39 insertions(+), 47 deletions(-)

diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index a26b05cb9804..05de9e0d9932 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -890,30 +890,30 @@ static inline bool isar_feature_aa64_nv2(const ARMISARegisters *id)
 
 static inline bool isar_feature_aa64_pmuv3p1(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64dfr0, ID_AA64DFR0, PMUVER) >= 4 &&
-FIELD_EX64(id->id_aa64dfr0, ID_AA64DFR0, PMUVER) != 0xf;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64DFR0, PMUVER) >= 4 &&
+FIELD_EX64_IDREG(&id->idregs, ID_AA64DFR0, PMUVER) != 0xf;
 }
 
 static inline bool isar_feature_aa64_pmuv3p4(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64dfr0, ID_AA64DFR0, PMUVER) >= 5 &&
-FIELD_EX64(id->id_aa64dfr0, ID_AA64DFR0, PMUVER) != 0xf;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64DFR0, PMUVER) >= 5 &&
+FIELD_EX64_IDREG(&id->idregs, ID_AA64DFR0, PMUVER) != 0xf;
 }
 
 static inline bool isar_feature_aa64_pmuv3p5(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64dfr0, ID_AA64DFR0, PMUVER) >= 6 &&
-FIELD_EX64(id->id_aa64dfr0, ID_AA64DFR0, PMUVER) != 0xf;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64DFR0, PMUVER) >= 6 &&
+FIELD_EX64_IDREG(&id->idregs, ID_AA64DFR0, PMUVER) != 0xf;
 }
 
 static inline bool isar_feature_aa64_debugv8p2(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64dfr0, ID_AA64DFR0, DEBUGVER) >= 8;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64DFR0, DEBUGVER) >= 8;
 }
 
 static inline bool isar_feature_aa64_doublelock(const ARMISARegisters *id)
 {
-return FIELD_SEX64(id->id_aa64dfr0, ID_AA64DFR0, DOUBLELOCK) >= 0;
+return FIELD_SEX64_IDREG(&id->idregs, ID_AA64DFR0, DOUBLELOCK) >= 0;
 }
 
 static inline bool isar_feature_aa64_sve2(const ARMISARegisters *id)
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 3e7f2e495e68..8f2d58cffbfd 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2370,8 +2370,7 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
 cpu);
 #endif
 } else {
-cpu->isar.id_aa64dfr0 =
-FIELD_DP64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, PMUVER, 0);
+FIELD_DP64_IDREG(idregs, ID_AA64DFR0, PMUVER, 0);
 cpu->isar.id_dfr0 = FIELD_DP32(cpu->isar.id_dfr0, ID_DFR0, PERFMON, 0);
 cpu->pmceid0 = 0;
 cpu->pmceid1 = 0;
@@ -2431,19 +2430,15 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
  * try to access the non-existent system registers for them.
  */
 /* FEAT_SPE (Statistical Profiling Extension) */
-cpu->isar.id_aa64dfr0 =
-FIELD_DP64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, PMSVER, 0);
+FIELD_DP64_IDREG(idregs, ID_AA64DFR0, PMSVER, 0);
 /* FEAT_TRBE (Trace Buffer Extension) */
-cpu->isar.id_aa64dfr0 =
-FIELD_DP64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, TRACEBUFFER, 0);
+FIELD_DP64_IDREG(idregs, ID_AA64DFR0, TRACEBUFFER, 0);
 /* FEAT_TRF (Self-hosted Trace Extension) */
-cpu->isar.id_aa64dfr0 =
-FIELD_DP64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, TRACEFILT, 0);
+FIELD_DP64_IDREG(idregs, ID_AA64DFR0, TRACEFILT, 0);
 cpu->isar.id_dfr0 =
 FIELD_DP32(cpu->isar.id_dfr0, ID_DFR0, TRACEFILT, 0);
 /* Trace Macrocell system register access */
-cpu->isar.id_aa64dfr0 =
-FIELD_DP64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, TRACEVER, 0);
+FIELD_DP64_IDREG(idregs, ID_AA64DFR0, TRACEVER, 0);
 cpu->isar.id_dfr0 =
 FIELD_DP32(cpu->isar.id_dfr0, ID_DFR0, COPTRC, 0);
 /* Memory mapped trace */
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index fbbec43dbdac..99b0c2a4b39d 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1067,8 +1067,6 @@ struct ArchCPU {
 uint32_t dbgdidr;
 uint32_t dbgdevid;
 uint32_t dbgdevid1;
-uint64_t id_aa64dfr0;
-uint64_t id_aa64dfr1;
 uint64_t id_aa64smfr0;
 uint64_t reset_pmcr_el0;
 uint64_t idregs[NUM_ID_IDX];
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index ba39b8cc1ee0..22286a1844a4 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -649,7 +649,7 @@ static void aarch64_a57_initfn(Object *obj)
 cpu->isar.id_isar5 = 0x00011121;
 cpu->isar.id_isar6 = 0;
 SET_IDREG(idregs, ID_AA64PFR0, 0x);
-cpu->isar.id_aa64dfr0 = 0x10305106;
+SET_IDREG(idregs, ID_

[PATCH 05/15] arm/cpu: Store aa64pfr0/1 into the idregs array

2025-02-07 Thread Cornelia Huck
From: Eric Auger 

Signed-off-by: Eric Auger 
Signed-off-by: Cornelia Huck 
---
 target/arm/cpu-features.h | 40 -
 target/arm/cpu.c  | 29 
 target/arm/cpu.h  |  2 --
 target/arm/cpu64.c| 14 
 target/arm/helper.c   |  6 ++---
 target/arm/kvm.c  | 24 +---
 target/arm/tcg/cpu64.c| 47 ++-
 7 files changed, 68 insertions(+), 94 deletions(-)

diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index 2837c3e8c1c7..fa5a524b5513 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -601,68 +601,68 @@ static inline bool isar_feature_aa64_mops(const ARMISARegisters *id)
 static inline bool isar_feature_aa64_fp_simd(const ARMISARegisters *id)
 {
 /* We always set the AdvSIMD and FP fields identically.  */
-return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, FP) != 0xf;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64PFR0, FP) != 0xf;
 }
 
 static inline bool isar_feature_aa64_fp16(const ARMISARegisters *id)
 {
 /* We always set the AdvSIMD and FP fields identically wrt FP16.  */
-return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, FP) == 1;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64PFR0, FP) == 1;
 }
 
 static inline bool isar_feature_aa64_aa32(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, EL0) >= 2;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64PFR0, EL0) >= 2;
 }
 
 static inline bool isar_feature_aa64_aa32_el1(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, EL1) >= 2;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64PFR0, EL1) >= 2;
 }
 
 static inline bool isar_feature_aa64_aa32_el2(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, EL2) >= 2;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64PFR0, EL2) >= 2;
 }
 
 static inline bool isar_feature_aa64_ras(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, RAS) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64PFR0, RAS) != 0;
 }
 
 static inline bool isar_feature_aa64_doublefault(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, RAS) >= 2;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64PFR0, RAS) >= 2;
 }
 
 static inline bool isar_feature_aa64_sve(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, SVE) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64PFR0, SVE) != 0;
 }
 
 static inline bool isar_feature_aa64_sel2(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, SEL2) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64PFR0, SEL2) != 0;
 }
 
 static inline bool isar_feature_aa64_rme(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, RME) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64PFR0, RME) != 0;
 }
 
 static inline bool isar_feature_aa64_dit(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, DIT) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64PFR0, DIT) != 0;
 }
 
 static inline bool isar_feature_aa64_scxtnum(const ARMISARegisters *id)
 {
-int key = FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, CSV2);
+int key = FIELD_EX64_IDREG(&id->idregs, ID_AA64PFR0, CSV2);
 if (key >= 2) {
 return true;  /* FEAT_CSV2_2 */
 }
 if (key == 1) {
-key = FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, CSV2_FRAC);
+key = FIELD_EX64_IDREG(&id->idregs, ID_AA64PFR1, CSV2_FRAC);
 return key >= 2;  /* FEAT_CSV2_1p2 */
 }
 return false;
@@ -670,37 +670,37 @@ static inline bool isar_feature_aa64_scxtnum(const ARMISARegisters *id)
 
 static inline bool isar_feature_aa64_ssbs(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, SSBS) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64PFR1, SSBS) != 0;
 }
 
 static inline bool isar_feature_aa64_bti(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, BT) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64PFR1, BT) != 0;
 }
 
 static inline bool isar_feature_aa64_mte_insn_reg(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, MTE) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64PFR1, MTE) != 0;
 }
 
 static inline bool isar_feature_aa64_mte(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, MTE) >= 2;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64PFR1, MTE) >= 2;
 }
 
 static inline bool isar_feature_aa64_mte3(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, MTE) >= 3;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64PFR1, MTE) >= 3;
 }
 
 static inline bool isar_feature_aa64_sme(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, SME) != 0;
+r

[PATCH 02/15] arm/kvm: add accessors for storing host features into idregs

2025-02-07 Thread Cornelia Huck
Signed-off-by: Cornelia Huck 
---
 target/arm/cpu-sysregs.h |  3 +++
 target/arm/cpu64.c   | 25 +
 target/arm/kvm.c | 30 ++
 3 files changed, 58 insertions(+)

diff --git a/target/arm/cpu-sysregs.h b/target/arm/cpu-sysregs.h
index de09ebae91a5..54a4fadbf0c1 100644
--- a/target/arm/cpu-sysregs.h
+++ b/target/arm/cpu-sysregs.h
@@ -128,4 +128,7 @@ static const uint32_t id_register_sysreg[NUM_ID_IDX] = {
 [CTR_EL0_IDX] = SYS_CTR_EL0,
 };
 
+int get_sysreg_idx(ARMSysRegs sysreg);
+uint64_t idregs_sysreg_to_kvm_reg(ARMSysRegs sysreg);
+
 #endif /* ARM_CPU_SYSREGS_H */
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 8188ede5cc8a..9ae78253cb34 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -736,6 +736,31 @@ static void aarch64_a53_initfn(Object *obj)
 define_cortex_a72_a57_a53_cp_reginfo(cpu);
 }
 
+#ifdef CONFIG_KVM
+
+int get_sysreg_idx(ARMSysRegs sysreg)
+{
+int i;
+
+for (i = 0; i < NUM_ID_IDX; i++) {
+if (id_register_sysreg[i] == sysreg) {
+return i;
+}
+}
+return -1;
+}
+
+uint64_t idregs_sysreg_to_kvm_reg(ARMSysRegs sysreg)
+{
+return ARM64_SYS_REG((sysreg & CP_REG_ARM64_SYSREG_OP0_MASK) >> CP_REG_ARM64_SYSREG_OP0_SHIFT,
+ (sysreg & CP_REG_ARM64_SYSREG_OP1_MASK) >> CP_REG_ARM64_SYSREG_OP1_SHIFT,
+ (sysreg & CP_REG_ARM64_SYSREG_CRN_MASK) >> CP_REG_ARM64_SYSREG_CRN_SHIFT,
+ (sysreg & CP_REG_ARM64_SYSREG_CRM_MASK) >> CP_REG_ARM64_SYSREG_CRM_SHIFT,
+ (sysreg & CP_REG_ARM64_SYSREG_OP2_MASK) >> CP_REG_ARM64_SYSREG_OP2_SHIFT);
+}
+
+#endif
+
 static void aarch64_host_initfn(Object *obj)
 {
 #if defined(CONFIG_KVM)
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index da30bdbb2349..3b8bb5661f2b 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -246,6 +246,36 @@ static bool kvm_arm_pauth_supported(void)
 kvm_check_extension(kvm_state, KVM_CAP_ARM_PTRAUTH_GENERIC));
 }
 
+/* read a 32b sysreg value and store it in the idregs */
+static int get_host_cpu_reg32(int fd, ARMHostCPUFeatures *ahcf, ARMSysRegs sysreg)
+{
+int index = get_sysreg_idx(sysreg);
+uint64_t *reg;
+int ret;
+
+if (index < 0) {
+return -ERANGE;
+}
+reg = &ahcf->isar.idregs[index];
+ret = read_sys_reg32(fd, (uint32_t *)reg, idregs_sysreg_to_kvm_reg(sysreg));
+return ret;
+}
+
+/* read a 64b sysreg value and store it in the idregs */
+static int get_host_cpu_reg64(int fd, ARMHostCPUFeatures *ahcf, ARMSysRegs sysreg)
+{
+int index = get_sysreg_idx(sysreg);
+uint64_t *reg;
+int ret;
+
+if (index < 0) {
+return -ERANGE;
+}
+reg = &ahcf->isar.idregs[index];
+ret = read_sys_reg64(fd, reg, idregs_sysreg_to_kvm_reg(sysreg));
+return ret;
+}
+
 static bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
 {
 /* Identify the feature bits corresponding to the host CPU, and
-- 
2.48.1




[PATCH 06/15] arm/cpu: Store aa64mmfr0-3 into the idregs array

2025-02-07 Thread Cornelia Huck
From: Eric Auger 

Signed-off-by: Eric Auger 
Signed-off-by: Cornelia Huck 
---
 target/arm/cpu-features.h | 72 +++
 target/arm/cpu.h  |  4 ---
 target/arm/cpu64.c|  8 ++---
 target/arm/helper.c   |  8 ++---
 target/arm/kvm.c  | 12 +++
 target/arm/ptw.c  |  6 ++--
 target/arm/tcg/cpu64.c| 64 +-
 7 files changed, 82 insertions(+), 92 deletions(-)

diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index fa5a524b5513..a26b05cb9804 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -705,187 +705,187 @@ static inline bool isar_feature_aa64_nmi(const ARMISARegisters *id)
 
 static inline bool isar_feature_aa64_tgran4_lpa2(const ARMISARegisters *id)
 {
-return FIELD_SEX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN4) >= 1;
+return FIELD_SEX64_IDREG(&id->idregs, ID_AA64MMFR0, TGRAN4) >= 1;
 }
 
 static inline bool isar_feature_aa64_tgran4_2_lpa2(const ARMISARegisters *id)
 {
-unsigned t = FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN4_2);
+unsigned t = FIELD_EX64_IDREG(&id->idregs, ID_AA64MMFR0, TGRAN4_2);
 return t >= 3 || (t == 0 && isar_feature_aa64_tgran4_lpa2(id));
 }
 
 static inline bool isar_feature_aa64_tgran16_lpa2(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN16) >= 2;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64MMFR0, TGRAN16) >= 2;
 }
 
 static inline bool isar_feature_aa64_tgran16_2_lpa2(const ARMISARegisters *id)
 {
-unsigned t = FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN16_2);
+unsigned t = FIELD_EX64_IDREG(&id->idregs, ID_AA64MMFR0, TGRAN16_2);
 return t >= 3 || (t == 0 && isar_feature_aa64_tgran16_lpa2(id));
 }
 
 static inline bool isar_feature_aa64_tgran4(const ARMISARegisters *id)
 {
-return FIELD_SEX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN4) >= 0;
+return FIELD_SEX64_IDREG(&id->idregs, ID_AA64MMFR0, TGRAN4) >= 0;
 }
 
 static inline bool isar_feature_aa64_tgran16(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN16) >= 1;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64MMFR0, TGRAN16) >= 1;
 }
 
 static inline bool isar_feature_aa64_tgran64(const ARMISARegisters *id)
 {
-return FIELD_SEX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN64) >= 0;
+return FIELD_SEX64_IDREG(&id->idregs, ID_AA64MMFR0, TGRAN64) >= 0;
 }
 
 static inline bool isar_feature_aa64_tgran4_2(const ARMISARegisters *id)
 {
-unsigned t = FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN4_2);
+unsigned t = FIELD_EX64_IDREG(&id->idregs, ID_AA64MMFR0, TGRAN4_2);
 return t >= 2 || (t == 0 && isar_feature_aa64_tgran4(id));
 }
 
 static inline bool isar_feature_aa64_tgran16_2(const ARMISARegisters *id)
 {
-unsigned t = FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN16_2);
+unsigned t = FIELD_EX64_IDREG(&id->idregs, ID_AA64MMFR0, TGRAN16_2);
 return t >= 2 || (t == 0 && isar_feature_aa64_tgran16(id));
 }
 
 static inline bool isar_feature_aa64_tgran64_2(const ARMISARegisters *id)
 {
-unsigned t = FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN64_2);
+unsigned t = FIELD_EX64_IDREG(&id->idregs, ID_AA64MMFR0, TGRAN64_2);
 return t >= 2 || (t == 0 && isar_feature_aa64_tgran64(id));
 }
 
 static inline bool isar_feature_aa64_fgt(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, FGT) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64MMFR0, FGT) != 0;
 }
 
 static inline bool isar_feature_aa64_ecv_traps(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, ECV) > 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64MMFR0, ECV) > 0;
 }
 
 static inline bool isar_feature_aa64_ecv(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, ECV) > 1;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64MMFR0, ECV) > 1;
 }
 
 static inline bool isar_feature_aa64_vh(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, VH) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64MMFR1, VH) != 0;
 }
 
 static inline bool isar_feature_aa64_lor(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, LO) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64MMFR1, LO) != 0;
 }
 
 static inline bool isar_feature_aa64_pan(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, PAN) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64MMFR1, PAN) != 0;
 }
 
 static inline bool isar_feature_aa64_ats1e1(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, PAN) >= 2;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64MMFR1, PAN) >= 2;
 }
 
 static inline bool isar_feature_aa64_pan3(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, PAN) >= 3;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64MMFR1, 

[PATCH 03/15] arm/cpu: Store aa64isar0 into the idregs arrays

2025-02-07 Thread Cornelia Huck
From: Eric Auger 

Signed-off-by: Eric Auger 
Signed-off-by: Cornelia Huck 
---
 target/arm/cpu-features.h | 57 ---
 target/arm/cpu.c  | 14 --
 target/arm/cpu.h  |  2 --
 target/arm/cpu64.c|  8 +++---
 target/arm/helper.c   |  6 +++--
 target/arm/kvm.c  |  8 +++---
 target/arm/tcg/cpu64.c| 44 ++
 7 files changed, 74 insertions(+), 65 deletions(-)

diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index 30302d6c5b41..9638c9428db3 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -22,6 +22,7 @@
 
 #include "hw/registerfields.h"
 #include "qemu/host-utils.h"
+#include "cpu-sysregs.h"
 
 /*
  * Naming convention for isar_feature functions:
@@ -376,92 +377,92 @@ static inline bool isar_feature_aa32_doublelock(const ARMISARegisters *id)
  */
 static inline bool isar_feature_aa64_aes(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, AES) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64ISAR0, AES) != 0;
 }
 
 static inline bool isar_feature_aa64_pmull(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, AES) > 1;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64ISAR0, AES) > 1;
 }
 
 static inline bool isar_feature_aa64_sha1(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, SHA1) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64ISAR0, SHA1) != 0;
 }
 
 static inline bool isar_feature_aa64_sha256(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, SHA2) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64ISAR0, SHA2) != 0;
 }
 
 static inline bool isar_feature_aa64_sha512(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, SHA2) > 1;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64ISAR0, SHA2) > 1;
 }
 
 static inline bool isar_feature_aa64_crc32(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, CRC32) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64ISAR0, CRC32) != 0;
 }
 
 static inline bool isar_feature_aa64_atomics(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, ATOMIC) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64ISAR0, ATOMIC) != 0;
 }
 
 static inline bool isar_feature_aa64_rdm(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, RDM) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64ISAR0, RDM) != 0;
 }
 
 static inline bool isar_feature_aa64_sha3(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, SHA3) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64ISAR0, SHA3) != 0;
 }
 
 static inline bool isar_feature_aa64_sm3(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, SM3) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64ISAR0, SM3) != 0;
 }
 
 static inline bool isar_feature_aa64_sm4(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, SM4) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64ISAR0, SM4) != 0;
 }
 
 static inline bool isar_feature_aa64_dp(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, DP) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64ISAR0, DP) != 0;
 }
 
 static inline bool isar_feature_aa64_fhm(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, FHM) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64ISAR0, FHM) != 0;
 }
 
 static inline bool isar_feature_aa64_condm_4(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, TS) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64ISAR0, TS) != 0;
 }
 
 static inline bool isar_feature_aa64_condm_5(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, TS) >= 2;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64ISAR0, TS) >= 2;
 }
 
 static inline bool isar_feature_aa64_rndr(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, RNDR) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64ISAR0, RNDR) != 0;
 }
 
 static inline bool isar_feature_aa64_tlbirange(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, TLB) == 2;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64ISAR0, TLB) == 2;
 }
 
 static inline bool isar_feature_aa64_tlbios(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, TLB) != 0;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64ISAR0, TLB) != 0;
 }
 
 static inline bool isar_feature_aa64_jscvt(const ARMISARegisters *id)
@@ -917,52 +918,52 @@ static inline bool isar_feature_aa64_doublelock(const ARMISARegisters *id)
 
 static inline bool isar_feature_aa64_sve2(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa

[PATCH 11/15] arm/cpu: Store id_dfr0/1 into the idregs array

2025-02-07 Thread Cornelia Huck
From: Eric Auger 

Signed-off-by: Eric Auger 
Signed-off-by: Cornelia Huck 
---
 hw/intc/armv7m_nvic.c |  2 +-
 target/arm/cpu-features.h | 16 
 target/arm/cpu.c  | 13 +
 target/arm/cpu.h  |  2 --
 target/arm/cpu64.c|  4 ++--
 target/arm/helper.c   |  4 ++--
 target/arm/kvm.c  |  6 ++
 target/arm/tcg/cpu-v7m.c  | 12 ++--
 target/arm/tcg/cpu32.c| 30 ++
 target/arm/tcg/cpu64.c| 16 
 10 files changed, 48 insertions(+), 57 deletions(-)

diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
index 08529b89a6e0..456a1db62bdd 100644
--- a/hw/intc/armv7m_nvic.c
+++ b/hw/intc/armv7m_nvic.c
@@ -1274,7 +1274,7 @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs)
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_dfr0;
+return GET_IDREG(idregs, ID_DFR0);
 case 0xd4c: /* AFR0.  */
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index a6eda2a1c554..97c7fee70a7b 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -299,22 +299,22 @@ static inline bool isar_feature_aa32_ats1e1(const ARMISARegisters *id)
 static inline bool isar_feature_aa32_pmuv3p1(const ARMISARegisters *id)
 {
 /* 0xf means "non-standard IMPDEF PMU" */
-return FIELD_EX32(id->id_dfr0, ID_DFR0, PERFMON) >= 4 &&
-FIELD_EX32(id->id_dfr0, ID_DFR0, PERFMON) != 0xf;
+return FIELD_EX32_IDREG(&id->idregs, ID_DFR0, PERFMON) >= 4 &&
+FIELD_EX32_IDREG(&id->idregs, ID_DFR0, PERFMON) != 0xf;
 }
 
 static inline bool isar_feature_aa32_pmuv3p4(const ARMISARegisters *id)
 {
 /* 0xf means "non-standard IMPDEF PMU" */
-return FIELD_EX32(id->id_dfr0, ID_DFR0, PERFMON) >= 5 &&
-FIELD_EX32(id->id_dfr0, ID_DFR0, PERFMON) != 0xf;
+return FIELD_EX32_IDREG(&id->idregs, ID_DFR0, PERFMON) >= 5 &&
+FIELD_EX32_IDREG(&id->idregs, ID_DFR0, PERFMON) != 0xf;
 }
 
 static inline bool isar_feature_aa32_pmuv3p5(const ARMISARegisters *id)
 {
 /* 0xf means "non-standard IMPDEF PMU" */
-return FIELD_EX32(id->id_dfr0, ID_DFR0, PERFMON) >= 6 &&
-FIELD_EX32(id->id_dfr0, ID_DFR0, PERFMON) != 0xf;
+return FIELD_EX32_IDREG(&id->idregs, ID_DFR0, PERFMON) >= 6 &&
+FIELD_EX32_IDREG(&id->idregs, ID_DFR0, PERFMON) != 0xf;
 }
 
 static inline bool isar_feature_aa32_hpd(const ARMISARegisters *id)
@@ -359,12 +359,12 @@ static inline bool isar_feature_aa32_ssbs(const ARMISARegisters *id)
 
 static inline bool isar_feature_aa32_debugv7p1(const ARMISARegisters *id)
 {
-return FIELD_EX32(id->id_dfr0, ID_DFR0, COPDBG) >= 5;
+return FIELD_EX32_IDREG(&id->idregs, ID_DFR0, COPDBG) >= 5;
 }
 
 static inline bool isar_feature_aa32_debugv8p2(const ARMISARegisters *id)
 {
-return FIELD_EX32(id->id_dfr0, ID_DFR0, COPDBG) >= 8;
+return FIELD_EX32_IDREG(&id->idregs, ID_DFR0, COPDBG) >= 8;
 }
 
 static inline bool isar_feature_aa32_doublelock(const ARMISARegisters *id)
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index da11b59ba843..bfca468fb342 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2341,7 +2341,7 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
  * feature registers as well.
  */
 FIELD_DP32_IDREG(idregs, ID_PFR1, SECURITY, 0);
-cpu->isar.id_dfr0 = FIELD_DP32(cpu->isar.id_dfr0, ID_DFR0, COPSDBG, 0);
+FIELD_DP32_IDREG(idregs, ID_DFR0, COPSDBG, 0);
 FIELD_DP64_IDREG(idregs, ID_AA64PFR0, EL3, 0);
 
 /* Disable the realm management extension, which requires EL3. */
@@ -2369,7 +2369,7 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
 #endif
 } else {
 FIELD_DP64_IDREG(idregs, ID_AA64DFR0, PMUVER, 0);
-cpu->isar.id_dfr0 = FIELD_DP32(cpu->isar.id_dfr0, ID_DFR0, PERFMON, 0);
+FIELD_DP32_IDREG(idregs, ID_DFR0, PERFMON, 0);
 cpu->pmceid0 = 0;
 cpu->pmceid1 = 0;
 }
@@ -2432,15 +2432,12 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
 FIELD_DP64_IDREG(idregs, ID_AA64DFR0, TRACEBUFFER, 0);
 /* FEAT_TRF (Self-hosted Trace Extension) */
 FIELD_DP64_IDREG(idregs, ID_AA64DFR0, TRACEFILT, 0);
-cpu->isar.id_dfr0 =
-FIELD_DP32(cpu->isar.id_dfr0, ID_DFR0, TRACEFILT, 0);
+FIELD_DP32_IDREG(idregs, ID_DFR0, TRACEFILT, 0);
 /* Trace Macrocell system register access */
 FIELD_DP64_IDREG(idregs, ID_AA64DFR0, TRACEVER, 0);
-cpu->isar.id_dfr0 =
-FIELD_DP32(cpu->isar.id_dfr0, ID_DFR0, COPTRC, 0);
+FIELD_DP32_IDREG(idregs, ID_DFR0, COPTRC, 0);
 /* Memory mapped trace */
-cpu->isar.id_dfr0 =
-FIELD_DP32(cpu->isar.id_dfr0, ID_DFR0, MMAPTRC, 0);
+FIELD_DP32_IDREG(idregs, ID

[PATCH 08/15] arm/cpu: Store aa64smfr0 into the idregs array

2025-02-07 Thread Cornelia Huck
From: Eric Auger 

Signed-off-by: Eric Auger 
Signed-off-by: Cornelia Huck 
---
 target/arm/cpu-features.h | 6 +++---
 target/arm/cpu.h  | 1 -
 target/arm/cpu64.c| 7 ++-
 target/arm/helper.c   | 2 +-
 target/arm/kvm.c  | 3 +--
 target/arm/tcg/cpu64.c| 4 ++--
 6 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index 05de9e0d9932..6224c7ec6356 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -968,17 +968,17 @@ static inline bool isar_feature_aa64_sve_f64mm(const ARMISARegisters *id)
 
 static inline bool isar_feature_aa64_sme_f64f64(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64smfr0, ID_AA64SMFR0, F64F64);
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64SMFR0, F64F64);
 }
 
 static inline bool isar_feature_aa64_sme_i16i64(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64smfr0, ID_AA64SMFR0, I16I64) == 0xf;
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64SMFR0, I16I64) == 0xf;
 }
 
 static inline bool isar_feature_aa64_sme_fa64(const ARMISARegisters *id)
 {
-return FIELD_EX64(id->id_aa64smfr0, ID_AA64SMFR0, FA64);
+return FIELD_EX64_IDREG(&id->idregs, ID_AA64SMFR0, FA64);
 }
 
 /*
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 99b0c2a4b39d..82db0d429c91 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1067,7 +1067,6 @@ struct ArchCPU {
 uint32_t dbgdidr;
 uint32_t dbgdevid;
 uint32_t dbgdevid1;
-uint64_t id_aa64smfr0;
 uint64_t reset_pmcr_el0;
 uint64_t idregs[NUM_ID_IDX];
 } isar;
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 22286a1844a4..5c3ca3ba7af1 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -306,7 +306,7 @@ void arm_cpu_sme_finalize(ARMCPU *cpu, Error **errp)
 
 if (vq_map == 0) {
 if (!cpu_isar_feature(aa64_sme, cpu)) {
-cpu->isar.id_aa64smfr0 = 0;
+SET_IDREG(&cpu->isar.idregs, ID_AA64SMFR0, 0);
 return;
 }
 
@@ -359,11 +359,8 @@ static bool cpu_arm_get_sme_fa64(Object *obj, Error **errp)
 static void cpu_arm_set_sme_fa64(Object *obj, bool value, Error **errp)
 {
 ARMCPU *cpu = ARM_CPU(obj);
-uint64_t t;
 
-t = cpu->isar.id_aa64smfr0;
-t = FIELD_DP64(t, ID_AA64SMFR0, FA64, value);
-cpu->isar.id_aa64smfr0 = t;
+FIELD_DP64_IDREG(&cpu->isar.idregs, ID_AA64SMFR0, FA64, value);
 }
 
 #ifdef CONFIG_USER_ONLY
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 437ba8a53934..7c2953a971b6 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -7725,7 +7725,7 @@ void register_cp_regs_for_features(ARMCPU *cpu)
   .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 4, .opc2 = 5,
   .access = PL1_R, .type = ARM_CP_CONST,
   .accessfn = access_aa64_tid3,
-  .resetvalue = cpu->isar.id_aa64smfr0 },
+  .resetvalue = GET_IDREG(idregs, ID_AA64SMFR0)},
 { .name = "ID_AA64PFR6_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 4, .opc2 = 6,
   .access = PL1_R, .type = ARM_CP_CONST,
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 7597c84ff2ce..b3092335a118 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -361,8 +361,7 @@ static bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
 err = 0;
 } else {
 err |= get_host_cpu_reg64(fd, ahcf, SYS_ID_AA64PFR1_EL1);
-err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64smfr0,
-  ARM64_SYS_REG(3, 0, 0, 4, 5));
+err |= get_host_cpu_reg64(fd, ahcf, SYS_ID_AA64SMFR0_EL1);
 err |= get_host_cpu_reg64(fd, ahcf, SYS_ID_AA64DFR0_EL1);
 err |= get_host_cpu_reg64(fd, ahcf, SYS_ID_AA64DFR1_EL1);
 err |= get_host_cpu_reg64(fd, ahcf, SYS_ID_AA64ISAR0_EL1);
diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index ce4cb449a381..38d189361e3e 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -1266,7 +1266,7 @@ void aarch64_max_tcg_initfn(Object *obj)
 t = FIELD_DP64(t, ID_AA64DFR0, HPMN0, 1); /* FEAT_HPMN0 */
 SET_IDREG(idregs, ID_AA64DFR0, t);
 
-t = cpu->isar.id_aa64smfr0;
+t = GET_IDREG(idregs, ID_AA64SMFR0);
 t = FIELD_DP64(t, ID_AA64SMFR0, F32F32, 1);   /* FEAT_SME */
 t = FIELD_DP64(t, ID_AA64SMFR0, B16F32, 1);   /* FEAT_SME */
 t = FIELD_DP64(t, ID_AA64SMFR0, F16F32, 1);   /* FEAT_SME */
@@ -1274,7 +1274,7 @@ void aarch64_max_tcg_initfn(Object *obj)
 t = FIELD_DP64(t, ID_AA64SMFR0, F64F64, 1);   /* FEAT_SME_F64F64 */
 t = FIELD_DP64(t, ID_AA64SMFR0, I16I64, 0xf); /* FEAT_SME_I16I64 */
 t = FIELD_DP64(t, ID_AA64SMFR0, FA64, 1); /* FEAT_SME_FA64 */
-cpu->isar.id_aa64smfr0 = t;
+SET_IDREG(idregs, ID_AA64SMFR0, t);
 
 /* Replicate the same data to the 32-bit id registers.  */
 aa32_max_features(cpu);
-- 
2.48.1




[PATCH 13/15] arm/cpu: Add infra to handle generated ID register definitions

2025-02-07 Thread Cornelia Huck
From: Eric Auger 

The known ID regs are described in a new initialization function
dubbed initialize_cpu_sysreg_properties(). That code will be
automatically generated from the Linux arch/arm64/tools/sysreg file.
For the time being let's just describe a single ID reg, CTR_EL0. In
this description we only care about non-RES/RAZ fields, i.e. named fields.

The registers are populated in an array indexed by ARMIDRegisterIdx
and their fields are added in a sorted list.

[CH: adapted to reworked register storage]
Signed-off-by: Eric Auger 
Signed-off-by: Cornelia Huck 
---
 target/arm/cpu-custom.h| 55 ++
 target/arm/cpu-sysreg-properties.c | 41 ++
 target/arm/cpu64.c |  2 ++
 target/arm/meson.build |  1 +
 4 files changed, 99 insertions(+)
 create mode 100644 target/arm/cpu-custom.h
 create mode 100644 target/arm/cpu-sysreg-properties.c

diff --git a/target/arm/cpu-custom.h b/target/arm/cpu-custom.h
new file mode 100644
index ..17533765dacd
--- /dev/null
+++ b/target/arm/cpu-custom.h
@@ -0,0 +1,55 @@
+#ifndef ARM_CPU_CUSTOM_H
+#define ARM_CPU_CUSTOM_H
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "cpu.h"
+#include "cpu-sysregs.h"
+
+typedef struct ARM64SysRegField {
+const char *name; /* name of the field, for instance CTR_EL0_IDC */
+int index;
+int lower;
+int upper;
+} ARM64SysRegField;
+
+typedef struct ARM64SysReg {
+const char *name;   /* name of the sysreg, for instance CTR_EL0 */
+ARMSysRegs sysreg;
+int index;
+GList *fields; /* list of named fields, excluding RES* */
+} ARM64SysReg;
+
+void initialize_cpu_sysreg_properties(void);
+
+/*
+ * List of exposed ID regs (automatically populated from linux
+ * arch/arm64/tools/sysreg)
+ */
+extern ARM64SysReg arm64_id_regs[NUM_ID_IDX];
+
+/* Allocate a new field and append it to the @reg field list */
+static inline GList *arm64_sysreg_add_field(ARM64SysReg *reg, const char *name,
+ uint8_t min, uint8_t max) {
+
+ ARM64SysRegField *field = g_new0(ARM64SysRegField, 1);
+
+ field->name = name;
+ field->lower = min;
+ field->upper = max;
+ field->index = reg->index;
+
+ reg->fields = g_list_append(reg->fields, field);
+ return reg->fields;
+}
+
+static inline ARM64SysReg *arm64_sysreg_get(ARMIDRegisterIdx index)
+{
+ARM64SysReg *reg = &arm64_id_regs[index];
+
+reg->index = index;
+reg->sysreg = id_register_sysreg[index];
+return reg;
+}
+
+#endif
diff --git a/target/arm/cpu-sysreg-properties.c b/target/arm/cpu-sysreg-properties.c
new file mode 100644
index ..8b7ef5badfb9
--- /dev/null
+++ b/target/arm/cpu-sysreg-properties.c
@@ -0,0 +1,41 @@
+/*
+ * QEMU ARM CPU SYSREG PROPERTIES
+ * to be generated from linux sysreg
+ *
+ * Copyright (c) Red Hat, Inc. 2024
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see
+ * 
+ */
+
+#include "cpu-custom.h"
+
+ARM64SysReg arm64_id_regs[NUM_ID_IDX];
+
+void initialize_cpu_sysreg_properties(void)
+{
+memset(arm64_id_regs, 0, sizeof(ARM64SysReg) * NUM_ID_IDX);
+/* CTR_EL0 */
+ARM64SysReg *CTR_EL0 = arm64_sysreg_get(CTR_EL0_IDX);
+CTR_EL0->name = "CTR_EL0";
+arm64_sysreg_add_field(CTR_EL0, "TminLine", 32, 37);
+arm64_sysreg_add_field(CTR_EL0, "DIC", 29, 29);
+arm64_sysreg_add_field(CTR_EL0, "IDC", 28, 28);
+arm64_sysreg_add_field(CTR_EL0, "CWG", 24, 27);
+arm64_sysreg_add_field(CTR_EL0, "ERG", 20, 23);
+arm64_sysreg_add_field(CTR_EL0, "DminLine", 16, 19);
+arm64_sysreg_add_field(CTR_EL0, "L1Ip", 14, 15);
+arm64_sysreg_add_field(CTR_EL0, "IminLine", 0, 3);
+}
+
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index beba1733c99f..8371aabce5f4 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -35,6 +35,7 @@
 #include "internals.h"
 #include "cpu-features.h"
 #include "cpregs.h"
+#include "cpu-custom.h"
 
 void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
 {
@@ -894,6 +895,7 @@ static void aarch64_cpu_register_types(void)
 {
 size_t i;
 
+initialize_cpu_sysreg_properties();
 type_register_static(&aarch64_cpu_type_info);
 
 for (i = 0; i < ARRAY_SIZE(aarch64_cpus); ++i) {
diff --git a/target/arm/meson.build b/target/arm/meson.build
index 2e10464dbb6b..9c7a04ee1b26 100644
--- a/target/arm/mes

[PATCH 12/15] arm/cpu: Store id_mmfr0-5 into the idregs array

2025-02-07 Thread Cornelia Huck
From: Eric Auger 

Signed-off-by: Eric Auger 
Signed-off-by: Cornelia Huck 
---
 hw/intc/armv7m_nvic.c |  8 ++--
 target/arm/cpu-features.h | 18 
 target/arm/cpu.h  |  6 ---
 target/arm/cpu64.c| 16 +++
 target/arm/helper.c   | 12 ++---
 target/arm/kvm.c  | 18 +++-
 target/arm/tcg/cpu-v7m.c  | 48 ++--
 target/arm/tcg/cpu32.c| 94 +++
 target/arm/tcg/cpu64.c| 76 +++
 9 files changed, 140 insertions(+), 156 deletions(-)

diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
index 456a1db62bdd..86e18cac116c 100644
--- a/hw/intc/armv7m_nvic.c
+++ b/hw/intc/armv7m_nvic.c
@@ -1284,22 +1284,22 @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs)
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_mmfr0;
+return GET_IDREG(idregs, ID_MMFR0);
 case 0xd54: /* MMFR1.  */
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_mmfr1;
+return GET_IDREG(idregs, ID_MMFR1);
 case 0xd58: /* MMFR2.  */
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_mmfr2;
+return GET_IDREG(idregs, ID_MMFR2);
 case 0xd5c: /* MMFR3.  */
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
 }
-return cpu->isar.id_mmfr3;
+return GET_IDREG(idregs, ID_MMFR3);
 case 0xd60: /* ISAR0.  */
 if (!arm_feature(&cpu->env, ARM_FEATURE_M_MAIN)) {
 goto bad_offset;
diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index 97c7fee70a7b..90ada4c2d227 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -283,17 +283,17 @@ static inline bool isar_feature_aa32_vminmaxnm(const ARMISARegisters *id)
 
 static inline bool isar_feature_aa32_pxn(const ARMISARegisters *id)
 {
-return FIELD_EX32(id->id_mmfr0, ID_MMFR0, VMSA) >= 4;
+return FIELD_EX32_IDREG(&id->idregs, ID_MMFR0, VMSA) >= 4;
 }
 
 static inline bool isar_feature_aa32_pan(const ARMISARegisters *id)
 {
-return FIELD_EX32(id->id_mmfr3, ID_MMFR3, PAN) != 0;
+return FIELD_EX32_IDREG(&id->idregs, ID_MMFR3, PAN) != 0;
 }
 
 static inline bool isar_feature_aa32_ats1e1(const ARMISARegisters *id)
 {
-return FIELD_EX32(id->id_mmfr3, ID_MMFR3, PAN) >= 2;
+return FIELD_EX32_IDREG(&id->idregs, ID_MMFR3, PAN) >= 2;
 }
 
 static inline bool isar_feature_aa32_pmuv3p1(const ARMISARegisters *id)
@@ -319,32 +319,32 @@ static inline bool isar_feature_aa32_pmuv3p5(const ARMISARegisters *id)
 
 static inline bool isar_feature_aa32_hpd(const ARMISARegisters *id)
 {
-return FIELD_EX32(id->id_mmfr4, ID_MMFR4, HPDS) != 0;
+return FIELD_EX32_IDREG(&id->idregs, ID_MMFR4, HPDS) != 0;
 }
 
 static inline bool isar_feature_aa32_ac2(const ARMISARegisters *id)
 {
-return FIELD_EX32(id->id_mmfr4, ID_MMFR4, AC2) != 0;
+return FIELD_EX32_IDREG(&id->idregs, ID_MMFR4, AC2) != 0;
 }
 
 static inline bool isar_feature_aa32_ccidx(const ARMISARegisters *id)
 {
-return FIELD_EX32(id->id_mmfr4, ID_MMFR4, CCIDX) != 0;
+return FIELD_EX32_IDREG(&id->idregs, ID_MMFR4, CCIDX) != 0;
 }
 
 static inline bool isar_feature_aa32_tts2uxn(const ARMISARegisters *id)
 {
-return FIELD_EX32(id->id_mmfr4, ID_MMFR4, XNX) != 0;
+return FIELD_EX32_IDREG(&id->idregs, ID_MMFR4, XNX) != 0;
 }
 
 static inline bool isar_feature_aa32_half_evt(const ARMISARegisters *id)
 {
-return FIELD_EX32(id->id_mmfr4, ID_MMFR4, EVT) >= 1;
+return FIELD_EX32_IDREG(&id->idregs, ID_MMFR4, EVT) >= 1;
 }
 
 static inline bool isar_feature_aa32_evt(const ARMISARegisters *id)
 {
-return FIELD_EX32(id->id_mmfr4, ID_MMFR4, EVT) >= 2;
+return FIELD_EX32_IDREG(&id->idregs, ID_MMFR4, EVT) >= 2;
 }
 
 static inline bool isar_feature_aa32_dit(const ARMISARegisters *id)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 215ebf165e6b..f6e1836d0fdd 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1043,12 +1043,6 @@ struct ArchCPU {
  * field by reading the value from the KVM vCPU.
  */
 struct ARMISARegisters {
-uint32_t id_mmfr0;
-uint32_t id_mmfr1;
-uint32_t id_mmfr2;
-uint32_t id_mmfr3;
-uint32_t id_mmfr4;
-uint32_t id_mmfr5;
 uint32_t mvfr0;
 uint32_t mvfr1;
 uint32_t mvfr2;
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 9f83984fa900..beba1733c99f 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -634,10 +634,10 @@ static void aarch64_a57_initfn(Object *obj)
 SET_IDREG(idregs, ID_PFR1, 0x00011011);
 SET_IDREG(idregs, ID_DFR0, 0x03010066);
 cpu->id_afr0 = 0x;
-cpu->isar.id_mmfr0 = 0x10101105;
-cpu->isar.id_mmfr1 = 0x4000;
-cpu->isa

[PATCH 14/15] arm/cpu: Add sysreg generation scripts

2025-02-07 Thread Cornelia Huck
From: Eric Auger 

Introduce scripts that automate the generation of system register
definitions from a given Linux source tree's arch/arm64/tools/sysreg.

Invoking
./update-aarch64-sysreg-code.sh $PATH_TO_LINUX_SOURCE_TREE
in the scripts directory generates two QEMU files:
- target/arm/cpu-sysreg-properties.c
- target/arm/cpu-sysregs.h.inc

cpu-sysregs.h.inc provides defines for all system registers.
However, cpu-sysreg-properties.c only cares about feature ID registers.

update-aarch64-sysreg-code.sh calls two awk scripts.
gen-cpu-sysreg-properties.awk is inherited from the kernel's
arch/arm64/tools/gen-sysreg.awk. All credit to Mark Rutland,
the original author of that script.

[CH: update to handle current kernel sysregs structure, and to emit
 the re-worked register structures]
Signed-off-by: Eric Auger 
Signed-off-by: Cornelia Huck 
---
 scripts/gen-cpu-sysreg-properties.awk | 325 ++
 scripts/gen-cpu-sysregs-header.awk|  70 ++
 scripts/update-aarch64-sysreg-code.sh |  30 +++
 3 files changed, 425 insertions(+)
 create mode 100755 scripts/gen-cpu-sysreg-properties.awk
 create mode 100755 scripts/gen-cpu-sysregs-header.awk
 create mode 100755 scripts/update-aarch64-sysreg-code.sh

diff --git a/scripts/gen-cpu-sysreg-properties.awk b/scripts/gen-cpu-sysreg-properties.awk
new file mode 100755
index ..76c37938b168
--- /dev/null
+++ b/scripts/gen-cpu-sysreg-properties.awk
@@ -0,0 +1,325 @@
+#!/bin/awk -f
+# SPDX-License-Identifier: GPL-2.0
+# gen-cpu-sysreg-properties.awk: arm64 sysreg header generator
+#
+# Usage: awk -f gen-cpu-sysreg-properties.awk $LINUX_PATH/arch/arm64/tools/sysreg
+
+function block_current() {
+   return __current_block[__current_block_depth];
+}
+
+# Log an error and terminate
+function fatal(msg) {
+   print "Error at " NR ": " msg > "/dev/stderr"
+
+   printf "Current block nesting:"
+
+   for (i = 0; i <= __current_block_depth; i++) {
+   printf " " __current_block[i]
+   }
+   printf "\n"
+
+   exit 1
+}
+
+# Enter a new block, setting the active block to @block
+function block_push(block) {
+   __current_block[++__current_block_depth] = block
+}
+
+# Exit a block, setting the active block to the parent block
+function block_pop() {
+   if (__current_block_depth == 0)
+   fatal("error: block_pop() in root block")
+
+   __current_block_depth--;
+}
+
+# Sanity check the number of records for a field makes sense. If not, produce
+# an error and terminate.
+function expect_fields(nf) {
+   if (NF != nf)
+   fatal(NF " fields found where " nf " expected")
+}
+
+# Print a CPP macro definition, padded with spaces so that the macro bodies
+# line up in a column
+function define(name, val) {
+   printf "%-56s%s\n", "#define " name, val
+}
+
+# Emit an arm64_sysreg_add_field() call for a named field of an ID reg
+function define_field(reg, field, msb, lsb, idreg) {
+   if (idreg)
+            print "arm64_sysreg_add_field("reg", \""field"\", "lsb", "msb");"
+}
+}
+
+# Print a field _SIGNED definition for a field
+function define_field_sign(reg, field, sign, idreg) {
+   if (idreg)
+print "arm64_sysreg_add_field("reg", \""field"\", "lsb", 
"msb");"
+}
+
+# Parse a "[:]" string into the global variables @msb and @lsb
+function parse_bitdef(reg, field, bitdef, _bits)
+{
+   if (bitdef ~ /^[0-9]+$/) {
+   msb = bitdef
+   lsb = bitdef
+   } else if (split(bitdef, _bits, ":") == 2) {
+   msb = _bits[1]
+   lsb = _bits[2]
+   } else {
+   fatal("invalid bit-range definition '" bitdef "'")
+   }
+
+
+   if (msb != next_bit)
+   fatal(reg "." field " starts at " msb " not " next_bit)
+   if (63 < msb || msb < 0)
+   fatal(reg "." field " invalid high bit in '" bitdef "'")
+   if (63 < lsb || lsb < 0)
+   fatal(reg "." field " invalid low bit in '" bitdef "'")
+   if (msb < lsb)
+   fatal(reg "." field " invalid bit-range '" bitdef "'")
+
+   next_bit = lsb - 1
+}
+
+BEGIN {
+   print "#include \"cpu-custom.h\""
+   print ""
+   print "ARM64SysReg arm64_id_regs[NUM_ID_IDX];"
+   print ""
+   print "void initialize_cpu_sysreg_properties(void)"
+   print "{"
+print "memset(arm64_id_regs, 0, sizeof(ARM64SysReg) * NUM_ID_IDX);"
+print ""
+
+   __current_block_depth = 0
+   __current_block[__current_block_depth] = "Root"
+}
+
+END {
+   if (__current_block_depth != 0)
+   fatal("Missing terminator for " block_current() " block")
+
+   print "}"
+}
+
+# skip blank lines and comment lines
+/^$/ { next }
+/^[\t ]*#/ { next }
+
+/^SysregFields/ && block_current() == "Root" {
+   block_push("SysregFields")
+
+   expect_fields(2)
+
+   reg = $2
+
+

Re: [PATCH 0/5] Fix vIOMMU reset order

2025-02-07 Thread Michael S. Tsirkin
On Thu, Feb 06, 2025 at 03:21:51PM +0100, Eric Auger wrote:
> This is a follow-up to Peter's attempt to fix the fact that
> vIOMMUs are likely to be reset before the devices they protect:
> 
> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
> https://lore.kernel.org/all/20240117091559.144730-1-pet...@redhat.com/
> 
> This is especially observed with virtio devices when a QMP system_reset
> command is sent, but also with VFIO devices.
> 
> This series puts the vIOMMU reset in the 3-phase exit callback.
> 
> This scheme was tested successfully with virtio devices and some
> VFIO devices. Nevertheless, not all topologies have been
> tested yet.
> 
> Best Regards
> 
> Eric



Looks good.


Acked-by: Michael S. Tsirkin 

How should this be merged?
I suppose I can merge the first three, with the other
two going in via the respective maintainers?
I don't think there's a dependency here, right?

> This series can be found at:
> https://github.com/eauger/qemu/tree/viommu-3phase-reset-v1
> 
> Eric Auger (4):
>   hw/virtio/virtio-iommu: Migrate to 3-phase reset
>   hw/i386/intel-iommu: Migrate to 3-phase reset
>   hw/arm/smmuv3: Move reset to exit phase
>   hw/vfio/common: Add a trace point in vfio_reset_handler
> 
> Peter Xu (1):
>   hw/i386/intel_iommu: Tear down address spaces before IOMMU reset
> 
>  hw/arm/smmuv3.c  |  9 +
>  hw/i386/intel_iommu.c| 10 ++
>  hw/vfio/common.c |  1 +
>  hw/virtio/virtio-iommu.c |  9 +
>  hw/arm/trace-events  |  1 +
>  hw/i386/trace-events |  1 +
>  hw/vfio/trace-events |  1 +
>  hw/virtio/trace-events   |  2 +-
>  8 files changed, 21 insertions(+), 13 deletions(-)
> 
> -- 
> 2.47.1



