date:20240429


On 25.04.24 23:30, Fabiano Rosas wrote:

@@ -797,13 +801,18 @@ fail:
MIGRATION_STATUS_FAILED);
  migration_incoming_state_destroy();
  
-if (migrate_has_error(s)) {

-WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
-error_report_err(s->error);
+if (mis->exit_on_error) {
+if (migrate_has_error(s)) {
+WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
+error_report_err(s->error);

error_report_err(error_copy(s->error))

...because later on you're reading from s->error at
fill_destination_migration_info.


No, we immediately do exit() instead. That's just a preexisting behavior, moved into 
"if (mis->exit_on_error)"

--
Best regards,
Vladimir

[PATCH v4 1/4] migration: move trace-point from migrate_fd_error to migrate_set_error

Cover more cases by trace-point.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fabiano Rosas 
---
 migration/migration.c  | 4 +++-
 migration/trace-events | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index b5af6b5105..2dc6a063e9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1421,6 +1421,9 @@ static void migrate_fd_cleanup_bh(void *opaque)
 void migrate_set_error(MigrationState *s, const Error *error)
 {
 QEMU_LOCK_GUARD(&s->error_mutex);
+
+trace_migrate_error(error_get_pretty(error));
+
 if (!s->error) {
 s->error = error_copy(error);
 }
@@ -1444,7 +1447,6 @@ static void migrate_error_free(MigrationState *s)
 
 static void migrate_fd_error(MigrationState *s, const Error *error)
 {
-trace_migrate_fd_error(error_get_pretty(error));
 assert(s->to_dst_file == NULL);
 migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
   MIGRATION_STATUS_FAILED);
diff --git a/migration/trace-events b/migration/trace-events
index f0e1cb80c7..d0c44c3853 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -152,7 +152,7 @@ multifd_set_outgoing_channel(void *ioc, const char 
*ioctype, const char *hostnam
 # migration.c
 migrate_set_state(const char *new_state) "new state %s"
 migrate_fd_cleanup(void) ""
-migrate_fd_error(const char *error_desc) "error=%s"
+migrate_error(const char *error_desc) "error=%s"
 migrate_fd_cancel(void) ""
 migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in 
%s at 0x%zx len 0x%zx"
 migrate_pending_exact(uint64_t size, uint64_t pre, uint64_t post) "exact 
pending size %" PRIu64 " (pre = %" PRIu64 " post=%" PRIu64 ")"
-- 
2.34.1

[PATCH v4 2/4] migration: process_incoming_migration_co(): complete cleanup on failure

Make call to migration_incoming_state_destroy(), instead of doing only
partial of it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fabiano Rosas 
---
 migration/migration.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 2dc6a063e9..0d26db47f7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -799,10 +799,7 @@ process_incoming_migration_co(void *opaque)
 fail:
 migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
   MIGRATION_STATUS_FAILED);
-qemu_fclose(mis->from_src_file);
-
-multifd_recv_cleanup();
-compress_threads_load_cleanup();
+migration_incoming_state_destroy();
 
 exit(EXIT_FAILURE);
 }
-- 
2.34.1

[PATCH v4 0/4] migration: do not exit on incoming failure

Hi all!

The series brings an option to not immediately exit on incoming
migration failure, giving a possibility to orchestrator to get the error
through QAPI and shutdown QEMU by "quit".

v4:
- add r-b and a-b by Fabiano and Markus
- improve wording in 04 as Markus suggested

v3:
- don't refactor the whole code around setting migration error, it seems
  too much and necessary for the new feature itself
- add constant
- change behavior for HMP command
- split some things to separate patches
- and more, by Peter's suggestions


New behavior can be demonstrated like this:

bash:

(
cat <

[PATCH v4 3/4] migration: process_incoming_migration_co(): rework error reporting

Unify error reporting in the function. This simplifies the following
commit, which will not-exit-on-error behavior variant to the function.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fabiano Rosas 
---
 migration/migration.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 0d26db47f7..0b15f7ccf4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -735,14 +735,16 @@ static void process_incoming_migration_bh(void *opaque)
 static void coroutine_fn
 process_incoming_migration_co(void *opaque)
 {
+MigrationState *s = migrate_get_current();
 MigrationIncomingState *mis = migration_incoming_get_current();
 PostcopyState ps;
 int ret;
+Error *local_err = NULL;
 
 assert(mis->from_src_file);
 
 if (compress_threads_load_setup(mis->from_src_file)) {
-error_report("Failed to setup decompress threads");
+error_setg(&local_err, "Failed to setup decompress threads");
 goto fail;
 }
 
@@ -779,18 +781,12 @@ process_incoming_migration_co(void *opaque)
 }
 
 if (ret < 0) {
-MigrationState *s = migrate_get_current();
-
-if (migrate_has_error(s)) {
-WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
-error_report_err(s->error);
-}
-}
-error_report("load of migration failed: %s", strerror(-ret));
+error_setg(&local_err, "load of migration failed: %s", strerror(-ret));
 goto fail;
 }
 
 if (colo_incoming_co() < 0) {
+error_setg(&local_err, "colo incoming failed");
 goto fail;
 }
 
@@ -801,6 +797,12 @@ fail:
   MIGRATION_STATUS_FAILED);
 migration_incoming_state_destroy();
 
+if (migrate_has_error(s)) {
+WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
+error_report_err(s->error);
+}
+}
+error_report_err(local_err);
 exit(EXIT_FAILURE);
 }
 
-- 
2.34.1

[PATCH v4 4/4] qapi: introduce exit-on-error parameter for migrate-incoming

Now we do set MIGRATION_FAILED state, but don't give a chance to
orchestrator to query migration state and get the error.

Let's provide a possibility for QMP-based orchestrators to get an error
like with outgoing migration.

For hmp_migrate_incoming(), let's enable the new behavior: HMP is not
and ABI, it's mostly intended to use by developer and it makes sense
not to stop the process.

For x-exit-preconfig, let's keep the old behavior:
 - it's called from init(), so here we want to keep current behavior by
   default
 - it does exit on error by itself as well
So, if we want to change the behavior of x-exit-preconfig, it should be
another patch.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Acked-by: Markus Armbruster 
---
 migration/migration-hmp-cmds.c |  2 +-
 migration/migration.c  | 36 --
 migration/migration.h  |  3 +++
 qapi/migration.json|  7 ++-
 system/vl.c|  3 ++-
 5 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 7e96ae6ffd..23181bbee1 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -466,7 +466,7 @@ void hmp_migrate_incoming(Monitor *mon, const QDict *qdict)
 }
 QAPI_LIST_PREPEND(caps, g_steal_pointer(&channel));
 
-qmp_migrate_incoming(NULL, true, caps, &err);
+qmp_migrate_incoming(NULL, true, caps, true, false, &err);
 qapi_free_MigrationChannelList(caps);
 
 end:
diff --git a/migration/migration.c b/migration/migration.c
index 0b15f7ccf4..5cfe420a76 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -72,6 +72,8 @@
 #define NOTIFIER_ELEM_INIT(array, elem)\
 [elem] = NOTIFIER_WITH_RETURN_LIST_INITIALIZER((array)[elem])
 
+#define INMIGRATE_DEFAULT_EXIT_ON_ERROR true
+
 static NotifierWithReturnList migration_state_notifiers[] = {
 NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_NORMAL),
 NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_REBOOT),
@@ -234,6 +236,8 @@ void migration_object_init(void)
 qemu_cond_init(¤t_incoming->page_request_cond);
 current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
 
+current_incoming->exit_on_error = INMIGRATE_DEFAULT_EXIT_ON_ERROR;
+
 migration_object_check(current_migration, &error_fatal);
 
 blk_mig_init();
@@ -797,13 +801,18 @@ fail:
   MIGRATION_STATUS_FAILED);
 migration_incoming_state_destroy();
 
-if (migrate_has_error(s)) {
-WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
-error_report_err(s->error);
+if (mis->exit_on_error) {
+if (migrate_has_error(s)) {
+WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
+error_report_err(s->error);
+}
 }
+error_report_err(local_err);
+exit(EXIT_FAILURE);
+} else {
+migrate_set_error(s, local_err);
+error_free(local_err);
 }
-error_report_err(local_err);
-exit(EXIT_FAILURE);
 }
 
 /**
@@ -1312,6 +1321,15 @@ static void 
fill_destination_migration_info(MigrationInfo *info)
 break;
 }
 info->status = mis->state;
+
+if (!info->error_desc) {
+MigrationState *s = migrate_get_current();
+QEMU_LOCK_GUARD(&s->error_mutex);
+
+if (s->error) {
+info->error_desc = g_strdup(error_get_pretty(s->error));
+}
+}
 }
 
 MigrationInfo *qmp_query_migrate(Error **errp)
@@ -1795,10 +1813,13 @@ void migrate_del_blocker(Error **reasonp)
 }
 
 void qmp_migrate_incoming(const char *uri, bool has_channels,
-  MigrationChannelList *channels, Error **errp)
+  MigrationChannelList *channels,
+  bool has_exit_on_error, bool exit_on_error,
+  Error **errp)
 {
 Error *local_err = NULL;
 static bool once = true;
+MigrationIncomingState *mis = migration_incoming_get_current();
 
 if (!once) {
 error_setg(errp, "The incoming migration has already been started");
@@ -1813,6 +1834,9 @@ void qmp_migrate_incoming(const char *uri, bool 
has_channels,
 return;
 }
 
+mis->exit_on_error =
+has_exit_on_error ? exit_on_error : INMIGRATE_DEFAULT_EXIT_ON_ERROR;
+
 qemu_start_incoming_migration(uri, has_channels, channels, &local_err);
 
 if (local_err) {
diff --git a/migration/migration.h b/migration/migration.h
index 8045e39c26..95995a818e 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -227,6 +227,9 @@ struct MigrationIncomingState {
  * is needed as this field is updated serially.
  */
 unsigned int switchover_ack_pending_num;
+
+/* Do exit on incoming migration failure */
+bool exit_on_error;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/qapi/migration.json b/qapi/migration.json
index 8c65b90328..9feed413b5 100644
--- a/qapi/migrati

Re: [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents

fan  writes:

> On Fri, Apr 26, 2024 at 11:12:50AM +0200, Markus Armbruster wrote:
>> nifan@gmail.com writes:

[...]

>> > diff --git a/qapi/cxl.json b/qapi/cxl.json
>> > index 4281726dec..2dcf03d973 100644
>> > --- a/qapi/cxl.json
>> > +++ b/qapi/cxl.json
>> > @@ -361,3 +361,72 @@
>> >  ##
>> >  {'command': 'cxl-inject-correctable-error',
>> >   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
>> > +
>> > +##
>> > +# @CXLDCExtentRecord:
>> 
>> Such traffic jams of capital letters are hard to read.  What about
>> CxlDynamicCapacityExtent?
>> 
>> > +#
>> > +# Record of a single extent to add/release
>> 
>> Suggest "A dynamic capacity extent."
>> 
>> > +#
>> > +# @offset: offset to the start of the region where the extent to be 
>> > operated
>> 
>> Blank line here, please.
>> 
>> 
>> 
>> > +# @len: length of the extent
>> > +#
>> > +# Since: 9.1
>> > +##
>> > +{ 'struct': 'CXLDCExtentRecord',
>> > +  'data': {
>> > +  'offset':'uint64',
>> > +  'len': 'uint64'
>> > +  }
>> > +}
>> > +
>> > +##
>> > +# @cxl-add-dynamic-capacity:
>> > +#
>> > +# Command to start add dynamic capacity extents flow. The device will
>> > +# have to acknowledged the acceptance of the extents before they are 
>> > usable.
>> 
>> This text needs work.  More on that at the end of my review.
>
> Yes. I will work on it for the next version once all the feedbacks
> are collected and comments are resolved.
>
> See below.
>
>> 
>> docs/devel/qapi-code-gen.rst:
>> 
>> For legibility, wrap text paragraphs so every line is at most 70
>> characters long.
>> 
>> Separate sentences with two spaces.
>> 
>> More elsewhere.
>> 
>> > +#
>> > +# @path: CXL DCD canonical QOM path
>> 
>> I'd prefer @qom-path, unless you can make a consistency argument for
>> @path.
>> 
>> Sure the QOM path needs to be canonical?
>> 
>> If not, what about "path to the CXL dynamic capacity device in the QOM
>> tree".  Intentionally close to existing descriptions of @qom-path
>> elsewhere.
>
> From the same file, I saw "path" was used for other commands, like
> "cxl-inject-memory-module-event", so I followed it.
> DCD is nothing different from "type 3 device" expect it can dynamically
> change capacity. 
> Renaming it to "qom-path" is no problem for me, just want to make sure it
> will not break the naming consistency.

Both @path and @qom-path are used (sadly).  @path is used for all kinds
of paths, whereas @qom-path is only used for QOM paths.  That's why I
prefer it.

However, you're making a compelling local consistency argument: cxl.json
uses only @path.  Sticking to that makes sense.

>> > +# @hid: host id
>> 
>> @host-id, unless "HID" is established terminology in CXL DCD land.
>
> host-id works.
>> 
>> What is a host ID?
>
> It is an id identifying the host to which the capacity is being added.

How are these IDs assigned?

>> > +# @selection-policy: policy to use for selecting extents for adding 
>> > capacity
>> 
>> Where are selection policies defined?
>
> It is defined in CXL specification: Specifies the policy to use for selecting
> which extents comprise the added capacity

Include a reference to the spec here?

>> > +# @region-id: id of the region where the extent to add
>> 
>> Is "region ID" the established terminology in CXL DCD land?  Or is
>> "region number" also used?  I'm asking because "ID" in this QEMU device
>> context suggests a connection to a qdev ID.
>> 
>> If region number is fine, I'd rename to just @region, and rephrase the
>> description to avoid "ID".  Perhaps "number of the region the extent is
>> to be added to".  Not entirely happy with the phrasing, doesn't exactly
>> roll off the tongue, but "where the extent to add" sounds worse to my
>> ears.  Mind, I'm not a native speaker.
>
> Yes. region number is fine. Will rename it as "region"
>
>> 
>> > +# @tag: Context field
>> 
>> What is this about?
>
> Based on the specification, it is "Context field utilized by implementations
> that make use of the Dynamic Capacity feature.". Basically, it is a
> string (label) attached to an dynamic capacity extent so we can achieve
> specific purpose, like identifying or grouping extents.

Include a reference to the spec here?

>> > +# @extents: Extents to add
>> 
>> Blank lines between argument descriptions, please.
>> 
>> > +#
>> > +# Since : 9.1
>> > +##
>> > +{ 'command': 'cxl-add-dynamic-capacity',
>> > +  'data': { 'path': 'str',
>> > +'hid': 'uint16',
>> > +'selection-policy': 'uint8',
>> > +'region-id': 'uint8',
>> > +'tag': 'str',
>> > +'extents': [ 'CXLDCExtentRecord' ]
>> > +   }
>> > +}
>> > +
>> > +##
>> > +# @cxl-release-dynamic-capacity:
>> > +#
>> > +# Command to start release dynamic capacity extents flow. The host will
>> > +# need to respond to indicate that it has released the capacity before it
>> > +# is made unavailable for read and write and can be re-added.
>> 
>> This text needs work.  More on that at the end of my review.
>
> Will

Re: [PATCH v4] target/riscv: Implement dynamic establishment of custom decoder

2024-04-29 Thread Huang Tao




On 2024/4/29 11:51, Alistair Francis wrote:

On Thu, Mar 14, 2024 at 7:23 PM Huang Tao  wrote:

In this patch, we modify the decoder to be a freely composable data
structure instead of a hardcoded one. It can be dynamically builded up
according to the extensions.
This approach has several benefits:
1. Provides support for heterogeneous cpu architectures. As we add decoder in
RISCVCPU, each cpu can have their own decoder, and the decoders can be
different due to cpu's features.
2. Improve the decoding efficiency. We run the guard_func to see if the decoder
can be added to the dynamic_decoder when building up the decoder. Therefore,
there is no need to run the guard_func when decoding each instruction. It 
can
improve the decoding efficiency
3. For vendor or dynamic cpus, it allows them to customize their own decoder
functions to improve decoding efficiency, especially when vendor-defined
instruction sets increase. Because of dynamic building up, it can skip the 
other
decoder guard functions when decoding.
4. Pre patch for allowing adding a vendor decoder before decode_insn32() with 
minimal
overhead for users that don't need this particular vendor decoder.

Signed-off-by: Huang Tao 
Suggested-by: Christoph Muellner 
Co-authored-by: LIU Zhiwei 
Reviewed-by: Richard Henderson 

Do you mind rebasing this on
https://github.com/alistair23/qemu/tree/riscv-to-apply.next?

Alistair


I will rebase this patch on the latest riscv-to-apply.next.

Thanks


---
Changes in v4:
- fix typo
- rename function
- add 'if tcg_enable()'
- move function to tcg-cpu.c and declarations to tcg-cpu.h

Changes in v3:
- use GPtrArray to save decode function poionter list.
---
  target/riscv/cpu.c |  1 +
  target/riscv/cpu.h |  1 +
  target/riscv/tcg/tcg-cpu.c | 15 +++
  target/riscv/tcg/tcg-cpu.h | 15 +++
  target/riscv/translate.c   | 31 +++
  5 files changed, 47 insertions(+), 16 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index c160b9216b..17070b82a7 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1132,6 +1132,7 @@ void riscv_cpu_finalize_features(RISCVCPU *cpu, Error 
**errp)
  error_propagate(errp, local_err);
  return;
  }
+riscv_tcg_cpu_finalize_dynamic_decoder(cpu);
  } else if (kvm_enabled()) {
  riscv_kvm_cpu_finalize_features(cpu, &local_err);
  if (local_err != NULL) {
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 3b1a02b944..48e67410e1 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -457,6 +457,7 @@ struct ArchCPU {
  uint32_t pmu_avail_ctrs;
  /* Mapping of events to counters */
  GHashTable *pmu_event_ctr_map;
+const GPtrArray *decoders;
  };

  /**
diff --git a/target/riscv/tcg/tcg-cpu.c b/target/riscv/tcg/tcg-cpu.c
index ab6db817db..c9ab92ea2f 100644
--- a/target/riscv/tcg/tcg-cpu.c
+++ b/target/riscv/tcg/tcg-cpu.c
@@ -853,6 +853,21 @@ void riscv_tcg_cpu_finalize_features(RISCVCPU *cpu, Error 
**errp)
  }
  }

+void riscv_tcg_cpu_finalize_dynamic_decoder(RISCVCPU *cpu)
+{
+GPtrArray *dynamic_decoders;
+dynamic_decoders = g_ptr_array_sized_new(decoder_table_size);
+for (size_t i = 0; i < decoder_table_size; ++i) {
+if (decoder_table[i].guard_func &&
+decoder_table[i].guard_func(&cpu->cfg)) {
+g_ptr_array_add(dynamic_decoders,
+(gpointer)decoder_table[i].riscv_cpu_decode_fn);
+}
+}
+
+cpu->decoders = dynamic_decoders;
+}
+
  bool riscv_cpu_tcg_compatible(RISCVCPU *cpu)
  {
  return object_dynamic_cast(OBJECT(cpu), TYPE_RISCV_CPU_HOST) == NULL;
diff --git a/target/riscv/tcg/tcg-cpu.h b/target/riscv/tcg/tcg-cpu.h
index f7b32417f8..ce94253fe4 100644
--- a/target/riscv/tcg/tcg-cpu.h
+++ b/target/riscv/tcg/tcg-cpu.h
@@ -26,4 +26,19 @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error 
**errp);
  void riscv_tcg_cpu_finalize_features(RISCVCPU *cpu, Error **errp);
  bool riscv_cpu_tcg_compatible(RISCVCPU *cpu);

+struct DisasContext;
+struct RISCVCPUConfig;
+typedef struct RISCVDecoder {
+bool (*guard_func)(const struct RISCVCPUConfig *);
+bool (*riscv_cpu_decode_fn)(struct DisasContext *, uint32_t);
+} RISCVDecoder;
+
+typedef bool (*riscv_cpu_decode_fn)(struct DisasContext *, uint32_t);
+
+extern const size_t decoder_table_size;
+
+extern const RISCVDecoder decoder_table[];
+
+void riscv_tcg_cpu_finalize_dynamic_decoder(RISCVCPU *cpu);
+
  #endif
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index ea5d52b2ef..bce16d5054 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -37,6 +37,8 @@
  #include "exec/helper-info.c.inc"
  #undef  HELPER_H

+#include "tcg/tcg-cpu.h"
+
  /* global register indices */
  static TCGv cpu_gpr[32], cpu_gprh[32], cpu_pc, cpu_vl, cpu_vstart;
  static TCGv_i64 cpu_fpr[32]; /* assume F and D extensions */
@@ -117,

[PATCH] hw/char/stm32l4x5_usart: Fix memory corruption by adding correct class_size

2024-04-29 Thread Thomas Huth

"make check-qtest-aarch64" recently started failing on FreeBSD builds,
and valgrind on Linux also detected that there is something fishy with
the new stm32l4x5-usart: The code forgot to set the correct class_size
here, so the various class_init functions in this file wrote beyond
the allocated buffer when setting the subc->type field.

Fixes: 4fb37aea7e ("hw/char: Implement STM32L4x5 USART skeleton")
Signed-off-by: Thomas Huth 
---
 hw/char/stm32l4x5_usart.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/char/stm32l4x5_usart.c b/hw/char/stm32l4x5_usart.c
index 2627aab832..02f666308c 100644
--- a/hw/char/stm32l4x5_usart.c
+++ b/hw/char/stm32l4x5_usart.c
@@ -617,6 +617,7 @@ static const TypeInfo stm32l4x5_usart_types[] = {
 .parent = TYPE_SYS_BUS_DEVICE,
 .instance_size  = sizeof(Stm32l4x5UsartBaseState),
 .instance_init  = stm32l4x5_usart_base_init,
+.class_size = sizeof(Stm32l4x5UsartBaseClass),
 .class_init = stm32l4x5_usart_base_class_init,
 .abstract   = true,
 }, {
-- 
2.44.0

Re: [RFC 1/2] iova_tree: add an id member to DMAMap

2024-04-29 Thread Eugenio Perez Martin

On Thu, Apr 25, 2024 at 7:44 PM Si-Wei Liu  wrote:
>
>
>
> On 4/24/2024 12:33 AM, Eugenio Perez Martin wrote:
> > On Wed, Apr 24, 2024 at 12:21 AM Si-Wei Liu  wrote:
> >>
> >>
> >> On 4/22/2024 1:49 AM, Eugenio Perez Martin wrote:
> >>> On Sat, Apr 20, 2024 at 1:50 AM Si-Wei Liu  wrote:
> 
>  On 4/19/2024 1:29 AM, Eugenio Perez Martin wrote:
> > On Thu, Apr 18, 2024 at 10:46 PM Si-Wei Liu  
> > wrote:
> >> On 4/10/2024 3:03 AM, Eugenio Pérez wrote:
> >>> IOVA tree is also used to track the mappings of virtio-net shadow
> >>> virtqueue.  This mappings may not match with the GPA->HVA ones.
> >>>
> >>> This causes a problem when overlapped regions (different GPA but same
> >>> translated HVA) exists in the tree, as looking them by HVA will return
> >>> them twice.  To solve this, create an id member so we can assign 
> >>> unique
> >>> identifiers (GPA) to the maps.
> >>>
> >>> Signed-off-by: Eugenio Pérez 
> >>> ---
> >>>  include/qemu/iova-tree.h | 5 +++--
> >>>  util/iova-tree.c | 3 ++-
> >>>  2 files changed, 5 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
> >>> index 2a10a7052e..34ee230e7d 100644
> >>> --- a/include/qemu/iova-tree.h
> >>> +++ b/include/qemu/iova-tree.h
> >>> @@ -36,6 +36,7 @@ typedef struct DMAMap {
> >>>  hwaddr iova;
> >>>  hwaddr translated_addr;
> >>>  hwaddr size;/* Inclusive */
> >>> +uint64_t id;
> >>>  IOMMUAccessFlags perm;
> >>>  } QEMU_PACKED DMAMap;
> >>>  typedef gboolean (*iova_tree_iterator)(DMAMap *map);
> >>> @@ -100,8 +101,8 @@ const DMAMap *iova_tree_find(const IOVATree 
> >>> *tree, const DMAMap *map);
> >>>   * @map: the mapping to search
> >>>   *
> >>>   * Search for a mapping in the iova tree that translated_addr 
> >>> overlaps with the
> >>> - * mapping range specified.  Only the first found mapping will be
> >>> - * returned.
> >>> + * mapping range specified and map->id is equal.  Only the first 
> >>> found
> >>> + * mapping will be returned.
> >>>   *
> >>>   * Return: DMAMap pointer if found, or NULL if not found.  Note 
> >>> that
> >>>   * the returned DMAMap pointer is maintained internally.  User 
> >>> should
> >>> diff --git a/util/iova-tree.c b/util/iova-tree.c
> >>> index 536789797e..0863e0a3b8 100644
> >>> --- a/util/iova-tree.c
> >>> +++ b/util/iova-tree.c
> >>> @@ -97,7 +97,8 @@ static gboolean 
> >>> iova_tree_find_address_iterator(gpointer key, gpointer value,
> >>>
> >>>  needle = args->needle;
> >>>  if (map->translated_addr + map->size < 
> >>> needle->translated_addr ||
> >>> -needle->translated_addr + needle->size < 
> >>> map->translated_addr) {
> >>> +needle->translated_addr + needle->size < 
> >>> map->translated_addr ||
> >>> +needle->id != map->id) {
> >> It looks this iterator can also be invoked by SVQ from
> >> vhost_svq_translate_addr() -> iova_tree_find_iova(), where guest GPA
> >> space will be searched on without passing in the ID (GPA), and exact
> >> match for the same GPA range is not actually needed unlike the mapping
> >> removal case. Could we create an API variant, for the SVQ lookup case
> >> specifically? Or alternatively, add a special flag, say skip_id_match 
> >> to
> >> DMAMap, and the id match check may look like below:
> >>
> >> (!needle->skip_id_match && needle->id != map->id)
> >>
> >> I think vhost_svq_translate_addr() could just call the API variant or
> >> pass DMAmap with skip_id_match set to true to 
> >> svq_iova_tree_find_iova().
> >>
> > I think you're totally right. But I'd really like to not complicate
> > the API of the iova_tree more.
> >
> > I think we can look for the hwaddr using memory_region_from_host and
> > then get the hwaddr. It is another lookup though...
>  Yeah, that will be another means of doing translation without having to
>  complicate the API around iova_tree. I wonder how the lookup through
>  memory_region_from_host() may perform compared to the iova tree one, the
>  former looks to be an O(N) linear search on a linked list while the
>  latter would be roughly O(log N) on an AVL tree?
> >>> Even worse, as the reverse lookup (from QEMU vaddr to SVQ IOVA) is
> >>> linear too. It is not even ordered.
> >> Oh Sorry, I misread the code and I should look for g_tree_foreach ()
> >> instead of g_tree_search_node(). So the former is indeed linear
> >> iteration, but it looks to be ordered?
> >>
> >> https://github.com/GNOME/glib/blob/main/glib/gtree.c#L1115
> > The GPA / IOVA are ordered but we're looking by QEMU's vaddr.
> >
> > If we have t

Re: [PATCH v3 4/5] qapi: introduce device-sync-config


On 24.04.24 14:48, Markus Armbruster wrote:

Vladimir Sementsov-Ogievskiy  writes:


Add command to sync config from vhost-user backend to the device. It
may be helpful when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG failed or not
triggered interrupt to the guest or just not available (not supported
by vhost-user server).

Command result is racy if allow it during migration. Let's allow the
sync only in RUNNING state.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  hw/block/vhost-user-blk.c | 27 --
  hw/virtio/virtio-pci.c|  9 
  include/hw/qdev-core.h|  3 +++
  include/sysemu/runstate.h |  1 +
  qapi/qdev.json| 21 +
  system/qdev-monitor.c | 47 +++
  system/runstate.c |  5 +
  7 files changed, 106 insertions(+), 7 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 9e6bbc6950..2f301f380c 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -88,27 +88,39 @@ static void vhost_user_blk_set_config(VirtIODevice *vdev, 
const uint8_t *config)
  s->blkcfg.wce = blkcfg->wce;
  }
  
+static int vhost_user_blk_sync_config(DeviceState *dev, Error **errp)

+{
+int ret;
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg,
+   vdev->config_len, errp);
+if (ret < 0) {
+return ret;
+}
+
+memcpy(vdev->config, &s->blkcfg, vdev->config_len);
+virtio_notify_config(vdev);
+
+return 0;
+}
+
  static int vhost_user_blk_handle_config_change(struct vhost_dev *dev)
  {
  int ret;
-VirtIODevice *vdev = dev->vdev;
-VHostUserBlk *s = VHOST_USER_BLK(dev->vdev);
  Error *local_err = NULL;
  
  if (!dev->started) {

  return 0;
  }
  
-ret = vhost_dev_get_config(dev, (uint8_t *)&s->blkcfg,

-   vdev->config_len, &local_err);
+ret = vhost_user_blk_sync_config(DEVICE(dev->vdev), &local_err);
  if (ret < 0) {
  error_report_err(local_err);
  return ret;
  }
  
-memcpy(dev->vdev->config, &s->blkcfg, vdev->config_len);

-virtio_notify_config(dev->vdev);
-
  return 0;
  }


This factors vhost_user_blk_sync_config() out of
vhost_user_blk_handle_config_change() for reuse.  Correct?


Yes. Will split to a separate patch in v4



  
@@ -576,6 +588,7 @@ static void vhost_user_blk_class_init(ObjectClass *klass, void *data)
  
  device_class_set_props(dc, vhost_user_blk_properties);

  dc->vmsd = &vmstate_vhost_user_blk;
+dc->sync_config = vhost_user_blk_sync_config;
  set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
  vdc->realize = vhost_user_blk_device_realize;
  vdc->unrealize = vhost_user_blk_device_unrealize;
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index eaaf86402c..92afbae71c 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -2501,6 +2501,14 @@ static void virtio_pci_dc_realize(DeviceState *qdev, 
Error **errp)
  vpciklass->parent_dc_realize(qdev, errp);
  }
  
+static int virtio_pci_sync_config(DeviceState *dev, Error **errp)

+{
+VirtIOPCIProxy *proxy = VIRTIO_PCI(dev);
+VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
+
+return qdev_sync_config(DEVICE(vdev), errp);
+}
+
  static void virtio_pci_class_init(ObjectClass *klass, void *data)
  {
  DeviceClass *dc = DEVICE_CLASS(klass);
@@ -2517,6 +2525,7 @@ static void virtio_pci_class_init(ObjectClass *klass, 
void *data)
  device_class_set_parent_realize(dc, virtio_pci_dc_realize,
  &vpciklass->parent_dc_realize);
  rc->phases.hold = virtio_pci_bus_reset_hold;
+dc->sync_config = virtio_pci_sync_config;
  }
  


I tried to follow the callbacks, but quickly gave up.  Leaving to a
reviewer who understands virtio.


  static const TypeInfo virtio_pci_info = {
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 9228e96c87..87135bdcdf 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -95,6 +95,7 @@ typedef void (*DeviceUnrealize)(DeviceState *dev);
  typedef void (*DeviceReset)(DeviceState *dev);
  typedef void (*BusRealize)(BusState *bus, Error **errp);
  typedef void (*BusUnrealize)(BusState *bus);
+typedef int (*DeviceSyncConfig)(DeviceState *dev, Error **errp);
  
  /**

   * struct DeviceClass - The base class for all devices.
@@ -162,6 +163,7 @@ struct DeviceClass {
  DeviceReset reset;
  DeviceRealize realize;
  DeviceUnrealize unrealize;
+DeviceSyncConfig sync_config;
  
  /**

   * @vmsd: device state serialisation description for
@@ -546,6 +548,7 @@ bool qdev_hotplug_allowed(DeviceState *dev, Error **errp);
   */
  HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev);
  void qdev_unplug(DeviceState *dev, Error **errp);
+int qdev_sync_config(DeviceState *dev

Re: [PATCH] hw/char/stm32l4x5_usart: Fix memory corruption by adding correct class_size


On 29/4/24 09:59, Thomas Huth wrote:

"make check-qtest-aarch64" recently started failing on FreeBSD builds,
and valgrind on Linux also detected that there is something fishy with
the new stm32l4x5-usart: The code forgot to set the correct class_size
here, so the various class_init functions in this file wrote beyond
the allocated buffer when setting the subc->type field.

Fixes: 4fb37aea7e ("hw/char: Implement STM32L4x5 USART skeleton")
Signed-off-by: Thomas Huth 
---
  hw/char/stm32l4x5_usart.c | 1 +
  1 file changed, 1 insertion(+)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PULL 20/38] accel/whpx: Use accel-specific per-vcpu @dirty field


On 28/4/24 22:12, Volker Rümelin wrote:

Am 26.04.24 um 21:41 schrieb Philippe Mathieu-Daudé:

WHPX has a specific use of the CPUState::vcpu_dirty field
(CPUState::vcpu_dirty is not used by common code).
To make this field accel-specific, add and use a new
@dirty variable in the AccelCPUState structure.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Message-Id: <20240424174506.326-2-phi...@linaro.org>
---
  target/i386/whpx/whpx-all.c | 23 ---
  1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/target/i386/whpx/whpx-all.c b/target/i386/whpx/whpx-all.c
index 31eec7048c..b08e644517 100644
--- a/target/i386/whpx/whpx-all.c
+++ b/target/i386/whpx/whpx-all.c



@@ -2235,7 +2236,7 @@ int whpx_init_vcpu(CPUState *cpu)
  }
  
  vcpu->interruptable = true;

-cpu->vcpu_dirty = true;


Hi Philippe,

cpu->accel is NULL here. You probably wanted to write

+    vcpu->dirty = true;

instead of

+    cpu->accel->dirty = true;

I think your patch for nvmm_init_vcpu() in target/i386/nvmm/nvmm-all.c
has the same issue.


Doh, sorry I missed that :/

I'll post fixes, thanks Volker!

Phil.

Re: [PATCH v4] target/riscv: Implement dynamic establishment of custom decoder

2024-04-29 Thread Huang Tao




On 2024/4/29 15:58, Huang Tao wrote:


On 2024/4/29 11:51, Alistair Francis wrote:
On Thu, Mar 14, 2024 at 7:23 PM Huang Tao 
 wrote:

In this patch, we modify the decoder to be a freely composable data
structure instead of a hardcoded one. It can be dynamically builded up
according to the extensions.
This approach has several benefits:
1. Provides support for heterogeneous cpu architectures. As we add 
decoder in
    RISCVCPU, each cpu can have their own decoder, and the decoders 
can be

    different due to cpu's features.
2. Improve the decoding efficiency. We run the guard_func to see if 
the decoder
    can be added to the dynamic_decoder when building up the 
decoder. Therefore,
    there is no need to run the guard_func when decoding each 
instruction. It can

    improve the decoding efficiency
3. For vendor or dynamic cpus, it allows them to customize their own 
decoder
    functions to improve decoding efficiency, especially when 
vendor-defined
    instruction sets increase. Because of dynamic building up, it 
can skip the other

    decoder guard functions when decoding.
4. Pre patch for allowing adding a vendor decoder before 
decode_insn32() with minimal

    overhead for users that don't need this particular vendor decoder.

Signed-off-by: Huang Tao 
Suggested-by: Christoph Muellner 
Co-authored-by: LIU Zhiwei 
Reviewed-by: Richard Henderson 

Do you mind rebasing this on
https://github.com/alistair23/qemu/tree/riscv-to-apply.next?

Alistair


I will rebase this patch on the latest riscv-to-apply.next.

Thanks

I successfully applied this patch to the latest riscv-to-apply.next 
branch. I wonder what error you met on applying this patch to 
riscv-to-apply.next, so I can fix my patch.


Thanks


---
Changes in v4:
- fix typo
- rename function
- add 'if tcg_enable()'
- move function to tcg-cpu.c and declarations to tcg-cpu.h

Changes in v3:
- use GPtrArray to save decode function poionter list.
---
  target/riscv/cpu.c |  1 +
  target/riscv/cpu.h |  1 +
  target/riscv/tcg/tcg-cpu.c | 15 +++
  target/riscv/tcg/tcg-cpu.h | 15 +++
  target/riscv/translate.c   | 31 +++
  5 files changed, 47 insertions(+), 16 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index c160b9216b..17070b82a7 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1132,6 +1132,7 @@ void riscv_cpu_finalize_features(RISCVCPU 
*cpu, Error **errp)

  error_propagate(errp, local_err);
  return;
  }
+    riscv_tcg_cpu_finalize_dynamic_decoder(cpu);
  } else if (kvm_enabled()) {
  riscv_kvm_cpu_finalize_features(cpu, &local_err);
  if (local_err != NULL) {
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 3b1a02b944..48e67410e1 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -457,6 +457,7 @@ struct ArchCPU {
  uint32_t pmu_avail_ctrs;
  /* Mapping of events to counters */
  GHashTable *pmu_event_ctr_map;
+    const GPtrArray *decoders;
  };

  /**
diff --git a/target/riscv/tcg/tcg-cpu.c b/target/riscv/tcg/tcg-cpu.c
index ab6db817db..c9ab92ea2f 100644
--- a/target/riscv/tcg/tcg-cpu.c
+++ b/target/riscv/tcg/tcg-cpu.c
@@ -853,6 +853,21 @@ void riscv_tcg_cpu_finalize_features(RISCVCPU 
*cpu, Error **errp)

  }
  }

+void riscv_tcg_cpu_finalize_dynamic_decoder(RISCVCPU *cpu)
+{
+    GPtrArray *dynamic_decoders;
+    dynamic_decoders = g_ptr_array_sized_new(decoder_table_size);
+    for (size_t i = 0; i < decoder_table_size; ++i) {
+    if (decoder_table[i].guard_func &&
+    decoder_table[i].guard_func(&cpu->cfg)) {
+    g_ptr_array_add(dynamic_decoders,
+ (gpointer)decoder_table[i].riscv_cpu_decode_fn);
+    }
+    }
+
+    cpu->decoders = dynamic_decoders;
+}
+
  bool riscv_cpu_tcg_compatible(RISCVCPU *cpu)
  {
  return object_dynamic_cast(OBJECT(cpu), TYPE_RISCV_CPU_HOST) 
== NULL;

diff --git a/target/riscv/tcg/tcg-cpu.h b/target/riscv/tcg/tcg-cpu.h
index f7b32417f8..ce94253fe4 100644
--- a/target/riscv/tcg/tcg-cpu.h
+++ b/target/riscv/tcg/tcg-cpu.h
@@ -26,4 +26,19 @@ void riscv_cpu_validate_set_extensions(RISCVCPU 
*cpu, Error **errp);

  void riscv_tcg_cpu_finalize_features(RISCVCPU *cpu, Error **errp);
  bool riscv_cpu_tcg_compatible(RISCVCPU *cpu);

+struct DisasContext;
+struct RISCVCPUConfig;
+typedef struct RISCVDecoder {
+    bool (*guard_func)(const struct RISCVCPUConfig *);
+    bool (*riscv_cpu_decode_fn)(struct DisasContext *, uint32_t);
+} RISCVDecoder;
+
+typedef bool (*riscv_cpu_decode_fn)(struct DisasContext *, uint32_t);
+
+extern const size_t decoder_table_size;
+
+extern const RISCVDecoder decoder_table[];
+
+void riscv_tcg_cpu_finalize_dynamic_decoder(RISCVCPU *cpu);
+
  #endif
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index ea5d52b2ef..bce16d5054 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -37,6 +37,8 @@
  #include "exec/helper-info.c.inc"
  #unde

Re: [PATCH v1] mc146818rtc: add a way to generate RTC interrupts via QMP

Daniil Tatianin  writes:

> This can be used to force-synchronize the time in guest after a long
> stop-cont pause, which can be useful for serverless-type workload.
>
> Also add a comment to highlight the fact that this (and one other QMP
> command) only works for the MC146818 RTC controller.
>
> Signed-off-by: Daniil Tatianin 
> ---
>
> Changes since v0:
> - Rename to rtc-inject-irq to match other similar API
> - Add a comment to highlight that this only works for the I386 RTC
>
> ---
>  hw/rtc/mc146818rtc.c | 20 
>  include/hw/rtc/mc146818rtc.h |  1 +
>  qapi/misc-target.json| 16 
>  3 files changed, 37 insertions(+)
>
> diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
> index f4c1869232..8501b55cbd 100644
> --- a/hw/rtc/mc146818rtc.c
> +++ b/hw/rtc/mc146818rtc.c
> @@ -107,6 +107,11 @@ static void rtc_coalesced_timer_update(MC146818RtcState 
> *s)
>  static QLIST_HEAD(, MC146818RtcState) rtc_devices =
>  QLIST_HEAD_INITIALIZER(rtc_devices);
>  
> +/*
> + * NOTE:
> + * The two QMP functions below are _only_ implemented for the MC146818.
> + * All other RTC devices ignore this.
> + */
>  void qmp_rtc_reset_reinjection(Error **errp)
>  {
>  MC146818RtcState *s;
> @@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
>  }
>  }
>  
> +void qmp_rtc_inject_irq(Error **errp)
> +{
> +MC146818RtcState *s;
> +
> +/*
> + * See:
> + * https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt
> + */
> +QLIST_FOREACH(s, &rtc_devices, link) {
> +s->cmos_data[RTC_REG_B] |= REG_B_UIE;
> +s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;
> +qemu_irq_raise(s->irq);
> +}
> +}
> +
>  static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s)
>  {
>  kvm_reset_irq_delivered();
> diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h
> index 97cec0b3e8..6cd9761d80 100644
> --- a/include/hw/rtc/mc146818rtc.h
> +++ b/include/hw/rtc/mc146818rtc.h
> @@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int 
> base_year,
>  void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val);
>  int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr);
>  void qmp_rtc_reset_reinjection(Error **errp);
> +void qmp_rtc_inject_irq(Error **errp);
>  
>  #endif /* HW_RTC_MC146818RTC_H */
> diff --git a/qapi/misc-target.json b/qapi/misc-target.json
> index 4e0a6492a9..d84a5d07a2 100644
> --- a/qapi/misc-target.json
> +++ b/qapi/misc-target.json
> @@ -19,6 +19,22 @@
>  { 'command': 'rtc-reset-reinjection',
>'if': 'TARGET_I386' }
>  
> +##
> +# @rtc-inject-irq:
> +#
> +# Inject an RTC interrupt.

Your cover letter explains what this could be good for.  Would it make
sense to explain it here, too?

> +#
> +# Since: 9.1
> +#
> +# Example:
> +#
> +# -> { "execute": "rtc-inject-irq" }
> +# <- { "return": {} }
> +#
> +##
> +{ 'command': 'rtc-inject-irq',
> +  'if': 'TARGET_I386' }
> +
>  ##
>  # @SevState:
>  #

Re: [PATCH v3 5/5] qapi: introduce CONFIG_READ event


On 24.04.24 15:11, Markus Armbruster wrote:

Vladimir Sementsov-Ogievskiy  writes:


Send a new event when guest reads virtio-pci config after
virtio_notify_config() call.

That's useful to check that guest fetched modified config, for example
after resizing disk backend.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  hw/virtio/virtio-pci.c |  9 +
  include/monitor/qdev.h |  2 ++
  monitor/monitor.c  |  1 +
  qapi/qdev.json | 33 +
  stubs/qdev.c   |  6 ++
  system/qdev-monitor.c  |  6 ++
  6 files changed, 57 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 92afbae71c..c0c158dae2 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -23,6 +23,7 @@
  #include "hw/boards.h"
  #include "hw/virtio/virtio.h"
  #include "migration/qemu-file-types.h"
+#include "monitor/qdev.h"
  #include "hw/pci/pci.h"
  #include "hw/pci/pci_bus.h"
  #include "hw/qdev-properties.h"
@@ -530,6 +531,10 @@ static uint64_t virtio_pci_config_read(void *opaque, 
hwaddr addr,
  }
  addr -= config;
  
+if (vdev->generation > 0) {

+qdev_virtio_config_read_event(DEVICE(proxy));
+}
+
  switch (size) {
  case 1:
  val = virtio_config_readb(vdev, addr);
@@ -1884,6 +1889,10 @@ static uint64_t virtio_pci_device_read(void *opaque, 
hwaddr addr,
  return UINT64_MAX;
  }
  
+if (vdev->generation > 0) {

+qdev_virtio_config_read_event(DEVICE(proxy));
+}
+
  switch (size) {
  case 1:
  val = virtio_config_modern_readb(vdev, addr);
diff --git a/include/monitor/qdev.h b/include/monitor/qdev.h
index 1d57bf6577..fc9a834dca 100644
--- a/include/monitor/qdev.h
+++ b/include/monitor/qdev.h
@@ -36,4 +36,6 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
   */
  const char *qdev_set_id(DeviceState *dev, char *id, Error **errp);
  
+void qdev_virtio_config_read_event(DeviceState *dev);

+
  #endif
diff --git a/monitor/monitor.c b/monitor/monitor.c
index 01ede1babd..5b06146503 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -316,6 +316,7 @@ static MonitorQAPIEventConf 
monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
  [QAPI_EVENT_VSERPORT_CHANGE]   = { 1000 * SCALE_MS },
  [QAPI_EVENT_MEMORY_DEVICE_SIZE_CHANGE] = { 1000 * SCALE_MS },
  [QAPI_EVENT_HV_BALLOON_STATUS_REPORT] = { 1000 * SCALE_MS },
+[QAPI_EVENT_VIRTIO_CONFIG_READ] = { 300 * SCALE_MS },


All the other rate-limited events use 1s.  Why 0.3s for this one?


No actual reason, just seemed to me that 1s is too much. Should be better to 
keep all limits to be the same, until no concrete reason to break it.




  };
  
  /*

diff --git a/qapi/qdev.json b/qapi/qdev.json
index e8be79c3d5..29a4f47360 100644
--- a/qapi/qdev.json
+++ b/qapi/qdev.json
@@ -182,3 +182,36 @@
  { 'command': 'device-sync-config',
'features': [ 'unstable' ],
'data': {'id': 'str'} }
+
+##
+# @VIRTIO_CONFIG_READ:
+#
+# Emitted whenever guest reads virtio device configuration after
+# configuration change.


Is it emitted whenever the guest reads, or only when it reads after a
configuration change?


Hmm, it's emitted only when vdev->generation > 0, which generally mean that 
there was at least one call to virtio_notify_config()... That's not the logic, which 
could be simply described here.


Actually, now I think that event was a premature improvement. In our final 
downstream solution only the command device-sync-config is used, not the event. 
I see that the concept of the event is objectionable, I think, I'll better just 
drop it in v4.




+#
+# The event may be used in pair with device-sync-config. It shows
+# that guest has re-read updated configuration. It doesn't
+# guarantee that guest successfully handled it and updated the
+# view of the device for the user, but still it's a kind of
+# success indicator.


The event is virtio-only.  device-sync-config isn't.  Why?


+#
+# @device: device name
+#
+# @path: device path
+#
+# Features:
+#
+# @unstable: The event is experimental.
+#


Missing:

# Note: This event is rate-limited.
#


+# Since: 9.1
+#
+# Example:
+#
+# <- { "event": "VIRTIO_CONFIG_READ",
+#  "data": { "device": "virtio-net-pci-0",
+#"path": "/machine/peripheral/virtio-net-pci-0" },
+#  "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
+##
+{ 'event': 'VIRTIO_CONFIG_READ',
+  'features': [ 'unstable' ],
+  'data': { '*device': 'str', 'path': 'str' } }
diff --git a/stubs/qdev.c b/stubs/qdev.c
index 6869f6f90a..ab6c4afe0b 100644
--- a/stubs/qdev.c
+++ b/stubs/qdev.c
@@ -26,3 +26,9 @@ void qapi_event_send_device_unplug_guest_error(const char 
*device,
  {
  /* Nothing to do. */
  }
+
+void qapi_event_send_virtio_config_read(const char *device,
+const char *path)
+{
+/* Nothing to do. */
+}
diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index cb35ea0b

[PATCH 0/2] accel: Fix NULL deref in NVMM / WHPX vCPU init

Fix recently introduced NULL deref in NVMM/WHPX
vCPU init() handlers.

Philippe Mathieu-Daudé (2):
  accel/whpx: Fix NULL dereference in whpx_init_vcpu()
  accel/nvmm: Fix NULL dereference in nvmm_init_vcpu()

 target/i386/nvmm/nvmm-all.c | 2 +-
 target/i386/whpx/whpx-all.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

-- 
2.41.0

[PATCH 1/2] accel/whpx: Fix NULL dereference in whpx_init_vcpu()

When mechanically moving the @dirty field to AccelCPUState
in commit 9ad49538c7, we neglected cpu->accel is still NULL
when we want to dereference it.

Fixes: 9ad49538c7 ("accel/whpx: Use accel-specific per-vcpu @dirty field")
Reported-by: Volker Rümelin 
Suggested-by: Volker Rümelin 
Signed-off-by: Philippe Mathieu-Daudé 
---
 target/i386/whpx/whpx-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/whpx/whpx-all.c b/target/i386/whpx/whpx-all.c
index b08e644517..a6674a826d 100644
--- a/target/i386/whpx/whpx-all.c
+++ b/target/i386/whpx/whpx-all.c
@@ -2236,7 +2236,7 @@ int whpx_init_vcpu(CPUState *cpu)
 }
 
 vcpu->interruptable = true;
-cpu->accel->dirty = true;
+vcpu->dirty = true;
 cpu->accel = vcpu;
 max_vcpu_index = max(max_vcpu_index, cpu->cpu_index);
 qemu_add_vm_change_state_handler(whpx_cpu_update_state, env);
-- 
2.41.0

[PATCH 2/2] accel/nvmm: Fix NULL dereference in nvmm_init_vcpu()

When mechanically moving the @dirty field to AccelCPUState
in commit 79f1926b2d, we neglected cpu->accel is still NULL
when we want to dereference it.

Reported-by: Volker Rümelin 
Suggested-by: Volker Rümelin 
Fixes: 79f1926b2d ("accel/nvmm: Use accel-specific per-vcpu @dirty field")
Signed-off-by: Philippe Mathieu-Daudé 
---
 target/i386/nvmm/nvmm-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/nvmm/nvmm-all.c b/target/i386/nvmm/nvmm-all.c
index f9cced53b3..65768aca03 100644
--- a/target/i386/nvmm/nvmm-all.c
+++ b/target/i386/nvmm/nvmm-all.c
@@ -982,7 +982,7 @@ nvmm_init_vcpu(CPUState *cpu)
 }
 }
 
-cpu->accel->dirty = true;
+qcpu->dirty = true;
 cpu->accel = qcpu;
 
 return 0;
-- 
2.41.0

Re: [PATCH v1] mc146818rtc: add a way to generate RTC interrupts via QMP

2024-04-29 Thread Daniil Tatianin


On 4/29/24 11:51 AM, Markus Armbruster wrote:


Daniil Tatianin  writes:


This can be used to force-synchronize the time in guest after a long
stop-cont pause, which can be useful for serverless-type workload.

Also add a comment to highlight the fact that this (and one other QMP
command) only works for the MC146818 RTC controller.

Signed-off-by: Daniil Tatianin 
---

Changes since v0:
- Rename to rtc-inject-irq to match other similar API
- Add a comment to highlight that this only works for the I386 RTC

---
  hw/rtc/mc146818rtc.c | 20 
  include/hw/rtc/mc146818rtc.h |  1 +
  qapi/misc-target.json| 16 
  3 files changed, 37 insertions(+)

diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index f4c1869232..8501b55cbd 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -107,6 +107,11 @@ static void rtc_coalesced_timer_update(MC146818RtcState *s)
  static QLIST_HEAD(, MC146818RtcState) rtc_devices =
  QLIST_HEAD_INITIALIZER(rtc_devices);
  
+/*

+ * NOTE:
+ * The two QMP functions below are _only_ implemented for the MC146818.
+ * All other RTC devices ignore this.
+ */
  void qmp_rtc_reset_reinjection(Error **errp)
  {
  MC146818RtcState *s;
@@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
  }
  }
  
+void qmp_rtc_inject_irq(Error **errp)

+{
+MC146818RtcState *s;
+
+/*
+ * See:
+ * https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt
+ */
+QLIST_FOREACH(s, &rtc_devices, link) {
+s->cmos_data[RTC_REG_B] |= REG_B_UIE;
+s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;
+qemu_irq_raise(s->irq);
+}
+}
+
  static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s)
  {
  kvm_reset_irq_delivered();
diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h
index 97cec0b3e8..6cd9761d80 100644
--- a/include/hw/rtc/mc146818rtc.h
+++ b/include/hw/rtc/mc146818rtc.h
@@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int 
base_year,
  void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val);
  int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr);
  void qmp_rtc_reset_reinjection(Error **errp);
+void qmp_rtc_inject_irq(Error **errp);
  
  #endif /* HW_RTC_MC146818RTC_H */

diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 4e0a6492a9..d84a5d07a2 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -19,6 +19,22 @@
  { 'command': 'rtc-reset-reinjection',
'if': 'TARGET_I386' }
  
+##

+# @rtc-inject-irq:
+#
+# Inject an RTC interrupt.

Your cover letter explains what this could be good for.  Would it make
sense to explain it here, too?


Sure, sounds useful. I'll add a description in the next version.

Thanks


+#
+# Since: 9.1
+#
+# Example:
+#
+# -> { "execute": "rtc-inject-irq" }
+# <- { "return": {} }
+#
+##
+{ 'command': 'rtc-inject-irq',
+  'if': 'TARGET_I386' }
+
  ##
  # @SevState:
  #

Re: [PATCH v1] mc146818rtc: add a way to generate RTC interrupts via QMP


On 29/4/24 11:34, Daniil Tatianin wrote:

On 4/29/24 11:51 AM, Markus Armbruster wrote:


Daniil Tatianin  writes:


This can be used to force-synchronize the time in guest after a long
stop-cont pause, which can be useful for serverless-type workload.

Also add a comment to highlight the fact that this (and one other QMP
command) only works for the MC146818 RTC controller.

Signed-off-by: Daniil Tatianin 
---

Changes since v0:
- Rename to rtc-inject-irq to match other similar API
- Add a comment to highlight that this only works for the I386 RTC

---
  hw/rtc/mc146818rtc.c | 20 
  include/hw/rtc/mc146818rtc.h |  1 +
  qapi/misc-target.json    | 16 
  3 files changed, 37 insertions(+)

diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index f4c1869232..8501b55cbd 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -107,6 +107,11 @@ static void 
rtc_coalesced_timer_update(MC146818RtcState *s)

  static QLIST_HEAD(, MC146818RtcState) rtc_devices =
  QLIST_HEAD_INITIALIZER(rtc_devices);
+/*
+ * NOTE:
+ * The two QMP functions below are _only_ implemented for the MC146818.
+ * All other RTC devices ignore this.
+ */
  void qmp_rtc_reset_reinjection(Error **errp)
  {
  MC146818RtcState *s;
@@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
  }
  }
+void qmp_rtc_inject_irq(Error **errp)
+{
+    MC146818RtcState *s;
+
+    /*
+ * See:
+ * 
https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt

+ */
+    QLIST_FOREACH(s, &rtc_devices, link) {
+    s->cmos_data[RTC_REG_B] |= REG_B_UIE;
+    s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;
+    qemu_irq_raise(s->irq);
+    }
+}
+
  static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s)
  {
  kvm_reset_irq_delivered();
diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h
index 97cec0b3e8..6cd9761d80 100644
--- a/include/hw/rtc/mc146818rtc.h
+++ b/include/hw/rtc/mc146818rtc.h
@@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, 
int base_year,
  void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int 
val);

  int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr);
  void qmp_rtc_reset_reinjection(Error **errp);
+void qmp_rtc_inject_irq(Error **errp);
  #endif /* HW_RTC_MC146818RTC_H */
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 4e0a6492a9..d84a5d07a2 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -19,6 +19,22 @@
  { 'command': 'rtc-reset-reinjection',
    'if': 'TARGET_I386' }
+##
+# @rtc-inject-irq:
+#
+# Inject an RTC interrupt.

Your cover letter explains what this could be good for.  Would it make
sense to explain it here, too?


Sure, sounds useful. I'll add a description in the next version.


Please also see my comments on the previous patch:
https://lore.kernel.org/qemu-devel/11c78645-e87b-4a43-8191-a73540c36...@linaro.org/

[PATCH v2] mc146818rtc: add a way to generate RTC interrupts via QMP

2024-04-29 Thread Daniil Tatianin

This can be used to force-synchronize the time in guest after a long
stop-cont pause, which can be useful for serverless-type workload.

Also add a comment to highlight the fact that this (and one other QMP
command) only works for the MC146818 RTC controller.

Signed-off-by: Daniil Tatianin 
---

Changes since v0:
- Rename to rtc-inject-irq to match other similar API
- Add a comment to highlight that this only works for the I386 RTC

Changes since v1:
- Added a description below the QMP command to explain how it can be
  used and what it does.

---
 hw/rtc/mc146818rtc.c | 20 
 include/hw/rtc/mc146818rtc.h |  1 +
 qapi/misc-target.json| 18 ++
 3 files changed, 39 insertions(+)

diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index f4c1869232..8501b55cbd 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -107,6 +107,11 @@ static void rtc_coalesced_timer_update(MC146818RtcState *s)
 static QLIST_HEAD(, MC146818RtcState) rtc_devices =
 QLIST_HEAD_INITIALIZER(rtc_devices);
 
+/*
+ * NOTE:
+ * The two QMP functions below are _only_ implemented for the MC146818.
+ * All other RTC devices ignore this.
+ */
 void qmp_rtc_reset_reinjection(Error **errp)
 {
 MC146818RtcState *s;
@@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
 }
 }
 
+void qmp_rtc_inject_irq(Error **errp)
+{
+MC146818RtcState *s;
+
+/*
+ * See:
+ * https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt
+ */
+QLIST_FOREACH(s, &rtc_devices, link) {
+s->cmos_data[RTC_REG_B] |= REG_B_UIE;
+s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;
+qemu_irq_raise(s->irq);
+}
+}
+
 static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s)
 {
 kvm_reset_irq_delivered();
diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h
index 97cec0b3e8..6cd9761d80 100644
--- a/include/hw/rtc/mc146818rtc.h
+++ b/include/hw/rtc/mc146818rtc.h
@@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int 
base_year,
 void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val);
 int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr);
 void qmp_rtc_reset_reinjection(Error **errp);
+void qmp_rtc_inject_irq(Error **errp);
 
 #endif /* HW_RTC_MC146818RTC_H */
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 4e0a6492a9..0f2479f8f4 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -19,6 +19,24 @@
 { 'command': 'rtc-reset-reinjection',
   'if': 'TARGET_I386' }
 
+##
+# @rtc-inject-irq:
+#
+# Inject an RTC interrupt. This command forces the guest to synchornize
+# the time with RTC. This is useful after a long stop-cont pause, which
+# is common for serverless-type workload.
+#
+# Since: 9.1
+#
+# Example:
+#
+# -> { "execute": "rtc-inject-irq" }
+# <- { "return": {} }
+#
+##
+{ 'command': 'rtc-inject-irq',
+  'if': 'TARGET_I386' }
+
 ##
 # @SevState:
 #
-- 
2.34.1

Re: [PATCH 2/2] target/riscv: do not set mtval2 for non guest-page faults

2024-04-29 Thread Daniel Henrique Barboza





On 4/13/24 07:59, Alexei Filippov wrote:

Previous patch fixed the PMP priority in raise_mmu_exception() but we're still
setting mtval2 incorrectly. In riscv_cpu_tlb_fill(), after pmp check in 2 stage
translation part, mtval2 will be set in case of successes 2 stage translation 
but
failed pmp check.

In this case we gonna set mtval2 via env->guest_phys_fault_addr in context of
riscv_cpu_tlb_fill(), as this was a guest-page-fault, but it didn't and mtval2
should be zero, according to RISCV privileged spec sect. 9.4.4: When a guest
page-fault is taken into M-mode, mtval2 is written with either zero or guest
physical address that faulted, shifted by 2 bits. *For other traps, mtval2
is set to zero...*

Signed-off-by: Alexei Filippov 
---


Reviewed-by: Daniel Henrique Barboza 


  target/riscv/cpu_helper.c | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 196166f8dd..89e659fe3a 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -1410,17 +1410,17 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, 
int size,
__func__, pa, ret, prot_pmp, tlb_size);
  
  prot &= prot_pmp;

-}
-
-if (ret != TRANSLATE_SUCCESS) {
+} else {
  /*
   * Guest physical address translation failed, this is a HS
   * level exception
   */
  first_stage_error = false;
-env->guest_phys_fault_addr = (im_address |
-  (address &
-   (TARGET_PAGE_SIZE - 1))) >> 2;
+if (ret != TRANSLATE_PMP_FAIL) {
+env->guest_phys_fault_addr = (im_address |
+  (address &
+   (TARGET_PAGE_SIZE - 1))) >> 
2;
+}
  }
  }
  } else {

Re: [PATCH] mc146818rtc: add a way to generate RTC interrupts via QMP


On 26/4/24 11:48, Philippe Mathieu-Daudé wrote:

Hi Daniil, Markus,

On 26/4/24 10:39, Markus Armbruster wrote:

Daniil Tatianin  writes:


This can be used to force-synchronize the time in guest after a long
stop-cont pause, which can be useful for serverless-type workload.

Signed-off-by: Daniil Tatianin 
---
  hw/rtc/mc146818rtc.c | 15 +++
  include/hw/rtc/mc146818rtc.h |  1 +
  qapi/misc-target.json    | 16 
  3 files changed, 32 insertions(+)

diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index f4c1869232..6980a78d5f 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -116,6 +116,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
  }
  }
+void qmp_rtc_notify(Error **errp)
+{
+    MC146818RtcState *s;
+
+    /*
+ * See:
+ * 
https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt

+ */
+    QLIST_FOREACH(s, &rtc_devices, link) {
+    s->cmos_data[RTC_REG_B] |= REG_B_UIE;
+    s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;
+    qemu_irq_raise(s->irq);
+    }
+}
+


Note for later: qmp_rtc_notify() works on all realized mc146818rtc
devices.  Other kinds of RTC devices are silently ignored.  Just like
qmp_rtc_reset_reinjection().


IMO to avoid any future ambiguity (in heterogeneous machines), this
command must take a QOM device path (or a list of) and only notify
those.


If you disagree, at least please rename the command/method using
"broadcast" for trailer.

Re: [PATCH v1] mc146818rtc: add a way to generate RTC interrupts via QMP

2024-04-29 Thread Daniil Tatianin


On 4/29/24 12:40 PM, Philippe Mathieu-Daudé wrote:


On 29/4/24 11:34, Daniil Tatianin wrote:

On 4/29/24 11:51 AM, Markus Armbruster wrote:


Daniil Tatianin  writes:


This can be used to force-synchronize the time in guest after a long
stop-cont pause, which can be useful for serverless-type workload.

Also add a comment to highlight the fact that this (and one other QMP
command) only works for the MC146818 RTC controller.

Signed-off-by: Daniil Tatianin 
---

Changes since v0:
- Rename to rtc-inject-irq to match other similar API
- Add a comment to highlight that this only works for the I386 RTC

---
  hw/rtc/mc146818rtc.c | 20 
  include/hw/rtc/mc146818rtc.h |  1 +
  qapi/misc-target.json    | 16 
  3 files changed, 37 insertions(+)

diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index f4c1869232..8501b55cbd 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -107,6 +107,11 @@ static void 
rtc_coalesced_timer_update(MC146818RtcState *s)

  static QLIST_HEAD(, MC146818RtcState) rtc_devices =
  QLIST_HEAD_INITIALIZER(rtc_devices);
+/*
+ * NOTE:
+ * The two QMP functions below are _only_ implemented for the 
MC146818.

+ * All other RTC devices ignore this.
+ */
  void qmp_rtc_reset_reinjection(Error **errp)
  {
  MC146818RtcState *s;
@@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
  }
  }
+void qmp_rtc_inject_irq(Error **errp)
+{
+    MC146818RtcState *s;
+
+    /*
+ * See:
+ * 
https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt

+ */
+    QLIST_FOREACH(s, &rtc_devices, link) {
+    s->cmos_data[RTC_REG_B] |= REG_B_UIE;
+    s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;
+    qemu_irq_raise(s->irq);
+    }
+}
+
  static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s)
  {
  kvm_reset_irq_delivered();
diff --git a/include/hw/rtc/mc146818rtc.h 
b/include/hw/rtc/mc146818rtc.h

index 97cec0b3e8..6cd9761d80 100644
--- a/include/hw/rtc/mc146818rtc.h
+++ b/include/hw/rtc/mc146818rtc.h
@@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, 
int base_year,
  void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int 
val);

  int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr);
  void qmp_rtc_reset_reinjection(Error **errp);
+void qmp_rtc_inject_irq(Error **errp);
  #endif /* HW_RTC_MC146818RTC_H */
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 4e0a6492a9..d84a5d07a2 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -19,6 +19,22 @@
  { 'command': 'rtc-reset-reinjection',
    'if': 'TARGET_I386' }
+##
+# @rtc-inject-irq:
+#
+# Inject an RTC interrupt.

Your cover letter explains what this could be good for.  Would it make
sense to explain it here, too?


Sure, sounds useful. I'll add a description in the next version.


Please also see my comments on the previous patch:
https://lore.kernel.org/qemu-devel/11c78645-e87b-4a43-8191-a73540c36...@linaro.org/ 



I think this makes sense, but there's already a similar command, which 
doesn't do it. Should that one be changed as well then? Or do we only 
change the interface for this one?

Re: [PATCH v1] mc146818rtc: add a way to generate RTC interrupts via QMP


On 29/4/24 11:43, Daniil Tatianin wrote:

On 4/29/24 12:40 PM, Philippe Mathieu-Daudé wrote:


On 29/4/24 11:34, Daniil Tatianin wrote:

On 4/29/24 11:51 AM, Markus Armbruster wrote:


Daniil Tatianin  writes:


This can be used to force-synchronize the time in guest after a long
stop-cont pause, which can be useful for serverless-type workload.

Also add a comment to highlight the fact that this (and one other QMP
command) only works for the MC146818 RTC controller.

Signed-off-by: Daniil Tatianin 
---

Changes since v0:
- Rename to rtc-inject-irq to match other similar API
- Add a comment to highlight that this only works for the I386 RTC

---
  hw/rtc/mc146818rtc.c | 20 
  include/hw/rtc/mc146818rtc.h |  1 +
  qapi/misc-target.json    | 16 
  3 files changed, 37 insertions(+)

diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index f4c1869232..8501b55cbd 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -107,6 +107,11 @@ static void 
rtc_coalesced_timer_update(MC146818RtcState *s)

  static QLIST_HEAD(, MC146818RtcState) rtc_devices =
  QLIST_HEAD_INITIALIZER(rtc_devices);
+/*
+ * NOTE:
+ * The two QMP functions below are _only_ implemented for the 
MC146818.

+ * All other RTC devices ignore this.
+ */
  void qmp_rtc_reset_reinjection(Error **errp)
  {
  MC146818RtcState *s;
@@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
  }
  }
+void qmp_rtc_inject_irq(Error **errp)
+{
+    MC146818RtcState *s;
+
+    /*
+ * See:
+ * 
https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt

+ */
+    QLIST_FOREACH(s, &rtc_devices, link) {
+    s->cmos_data[RTC_REG_B] |= REG_B_UIE;
+    s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;
+    qemu_irq_raise(s->irq);
+    }
+}
+
  static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s)
  {
  kvm_reset_irq_delivered();
diff --git a/include/hw/rtc/mc146818rtc.h 
b/include/hw/rtc/mc146818rtc.h

index 97cec0b3e8..6cd9761d80 100644
--- a/include/hw/rtc/mc146818rtc.h
+++ b/include/hw/rtc/mc146818rtc.h
@@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, 
int base_year,
  void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int 
val);

  int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr);
  void qmp_rtc_reset_reinjection(Error **errp);
+void qmp_rtc_inject_irq(Error **errp);
  #endif /* HW_RTC_MC146818RTC_H */
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 4e0a6492a9..d84a5d07a2 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -19,6 +19,22 @@
  { 'command': 'rtc-reset-reinjection',
    'if': 'TARGET_I386' }
+##
+# @rtc-inject-irq:
+#
+# Inject an RTC interrupt.

Your cover letter explains what this could be good for.  Would it make
sense to explain it here, too?


Sure, sounds useful. I'll add a description in the next version.


Please also see my comments on the previous patch:
https://lore.kernel.org/qemu-devel/11c78645-e87b-4a43-8191-a73540c36...@linaro.org/

I think this makes sense, but there's already a similar command, which 
doesn't do it. Should that one be changed as well then? Or do we only 
change the interface for this one?


Better to not follow a short sighted interface. If you can, start
with a correct one. Help fixing broken interface is certainly
welcomed, but that shouldn't block adding your new command.

Regards,

Phil.

Re: [PATCH v2] mc146818rtc: add a way to generate RTC interrupts via QMP


On 29/4/24 11:41, Daniil Tatianin wrote:

This can be used to force-synchronize the time in guest after a long
stop-cont pause, which can be useful for serverless-type workload.

Also add a comment to highlight the fact that this (and one other QMP
command) only works for the MC146818 RTC controller.

Signed-off-by: Daniil Tatianin 
---

Changes since v0:
- Rename to rtc-inject-irq to match other similar API
- Add a comment to highlight that this only works for the I386 RTC

Changes since v1:
- Added a description below the QMP command to explain how it can be
   used and what it does.

---
  hw/rtc/mc146818rtc.c | 20 
  include/hw/rtc/mc146818rtc.h |  1 +
  qapi/misc-target.json| 18 ++
  3 files changed, 39 insertions(+)

diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index f4c1869232..8501b55cbd 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -107,6 +107,11 @@ static void rtc_coalesced_timer_update(MC146818RtcState *s)
  static QLIST_HEAD(, MC146818RtcState) rtc_devices =
  QLIST_HEAD_INITIALIZER(rtc_devices);
  
+/*

+ * NOTE:
+ * The two QMP functions below are _only_ implemented for the MC146818.
+ * All other RTC devices ignore this.
+ */
  void qmp_rtc_reset_reinjection(Error **errp)
  {
  MC146818RtcState *s;
@@ -116,6 +121,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
  }
  }
  
+void qmp_rtc_inject_irq(Error **errp)

+{
+MC146818RtcState *s;
+
+/*
+ * See:
+ * https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt
+ */
+QLIST_FOREACH(s, &rtc_devices, link) {
+s->cmos_data[RTC_REG_B] |= REG_B_UIE;
+s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;
+qemu_irq_raise(s->irq);
+}
+}
+
  static bool rtc_policy_slew_deliver_irq(MC146818RtcState *s)
  {
  kvm_reset_irq_delivered();
diff --git a/include/hw/rtc/mc146818rtc.h b/include/hw/rtc/mc146818rtc.h
index 97cec0b3e8..6cd9761d80 100644
--- a/include/hw/rtc/mc146818rtc.h
+++ b/include/hw/rtc/mc146818rtc.h
@@ -56,5 +56,6 @@ MC146818RtcState *mc146818_rtc_init(ISABus *bus, int 
base_year,
  void mc146818rtc_set_cmos_data(MC146818RtcState *s, int addr, int val);
  int mc146818rtc_get_cmos_data(MC146818RtcState *s, int addr);
  void qmp_rtc_reset_reinjection(Error **errp);
+void qmp_rtc_inject_irq(Error **errp);
  
  #endif /* HW_RTC_MC146818RTC_H */

diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 4e0a6492a9..0f2479f8f4 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -19,6 +19,24 @@
  { 'command': 'rtc-reset-reinjection',
'if': 'TARGET_I386' }
  
+##

+# @rtc-inject-irq:
+#
+# Inject an RTC interrupt. This command forces the guest to synchornize
+# the time with RTC. This is useful after a long stop-cont pause, which
+# is common for serverless-type workload.
+#
+# Since: 9.1
+#
+# Example:
+#
+# -> { "execute": "rtc-inject-irq" }
+# <- { "return": {} }
+#
+##
+{ 'command': 'rtc-inject-irq',
+  'if': 'TARGET_I386' }


Why is that restricted to x86? Ah, this is specific to the MC146818
RTC... Other machines use hw accelerators and the MC146818, aren't
we interested in synchronizing them the same way?

Personally I'd name this command 'mc146818rtc-raise-irq-broadcast',
KISS.

[PATCH] .gitlab-ci.d/cirrus.yml: Shorten the runtime of the macOS and FreeBSD jobs

2024-04-29 Thread Thomas Huth

Cirrus-CI introduced limitations to the free CI minutes. To avoid that
we are consuming them too fast, let's drop the usual targets that are
not that important since they are either a subset of another target
(like i386 or ppc being a subset of x86_64 or ppc64 respectively), or
since there is still a similar target with the opposite endianness
(like xtensa/xtensael, microblaze/microblazeel etc.).

Signed-off-by: Thomas Huth 
---
 .gitlab-ci.d/cirrus.yml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/.gitlab-ci.d/cirrus.yml b/.gitlab-ci.d/cirrus.yml
index 74de2edbb4..75df1273bc 100644
--- a/.gitlab-ci.d/cirrus.yml
+++ b/.gitlab-ci.d/cirrus.yml
@@ -57,6 +57,7 @@ x64-freebsd-13-build:
 CIRRUS_VM_RAM: 8G
 UPDATE_COMMAND: pkg update; pkg upgrade -y
 INSTALL_COMMAND: pkg install -y
+CONFIGURE_ARGS: 
--target-list-exclude=arm-softmmu,i386-softmmu,microblaze-softmmu,mips64el-softmmu,mipsel-softmmu,mips-softmmu,ppc-softmmu,sh4eb-softmmu,xtensa-softmmu
 TEST_TARGETS: check
 
 aarch64-macos-13-base-build:
@@ -72,6 +73,7 @@ aarch64-macos-13-base-build:
 INSTALL_COMMAND: brew install
 PATH_EXTRA: /opt/homebrew/ccache/libexec:/opt/homebrew/gettext/bin
 PKG_CONFIG_PATH: 
/opt/homebrew/curl/lib/pkgconfig:/opt/homebrew/ncurses/lib/pkgconfig:/opt/homebrew/readline/lib/pkgconfig
+CONFIGURE_ARGS: 
--target-list-exclude=arm-softmmu,i386-softmmu,microblazeel-softmmu,mips64-softmmu,mipsel-softmmu,mips-softmmu,ppc-softmmu,sh4-softmmu,xtensaeb-softmmu
 TEST_TARGETS: check-unit check-block check-qapi-schema check-softfloat 
check-qtest-x86_64
 
 aarch64-macos-14-base-build:
-- 
2.44.0

Re: [PATCH] qga: Re-enable the qga-ssh-test when running without fuzzing

2024-04-29 Thread Konstantin Kostiuk

Reviewed-by: Konstantin Kostiuk 



On Fri, Apr 26, 2024 at 7:23 PM Thomas Huth  wrote:

> According to the comment in qga/meson.build, the test got disabled
> since there were problems with the fuzzing job. But instead of
> disabling this test completely, we should still be fine running
> it when fuzzing is disabled.
>
> Signed-off-by: Thomas Huth 
> ---
>  qga/meson.build | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/qga/meson.build b/qga/meson.build
> index 1c3d2a3d1b..46c1d83d7f 100644
> --- a/qga/meson.build
> +++ b/qga/meson.build
> @@ -181,12 +181,11 @@ test_env = environment()
>  test_env.set('G_TEST_SRCDIR', meson.current_source_dir())
>  test_env.set('G_TEST_BUILDDIR', meson.current_build_dir())
>
> -# disable qga-ssh-test for now. glib's G_TEST_OPTION_ISOLATE_DIRS triggers
> +# disable qga-ssh-test with fuzzing: glib's G_TEST_OPTION_ISOLATE_DIRS
> triggers
>  # the leak detector in build-oss-fuzz Gitlab CI test. we should re-enable
>  # this when an alternative is implemented or when the underlying glib
>  # issue is identified/fix
> -#if host_os != 'windows'
> -if false
> +if host_os != 'windows' and not get_option('fuzzing')
>srcs = [files('commands-posix-ssh.c')]
>i = 0
>foreach output: qga_qapi_outputs
> --
> 2.44.0
>
>

Re: [PATCH v2 1/1] stubs: Add missing qga stubs

2024-04-29 Thread Konstantin Kostiuk

Hi Paolo,

Are you ok if I merge this patch with other QGA patches?
Or don't you agree with this version of the patch?

Best Regards,
Konstantin Kostiuk.


On Fri, Apr 26, 2024 at 3:15 PM Konstantin Kostiuk 
wrote:

> Compilation QGA without system and user fails
> ./configure --disable-system --disable-user --enable-guest-agent
>
> Link failure:
>   /usr/bin/ld: libqemuutil.a.p/util_main-loop.c.o: in function
> `os_host_main_loop_wait':
>../util/main-loop.c:303: undefined reference to `replay_mutex_unlock'
>/usr/bin/ld: ../util/main-loop.c:307: undefined reference to
> `replay_mutex_lock'
>/usr/bin/ld: libqemuutil.a.p/util_error-report.c.o: in function
> `error_printf':
>../util/error-report.c:38: undefined reference to `error_vprintf'
>/usr/bin/ld: libqemuutil.a.p/util_error-report.c.o: in function
> `vreport':
>../util/error-report.c:225: undefined reference to `error_vprintf'
>/usr/bin/ld: libqemuutil.a.p/util_qemu-timer.c.o: in function
> `timerlist_run_timers':
>../util/qemu-timer.c:562: undefined reference to `replay_checkpoint'
>/usr/bin/ld: ../util/qemu-timer.c:530: undefined reference to
> `replay_checkpoint'
>/usr/bin/ld: ../util/qemu-timer.c:525: undefined reference to
> `replay_checkpoint'
>ninja: build stopped: subcommand failed.
>
> Fixes: 3a15604900 ("stubs: include stubs only if needed")
>
> Tested-by: Philippe Mathieu-Daudé 
> Reviewed-by: Philippe Mathieu-Daudé 
> Signed-off-by: Konstantin Kostiuk 
> ---
>  stubs/meson.build | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/stubs/meson.build b/stubs/meson.build
> index 8ee1fd5753..3b9d42023c 100644
> --- a/stubs/meson.build
> +++ b/stubs/meson.build
> @@ -21,12 +21,12 @@ if have_block
>stub_ss.add(files('migr-blocker.c'))
>stub_ss.add(files('physmem.c'))
>stub_ss.add(files('ram-block.c'))
> -  stub_ss.add(files('replay-tools.c'))
>stub_ss.add(files('runstate-check.c'))
>stub_ss.add(files('uuid.c'))
>  endif
>
>  if have_block or have_ga
> +  stub_ss.add(files('replay-tools.c'))
># stubs for hooks in util/main-loop.c, util/async.c etc.
>stub_ss.add(files('cpus-get-virtual-clock.c'))
>stub_ss.add(files('icount.c'))
> @@ -45,6 +45,10 @@ if have_block or have_ga
>stub_ss.add(files('qmp-quit.c'))
>  endif
>
> +if have_ga
> +  stub_ss.add(files('error-printf.c'))
> +endif
> +
>  if have_block or have_user
>stub_ss.add(files('qtest.c'))
>stub_ss.add(files('vm-stop.c'))
> --
> 2.44.0
>
>
>

[PATCH v4 3/3] qapi: introduce device-sync-config

Add command to sync config from vhost-user backend to the device. It
may be helpful when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG failed or not
triggered interrupt to the guest or just not available (not supported
by vhost-user server).

Command result is racy if allow it during migration. Let's allow the
sync only in RUNNING state.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 hw/block/vhost-user-blk.c |  1 +
 hw/virtio/virtio-pci.c|  9 
 include/hw/qdev-core.h|  3 +++
 qapi/qdev.json| 23 +++
 system/qdev-monitor.c | 48 +++
 5 files changed, 84 insertions(+)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 091d2c6acf..2f301f380c 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -588,6 +588,7 @@ static void vhost_user_blk_class_init(ObjectClass *klass, 
void *data)
 
 device_class_set_props(dc, vhost_user_blk_properties);
 dc->vmsd = &vmstate_vhost_user_blk;
+dc->sync_config = vhost_user_blk_sync_config;
 set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
 vdc->realize = vhost_user_blk_device_realize;
 vdc->unrealize = vhost_user_blk_device_unrealize;
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index b1d02f4b3d..0d91e8b5dc 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -2351,6 +2351,14 @@ static void virtio_pci_dc_realize(DeviceState *qdev, 
Error **errp)
 vpciklass->parent_dc_realize(qdev, errp);
 }
 
+static int virtio_pci_sync_config(DeviceState *dev, Error **errp)
+{
+VirtIOPCIProxy *proxy = VIRTIO_PCI(dev);
+VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
+
+return qdev_sync_config(DEVICE(vdev), errp);
+}
+
 static void virtio_pci_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -2367,6 +2375,7 @@ static void virtio_pci_class_init(ObjectClass *klass, 
void *data)
 device_class_set_parent_realize(dc, virtio_pci_dc_realize,
 &vpciklass->parent_dc_realize);
 rc->phases.hold = virtio_pci_bus_reset_hold;
+dc->sync_config = virtio_pci_sync_config;
 }
 
 static const TypeInfo virtio_pci_info = {
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 9228e96c87..87135bdcdf 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -95,6 +95,7 @@ typedef void (*DeviceUnrealize)(DeviceState *dev);
 typedef void (*DeviceReset)(DeviceState *dev);
 typedef void (*BusRealize)(BusState *bus, Error **errp);
 typedef void (*BusUnrealize)(BusState *bus);
+typedef int (*DeviceSyncConfig)(DeviceState *dev, Error **errp);
 
 /**
  * struct DeviceClass - The base class for all devices.
@@ -162,6 +163,7 @@ struct DeviceClass {
 DeviceReset reset;
 DeviceRealize realize;
 DeviceUnrealize unrealize;
+DeviceSyncConfig sync_config;
 
 /**
  * @vmsd: device state serialisation description for
@@ -546,6 +548,7 @@ bool qdev_hotplug_allowed(DeviceState *dev, Error **errp);
  */
 HotplugHandler *qdev_get_hotplug_handler(DeviceState *dev);
 void qdev_unplug(DeviceState *dev, Error **errp);
+int qdev_sync_config(DeviceState *dev, Error **errp);
 void qdev_simple_device_unplug_cb(HotplugHandler *hotplug_dev,
   DeviceState *dev, Error **errp);
 void qdev_machine_creation_done(void);
diff --git a/qapi/qdev.json b/qapi/qdev.json
index facaa0bc6a..fc5e125a45 100644
--- a/qapi/qdev.json
+++ b/qapi/qdev.json
@@ -161,3 +161,26 @@
 ##
 { 'event': 'DEVICE_UNPLUG_GUEST_ERROR',
   'data': { '*device': 'str', 'path': 'str' } }
+
+##
+# @device-sync-config:
+#
+# Synchronize device configuration from host to guest part.  First,
+# copy the configuration from the host part (backend) to the guest
+# part (frontend).  Then notify guest software that device
+# configuration changed.
+# The command may be used to notify the guest about block device
+# capcity change.  Currently only vhost-user-blk device supports
+# this.
+#
+# @id: the device's ID or QOM path
+#
+# Features:
+#
+# @unstable: The command is experimental.
+#
+# Since: 9.1
+##
+{ 'command': 'device-sync-config',
+  'features': [ 'unstable' ],
+  'data': {'id': 'str'} }
diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index 264978aa40..47bfc0506e 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -23,6 +23,7 @@
 #include "monitor/monitor.h"
 #include "monitor/qdev.h"
 #include "sysemu/arch_init.h"
+#include "sysemu/runstate.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-qdev.h"
 #include "qapi/qmp/dispatch.h"
@@ -971,6 +972,53 @@ void qmp_device_del(const char *id, Error **errp)
 }
 }
 
+int qdev_sync_config(DeviceState *dev, Error **errp)
+{
+DeviceClass *dc = DEVICE_GET_CLASS(dev);
+
+if (!dc->sync_config) {
+error_setg(errp, "device-sync-config is not supported for '%s'",
+   object_get_typename(OBJECT(dev)));
+return -ENOTSUP;

[PATCH v4 2/3] vhost-user-blk: split vhost_user_blk_sync_config()

Split vhost_user_blk_sync_config() out from
vhost_user_blk_handle_config_change(), to be reused in the following
commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 hw/block/vhost-user-blk.c | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 9e6bbc6950..091d2c6acf 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -88,27 +88,39 @@ static void vhost_user_blk_set_config(VirtIODevice *vdev, 
const uint8_t *config)
 s->blkcfg.wce = blkcfg->wce;
 }
 
+static int vhost_user_blk_sync_config(DeviceState *dev, Error **errp)
+{
+int ret;
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg,
+   vdev->config_len, errp);
+if (ret < 0) {
+return ret;
+}
+
+memcpy(vdev->config, &s->blkcfg, vdev->config_len);
+virtio_notify_config(vdev);
+
+return 0;
+}
+
 static int vhost_user_blk_handle_config_change(struct vhost_dev *dev)
 {
 int ret;
-VirtIODevice *vdev = dev->vdev;
-VHostUserBlk *s = VHOST_USER_BLK(dev->vdev);
 Error *local_err = NULL;
 
 if (!dev->started) {
 return 0;
 }
 
-ret = vhost_dev_get_config(dev, (uint8_t *)&s->blkcfg,
-   vdev->config_len, &local_err);
+ret = vhost_user_blk_sync_config(DEVICE(dev->vdev), &local_err);
 if (ret < 0) {
 error_report_err(local_err);
 return ret;
 }
 
-memcpy(dev->vdev->config, &s->blkcfg, vdev->config_len);
-virtio_notify_config(dev->vdev);
-
 return 0;
 }
 
-- 
2.34.1

[PATCH v4 1/3] qdev-monitor: add option to report GenericError from find_device_state

Here we just prepare for the following patch, making possible to report
GenericError as recommended.

This patch doesn't aim to prevent further use of DeviceNotFound by
future interfaces:

 - find_device_state() is used in blk_by_qdev_id() and qmp_get_blk()
   functions, which may lead to spread of DeviceNotFound anyway
 - also, nothing prevent simply copy-pasting find_device_state() calls
   with false argument

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 system/qdev-monitor.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index 6af6ef7d66..264978aa40 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -879,13 +879,20 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, 
Error **errp)
 object_unref(OBJECT(dev));
 }
 
-static DeviceState *find_device_state(const char *id, Error **errp)
+/*
+ * Note that creating new APIs using error classes other than GenericError is
+ * not recommended. Set use_generic_error=true for new interfaces.
+ */
+static DeviceState *find_device_state(const char *id, bool use_generic_error,
+  Error **errp)
 {
 Object *obj = object_resolve_path_at(qdev_get_peripheral(), id);
 DeviceState *dev;
 
 if (!obj) {
-error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+error_set(errp,
+  (use_generic_error ?
+   ERROR_CLASS_GENERIC_ERROR : ERROR_CLASS_DEVICE_NOT_FOUND),
   "Device '%s' not found", id);
 return NULL;
 }
@@ -950,7 +957,7 @@ void qdev_unplug(DeviceState *dev, Error **errp)
 
 void qmp_device_del(const char *id, Error **errp)
 {
-DeviceState *dev = find_device_state(id, errp);
+DeviceState *dev = find_device_state(id, false, errp);
 if (dev != NULL) {
 if (dev->pending_deleted_event &&
 (dev->pending_deleted_expires_ms == 0 ||
@@ -1070,7 +1077,7 @@ BlockBackend *blk_by_qdev_id(const char *id, Error **errp)
 
 GLOBAL_STATE_CODE();
 
-dev = find_device_state(id, errp);
+dev = find_device_state(id, false, errp);
 if (dev == NULL) {
 return NULL;
 }
-- 
2.34.1

[PATCH v4 0/3] vhost-user-blk: live resize additional APIs

v4:
Fixes 01-02 from v3 are already merged.
02: new, split out from 03
03: refacting vhost_user_blk_handle_config_change() split out to 02
drop current_run_state_str() helper
some rewordings (Markus)

Vladimir Sementsov-Ogievskiy (3):
  qdev-monitor: add option to report GenericError from find_device_state
  vhost-user-blk: split vhost_user_blk_sync_config()
  qapi: introduce device-sync-config

 hw/block/vhost-user-blk.c | 27 -
 hw/virtio/virtio-pci.c|  9 ++
 include/hw/qdev-core.h|  3 ++
 qapi/qdev.json| 23 ++
 system/qdev-monitor.c | 63 ---
 5 files changed, 114 insertions(+), 11 deletions(-)

-- 
2.34.1

Re: [PATCH 0/3] Remove useless architecture prefix from the CPU list


On 22/4/24 10:22, Thomas Huth wrote:

On 22/04/2024 10.03, Daniel P. Berrangé wrote:

On Sat, Apr 20, 2024 at 07:46:03AM +0200, Thomas Huth wrote:

Printing an architecture prefix in front of each CPU name is not helpful
at all: It is confusing for the users since they don't know whether they
have to specify these letters for the "-cpu" parameter, too, and it also
takes some precious space in the dense output of the CPU entries. Let's
simply remove those now.


Could it be said that this arch prefix is about to finally become useful
with Philippe's patches to add a 'qemu-system-any' command covering
multiple arches ?


I don't think so: In that case we'd rather print it once at the 
beginning of a list ("Available x86 CPUs:") instead of printing it in 
each and every line.


Yes that is correct. Hopefully we won't have the same CPU name used
by different architectures...

Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 0/3] Make it possible to compile the x86 binaries without FDC

2024-04-29 Thread Kevin Wolf

[ Cc: qemu-block ]

Am 25.04.2024 um 20:43 hat Thomas Huth geschrieben:
> For downstream versions of QEMU, we'd like to be able to compile QEMU
> without the FDC code included (since it's not required for modern VMs
> anymore and the FDC code has rather a bad reputation, see the VENOM CVE).
> 
> The q35 machine can already be instantiated without FDC, but for being
> able to link a binary without the FDC code, the Kconfig file needs some
> tweaks and there are two spots in the pc code that directly call functions
> from the FDC code - those need to be disabled via #ifdefs.
> 
> The third patch changes the i440fx and isapc machine types so that
> they can work without the FDC device, too, in case it has not been
> compiled into the binary. It's marked as RFC since I assume that the
> FDC was originally a fix compononent of these motherboards, so I'm
> unsure whether we should allow the disablement there. OTOH, it seems
> to work fine, and the FDC is only disabled when it is not available
> in the binary, so I hope this patch is fine, too.
> 
> Thomas Huth (3):
>   hw/i386/pc: Allow to compile without CONFIG_FDC_ISA
>   hw/i386/Kconfig: Allow to compile Q35 without FDC_ISA
>   hw/i386: Add the possibility to use i440fx and isapc without FDC
> 
>  hw/i386/pc.c  | 13 +
>  hw/i386/pc_piix.c |  6 --
>  hw/i386/Kconfig   |  2 +-
>  3 files changed, 14 insertions(+), 7 deletions(-)
> 
> -- 
> 2.44.0
> 
>

Re: [PATCH v2 06/33] accel/tcg: Record mmio bytes during translation


On 25/4/24 01:31, Richard Henderson wrote:

This will be able to replace plugin_insn_append, and will
be usable for disassembly.

Signed-off-by: Richard Henderson 
---
  include/exec/translator.h | 12 
  accel/tcg/translator.c| 41 +++
  2 files changed, 53 insertions(+)

diff --git a/include/exec/translator.h b/include/exec/translator.h
index 83fe66cba0..974cc4f9c4 100644
--- a/include/exec/translator.h
+++ b/include/exec/translator.h
@@ -90,6 +90,18 @@ typedef struct DisasContextBase {
  bool plugin_enabled;
  struct TCGOp *insn_start;
  void *host_addr[2];
+
+/*
+ * Record insn data that we cannot read directly from host memory.
+ * There are only two reasons we cannot use host memory:
+ * (1) We are executing from I/O,
+ * (2) We are executing a synthetic instruction (s390x EX).
+ * In both cases we need record exactly one instruction,
+ * and thus the maximum amount of data we record is limited.
+ */
+int record_start;
+int record_len;
+uint8_t record[32];


Alternatively (matter of style):

   struct {
   unsigned start;
   unsigned len;
   uint8_t data[32];
   }


  } DisasContextBase;


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 07/33] accel/tcg: Record when translator_fake_ldb is used


On 25/4/24 01:31, Richard Henderson wrote:

Remove left-over command from commit dcd092a063 ("accel/tcg: Improve
can_do_io management").


Signed-off-by: Richard Henderson 
---
  include/exec/translator.h | 3 ++-
  accel/tcg/translator.c| 2 ++
  2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/exec/translator.h b/include/exec/translator.h
index 974cc4f9c4..e92dfba035 100644
--- a/include/exec/translator.h
+++ b/include/exec/translator.h
@@ -72,8 +72,8 @@ typedef enum DisasJumpType {
   * @num_insns: Number of translated instructions (including current).
   * @max_insns: Maximum number of instructions to be translated in this TB.
   * @singlestep_enabled: "Hardware" single stepping enabled.
- * @saved_can_do_io: Known value of cpu->neg.can_do_io, or -1 for unknown.
   * @plugin_enabled: TCG plugin enabled in this TB.
+ * @fake_insn: True if translator_fake_ldb used.
   * @insn_start: The last op emitted by the insn_start hook,
   *  which is expected to be INDEX_op_insn_start.


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 1/1] stubs: Add missing qga stubs

2024-04-29 Thread Thomas Huth


On 29/04/2024 12.09, Konstantin Kostiuk wrote:

Hi Paolo,

Are you ok if I merge this patch with other QGA patches?
Or don't you agree with this version of the patch?


Phil asked me in IRC to pick this patch up, so I'll include it in my next 
pull request.


 Thomas

Re: [PATCH v2 09/33] plugins: Copy memory in qemu_plugin_insn_data


On 25/4/24 01:31, Richard Henderson wrote:

Instead of returning a host pointer, copy the data into
storage provided by the caller.

Signed-off-by: Richard Henderson 
---
  include/qemu/qemu-plugin.h | 15 +++
  contrib/plugins/execlog.c  |  5 +++--
  contrib/plugins/howvec.c   |  4 ++--
  plugins/api.c  |  7 +--
  4 files changed, 17 insertions(+), 14 deletions(-)



  /**
   * struct qemu_info_t - system information for plugins
@@ -394,17 +394,16 @@ struct qemu_plugin_insn *
  qemu_plugin_tb_get_insn(const struct qemu_plugin_tb *tb, size_t idx);
  
  /**

- * qemu_plugin_insn_data() - return ptr to instruction data
+ * qemu_plugin_insn_data() - copy instruction data


"copy of "?


   * @insn: opaque instruction handle from qemu_plugin_tb_get_insn()
+ * @dest: destination into which data is copied
+ * @len: length of dest


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 11/33] plugins: Use translator_st for qemu_plugin_insn_data


On 25/4/24 01:31, Richard Henderson wrote:

Use the bytes that we record for the entire TB, rather than
a per-insn GByteArray.  Record the length of the insn in
plugin_gen_insn_end rather than infering from the length
of the array.

Signed-off-by: Richard Henderson 
---
  include/qemu/plugin.h  | 14 +-
  accel/tcg/plugin-gen.c |  7 +--
  accel/tcg/translator.c | 26 --
  plugins/api.c  | 12 +++-
  tcg/tcg.c  |  3 +--
  5 files changed, 14 insertions(+), 48 deletions(-)


Nice!

Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 12/33] plugins: Read mem_only directly from TB cflags


On 25/4/24 01:31, Richard Henderson wrote:

Do not pass around a boolean between multiple structures,
just read it from the TranslationBlock in the TCGContext.

Signed-off-by: Richard Henderson 
---
  include/exec/plugin-gen.h |  7 +++
  include/qemu/plugin.h |  3 ---
  accel/tcg/plugin-gen.c|  4 +---
  accel/tcg/translator.c|  2 +-
  plugins/api.c | 14 +-
  5 files changed, 14 insertions(+), 16 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v2 23/33] target/hexagon: Use translator_ldl in pkt_crosses_page


On 25/4/24 01:31, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  target/hexagon/translate.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v3 4/5] qapi: introduce device-sync-config

Vladimir Sementsov-Ogievskiy  writes:

> On 24.04.24 14:48, Markus Armbruster wrote:
>> Vladimir Sementsov-Ogievskiy  writes:
>> 
>>> Add command to sync config from vhost-user backend to the device. It
>>> may be helpful when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG failed or not
>>> triggered interrupt to the guest or just not available (not supported
>>> by vhost-user server).
>>>
>>> Command result is racy if allow it during migration. Let's allow the
>>> sync only in RUNNING state.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 

[...]

>>> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
>>> index 0117d243c4..296af52322 100644
>>> --- a/include/sysemu/runstate.h
>>> +++ b/include/sysemu/runstate.h
>>> @@ -5,6 +5,7 @@
>>>   #include "qemu/notify.h"
>>>   
>>>   bool runstate_check(RunState state);
>>> +const char *current_run_state_str(void);
>>>   void runstate_set(RunState new_state);
>>>   RunState runstate_get(void);
>>>   bool runstate_is_running(void);
>>> diff --git a/qapi/qdev.json b/qapi/qdev.json
>>> index facaa0bc6a..e8be79c3d5 100644
>>> --- a/qapi/qdev.json
>>> +++ b/qapi/qdev.json
>>> @@ -161,3 +161,24 @@
>>>   ##
>>>   { 'event': 'DEVICE_UNPLUG_GUEST_ERROR',
>>> 'data': { '*device': 'str', 'path': 'str' } }
>>> +
>>> +##
>>> +# @device-sync-config:
>>> +#
>>> +# Synchronize config from backend to the guest. The command notifies
>>> +# re-read the device config from the backend and notifies the guest
>>> +# to re-read the config. The command may be used to notify the guest
>>> +# about block device capcity change. Currently only vhost-user-blk
>>> +# device supports this.
>> 
>> I'm not sure I understand this.  To work towards an understanding, I
>> rephrase it, and you point out the errors.
>> 
>>   Synchronize device configuration from host to guest part.  First,
>>   copy the configuration from the host part (backend) to the guest
>>   part (frontend).  Then notify guest software that device
>>   configuration changed.
>
> Correct, thanks

Perhaps

  Synchronize guest-visible device configuration with the backend's
  configuration, and notify guest software that device configuration
  changed.

  This may be useful to notify the guest of a block device capacity
  change.  Currenrly, only vhost-user-blk devices support this.

Next question: what happens when the device *doesn't* support this?

>> I wonder how configuration can get out of sync.  Can you explain?
>> 
>
> The example (and the original feature, which triggered developing this) is 
> vhost disk resize. If vhost-server (backend) doesn't support 
> VHOST_USER_SLAVE_CONFIG_CHANGE_MSG, neither QEMU nor guest will know that 
> disk capacity changed.

Sounds like we wouldn't need this command if we could make the
vhost-server support VHOST_USER_SLAVE_CONFIG_CHANGE_MSG.  Is making it
support it impractical?  Or are there other uses for this command?

>>> +#
>>> +# @id: the device's ID or QOM path
>>> +#
>>> +# Features:
>>> +#
>>> +# @unstable: The command is experimental.
>>> +#
>>> +# Since: 9.1
>>> +##
>>> +{ 'command': 'device-sync-config',
>>> +  'features': [ 'unstable' ],
>>> +  'data': {'id': 'str'} }
>>> diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
>>> index 7e075d91c1..cb35ea0b86 100644
>>> --- a/system/qdev-monitor.c
>>> +++ b/system/qdev-monitor.c
>>> @@ -23,6 +23,7 @@
>>>  #include "monitor/monitor.h"
>>>  #include "monitor/qdev.h"
>>>  #include "sysemu/arch_init.h"
>>> +#include "sysemu/runstate.h"
>>>  #include "qapi/error.h"
>>>  #include "qapi/qapi-commands-qdev.h"
>>>  #include "qapi/qmp/dispatch.h"
>>> @@ -969,6 +970,52 @@ void qmp_device_del(const char *id, Error **errp)
>>>   }
>>>   }
>>>   
>>> +int qdev_sync_config(DeviceState *dev, Error **errp)
>>> +{
>>> +DeviceClass *dc = DEVICE_GET_CLASS(dev);
>>> +
>>> +if (!dc->sync_config) {
>>> +error_setg(errp, "device-sync-config is not supported for '%s'",
>>> +   object_get_typename(OBJECT(dev)));
>>> +return -ENOTSUP;
>>> +}
>>> +
>>> +return dc->sync_config(dev, errp);
>>> +}
>>> +
>>> +void qmp_device_sync_config(const char *id, Error **errp)
>>> +{
>>> +DeviceState *dev;
>>> +
>>> +/*
>>> + * During migration there is a race between syncing`config and
>>> + * migrating it, so let's just not allow it.
>> 
>> Can you briefly explain the race?
>
> If at the moment of qmp command, corresponding config already migrated to the 
> target, we'll change only the config on source, but on the target we'll still 
> have outdated config.

For RAM, dirty tracking ensures the change gets sent.  But this is
device memory.  Correct?

>>> + *
>>> + * Moreover, let's not rely on setting up interrupts in paused
>>> + * state, which may be a part of migration process.
>> 
>> What dependence exactly are you avoiding?  Config synchronization
>> depending on guest interrupt delivery?
>
> Right, guest is notified by pci_set_irq.

If we allowed it in pau

Re: [PATCH v2 00/33] accel/tcg: Improve disassembly for target and plugin


On 25/4/24 01:30, Richard Henderson wrote:

Based-on: 20240424230224.941028-1-richard.hender...@linaro.org
("[PATCH v3 00/20] Rewrite plugin code generation")


Cc'ing qemu-s390x@ for these 3 patches:


Richard Henderson (32):



   target/s390x: Fix translator_fake_ld length
   target/s390x: Disassemble EXECUTEd instructions



   target/s390x: Use translator_lduw in get_next_pc


(could be reviewed out of the series context IMO).

Thanks,

Phil.

Re: [PATCH v2 29/33] target/riscv: Use translator_ld* for everything


Cc'ing qemu-riscv@

On 25/4/24 01:31, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  target/riscv/translate.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index c999e942e1..2c27fd4ce1 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -20,7 +20,6 @@
  #include "qemu/log.h"
  #include "cpu.h"
  #include "tcg/tcg-op.h"
-#include "exec/cpu_ldst.h"
  #include "exec/exec-all.h"
  #include "exec/helper-proto.h"
  #include "exec/helper-gen.h"
@@ -1082,7 +1081,7 @@ static uint32_t opcode_at(DisasContextBase *dcbase, 
target_ulong pc)
  CPUState *cpu = ctx->cs;
  CPURISCVState *env = cpu_env(cpu);
  
-return cpu_ldl_code(env, pc);

+return translator_ldl(env, &ctx->base, pc);
  }
  
  /* Include insn module translation function */

@@ -1243,7 +1242,8 @@ static void riscv_tr_translate_insn(DisasContextBase 
*dcbase, CPUState *cpu)
  unsigned page_ofs = ctx->base.pc_next & ~TARGET_PAGE_MASK;
  
  if (page_ofs > TARGET_PAGE_SIZE - MAX_INSN_LEN) {

-uint16_t next_insn = cpu_lduw_code(env, ctx->base.pc_next);
+uint16_t next_insn =
+translator_lduw(env, &ctx->base, ctx->base.pc_next);
  int len = insn_len(next_insn);
  
  if (!is_same_page(&ctx->base, ctx->base.pc_next + len - 1)) {

Re: [PATCH v3 01/20] tcg: Make tcg/helper-info.h self-contained


On 25/4/24 01:02, Richard Henderson wrote:

Move MAX_CALL_IARGS from tcg.h and include for
the define of TCG_TARGET_REG_BITS.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
  include/tcg/helper-info.h | 3 +++
  include/tcg/tcg.h | 2 --
  tcg/tci.c | 1 +
  3 files changed, 4 insertions(+), 2 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v3 04/20] plugins: Move function pointer in qemu_plugin_dyn_cb


On 25/4/24 01:02, Richard Henderson wrote:

The out-of-line function pointer is mutually exclusive
with inline expansion, so move it into the union.
Wrap the pointer in a structure named 'regular' to match
PLUGIN_CB_REGULAR.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
  include/qemu/plugin.h  | 4 +++-
  accel/tcg/plugin-gen.c | 4 ++--
  plugins/core.c | 8 
  3 files changed, 9 insertions(+), 7 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v3 02/20] tcg: Pass function pointer to tcg_gen_call*


On 25/4/24 01:02, Richard Henderson wrote:

For normal helpers, read the function pointer from the
structure earlier.  For plugins, this will allow the
function pointer to come from elsewhere.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
  include/tcg/tcg.h | 21 +---
  include/exec/helper-gen.h.inc | 24 ---
  tcg/tcg.c | 45 +++
  3 files changed, 52 insertions(+), 38 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 6.1.y] virtio_net: Do not send RSS key if it is not supported

2024-04-29 Thread Greg KH

On Wed, Apr 24, 2024 at 03:57:04AM -0700, Konstantin Ovsepian wrote:
> From: Breno Leitao 
> 
> commit 059a49aa2e25c58f90b50151f109dd3c4cdb3a47 upstream
> 
> There is a bug when setting the RSS options in virtio_net that can break
> the whole machine, getting the kernel into an infinite loop.
> 
> Running the following command in any QEMU virtual machine with virtionet
> will reproduce this problem:
> 
> # ethtool -X eth0  hfunc toeplitz
> 
> This is how the problem happens:
> 
> 1) ethtool_set_rxfh() calls virtnet_set_rxfh()
> 
> 2) virtnet_set_rxfh() calls virtnet_commit_rss_command()
> 
> 3) virtnet_commit_rss_command() populates 4 entries for the rss
> scatter-gather
> 
> 4) Since the command above does not have a key, then the last
> scatter-gatter entry will be zeroed, since rss_key_size == 0.
> sg_buf_size = vi->rss_key_size;
> 
> 5) This buffer is passed to qemu, but qemu is not happy with a buffer
> with zero length, and do the following in virtqueue_map_desc() (QEMU
> function):
> 
>   if (!sz) {
>   virtio_error(vdev, "virtio: zero sized buffers are not allowed");
> 
> 6) virtio_error() (also QEMU function) set the device as broken
> 
> vdev->broken = true;
> 
> 7) Qemu bails out, and do not repond this crazy kernel.
> 
> 8) The kernel is waiting for the response to come back (function
> virtnet_send_command())
> 
> 9) The kernel is waiting doing the following :
> 
>   while (!virtqueue_get_buf(vi->cvq, &tmp) &&
>!virtqueue_is_broken(vi->cvq))
> cpu_relax();
> 
> 10) None of the following functions above is true, thus, the kernel
> loops here forever. Keeping in mind that virtqueue_is_broken() does
> not look at the qemu `vdev->broken`, so, it never realizes that the
> vitio is broken at QEMU side.
> 
> Fix it by not sending RSS commands if the feature is not available in
> the device.
> 
> Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
> Cc: sta...@vger.kernel.org
> Cc: qemu-devel@nongnu.org
> Signed-off-by: Breno Leitao 
> Reviewed-by: Heng Qi 
> Reviewed-by: Xuan Zhuo 
> Signed-off-by: David S. Miller 
> (cherry picked from commit 059a49aa2e25c58f90b50151f109dd3c4cdb3a47)
> Signed-off-by: Konstantin Ovsepian 
> ---
>  drivers/net/virtio_net.c | 26 ++
>  1 file changed, 22 insertions(+), 4 deletions(-)

Now queued up,t hanks.

greg k-h

Re: [PATCH v6 08/10] util/bufferiszero: Simplify test_buffer_is_zero_next_accel


On 25/4/24 00:57, Richard Henderson wrote:

Because the three alternatives are monotonic, we don't
need to keep a couple of bitmasks, just identify the
strongest alternative at startup.

Signed-off-by: Richard Henderson 
---
  util/bufferiszero.c | 56 ++---
  1 file changed, 22 insertions(+), 34 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v6 09/10] util/bufferiszero: Add simd acceleration for aarch64


Cc'ing Ard :)

On 25/4/24 00:57, Richard Henderson wrote:

Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely
double-check with the compiler flags for __ARM_NEON and don't bother with
a runtime check.  Otherwise, model the loop after the x86 SSE2 function.

Use UMAXV for the vector reduction.  This is 3 cycles on cortex-a76 and
2 cycles on neoverse-n1.

Signed-off-by: Richard Henderson 
---
  util/bufferiszero.c | 77 +
  1 file changed, 77 insertions(+)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index ff003dc40e..38477a3eac 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -213,7 +213,84 @@ bool test_buffer_is_zero_next_accel(void)
  }
  return false;
  }
+
+#elif defined(__aarch64__) && defined(__ARM_NEON)
+#include 
+
+#define REASSOC_BARRIER(vec0, vec1) asm("" : "+w"(vec0), "+w"(vec1))
+
+static bool buffer_is_zero_simd(const void *buf, size_t len)
+{
+uint32x4_t t0, t1, t2, t3;
+
+/* Align head/tail to 16-byte boundaries.  */
+const uint32x4_t *p = QEMU_ALIGN_PTR_DOWN(buf + 16, 16);
+const uint32x4_t *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 16);
+
+/* Unaligned loads at head/tail.  */
+t0 = vld1q_u32(buf) | vld1q_u32(buf + len - 16);
+
+/* Collect a partial block at tail end.  */
+t1 = e[-7] | e[-6];
+t2 = e[-5] | e[-4];
+t3 = e[-3] | e[-2];
+t0 |= e[-1];
+REASSOC_BARRIER(t0, t1);
+REASSOC_BARRIER(t2, t3);
+t0 |= t1;
+t2 |= t3;
+REASSOC_BARRIER(t0, t2);
+t0 |= t2;
+
+/*
+ * Loop over complete 128-byte blocks.
+ * With the head and tail removed, e - p >= 14, so the loop
+ * must iterate at least once.
+ */
+do {
+/*
+ * Reduce via UMAXV.  Whatever the actual result,
+ * it will only be zero if all input bytes are zero.
+ */
+if (unlikely(vmaxvq_u32(t0) != 0)) {
+return false;
+}
+
+t0 = p[0] | p[1];
+t1 = p[2] | p[3];
+t2 = p[4] | p[5];
+t3 = p[6] | p[7];
+REASSOC_BARRIER(t0, t1);
+REASSOC_BARRIER(t2, t3);
+t0 |= t1;
+t2 |= t3;
+REASSOC_BARRIER(t0, t2);
+t0 |= t2;
+p += 8;
+} while (p < e - 7);
+
+return vmaxvq_u32(t0) == 0;
+}
+
+static biz_accel_fn const accel_table[] = {
+buffer_is_zero_int_ge256,
+buffer_is_zero_simd,
+};
+
+static unsigned accel_index = 1;
+#define INIT_ACCEL buffer_is_zero_simd
+
+bool test_buffer_is_zero_next_accel(void)
+{
+if (accel_index != 0) {
+buffer_is_zero_accel = accel_table[--accel_index];
+return true;
+}
+return false;
+}
+
  #else
+
  bool test_buffer_is_zero_next_accel(void)
  {
  return false;

Patch "virtio_net: Do not send RSS key if it is not supported" has been added to the 6.1-stable tree

2024-04-29 Thread gregkh



This is a note to let you know that I've just added the patch titled

virtio_net: Do not send RSS key if it is not supported

to the 6.1-stable tree which can be found at:

http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
 virtio_net-do-not-send-rss-key-if-it-is-not-supported.patch
and it can be found in the queue-6.1 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let  know about it.


>From 059a49aa2e25c58f90b50151f109dd3c4cdb3a47 Mon Sep 17 00:00:00 2001
From: Breno Leitao 
Date: Wed, 3 Apr 2024 08:43:12 -0700
Subject: virtio_net: Do not send RSS key if it is not supported

From: Breno Leitao 

commit 059a49aa2e25c58f90b50151f109dd3c4cdb3a47 upstream.

There is a bug when setting the RSS options in virtio_net that can break
the whole machine, getting the kernel into an infinite loop.

Running the following command in any QEMU virtual machine with virtionet
will reproduce this problem:

# ethtool -X eth0  hfunc toeplitz

This is how the problem happens:

1) ethtool_set_rxfh() calls virtnet_set_rxfh()

2) virtnet_set_rxfh() calls virtnet_commit_rss_command()

3) virtnet_commit_rss_command() populates 4 entries for the rss
scatter-gather

4) Since the command above does not have a key, then the last
scatter-gatter entry will be zeroed, since rss_key_size == 0.
sg_buf_size = vi->rss_key_size;

5) This buffer is passed to qemu, but qemu is not happy with a buffer
with zero length, and do the following in virtqueue_map_desc() (QEMU
function):

  if (!sz) {
  virtio_error(vdev, "virtio: zero sized buffers are not allowed");

6) virtio_error() (also QEMU function) set the device as broken

vdev->broken = true;

7) Qemu bails out, and do not repond this crazy kernel.

8) The kernel is waiting for the response to come back (function
virtnet_send_command())

9) The kernel is waiting doing the following :

  while (!virtqueue_get_buf(vi->cvq, &tmp) &&
 !virtqueue_is_broken(vi->cvq))
  cpu_relax();

10) None of the following functions above is true, thus, the kernel
loops here forever. Keeping in mind that virtqueue_is_broken() does
not look at the qemu `vdev->broken`, so, it never realizes that the
vitio is broken at QEMU side.

Fix it by not sending RSS commands if the feature is not available in
the device.

Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
Cc: sta...@vger.kernel.org
Cc: qemu-devel@nongnu.org
Signed-off-by: Breno Leitao 
Reviewed-by: Heng Qi 
Reviewed-by: Xuan Zhuo 
Signed-off-by: David S. Miller 
Signed-off-by: Konstantin Ovsepian 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/net/virtio_net.c |   26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2948,19 +2948,35 @@ static int virtnet_get_rxfh(struct net_d
 static int virtnet_set_rxfh(struct net_device *dev, const u32 *indir, const u8 
*key, const u8 hfunc)
 {
struct virtnet_info *vi = netdev_priv(dev);
+   bool update = false;
int i;
 
if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP)
return -EOPNOTSUPP;
 
if (indir) {
+   if (!vi->has_rss)
+   return -EOPNOTSUPP;
+
for (i = 0; i < vi->rss_indir_table_size; ++i)
vi->ctrl->rss.indirection_table[i] = indir[i];
+   update = true;
}
-   if (key)
+
+   if (key) {
+   /* If either _F_HASH_REPORT or _F_RSS are negotiated, the
+* device provides hash calculation capabilities, that is,
+* hash_key is configured.
+*/
+   if (!vi->has_rss && !vi->has_rss_hash_report)
+   return -EOPNOTSUPP;
+
memcpy(vi->ctrl->rss.key, key, vi->rss_key_size);
+   update = true;
+   }
 
-   virtnet_commit_rss_command(vi);
+   if (update)
+   virtnet_commit_rss_command(vi);
 
return 0;
 }
@@ -3852,13 +3868,15 @@ static int virtnet_probe(struct virtio_d
if (virtio_has_feature(vdev, VIRTIO_NET_F_HASH_REPORT))
vi->has_rss_hash_report = true;
 
-   if (virtio_has_feature(vdev, VIRTIO_NET_F_RSS))
+   if (virtio_has_feature(vdev, VIRTIO_NET_F_RSS)) {
vi->has_rss = true;
 
-   if (vi->has_rss || vi->has_rss_hash_report) {
vi->rss_indir_table_size =
virtio_cread16(vdev, offsetof(struct virtio_net_config,
rss_max_indirection_table_length));
+   }
+
+   if (vi->has_rss || vi->has_rss_hash_report) {
vi->rss_key_size =
virtio_cread8(vdev, offsetof(struct virtio_net_config, 
rss_max_key_size));
 


Patches currently in stable-queue which might be from lei...@debian.org are

q

Re: [PATCH v6 10/10] tests/bench: Add bufferiszero-bench


On 25/4/24 00:57, Richard Henderson wrote:

Benchmark each acceleration function vs an aligned buffer of zeros.

Signed-off-by: Richard Henderson 
---
  tests/bench/bufferiszero-bench.c | 47 
  tests/bench/meson.build  |  1 +
  2 files changed, 48 insertions(+)
  create mode 100644 tests/bench/bufferiszero-bench.c




+static void test(const void *opaque)
+{
+size_t max = 64 * KiB;
+void *buf = g_malloc0(max);
+int accel_index = 0;


Nitpicking, accel_index could be unsigned like in util/bufferiszero.c.


+
+do {
+if (accel_index != 0) {
+g_test_message("%s", "");  /* gnu_printf Werror for simple "" */
+}
+for (size_t len = 1 * KiB; len <= max; len *= 4) {
+double total = 0.0;
+
+g_test_timer_start();
+do {
+buffer_is_zero_ge256(buf, len);
+total += len;
+} while (g_test_timer_elapsed() < 0.5);
+
+total /= MiB;
+g_test_message("buffer_is_zero #%d: %2zuKB %8.0f MB/sec",
+   accel_index, len / (size_t)KiB,


Thus "buffer_is_zero #%u:..."

Regardless,
Reviewed-by: Philippe Mathieu-Daudé 


+   total / g_test_timer_last());
+}
+accel_index++;
+} while (test_buffer_is_zero_next_accel());
+
+g_free(buf);
+}

Re: [PULL 1/1] hw/ufs: Fix buffer overflow bug

2024-04-29 Thread Michael Tokarev


29.04.2024 06:25, Jeuk Kim wrote:

From: Jeuk Kim 

It fixes the buffer overflow vulnerability in the ufs device.
The bug was detected by sanitizers.


...

Resolves: #2299
Fixes: 329f16624499 ("hw/ufs: Support for Query Transfer Requests")
Reported-by: Zheyu Ma 
Signed-off-by: Jeuk Kim 


Cc: qemu-stable@ for 8.2 and 9.0 series.

Please do not forget to Cc qemu-stable@ for relevant changes.

Thanks,

/mjt

Re: [PATCH v6 07/10] util/bufferiszero: Introduce biz_accel_fn typedef


On 25/4/24 00:57, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  util/bufferiszero.c | 9 ++---
  1 file changed, 6 insertions(+), 3 deletions(-)




@@ -178,13 +179,15 @@ buffer_zero_avx2(const void *buf, size_t len)
  }
  #endif /* CONFIG_AVX2_OPT */
  
+

+


Spurious new lines :)

Reviewed-by: Philippe Mathieu-Daudé 


  static unsigned __attribute__((noinline))

Re: [PATCH v6 02/10] util/bufferiszero: Remove AVX512 variant

2024-04-29 Thread Daniel P . Berrangé

On Wed, Apr 24, 2024 at 03:56:57PM -0700, Richard Henderson wrote:
> From: Alexander Monakov 
> 
> Thanks to early checks in the inline buffer_is_zero wrapper, the SIMD
> routines are invoked much more rarely in normal use when most buffers
> are non-zero. This makes use of AVX512 unprofitable, as it incurs extra
> frequency and voltage transition periods during which the CPU operates
> at reduced performance, as described in
> https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html

This is describing limitations of Intel's AVX512 implementation.

AMD's AVX512 implementation is said to not have the kind of
power / frequency limitations that Intel's does:

  https://www.mersenneforum.org/showthread.php?p=614191

  "Overall, AMD's AVX512 implementation beat my expectations.
   I was expecting something similar to Zen1's "double-pumping"
   of AVX with half the register file and cross-lane instructions
   being super slow. But this is not the case on Zen4. The lack
   of power or thermal issues combined with stellar shuffle support
   makes it completely worthwhile to use from a developer standpoint.
   If your code can vectorize without excessive wasted computation,
   then go all the way to 512-bit. AMD not only made this worthwhile,
   but *incentivizes* it with the power savings. And if in the future
   AMD decides to widen things up, you may get a 2x speedup for free."

IOW, it sounds like we could be sacrificing performance on modern
AMD Genoa generation CPUs by removing the AVX512 impl

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [RFC 1/2] iova_tree: add an id member to DMAMap

2024-04-29 Thread Jonah Palmer





On 4/29/24 4:14 AM, Eugenio Perez Martin wrote:

On Thu, Apr 25, 2024 at 7:44 PM Si-Wei Liu  wrote:




On 4/24/2024 12:33 AM, Eugenio Perez Martin wrote:

On Wed, Apr 24, 2024 at 12:21 AM Si-Wei Liu  wrote:



On 4/22/2024 1:49 AM, Eugenio Perez Martin wrote:

On Sat, Apr 20, 2024 at 1:50 AM Si-Wei Liu  wrote:


On 4/19/2024 1:29 AM, Eugenio Perez Martin wrote:

On Thu, Apr 18, 2024 at 10:46 PM Si-Wei Liu  wrote:

On 4/10/2024 3:03 AM, Eugenio Pérez wrote:

IOVA tree is also used to track the mappings of virtio-net shadow
virtqueue.  This mappings may not match with the GPA->HVA ones.

This causes a problem when overlapped regions (different GPA but same
translated HVA) exists in the tree, as looking them by HVA will return
them twice.  To solve this, create an id member so we can assign unique
identifiers (GPA) to the maps.

Signed-off-by: Eugenio Pérez 
---
  include/qemu/iova-tree.h | 5 +++--
  util/iova-tree.c | 3 ++-
  2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
index 2a10a7052e..34ee230e7d 100644
--- a/include/qemu/iova-tree.h
+++ b/include/qemu/iova-tree.h
@@ -36,6 +36,7 @@ typedef struct DMAMap {
  hwaddr iova;
  hwaddr translated_addr;
  hwaddr size;/* Inclusive */
+uint64_t id;
  IOMMUAccessFlags perm;
  } QEMU_PACKED DMAMap;
  typedef gboolean (*iova_tree_iterator)(DMAMap *map);
@@ -100,8 +101,8 @@ const DMAMap *iova_tree_find(const IOVATree *tree, const 
DMAMap *map);
   * @map: the mapping to search
   *
   * Search for a mapping in the iova tree that translated_addr overlaps 
with the
- * mapping range specified.  Only the first found mapping will be
- * returned.
+ * mapping range specified and map->id is equal.  Only the first found
+ * mapping will be returned.
   *
   * Return: DMAMap pointer if found, or NULL if not found.  Note that
   * the returned DMAMap pointer is maintained internally.  User should
diff --git a/util/iova-tree.c b/util/iova-tree.c
index 536789797e..0863e0a3b8 100644
--- a/util/iova-tree.c
+++ b/util/iova-tree.c
@@ -97,7 +97,8 @@ static gboolean iova_tree_find_address_iterator(gpointer key, 
gpointer value,

  needle = args->needle;
  if (map->translated_addr + map->size < needle->translated_addr ||
-needle->translated_addr + needle->size < map->translated_addr) {
+needle->translated_addr + needle->size < map->translated_addr ||
+needle->id != map->id) {

It looks this iterator can also be invoked by SVQ from
vhost_svq_translate_addr() -> iova_tree_find_iova(), where guest GPA
space will be searched on without passing in the ID (GPA), and exact
match for the same GPA range is not actually needed unlike the mapping
removal case. Could we create an API variant, for the SVQ lookup case
specifically? Or alternatively, add a special flag, say skip_id_match to
DMAMap, and the id match check may look like below:

(!needle->skip_id_match && needle->id != map->id)

I think vhost_svq_translate_addr() could just call the API variant or
pass DMAmap with skip_id_match set to true to svq_iova_tree_find_iova().


I think you're totally right. But I'd really like to not complicate
the API of the iova_tree more.

I think we can look for the hwaddr using memory_region_from_host and
then get the hwaddr. It is another lookup though...

Yeah, that will be another means of doing translation without having to
complicate the API around iova_tree. I wonder how the lookup through
memory_region_from_host() may perform compared to the iova tree one, the
former looks to be an O(N) linear search on a linked list while the
latter would be roughly O(log N) on an AVL tree?

Even worse, as the reverse lookup (from QEMU vaddr to SVQ IOVA) is
linear too. It is not even ordered.

Oh Sorry, I misread the code and I should look for g_tree_foreach ()
instead of g_tree_search_node(). So the former is indeed linear
iteration, but it looks to be ordered?

https://urldefense.com/v3/__https://github.com/GNOME/glib/blob/main/glib/gtree.c*L1115__;Iw!!ACWV5N9M2RV99hQ!Ng2rLfRd9tLyNTNocW50Mf5AcxSt0uF0wOdv120djff-z_iAdbujYK-jMi5UC1DZLxb1yLUv2vV0j3wJo8o$

The GPA / IOVA are ordered but we're looking by QEMU's vaddr.

If we have these translations:
[0x1000, 0x2000] -> [0x1, 0x11000]
[0x2000, 0x3000] -> [0x6000, 0x7000]

We will see them in this order, so we cannot stop the search at the first node.

Yeah, reverse lookup is unordered indeed, anyway.




But apart from this detail you're right, I have the same concerns with
this solution too. If we see a hard performance regression we could go
to more complicated solutions, like maintaining a reverse IOVATree in
vhost-iova-tree too. First RFCs of SVQ did that actually.

Agreed, yeap we can use memory_region_from_host for now.  Any reason why
reverse IOVATree was dropped, lack of users? But now we have one!


No, it is just simplicity. We already have

Re: [PATCH v2] mc146818rtc: add a way to generate RTC interrupts via QMP

Philippe Mathieu-Daudé  writes:

> On 29/4/24 11:41, Daniil Tatianin wrote:
>> This can be used to force-synchronize the time in guest after a long
>> stop-cont pause, which can be useful for serverless-type workload.
>> Also add a comment to highlight the fact that this (and one other QMP
>> command) only works for the MC146818 RTC controller.
>> Signed-off-by: Daniil Tatianin 
>> ---
>> Changes since v0:
>> - Rename to rtc-inject-irq to match other similar API
>> - Add a comment to highlight that this only works for the I386 RTC
>> Changes since v1:
>> - Added a description below the QMP command to explain how it can be
>>used and what it does.

[...]

>> diff --git a/qapi/misc-target.json b/qapi/misc-target.json
>> index 4e0a6492a9..0f2479f8f4 100644
>> --- a/qapi/misc-target.json
>> +++ b/qapi/misc-target.json
>> @@ -19,6 +19,24 @@
>>   { 'command': 'rtc-reset-reinjection',
>> 'if': 'TARGET_I386' }
>> +##
>> +# @rtc-inject-irq:
>> +#
>> +# Inject an RTC interrupt. This command forces the guest to synchornize

synchronize

>> +# the time with RTC. This is useful after a long stop-cont pause, which
>> +# is common for serverless-type workload.

docs/devel/qapi-code-gen.rst:

For legibility, wrap text paragraphs so every line is at most 70
characters long.

Separate sentences with two spaces.

>> +#
>> +# Since: 9.1
>> +#
>> +# Example:
>> +#
>> +# -> { "execute": "rtc-inject-irq" }
>> +# <- { "return": {} }
>> +#
>> +##
>> +{ 'command': 'rtc-inject-irq',
>> +  'if': 'TARGET_I386' }
>
> Why is that restricted to x86? Ah, this is specific to the MC146818
> RTC... Other machines use hw accelerators and the MC146818, aren't
> we interested in synchronizing them the same way?
>
> Personally I'd name this command 'mc146818rtc-raise-irq-broadcast',
> KISS.

I might be wrong, but the *interface* looks general to me, only nobody
bothered to implement for the other RTCs.

Re: [PATCH v6 02/10] util/bufferiszero: Remove AVX512 variant

2024-04-29 Thread Alexander Monakov


On Mon, 29 Apr 2024, Daniel P. Berrangé wrote:

> On Wed, Apr 24, 2024 at 03:56:57PM -0700, Richard Henderson wrote:
> > From: Alexander Monakov 
> > 
> > Thanks to early checks in the inline buffer_is_zero wrapper, the SIMD
> > routines are invoked much more rarely in normal use when most buffers
> > are non-zero. This makes use of AVX512 unprofitable, as it incurs extra
> > frequency and voltage transition periods during which the CPU operates
> > at reduced performance, as described in
> > https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html
> 
> This is describing limitations of Intel's AVX512 implementation.
> 
> AMD's AVX512 implementation is said to not have the kind of
> power / frequency limitations that Intel's does:
> 
>   https://www.mersenneforum.org/showthread.php?p=614191
> 
>   "Overall, AMD's AVX512 implementation beat my expectations.
>I was expecting something similar to Zen1's "double-pumping"
>of AVX with half the register file and cross-lane instructions
>being super slow. But this is not the case on Zen4. The lack
>of power or thermal issues combined with stellar shuffle support
>makes it completely worthwhile to use from a developer standpoint.
>If your code can vectorize without excessive wasted computation,
>then go all the way to 512-bit. AMD not only made this worthwhile,
>but *incentivizes* it with the power savings. And if in the future
>AMD decides to widen things up, you may get a 2x speedup for free."
> 
> IOW, it sounds like we could be sacrificing performance on modern
> AMD Genoa generation CPUs by removing the AVX512 impl

No, the new implementation saturates load ports, and Genoa runs 512-bit
AVX instructions at half throughput compared to their 256-bit counterparts
(so one 512-bit load or two 256-bit loads per cycle), so there's no
obvious reason why this patch would sacrifice performance there.

Maybe it could, indirectly, by lowering the turbo clock limit due to
higher front-end activity, but I don't have access to a Zen 4 machine
to check, and even so it would be a few percent, not 2x.

Alexander

Re: [PATCH] mc146818rtc: add a way to generate RTC interrupts via QMP

Philippe Mathieu-Daudé  writes:

> Hi Daniil, Markus,
>
> On 26/4/24 10:39, Markus Armbruster wrote:
>> Daniil Tatianin  writes:
>> 
>>> This can be used to force-synchronize the time in guest after a long
>>> stop-cont pause, which can be useful for serverless-type workload.
>>>
>>> Signed-off-by: Daniil Tatianin 
>>> ---
>>>   hw/rtc/mc146818rtc.c | 15 +++
>>>   include/hw/rtc/mc146818rtc.h |  1 +
>>>   qapi/misc-target.json| 16 
>>>   3 files changed, 32 insertions(+)
>>>
>>> diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
>>> index f4c1869232..6980a78d5f 100644
>>> --- a/hw/rtc/mc146818rtc.c
>>> +++ b/hw/rtc/mc146818rtc.c
>>> @@ -116,6 +116,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
>>>   }
>>>   }
>>>   +void qmp_rtc_notify(Error **errp)
>>> +{
>>> +MC146818RtcState *s;
>>> +
>>> +/*
>>> + * See:
>>> + * https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt
>>> + */
>>> +QLIST_FOREACH(s, &rtc_devices, link) {
>>> +s->cmos_data[RTC_REG_B] |= REG_B_UIE;
>>> +s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;
>>> +qemu_irq_raise(s->irq);
>>> +}
>>> +}
>>> +
>> Note for later: qmp_rtc_notify() works on all realized mc146818rtc
>> devices.  Other kinds of RTC devices are silently ignored.  Just like
>> qmp_rtc_reset_reinjection().
>
> IMO to avoid any future ambiguity (in heterogeneous machines), this
> command must take a QOM device path (or a list of) and only notify
> those.

Let's compare:

• With QOM path:

  · You need to know the machine's RTC device(s).

Unfortunately, this is bothersome, as the QOM path is not stable.

For Q35, it's generally "/machine/unattached/device[N]/rtc", but N
varies with configuration (TCG N=2, KVM N=3 for me), and it might
vary with machine type version.  That's because the machine code
creates ICH9-LPC without a proper name.  We do that a lot.  I hate
it.

Likewise for i440FX with PIIX3 instead of ICH9-LPC.

For isapc, it's /machine/unattached/device[3].  I suspect the 3
isn't reliable there, either.

microvm doesn't seem to have an RTC by default.

  · If the device so named doesn't support IRQ inject, the command
should fail.

  · Could be generalized to non-RTC devices when that's useful.

• Broadcast:

  · You don't need to know the machine's RTC device(s).

  · If there are multiple RTC devices that support IRQ inject, we inject
for each of them.  There is no way to select specific RTCs.

  · If there is no RTC device that supports IRQ inject, the command does
nothing silently.

I don't like silent failures.  It could be made to fail instead.

If it wasn't for the unstable QOM path problem, I'd advise against
the broadcast interface.

Thoughts?

[PATCH 1/1] vhost-vsock: add VIRTIO_F_RING_PACKED to feaure_bits

2024-04-29 Thread Halil Pasic

Not having VIRTIO_F_RING_PACKED in feature_bits[] is a problem when the
vhost-vsock device does not offer the feature bit VIRTIO_F_RING_PACKED
but the in QEMU device is configured to try to use the packed layout
(the virtio property "packed" is on).

As of today, the  Linux kernel vhost-vsock device does not support the
packed queue layout (as vhost does not support packed), and does not
offer VIRTIO_F_RING_PACKED. Thus when for example a vhost-vsock-ccw is
used with packed=on, VIRTIO_F_RING_PACKED ends up being negotiated,
despite the fact that the device does not actually support it, and
one gets to keep the pieces.

Fixes: 74b3e46630 ("virtio: add property to enable packed virtqueue")
Reported-by: Marc Hartmayer 
Signed-off-by: Halil Pasic 
---

This is a minimal fix, that follows the current patterns in the
codebase, and not necessarily the best one.

I don't quite understand why vhost_get_features() works the way
it works. Fortunately it is documented, so let me quote the
documentation.

"""
/**
 * vhost_get_features() - return a sanitised set of feature bits
 * @hdev: common vhost_dev structure
 * @feature_bits: pointer to terminated table of feature bits
 * @features: original feature set
 *
 * This returns a set of features bits that is an intersection of what
 * is supported by the vhost backend (hdev->features), the supported
 * feature_bits and the requested feature set.
 */
uint64_t vhost_get_features(struct vhost_dev *hdev, const int *feature_bits,
uint64_t features);
"""

Based on this I would expect the following statement to be true: if a
feature bit is not in feature_bits then the corresponding bit in the
return value is guaranteed to be not set (regardless of the values of
the 3rd arguments and hdev->features).

The implementation however does the following: if the feature bit is not
listed in feature_bits (2nd argument) then the corresponding bit in the
return value is set iff the corresponding bit in the 3rd argument
(features) is set (i.e. it does not matter what hdev->features and thus
the vhost backend says).

The documentation however does kind of state, that feature_bits is
supposed to contain the supported features. And under the assumption
that feature bit not in feature_bits implies that the corresponding bit
must not be set in the 3rd argument (features), then even with the
current implementation we do end up with the intersection of the three
as stated. And then vsock would be at fault for violating that
assumption, and my fix would be the best thing to do -- I guess.

Is the implementation the way it is for a good reason, I can't judge
that with certainty for myself.

But I'm pretty convinced that the current approach is fragile,
especially for the feature bits form the range 24 to 40, as those are
not specific to a device.

BTW vsock also lacks VIRTIO_F_ACCESS_PLATFORM, and VIRTIO_F_RING_RESET
as well while vhost-net has both.

If our design is indeed to make the individual devices responsible for
having a complete list of possible features in feature_bits, then at
least having a common macro for the non-device specific features would
make sense to me.

On the other hand, I'm also very happy to send a patch which changes the
behavior of vhost_get_features(), should the community decide that the
current behavior does not make all that much sense -- I lean towards:
probably it does not make much sense, but things like
VIRTIO_F_ACCESS_PLATFORM, which are mandatory feature bits, need careful
consideration, because there vhost can't do so we just won't offer it
and proceed on our merry way is not the right behavior.

Please comment!

Regards,
Halil
---
 hw/virtio/vhost-vsock-common.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/virtio/vhost-vsock-common.c b/hw/virtio/vhost-vsock-common.c
index 12ea87d7a7..fd88df2560 100644
--- a/hw/virtio/vhost-vsock-common.c
+++ b/hw/virtio/vhost-vsock-common.c
@@ -22,6 +22,7 @@
 const int feature_bits[] = {
 VIRTIO_VSOCK_F_SEQPACKET,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_RING_PACKED,
 VHOST_INVALID_FEATURE_BIT
 };
 

base-commit: fd87be1dada5672f877e03c2ca8504458292c479
-- 
2.40.1

[PULL 0/6] Block jobs patches for 2024-04-29

The following changes since commit fd87be1dada5672f877e03c2ca8504458292c479:

  Merge tag 'accel-20240426' of https://github.com/philmd/qemu into staging 
(2024-04-26 15:28:13 -0700)

are available in the Git repository at:

  https://gitlab.com/vsementsov/qemu.git tags/pull-block-jobs-2024-04-29

for you to fetch changes up to 2ca7608c6b8d57fd6347b11af12a0f035263efef:

  iotests: add backup-discard-source (2024-04-29 13:35:30 +0300)


Block jobs patches for 2024-04-29

- backup: discard-source parameter
- blockcommit: Reopen base image as RO after abort


Alexander Ivanov (1):
  blockcommit: Reopen base image as RO after abort

Vladimir Sementsov-Ogievskiy (5):
  block/copy-before-write: fix permission
  block/copy-before-write: support unligned snapshot-discard
  block/copy-before-write: create block_copy bitmap in filter node
  qapi: blockdev-backup: add discard-source parameter
  iotests: add backup-discard-source

 block/backup.c |   5 +-
 block/block-copy.c |  12 +-
 block/copy-before-write.c  |  39 +--
 block/copy-before-write.h  |   1 +
 block/mirror.c |  11 +-
 block/replication.c|   4 +-
 blockdev.c |   2 +-
 include/block/block-common.h   |   2 +
 include/block/block-copy.h |   2 +
 include/block/block_int-global-state.h |   2 +-
 qapi/block-core.json   |   4 +
 tests/qemu-iotests/257.out | 112 +-
 tests/qemu-iotests/tests/backup-discard-source | 152 
+
 tests/qemu-iotests/tests/backup-discard-source.out |   5 +
 14 files changed, 281 insertions(+), 72 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/backup-discard-source
 create mode 100644 tests/qemu-iotests/tests/backup-discard-source.out

Alexander Ivanov (1):
  blockcommit: Reopen base image as RO after abort

Vladimir Sementsov-Ogievskiy (5):
  block/copy-before-write: fix permission
  block/copy-before-write: support unligned snapshot-discard
  block/copy-before-write: create block_copy bitmap in filter node
  qapi: blockdev-backup: add discard-source parameter
  iotests: add backup-discard-source

 block/backup.c|   5 +-
 block/block-copy.c|  12 +-
 block/copy-before-write.c |  39 -
 block/copy-before-write.h |   1 +
 block/mirror.c|  11 +-
 block/replication.c   |   4 +-
 blockdev.c|   2 +-
 include/block/block-common.h  |   2 +
 include/block/block-copy.h|   2 +
 include/block/block_int-global-state.h|   2 +-
 qapi/block-core.json  |   4 +
 tests/qemu-iotests/257.out| 112 ++---
 .../qemu-iotests/tests/backup-discard-source  | 152 ++
 .../tests/backup-discard-source.out   |   5 +
 14 files changed, 281 insertions(+), 72 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/backup-discard-source
 create mode 100644 tests/qemu-iotests/tests/backup-discard-source.out

-- 
2.34.1

[PULL 6/6] iotests: add backup-discard-source

Add test for a new backup option: discard-source.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fiona Ebner 
Tested-by: Fiona Ebner 
Message-Id: <20240313152822.626493-6-vsement...@yandex-team.ru>
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 .../qemu-iotests/tests/backup-discard-source  | 152 ++
 .../tests/backup-discard-source.out   |   5 +
 2 files changed, 157 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/backup-discard-source
 create mode 100644 tests/qemu-iotests/tests/backup-discard-source.out

diff --git a/tests/qemu-iotests/tests/backup-discard-source 
b/tests/qemu-iotests/tests/backup-discard-source
new file mode 100755
index 00..2391b12acd
--- /dev/null
+++ b/tests/qemu-iotests/tests/backup-discard-source
@@ -0,0 +1,152 @@
+#!/usr/bin/env python3
+#
+# Test backup discard-source parameter
+#
+# Copyright (c) Virtuozzo International GmbH.
+# Copyright (c) Yandex
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import os
+
+import iotests
+from iotests import qemu_img_create, qemu_img_map, qemu_io
+
+
+temp_img = os.path.join(iotests.test_dir, 'temp')
+source_img = os.path.join(iotests.test_dir, 'source')
+target_img = os.path.join(iotests.test_dir, 'target')
+size = '1M'
+
+
+def get_actual_size(vm, node_name):
+nodes = vm.cmd('query-named-block-nodes', flat=True)
+node = next(n for n in nodes if n['node-name'] == node_name)
+return node['image']['actual-size']
+
+
+class TestBackup(iotests.QMPTestCase):
+def setUp(self):
+qemu_img_create('-f', iotests.imgfmt, source_img, size)
+qemu_img_create('-f', iotests.imgfmt, temp_img, size)
+qemu_img_create('-f', iotests.imgfmt, target_img, size)
+qemu_io('-c', 'write 0 1M', source_img)
+
+self.vm = iotests.VM()
+self.vm.launch()
+
+self.vm.cmd('blockdev-add', {
+'node-name': 'cbw',
+'driver': 'copy-before-write',
+'file': {
+'driver': iotests.imgfmt,
+'file': {
+'driver': 'file',
+'filename': source_img,
+}
+},
+'target': {
+'driver': iotests.imgfmt,
+'discard': 'unmap',
+'node-name': 'temp',
+'file': {
+'driver': 'file',
+'filename': temp_img
+}
+}
+})
+
+self.vm.cmd('blockdev-add', {
+'node-name': 'access',
+'discard': 'unmap',
+'driver': 'snapshot-access',
+'file': 'cbw'
+})
+
+self.vm.cmd('blockdev-add', {
+'driver': iotests.imgfmt,
+'node-name': 'target',
+'file': {
+'driver': 'file',
+'filename': target_img
+}
+})
+
+self.assertLess(get_actual_size(self.vm, 'temp'), 512 * 1024)
+
+def tearDown(self):
+# That should fail, because region is discarded
+self.vm.hmp_qemu_io('access', 'read 0 1M')
+
+self.vm.shutdown()
+
+self.assertTrue('read failed: Permission denied' in self.vm.get_log())
+
+# Final check that temp image is empty
+mapping = qemu_img_map(temp_img)
+self.assertEqual(len(mapping), 1)
+self.assertEqual(mapping[0]['start'], 0)
+self.assertEqual(mapping[0]['length'], 1024 * 1024)
+self.assertEqual(mapping[0]['data'], False)
+
+os.remove(temp_img)
+os.remove(source_img)
+os.remove(target_img)
+
+def do_backup(self):
+self.vm.cmd('blockdev-backup', device='access',
+sync='full', target='target',
+job_id='backup0',
+discard_source=True)
+
+self.vm.event_wait(name='BLOCK_JOB_COMPLETED')
+
+def test_discard_written(self):
+"""
+1. Guest writes
+2. copy-before-write operation, data is stored to temp
+3. start backup(discard_source=True), check that data is
+   removed from temp
+"""
+# Trigger copy-before-write operation
+result = self.vm.hmp_qemu_io('cbw', 'write 0 1M')
+self.assert_qmp(result, 'return', '')
+
+# Check that data is written to temporary image
+self.assertGreater(get_actual_size(self

[PULL 1/6] blockcommit: Reopen base image as RO after abort

From: Alexander Ivanov 

If a blockcommit is aborted the base image remains in RW mode, that leads
to a fail of subsequent live migration.

How to reproduce:
  $ virsh snapshot-create-as vm snp1 --disk-only

  *** write something to the disk inside the guest ***

  $ virsh blockcommit vm vda --active --shallow && virsh blockjob vm vda --abort
  $ lsof /vzt/vm.qcow2
  COMMAND  PID USER   FD   TYPE DEVICE   SIZE/OFF NODE NAME
  qemu-syst 433203 root   45u   REG  253,0 1724776448  133 /vzt/vm.qcow2
  $ cat /proc/433203/fdinfo/45
  pos:0
  flags:  02140002 < The last 2 means RW mode

If the base image is in RW mode at the end of blockcommit and was in RO
mode before blockcommit, reopen the base BDS in RO.

Signed-off-by: Alexander Ivanov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Message-Id: <20240404091136.129811-1-alexander.iva...@virtuozzo.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/mirror.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 1bdce3b657..61f0a717b7 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -93,6 +93,7 @@ typedef struct MirrorBlockJob {
 int64_t active_write_bytes_in_flight;
 bool prepared;
 bool in_drain;
+bool base_ro;
 } MirrorBlockJob;
 
 typedef struct MirrorBDSOpaque {
@@ -794,6 +795,10 @@ static int mirror_exit_common(Job *job)
 bdrv_replace_node(mirror_top_bs, mirror_top_bs->backing->bs, &error_abort);
 bdrv_graph_wrunlock();
 
+if (abort && s->base_ro && !bdrv_is_read_only(target_bs)) {
+bdrv_reopen_set_read_only(target_bs, true, NULL);
+}
+
 bdrv_drained_end(target_bs);
 bdrv_unref(target_bs);
 
@@ -1717,6 +1722,7 @@ static BlockJob *mirror_start_job(
  bool is_none_mode, BlockDriverState *base,
  bool auto_complete, const char *filter_node_name,
  bool is_mirror, MirrorCopyMode copy_mode,
+ bool base_ro,
  Error **errp)
 {
 MirrorBlockJob *s;
@@ -1800,6 +1806,7 @@ static BlockJob *mirror_start_job(
 bdrv_unref(mirror_top_bs);
 
 s->mirror_top_bs = mirror_top_bs;
+s->base_ro = base_ro;
 
 /* No resize for the target either; while the mirror is still running, a
  * consistent read isn't necessarily possible. We could possibly allow
@@ -2029,7 +2036,7 @@ void mirror_start(const char *job_id, BlockDriverState 
*bs,
  speed, granularity, buf_size, backing_mode, zero_target,
  on_source_error, on_target_error, unmap, NULL, NULL,
  &mirror_job_driver, is_none_mode, base, false,
- filter_node_name, true, copy_mode, errp);
+ filter_node_name, true, copy_mode, false, errp);
 }
 
 BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
@@ -2058,7 +2065,7 @@ BlockJob *commit_active_start(const char *job_id, 
BlockDriverState *bs,
  on_error, on_error, true, cb, opaque,
  &commit_active_job_driver, false, base, auto_complete,
  filter_node_name, false, MIRROR_COPY_MODE_BACKGROUND,
- errp);
+ base_read_only, errp);
 if (!job) {
 goto error_restore_flags;
 }
-- 
2.34.1

[PULL 2/6] block/copy-before-write: fix permission

In case when source node does not have any parents, the condition still
works as required: backup job do create the parent by

  block_job_create -> block_job_add_bdrv -> bdrv_root_attach_child

Still, in this case checking @perm variable doesn't work, as backup job
creates the root blk with empty permissions (as it rely on CBW filter
to require correct permissions and don't want to create extra
conflicts).

So, we should not check @perm.

The hack may be dropped entirely when transactional insertion of
filter (when we don't try to recalculate permissions in intermediate
state, when filter does conflict with original parent of the source
node) merged (old big series
"[PATCH v5 00/45] Transactional block-graph modifying API"[1] and it's
current in-flight part is "[PATCH v8 0/7] blockdev-replace"[2])

[1] https://patchew.org/QEMU/20220330212902.590099-1-vsement...@openvz.org/
[2] https://patchew.org/QEMU/2023101718.932733-1-vsement...@yandex-team.ru/

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Tested-by: Fiona Ebner 
Message-Id: <20240313152822.626493-2-vsement...@yandex-team.ru>
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/copy-before-write.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 8aba27a71d..3e3af30c08 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -364,9 +364,13 @@ cbw_child_perm(BlockDriverState *bs, BdrvChild *c, 
BdrvChildRole role,
perm, shared, nperm, nshared);
 
 if (!QLIST_EMPTY(&bs->parents)) {
-if (perm & BLK_PERM_WRITE) {
-*nperm = *nperm | BLK_PERM_CONSISTENT_READ;
-}
+/*
+ * Note, that source child may be shared with backup job. Backup 
job
+ * does create own blk parent on copy-before-write node, so this
+ * works even if source node does not have any parents before 
backup
+ * start
+ */
+*nperm = *nperm | BLK_PERM_CONSISTENT_READ;
 *nshared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
 }
 }
-- 
2.34.1

[PULL 3/6] block/copy-before-write: support unligned snapshot-discard

First thing that crashes on unligned access here is
bdrv_reset_dirty_bitmap(). Correct way is to align-down the
snapshot-discard request.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fiona Ebner 
Tested-by: Fiona Ebner 
Message-Id: <20240313152822.626493-3-vsement...@yandex-team.ru>
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/copy-before-write.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 3e3af30c08..6d89af0b29 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -325,14 +325,24 @@ static int coroutine_fn GRAPH_RDLOCK
 cbw_co_pdiscard_snapshot(BlockDriverState *bs, int64_t offset, int64_t bytes)
 {
 BDRVCopyBeforeWriteState *s = bs->opaque;
+uint32_t cluster_size = block_copy_cluster_size(s->bcs);
+int64_t aligned_offset = QEMU_ALIGN_UP(offset, cluster_size);
+int64_t aligned_end = QEMU_ALIGN_DOWN(offset + bytes, cluster_size);
+int64_t aligned_bytes;
+
+if (aligned_end <= aligned_offset) {
+return 0;
+}
+aligned_bytes = aligned_end - aligned_offset;
 
 WITH_QEMU_LOCK_GUARD(&s->lock) {
-bdrv_reset_dirty_bitmap(s->access_bitmap, offset, bytes);
+bdrv_reset_dirty_bitmap(s->access_bitmap, aligned_offset,
+aligned_bytes);
 }
 
-block_copy_reset(s->bcs, offset, bytes);
+block_copy_reset(s->bcs, aligned_offset, aligned_bytes);
 
-return bdrv_co_pdiscard(s->target, offset, bytes);
+return bdrv_co_pdiscard(s->target, aligned_offset, aligned_bytes);
 }
 
 static void GRAPH_RDLOCK cbw_refresh_filename(BlockDriverState *bs)
-- 
2.34.1

[PULL 5/6] qapi: blockdev-backup: add discard-source parameter

Add a parameter that enables discard-after-copy. That is mostly useful
in "push backup with fleecing" scheme, when source is snapshot-access
format driver node, based on copy-before-write filter snapshot-access
API:

[guest]  [snapshot-access] ~~ blockdev-backup ~~> [backup target]
   ||
   | root   | file
   vv
[copy-before-write]
   | |
   | file| target
   v v
[active disk]   [temp.img]

In this case discard-after-copy does two things:

 - discard data in temp.img to save disk space
 - avoid further copy-before-write operation in discarded area

Note that we have to declare WRITE permission on source in
copy-before-write filter, for discard to work. Still we can't take it
unconditionally, as it will break normal backup from RO source. So, we
have to add a parameter and pass it thorough bdrv_open flags.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fiona Ebner 
Tested-by: Fiona Ebner 
Acked-by: Markus Armbruster 
Message-Id: <20240313152822.626493-5-vsement...@yandex-team.ru>
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/backup.c |  5 +++--
 block/block-copy.c |  9 +
 block/copy-before-write.c  | 15 +--
 block/copy-before-write.h  |  1 +
 block/replication.c|  4 ++--
 blockdev.c |  2 +-
 include/block/block-common.h   |  2 ++
 include/block/block-copy.h |  1 +
 include/block/block_int-global-state.h |  2 +-
 qapi/block-core.json   |  4 
 10 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index ec29d6b810..3dd2e229d2 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -356,7 +356,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
   BlockDriverState *target, int64_t speed,
   MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
   BitmapSyncMode bitmap_mode,
-  bool compress,
+  bool compress, bool discard_source,
   const char *filter_node_name,
   BackupPerf *perf,
   BlockdevOnError on_source_error,
@@ -457,7 +457,8 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 goto error;
 }
 
-cbw = bdrv_cbw_append(bs, target, filter_node_name, &bcs, errp);
+cbw = bdrv_cbw_append(bs, target, filter_node_name, discard_source,
+  &bcs, errp);
 if (!cbw) {
 goto error;
 }
diff --git a/block/block-copy.c b/block/block-copy.c
index 8fca2c3698..7e3b378528 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -137,6 +137,7 @@ typedef struct BlockCopyState {
 CoMutex lock;
 int64_t in_flight_bytes;
 BlockCopyMethod method;
+bool discard_source;
 BlockReqList reqs;
 QLIST_HEAD(, BlockCopyCallState) calls;
 /*
@@ -353,6 +354,7 @@ static int64_t 
block_copy_calculate_cluster_size(BlockDriverState *target,
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
  BlockDriverState *copy_bitmap_bs,
  const BdrvDirtyBitmap *bitmap,
+ bool discard_source,
  Error **errp)
 {
 ERRP_GUARD();
@@ -418,6 +420,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
 cluster_size),
 };
 
+s->discard_source = discard_source;
 block_copy_set_copy_opts(s, false, false);
 
 ratelimit_init(&s->rate_limit);
@@ -589,6 +592,12 @@ static coroutine_fn int block_copy_task_entry(AioTask 
*task)
 co_put_to_shres(s->mem, t->req.bytes);
 block_copy_task_end(t, ret);
 
+if (s->discard_source && ret == 0) {
+int64_t nbytes =
+MIN(t->req.offset + t->req.bytes, s->len) - t->req.offset;
+bdrv_co_pdiscard(s->source, t->req.offset, nbytes);
+}
+
 return ret;
 }
 
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index ed2c228da7..cd65524e26 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -44,6 +44,7 @@ typedef struct BDRVCopyBeforeWriteState {
 BdrvChild *target;
 OnCbwError on_cbw_error;
 uint32_t cbw_timeout_ns;
+bool discard_source;
 
 /*
  * @lock: protects access to @access_bitmap, @done_bitmap and
@@ -357,6 +358,8 @@ cbw_child_perm(BlockDriverState *bs, BdrvChild *c, 
BdrvChildRole role,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
 {
+BDRVCopyBeforeWriteState *s = bs->opaque;
+
 if (!(role & BDRV_CHILD_FILTERED)) {
 /*
  * Target child
@@ -381,6 +384,10 @@ cbw_child_perm(BlockDriverState *bs, BdrvChild *c, 
BdrvChildRole role,
  * start

[PULL 4/6] block/copy-before-write: create block_copy bitmap in filter node

Currently block_copy creates copy_bitmap in source node. But that is in
bad relation with .independent_close=true of copy-before-write filter:
source node may be detached and removed before .bdrv_close() handler
called, which should call block_copy_state_free(), which in turn should
remove copy_bitmap.

That's all not ideal: it would be better if internal bitmap of
block-copy object is not attached to any node. But that is not possible
now.

The simplest solution is just create copy_bitmap in filter node, where
anyway two other bitmaps are created.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fiona Ebner 
Tested-by: Fiona Ebner 
Message-Id: <20240313152822.626493-4-vsement...@yandex-team.ru>
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/block-copy.c |   3 +-
 block/copy-before-write.c  |   2 +-
 include/block/block-copy.h |   1 +
 tests/qemu-iotests/257.out | 112 ++---
 4 files changed, 60 insertions(+), 58 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index 9ee3dd7ef5..8fca2c3698 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -351,6 +351,7 @@ static int64_t 
block_copy_calculate_cluster_size(BlockDriverState *target,
 }
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
+ BlockDriverState *copy_bitmap_bs,
  const BdrvDirtyBitmap *bitmap,
  Error **errp)
 {
@@ -367,7 +368,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
 return NULL;
 }
 
-copy_bitmap = bdrv_create_dirty_bitmap(source->bs, cluster_size, NULL,
+copy_bitmap = bdrv_create_dirty_bitmap(copy_bitmap_bs, cluster_size, NULL,
errp);
 if (!copy_bitmap) {
 return NULL;
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 6d89af0b29..ed2c228da7 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -468,7 +468,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, 
int flags,
 ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
  bs->file->bs->supported_zero_flags);
 
-s->bcs = block_copy_state_new(bs->file, s->target, bitmap, errp);
+s->bcs = block_copy_state_new(bs->file, s->target, bs, bitmap, errp);
 if (!s->bcs) {
 error_prepend(errp, "Cannot create block-copy-state: ");
 return -EINVAL;
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 0700953ab8..8b41643bfa 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -25,6 +25,7 @@ typedef struct BlockCopyState BlockCopyState;
 typedef struct BlockCopyCallState BlockCopyCallState;
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
+ BlockDriverState *copy_bitmap_bs,
  const BdrvDirtyBitmap *bitmap,
  Error **errp);
 
diff --git a/tests/qemu-iotests/257.out b/tests/qemu-iotests/257.out
index aa76131ca9..c33dd7f3a9 100644
--- a/tests/qemu-iotests/257.out
+++ b/tests/qemu-iotests/257.out
@@ -120,16 +120,16 @@ write -P0x67 0x3fe 0x2
 "granularity": 65536,
 "persistent": false,
 "recording": false
-  }
-],
-"drive0": [
+  },
   {
 "busy": false,
 "count": 0,
 "granularity": 65536,
 "persistent": false,
 "recording": false
-  },
+  }
+],
+"drive0": [
   {
 "busy": false,
 "count": 458752,
@@ -596,16 +596,16 @@ write -P0x67 0x3fe 0x2
 "granularity": 65536,
 "persistent": false,
 "recording": false
-  }
-],
-"drive0": [
+  },
   {
 "busy": false,
 "count": 0,
 "granularity": 65536,
 "persistent": false,
 "recording": false
-  },
+  }
+],
+"drive0": [
   {
 "busy": false,
 "count": 458752,
@@ -865,16 +865,16 @@ write -P0x67 0x3fe 0x2
 "granularity": 65536,
 "persistent": false,
 "recording": false
-  }
-],
-"drive0": [
+  },
   {
 "busy": false,
 "count": 0,
 "granularity": 65536,
 "persistent": false,
 "recording": false
-  },
+  }
+],
+"drive0": [
   {
 "busy": false,
 "count": 458752,
@@ -1341,16 +1341,16 @@ write -P0x67 0x3fe 0x2
 "granularity": 65536,
 "persistent": false,
 "recording": false
-  }
-],
-"drive0": [
+  },
   {
 "busy": false,
 "count": 0,
 "granularity": 65536,
 "persistent": false,
 "recording": false
-  },
+  }
+],
+"drive0": [
   {
 "busy": false,
 "count": 458752,

Re: [PULL 0/6] Block jobs patches for 2024-04-29


Sorry for too much CC-ing, I've mistakenly added 
--cc-cmd=./scripts/get_maintainer.pl


On 29.04.24 14:51, Vladimir Sementsov-Ogievskiy wrote:

The following changes since commit fd87be1dada5672f877e03c2ca8504458292c479:

   Merge tag 'accel-20240426' of https://github.com/philmd/qemu into staging 
(2024-04-26 15:28:13 -0700)

are available in the Git repository at:

   https://gitlab.com/vsementsov/qemu.git tags/pull-block-jobs-2024-04-29

for you to fetch changes up to 2ca7608c6b8d57fd6347b11af12a0f035263efef:

   iotests: add backup-discard-source (2024-04-29 13:35:30 +0300)


Block jobs patches for 2024-04-29

- backup: discard-source parameter
- blockcommit: Reopen base image as RO after abort


Alexander Ivanov (1):
   blockcommit: Reopen base image as RO after abort

Vladimir Sementsov-Ogievskiy (5):
   block/copy-before-write: fix permission
   block/copy-before-write: support unligned snapshot-discard
   block/copy-before-write: create block_copy bitmap in filter node
   qapi: blockdev-backup: add discard-source parameter
   iotests: add backup-discard-source

  block/backup.c |   5 +-
  block/block-copy.c |  12 +-
  block/copy-before-write.c  |  39 +--
  block/copy-before-write.h  |   1 +
  block/mirror.c |  11 +-
  block/replication.c|   4 +-
  blockdev.c |   2 +-
  include/block/block-common.h   |   2 +
  include/block/block-copy.h |   2 +
  include/block/block_int-global-state.h |   2 +-
  qapi/block-core.json   |   4 +
  tests/qemu-iotests/257.out | 112 +-
  tests/qemu-iotests/tests/backup-discard-source | 152 
+
  tests/qemu-iotests/tests/backup-discard-source.out |   5 +
  14 files changed, 281 insertions(+), 72 deletions(-)
  create mode 100755 tests/qemu-iotests/tests/backup-discard-source
  create mode 100644 tests/qemu-iotests/tests/backup-discard-source.out

Alexander Ivanov (1):
   blockcommit: Reopen base image as RO after abort

Vladimir Sementsov-Ogievskiy (5):
   block/copy-before-write: fix permission
   block/copy-before-write: support unligned snapshot-discard
   block/copy-before-write: create block_copy bitmap in filter node
   qapi: blockdev-backup: add discard-source parameter
   iotests: add backup-discard-source

  block/backup.c|   5 +-
  block/block-copy.c|  12 +-
  block/copy-before-write.c |  39 -
  block/copy-before-write.h |   1 +
  block/mirror.c|  11 +-
  block/replication.c   |   4 +-
  blockdev.c|   2 +-
  include/block/block-common.h  |   2 +
  include/block/block-copy.h|   2 +
  include/block/block_int-global-state.h|   2 +-
  qapi/block-core.json  |   4 +
  tests/qemu-iotests/257.out| 112 ++---
  .../qemu-iotests/tests/backup-discard-source  | 152 ++
  .../tests/backup-discard-source.out   |   5 +
  14 files changed, 281 insertions(+), 72 deletions(-)
  create mode 100755 tests/qemu-iotests/tests/backup-discard-source
  create mode 100644 tests/qemu-iotests/tests/backup-discard-source.out



--
Best regards,
Vladimir

Re: [PATCH v2 00/12] exec: Rework around CPUState user fields


On 28/4/24 23:49, Philippe Mathieu-Daudé wrote:


Philippe Mathieu-Daudé (12):
   plugins: Update stale comment
   plugins/api: Only include 'exec/ram_addr.h' with system emulation
   exec: Include missing license in 'exec/cpu-common.h'
   exec/cpu: Indent TARGET_PAGE_foo definitions
   exec/cpu: Remove obsolete PAGE_RESERVED definition
   exec/cpu: Remove duplicated PAGE_PASSTHROUGH definition
   exec/cpu: Extract page-protection definitions to page-protection.h
   accel/tcg: Use cpu_loop_exit_requested() in cpu_loop_exec_tb()
   accel/tcg: Restrict cpu_loop_exit_requested() to TCG
   accel/tcg: Remove pointless initialization of cflags_next_tb
   accel/tcg: Reset TCG specific fields in tcg_cpu_reset_hold()
   accel/tcg: Access tcg_cflags with getter / setter


Thanks, series queued.

Re: [PULL 6/9] backends/cryptodev-builtin: Fix local_error leaks


On 4/28/24 23:45, Michael Tokarev wrote:

From: Li Zhijian via 


Please fix the author on this patch.

r~



It seems that this error does not need to be propagated to the upper,
directly output the error to avoid the leaks

Closes: https://gitlab.com/qemu-project/qemu/-/issues/2283
Fixes: 2fda101de07 ("virtio-crypto: Support asynchronous mode")
Signed-off-by: Li Zhijian 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: zhenwei pi 
Reviewed-by: Michael Tokarev 
Signed-off-by: Michael Tokarev 
---
  backends/cryptodev-builtin.c | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/backends/cryptodev-builtin.c b/backends/cryptodev-builtin.c
index a514bbb310..940104ee55 100644
--- a/backends/cryptodev-builtin.c
+++ b/backends/cryptodev-builtin.c
@@ -23,6 +23,7 @@
  
  #include "qemu/osdep.h"

  #include "sysemu/cryptodev.h"
+#include "qemu/error-report.h"
  #include "qapi/error.h"
  #include "standard-headers/linux/virtio_crypto.h"
  #include "crypto/cipher.h"
@@ -396,8 +397,8 @@ static int cryptodev_builtin_create_session(
  case VIRTIO_CRYPTO_HASH_CREATE_SESSION:
  case VIRTIO_CRYPTO_MAC_CREATE_SESSION:
  default:
-error_setg(&local_error, "Unsupported opcode :%" PRIu32 "",
-   sess_info->op_code);
+error_report("Unsupported opcode :%" PRIu32 "",
+ sess_info->op_code);
  return -VIRTIO_CRYPTO_NOTSUPP;
  }
  
@@ -554,8 +555,8 @@ static int cryptodev_builtin_operation(
  
  if (op_info->session_id >= MAX_NUM_SESSIONS ||

builtin->sessions[op_info->session_id] == NULL) {
-error_setg(&local_error, "Cannot find a valid session id: %" PRIu64 "",
-   op_info->session_id);
+error_report("Cannot find a valid session id: %" PRIu64 "",
+ op_info->session_id);
  return -VIRTIO_CRYPTO_INVSESS;
  }

Re: [PATCH v6 06/10] util/bufferiszero: Improve scalar variant


On 25/4/24 00:57, Richard Henderson wrote:

Split less-than and greater-than 256 cases.
Use unaligned accesses for head and tail.
Avoid using out-of-bounds pointers in loop boundary conditions.

Signed-off-by: Richard Henderson 
---
  util/bufferiszero.c | 85 +++--
  1 file changed, 51 insertions(+), 34 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 02df82b4ff..c9a7ded016 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -28,40 +28,57 @@
  
  static bool (*buffer_is_zero_accel)(const void *, size_t);
  
-static bool buffer_is_zero_integer(const void *buf, size_t len)

+static bool buffer_is_zero_int_lt256(const void *buf, size_t len)
  {
-if (unlikely(len < 8)) {
-/* For a very small buffer, simply accumulate all the bytes.  */
-const unsigned char *p = buf;
-const unsigned char *e = buf + len;
-unsigned char t = 0;
+uint64_t t;
+const uint64_t *p, *e;
  
-do {

-t |= *p++;
-} while (p < e);
-
-return t == 0;
-} else {
-/* Otherwise, use the unaligned memory access functions to
-   handle the beginning and end of the buffer, with a couple
-   of loops handling the middle aligned section.  */
-uint64_t t = ldq_he_p(buf);
-const uint64_t *p = (uint64_t *)(((uintptr_t)buf + 8) & -8);
-const uint64_t *e = (uint64_t *)(((uintptr_t)buf + len) & -8);
-
-for (; p + 8 <= e; p += 8) {
-if (t) {
-return false;
-}
-t = p[0] | p[1] | p[2] | p[3] | p[4] | p[5] | p[6] | p[7];
-}
-while (p < e) {
-t |= *p++;
-}
-t |= ldq_he_p(buf + len - 8);
-
-return t == 0;
+/*
+ * Use unaligned memory access functions to handle
+ * the beginning and end of the buffer.
+ */
+if (unlikely(len <= 8)) {
+return (ldl_he_p(buf) | ldl_he_p(buf + len - 4)) == 0;
  }
+
+t = ldq_he_p(buf) | ldq_he_p(buf + len - 8);


Here we read #0 and #31, ...


+p = QEMU_ALIGN_PTR_DOWN(buf + 8, 8);
+e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 8);
+
+/* Read 0 to 31 aligned words from the middle. */


... so here is #1 to #30?


+while (p < e) {
+t |= *p++;
+}
+return t == 0;
+}


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v3 4/5] qapi: introduce device-sync-config


On 29.04.24 13:51, Markus Armbruster wrote:

Vladimir Sementsov-Ogievskiy  writes:


On 24.04.24 14:48, Markus Armbruster wrote:

Vladimir Sementsov-Ogievskiy  writes:


Add command to sync config from vhost-user backend to the device. It
may be helpful when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG failed or not
triggered interrupt to the guest or just not available (not supported
by vhost-user server).

Command result is racy if allow it during migration. Let's allow the
sync only in RUNNING state.

Signed-off-by: Vladimir Sementsov-Ogievskiy 


[...]


diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 0117d243c4..296af52322 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -5,6 +5,7 @@
   #include "qemu/notify.h"
   
   bool runstate_check(RunState state);

+const char *current_run_state_str(void);
   void runstate_set(RunState new_state);
   RunState runstate_get(void);
   bool runstate_is_running(void);
diff --git a/qapi/qdev.json b/qapi/qdev.json
index facaa0bc6a..e8be79c3d5 100644
--- a/qapi/qdev.json
+++ b/qapi/qdev.json
@@ -161,3 +161,24 @@
   ##
   { 'event': 'DEVICE_UNPLUG_GUEST_ERROR',
 'data': { '*device': 'str', 'path': 'str' } }
+
+##
+# @device-sync-config:
+#
+# Synchronize config from backend to the guest. The command notifies
+# re-read the device config from the backend and notifies the guest
+# to re-read the config. The command may be used to notify the guest
+# about block device capcity change. Currently only vhost-user-blk
+# device supports this.


I'm not sure I understand this.  To work towards an understanding, I
rephrase it, and you point out the errors.

   Synchronize device configuration from host to guest part.  First,
   copy the configuration from the host part (backend) to the guest
   part (frontend).  Then notify guest software that device
   configuration changed.


Correct, thanks


Perhaps

   Synchronize guest-visible device configuration with the backend's
   configuration, and notify guest software that device configuration
   changed.

   This may be useful to notify the guest of a block device capacity
   change.  Currenrly, only vhost-user-blk devices support this.


Sounds good



Next question: what happens when the device *doesn't* support this?


An error "device-sync-config is not supported ..."




I wonder how configuration can get out of sync.  Can you explain?



The example (and the original feature, which triggered developing this) is 
vhost disk resize. If vhost-server (backend) doesn't support 
VHOST_USER_SLAVE_CONFIG_CHANGE_MSG, neither QEMU nor guest will know that disk 
capacity changed.


Sounds like we wouldn't need this command if we could make the
vhost-server support VHOST_USER_SLAVE_CONFIG_CHANGE_MSG.  Is making it
support it impractical?  Or are there other uses for this command?


Qemu's internal vhost-server do support it. But that's not the only vhost-user server) So 
the command is useful for those servers which doesn't support 
VHOST_USER_SLAVE_CONFIG_CHANGE_MSG. Note, that this message requires setting up additional 
channel of server -> client communication. That was the reason, why the 
"change-msg" solution was rejected in our downstream: it's safer to reuse existing 
channel (QMP), than to add and support an additional channel.

Also, the command may help to debug the system, when 
VHOST_USER_SLAVE_CONFIG_CHANGE_MSG doesn't work for some reason.




+#
+# @id: the device's ID or QOM path
+#
+# Features:
+#
+# @unstable: The command is experimental.
+#
+# Since: 9.1
+##
+{ 'command': 'device-sync-config',
+  'features': [ 'unstable' ],
+  'data': {'id': 'str'} }
diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index 7e075d91c1..cb35ea0b86 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -23,6 +23,7 @@
  #include "monitor/monitor.h"
  #include "monitor/qdev.h"
  #include "sysemu/arch_init.h"
+#include "sysemu/runstate.h"
  #include "qapi/error.h"
  #include "qapi/qapi-commands-qdev.h"
  #include "qapi/qmp/dispatch.h"
@@ -969,6 +970,52 @@ void qmp_device_del(const char *id, Error **errp)
   }
   }
   
+int qdev_sync_config(DeviceState *dev, Error **errp)

+{
+DeviceClass *dc = DEVICE_GET_CLASS(dev);
+
+if (!dc->sync_config) {
+error_setg(errp, "device-sync-config is not supported for '%s'",
+   object_get_typename(OBJECT(dev)));
+return -ENOTSUP;
+}
+
+return dc->sync_config(dev, errp);
+}
+
+void qmp_device_sync_config(const char *id, Error **errp)
+{
+DeviceState *dev;
+
+/*
+ * During migration there is a race between syncing`config and
+ * migrating it, so let's just not allow it.


Can you briefly explain the race?


If at the moment of qmp command, corresponding config already migrated to the 
target, we'll change only the config on source, but on the target we'll still 
have outdated config.


For RAM, dirty tracking ensures the change gets sent.  But this is
device mem

Re: [PATCH v4 1/1] hw/arm/sbsa-ref: Enable CPU cluster on ARM sbsa machine


On 4/28/24 23:35, Marcin Juszkiewicz wrote:

W dniu 26.04.2024 o 18:06, Richard Henderson pisze:


Isn't this basically what MPIDR_EL1 is supposed to indicate?
We do not yet implement all of that in QEMU, but should.


QEMU has socket/cluster/core/thread model which could map to
aff3/aff2/aff1/aff0 (or aff0/1/2/3) of MPIDR_EL1 register, right? But it does 
not.


Yes, I know, but there's patches on list that started to deal with MPIDR.SMT,

  
https://lore.kernel.org/qemu-devel/20240419183135.12276-1-dorjoychy...@gmail.com/

and I suggested that we go whole hog and support all of the -smp options.


r~

Re: [PATCH 0/2] accel: Fix NULL deref in NVMM / WHPX vCPU init


On 4/29/24 02:19, Philippe Mathieu-Daudé wrote:

Philippe Mathieu-Daudé (2):
   accel/whpx: Fix NULL dereference in whpx_init_vcpu()
   accel/nvmm: Fix NULL dereference in nvmm_init_vcpu()


Reviewed-by: Richard Henderson 

r~

[PULL v2 0/9] Trivial patches for 2024-04-29

2024-04-29 Thread Michael Tokarev

The following changes since commit fd87be1dada5672f877e03c2ca8504458292c479:

  Merge tag 'accel-20240426' of https://github.com/philmd/qemu into staging 
(2024-04-26 15:28:13 -0700)

are available in the Git repository at:

  https://gitlab.com/mjt0k/qemu.git tags/pull-trivial-patches

for you to fetch changes up to ce1992d45c875c29a9018b7ac2fa9bad6587c711:

  checkpatch.pl: forbid strerrorname_np() (2024-04-29 15:26:56 +0300)


trivial patches for 2024-04-29

v2: fix author of "backends/cryptodev-builtin: Fix local_error leaks"
(which should be catched now by checkpatch.pl additions in this series)
Only patch 6 is being resent.

Daniel Henrique Barboza (2):
  target/riscv/kvm: remove sneaky strerrorname_np() instance
  checkpatch.pl: forbid strerrorname_np()

Li Zhijian (1):
  backends/cryptodev-builtin: Fix local_error leaks

Michael Tokarev (1):
  target/loongarch/cpu.c: typo fix: expection

Philippe Mathieu-Daudé (2):
  scripts/checkpatch: Avoid author email mangled by qemu-*@nongnu.org
  scripts/checkpatch: Do not use mailmap

Thomas Huth (3):
  target/i386/cpu: Remove "x86" prefix from the CPU list
  target/s390x/cpu_models: Rework the output of "-cpu help"
  target/ppc/cpu_init: Remove "PowerPC" prefix from the CPU list

 backends/cryptodev-builtin.c |  9 +
 scripts/checkpatch.pl| 11 +++
 target/i386/cpu.c|  2 +-
 target/loongarch/cpu.c   |  2 +-
 target/ppc/cpu_init.c|  9 +
 target/riscv/kvm/kvm-cpu.c   |  4 ++--
 target/s390x/cpu_models.c|  9 +
 7 files changed, 26 insertions(+), 20 deletions(-)

[PULL v2 6/9] backends/cryptodev-builtin: Fix local_error leaks

2024-04-29 Thread Michael Tokarev

From: Li Zhijian 

It seems that this error does not need to be propagated to the upper,
directly output the error to avoid the leaks

Closes: https://gitlab.com/qemu-project/qemu/-/issues/2283
Fixes: 2fda101de07 ("virtio-crypto: Support asynchronous mode")
Signed-off-by: Li Zhijian 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: zhenwei pi 
Reviewed-by: Michael Tokarev 
Signed-off-by: Michael Tokarev 
---
 backends/cryptodev-builtin.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/backends/cryptodev-builtin.c b/backends/cryptodev-builtin.c
index a514bbb310..940104ee55 100644
--- a/backends/cryptodev-builtin.c
+++ b/backends/cryptodev-builtin.c
@@ -23,6 +23,7 @@
 
 #include "qemu/osdep.h"
 #include "sysemu/cryptodev.h"
+#include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "standard-headers/linux/virtio_crypto.h"
 #include "crypto/cipher.h"
@@ -396,8 +397,8 @@ static int cryptodev_builtin_create_session(
 case VIRTIO_CRYPTO_HASH_CREATE_SESSION:
 case VIRTIO_CRYPTO_MAC_CREATE_SESSION:
 default:
-error_setg(&local_error, "Unsupported opcode :%" PRIu32 "",
-   sess_info->op_code);
+error_report("Unsupported opcode :%" PRIu32 "",
+ sess_info->op_code);
 return -VIRTIO_CRYPTO_NOTSUPP;
 }
 
@@ -554,8 +555,8 @@ static int cryptodev_builtin_operation(
 
 if (op_info->session_id >= MAX_NUM_SESSIONS ||
   builtin->sessions[op_info->session_id] == NULL) {
-error_setg(&local_error, "Cannot find a valid session id: %" PRIu64 "",
-   op_info->session_id);
+error_report("Cannot find a valid session id: %" PRIu64 "",
+ op_info->session_id);
 return -VIRTIO_CRYPTO_INVSESS;
 }
 
-- 
2.39.2

Re: [PATCH v6 06/10] util/bufferiszero: Improve scalar variant


On 4/29/24 05:18, Philippe Mathieu-Daudé wrote:

+
+    t = ldq_he_p(buf) | ldq_he_p(buf + len - 8);


Here we read #0 and #31, ...


+    p = QEMU_ALIGN_PTR_DOWN(buf + 8, 8);
+    e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 8);
+
+    /* Read 0 to 31 aligned words from the middle. */


... so here is #1 to #30?


Not indexes, but count.  There may be zero words remaining in the middle, etc.


r~

Re: [PATCH v3] vfio/pci: migration: Skip config space check for Vendor Specific Information in VSC during restore/load

2024-04-29 Thread Cédric Le Goater


Hello Vinayak,

On 3/28/24 10:30, Cédric Le Goater wrote:

On 3/27/24 21:52, Alex Williamson wrote:

On Wed, 27 Mar 2024 16:11:37 -0400
"Michael S. Tsirkin"  wrote:


On Wed, Mar 27, 2024 at 11:39:15AM -0600, Alex Williamson wrote:

On Fri, 22 Mar 2024 12:12:10 +0530
Vinayak Kale  wrote:

In case of migration, during restore operation, qemu checks config space of the
pci device with the config space in the migration stream captured during save
operation. In case of config space data mismatch, restore operation is failed.

config space check is done in function get_pci_config_device(). By default VSC
(vendor-specific-capability) in config space is checked.

Due to qemu's config space check for VSC, live migration is broken across NVIDIA
vGPU devices in situation where source and destination host driver is different.
In this situation, Vendor Specific Information in VSC varies on the destination
to ensure vGPU feature capabilities exposed to the guest driver are compatible
with destination host.

If a vfio-pci device is migration capable and vfio-pci vendor driver is OK with
volatile Vendor Specific Info in VSC then qemu should exempt config space check
for Vendor Specific Info. It is vendor driver's responsibility to ensure that
VSC is consistent across migration. Here consistency could mean that VSC format
should be same on source and destination, however actual Vendor Specific Info
may not be byte-to-byte identical.

This patch skips the check for Vendor Specific Information in VSC for VFIO-PCI
device by clearing pdev->cmask[] offsets. Config space check is still enforced
for 3 byte VSC header. If cmask[] is not set for an offset, then qemu skips
config space check for that offset.

Signed-off-by: Vinayak Kale 
---
Version History
v2->v3:
 - Config space check skipped only for Vendor Specific Info in VSC, check is
   still enforced for 3 byte VSC header.
 - Updated commit description with live migration failure scenario.
v1->v2:
 - Limited scope of change to vfio-pci devices instead of all pci devices.

  hw/vfio/pci.c | 24 
  1 file changed, 24 insertions(+)



Acked-by: Alex Williamson 



A very reasonable way to do it.

Reviewed-by: Michael S. Tsirkin 

Merge through the VFIO tree I presume?


Yep, Cédric said he´d grab it for 9.1.  Thanks,


Could you please resend an update of this change adding a machine
compatibility property for migration ?

Thanks,

C.

Re: [PATCH v6 09/10] util/bufferiszero: Add simd acceleration for aarch64


On 25/4/24 00:57, Richard Henderson wrote:

Because non-embedded aarch64 is expected to have AdvSIMD enabled, merely
double-check with the compiler flags for __ARM_NEON and don't bother with
a runtime check.  Otherwise, model the loop after the x86 SSE2 function.

Use UMAXV for the vector reduction.  This is 3 cycles on cortex-a76 and
2 cycles on neoverse-n1.

Signed-off-by: Richard Henderson 
---
  util/bufferiszero.c | 77 +
  1 file changed, 77 insertions(+)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index ff003dc40e..38477a3eac 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -213,7 +213,84 @@ bool test_buffer_is_zero_next_accel(void)
  }
  return false;
  }
+
+#elif defined(__aarch64__) && defined(__ARM_NEON)
+#include 
+


Maybe use the same SSE comment:

/*
 * Helper for preventing the compiler from reassociating
 * chains of binary vector operations.
 */


+#define REASSOC_BARRIER(vec0, vec1) asm("" : "+w"(vec0), "+w"(vec1))




+static unsigned accel_index = 1;
+#define INIT_ACCEL buffer_is_zero_simd
+
+bool test_buffer_is_zero_next_accel(void)
+{
+if (accel_index != 0) {
+buffer_is_zero_accel = accel_table[--accel_index];
+return true;
+}
+return false;
+}


Alternatively we could initialize accel_index in
__attribute__((constructor)) init_accel(void) and keep
a single test_buffer_is_zero_next_accel(), squashing:

-- >8 --
diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 38477a3eac..afb3822251 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -82,6 +82,17 @@ static bool buffer_is_zero_int_ge256(const void *buf, 
size_t len)

 return t == 0;
 }

+static unsigned accel_index;
+
+bool test_buffer_is_zero_next_accel(void)
+{
+if (accel_index != 0) {
+buffer_is_zero_accel = accel_table[--accel_index];
+return true;
+}
+return false;
+}
+
 #if defined(CONFIG_AVX2_OPT) || defined(__SSE2__)
 #include 

@@ -186,7 +197,6 @@ static biz_accel_fn const accel_table[] = {
 buffer_zero_avx2,
 #endif
 };
-static unsigned accel_index;

 static void __attribute__((constructor)) init_accel(void)
 {
@@ -205,15 +215,6 @@ static void __attribute__((constructor)) 
init_accel(void)


 #define INIT_ACCEL NULL

-bool test_buffer_is_zero_next_accel(void)
-{
-if (accel_index != 0) {
-buffer_is_zero_accel = accel_table[--accel_index];
-return true;
-}
-return false;
-}
-
 #elif defined(__aarch64__) && defined(__ARM_NEON)
 #include 

@@ -277,25 +278,15 @@ static biz_accel_fn const accel_table[] = {
 buffer_is_zero_simd,
 };

-static unsigned accel_index = 1;
 #define INIT_ACCEL buffer_is_zero_simd

-bool test_buffer_is_zero_next_accel(void)
+static void __attribute__((constructor)) init_accel(void)
 {
-if (accel_index != 0) {
-buffer_is_zero_accel = accel_table[--accel_index];
-return true;
-}
-return false;
+accel_index = 1;
 }

 #else

-bool test_buffer_is_zero_next_accel(void)
-{
-return false;
-}
-
 #define INIT_ACCEL buffer_is_zero_int_ge256
 #endif

---

Or clearer in 2 patches, unifying test_buffer_is_zero_next_accel()
first:

-- >8 --
diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index ff003dc40e..b4da9d5297 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -82,6 +82,17 @@ static bool buffer_is_zero_int_ge256(const void *buf, 
size_t len)

 return t == 0;
 }

+static unsigned accel_index;
+
+bool test_buffer_is_zero_next_accel(void)
+{
+if (accel_index != 0) {
+buffer_is_zero_accel = accel_table[--accel_index];
+return true;
+}
+return false;
+}
+
 #if defined(CONFIG_AVX2_OPT) || defined(__SSE2__)
 #include 

@@ -186,7 +197,6 @@ static biz_accel_fn const accel_table[] = {
 buffer_zero_avx2,
 #endif
 };
-static unsigned accel_index;

 static void __attribute__((constructor)) init_accel(void)
 {
@@ -205,19 +215,7 @@ static void __attribute__((constructor)) 
init_accel(void)


 #define INIT_ACCEL NULL

-bool test_buffer_is_zero_next_accel(void)
-{
-if (accel_index != 0) {
-buffer_is_zero_accel = accel_table[--accel_index];
-return true;
-}
-return false;
-}
 #else
-bool test_buffer_is_zero_next_accel(void)
-{
-return false;
-}

 #define INIT_ACCEL buffer_is_zero_int_ge256
 #endif

---

Then this patch becomes:

-- >8 --
diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index b4da9d5297..afb3822251 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -215,6 +215,76 @@ static void __attribute__((constructor)) 
init_accel(void)


 #define INIT_ACCEL NULL

+#elif defined(__aarch64__) && defined(__ARM_NEON)
+#include 
+
+#define REASSOC_BARRIER(vec0, vec1) asm("" : "+w"(vec0), "+w"(vec1))
+
+static bool buffer_is_zero_simd(const void *buf, size_t len)
+{
+uint32x4_t t0, t1, t2, t3;
+
+/* Align head/tail to 16-byte boundaries.  */
+const uint32x4_t *p = QEMU_ALIGN_PTR_DOWN(buf + 16, 16

Re: [PATCH 1/3] target/ppc: Move VMX storage access instructions to decodetree


On 4/28/24 22:13, Chinmay Rath wrote:

+static bool trans_LVX(DisasContext *ctx, arg_X *a)
+{
+TCGv EA;
+TCGv_i64 avr;
+REQUIRE_INSNS_FLAGS(ctx, ALTIVEC);
+REQUIRE_VECTOR(ctx);
+gen_set_access_type(ctx, ACCESS_INT);
+avr = tcg_temp_new_i64();
+EA = do_ea_calc(ctx, a->ra, cpu_gpr[a->rb]);
+tcg_gen_andi_tl(EA, EA, ~0xf);
+/*
+ * We only need to swap high and low halves. gen_qemu_ld64_i64
+ * does necessary 64-bit byteswap already.
+ */
+gen_qemu_ld64_i64(ctx, avr, EA);
+set_avr64(a->rt, avr, !ctx->le_mode);
+tcg_gen_addi_tl(EA, EA, 8);
+gen_qemu_ld64_i64(ctx, avr, EA);
+set_avr64(a->rt, avr, ctx->le_mode);
+return true;
+}


This is an accurate transcription of the current code, so,

Reviewed-by: Richard Henderson 

but at some point this should use tcg_gen_qemu_ld_i128, with the proper 
atomicity flags.


r~

Re: [PATCH 2/3] target/ppc: Move VMX integer logical instructions to decodetree.


On 4/28/24 22:13, Chinmay Rath wrote:

Moving the following instructions to decodetree specification:

v{and, andc, nand, or, orc, nor, xor, eqv}  : VX-form

The changes were verified by validating that the tcp ops generated by those
instructions remain the same, which were captured with the '-d in_asm,op' flag.

Signed-off-by: Chinmay Rath
---
  target/ppc/insn32.decode| 11 +++
  target/ppc/translate/vmx-impl.c.inc | 22 ++
  target/ppc/translate/vmx-ops.c.inc  | 15 ---
  3 files changed, 21 insertions(+), 27 deletions(-)


Reviewed-by: Richard Henderson 

r~

Re: [PATCH 3/3] target/ppc: Move VMX integer max/min instructions to decodetree.


On 4/28/24 22:13, Chinmay Rath wrote:

Moving the following instructions to decodetree specification :

v{max, min}{u, s}{b, h, w, d}   : VX-form

The changes were verified by validating that the tcg ops generated by those
instructions remain the same, which were captured with the '-d in_asm,op' flag.

Signed-off-by: Chinmay Rath
---
  target/ppc/insn32.decode| 22 +
  target/ppc/translate/vmx-impl.c.inc | 37 -
  target/ppc/translate/vmx-ops.c.inc  | 16 -
  3 files changed, 43 insertions(+), 32 deletions(-)


Reviewed-by: Richard Henderson 

r~

[RFC] vhost-blk: add a vhost_blk implementation

2024-04-29 Thread inspurisinspur

From: Rock Li 

vhost-blk could accelerate io for high-performance devices such as NVME,
or high end SAN backend. The main benefit is that syscall costs are eliminated
because vhost-blk could bypass the qemu io processing. This patch works along
with vhost-blk implementation in kernel.

Signed-off-by: lihongweizz 
---
 hw/block/virtio-blk.c  | 406 -
 include/hw/virtio/virtio-blk.h |  20 ++
 2 files changed, 425 insertions(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index bb86e65f65..749e6cde48 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -36,6 +36,27 @@
 #include "hw/virtio/virtio-access.h"
 #include "hw/virtio/virtio-blk-common.h"
 #include "qemu/coroutine.h"
+#include "hw/virtio/virtio.h"
+#include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-backend.h"
+#include 
+#include 
+
+static const int kernel_feature_bits[] = {
+VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_RING_F_INDIRECT_DESC,
+VIRTIO_RING_F_EVENT_IDX,
+VIRTIO_F_VERSION_1,
+VIRTIO_BLK_F_SEG_MAX,
+VIRTIO_BLK_F_GEOMETRY,
+VIRTIO_BLK_F_TOPOLOGY,
+VIRTIO_BLK_F_BLK_SIZE,
+VIRTIO_BLK_F_MQ,
+VIRTIO_BLK_F_RO,
+/* VIRTIO_BLK_F_FLUSH == VIRTIO_BLK_F_WCE  */
+VIRTIO_BLK_F_FLUSH,
+VHOST_INVALID_FEATURE_BIT
+};
 
 static void virtio_blk_ioeventfd_attach(VirtIOBlock *s);
 
@@ -1145,7 +1166,8 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, 
VirtQueue *vq)
 {
 VirtIOBlock *s = (VirtIOBlock *)vdev;
 
-if (!s->ioeventfd_disabled && !s->ioeventfd_started) {
+if ((!s->ioeventfd_disabled && !s->ioeventfd_started) ||
+s->vhost_enabled) {
 /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
  * ioeventfd here instead of waiting for .set_status().
  */
@@ -1236,11 +1258,41 @@ static void virtio_blk_dma_restart_cb(void *opaque, 
bool running,
 }
 }
 
+static void vhost_blk_set_status(VirtIODevice *vdev, uint8_t status);
+
+static void vhost_blk_reset(VirtIODevice *vdev)
+{
+VirtIOBlock *s = VIRTIO_BLK(vdev);
+int i;
+
+if (!s->vhost_enabled || !s->vhost_acked) {
+return;
+}
+
+/*
+ * clear the acked features, and wait for the
+ * next negotiation.
+ */
+for (i = 0; i < s->conf.num_queues; i++) {
+s->vhblk[i].dev.acked_features = 0;
+}
+
+vhost_blk_set_status(vdev, 0);
+
+s->vhost_acked = false;
+}
+
 static void virtio_blk_reset(VirtIODevice *vdev)
 {
 VirtIOBlock *s = VIRTIO_BLK(vdev);
 VirtIOBlockReq *req;
 
+if (s->vhost_enabled) {
+vhost_blk_reset(vdev);
+blk_set_enable_write_cache(s->blk, s->original_wce);
+return;
+}
+
 /* Dataplane has stopped... */
 assert(!s->ioeventfd_started);
 
@@ -1367,6 +1419,228 @@ static void virtio_blk_set_config(VirtIODevice *vdev, 
const uint8_t *config)
 blk_set_enable_write_cache(s->blk, blkcfg.wce != 0);
 }
 
+static void vhost_blk_stop_one(VirtIODevice *vdev, unsigned int idx)
+{
+VirtIOBlock *s = VIRTIO_BLK(vdev);
+struct vhost_vring_file backend = { .index = 0, .fd = -1 };
+vhost_blk *blk = &s->vhblk[idx];
+int ret;
+
+if (!blk->dev.started) {
+return;
+}
+
+ret = ioctl(blk->vhostfd, VHOST_BLK_SET_BACKEND, &backend);
+assert(ret >= 0);
+
+vhost_dev_stop(&blk->dev, vdev);
+vhost_dev_disable_notifiers(&blk->dev, vdev);
+}
+
+static void vhost_blk_start_one(VirtIODevice *vdev, unsigned int idx)
+{
+VirtIOBlock *s = VIRTIO_BLK(vdev);
+struct vhost_vring_file backend = { .index = 0 };
+vhost_blk *blk = &s->vhblk[idx];
+int ret;
+
+blk->dev.nvqs = 1;
+blk->dev.vqs = blk->vqs;
+
+ret = vhost_dev_enable_notifiers(&blk->dev, vdev);
+if (ret < 0) {
+error_report("Error enabling host notifiers: %d", -ret);
+abort();
+}
+
+ret = vhost_dev_start(&blk->dev, vdev);
+if (ret < 0) {
+error_report("Error starting vhost: %d", -ret);
+abort();
+}
+
+/* set backend file  */
+backend.fd = s->blkfd;
+ret = ioctl(blk->vhostfd, VHOST_BLK_SET_BACKEND, &backend);
+if (ret < 0) {
+error_report("Error setting up vq %d backend(%d): fd %d, idx %d",
+ idx, -errno, backend.fd, backend.index);
+abort();
+}
+}
+
+static void vhost_blk_start(VirtIODevice *vdev, uint32_t total_queues)
+{
+BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
+VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+int i, ret;
+
+if (!k->set_guest_notifiers) {
+error_report("Binding does not support guest notifiers");
+abort();
+}
+
+ret = k->set_guest_notifiers(qbus->parent, total_queues, true);
+if (ret < 0) {
+error_report("Error binding guest notifier: %d", -ret);
+abort();
+}
+
+for (i = 0; i < total_queues; i++) {
+vhost_blk_start_one(vdev, i);
+}
+}
+
+static void vhost_blk_stop(VirtIODevic

Re: [PATCH v3 4/5] qapi: introduce device-sync-config

Vladimir Sementsov-Ogievskiy  writes:

> On 29.04.24 13:51, Markus Armbruster wrote:
>> Vladimir Sementsov-Ogievskiy  writes:
>> 
>>> On 24.04.24 14:48, Markus Armbruster wrote:
 Vladimir Sementsov-Ogievskiy  writes:

> Add command to sync config from vhost-user backend to the device. It
> may be helpful when VHOST_USER_SLAVE_CONFIG_CHANGE_MSG failed or not
> triggered interrupt to the guest or just not available (not supported
> by vhost-user server).
>
> Command result is racy if allow it during migration. Let's allow the
> sync only in RUNNING state.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> 
>> [...]
>> 
> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
> index 0117d243c4..296af52322 100644
> --- a/include/sysemu/runstate.h
> +++ b/include/sysemu/runstate.h
> @@ -5,6 +5,7 @@
>#include "qemu/notify.h"
>
>bool runstate_check(RunState state);
> +const char *current_run_state_str(void);
>void runstate_set(RunState new_state);
>RunState runstate_get(void);
>bool runstate_is_running(void);
> diff --git a/qapi/qdev.json b/qapi/qdev.json
> index facaa0bc6a..e8be79c3d5 100644
> --- a/qapi/qdev.json
> +++ b/qapi/qdev.json
> @@ -161,3 +161,24 @@
>##
>{ 'event': 'DEVICE_UNPLUG_GUEST_ERROR',
>  'data': { '*device': 'str', 'path': 'str' } }
> +
> +##
> +# @device-sync-config:
> +#
> +# Synchronize config from backend to the guest. The command notifies
> +# re-read the device config from the backend and notifies the guest
> +# to re-read the config. The command may be used to notify the guest
> +# about block device capcity change. Currently only vhost-user-blk
> +# device supports this.

 I'm not sure I understand this.  To work towards an understanding, I
 rephrase it, and you point out the errors.

Synchronize device configuration from host to guest part.  First,
copy the configuration from the host part (backend) to the guest
part (frontend).  Then notify guest software that device
configuration changed.
>>>
>>> Correct, thanks
>> 
>> Perhaps
>> 
>>Synchronize guest-visible device configuration with the backend's
>>configuration, and notify guest software that device configuration
>>changed.
>> 
>>This may be useful to notify the guest of a block device capacity
>>change.  Currenrly, only vhost-user-blk devices support this.
>
> Sounds good

Except I fat-fingered "Currently".

>> 
>> Next question: what happens when the device *doesn't* support this?
>
> An error "device-sync-config is not supported ..."

Okay.

 I wonder how configuration can get out of sync.  Can you explain?

>>>
>>> The example (and the original feature, which triggered developing this) is 
>>> vhost disk resize. If vhost-server (backend) doesn't support 
>>> VHOST_USER_SLAVE_CONFIG_CHANGE_MSG, neither QEMU nor guest will know that 
>>> disk capacity changed.
>> 
>> Sounds like we wouldn't need this command if we could make the
>> vhost-server support VHOST_USER_SLAVE_CONFIG_CHANGE_MSG.  Is making it
>> support it impractical?  Or are there other uses for this command?
>
> Qemu's internal vhost-server do support it. But that's not the only 
> vhost-user server) So the command is useful for those servers which doesn't 
> support VHOST_USER_SLAVE_CONFIG_CHANGE_MSG. Note, that this message requires 
> setting up additional channel of server -> client communication. That was the 
> reason, why the "change-msg" solution was rejected in our downstream: it's 
> safer to reuse existing channel (QMP), than to add and support an additional 
> channel.
>
> Also, the command may help to debug the system, when 
> VHOST_USER_SLAVE_CONFIG_CHANGE_MSG doesn't work for some reason.

Suggest to work this into the commit message.

> +#
> +# @id: the device's ID or QOM path
> +#
> +# Features:
> +#
> +# @unstable: The command is experimental.
> +#
> +# Since: 9.1
> +##
> +{ 'command': 'device-sync-config',
> +  'features': [ 'unstable' ],
> +  'data': {'id': 'str'} }
> diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
> index 7e075d91c1..cb35ea0b86 100644
> --- a/system/qdev-monitor.c
> +++ b/system/qdev-monitor.c
> @@ -23,6 +23,7 @@
>   #include "monitor/monitor.h"
>   #include "monitor/qdev.h"
>   #include "sysemu/arch_init.h"
> +#include "sysemu/runstate.h"
>   #include "qapi/error.h"
>   #include "qapi/qapi-commands-qdev.h"
>   #include "qapi/qmp/dispatch.h"
> @@ -969,6 +970,52 @@ void qmp_device_del(const char *id, Error **errp)
>}
>}
>
> +int qdev_sync_config(DeviceState *dev, Error **errp)
> +{
> +DeviceClass *dc = DEVICE_GET_CLASS(dev);
> +
> +if (!dc->sync_config) {
> +

Re: [PATCH v3 4/4] qapi: introduce exit-on-error parameter for migrate-incoming

2024-04-29 Thread Fabiano Rosas

Vladimir Sementsov-Ogievskiy  writes:

> On 25.04.24 23:30, Fabiano Rosas wrote:
>>> @@ -797,13 +801,18 @@ fail:
>>> MIGRATION_STATUS_FAILED);
>>>   migration_incoming_state_destroy();
>>>   
>>> -if (migrate_has_error(s)) {
>>> -WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
>>> -error_report_err(s->error);
>>> +if (mis->exit_on_error) {
>>> +if (migrate_has_error(s)) {
>>> +WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
>>> +error_report_err(s->error);
>> error_report_err(error_copy(s->error))
>> 
>> ...because later on you're reading from s->error at
>> fill_destination_migration_info.
>
> No, we immediately do exit() instead. That's just a preexisting behavior, 
> moved into "if (mis->exit_on_error)"

I meant later in the patch, not later in the execution. Can't
query-migrate be called during process_incoming_migration_co?

Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-04-29 Thread Michael Galaxy

Hi All (and Peter),

My name is Michael Galaxy (formerly Hines). Yes, I changed my last name
(highly irregular for a male) and yes, that's my real last name:
https://www.linkedin.com/in/mrgalaxy/)

I'm the original author of the RDMA implementation. I've been discussing
with Yu Zhang for a little bit about potentially handing over
maintainership of the codebase to his team.

I simply have zero access to RoCE or Infiniband hardware at all,
unfortunately. so I've never been able to run tests or use what I wrote
at work, and as all of you know, if you don't have a way to test
something, then you can't maintain it.

Yu Zhang put a (very kind) proposal forward to me to ask the community
if they feel comfortable training his team to maintain the codebase (and
run tests) while they learn about it.

If you don't mind, I'd like to let him send over his (very detailed)
proposal,

- Michael

On 4/11/24 11:36, Yu Zhang wrote:

1) Either a CI test covering at least the major RDMA paths, or at least
periodically tests for each QEMU release will be needed.

We use a batch of regression test cases for the stack, which covers the
test for QEMU. I did such test for most of the QEMU releases planned as
candidates for rollout.

The migration test needs a pair of (either physical or virtual) servers with
InfiniBand network, which makes it difficult to do on a single server. The
nested VM could be a possible approach, for which we may need virtual
InfiniBand network. Is SoftRoCE [1] a choice? I will try it and let you know.

[1]
https://urldefense.com/v3/__https://enterprise-support.nvidia.com/s/article/howto-configure-soft-roce__;!!GjvTz_vk!VEqNfg3Kdf58Oh1FkGL6ErDLfvUXZXPwMTaXizuIQeIgJiywPzuwbqx8wM0KUsyopw_EYQxWvGHE3ig$

Thanks and best regards!

On Thu, Apr 11, 2024 at 4:20 PM Peter Xu wrote:

On Wed, Apr 10, 2024 at 09:49:15AM -0400, Peter Xu wrote:

On Wed, Apr 10, 2024 at 02:28:59AM +, Zhijian Li (Fujitsu) via wrote:

on 4/10/2024 3:46 AM, Peter Xu wrote:

Is there document/link about the unittest/CI for migration tests, Why
are those tests missing?
Is it hard or very special to set up an environment for that? maybe we
can help in this regards.

See tests/qtest/migration-test.c. We put most of our migration tests
there and that's covered in CI.

I think one major issue is CI systems don't normally have rdma devices.
Can rdma migration test be carried out without a real hardware?

Yeah, RXE aka. SOFT-RoCE is able to emulate the RDMA, for example
$ sudo rdma link add rxe_eth0 type rxe netdev eth0 # on host
then we can get a new RDMA interface "rxe_eth0".
This new RDMA interface is able to do the QEMU RDMA migration.

Also, the loopback(lo) device is able to emulate the RDMA interface
"rxe_lo", however when
I tried(years ago) to do RDMA migration over this
interface(rdma:127.0.0.1:) , it got something wrong.
So i gave up enabling the RDMA migration qtest at that time.

Thanks, Zhijian.

I'm not sure adding an emu-link for rdma is doable for CI systems, though.
Maybe someone more familiar with how CI works can chim in.

Some people got dropped on the cc list for unknown reason, I'm adding them
back (Fabiano, Peter Maydell, Phil). Let's make sure nobody is dropped by
accident.

I'll try to summarize what is still missing, and I think these will be
greatly helpful if we don't want to deprecate rdma migration:

1) Either a CI test covering at least the major RDMA paths, or at least
periodically tests for each QEMU release will be needed.

2) Some performance tests between modern RDMA and NIC devices are
welcomed. The current knowledge is modern NIC can work similarly to
RDMA in performance, then it's debatable why we still maintain so much
rdma specific code.

3) No need to be soild patchsets for this one, but some plan to improve
RDMA migration code so that it is not almost isolated from the rest
protocols.

4) Someone to look after this code for real.

For 2) and 3) more info is here:

https://urldefense.com/v3/__https://lore.kernel.org/r/ZhWa0YeAb9ySVKD1@x1n__;!!GjvTz_vk!VEqNfg3Kdf58Oh1FkGL6ErDLfvUXZXPwMTaXizuIQeIgJiywPzuwbqx8wM0KUsyopw_EYQxWpIWYBhQ$

Here 4) can be the most important as Markus pointed out. We just didn't
get there yet on the discussions, but maybe Markus is right that we should
talk that first.

Thanks,

--
Peter Xu

Re: [PATCH v6 06/10] util/bufferiszero: Improve scalar variant


On 29/4/24 14:31, Richard Henderson wrote:

On 4/29/24 05:18, Philippe Mathieu-Daudé wrote:

+
+    t = ldq_he_p(buf) | ldq_he_p(buf + len - 8);


Here we read #0 and #31, ...


+    p = QEMU_ALIGN_PTR_DOWN(buf + 8, 8);
+    e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 8);
+
+    /* Read 0 to 31 aligned words from the middle. */


... so here is #1 to #30?


Not indexes, but count.  There may be zero words remaining in the 
middle, etc.


Oh, got it, thanks!

Re: [PATCH] mc146818rtc: add a way to generate RTC interrupts via QMP


(+Peter who has more experience on such design).

On 29/4/24 13:32, Markus Armbruster wrote:

Philippe Mathieu-Daudé  writes:


Hi Daniil, Markus,

On 26/4/24 10:39, Markus Armbruster wrote:

Daniil Tatianin  writes:


This can be used to force-synchronize the time in guest after a long
stop-cont pause, which can be useful for serverless-type workload.


What is a "serverless-type workload"?


Signed-off-by: Daniil Tatianin 
---
   hw/rtc/mc146818rtc.c | 15 +++
   include/hw/rtc/mc146818rtc.h |  1 +
   qapi/misc-target.json| 16 
   3 files changed, 32 insertions(+)

diff --git a/hw/rtc/mc146818rtc.c b/hw/rtc/mc146818rtc.c
index f4c1869232..6980a78d5f 100644
--- a/hw/rtc/mc146818rtc.c
+++ b/hw/rtc/mc146818rtc.c
@@ -116,6 +116,21 @@ void qmp_rtc_reset_reinjection(Error **errp)
   }
   }
   +void qmp_rtc_notify(Error **errp)
+{
+MC146818RtcState *s;
+
+/*
+ * See:
+ * https://www.kernel.org/doc/Documentation/virtual/kvm/timekeeping.txt


What part of this document explains why this change is required?
I probably missed it. Explaining it here briefly would be more
useful.


+ */
+QLIST_FOREACH(s, &rtc_devices, link) {
+s->cmos_data[RTC_REG_B] |= REG_B_UIE;

  // Update-ended interrupt enable


+s->cmos_data[RTC_REG_C] |= REG_C_IRQF | REG_C_UF;

  // interrupt request flag
  //   update interrupt flag


+qemu_irq_raise(s->irq);
+}
+}
+

Note for later: qmp_rtc_notify() works on all realized mc146818rtc
devices.  Other kinds of RTC devices are silently ignored.  Just like
qmp_rtc_reset_reinjection().


IMO to avoid any future ambiguity (in heterogeneous machines), this
command must take a QOM device path (or a list of) and only notify
those.


Let's compare:

• With QOM path:

   · You need to know the machine's RTC device(s).

 Unfortunately, this is bothersome, as the QOM path is not stable.


But we'll need more of that with dynamic machines...


 For Q35, it's generally "/machine/unattached/device[N]/rtc", but N
 varies with configuration (TCG N=2, KVM N=3 for me), and it might
 vary with machine type version.  That's because the machine code
 creates ICH9-LPC without a proper name.  We do that a lot.  I hate
 it.

 Likewise for i440FX with PIIX3 instead of ICH9-LPC.

 For isapc, it's /machine/unattached/device[3].  I suspect the 3
 isn't reliable there, either.

 microvm doesn't seem to have an RTC by default.

   · If the device so named doesn't support IRQ inject, the command
 should fail.


Yes, why the management app would want to run this command if there
are not RTC on the machine?


   · Could be generalized to non-RTC devices when that's useful.

• Broadcast:

   · You don't need to know the machine's RTC device(s).

   · If there are multiple RTC devices that support IRQ inject, we inject
 for each of them.  There is no way to select specific RTCs.

   · If there is no RTC device that supports IRQ inject, the command does
 nothing silently.

 I don't like silent failures.  It could be made to fail instead.

If it wasn't for the unstable QOM path problem, I'd advise against
the broadcast interface.

Thoughts?


Something bugs me in this patch but I couldn't figure out what I am
missing. The issue is when migrated VM is restored. I don't get why
the behavior depends on an external decision (via external management
transport). Don't we have post_load() hooks for such tuning?
This device implements it in rtc_post_load().

Regards,

Phil.

PD: BTW tomorrow community call could be a good opportunity to discuss
this.

Re: [PULL 0/1] ufs queue

2024-04-29 Thread Stefan Hajnoczi

On Mon, Apr 29, 2024 at 12:25:37PM +0900, Jeuk Kim wrote:
> From: Jeuk Kim 
> 
> The following changes since commit fd87be1dada5672f877e03c2ca8504458292c479:
> 
>   Merge tag 'accel-20240426' of https://github.com/philmd/qemu into staging 
> (2024-04-26 15:28:13 -0700)
> 
> are available in the Git repository at:
> 
>   https://gitlab.com/jeuk20.kim/qemu.git tags/pull-ufs-20240429
> 
> for you to fetch changes up to f2c8aeb1afefcda92054c448b21fc59cdd99db30:
> 
>   hw/ufs: Fix buffer overflow bug (2024-04-29 12:13:35 +0900)
> 
> 
> ufs queue
> 
> - Fix ufs sanitizer vulnerability
> 
> 
> Jeuk Kim (1):
>   hw/ufs: Fix buffer overflow bug
> 
>  hw/ufs/ufs.c | 8 
>  1 file changed, 8 insertions(+)
> 

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

It will be included in my next block pull request.

You are welcome to send pull requests directly to the qemu.git/master
maintainer (Richard Henderson is on duty for this release cycle). If you
do that, make sure to GPG sign your pull request.

Stefan


signature.asc
Description: PGP signature

[PULL 0/1] Block patches

2024-04-29 Thread Stefan Hajnoczi

The following changes since commit fd87be1dada5672f877e03c2ca8504458292c479:

  Merge tag 'accel-20240426' of https://github.com/philmd/qemu into staging 
(2024-04-26 15:28:13 -0700)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to d1c4580662bf75bf6875bb5e1ad446b300816ac7:

  hw/ufs: Fix buffer overflow bug (2024-04-29 09:33:06 -0400)


Pull request

Buffer overflow fix for Universal Flash Storage (UFS) emulation.



Jeuk Kim (1):
  hw/ufs: Fix buffer overflow bug

 hw/ufs/ufs.c | 8 
 1 file changed, 8 insertions(+)

-- 
2.44.0

[PULL 1/1] hw/ufs: Fix buffer overflow bug

2024-04-29 Thread Stefan Hajnoczi

From: Jeuk Kim 

It fixes the buffer overflow vulnerability in the ufs device.
The bug was detected by sanitizers.

You can reproduce it by:

cat << EOF |\
qemu-system-x86_64 \
-display none -machine accel=qtest -m 512M -M q35 -nodefaults -drive \
file=null-co://,if=none,id=disk0 -device ufs,id=ufs_bus -device \
ufs-lu,drive=disk0,bus=ufs_bus -qtest stdio
outl 0xcf8 0x8810
outl 0xcfc 0xe000
outl 0xcf8 0x8804
outw 0xcfc 0x06
write 0xe058 0x1 0xa7
write 0xa 0x1 0x50
EOF

Resolves: #2299
Fixes: 329f16624499 ("hw/ufs: Support for Query Transfer Requests")
Reported-by: Zheyu Ma 
Signed-off-by: Jeuk Kim 
Signed-off-by: Stefan Hajnoczi 
Message-ID: 

---
 hw/ufs/ufs.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/ufs/ufs.c b/hw/ufs/ufs.c
index eccdb852a0..bac78a32bb 100644
--- a/hw/ufs/ufs.c
+++ b/hw/ufs/ufs.c
@@ -126,6 +126,10 @@ static MemTxResult ufs_dma_read_req_upiu(UfsRequest *req)
 copy_size = sizeof(UtpUpiuHeader) + UFS_TRANSACTION_SPECIFIC_FIELD_SIZE +
 data_segment_length;
 
+if (copy_size > sizeof(req->req_upiu)) {
+copy_size = sizeof(req->req_upiu);
+}
+
 ret = ufs_addr_read(u, req_upiu_base_addr, &req->req_upiu, copy_size);
 if (ret) {
 trace_ufs_err_dma_read_req_upiu(req->slot, req_upiu_base_addr);
@@ -225,6 +229,10 @@ static MemTxResult ufs_dma_write_rsp_upiu(UfsRequest *req)
 copy_size = rsp_upiu_byte_len;
 }
 
+if (copy_size > sizeof(req->rsp_upiu)) {
+copy_size = sizeof(req->rsp_upiu);
+}
+
 ret = ufs_addr_write(u, rsp_upiu_base_addr, &req->rsp_upiu, copy_size);
 if (ret) {
 trace_ufs_err_dma_write_rsp_upiu(req->slot, rsp_upiu_base_addr);
-- 
2.44.0

Re: [PULL 0/1] ufs queue


On 4/29/24 06:41, Stefan Hajnoczi wrote:

On Mon, Apr 29, 2024 at 12:25:37PM +0900, Jeuk Kim wrote:

From: Jeuk Kim 

The following changes since commit fd87be1dada5672f877e03c2ca8504458292c479:

   Merge tag 'accel-20240426' of https://github.com/philmd/qemu into staging 
(2024-04-26 15:28:13 -0700)

are available in the Git repository at:

   https://gitlab.com/jeuk20.kim/qemu.git tags/pull-ufs-20240429

for you to fetch changes up to f2c8aeb1afefcda92054c448b21fc59cdd99db30:

   hw/ufs: Fix buffer overflow bug (2024-04-29 12:13:35 +0900)


ufs queue

- Fix ufs sanitizer vulnerability


Jeuk Kim (1):
   hw/ufs: Fix buffer overflow bug

  hw/ufs/ufs.c | 8 
  1 file changed, 8 insertions(+)



Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

It will be included in my next block pull request.

You are welcome to send pull requests directly to the qemu.git/master
maintainer (Richard Henderson is on duty for this release cycle). If you
do that, make sure to GPG sign your pull request.


He did. I have

Merge tag 'pull-ufs-20240429' of https://gitlab.com/jeuk20.kim/qemu into 
staging

ufs queue

# -BEGIN PGP SIGNATURE-
#
# iQIzBAABCgAdFiEEUBfYMVl8eKPZB+73EuIgTA5dtgIFAmYvEScACgkQEuIgTA5d
# tgL3Qg//R3IcISQqqDaJ/ySzKGmkyohJSc6ySLYvla4Aki7PV+um2Dx/XNS7uG2b
# d3Qz4m6QaOKsocLfldRTn2FxVK238Rp5HNny5vc0kGRdwpR514B7aU0FhpT7qObS
# wbbgRdDddIBIiCFLhtXtg5/TK2h32VxGrVI6llX4gmd2VzqM0e4xeG1Oj8rZseOY
# SAgvDv68s1YwlO1p1vPvst/H+mUKYkqtPN1mjfCIn5tM6ss8kCLUnKjqGAg1BnSN
# xwaGrqqOlzQK2+aV02eiItiow8evU/h+c9eiTnBo/EvBwjoBn6flNXABWXFENnmP
# JjVIFeiNzSFhBPDzO23GXviuEt96j5lrcGYR48HYMZfEbJNpblXzWvEGMZWnXNgx
# Q3cpcarZ4vSWIflR9OnCSQaGLA0Ny6YqLbmrM/oD+v67EITafKKc+flmiF7DBASB
# fUoEsdffdA37LDtygJb7hfUhvPQWWAujmGzZ1cDP8Oa0MhT7aiD0Z/WqhhjVQbM0
# iLiCDDD0cc0pmT3vw3EnEjKjnSkY3H62Q7pnYHiQgij4Ls/Rdd/P7OkSd0aI82t0
# TooWGZJnyf8rjAzY2cEB1Twrhmhuyt9NnGxip9W8JsQBZMLabD2CahOm83zsk7jZ
# 3fOONz6XrW2ttFkLZcRd4x4YjKONjEXsSX2ZrXTZ5t3USz/VNvY=
# =Vwyi
# -END PGP SIGNATURE-
# gpg: Signature made Sun 28 Apr 2024 08:16:55 PM PDT
# gpg:using RSA key 5017D831597C78A3D907EEF712E2204C0E5DB602
# gpg: Good signature from "Jeuk Kim " [unknown]
# gpg: aka "Jeuk Kim " [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:  There is no indication that the signature belongs to the 
owner.
# Primary key fingerprint: 5017 D831 597C 78A3 D907  EEF7 12E2 204C 0E5D 
B602

queued for the next merge.


r~

Re: [PATCH] mc146818rtc: add a way to generate RTC interrupts via QMP

Philippe Mathieu-Daudé  writes:

> (+Peter who has more experience on such design).
>
> On 29/4/24 13:32, Markus Armbruster wrote:
>> Philippe Mathieu-Daudé  writes:

[...]

>>> IMO to avoid any future ambiguity (in heterogeneous machines), this
>>> command must take a QOM device path (or a list of) and only notify
>>> those.
>> 
>> Let's compare:
>> 
>> • With QOM path:
>> 
>>· You need to know the machine's RTC device(s).
>> 
>>  Unfortunately, this is bothersome, as the QOM path is not stable.
>
> But we'll need more of that with dynamic machines...

I view /machine/unattached a technical debt (see "hate" right below).

It saved us the trouble of coming up with sensible names for onboard
devices.

And now the interest is about to be due.

>>  For Q35, it's generally "/machine/unattached/device[N]/rtc", but N
>>  varies with configuration (TCG N=2, KVM N=3 for me), and it might
>>  vary with machine type version.  That's because the machine code
>>  creates ICH9-LPC without a proper name.  We do that a lot.  I hate
>>  it.
>> 
>>  Likewise for i440FX with PIIX3 instead of ICH9-LPC.
>> 
>>  For isapc, it's /machine/unattached/device[3].  I suspect the 3
>>  isn't reliable there, either.
>> 
>>  microvm doesn't seem to have an RTC by default.

[...]

Re: [PATCH 15/24] accel/tcg: Restrict IcountDecr and CPUTLB to TCG