date:20221120

Re: [PATCH v3 12/17] vfio/migration: Implement VFIO migration protocol v2

2022-11-20 Thread Avihai Horon




On 17/11/2022 19:24, Jason Gunthorpe wrote:

On Thu, Nov 17, 2022 at 07:07:10PM +0200, Avihai Horon wrote:

+}
+
+if (mig_state->data_fd != -1) {
+if (migration->data_fd != -1) {
+/*
+ * This can happen if the device is asynchronously reset and
+ * terminates a data transfer.
+ */
+error_report("%s: data_fd out of sync", vbasedev->name);
+close(mig_state->data_fd);
+
+return -1;

Should we go to recover_state here?  Is migration->device_state
invalid?  -EBADF?

Yes, we should.
Although VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE ioctl above succeeded, setting
the device state didn't *really* succeed, as the data_fd went out of sync.
So we should go to recover_state and return -EBADF.

The state did succeed and it is now "new_state". Getting an
unexpected data_fd means it did something like RUNNING->PRE_COPY_P2P
when the code was expecting PRE_COPY->PRE_COPY_P2P.

It is actually in PRE_COPY_P2P but the in-progress migration must be
stopped and the kernel would have made the migration->data_fd
permanently return some error when it went async to RUNNING.

The recovery is to resart the migration (of this device?) from the
start.


Yes, and restart is what's happening here - the -EBADF that we return 
here will cause the migration to be aborted.
I didn't mean that we should go to recover_state *instead* of returning 
an error.


But rethinking about it, I think you are right - recover_state should be 
used only if we can't set the device in new_state (i.e., there is some 
error in device functionality).
In the "data_fd out of sync" case, we did set the device in new_state 
(no error in device functionality), but data_fd got mixed up, so we 
should just abort migration and restart it again.


So bottom line, I think we should just return -EBADF here to abort 
migration.

Re: [PATCH v3 12/17] vfio/migration: Implement VFIO migration protocol v2

2022-11-20 Thread Avihai Horon




On 17/11/2022 19:38, Alex Williamson wrote:

External email: Use caution opening links or attachments


On Thu, 17 Nov 2022 19:07:10 +0200
Avihai Horon  wrote:

On 16/11/2022 20:29, Alex Williamson wrote:

On Thu, 3 Nov 2022 18:16:15 +0200
Avihai Horon  wrote:

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index e784374453..62afc23a8c 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -44,8 +44,84 @@
   #define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xef13ULL)
   #define VFIO_MIG_FLAG_DEV_DATA_STATE(0xef14ULL)

+#define VFIO_MIG_DATA_BUFFER_SIZE (1024 * 1024)

Add comment explaining heuristic of this size.

This is an arbitrary size we picked with mlx5 state size in mind.
Increasing this size to higher values (128M, 1G) didn't improve
performance in our testing.

How about this comment:
This is an initial value that doesn't consume much memory and provides
good performance.

Do you have other suggestion?

I'd lean more towards your description above, ex:

/*
  * This is an arbitrary size based on migration of mlx5 devices, where
  * the worst case total device migration size is on the order of 100s
  * of MB.  Testing with larger values, ex. 128MB and 1GB, did not show
  * a performance improvement.
  */

I think that provides sufficient information for someone who might come
later to have an understanding of the basis if they want to try to
optimize further.


OK, sounds good, I will add a comment like this.


@@ -804,34 +1090,51 @@ static int vfio_migration_init(VFIODevice *vbasedev)
   return -EINVAL;
   }

-ret = vfio_get_dev_region_info(vbasedev,
-   VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
-   VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
-   &info);
-if (ret) {
-return ret;
-}
+ret = vfio_migration_query_flags(vbasedev, &mig_flags);
+if (!ret) {
+/* Migration v2 */
+/* Basic migration functionality must be supported */
+if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
+return -EOPNOTSUPP;
+}
+vbasedev->migration = g_new0(VFIOMigration, 1);
+vbasedev->migration->device_state = VFIO_DEVICE_STATE_RUNNING;
+vbasedev->migration->data_buffer_size = VFIO_MIG_DATA_BUFFER_SIZE;
+vbasedev->migration->data_buffer =
+g_malloc0(vbasedev->migration->data_buffer_size);

So VFIO_MIG_DATA_BUFFER_SIZE is our chunk size, but why doesn't the
later addition of estimated device data size make any changes here?
I'd think we'd want to scale the buffer to the minimum of the reported
data size and some well documented heuristic for an upper bound.

As I wrote above, increasing this size to higher values (128M, 1G)
didn't improve performance in our testing.
We can always change it later on if some other heuristics are proven to
improve performance.

Note that I'm asking about a minimum buffer size, for example if
hisi_acc reports only 10s of KB for an estimated device size, why would
we still allocate VFIO_MIG_DATA_BUFFER_SIZE here?  Thanks,


This buffer is rather small and has little memory footprint.
Do you think it is worth the extra complexity of resizing the buffer?

Re: [PATCH v3 14/17] vfio/migration: Reset device if setting recover state fails

2022-11-20 Thread Avihai Horon




On 17/11/2022 20:18, Alex Williamson wrote:

External email: Use caution opening links or attachments


On Thu, 17 Nov 2022 19:11:47 +0200
Avihai Horon  wrote:


On 16/11/2022 20:36, Alex Williamson wrote:

External email: Use caution opening links or attachments


On Thu, 3 Nov 2022 18:16:17 +0200
Avihai Horon  wrote:


If vfio_migration_set_state() fails to set the device in the requested
state it tries to put it in a recover state. If setting the device in
the recover state fails as well, hw_error is triggered and the VM is
aborted.

To improve user experience and avoid VM data loss, reset the device with
VFIO_RESET_DEVICE instead of aborting the VM.

Signed-off-by: Avihai Horon 
---
   hw/vfio/migration.c | 14 --
   1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index f8c3228314..e8068b9147 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -92,8 +92,18 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,

   mig_state->device_state = recover_state;
   if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
-hw_error("%s: Failed setting device in recover state, err: %s",
- vbasedev->name, strerror(errno));
+error_report(
+"%s: Failed setting device in recover state, err: %s. Resetting 
device",
+ vbasedev->name, strerror(errno));
+
+if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) {
+hw_error("%s: Failed resetting device, err: %s", 
vbasedev->name,
+ strerror(errno));
+}
+
+migration->device_state = VFIO_DEVICE_STATE_RUNNING;
+
+return -1;
   }

   migration->device_state = recover_state;

This addresses one of my comments on 12/ and should probably be rolled
in there.

Not sure to which comment you refer to. Could you elaborate?

Hmm, I guess I thought this was in the section immediately following
where I questioned going to recovery state.  I'm still not sure why
this is a separate patch from the initial implementation of the
function in 12/ though.


This adds new functionality comparing to v1, so I thought this should be 
in its own patch.


I can squash it to patch 12 if you want.

[PATCH v4 0/3] block/rbd: Add support for layered encryption

2022-11-20 Thread Or Ozeri

v4: split to multiple commits
add support for more than just luks-any in layered encryption
nit fixes
v3: further nit fixes suggested by @idryomov
v2: nit fixes suggested by @idryomov


Or Ozeri (3):
  block/rbd: encryption nit fixes
  block/rbd: Add luks-any encryption opening option
  block/rbd: Add support for layered encryption

 block/rbd.c  | 204 +++
 qapi/block-core.json |  35 +++-
 2 files changed, 221 insertions(+), 18 deletions(-)

-- 
2.25.1

[PATCH v4 3/3] block/rbd: Add support for layered encryption

2022-11-20 Thread Or Ozeri

Starting from ceph Reef, RBD has built-in support for layered encryption,
where each ancestor image (in a cloned image setting) can be possibly
encrypted using a unique passphrase.

A new function, rbd_encryption_load2, was added to librbd API.
This new function supports an array of passphrases (via "spec" structs).

This commit extends the qemu rbd driver API to use this new librbd API,
in order to support this new layered encryption feature.

Signed-off-by: Or Ozeri 
---
 block/rbd.c  | 161 ++-
 qapi/block-core.json |  17 -
 2 files changed, 175 insertions(+), 3 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 7feae45e82..157922e23a 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -71,6 +71,16 @@ static const char rbd_luks2_header_verification[
 'L', 'U', 'K', 'S', 0xBA, 0xBE, 0, 2
 };
 
+static const char rbd_layered_luks_header_verification[
+RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
+'R', 'B', 'D', 'L', 0xBA, 0xBE, 0, 1
+};
+
+static const char rbd_layered_luks2_header_verification[
+RBD_ENCRYPTION_LUKS_HEADER_VERIFICATION_LEN] = {
+'R', 'B', 'D', 'L', 0xBA, 0xBE, 0, 2
+};
+
 typedef enum {
 RBD_AIO_READ,
 RBD_AIO_WRITE,
@@ -537,6 +547,136 @@ static int qemu_rbd_encryption_load(rbd_image_t image,
 
 return 0;
 }
+
+#ifdef LIBRBD_SUPPORTS_ENCRYPTION_LOAD2
+static int qemu_rbd_encryption_load2(rbd_image_t image,
+ RbdEncryptionOptions *encrypt,
+ Error **errp)
+{
+int r = 0;
+int encrypt_count = 1;
+int i;
+RbdEncryptionOptions *curr_encrypt;
+rbd_encryption_spec_t *specs;
+rbd_encryption_luks1_format_options_t* luks_opts;
+rbd_encryption_luks2_format_options_t* luks2_opts;
+rbd_encryption_luks_format_options_t* luks_any_opts;
+
+/* count encryption options */
+for (curr_encrypt = encrypt; curr_encrypt->has_parent;
+ curr_encrypt = curr_encrypt->parent) {
+++encrypt_count;
+}
+
+specs = g_new0(rbd_encryption_spec_t, encrypt_count);
+
+curr_encrypt = encrypt;
+for (i = 0; i < encrypt_count; ++i) {
+switch (curr_encrypt->format) {
+case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS: {
+specs[i].format = RBD_ENCRYPTION_FORMAT_LUKS1;
+specs[i].opts_size =
+sizeof(rbd_encryption_luks1_format_options_t);
+
+luks_opts = g_new0(rbd_encryption_luks1_format_options_t, 1);
+specs[i].opts = luks_opts;
+
+r = qemu_rbd_convert_luks_options(
+qapi_RbdEncryptionOptionsLUKS_base(
+&curr_encrypt->u.luks),
+&luks_opts->passphrase,
+&luks_opts->passphrase_size,
+errp);
+break;
+}
+
+case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS2: {
+specs[i].format = RBD_ENCRYPTION_FORMAT_LUKS2;
+specs[i].opts_size =
+sizeof(rbd_encryption_luks2_format_options_t);
+
+luks2_opts = g_new0(rbd_encryption_luks2_format_options_t, 1);
+specs[i].opts = luks2_opts;
+
+r = qemu_rbd_convert_luks_options(
+qapi_RbdEncryptionOptionsLUKS2_base(
+&curr_encrypt->u.luks2),
+&luks2_opts->passphrase,
+&luks2_opts->passphrase_size,
+errp);
+break;
+}
+
+case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS_ANY: {
+specs[i].format = RBD_ENCRYPTION_FORMAT_LUKS;
+specs[i].opts_size =
+sizeof(rbd_encryption_luks_format_options_t);
+
+luks_any_opts = g_new0(rbd_encryption_luks_format_options_t, 
1);
+specs[i].opts = luks_any_opts;
+
+r = qemu_rbd_convert_luks_options(
+qapi_RbdEncryptionOptionsLUKSAny_base(
+&curr_encrypt->u.luks_any),
+&luks_any_opts->passphrase,
+&luks_any_opts->passphrase_size,
+errp);
+break;
+}
+
+default: {
+r = -ENOTSUP;
+error_setg_errno(
+errp, -r, "unknown image encryption format: %u",
+curr_encrypt->format);
+}
+}
+
+if (r < 0) {
+goto exit;
+}
+
+curr_encrypt = curr_encrypt->parent;
+}
+
+r = rbd_encryption_load2(image, specs, encrypt_count);
+if (r < 0) {
+error_setg_errno(errp, -r, "layered encryption load fail");
+goto exit;
+}
+
+exit:
+for (i = 0; i < encrypt_count; ++i) {
+if (!specs[i].opts) {
+break;
+

[PATCH v4 1/3] block/rbd: encryption nit fixes

2022-11-20 Thread Or Ozeri

Add const modifier to passphrases,
and remove redundant stack variable passphrase_len.

Signed-off-by: Or Ozeri 
---
 block/rbd.c | 24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index f826410f40..e575105e6d 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -330,7 +330,7 @@ static int qemu_rbd_set_keypairs(rados_t cluster, const 
char *keypairs_json,
 #ifdef LIBRBD_SUPPORTS_ENCRYPTION
 static int qemu_rbd_convert_luks_options(
 RbdEncryptionOptionsLUKSBase *luks_opts,
-char **passphrase,
+const char **passphrase,
 size_t *passphrase_len,
 Error **errp)
 {
@@ -341,7 +341,7 @@ static int qemu_rbd_convert_luks_options(
 static int qemu_rbd_convert_luks_create_options(
 RbdEncryptionCreateOptionsLUKSBase *luks_opts,
 rbd_encryption_algorithm_t *alg,
-char **passphrase,
+const char **passphrase,
 size_t *passphrase_len,
 Error **errp)
 {
@@ -384,8 +384,7 @@ static int qemu_rbd_encryption_format(rbd_image_t image,
   Error **errp)
 {
 int r = 0;
-g_autofree char *passphrase = NULL;
-size_t passphrase_len;
+g_autofree const char *passphrase = NULL;
 rbd_encryption_format_t format;
 rbd_encryption_options_t opts;
 rbd_encryption_luks1_format_options_t luks_opts;
@@ -407,12 +406,12 @@ static int qemu_rbd_encryption_format(rbd_image_t image,
 opts_size = sizeof(luks_opts);
 r = qemu_rbd_convert_luks_create_options(
 qapi_RbdEncryptionCreateOptionsLUKS_base(&encrypt->u.luks),
-&luks_opts.alg, &passphrase, &passphrase_len, errp);
+&luks_opts.alg, &passphrase, &luks_opts.passphrase_size,
+errp);
 if (r < 0) {
 return r;
 }
 luks_opts.passphrase = passphrase;
-luks_opts.passphrase_size = passphrase_len;
 break;
 }
 case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS2: {
@@ -423,12 +422,12 @@ static int qemu_rbd_encryption_format(rbd_image_t image,
 r = qemu_rbd_convert_luks_create_options(
 qapi_RbdEncryptionCreateOptionsLUKS2_base(
 &encrypt->u.luks2),
-&luks2_opts.alg, &passphrase, &passphrase_len, errp);
+&luks2_opts.alg, &passphrase, &luks2_opts.passphrase_size,
+errp);
 if (r < 0) {
 return r;
 }
 luks2_opts.passphrase = passphrase;
-luks2_opts.passphrase_size = passphrase_len;
 break;
 }
 default: {
@@ -466,8 +465,7 @@ static int qemu_rbd_encryption_load(rbd_image_t image,
 Error **errp)
 {
 int r = 0;
-g_autofree char *passphrase = NULL;
-size_t passphrase_len;
+g_autofree const char *passphrase = NULL;
 rbd_encryption_luks1_format_options_t luks_opts;
 rbd_encryption_luks2_format_options_t luks2_opts;
 rbd_encryption_format_t format;
@@ -482,12 +480,11 @@ static int qemu_rbd_encryption_load(rbd_image_t image,
 opts_size = sizeof(luks_opts);
 r = qemu_rbd_convert_luks_options(
 qapi_RbdEncryptionOptionsLUKS_base(&encrypt->u.luks),
-&passphrase, &passphrase_len, errp);
+&passphrase, &luks_opts.passphrase_size, errp);
 if (r < 0) {
 return r;
 }
 luks_opts.passphrase = passphrase;
-luks_opts.passphrase_size = passphrase_len;
 break;
 }
 case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS2: {
@@ -497,12 +494,11 @@ static int qemu_rbd_encryption_load(rbd_image_t image,
 opts_size = sizeof(luks2_opts);
 r = qemu_rbd_convert_luks_options(
 qapi_RbdEncryptionOptionsLUKS2_base(&encrypt->u.luks2),
-&passphrase, &passphrase_len, errp);
+&passphrase, &luks2_opts.passphrase_size, errp);
 if (r < 0) {
 return r;
 }
 luks2_opts.passphrase = passphrase;
-luks2_opts.passphrase_size = passphrase_len;
 break;
 }
 default: {
-- 
2.25.1

[PATCH v4 2/3] block/rbd: Add luks-any encryption opening option

2022-11-20 Thread Or Ozeri

Ceph RBD encryption API required specifying the encryption format
for loading encryption. The supported formats were LUKS (v1) and LUKS2.

Starting from Reef release, RBD also supports loading with "luks-any" format,
which works for both versions of LUKS.

This commit extends the qemu rbd driver API to enable qemu users to use
this luks-any wildcard format.

Signed-off-by: Or Ozeri 
---
 block/rbd.c  | 19 +++
 qapi/block-core.json | 20 ++--
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index e575105e6d..7feae45e82 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -468,6 +468,9 @@ static int qemu_rbd_encryption_load(rbd_image_t image,
 g_autofree const char *passphrase = NULL;
 rbd_encryption_luks1_format_options_t luks_opts;
 rbd_encryption_luks2_format_options_t luks2_opts;
+#ifdef LIBRBD_SUPPORTS_ENCRYPTION_LOAD2
+rbd_encryption_luks_format_options_t luks_any_opts;
+#endif
 rbd_encryption_format_t format;
 rbd_encryption_options_t opts;
 size_t opts_size;
@@ -501,6 +504,22 @@ static int qemu_rbd_encryption_load(rbd_image_t image,
 luks2_opts.passphrase = passphrase;
 break;
 }
+#ifdef LIBRBD_SUPPORTS_ENCRYPTION_LOAD2
+case RBD_IMAGE_ENCRYPTION_FORMAT_LUKS_ANY: {
+memset(&luks_any_opts, 0, sizeof(luks_any_opts));
+format = RBD_ENCRYPTION_FORMAT_LUKS;
+opts = &luks_any_opts;
+opts_size = sizeof(luks_any_opts);
+r = qemu_rbd_convert_luks_options(
+
qapi_RbdEncryptionOptionsLUKSAny_base(&encrypt->u.luks_any),
+&passphrase, &luks_any_opts.passphrase_size, errp);
+if (r < 0) {
+return r;
+}
+luks_any_opts.passphrase = passphrase;
+break;
+}
+#endif
 default: {
 r = -ENOTSUP;
 error_setg_errno(
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 882b266532..d064847d85 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3753,10 +3753,16 @@
 ##
 # @RbdImageEncryptionFormat:
 #
+# luks
+#
+# luks2
+#
+# luks-any: Used for opening either luks or luks2. (Since 8.0)
+#
 # Since: 6.1
 ##
 { 'enum': 'RbdImageEncryptionFormat',
-  'data': [ 'luks', 'luks2' ] }
+  'data': [ 'luks', 'luks2', 'luks-any' ] }
 
 ##
 # @RbdEncryptionOptionsLUKSBase:
@@ -3798,6 +3804,15 @@
   'base': 'RbdEncryptionOptionsLUKSBase',
   'data': { } }
 
+##
+# @RbdEncryptionOptionsLUKSAny:
+#
+# Since: 8.0
+##
+{ 'struct': 'RbdEncryptionOptionsLUKSAny',
+  'base': 'RbdEncryptionOptionsLUKSBase',
+  'data': { } }
+
 ##
 # @RbdEncryptionCreateOptionsLUKS:
 #
@@ -3825,7 +3840,8 @@
   'base': { 'format': 'RbdImageEncryptionFormat' },
   'discriminator': 'format',
   'data': { 'luks': 'RbdEncryptionOptionsLUKS',
-'luks2': 'RbdEncryptionOptionsLUKS2' } }
+'luks2': 'RbdEncryptionOptionsLUKS2',
+'luks-any': 'RbdEncryptionOptionsLUKSAny'} }
 
 ##
 # @RbdEncryptionCreateOptions:
-- 
2.25.1

[PATCH 2/2] qga:/qga-win: skip getting pci info for USB disks

2022-11-20 Thread Kfir Manor

Skip getting PCI info from disks type USB and give them an empty PCI address 
instead.

Signed-off-by: Kfir Manor 
---
 qga/commands-win32.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/qga/commands-win32.c b/qga/commands-win32.c
index a645480496..14c43b3de5 100644
--- a/qga/commands-win32.c
+++ b/qga/commands-win32.c
@@ -878,10 +878,14 @@ static void get_single_disk_info(int disk_number,
  * if that doesn't hold since that suggests some other unexpected
  * breakage
  */
-disk->pci_controller = get_pci_info(disk_number, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-goto err_close;
+if (disk->bus_type == GUEST_DISK_BUS_TYPE_USB) {
+disk->pci_controller = get_empty_pci_address();
+} else {
+disk->pci_controller = get_pci_info(disk_number, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+goto err_close;
+}
 }
 if (disk->bus_type == GUEST_DISK_BUS_TYPE_SCSI
 || disk->bus_type == GUEST_DISK_BUS_TYPE_IDE
-- 
2.38.1

[PATCH 0/2] qemu-ga-win: 'guest-get-fsinfo' command wont query storage devices of bus type USB

2022-11-20 Thread Kfir Manor

guest-get-fsinfo won't query storage devices of bus-type USB 
(https://bugzilla.redhat.com/show_bug.cgi?id=2090333).

Bug, get_pci_info function returns an error after not finding any storage port 
device info on the USB disk parent device (because of USB abstraction).

Fix, skip getting PCI info (get_pci_info function) for USB disks (as USB disk 
doesn't have PCI info), and return an empty PCI address instead to keep with 
schema.


Kfir Manor (2):
  adding a empty PCI address creation function
  skip getting pci info for USB disks

 qga/commands-win32.c | 32 +---
 1 file changed, 21 insertions(+), 11 deletions(-)

-- 
2.38.1

[PATCH 1/2] qga:/qga-win: adding a empty PCI address creation function

2022-11-20 Thread Kfir Manor

Refactoring code to avoid duplication of creating an empty PCI address code.

Signed-off-by: Kfir Manor 
---
 qga/commands-win32.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/qga/commands-win32.c b/qga/commands-win32.c
index ec9f55b453..a645480496 100644
--- a/qga/commands-win32.c
+++ b/qga/commands-win32.c
@@ -599,6 +599,18 @@ static void get_pci_address_for_device(GuestPCIAddress 
*pci,
 }
 }
 
+static GuestPCIAddress *get_empty_pci_address(void)
+{
+GuestPCIAddress *pci = NULL;
+
+pci = g_malloc0(sizeof(*pci));
+pci->domain = -1;
+pci->slot = -1;
+pci->function = -1;
+pci->bus = -1;
+return pci;
+}
+
 static GuestPCIAddress *get_pci_info(int number, Error **errp)
 {
 HDEVINFO dev_info = INVALID_HANDLE_VALUE;
@@ -608,13 +620,7 @@ static GuestPCIAddress *get_pci_info(int number, Error 
**errp)
 SP_DEVICE_INTERFACE_DATA dev_iface_data;
 HANDLE dev_file;
 int i;
-GuestPCIAddress *pci = NULL;
-
-pci = g_malloc0(sizeof(*pci));
-pci->domain = -1;
-pci->slot = -1;
-pci->function = -1;
-pci->bus = -1;
+GuestPCIAddress *pci = get_empty_pci_address();
 
 dev_info = SetupDiGetClassDevs(&GUID_DEVINTERFACE_DISK, 0, 0,
DIGCF_PRESENT | DIGCF_DEVICEINTERFACE);
-- 
2.38.1

[PATCH v2 3/3] hw/isa/piix4: Decouple INTx-to-LNKx routing which is board-specific

2022-11-20 Thread Bernhard Beschow

pci_map_irq_fn's in general seem to be board-specific, and PIIX4's
pci_slot_get_pirq() in particular seems very Malta-specific. So move the
latter to malta.c to 1/ keep the board logic in one place and 2/ avoid
PIIX4 to make assumptions about its board.

Signed-off-by: Bernhard Beschow 
---
 hw/isa/piix4.c  | 26 --
 hw/mips/malta.c | 28 
 2 files changed, 28 insertions(+), 26 deletions(-)

diff --git a/hw/isa/piix4.c b/hw/isa/piix4.c
index f9211d085f..eca96fb8f0 100644
--- a/hw/isa/piix4.c
+++ b/hw/isa/piix4.c
@@ -79,31 +79,6 @@ static void piix4_set_irq(void *opaque, int irq_num, int 
level)
 }
 }
 
-static int pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num)
-{
-int slot;
-
-slot = PCI_SLOT(pci_dev->devfn);
-
-switch (slot) {
-/* PIIX4 USB */
-case 10:
-return 3;
-/* AMD 79C973 Ethernet */
-case 11:
-return 1;
-/* Crystal 4281 Sound */
-case 12:
-return 2;
-/* PCI slot 1 to 4 */
-case 18 ... 21:
-return ((slot - 18) + irq_num) & 0x03;
-/* Unknown device, don't do any translation */
-default:
-return irq_num;
-}
-}
-
 static void piix4_isa_reset(DeviceState *dev)
 {
 PIIX4State *d = PIIX4_PCI_DEVICE(dev);
@@ -272,7 +247,6 @@ static void piix4_realize(PCIDevice *dev, Error **errp)
 qdev_connect_gpio_out(DEVICE(&s->pm), 0, s->isa[9]);
 
 pci_bus_irqs(pci_bus, piix4_set_irq, s, PIIX_NUM_PIRQS);
-pci_bus_map_irqs(pci_bus, pci_slot_get_pirq);
 }
 
 static void piix4_init(Object *obj)
diff --git a/hw/mips/malta.c b/hw/mips/malta.c
index c0a2e0ab04..e435f80973 100644
--- a/hw/mips/malta.c
+++ b/hw/mips/malta.c
@@ -39,6 +39,7 @@
 #include "hw/mips/bootloader.h"
 #include "hw/mips/cpudevs.h"
 #include "hw/pci/pci.h"
+#include "hw/pci/pci_bus.h"
 #include "qemu/log.h"
 #include "hw/mips/bios.h"
 #include "hw/ide/pci.h"
@@ -1140,6 +1141,31 @@ static void malta_mips_config(MIPSCPU *cpu)
 }
 }
 
+static int pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num)
+{
+int slot;
+
+slot = PCI_SLOT(pci_dev->devfn);
+
+switch (slot) {
+/* PIIX4 USB */
+case 10:
+return 3;
+/* AMD 79C973 Ethernet */
+case 11:
+return 1;
+/* Crystal 4281 Sound */
+case 12:
+return 2;
+/* PCI slot 1 to 4 */
+case 18 ... 21:
+return ((slot - 18) + irq_num) & 0x03;
+/* Unknown device, don't do any translation */
+default:
+return irq_num;
+}
+}
+
 static void main_cpu_reset(void *opaque)
 {
 MIPSCPU *cpu = opaque;
@@ -1411,6 +1437,8 @@ void mips_malta_init(MachineState *machine)
 /* Interrupt controller */
 qdev_connect_gpio_out_named(DEVICE(piix4), "intr", 0, i8259_irq);
 
+pci_bus_map_irqs(pci_bus, pci_slot_get_pirq);
+
 /* generate SPD EEPROM data */
 dev = DEVICE(object_resolve_path_component(OBJECT(piix4), "pm"));
 smbus = I2C_BUS(qdev_get_child_bus(dev, "i2c"));
-- 
2.38.1

[PATCH v2 1/3] hw/pci/pci: Factor out pci_bus_map_irqs() from pci_bus_irqs()

2022-11-20 Thread Bernhard Beschow

pci_bus_irqs() coupled together the assignment of pci_set_irq_fn and
pci_map_irq_fn to a PCI bus. This coupling gets in the way when the
pci_map_irq_fn is board-specific while the pci_set_irq_fn is device-
specific.

For example, both of QEMU's PIIX south bridge models have different
pci_map_irq_fn implementations which are board-specific rather than
device-specific. These implementations should therefore reside in board
code. The pci_set_irq_fn's, however, should stay in the device models
because they access memory internal to the model.

Factoring out pci_bus_map_irqs() from pci_bus_irqs() allows the
assignments to be decoupled, resolving the problem described above.

Note also how pci_vpb_realize() which gets touched in this commit
assigns different pci_map_irq_fn's depending on the board.

Signed-off-by: Bernhard Beschow 
---
 hw/i386/pc_q35.c|  4 ++--
 hw/isa/piix3.c  |  8 
 hw/isa/piix4.c  |  3 ++-
 hw/pci-host/raven.c |  3 ++-
 hw/pci-host/versatile.c |  3 ++-
 hw/pci/pci.c| 12 +---
 hw/remote/machine.c |  3 ++-
 include/hw/pci/pci.h|  3 ++-
 8 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index a496bd6e74..39f035903c 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -268,8 +268,8 @@ static void pc_q35_init(MachineState *machine)
 for (i = 0; i < GSI_NUM_PINS; i++) {
 qdev_connect_gpio_out_named(lpc_dev, ICH9_GPIO_GSI, i, x86ms->gsi[i]);
 }
-pci_bus_irqs(host_bus, ich9_lpc_set_irq, ich9_lpc_map_irq, ich9_lpc,
- ICH9_LPC_NB_PIRQS);
+pci_bus_irqs(host_bus, ich9_lpc_set_irq, ich9_lpc, ICH9_LPC_NB_PIRQS);
+pci_bus_map_irqs(host_bus, ich9_lpc_map_irq);
 pci_bus_set_route_irq_fn(host_bus, ich9_route_intx_pin_to_irq);
 isa_bus = ich9_lpc->isa_bus;
 
diff --git a/hw/isa/piix3.c b/hw/isa/piix3.c
index f9b4af5c05..7ad26b82e8 100644
--- a/hw/isa/piix3.c
+++ b/hw/isa/piix3.c
@@ -388,8 +388,8 @@ static void piix3_realize(PCIDevice *dev, Error **errp)
 return;
 }
 
-pci_bus_irqs(pci_bus, piix3_set_irq, pci_slot_get_pirq,
- piix3, PIIX_NUM_PIRQS);
+pci_bus_irqs(pci_bus, piix3_set_irq, piix3, PIIX_NUM_PIRQS);
+pci_bus_map_irqs(pci_bus, pci_slot_get_pirq);
 pci_bus_set_route_irq_fn(pci_bus, piix3_route_intx_pin_to_irq);
 }
 
@@ -424,8 +424,8 @@ static void piix3_xen_realize(PCIDevice *dev, Error **errp)
  * connected to the IOAPIC directly.
  * These additional routes can be discovered through ACPI.
  */
-pci_bus_irqs(pci_bus, xen_piix3_set_irq, xen_pci_slot_get_pirq,
- piix3, XEN_PIIX_NUM_PIRQS);
+pci_bus_irqs(pci_bus, xen_piix3_set_irq, piix3, XEN_PIIX_NUM_PIRQS);
+pci_bus_map_irqs(pci_bus, xen_pci_slot_get_pirq);
 }
 
 static void piix3_xen_class_init(ObjectClass *klass, void *data)
diff --git a/hw/isa/piix4.c b/hw/isa/piix4.c
index 8fc1db6dc9..f9211d085f 100644
--- a/hw/isa/piix4.c
+++ b/hw/isa/piix4.c
@@ -271,7 +271,8 @@ static void piix4_realize(PCIDevice *dev, Error **errp)
 }
 qdev_connect_gpio_out(DEVICE(&s->pm), 0, s->isa[9]);
 
-pci_bus_irqs(pci_bus, piix4_set_irq, pci_slot_get_pirq, s, PIIX_NUM_PIRQS);
+pci_bus_irqs(pci_bus, piix4_set_irq, s, PIIX_NUM_PIRQS);
+pci_bus_map_irqs(pci_bus, pci_slot_get_pirq);
 }
 
 static void piix4_init(Object *obj)
diff --git a/hw/pci-host/raven.c b/hw/pci-host/raven.c
index 7a105e4a63..2db577df4f 100644
--- a/hw/pci-host/raven.c
+++ b/hw/pci-host/raven.c
@@ -258,7 +258,8 @@ static void raven_pcihost_realizefn(DeviceState *d, Error 
**errp)
 
 qdev_init_gpio_in(d, raven_change_gpio, 1);
 
-pci_bus_irqs(&s->pci_bus, raven_set_irq, raven_map_irq, s, PCI_NUM_PINS);
+pci_bus_irqs(&s->pci_bus, raven_set_irq, s, PCI_NUM_PINS);
+pci_bus_map_irqs(&s->pci_bus, raven_map_irq);
 
 memory_region_init_io(&h->conf_mem, OBJECT(h), &pci_host_conf_le_ops, s,
   "pci-conf-idx", 4);
diff --git a/hw/pci-host/versatile.c b/hw/pci-host/versatile.c
index f66384fa02..c77a55 100644
--- a/hw/pci-host/versatile.c
+++ b/hw/pci-host/versatile.c
@@ -422,7 +422,8 @@ static void pci_vpb_realize(DeviceState *dev, Error **errp)
 mapfn = pci_vpb_map_irq;
 }
 
-pci_bus_irqs(&s->pci_bus, pci_vpb_set_irq, mapfn, s->irq, 4);
+pci_bus_irqs(&s->pci_bus, pci_vpb_set_irq, s->irq, 4);
+pci_bus_map_irqs(&s->pci_bus, mapfn);
 
 /* Our memory regions are:
  * 0 : our control registers
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 2f450f6a72..6396cde004 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -283,6 +283,7 @@ static void pci_change_irq_level(PCIDevice *pci_dev, int 
irq_num, int change)
 PCIBus *bus;
 for (;;) {
 bus = pci_get_bus(pci_dev);
+assert(bus->map_irq);
 irq_num = bus->map_irq(pci_dev, irq_num);
 if (bus->set_irq)
 break;
@@ -521,16 +522,20 @@ void pci_root_bus_cleanup(PCIBus *bus)
 qbus_unreal

[PATCH v2 0/3] Decouple INTx-to-LNKx routing from south bridges

2022-11-20 Thread Bernhard Beschow

v1:
===

During my PIIX consolidation work [1] I've noticed that both PIIX models have
quite different pci_slot_get_pirq() implementations. These functions seem to
map PCI INTx pins to input pins of a programmable interrupt router which is
AFAIU board-specific. IOW, board-specific assumptions are baked into the device
models which prevent e.g. the whole PIIX4 south bridge to be reusable in the PC
machine.

This series first factors out pci_bus_map_irqs() from pci_bus_irqs() which
then allowes for moving the two board-specific PIIX pci_slot_get_pirq()
funtions into their respective boards. With these changes, the PIIX4 south
bridge could eventually become an alternative to the PIIX3-Frankenstein
solution in the PC machine.

v2:
===
* Remove RFC tag from whole series
* New patch to split pci_bus_irqs()
* Remove VT82xx patch which was just a demonstration

Testing done:
* `make check`
* `make check-avocado`
* `qemu-system-mips64el -M malta -kernel vmlinux-3.2.0-4-5kc-malta -hda 
debian_wheezy_mipsel_standard.qcow2 -append "root=/dev/sda1 console=ttyS0"`
* `qemu-system-x86_64 -M pc -m 2G -cdrom manjaro-kde-21.3.2-220704-linux515.iso`

Thanks,
Bernhard

[1] https://lists.nongnu.org/archive/html/qemu-devel/2022-10/msg03941.html

Bernhard Beschow (3):
  hw/pci/pci: Factor out pci_bus_map_irqs() from pci_bus_irqs()
  hw/isa/piix3: Decouple INTx-to-LNKx routing which is board-specific
  hw/isa/piix4: Decouple INTx-to-LNKx routing which is board-specific

 hw/i386/pc_piix.c   | 16 
 hw/i386/pc_q35.c|  4 ++--
 hw/isa/piix3.c  | 17 ++---
 hw/isa/piix4.c  | 27 +--
 hw/mips/malta.c | 28 
 hw/pci-host/raven.c |  3 ++-
 hw/pci-host/versatile.c |  3 ++-
 hw/pci/pci.c| 12 +---
 hw/remote/machine.c |  3 ++-
 include/hw/pci/pci.h|  3 ++-
 10 files changed, 66 insertions(+), 50 deletions(-)

-- 
2.38.1

[PATCH v2 2/3] hw/isa/piix3: Decouple INTx-to-LNKx routing which is board-specific

2022-11-20 Thread Bernhard Beschow

pci_map_irq_fn's in general seem to be board-specific. So move PIIX3's
pci_slot_get_pirq() to board code to not have PIIX3 make assuptions
about its board.

Signed-off-by: Bernhard Beschow 
---
 hw/i386/pc_piix.c | 16 
 hw/isa/piix3.c| 13 -
 2 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 0ad0ed1603..ecae85a31e 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -47,6 +47,7 @@
 #include "hw/sysbus.h"
 #include "hw/i2c/smbus_eeprom.h"
 #include "hw/xen/xen-x86.h"
+#include "hw/xen/xen.h"
 #include "exec/memory.h"
 #include "hw/acpi/acpi.h"
 #include "hw/acpi/piix4.h"
@@ -73,6 +74,17 @@ static const int ide_iobase2[MAX_IDE_BUS] = { 0x3f6, 0x376 };
 static const int ide_irq[MAX_IDE_BUS] = { 14, 15 };
 #endif
 
+/*
+ * Return the global irq number corresponding to a given device irq
+ * pin. We could also use the bus number to have a more precise mapping.
+ */
+static int pci_slot_get_pirq(PCIDevice *pci_dev, int pci_intx)
+{
+int slot_addend;
+slot_addend = PCI_SLOT(pci_dev->devfn) - 1;
+return (pci_intx + slot_addend) & 3;
+}
+
 /* PC hardware initialisation */
 static void pc_init1(MachineState *machine,
  const char *host_type, const char *pci_type)
@@ -223,6 +235,10 @@ static void pc_init1(MachineState *machine,
 piix3->pic = x86ms->gsi;
 piix3_devfn = piix3->dev.devfn;
 isa_bus = ISA_BUS(qdev_get_child_bus(DEVICE(piix3), "isa.0"));
+
+pci_bus_map_irqs(pci_bus,
+ xen_enabled() ? xen_pci_slot_get_pirq
+   : pci_slot_get_pirq);
 } else {
 pci_bus = NULL;
 isa_bus = isa_bus_new(NULL, get_system_memory(), system_io,
diff --git a/hw/isa/piix3.c b/hw/isa/piix3.c
index 7ad26b82e8..30509f39e5 100644
--- a/hw/isa/piix3.c
+++ b/hw/isa/piix3.c
@@ -79,17 +79,6 @@ static void piix3_set_irq(void *opaque, int pirq, int level)
 piix3_set_irq_level(piix3, pirq, level);
 }
 
-/*
- * Return the global irq number corresponding to a given device irq
- * pin. We could also use the bus number to have a more precise mapping.
- */
-static int pci_slot_get_pirq(PCIDevice *pci_dev, int pci_intx)
-{
-int slot_addend;
-slot_addend = PCI_SLOT(pci_dev->devfn) - 1;
-return (pci_intx + slot_addend) & 3;
-}
-
 static PCIINTxRoute piix3_route_intx_pin_to_irq(void *opaque, int pin)
 {
 PIIX3State *piix3 = opaque;
@@ -389,7 +378,6 @@ static void piix3_realize(PCIDevice *dev, Error **errp)
 }
 
 pci_bus_irqs(pci_bus, piix3_set_irq, piix3, PIIX_NUM_PIRQS);
-pci_bus_map_irqs(pci_bus, pci_slot_get_pirq);
 pci_bus_set_route_irq_fn(pci_bus, piix3_route_intx_pin_to_irq);
 }
 
@@ -425,7 +413,6 @@ static void piix3_xen_realize(PCIDevice *dev, Error **errp)
  * These additional routes can be discovered through ACPI.
  */
 pci_bus_irqs(pci_bus, xen_piix3_set_irq, piix3, XEN_PIIX_NUM_PIRQS);
-pci_bus_map_irqs(pci_bus, xen_pci_slot_get_pirq);
 }
 
 static void piix3_xen_class_init(ObjectClass *klass, void *data)
-- 
2.38.1

Re: [PATCH] target/arm: build smbios 19 table

2022-11-20 Thread Mihai Carabas


La 18.11.2022 21:11, Peter Maydell a scris:

On Fri, 18 Nov 2022 at 17:37, Mihai Carabas  wrote:

Use the base_memmap to build the SMBIOS 19 table which provides the address
mapping for a Physical Memory Array (from spec [1] chapter 7.20).

This was present on i386 from commit c97294ec1b9e36887e119589d456557d72ab37b5
("SMBIOS: Build aggregate smbios tables and entry point").

[1] 
https://urldefense.com/v3/__https://www.dmtf.org/sites/default/files/standards/documents/DSP0134_3.5.0.pdf__;!!ACWV5N9M2RV99hQ!KF2xmQw9nxPvqvNCgDleyVHv4MoZseoZFHmR1veww7O2BmRxSH1spOCNWX-c-FvzcaR_o8PunXSWWH2ECvFqlR4E7vw$

Signed-off-by: Mihai Carabas 

Is this a bug fix, or a new feature? What are the consequences
of it being missing? Is this important enough to go into the 7.2
release? (My default position would be "no", given this has been
like this on the virt board for a very long time.)



This is required by ARM SystemReady Virtual Environment [1]. As 
described in the Arm SystemReady Requirements Specification v2.0
 [2] page 9, 2.5.1 SystemReady Virtual Environment (VE) v1.0 
requirements,: "FirmwareTestSuite (FWTS) must still be used" -> fwts 
checks for the presence of SMBIOS type 19 table and fails the test in 
this case.



[1] 
https://www.arm.com/architecture/system-architectures/systemready-certification-program/ve


[2] https://developer.arm.com/documentation/den0109/latest




---
  hw/arm/virt.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index cda9defe8f09..855b6982f363 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1607,9 +1607,11 @@ static void *machvirt_dtb(const struct arm_boot_info 
*binfo, int *fdt_size)
  static void virt_build_smbios(VirtMachineState *vms)
  {
  MachineClass *mc = MACHINE_GET_CLASS(vms);
+MachineState *ms = MACHINE(vms);
  VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
  uint8_t *smbios_tables, *smbios_anchor;
  size_t smbios_tables_len, smbios_anchor_len;
+struct smbios_phys_mem_area mem_array;
  const char *product = "QEMU Virtual Machine";

  if (kvm_enabled()) {
@@ -1620,7 +1622,11 @@ static void virt_build_smbios(VirtMachineState *vms)
  vmc->smbios_old_sys_ver ? "1.0" : mc->name, false,
  true, SMBIOS_ENTRY_POINT_TYPE_64);

-smbios_get_tables(MACHINE(vms), NULL, 0,
+/* build the array of physical mem area from base_memmap */
+mem_array.address = vms->memmap[VIRT_MEM].base;
+mem_array.length = ms->ram_size;
+
+smbios_get_tables(ms, &mem_array, 1,
&smbios_tables, &smbios_tables_len,
&smbios_anchor, &smbios_anchor_len,
&error_fatal);

Does this show up as a difference in the ACPI tables? If so then
the bios-tables-tests will need updating (and this would probably
show up as "make check" failing).
In ./tests/qtest/bios-tables-test.c we have test_smbios_structs which is 
checking for all structs present. Also it is looking for mandatory types 
and 19 is one of them (base_required_struct_types -> 19). I think we 
cover everything. I will re-run make check to see.


Do we need to care here about pluggable memory devices?
(We seem to do something with them in the ACPI tables
via build_srat_memory(), so maybe not?)
Here you are refering to the fact that when we hot plug a memory dim, to 
automatically update smbios type 19 entry/entries?


thanks
-- PMM

Re: [PATCH 1/2] remove DEC 21154 PCI bridge

2022-11-20 Thread Mark Cave-Ayland


On 16/11/2022 19:39, BALATON Zoltan wrote:


On Wed, 16 Nov 2022, Igor Mammedov wrote:


Code has not been used practically since its inception (2004)
 f2aa58c6f4a20 UniNorth PCI bridge support
or maybe even earlier, but it was consuming contributors time
as QEMU was being rewritten.
Drop it for now. Whomever would like to actually
use the thing, can make sure it actually works/reintroduce
it back when there is a user.

PS:
I've stumbled upon this when replacing PCIDeviceClass::is_bridge
field with QOM cast to PCI_BRIDGE type. Unused DEC 21154
was the only one trying to use the field with plain PCIDevice.
It's not worth keeping the field around for the sake of the code
that was commented out 'forever'.

Signed-off-by: Igor Mammedov 
---
hw/pci-bridge/dec.h   |   9 ---
include/hw/pci/pci_ids.h  |   1 -
hw/pci-bridge/dec.c   | 164 --
hw/pci-bridge/meson.build |   2 -
hw/pci-host/uninorth.c    |   6 --
5 files changed, 182 deletions(-)
delete mode 100644 hw/pci-bridge/dec.h
delete mode 100644 hw/pci-bridge/dec.c

diff --git a/hw/pci-bridge/dec.h b/hw/pci-bridge/dec.h
deleted file mode 100644
index 869e90b136..00
--- a/hw/pci-bridge/dec.h
+++ /dev/null
@@ -1,9 +0,0 @@
-#ifndef HW_PCI_BRIDGE_DEC_H
-#define HW_PCI_BRIDGE_DEC_H
-
-
-#define TYPE_DEC_21154 "dec-21154-sysbus"
-
-PCIBus *pci_dec_21154_init(PCIBus *parent_bus, int devfn);
-
-#endif
diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
index bc9f834fd1..e4386ebb20 100644
--- a/include/hw/pci/pci_ids.h
+++ b/include/hw/pci/pci_ids.h
@@ -169,7 +169,6 @@

#define PCI_VENDOR_ID_DEC    0x1011
#define PCI_DEVICE_ID_DEC_21143  0x0019
-#define PCI_DEVICE_ID_DEC_21154  0x0026

#define PCI_VENDOR_ID_CIRRUS 0x1013

diff --git a/hw/pci-bridge/dec.c b/hw/pci-bridge/dec.c
deleted file mode 100644
index 4773d07e6d..00
--- a/hw/pci-bridge/dec.c
+++ /dev/null
@@ -1,164 +0,0 @@
-/*
- * QEMU DEC 21154 PCI bridge
- *
- * Copyright (c) 2006-2007 Fabrice Bellard
- * Copyright (c) 2007 Jocelyn Mayer
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to 
deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-
-#include "qemu/osdep.h"
-#include "dec.h"
-#include "hw/sysbus.h"
-#include "qapi/error.h"
-#include "qemu/module.h"
-#include "hw/pci/pci.h"
-#include "hw/pci/pci_host.h"
-#include "hw/pci/pci_bridge.h"
-#include "hw/pci/pci_bus.h"
-#include "qom/object.h"
-
-OBJECT_DECLARE_SIMPLE_TYPE(DECState, DEC_21154)
-
-struct DECState {
-    PCIHostState parent_obj;
-};
-
-static int dec_map_irq(PCIDevice *pci_dev, int irq_num)
-{
-    return irq_num;
-}
-
-static void dec_pci_bridge_realize(PCIDevice *pci_dev, Error **errp)
-{
-    pci_bridge_initfn(pci_dev, TYPE_PCI_BUS);
-}
-
-static void dec_21154_pci_bridge_class_init(ObjectClass *klass, void *data)
-{
-    DeviceClass *dc = DEVICE_CLASS(klass);
-    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
-
-    set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
-    k->realize = dec_pci_bridge_realize;
-    k->exit = pci_bridge_exitfn;
-    k->vendor_id = PCI_VENDOR_ID_DEC;
-    k->device_id = PCI_DEVICE_ID_DEC_21154;
-    k->config_write = pci_bridge_write_config;
-    k->is_bridge = true;
-    dc->desc = "DEC 21154 PCI-PCI bridge";
-    dc->reset = pci_bridge_reset;
-    dc->vmsd = &vmstate_pci_device;
-}
-
-static const TypeInfo dec_21154_pci_bridge_info = {
-    .name  = "dec-21154-p2p-bridge",
-    .parent    = TYPE_PCI_BRIDGE,
-    .instance_size = sizeof(PCIBridge),
-    .class_init    = dec_21154_pci_bridge_class_init,
-    .interfaces = (InterfaceInfo[]) {
-    { INTERFACE_CONVENTIONAL_PCI_DEVICE },
-    { },
-    },
-};
-
-PCIBus *pci_dec_21154_init(PCIBus *parent_bus, int devfn)
-{
-    PCIDevice *dev;
-    PCIBridge *br;
-
-    dev = pci_new_multifunction(devfn, false, "dec-21154-p2p-bridge");
-    br = PCI_BRIDGE(dev);
-    pci_bridge_map_irq(br, "DEC 21154 PCI-PCI bridge", dec_map_irq);
-    pci_realize_and_unref(dev, p

Re: [PATCH for-8.0 5/7] target/riscv: Use QEMU_IOTHREAD_LOCK_GUARD in riscv_cpu_update_mip

2022-11-20 Thread Alistair Francis

On Fri, Nov 18, 2022 at 7:22 PM Richard Henderson
 wrote:
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

> ---
> Cc: Alistair Francis 
> Cc: Bin Meng 
> Cc: qemu-ri...@nongnu.org
> ---
>  target/riscv/cpu_helper.c | 10 +-
>  1 file changed, 1 insertion(+), 9 deletions(-)
>
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index 278d163803..241d06bab8 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -610,7 +610,6 @@ uint64_t riscv_cpu_update_mip(RISCVCPU *cpu, uint64_t 
> mask, uint64_t value)
>  CPURISCVState *env = &cpu->env;
>  CPUState *cs = CPU(cpu);
>  uint64_t gein, vsgein = 0, vstip = 0, old = env->mip;
> -bool locked = false;
>
>  if (riscv_cpu_virt_enabled(env)) {
>  gein = get_field(env->hstatus, HSTATUS_VGEIN);
> @@ -621,10 +620,7 @@ uint64_t riscv_cpu_update_mip(RISCVCPU *cpu, uint64_t 
> mask, uint64_t value)
>  mask = ((mask == MIP_VSTIP) && env->vstime_irq) ? 0 : mask;
>  vstip = env->vstime_irq ? MIP_VSTIP : 0;
>
> -if (!qemu_mutex_iothread_locked()) {
> -locked = true;
> -qemu_mutex_lock_iothread();
> -}
> +QEMU_IOTHREAD_LOCK_GUARD();
>
>  env->mip = (env->mip & ~mask) | (value & mask);
>
> @@ -634,10 +630,6 @@ uint64_t riscv_cpu_update_mip(RISCVCPU *cpu, uint64_t 
> mask, uint64_t value)
>  cpu_reset_interrupt(cs, CPU_INTERRUPT_HARD);
>  }
>
> -if (locked) {
> -qemu_mutex_unlock_iothread();
> -}
> -
>  return old;
>  }
>
> --
> 2.34.1
>
>

Re: [PATCH for-8.0 1/7] qemu/main-loop: Introduce QEMU_IOTHREAD_LOCK_GUARD

2022-11-20 Thread Richard Henderson


On 11/18/22 05:30, Alex Bennée wrote:


Richard Henderson  writes:


Create a wrapper for locking/unlocking the iothread lock.

Signed-off-by: Richard Henderson 
---
Cc: Paolo Bonzini  (maintainer:Main loop)


You might want to review Paolo's comments from:

   Subject: [RFC PATCH] main-loop: introduce WITH_QEMU_IOTHREAD_LOCK
   Date: Mon, 24 Oct 2022 18:19:09 +0100
   Message-Id: <20221024171909.434818-1-alex.ben...@linaro.org>

So it would be worth having the WITH_QEMU_IOTHREAD_LOCK() and
MAYBE_WITH_QEMU_IOTHREAD_LOCK() helpers for completeness.


I don't see that (MAYBE_)WITH_QEMU_IOTHREAD_LOCK is particularly useful in any of the 
cases that I converted.



And of course the name cleanup.


What name cleanup?


r~

Re: [PATCH v11 4/5] target/riscv: smstateen check for fcsr

2022-11-20 Thread Alistair Francis

On Sun, Oct 16, 2022 at 11:09 PM Mayuresh Chitale
 wrote:
>
> If smstateen is implemented and sstateen0.fcsr is clear then the floating 
> point
> operations must return illegal instruction exception or virtual instruction
> trap, if relevant.
>
> Signed-off-by: Mayuresh Chitale 
> Reviewed-by: Weiwei Li 
> ---
>  target/riscv/csr.c| 23 
>  target/riscv/insn_trans/trans_rvf.c.inc   | 43 +--
>  target/riscv/insn_trans/trans_rvzfh.c.inc | 12 +++
>  3 files changed, 75 insertions(+), 3 deletions(-)
>
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index 71236f2b5d..8b25f885ec 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -84,6 +84,10 @@ static RISCVException fs(CPURISCVState *env, int csrno)
>  !RISCV_CPU(env_cpu(env))->cfg.ext_zfinx) {
>  return RISCV_EXCP_ILLEGAL_INST;
>  }
> +
> +if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
> +return smstateen_acc_ok(env, 0, SMSTATEEN0_FCSR);
> +}
>  #endif
>  return RISCV_EXCP_NONE;
>  }
> @@ -2023,6 +2027,9 @@ static RISCVException write_mstateen0(CPURISCVState 
> *env, int csrno,
>target_ulong new_val)
>  {
>  uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
> +if (!riscv_has_ext(env, RVF)) {
> +wr_mask |= SMSTATEEN0_FCSR;
> +}
>
>  return write_mstateen(env, csrno, wr_mask, new_val);
>  }
> @@ -2059,6 +2066,10 @@ static RISCVException write_mstateen0h(CPURISCVState 
> *env, int csrno,
>  {
>  uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
>
> +if (!riscv_has_ext(env, RVF)) {
> +wr_mask |= SMSTATEEN0_FCSR;
> +}
> +
>  return write_mstateenh(env, csrno, wr_mask, new_val);
>  }
>
> @@ -2096,6 +2107,10 @@ static RISCVException write_hstateen0(CPURISCVState 
> *env, int csrno,
>  {
>  uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
>
> +if (!riscv_has_ext(env, RVF)) {
> +wr_mask |= SMSTATEEN0_FCSR;
> +}
> +
>  return write_hstateen(env, csrno, wr_mask, new_val);
>  }
>
> @@ -2135,6 +2150,10 @@ static RISCVException write_hstateen0h(CPURISCVState 
> *env, int csrno,
>  {
>  uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
>
> +if (!riscv_has_ext(env, RVF)) {
> +wr_mask |= SMSTATEEN0_FCSR;
> +}
> +
>  return write_hstateenh(env, csrno, wr_mask, new_val);
>  }
>
> @@ -2182,6 +2201,10 @@ static RISCVException write_sstateen0(CPURISCVState 
> *env, int csrno,
>  {
>  uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
>
> +if (!riscv_has_ext(env, RVF)) {
> +wr_mask |= SMSTATEEN0_FCSR;
> +}
> +
>  return write_sstateen(env, csrno, wr_mask, new_val);
>  }
>
> diff --git a/target/riscv/insn_trans/trans_rvf.c.inc 
> b/target/riscv/insn_trans/trans_rvf.c.inc
> index a1d3eb52ad..93657680c6 100644
> --- a/target/riscv/insn_trans/trans_rvf.c.inc
> +++ b/target/riscv/insn_trans/trans_rvf.c.inc
> @@ -24,9 +24,46 @@
>  return false; \
>  } while (0)
>
> -#define REQUIRE_ZFINX_OR_F(ctx) do {\
> -if (!ctx->cfg_ptr->ext_zfinx) { \
> -REQUIRE_EXT(ctx, RVF); \
> +#ifndef CONFIG_USER_ONLY
> +static inline bool smstateen_fcsr_check(DisasContext *ctx, int index)
> +{
> +CPUState *cpu = ctx->cs;
> +CPURISCVState *env = cpu->env_ptr;
> +uint64_t stateen = env->mstateen[index];

Sorry I missed this the first time around. You can't access env here

Richard pointed it out here:
https://patchwork.kernel.org/project/qemu-devel/patch/20221117070316.58447-8-liwei...@iscas.ac.cn/#25095773

I'm going to drop this patch and patch v5

Alistair

Re: [PATCH] riscv: Add RISCVCPUConfig.satp_mode to set sv48, sv57, etc.

2022-11-20 Thread Alistair Francis

On Fri, Nov 18, 2022 at 12:26 AM Alexandre Ghiti  wrote:
>
> RISC-V specifies multiple sizes for addressable memory and Linux probes for
> the machine's support at startup via the satp CSR register (done in
> csr.c:validate_vm).
>
> As per the specification, sv64 must support sv57, which in turn must
> support sv48...etc. So we can restrict machine support by simply setting the
> "highest" supported mode in the satp_mode property. And the bare mode is
> always supported.
>
> You can set this new property as follows:
> -cpu rv64,satp-mode=sv48 # Linux will boot using sv48 scheme
> -cpu rv64,satp-mode=sv39 # Linux will boot using sv39 scheme
>
> In addition, we now correctly set the device-tree entry 'mmu-type' using
> this new satp_mode property.
>
> Co-Developed-by: Ludovic Henry 
> Signed-off-by: Ludovic Henry 
> Signed-off-by: Alexandre Ghiti 
> ---
>  hw/riscv/virt.c| 15 ++-
>  target/riscv/cpu.c | 45 +
>  target/riscv/cpu.h |  3 +++
>  target/riscv/csr.c |  6 --
>  4 files changed, 58 insertions(+), 11 deletions(-)
>
> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
> index a5bc7353b4..77484b5cae 100644
> --- a/hw/riscv/virt.c
> +++ b/hw/riscv/virt.c
> @@ -228,7 +228,7 @@ static void create_fdt_socket_cpus(RISCVVirtState *s, int 
> socket,
>  int cpu;
>  uint32_t cpu_phandle;
>  MachineState *mc = MACHINE(s);
> -char *name, *cpu_name, *core_name, *intc_name;
> +char *name, *cpu_name, *core_name, *intc_name, *sv_name;
>
>  for (cpu = s->soc[socket].num_harts - 1; cpu >= 0; cpu--) {
>  cpu_phandle = (*phandle)++;
> @@ -236,14 +236,11 @@ static void create_fdt_socket_cpus(RISCVVirtState *s, 
> int socket,
>  cpu_name = g_strdup_printf("/cpus/cpu@%d",
>  s->soc[socket].hartid_base + cpu);
>  qemu_fdt_add_subnode(mc->fdt, cpu_name);
> -if (riscv_feature(&s->soc[socket].harts[cpu].env,
> -  RISCV_FEATURE_MMU)) {
> -qemu_fdt_setprop_string(mc->fdt, cpu_name, "mmu-type",
> -(is_32_bit) ? "riscv,sv32" : 
> "riscv,sv48");
> -} else {
> -qemu_fdt_setprop_string(mc->fdt, cpu_name, "mmu-type",
> -"riscv,none");
> -}
> +
> +sv_name = g_strdup_printf("riscv,%s",
> +  
> s->soc[socket].harts[cpu].cfg.satp_mode_str);
> +qemu_fdt_setprop_string(mc->fdt, cpu_name, "mmu-type", sv_name);
> +
>  name = riscv_isa_string(&s->soc[socket].harts[cpu]);
>  qemu_fdt_setprop_string(mc->fdt, cpu_name, "riscv,isa", name);
>  g_free(name);
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index d14e95c9dc..efdb530ad9 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -907,6 +907,48 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
> **errp)
>   }
>  #endif
>
> +/*
> + * Either a cpu sets its supported satp_mode in XXX_cpu_init
> + * or the user sets this value using satp_mode property.
> + */
> +bool rv32 = riscv_cpu_mxl(&cpu->env) == MXL_RV32;
> +if (cpu->cfg.satp_mode_str) {
> +if (!g_strcmp0(cpu->cfg.satp_mode_str, "none"))
> +cpu->cfg.satp_mode = VM_1_10_MBARE;
> +else if (!g_strcmp0(cpu->cfg.satp_mode_str, "sv32") && rv32)
> +cpu->cfg.satp_mode = VM_1_10_SV32;
> +else if (!g_strcmp0(cpu->cfg.satp_mode_str, "sv39") && !rv32)
> +cpu->cfg.satp_mode = VM_1_10_SV39;
> +else if (!g_strcmp0(cpu->cfg.satp_mode_str, "sv48") && !rv32)
> +cpu->cfg.satp_mode = VM_1_10_SV48;
> +else if (!g_strcmp0(cpu->cfg.satp_mode_str, "sv57") && !rv32)
> +cpu->cfg.satp_mode = VM_1_10_SV57;
> +else if (!g_strcmp0(cpu->cfg.satp_mode_str, "sv64") && !rv32)
> +cpu->cfg.satp_mode = VM_1_10_SV64;
> +else {
> +error_report("Unknown option for satp_mode: %s",
> + cpu->cfg.satp_mode_str);
> +exit(EXIT_FAILURE);

You should use error_setg() and return here instead

Alistair

> +}
> +} else {
> +/*
> + * If unset by both the user and the cpu, we fallback to sv32 for 
> 32-bit
> + * or sv57 for 64-bit when a MMU is present, and bare otherwise.
> + */
> +if (riscv_feature(&cpu->env, RISCV_FEATURE_MMU)) {
> +if (rv32) {
> +cpu->cfg.satp_mode_str = g_strdup("sv32");
> +cpu->cfg.satp_mode = VM_1_10_SV32;
> +} else {
> +cpu->cfg.satp_mode_str = g_strdup("sv57");
> +cpu->cfg.satp_mode = VM_1_10_SV57;
> +}
> +} else {
> +cpu->cfg.satp_mode_str = g_strdup("none");
> +cpu->cfg.satp_mode = VM_1_10_MBARE;
> +}
> +}
> +
>  riscv_cpu_register_gdb_regs_for_features(cs);
>
>  qemu_init_vcpu(cs);
> @@ -1094,6 +1136,9 @@ s

Re: [PATCH v2] hw/riscv: virt: Remove the redundant ipi-id property

2022-11-20 Thread Alistair Francis

On Sun, Nov 13, 2022 at 7:52 PM Atish Patra  wrote:
>
> The imsic DT binding[1] has changed and no longer require an ipi-id.
> The latest IMSIC driver dynamically allocates ipi id if slow-ipi
> is not defined.
>
> Get rid of the unused dt property which may lead to confusion.
>
> [1] 
> https://lore.kernel.org/lkml/2022044207.1478350-5-apa...@ventanamicro.com/
>
> Signed-off-by: Atish Patra 

Acked-by: Alistair Francis 

Alistair

> ---
>  hw/riscv/virt.c | 2 --
>  include/hw/riscv/virt.h | 1 -
>  2 files changed, 3 deletions(-)
>
> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
> index a5bc7353b412..0bc0964e42a8 100644
> --- a/hw/riscv/virt.c
> +++ b/hw/riscv/virt.c
> @@ -546,8 +546,6 @@ static void create_fdt_imsic(RISCVVirtState *s, const 
> MemMapEntry *memmap,
>  riscv_socket_count(mc) * sizeof(uint32_t) * 4);
>  qemu_fdt_setprop_cell(mc->fdt, imsic_name, "riscv,num-ids",
>  VIRT_IRQCHIP_NUM_MSIS);
> -qemu_fdt_setprop_cells(mc->fdt, imsic_name, "riscv,ipi-id",
> -VIRT_IRQCHIP_IPI_MSI);
>  if (riscv_socket_count(mc) > 1) {
>  qemu_fdt_setprop_cell(mc->fdt, imsic_name, "riscv,hart-index-bits",
>  imsic_num_bits(imsic_max_hart_per_socket));
> diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h
> index be4ab8fe7f71..62513e075c47 100644
> --- a/include/hw/riscv/virt.h
> +++ b/include/hw/riscv/virt.h
> @@ -93,7 +93,6 @@ enum {
>
>  #define VIRT_PLATFORM_BUS_NUM_IRQS 32
>
> -#define VIRT_IRQCHIP_IPI_MSI 1
>  #define VIRT_IRQCHIP_NUM_MSIS 255
>  #define VIRT_IRQCHIP_NUM_SOURCES VIRTIO_NDEV
>  #define VIRT_IRQCHIP_NUM_PRIO_BITS 3
> --
> 2.25.1
>
>

Re: [PATCH v2] hw/riscv: virt: Remove the redundant ipi-id property

2022-11-20 Thread Alistair Francis

On Sun, Nov 13, 2022 at 7:52 PM Atish Patra  wrote:
>
> The imsic DT binding[1] has changed and no longer require an ipi-id.
> The latest IMSIC driver dynamically allocates ipi id if slow-ipi
> is not defined.
>
> Get rid of the unused dt property which may lead to confusion.
>
> [1] 
> https://lore.kernel.org/lkml/2022044207.1478350-5-apa...@ventanamicro.com/
>
> Signed-off-by: Atish Patra 

Thanks!

Applied to riscv-to-apply.next

Alistair

> ---
>  hw/riscv/virt.c | 2 --
>  include/hw/riscv/virt.h | 1 -
>  2 files changed, 3 deletions(-)
>
> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
> index a5bc7353b412..0bc0964e42a8 100644
> --- a/hw/riscv/virt.c
> +++ b/hw/riscv/virt.c
> @@ -546,8 +546,6 @@ static void create_fdt_imsic(RISCVVirtState *s, const 
> MemMapEntry *memmap,
>  riscv_socket_count(mc) * sizeof(uint32_t) * 4);
>  qemu_fdt_setprop_cell(mc->fdt, imsic_name, "riscv,num-ids",
>  VIRT_IRQCHIP_NUM_MSIS);
> -qemu_fdt_setprop_cells(mc->fdt, imsic_name, "riscv,ipi-id",
> -VIRT_IRQCHIP_IPI_MSI);
>  if (riscv_socket_count(mc) > 1) {
>  qemu_fdt_setprop_cell(mc->fdt, imsic_name, "riscv,hart-index-bits",
>  imsic_num_bits(imsic_max_hart_per_socket));
> diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h
> index be4ab8fe7f71..62513e075c47 100644
> --- a/include/hw/riscv/virt.h
> +++ b/include/hw/riscv/virt.h
> @@ -93,7 +93,6 @@ enum {
>
>  #define VIRT_PLATFORM_BUS_NUM_IRQS 32
>
> -#define VIRT_IRQCHIP_IPI_MSI 1
>  #define VIRT_IRQCHIP_NUM_MSIS 255
>  #define VIRT_IRQCHIP_NUM_SOURCES VIRTIO_NDEV
>  #define VIRT_IRQCHIP_NUM_PRIO_BITS 3
> --
> 2.25.1
>
>

Re: [PATCH v5 1/9] target/riscv: add cfg properties for Zc* extension

2022-11-20 Thread Alistair Francis

On Fri, Nov 18, 2022 at 10:45 PM Weiwei Li  wrote:
>
> Add properties for Zca,Zcb,Zcf,Zcd,Zcmp,Zcmt extension
> Add check for these properties
>
> Signed-off-by: Weiwei Li 
> Signed-off-by: Junqiang Wang 
> Cc: Alistair Francis 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu.c | 43 +++
>  target/riscv/cpu.h |  6 ++
>  2 files changed, 49 insertions(+)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 042fd541b4..1ab04ab246 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -805,6 +805,49 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
> **errp)
>  }
>  }
>
> +if (cpu->cfg.ext_c) {
> +cpu->cfg.ext_zca = true;
> +if (cpu->cfg.ext_f && env->misa_mxl_max == MXL_RV32) {
> +cpu->cfg.ext_zcf = true;
> +}
> +if (cpu->cfg.ext_d) {
> +cpu->cfg.ext_zcd = true;
> +}
> +}
> +
> +if (env->misa_mxl_max != MXL_RV32 && cpu->cfg.ext_zcf) {
> +error_setg(errp, "Zcf extension is only relevant to RV32");
> +return;
> +}
> +
> +if (!cpu->cfg.ext_f && cpu->cfg.ext_zcf) {
> +error_setg(errp, "Zcf extension requires F extension");
> +return;
> +}
> +
> +if (!cpu->cfg.ext_d && cpu->cfg.ext_zcd) {
> +error_setg(errp, "Zcd extensionrequires D extension");
> +return;
> +}
> +
> +if ((cpu->cfg.ext_zcf || cpu->cfg.ext_zcd || cpu->cfg.ext_zcb ||
> + cpu->cfg.ext_zcmp || cpu->cfg.ext_zcmt) && !cpu->cfg.ext_zca) {
> +error_setg(errp, "Zcf/Zcd/Zcb/Zcmp/Zcmt extensions require Zca "
> + "extension");
> +return;
> +}
> +
> +if (cpu->cfg.ext_zcd && (cpu->cfg.ext_zcmp || cpu->cfg.ext_zcmt)) {
> +error_setg(errp, "Zcmp/Zcmt extensions are incompatible with "
> + "Zcd extension");
> +return;
> +}
> +
> +if (cpu->cfg.ext_zcmt && !cpu->cfg.ext_icsr) {
> +error_setg(errp, "Zcmt extension requires Zicsr extension");
> +return;
> +}
> +
>  if (cpu->cfg.ext_zk) {
>  cpu->cfg.ext_zkn = true;
>  cpu->cfg.ext_zkr = true;
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index 9bd539d77a..6e915b6937 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -434,6 +434,12 @@ struct RISCVCPUConfig {
>  bool ext_zbkc;
>  bool ext_zbkx;
>  bool ext_zbs;
> +bool ext_zca;
> +bool ext_zcb;
> +bool ext_zcd;
> +bool ext_zcf;
> +bool ext_zcmp;
> +bool ext_zcmt;
>  bool ext_zk;
>  bool ext_zkn;
>  bool ext_zknd;
> --
> 2.25.1
>
>

Re: [PATCH v2 0/5] Nested virtualization fixes for QEMU

2022-11-20 Thread Anup Patel

Hi Alistair,

On Tue, Nov 8, 2022 at 6:27 PM Anup Patel  wrote:
>
> This series mainly includes fixes discovered while developing nested
> virtualization running on QEMU.
>
> These patches can also be found in the riscv_nested_fixes_v2 branch at:
> https://github.com/avpatel/qemu.git
>
> Changes since v1:
>  - Added Alistair's Reviewed-by tags to appropriate patches
>  - Added detailed comment block in PATCH4
>
> Anup Patel (5):
>   target/riscv: Typo fix in sstc() predicate
>   target/riscv: Update VS timer whenever htimedelta changes
>   target/riscv: Don't clear mask in riscv_cpu_update_mip() for VSTIP
>   target/riscv: No need to re-start QEMU timer when timecmp ==
> UINT64_MAX
>   target/riscv: Ensure opcode is saved for all relevant instructions

Friendly ping ?

Regards,
Anup

>
>  target/riscv/cpu_helper.c   |  2 --
>  target/riscv/csr.c  | 18 ++-
>  target/riscv/insn_trans/trans_rva.c.inc | 10 --
>  target/riscv/insn_trans/trans_rvd.c.inc |  2 ++
>  target/riscv/insn_trans/trans_rvf.c.inc |  2 ++
>  target/riscv/insn_trans/trans_rvh.c.inc |  3 ++
>  target/riscv/insn_trans/trans_rvi.c.inc |  2 ++
>  target/riscv/insn_trans/trans_rvzfh.c.inc   |  2 ++
>  target/riscv/insn_trans/trans_svinval.c.inc |  3 ++
>  target/riscv/time_helper.c  | 36 ++---
>  10 files changed, 70 insertions(+), 10 deletions(-)
>
> --
> 2.34.1
>

Re: [PATCH] vhost: configure all host notifiers in a single MR transaction

2022-11-20 Thread Jason Wang

On Fri, Nov 18, 2022 at 10:49 PM Longpeng(Mike)  wrote:
>
> From: Longpeng 
>
> This allows the vhost device to batch the setup of all its host notifiers.
> This significantly reduces the device starting time, e.g. the vhost-vDPA
> generic device [1] start time reduce from 376ms to 9.1ms for a VM with
> 64 vCPUs and 3 vDPA device(64vq per device).

Great, I think we need to do this for host_notifiers_mr as well. This
helps for the case when the notification area could be mapped directly
to guests.

>
> [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg921541.html
>
> Signed-off-by: Longpeng 
> ---
>  hw/virtio/vhost.c | 39 ++-
>  1 file changed, 38 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index d1c4c20b8c..bf82d9b176 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1507,6 +1507,7 @@ void vhost_dev_cleanup(struct vhost_dev *hdev)
>  int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
>  {
>  BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> +int vq_init_count = 0;
>  int i, r, e;
>
>  /* We will pass the notifiers to the kernel, make sure that QEMU
> @@ -1518,6 +1519,12 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, 
> VirtIODevice *vdev)
>  goto fail;
>  }
>
> +/*
> + * Batch all the host notifiers in a single transaction to avoid
> + * quadratic time complexity in address_space_update_ioeventfds().
> + */
> +memory_region_transaction_begin();
> +
>  for (i = 0; i < hdev->nvqs; ++i) {
>  r = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), hdev->vq_index + 
> i,
>   true);
> @@ -1525,19 +1532,33 @@ int vhost_dev_enable_notifiers(struct vhost_dev 
> *hdev, VirtIODevice *vdev)
>  error_report("vhost VQ %d notifier binding failed: %d", i, -r);
>  goto fail_vq;
>  }
> +
> +vq_init_count++;
>  }

Nit, the name needs some tweak, it's actually the number of the host
notifiers that is initialized. And we can count it out of the loop.

>
> +memory_region_transaction_commit();
> +
>  return 0;
>  fail_vq:
> -while (--i >= 0) {
> +for (i = 0; i < vq_init_count; i++) {

It looks to me there's no need for this change.

Others look good.

Thanks

>  e = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), hdev->vq_index + 
> i,
>   false);
>  if (e < 0) {
>  error_report("vhost VQ %d notifier cleanup error: %d", i, -r);
>  }
>  assert (e >= 0);
> +}
> +
> +/*
> + * The transaction expects the ioeventfds to be open when it
> + * commits. Do it now, before the cleanup loop.
> + */
> +memory_region_transaction_commit();
> +
> +for (i = 0; i < vq_init_count; i++) {
>  virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), hdev->vq_index + 
> i);
>  }
> +
>  virtio_device_release_ioeventfd(vdev);
>  fail:
>  return r;
> @@ -1553,6 +1574,12 @@ void vhost_dev_disable_notifiers(struct vhost_dev 
> *hdev, VirtIODevice *vdev)
>  BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
>  int i, r;
>
> +/*
> + * Batch all the host notifiers in a single transaction to avoid
> + * quadratic time complexity in address_space_update_ioeventfds().
> + */
> +memory_region_transaction_begin();
> +
>  for (i = 0; i < hdev->nvqs; ++i) {
>  r = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), hdev->vq_index + 
> i,
>   false);
> @@ -1560,8 +1587,18 @@ void vhost_dev_disable_notifiers(struct vhost_dev 
> *hdev, VirtIODevice *vdev)
>  error_report("vhost VQ %d notifier cleanup failed: %d", i, -r);
>  }
>  assert (r >= 0);
> +}
> +
> +/*
> + * The transaction expects the ioeventfds to be open when it
> + * commits. Do it now, before the cleanup loop.
> + */
> +memory_region_transaction_commit();
> +
> +for (i = 0; i < hdev->nvqs; ++i) {
>  virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), hdev->vq_index + 
> i);
>  }
> +
>  virtio_device_release_ioeventfd(vdev);
>  }
>
> --
> 2.23.0
>

Re: [PATCH for-7.2 v3 2/3] rtl8139: keep Tx command mode 0 and 1 separate

2022-11-20 Thread Jason Wang

On Fri, Nov 18, 2022 at 12:56 AM Stefan Hajnoczi  wrote:
>
> There are two Tx Descriptor formats called mode 0 and mode 1. The mode
> is determined by the Large Send bit.
>
> CP_TX_IPCS (bit 18) is defined in mode 1 but the code checks the bit
> unconditionally. In mode 0 bit 18 is part of the Large Send MSS value.
>
> Explicitly check the Large Send bit to distinguish Tx command modes.
> This avoids bugs where modes are confused. Note that I didn't find any
> actual bugs aside from needlessly computing the IP checksum when the
> Large Send bit is enabled.
>
> Signed-off-by: Stefan Hajnoczi 

Acked-by: Jason Wang 

Thanks

> ---
>  hw/net/rtl8139.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
> index ffef3789b5..6dd7a8e6e0 100644
> --- a/hw/net/rtl8139.c
> +++ b/hw/net/rtl8139.c
> @@ -2135,7 +2135,7 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
>  }
>  ip_data_len -= hlen;
>
> -if (txdw0 & CP_TX_IPCS)
> +if (!(txdw0 & CP_TX_LGSEN) && (txdw0 & CP_TX_IPCS))
>  {
>  DPRINTF("+++ C+ mode need IP checksum\n");
>
> @@ -2268,7 +2268,7 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
>  /* Stop sending this frame */
>  saved_size = 0;
>  }
> -else if (txdw0 & (CP_TX_TCPCS|CP_TX_UDPCS))
> +else if (!(txdw0 & CP_TX_LGSEN) && (txdw0 & 
> (CP_TX_TCPCS|CP_TX_UDPCS)))
>  {
>  DPRINTF("+++ C+ mode need TCP or UDP checksum\n");
>
> --
> 2.38.1
>

Re: [PATCH for-7.2 v3 1/3] rtl8139: avoid clobbering tx descriptor bits

2022-11-20 Thread Jason Wang

On Fri, Nov 18, 2022 at 12:56 AM Stefan Hajnoczi  wrote:
>
> The device turns the Tx Descriptor into a Tx Status descriptor after
> fully reading the descriptor. This involves clearing Tx Own (bit 31) to
> indicate that the driver has ownership of the descriptor again as well
> as several other bits.
>
> The code keeps the first dword of the Tx Descriptor in the txdw0 local
> variable. txdw0 is reused to build the first word of the Tx Status
> descriptor. Later on the code uses txdw0 again, incorrectly assuming
> that it still contains the first dword of the Tx Descriptor. The tx
> offloading code misbehaves because it sees bogus bits in txdw0.

(This is only noticeable with patch 2).

>
> Use a separate local variable for Tx Status and preserve Tx Descriptor
> in txdw0.
>
> Signed-off-by: Stefan Hajnoczi 

Acked-by: Jason Wang 

> ---
>  hw/net/rtl8139.c | 17 ++---
>  1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
> index e6643e3c9d..ffef3789b5 100644
> --- a/hw/net/rtl8139.c
> +++ b/hw/net/rtl8139.c
> @@ -2027,18 +2027,21 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
>  s->currCPlusTxDesc = 0;
>  }
>
> +/* Build the Tx Status Descriptor */
> +uint32_t tx_status = txdw0;
> +
>  /* transfer ownership to target */
> -txdw0 &= ~CP_TX_OWN;
> +tx_status &= ~CP_TX_OWN;
>
>  /* reset error indicator bits */
> -txdw0 &= ~CP_TX_STATUS_UNF;
> -txdw0 &= ~CP_TX_STATUS_TES;
> -txdw0 &= ~CP_TX_STATUS_OWC;
> -txdw0 &= ~CP_TX_STATUS_LNKF;
> -txdw0 &= ~CP_TX_STATUS_EXC;
> +tx_status &= ~CP_TX_STATUS_UNF;
> +tx_status &= ~CP_TX_STATUS_TES;
> +tx_status &= ~CP_TX_STATUS_OWC;
> +tx_status &= ~CP_TX_STATUS_LNKF;
> +tx_status &= ~CP_TX_STATUS_EXC;
>
>  /* update ring data */
> -val = cpu_to_le32(txdw0);
> +val = cpu_to_le32(tx_status);
>  pci_dma_write(d, cplus_tx_ring_desc, (uint8_t *)&val, 4);
>
>  /* Now decide if descriptor being processed is holding the last segment 
> of packet */
> --
> 2.38.1
>

Re: [PATCH for-7.2 v3 3/3] rtl8139: honor large send MSS value

2022-11-20 Thread Jason Wang

On Fri, Nov 18, 2022 at 12:56 AM Stefan Hajnoczi  wrote:
>
> The Large-Send Task Offload Tx Descriptor (9.2.1 Transmit) has a
> Large-Send MSS value where the driver specifies the MSS. See the
> datasheet here:
> http://realtek.info/pdf/rtl8139cp.pdf
>
> The code ignores this value and uses a hardcoded MSS of 1500 bytes
> instead. When the MTU is less than 1500 bytes the hardcoded value
> results in IP fragmentation and poor performance.
>
> Use the Large-Send MSS value to correctly size Large-Send packets.
>
> Jason Wang  noticed that the Large-Send MSS value
> mask was incorrect so it is adjusted to match the datasheet and Linux
> 8139cp driver.
>
> This issue was discussed in the past here:
> https://lore.kernel.org/all/20161114162505.GD26664@stefanha-x1.localdomain/
>
> Reported-by: Russell King - ARM Linux 
> Reported-by: Tobias Fiebig 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1312
> Signed-off-by: Stefan Hajnoczi 

Acked-by: Jason Wang 

Thanks

> ---
>  hw/net/rtl8139.c | 26 --
>  1 file changed, 12 insertions(+), 14 deletions(-)
>
> diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
> index 6dd7a8e6e0..700b1b66b6 100644
> --- a/hw/net/rtl8139.c
> +++ b/hw/net/rtl8139.c
> @@ -77,7 +77,6 @@
>  ( ( input ) & ( size - 1 )  )
>
>  #define ETHER_TYPE_LEN 2
> -#define ETH_MTU 1500
>
>  #define VLAN_TCI_LEN 2
>  #define VLAN_HLEN (ETHER_TYPE_LEN + VLAN_TCI_LEN)
> @@ -1934,8 +1933,9 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
>  #define CP_TX_LS (1<<28)
>  /* large send packet flag */
>  #define CP_TX_LGSEN (1<<27)
> -/* large send MSS mask, bits 16...25 */
> -#define CP_TC_LGSEN_MSS_MASK ((1 << 12) - 1)
> +/* large send MSS mask, bits 16...26 */
> +#define CP_TC_LGSEN_MSS_SHIFT 16
> +#define CP_TC_LGSEN_MSS_MASK ((1 << 11) - 1)
>
>  /* IP checksum offload flag */
>  #define CP_TX_IPCS (1<<18)
> @@ -2152,10 +2152,11 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
>  goto skip_offload;
>  }
>
> -int large_send_mss = (txdw0 >> 16) & CP_TC_LGSEN_MSS_MASK;
> +int large_send_mss = (txdw0 >> CP_TC_LGSEN_MSS_SHIFT) &
> + CP_TC_LGSEN_MSS_MASK;
>
> -DPRINTF("+++ C+ mode offloaded task TSO MTU=%d IP data %d "
> -"frame data %d specified MSS=%d\n", ETH_MTU,
> +DPRINTF("+++ C+ mode offloaded task TSO IP data %d "
> +"frame data %d specified MSS=%d\n",
>  ip_data_len, saved_size - ETH_HLEN, large_send_mss);
>
>  int tcp_send_offset = 0;
> @@ -2180,25 +2181,22 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
>  goto skip_offload;
>  }
>
> -/* ETH_MTU = ip header len + tcp header len + payload */
>  int tcp_data_len = ip_data_len - tcp_hlen;
> -int tcp_chunk_size = ETH_MTU - hlen - tcp_hlen;
>
>  DPRINTF("+++ C+ mode TSO IP data len %d TCP hlen %d TCP "
> -"data len %d TCP chunk size %d\n", ip_data_len,
> -tcp_hlen, tcp_data_len, tcp_chunk_size);
> +"data len %d\n", ip_data_len, tcp_hlen, tcp_data_len);
>
>  /* note the cycle below overwrites IP header data,
> but restores it from saved_ip_header before sending 
> packet */
>
>  int is_last_frame = 0;
>
> -for (tcp_send_offset = 0; tcp_send_offset < tcp_data_len; 
> tcp_send_offset += tcp_chunk_size)
> +for (tcp_send_offset = 0; tcp_send_offset < tcp_data_len; 
> tcp_send_offset += large_send_mss)
>  {
> -uint16_t chunk_size = tcp_chunk_size;
> +uint16_t chunk_size = large_send_mss;
>
>  /* check if this is the last frame */
> -if (tcp_send_offset + tcp_chunk_size >= tcp_data_len)
> +if (tcp_send_offset + large_send_mss >= tcp_data_len)
>  {
>  is_last_frame = 1;
>  chunk_size = tcp_data_len - tcp_send_offset;
> @@ -2247,7 +2245,7 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
>  ip->ip_len = cpu_to_be16(hlen + tcp_hlen + chunk_size);
>
>  /* increment IP id for subsequent frames */
> -ip->ip_id = cpu_to_be16(tcp_send_offset/tcp_chunk_size + 
> be16_to_cpu(ip->ip_id));
> +ip->ip_id = cpu_to_be16(tcp_send_offset/large_send_mss + 
> be16_to_cpu(ip->ip_id));
>
>  ip->ip_sum = 0;
>  ip->ip_sum = ip_checksum(eth_payload_data, hlen);
> --
> 2.38.1
>

[RFC PATCH 1/1] Dirty quota-based throttling of vcpus

2022-11-20 Thread Shivam Kumar

Introduces a (new) throttling scheme where QEMU defines a limit on the dirty
rate of each vcpu of the VM. This limit is enfored on the vcpus in small
intervals (dirty quota intervals) by allowing the vcpus to dirty only as many
pages in these intervals as to maintain a dirty rate below the set limit.

Suggested-by: Shaju Abraham 
Suggested-by: Manish Mishra 
Co-developed-by: Anurag Madnawat 
Signed-off-by: Anurag Madnawat 
Signed-off-by: Shivam Kumar 
---
 accel/kvm/kvm-all.c   | 91 +++
 include/exec/memory.h |  3 ++
 include/hw/core/cpu.h |  5 +++
 include/sysemu/kvm_int.h  |  1 +
 linux-headers/linux/kvm.h |  9 
 migration/migration.c | 22 ++
 migration/migration.h | 31 +
 softmmu/memory.c  | 64 +++
 8 files changed, 226 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index f99b0becd8..ea50605592 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -46,6 +46,8 @@
 #include "sysemu/hw_accel.h"
 #include "kvm-cpus.h"
 #include "sysemu/dirtylimit.h"
+#include "hw/core/cpu.h"
+#include "migration/migration.h"
 
 #include "hw/boards.h"
 #include "monitor/stats.h"
@@ -2463,6 +2465,8 @@ static int kvm_init(MachineState *ms)
 }
 }
 
+s->dirty_quota_supported = kvm_vm_check_extension(s, KVM_CAP_DIRTY_QUOTA);
+
 /*
  * KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is not needed when dirty ring is
  * enabled.  More importantly, KVM_DIRTY_LOG_INITIALLY_SET will assume no
@@ -2808,6 +2812,88 @@ static void kvm_eat_signals(CPUState *cpu)
 } while (sigismember(&chkset, SIG_IPI));
 }
 
+static void handle_dirty_quota_sleep(int64_t sleep_time)
+{
+/* Do not throttle the vcpu more than the maximum throttle. */
+sleep_time = MIN(sleep_time,
+DIRTY_QUOTA_MAX_THROTTLE * DIRTY_QUOTA_INTERVAL_SIZE);
+/* Convert sleep time from nanoseconds to microseconds. */
+g_usleep(sleep_time / 1000);
+}
+
+static uint64_t handle_dirty_quota_exhausted(
+CPUState *cpu, const uint64_t count, const uint64_t quota)
+{
+MigrationState *s = migrate_get_current();
+uint64_t time_to_sleep;
+int64_t unclaimed_quota;
+int64_t dirty_quota_overflow = (count - quota);
+uint64_t dirty_rate_limit = qatomic_read(&s->per_vcpu_dirty_rate_limit);
+uint64_t new_quota = (dirty_rate_limit * DIRTY_QUOTA_INTERVAL_SIZE) /
+NANOSECONDS_PER_SECOND;
+uint64_t current_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+
+/* Penalize the vCPU if it dirtied more pages than it was allowed to. */
+if (dirty_quota_overflow > 0) {
+time_to_sleep = (dirty_quota_overflow * NANOSECONDS_PER_SECOND) /
+dirty_rate_limit;
+cpu->dirty_quota_expiry_time = current_time + time_to_sleep;
+return time_to_sleep;
+}
+
+/*
+ * If the current dirty quota interval hasn't ended, try using common quota
+ * if it is available, else sleep.
+ */
+current_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+if (current_time < cpu->dirty_quota_expiry_time) {
+qemu_spin_lock(&s->common_dirty_quota_lock);
+if (s->common_dirty_quota > 0) {
+s->common_dirty_quota -= new_quota;
+qemu_spin_unlock(&s->common_dirty_quota_lock);
+cpu->kvm_run->dirty_quota = count + new_quota;
+return 0;
+}
+
+qemu_spin_unlock(&s->common_dirty_quota_lock);
+current_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+/* If common quota isn't available, sleep for the remaining interval. 
*/
+if (current_time < cpu->dirty_quota_expiry_time) {
+time_to_sleep = cpu->dirty_quota_expiry_time - current_time;
+return time_to_sleep;
+}
+}
+
+/*
+ * This is a fresh dirty quota interval. If the vcpu has not claimed its
+ * quota for the previous intervals, add them to the common quota.
+ */
+current_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+unclaimed_quota = (current_time - cpu->dirty_quota_expiry_time) *
+dirty_rate_limit;
+qemu_spin_lock(&s->common_dirty_quota_lock);
+s->common_dirty_quota += unclaimed_quota;
+qemu_spin_unlock(&s->common_dirty_quota_lock);
+
+/*  Allocate the vcpu this new interval's dirty quota. */
+cpu->kvm_run->dirty_quota = count + new_quota;
+cpu->dirty_quota_expiry_time = current_time + DIRTY_QUOTA_INTERVAL_SIZE;
+return 0;
+}
+
+
+static void handle_kvm_exit_dirty_quota_exhausted(CPUState *cpu,
+const uint64_t count, const uint64_t quota)
+{
+uint64_t time_to_sleep;
+do {
+time_to_sleep = handle_dirty_quota_exhausted(cpu, count, quota);
+if (time_to_sleep > 0) {
+handle_dirty_quota_sleep(time_to_sleep);
+}
+

[RFC PATCH 0/1] QEMU: Dirty quota-based throttling of vcpus

2022-11-20 Thread Shivam Kumar

This patchset is the QEMU-side implementation of a (new) dirty "quota"
based throttling algorithm that selectively throttles vCPUs based on their
individual contribution to overall memory dirtying and also dynamically
adapts the throttle based on the available network bandwidth.

Overview
--
--

To throttle memory dirtying, we propose to set a limit on the number of
pages a vCPU can dirty in given fixed microscopic size time intervals. This
limit depends on the network throughput calculated over the last few
intervals so as to throttle the vCPUs based on available network bandwidth.
We are referring to this limit as the "dirty quota" of a vCPU and
the fixed size intervals as the "dirty quota intervals". 

One possible approach to distributing the overall scope of dirtying for a
dirty quota interval is to equally distribute it among all the vCPUs. This
approach to the distribution doesn't make sense if the distribution of
workloads among vCPUs is skewed. So, to counter such skewed cases, we
propose that if any vCPU doesn't need its quota for any given dirty
quota interval, we add this quota to a common pool. This common pool (or
"common quota") can be consumed on a first come first serve basis
by all vCPUs in the upcoming dirty quota intervals.


Design
--
--

Userspace KVM

[At the start of dirty logging]
Initialize dirty quota to some
non-zero value for each vcpu.->   [When dirty logging starts]
  Start incrementing dirty count
  for every dirty by the vcpu.

  [Dirty count equals/exceeds
  dirty quota]
If the vcpu has already claimed  <-   Exit to userspace.
its quota for the current dirty   
quota interval:

1) If common quota is
available, give the vcpu
its quota from common pool.

2) Else sleep the vcpu until
the next interval starts.

Give the vcpu its share for the
current(fresh) dirty quota   ->  Continue dirtying with the newly
interval.received quota.  

[At the end of dirty logging] 
Set dirty quota back to zero
for every vcpu. ->   Throttling disabled.


References
--
--

KVM Forum Talk: https://www.youtube.com/watch?v=ZBkkJf78zFA
Kernel Patchset:
https://lore.kernel.org/all/20221113170507.208810-1-shivam.kum...@nutanix.com/


Note
--
--

We understand that there is a good scope of improvement in the current
implementation. Here is a list of things we are working on:
1) Adding dirty quota as a migration capability so that it can be toggled
through QMP command.
2) Adding support for throttling guest DMAs.
3) Not enabling dirty quota for the first migration iteration.
4) Falling back to current auto-converge based throttling in cases where dirty
quota throttling can overthrottle.

Please stay tuned for the next patchset.

Shivam Kumar (1):
  Dirty quota-based throttling of vcpus

 accel/kvm/kvm-all.c   | 91 +++
 include/exec/memory.h |  3 ++
 include/hw/core/cpu.h |  5 +++
 include/sysemu/kvm_int.h  |  1 +
 linux-headers/linux/kvm.h |  9 
 migration/migration.c | 22 ++
 migration/migration.h | 31 +
 softmmu/memory.c  | 64 +++
 8 files changed, 226 insertions(+)

-- 
2.22.3

Re: [PULL v4 30/83] virtio: core: vq reset feature negotation support

2022-11-20 Thread Jason Wang

On Sun, Nov 20, 2022 at 1:19 AM Michael S. Tsirkin  wrote:
>
> On Fri, Nov 18, 2022 at 03:32:56PM +0100, Stefano Garzarella wrote:
> > Hi,
> > starting from this commit 69e1c14aa2 ("virtio: core: vq reset feature
> > negotation support"), vhost-user-vsock and vhost-vsock fails while
> > setting the device features, because VIRTIO_F_RING_RESET is not masked.
> >
> > I'm not sure vsock is the only one affected.
> >
> > We could fix in two ways:
> >
> > 1) Masking VIRTIO_F_RING_RESET when we call vhost_get_features:
> >
> > diff --git a/hw/virtio/vhost-vsock-common.c b/hw/virtio/vhost-vsock-common.c
> > index 29b9ab4f72..e671cc695f 100644
> > --- a/hw/virtio/vhost-vsock-common.c
> > +++ b/hw/virtio/vhost-vsock-common.c
> > @@ -21,6 +21,7 @@
> >
> >  const int feature_bits[] = {
> >  VIRTIO_VSOCK_F_SEQPACKET,
> > +VIRTIO_F_RING_RESET,
> >  VHOST_INVALID_FEATURE_BIT
> >  };
> >
>
> Let's do this, we need to be conservative.

Ack.

Thanks

>
>
> > 2) Or using directly the features of the device. That way we also handle
> > other features that we may have already had to mask but never did.
> >
> > diff --git a/hw/virtio/vhost-vsock-common.c b/hw/virtio/vhost-vsock-common.c
> > index 29b9ab4f72..41a665082c 100644
> > --- a/hw/virtio/vhost-vsock-common.c
> > +++ b/hw/virtio/vhost-vsock-common.c
> > @@ -33,7 +33,7 @@ uint64_t vhost_vsock_common_get_features(VirtIODevice 
> > *vdev, uint64_t features,
> >  virtio_add_feature(&features, VIRTIO_VSOCK_F_SEQPACKET);
> >  }
> >
> > -features = vhost_get_features(&vvc->vhost_dev, feature_bits, features);
> > +features &= vvc->vhost_dev.features;
> >
> >  if (vvc->seqpacket == ON_OFF_AUTO_ON &&
> >  !virtio_has_feature(features, VIRTIO_VSOCK_F_SEQPACKET)) {
> >
> >
> > I may be missing the real reason for calling vhost_get_features(),
> > though.
> >
> > @Michael what do you recommend we do?
> >
> > Thanks,
> > Stefano
> >
> > On Tue, Nov 8, 2022 at 12:06 AM Michael S. Tsirkin  wrote:
> > >
> > > From: Kangjie Xu 
> > >
> > > A a new command line parameter "queue_reset" is added.
> > >
> > > Meanwhile, the vq reset feature is disabled for pre-7.2 machines.
> > >
> > > Signed-off-by: Kangjie Xu 
> > > Signed-off-by: Xuan Zhuo 
> > > Acked-by: Jason Wang 
> > > Message-Id: <20221017092558.111082-5-xuanz...@linux.alibaba.com>
> > > Reviewed-by: Michael S. Tsirkin 
> > > Signed-off-by: Michael S. Tsirkin 
> > > ---
> > >  include/hw/virtio/virtio.h | 4 +++-
> > >  hw/core/machine.c  | 4 +++-
> > >  2 files changed, 6 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > > index b00b3fcf31..1423dba379 100644
> > > --- a/include/hw/virtio/virtio.h
> > > +++ b/include/hw/virtio/virtio.h
> > > @@ -313,7 +313,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
> > >  DEFINE_PROP_BIT64("iommu_platform", _state, _field, \
> > >VIRTIO_F_IOMMU_PLATFORM, false), \
> > >  DEFINE_PROP_BIT64("packed", _state, _field, \
> > > -  VIRTIO_F_RING_PACKED, false)
> > > +  VIRTIO_F_RING_PACKED, false), \
> > > +DEFINE_PROP_BIT64("queue_reset", _state, _field, \
> > > +  VIRTIO_F_RING_RESET, true)
> > >
> > >  hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
> > >  bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
> > > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > > index aa520e74a8..907fa78ff0 100644
> > > --- a/hw/core/machine.c
> > > +++ b/hw/core/machine.c
> > > @@ -40,7 +40,9 @@
> > >  #include "hw/virtio/virtio-pci.h"
> > >  #include "qom/object_interfaces.h"
> > >
> > > -GlobalProperty hw_compat_7_1[] = {};
> > > +GlobalProperty hw_compat_7_1[] = {
> > > +{ "virtio-device", "queue_reset", "false" },
> > > +};
> > >  const size_t hw_compat_7_1_len = G_N_ELEMENTS(hw_compat_7_1);
> > >
> > >  GlobalProperty hw_compat_7_0[] = {
> > > --
> > > MST
> > >
> > >
>

Re: [PATCH 0/2] qemu-ga-win: 'guest-get-fsinfo' command wont query storage devices of bus type USB

2022-11-20 Thread Marc-André Lureau

Hi

On Sun, Nov 20, 2022 at 6:09 PM Kfir Manor  wrote:
>
> guest-get-fsinfo won't query storage devices of bus-type USB 
> (https://bugzilla.redhat.com/show_bug.cgi?id=2090333).
>
> Bug, get_pci_info function returns an error after not finding any storage 
> port device info on the USB disk parent device (because of USB abstraction).
>
> Fix, skip getting PCI info (get_pci_info function) for USB disks (as USB disk 
> doesn't have PCI info), and return an empty PCI address instead to keep with 
> schema.
>
>
> Kfir Manor (2):
>   adding a empty PCI address creation function
>   skip getting pci info for USB disks
>
>  qga/commands-win32.c | 32 +---
>  1 file changed, 21 insertions(+), 11 deletions(-)
>
> --
> 2.38.1
>
>

Reviewed-by: Marc-André Lureau 



-- 
Marc-André Lureau

Re: [PATCH 0/3] kvm: fix two svm pmu virtualization bugs

2022-11-20 Thread Like Xu


On 19/11/2022 8:28 pm, Dongli Zhang wrote:

This patchset is to fix two svm pmu virtualization bugs.

1. The 1st bug is that "-cpu,-pmu" cannot disable svm pmu virtualization.

To use "-cpu EPYC" or "-cpu host,-pmu" cannot disable the pmu
virtualization. There is still below at the VM linux side ...


Many QEMU vendor forks already have similar fixes, and
thanks for bringing this issue back to the mainline.



[0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.

... although we expect something like below.

[0.596381] Performance Events: PMU not available due to virtualization, 
using software events only.
[0.600972] NMI watchdog: Perf NMI watchdog permanently disabled

The patch 1-2 is to disable the pmu virtualization via KVM_PMU_CAP_DISABLE
if the per-vcpu "pmu" property is disabled.

I considered 'KVM_X86_SET_MSR_FILTER' initially.
Since both KVM_X86_SET_MSR_FILTER and KVM_PMU_CAP_DISABLE are VM ioctl. I
finally used the latter because it is easier to use.


2. The 2nd bug is that un-reclaimed perf events (after QEMU system_reset)
at the KVM side may inject random unwanted/unknown NMIs to the VM.

The svm pmu registers are not reset during QEMU system_reset.

(1). The VM resets (e.g., via QEMU system_reset or VM kdump/kexec) while it
is running "perf top". The pmu registers are not disabled gracefully.

(2). Although the x86_cpu_reset() resets many registers to zero, the
kvm_put_msrs() does not puts AMD pmu registers to KVM side. As a result,
some pmu events are still enabled at the KVM side.

(3). The KVM pmc_speculative_in_use() always returns true so that the events
will not be reclaimed. The kvm_pmc->perf_event is still active.


I'm not sure if you're saying KVM doing something wrong, I don't think so
because KVM doesn't sense the system_reset defined by QEME or other user space,
AMD's vPMC will continue to be enabled (if it was enabled before), generating 
pmi
injection into the guest, and the newly started guest doesn't realize the 
counter is still

enabled and blowing up the error log.



(4). After the reboot, the VM kernel reports below error:

[0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, 
complain to your hardware vendor.
[0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 
c0010200 is 530076)

(5). In a worse case, the active kvm_pmc->perf_event is still able to
inject unknown NMIs randomly to the VM kernel.

[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.

The patch 3 is to fix the issue by resetting AMD pmu registers as well as
Intel registers.


This fix idea looks good, it does require syncing the new changed device state 
of QEMU to KVM.





This patchset does cover does not cover PerfMonV2, until the below patchset
is merged into the KVM side.

[PATCH v3 0/8] KVM: x86: Add AMD Guest PerfMonV2 PMU support
https://lore.kernel.org/all/2022102645.82001-1-lik...@tencent.com/


Dongli Zhang (3):
   kvm: introduce a helper before creating the 1st vcpu
   i386: kvm: disable KVM_CAP_PMU_CAPABILITY if "pmu" is disabled
   target/i386/kvm: get and put AMD pmu registers

  accel/kvm/kvm-all.c|   7 ++-
  include/sysemu/kvm.h   |   2 +
  target/arm/kvm64.c |   4 ++
  target/i386/cpu.h  |   5 +++
  target/i386/kvm/kvm.c  | 104 +++-
  target/mips/kvm.c  |   4 ++
  target/ppc/kvm.c   |   4 ++
  target/riscv/kvm.c |   4 ++
  target/s390x/kvm/kvm.c |   4 ++
  9 files changed, 134 insertions(+), 4 deletions(-)

Thank you very much!

Dongli Zhang

Re: [PULL v4 30/83] virtio: core: vq reset feature negotation support

2022-11-20 Thread Michael S. Tsirkin

On Mon, Nov 21, 2022 at 02:17:02PM +0800, Jason Wang wrote:
> On Sun, Nov 20, 2022 at 1:19 AM Michael S. Tsirkin  wrote:
> >
> > On Fri, Nov 18, 2022 at 03:32:56PM +0100, Stefano Garzarella wrote:
> > > Hi,
> > > starting from this commit 69e1c14aa2 ("virtio: core: vq reset feature
> > > negotation support"), vhost-user-vsock and vhost-vsock fails while
> > > setting the device features, because VIRTIO_F_RING_RESET is not masked.
> > >
> > > I'm not sure vsock is the only one affected.
> > >
> > > We could fix in two ways:
> > >
> > > 1) Masking VIRTIO_F_RING_RESET when we call vhost_get_features:
> > >
> > > diff --git a/hw/virtio/vhost-vsock-common.c 
> > > b/hw/virtio/vhost-vsock-common.c
> > > index 29b9ab4f72..e671cc695f 100644
> > > --- a/hw/virtio/vhost-vsock-common.c
> > > +++ b/hw/virtio/vhost-vsock-common.c
> > > @@ -21,6 +21,7 @@
> > >
> > >  const int feature_bits[] = {
> > >  VIRTIO_VSOCK_F_SEQPACKET,
> > > +VIRTIO_F_RING_RESET,
> > >  VHOST_INVALID_FEATURE_BIT
> > >  };
> > >
> >
> > Let's do this, we need to be conservative.
> 
> Ack.
> 
> Thanks


Patch pls? Stefano?

> >
> >
> > > 2) Or using directly the features of the device. That way we also handle
> > > other features that we may have already had to mask but never did.
> > >
> > > diff --git a/hw/virtio/vhost-vsock-common.c 
> > > b/hw/virtio/vhost-vsock-common.c
> > > index 29b9ab4f72..41a665082c 100644
> > > --- a/hw/virtio/vhost-vsock-common.c
> > > +++ b/hw/virtio/vhost-vsock-common.c
> > > @@ -33,7 +33,7 @@ uint64_t vhost_vsock_common_get_features(VirtIODevice 
> > > *vdev, uint64_t features,
> > >  virtio_add_feature(&features, VIRTIO_VSOCK_F_SEQPACKET);
> > >  }
> > >
> > > -features = vhost_get_features(&vvc->vhost_dev, feature_bits, 
> > > features);
> > > +features &= vvc->vhost_dev.features;
> > >
> > >  if (vvc->seqpacket == ON_OFF_AUTO_ON &&
> > >  !virtio_has_feature(features, VIRTIO_VSOCK_F_SEQPACKET)) {
> > >
> > >
> > > I may be missing the real reason for calling vhost_get_features(),
> > > though.
> > >
> > > @Michael what do you recommend we do?
> > >
> > > Thanks,
> > > Stefano
> > >
> > > On Tue, Nov 8, 2022 at 12:06 AM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > From: Kangjie Xu 
> > > >
> > > > A a new command line parameter "queue_reset" is added.
> > > >
> > > > Meanwhile, the vq reset feature is disabled for pre-7.2 machines.
> > > >
> > > > Signed-off-by: Kangjie Xu 
> > > > Signed-off-by: Xuan Zhuo 
> > > > Acked-by: Jason Wang 
> > > > Message-Id: <20221017092558.111082-5-xuanz...@linux.alibaba.com>
> > > > Reviewed-by: Michael S. Tsirkin 
> > > > Signed-off-by: Michael S. Tsirkin 
> > > > ---
> > > >  include/hw/virtio/virtio.h | 4 +++-
> > > >  hw/core/machine.c  | 4 +++-
> > > >  2 files changed, 6 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > > > index b00b3fcf31..1423dba379 100644
> > > > --- a/include/hw/virtio/virtio.h
> > > > +++ b/include/hw/virtio/virtio.h
> > > > @@ -313,7 +313,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
> > > >  DEFINE_PROP_BIT64("iommu_platform", _state, _field, \
> > > >VIRTIO_F_IOMMU_PLATFORM, false), \
> > > >  DEFINE_PROP_BIT64("packed", _state, _field, \
> > > > -  VIRTIO_F_RING_PACKED, false)
> > > > +  VIRTIO_F_RING_PACKED, false), \
> > > > +DEFINE_PROP_BIT64("queue_reset", _state, _field, \
> > > > +  VIRTIO_F_RING_RESET, true)
> > > >
> > > >  hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
> > > >  bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
> > > > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > > > index aa520e74a8..907fa78ff0 100644
> > > > --- a/hw/core/machine.c
> > > > +++ b/hw/core/machine.c
> > > > @@ -40,7 +40,9 @@
> > > >  #include "hw/virtio/virtio-pci.h"
> > > >  #include "qom/object_interfaces.h"
> > > >
> > > > -GlobalProperty hw_compat_7_1[] = {};
> > > > +GlobalProperty hw_compat_7_1[] = {
> > > > +{ "virtio-device", "queue_reset", "false" },
> > > > +};
> > > >  const size_t hw_compat_7_1_len = G_N_ELEMENTS(hw_compat_7_1);
> > > >
> > > >  GlobalProperty hw_compat_7_0[] = {
> > > > --
> > > > MST
> > > >
> > > >
> >

Re: [PATCH v5 2/9] target/riscv: add support for Zca extension

2022-11-20 Thread Alistair Francis

On Fri, Nov 18, 2022 at 10:44 PM Weiwei Li  wrote:
>
> Modify the check for C extension to Zca (C implies Zca)
>
> Signed-off-by: Weiwei Li 
> Signed-off-by: Junqiang Wang 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/insn_trans/trans_rvi.c.inc | 4 ++--
>  target/riscv/translate.c| 8 ++--
>  2 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
> b/target/riscv/insn_trans/trans_rvi.c.inc
> index 5c69b88d1e..0d73b919ce 100644
> --- a/target/riscv/insn_trans/trans_rvi.c.inc
> +++ b/target/riscv/insn_trans/trans_rvi.c.inc
> @@ -56,7 +56,7 @@ static bool trans_jalr(DisasContext *ctx, arg_jalr *a)
>  tcg_gen_andi_tl(cpu_pc, cpu_pc, (target_ulong)-2);
>
>  gen_set_pc(ctx, cpu_pc);
> -if (!has_ext(ctx, RVC)) {
> +if (!ctx->cfg_ptr->ext_zca) {
>  TCGv t0 = tcg_temp_new();
>
>  misaligned = gen_new_label();
> @@ -178,7 +178,7 @@ static bool gen_branch(DisasContext *ctx, arg_b *a, 
> TCGCond cond)
>
>  gen_set_label(l); /* branch taken */
>
> -if (!has_ext(ctx, RVC) && ((ctx->base.pc_next + a->imm) & 0x3)) {
> +if (!ctx->cfg_ptr->ext_zca && ((ctx->base.pc_next + a->imm) & 0x3)) {
>  /* misaligned */
>  gen_exception_inst_addr_mis(ctx);
>  } else {
> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
> index 2ab8772ebe..ee24b451e3 100644
> --- a/target/riscv/translate.c
> +++ b/target/riscv/translate.c
> @@ -557,7 +557,7 @@ static void gen_jal(DisasContext *ctx, int rd, 
> target_ulong imm)
>
>  /* check misaligned: */
>  next_pc = ctx->base.pc_next + imm;
> -if (!has_ext(ctx, RVC)) {
> +if (!ctx->cfg_ptr->ext_zca) {
>  if ((next_pc & 0x3) != 0) {
>  gen_exception_inst_addr_mis(ctx);
>  return;
> @@ -1097,7 +1097,11 @@ static void decode_opc(CPURISCVState *env, 
> DisasContext *ctx, uint16_t opcode)
>  ctx->virt_inst_excp = false;
>  /* Check for compressed insn */
>  if (insn_len(opcode) == 2) {
> -if (!has_ext(ctx, RVC)) {
> +/*
> + * Zca support all of the existing C extension, excluding all
> + * compressed floating point loads and stores
> + */
> +if (!ctx->cfg_ptr->ext_zca) {
>  gen_exception_illegal(ctx);
>  } else {
>  ctx->opcode = opcode;
> --
> 2.25.1
>
>

Re: [PATCH v5 3/9] target/riscv: add support for Zcf extension

2022-11-20 Thread Alistair Francis

On Fri, Nov 18, 2022 at 10:44 PM Weiwei Li  wrote:
>
> Separate c_flw/c_fsw from flw/fsw to add check for Zcf extension
>
> Signed-off-by: Weiwei Li 
> Signed-off-by: Junqiang Wang 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/insn16.decode  |  8 
>  target/riscv/insn_trans/trans_rvf.c.inc | 18 ++
>  2 files changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/target/riscv/insn16.decode b/target/riscv/insn16.decode
> index ccfe59f294..f3ea650325 100644
> --- a/target/riscv/insn16.decode
> +++ b/target/riscv/insn16.decode
> @@ -109,11 +109,11 @@ sw110  ... ... .. ... 00 @cs_w
>  # *** RV32C and RV64C specific Standard Extension (Quadrant 0) ***
>  {
>ld  011  ... ... .. ... 00 @cl_d
> -  flw 011  ... ... .. ... 00 @cl_w
> +  c_flw   011  ... ... .. ... 00 @cl_w
>  }
>  {
>sd  111  ... ... .. ... 00 @cs_d
> -  fsw 111  ... ... .. ... 00 @cs_w
> +  c_fsw   111  ... ... .. ... 00 @cs_w
>  }
>
>  # *** RV32/64C Standard Extension (Quadrant 1) ***
> @@ -174,9 +174,9 @@ sw110 .  .  . 10 @c_swsp
>  {
>c64_illegal 011 -  0  - 10 # c.ldsp, RES rd=0
>ld  011 .  .  . 10 @c_ldsp
> -  flw 011 .  .  . 10 @c_lwsp
> +  c_flw   011 .  .  . 10 @c_lwsp
>  }
>  {
>sd  111 .  .  . 10 @c_sdsp
> -  fsw 111 .  .  . 10 @c_swsp
> +  c_fsw   111 .  .  . 10 @c_swsp
>  }
> diff --git a/target/riscv/insn_trans/trans_rvf.c.inc 
> b/target/riscv/insn_trans/trans_rvf.c.inc
> index 93657680c6..426518957b 100644
> --- a/target/riscv/insn_trans/trans_rvf.c.inc
> +++ b/target/riscv/insn_trans/trans_rvf.c.inc
> @@ -24,6 +24,12 @@
>  return false; \
>  } while (0)
>
> +#define REQUIRE_ZCF(ctx) do {  \
> +if (!ctx->cfg_ptr->ext_zcf) {  \
> +return false;  \
> +}  \
> +} while (0)
> +
>  #ifndef CONFIG_USER_ONLY
>  static inline bool smstateen_fcsr_check(DisasContext *ctx, int index)
>  {
> @@ -96,6 +102,18 @@ static bool trans_fsw(DisasContext *ctx, arg_fsw *a)
>  return true;
>  }
>
> +static bool trans_c_flw(DisasContext *ctx, arg_flw *a)
> +{
> +REQUIRE_ZCF(ctx);
> +return trans_flw(ctx, a);
> +}
> +
> +static bool trans_c_fsw(DisasContext *ctx, arg_fsw *a)
> +{
> +REQUIRE_ZCF(ctx);
> +return trans_fsw(ctx, a);
> +}
> +
>  static bool trans_fmadd_s(DisasContext *ctx, arg_fmadd_s *a)
>  {
>  REQUIRE_FPU;
> --
> 2.25.1
>
>

Re: [PATCH v3 01/17] migration: Remove res_compatible parameter

2022-11-20 Thread Avihai Horon


Ping.

On 10/11/2022 15:36, Avihai Horon wrote:


On 08/11/2022 19:52, Vladimir Sementsov-Ogievskiy wrote:

External email: Use caution opening links or attachments


On 11/3/22 19:16, Avihai Horon wrote:

From: Juan Quintela 

It was only used for RAM, and in that case, it means that this amount
of data was sent for memory.


Not clear for me, what means "this amount of data was sent for 
memory"... That amount of data was not yet sent, actually.



Yes, this should be changed to something like:

"It was only used for RAM, and in that case, it means that this amount
of data still needs to be sent for memory, and can be sent in any phase
of migration. The same functionality can be achieved without 
res_compatible,
so just delete the field in all callers and change the definition of 
res_postcopy accordingly.".

Just delete the field in all callers.

Signed-off-by: Juan Quintela 
---
  hw/s390x/s390-stattrib.c   |  6 ++
  hw/vfio/migration.c    | 10 --
  hw/vfio/trace-events   |  2 +-
  include/migration/register.h   | 20 ++--
  migration/block-dirty-bitmap.c |  7 +++
  migration/block.c  |  7 +++
  migration/migration.c  |  9 -
  migration/ram.c    |  8 +++-
  migration/savevm.c | 14 +-
  migration/savevm.h |  4 +---
  migration/trace-events |  2 +-
  11 files changed, 37 insertions(+), 52 deletions(-)



[..]

diff --git a/include/migration/register.h 
b/include/migration/register.h

index c1dcff0f90..1950fee6a8 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -48,18 +48,18 @@ typedef struct SaveVMHandlers {
  int (*save_setup)(QEMUFile *f, void *opaque);
  void (*save_live_pending)(QEMUFile *f, void *opaque,
    uint64_t threshold_size,
-  uint64_t *res_precopy_only,
-  uint64_t *res_compatible,
-  uint64_t *res_postcopy_only);
+  uint64_t *rest_precopy,
+  uint64_t *rest_postcopy);
  /* Note for save_live_pending:
- * - res_precopy_only is for data which must be migrated in 
precopy phase
- * or in stopped state, in other words - before target vm 
start

- * - res_compatible is for data which may be migrated in any phase
- * - res_postcopy_only is for data which must be migrated in 
postcopy phase

- * or in stopped state, in other words - after source vm stop
+ * - res_precopy is for data which must be migrated in precopy
+ * phase or in stopped state, in other words - before target
+ * vm start
+ * - res_postcopy is for data which must be migrated in postcopy
+ * phase or in stopped state, in other words - after source vm
+ * stop
   *
- * Sum of res_postcopy_only, res_compatible and 
res_postcopy_only is the

- * whole amount of pending data.
+ * Sum of res_precopy and res_postcopy is the whole amount of
+ * pending data.
   */




[..]


diff --git a/migration/ram.c b/migration/ram.c
index dc1de9ddbc..20167e1102 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3435,9 +3435,7 @@ static int ram_save_complete(QEMUFile *f, void 
*opaque)

  }

  static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t 
max_size,

- uint64_t *res_precopy_only,
- uint64_t *res_compatible,
- uint64_t *res_postcopy_only)
+ uint64_t *res_precopy, uint64_t 
*res_postcopy)

  {
  RAMState **temp = opaque;
  RAMState *rs = *temp;
@@ -3457,9 +3455,9 @@ static void ram_save_pending(QEMUFile *f, void 
*opaque, uint64_t max_size,


  if (migrate_postcopy_ram()) {
  /* We can do postcopy, and all the data is postcopiable */
-    *res_compatible += remaining_size;
+    *res_postcopy += remaining_size;


That's seems to be not quite correct.

res_postcopy is defined as "data which must be migrated in postcopy", 
but that's not true here, as RAM can be migrated both in precopy and 
postcopy.


Still we really can include "compat" into "postcopy" just because in 
the logic of migration_iteration_run() we don't actually distinguish 
"compat" and "post". The logic only depends on "total" and "pre".


So, if we want to combine "compat" into "post", we should redefine 
"post" in the comment in include/migration/register.h, something like 
this:


- res_precopy is for data which MUST be migrated in precopy
  phase or in stopped state, in other words - before target
  vm start

- res_postcopy is for all data except for declared in res_precopy.
  res_postcopy data CAN be migrated in postcopy, i.e. after target
  vm start.



You are right, the definition of res_postcopy should be changed.

Yet, I am not sure if this patch really makes things more clear/

Re: [PATCH v5 4/9] target/riscv: add support for Zcd extension

2022-11-20 Thread Alistair Francis

On Fri, Nov 18, 2022 at 10:40 PM Weiwei Li  wrote:
>
> Separate c_fld/c_fsd from fld/fsd to add additional check for
> c.fld{sp}/c.fsd{sp} which is useful for zcmp/zcmt to reuse
> their encodings
>
> Signed-off-by: Weiwei Li 
> Signed-off-by: Junqiang Wang 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/insn16.decode  |  8 
>  target/riscv/insn_trans/trans_rvd.c.inc | 18 ++
>  2 files changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/target/riscv/insn16.decode b/target/riscv/insn16.decode
> index f3ea650325..b62664b6af 100644
> --- a/target/riscv/insn16.decode
> +++ b/target/riscv/insn16.decode
> @@ -97,12 +97,12 @@
>  }
>  {
>lq  001  ... ... .. ... 00 @cl_q
> -  fld 001  ... ... .. ... 00 @cl_d
> +  c_fld   001  ... ... .. ... 00 @cl_d
>  }
>  lw010  ... ... .. ... 00 @cl_w
>  {
>sq  101  ... ... .. ... 00 @cs_q
> -  fsd 101  ... ... .. ... 00 @cs_d
> +  c_fsd   101  ... ... .. ... 00 @cs_d
>  }
>  sw110  ... ... .. ... 00 @cs_w
>
> @@ -148,7 +148,7 @@ addw  100 1 11 ... 01 ... 01 @cs_2
>  slli  000 .  .  . 10 @c_shift2
>  {
>lq  001  ... ... .. ... 10 @c_lqsp
> -  fld 001 .  .  . 10 @c_ldsp
> +  c_fld   001 .  .  . 10 @c_ldsp
>  }
>  {
>illegal 010 -  0  - 10 # c.lwsp, RES rd=0
> @@ -166,7 +166,7 @@ slli  000 .  .  . 10 @c_shift2
>  }
>  {
>sq  101  ... ... .. ... 10 @c_sqsp
> -  fsd 101   ..  . 10 @c_sdsp
> +  c_fsd   101   ..  . 10 @c_sdsp
>  }
>  sw110 .  .  . 10 @c_swsp
>
> diff --git a/target/riscv/insn_trans/trans_rvd.c.inc 
> b/target/riscv/insn_trans/trans_rvd.c.inc
> index 1397c1ce1c..def0d7abfe 100644
> --- a/target/riscv/insn_trans/trans_rvd.c.inc
> +++ b/target/riscv/insn_trans/trans_rvd.c.inc
> @@ -31,6 +31,12 @@
>  } \
>  } while (0)
>
> +#define REQUIRE_ZCD(ctx) do { \
> +if (!ctx->cfg_ptr->ext_zcd) {  \
> +return false; \
> +} \
> +} while (0)
> +
>  static bool trans_fld(DisasContext *ctx, arg_fld *a)
>  {
>  TCGv addr;
> @@ -57,6 +63,18 @@ static bool trans_fsd(DisasContext *ctx, arg_fsd *a)
>  return true;
>  }
>
> +static bool trans_c_fld(DisasContext *ctx, arg_fld *a)
> +{
> +REQUIRE_ZCD(ctx);
> +return trans_fld(ctx, a);
> +}
> +
> +static bool trans_c_fsd(DisasContext *ctx, arg_fsd *a)
> +{
> +REQUIRE_ZCD(ctx);
> +return trans_fsd(ctx, a);
> +}
> +
>  static bool trans_fmadd_d(DisasContext *ctx, arg_fmadd_d *a)
>  {
>  REQUIRE_FPU;
> --
> 2.25.1
>
>

Re: [PATCH v3 03/17] migration: Block migration comment or code is wrong

2022-11-20 Thread Avihai Horon


Ping.

On 10/11/2022 15:38, Avihai Horon wrote:


On 08/11/2022 20:36, Vladimir Sementsov-Ogievskiy wrote:

External email: Use caution opening links or attachments


On 11/3/22 19:16, Avihai Horon wrote:

From: Juan Quintela 

And it appears that what is wrong is the code. During bulk stage we
need to make sure that some block is dirty, but no games with
max_size at all.


:) That made me interested in, why we need this one block, so I 
decided to search through the history.


And what I see? Haha, that was my commit 04636dc410b163c 
"migration/block: fix pending() return value" [1], which you actually 
revert with this patch.


So, at least we should note, that it's a revert of [1].

Still that this will reintroduce the bug fixed by [1].

As I understand the problem is (was) that in block_save_complete() we 
finalize only dirty blocks, but don't finalize the bulk phase if it's 
not finalized yet. So, we can fix block_save_complete() to finalize 
the bulk phase, instead of hacking with pending in [1].


Interesting, why we need this one block, described in the comment you 
refer to? Was it an incomplete workaround for the same problem, 
described in [1]? If so, we can fix block_save_complete() and remove 
this if() together with the comment from block_save_pending().



I am not familiar with block migration.
I can drop this patch in next version.

Juan/Stefan, could you help here?



Signed-off-by: Juan Quintela 
Reviewed-by: Stefan Hajnoczi 
---
  migration/block.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/block.c b/migration/block.c
index b3d680af75..39ce4003c6 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -879,8 +879,8 @@ static void block_save_pending(void *opaque, 
uint64_t max_size,

  blk_mig_unlock();

  /* Report at least one block pending during bulk phase */
-    if (pending <= max_size && !block_mig_state.bulk_completed) {
-    pending = max_size + BLK_MIG_BLOCK_SIZE;
+    if (!pending && !block_mig_state.bulk_completed) {
+    pending = BLK_MIG_BLOCK_SIZE;
  }

  trace_migration_block_save_pending(pending);


--
Best regards,
Vladimir

Re: [PULL v4 46/83] acpi: pc/q35: drop ad-hoc PCI-ISA bridge AML routines and let bus ennumeration generate AML

2022-11-20 Thread Igor Mammedov

On Sat, 19 Nov 2022 12:22:13 -0500
"Michael S. Tsirkin"  wrote:

> On Fri, Nov 18, 2022 at 03:55:17PM +0100, Igor Mammedov wrote:
> > On Fri, 18 Nov 2022 14:08:36 +0100
> > Igor Mammedov  wrote:
> >   
> > > On Thu, 17 Nov 2022 22:51:46 +0100
> > > Volker Rümelin  wrote:  
> > [...]  
> > > > since this patch SeaBIOS no longer detects the PS/2 keyboard. This 
> > > > means 
> > > > there's no keyboard in SeaBIOS, GRUB or FreeDOS. OVMF and Linux detect 
> > > > the PS/2 keyboard without issues.
> > > > 
> > > > Here are a few lines from the SeaBIOS debug log.
> > > > 
> > > > table(50434146)=0x007e1971 (via rsdt)
> > > > ACPI: parse DSDT at 0x007e0040 (len 6449)
> > > > parse_termlist: parse error, skip from 92/465
> > > > Scan for VGA option rom
> > > > Running option rom at c000:0003
> > > > Start SeaVGABIOS (version rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org)
> > > > 
> > > > and later
> > > > 
> > > > SeaBIOS (version rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org)
> > > > ACPI: no PS/2 keyboard present
> > it was a bug on SeaBIOS side, we need it to parse Alias term in AML
> > instead of choking on it
> > 
> > proposed patch:
> >  
> > https://mail.coreboot.org/hyperkitty/list/seab...@seabios.org/thread/RGPL7HESH5U5JRLEO6FP77CZVHZK5J65/
> > 
> > PS:
> > it's probably too late for it to make into 7.2  
> 
> Right. So revert?

let me check first what happens with migration case,
and maybe I can come up with a temporary hack to avoid aliases on QEMU side,
probably it will be something ugly but should do the job

>

Re: [PATCH v2 5/5] target/riscv: Ensure opcode is saved for all relevant instructions

2022-11-20 Thread Alistair Francis

On Tue, Nov 8, 2022 at 11:09 PM Anup Patel  wrote:
>
> We should call decode_save_opc() for all relevant instructions which
> can potentially generate a virtual instruction fault or a guest page
> fault because generating transformed instruction upon guest page fault
> expects opcode to be available. Without this, hypervisor will see
> transformed instruction as zero in htinst CSR for guest MMIO emulation
> which makes MMIO emulation in hypervisor slow and also breaks nested
> virtualization.
>
> Fixes: a9814e3e08d2 ("target/riscv: Minimize the calls to decode_save_opc")
> Signed-off-by: Anup Patel 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/insn_trans/trans_rva.c.inc | 10 +++---
>  target/riscv/insn_trans/trans_rvd.c.inc |  2 ++
>  target/riscv/insn_trans/trans_rvf.c.inc |  2 ++
>  target/riscv/insn_trans/trans_rvh.c.inc |  3 +++
>  target/riscv/insn_trans/trans_rvi.c.inc |  2 ++
>  target/riscv/insn_trans/trans_rvzfh.c.inc   |  2 ++
>  target/riscv/insn_trans/trans_svinval.c.inc |  3 +++
>  7 files changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rva.c.inc 
> b/target/riscv/insn_trans/trans_rva.c.inc
> index 45db82c9be..5f194a447b 100644
> --- a/target/riscv/insn_trans/trans_rva.c.inc
> +++ b/target/riscv/insn_trans/trans_rva.c.inc
> @@ -20,8 +20,10 @@
>
>  static bool gen_lr(DisasContext *ctx, arg_atomic *a, MemOp mop)
>  {
> -TCGv src1 = get_address(ctx, a->rs1, 0);
> +TCGv src1;
>
> +decode_save_opc(ctx);
> +src1 = get_address(ctx, a->rs1, 0);
>  if (a->rl) {
>  tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL);
>  }
> @@ -43,6 +45,7 @@ static bool gen_sc(DisasContext *ctx, arg_atomic *a, MemOp 
> mop)
>  TCGLabel *l1 = gen_new_label();
>  TCGLabel *l2 = gen_new_label();
>
> +decode_save_opc(ctx);
>  src1 = get_address(ctx, a->rs1, 0);
>  tcg_gen_brcond_tl(TCG_COND_NE, load_res, src1, l1);
>
> @@ -81,9 +84,10 @@ static bool gen_amo(DisasContext *ctx, arg_atomic *a,
>  MemOp mop)
>  {
>  TCGv dest = dest_gpr(ctx, a->rd);
> -TCGv src1 = get_address(ctx, a->rs1, 0);
> -TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE);
> +TCGv src1, src2 = get_gpr(ctx, a->rs2, EXT_NONE);
>
> +decode_save_opc(ctx);
> +src1 = get_address(ctx, a->rs1, 0);
>  func(dest, src1, src2, ctx->mem_idx, mop);
>
>  gen_set_gpr(ctx, a->rd, dest);
> diff --git a/target/riscv/insn_trans/trans_rvd.c.inc 
> b/target/riscv/insn_trans/trans_rvd.c.inc
> index 1397c1ce1c..6e3159b797 100644
> --- a/target/riscv/insn_trans/trans_rvd.c.inc
> +++ b/target/riscv/insn_trans/trans_rvd.c.inc
> @@ -38,6 +38,7 @@ static bool trans_fld(DisasContext *ctx, arg_fld *a)
>  REQUIRE_FPU;
>  REQUIRE_EXT(ctx, RVD);
>
> +decode_save_opc(ctx);
>  addr = get_address(ctx, a->rs1, a->imm);
>  tcg_gen_qemu_ld_i64(cpu_fpr[a->rd], addr, ctx->mem_idx, MO_TEUQ);
>
> @@ -52,6 +53,7 @@ static bool trans_fsd(DisasContext *ctx, arg_fsd *a)
>  REQUIRE_FPU;
>  REQUIRE_EXT(ctx, RVD);
>
> +decode_save_opc(ctx);
>  addr = get_address(ctx, a->rs1, a->imm);
>  tcg_gen_qemu_st_i64(cpu_fpr[a->rs2], addr, ctx->mem_idx, MO_TEUQ);
>  return true;
> diff --git a/target/riscv/insn_trans/trans_rvf.c.inc 
> b/target/riscv/insn_trans/trans_rvf.c.inc
> index a1d3eb52ad..965e1f8d11 100644
> --- a/target/riscv/insn_trans/trans_rvf.c.inc
> +++ b/target/riscv/insn_trans/trans_rvf.c.inc
> @@ -38,6 +38,7 @@ static bool trans_flw(DisasContext *ctx, arg_flw *a)
>  REQUIRE_FPU;
>  REQUIRE_EXT(ctx, RVF);
>
> +decode_save_opc(ctx);
>  addr = get_address(ctx, a->rs1, a->imm);
>  dest = cpu_fpr[a->rd];
>  tcg_gen_qemu_ld_i64(dest, addr, ctx->mem_idx, MO_TEUL);
> @@ -54,6 +55,7 @@ static bool trans_fsw(DisasContext *ctx, arg_fsw *a)
>  REQUIRE_FPU;
>  REQUIRE_EXT(ctx, RVF);
>
> +decode_save_opc(ctx);
>  addr = get_address(ctx, a->rs1, a->imm);
>  tcg_gen_qemu_st_i64(cpu_fpr[a->rs2], addr, ctx->mem_idx, MO_TEUL);
>  return true;
> diff --git a/target/riscv/insn_trans/trans_rvh.c.inc 
> b/target/riscv/insn_trans/trans_rvh.c.inc
> index 4f8aecddc7..9248b48c36 100644
> --- a/target/riscv/insn_trans/trans_rvh.c.inc
> +++ b/target/riscv/insn_trans/trans_rvh.c.inc
> @@ -36,6 +36,7 @@ static bool do_hlv(DisasContext *ctx, arg_r2 *a, MemOp mop)
>  #ifdef CONFIG_USER_ONLY
>  return false;
>  #else
> +decode_save_opc(ctx);
>  if (check_access(ctx)) {
>  TCGv dest = dest_gpr(ctx, a->rd);
>  TCGv addr = get_gpr(ctx, a->rs1, EXT_NONE);
> @@ -82,6 +83,7 @@ static bool do_hsv(DisasContext *ctx, arg_r2_s *a, MemOp 
> mop)
>  #ifdef CONFIG_USER_ONLY
>  return false;
>  #else
> +decode_save_opc(ctx);
>  if (check_access(ctx)) {
>  TCGv addr = get_gpr(ctx, a->rs1, EXT_NONE);
>  TCGv data = get_gpr(ctx, a->rs2, EXT_NONE);
> @@ -135,6 +137,7 @@ static bool trans_hsv_d(DisasContext *ctx, arg_hsv_d *a)
>  static bool

Re: [PULL v4 46/83] acpi: pc/q35: drop ad-hoc PCI-ISA bridge AML routines and let bus ennumeration generate AML

2022-11-20 Thread Igor Mammedov

On Sat, 19 Nov 2022 09:49:39 +0100
Volker Rümelin  wrote:

> Am 18.11.22 um 15:55 schrieb Igor Mammedov:
> > On Fri, 18 Nov 2022 14:08:36 +0100
> > Igor Mammedov  wrote:
> >  
> >> On Thu, 17 Nov 2022 22:51:46 +0100
> >> Volker Rümelin  wrote:  
> > [...]  
> >>> since this patch SeaBIOS no longer detects the PS/2 keyboard. This means
> >>> there's no keyboard in SeaBIOS, GRUB or FreeDOS. OVMF and Linux detect
> >>> the PS/2 keyboard without issues.
> >>>
> >>> Here are a few lines from the SeaBIOS debug log.
> >>>
> >>> table(50434146)=0x007e1971 (via rsdt)
> >>> ACPI: parse DSDT at 0x007e0040 (len 6449)
> >>> parse_termlist: parse error, skip from 92/465
> >>> Scan for VGA option rom
> >>> Running option rom at c000:0003
> >>> Start SeaVGABIOS (version rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org)
> >>>
> >>> and later
> >>>
> >>> SeaBIOS (version rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org)
> >>> ACPI: no PS/2 keyboard present  
> > it was a bug on SeaBIOS side, we need it to parse Alias term in AML
> > instead of choking on it
> >
> > proposed patch:
> >   
> > https://mail.coreboot.org/hyperkitty/list/seab...@seabios.org/thread/RGPL7HESH5U5JRLEO6FP77CZVHZK5J65/
> >
> > PS:
> > it's probably too late for it to make into 7.2
> >  
> 
> The proposed patch works.
> 
> It may still be an option to revert the commit 47a373faa6 (acpi: pc/q35: 
> drop ad-hoc PCI-ISA bridge AML routines and let bus ennumeration 
> generate AML). If an older QEMU version is migrated to QEMU 7.2.0 and 
> later and the guest reboots afterwards, it may end up without a working 
> keyboard because the migrated SeaBIOS is an older version.

ACPI blobs generated on old QEMU should be migrated as well,
so I'd expect it should be fine.
Problem will manifest itself only after VM was shut down and started anew.

Anyways lets see if a QEMU workaround is possible.

> With best regards,
> Volker
>

[PULL 0/1] chardev patch for 7.2

2022-11-20 Thread marcandre . lureau

From: Marc-André Lureau 

The following changes since commit a082fab9d259473a9d5d53307cf83b1223301181:

  Merge tag 'pull-ppc-20221117' of https://gitlab.com/danielhb/qemu into 
staging (2022-11-17 12:39:38 -0500)

are available in the Git repository at:

  https://gitlab.com/marcandre.lureau/qemu.git tags/chr-pull-request

for you to fetch changes up to 06639f8ff53d1dbfa709377499e6c30eca9c3c9a:

  chardev/char-win-stdio: Pass Ctrl+C to guest with a multiplexed monitor 
(2022-11-21 11:30:11 +0400)


chardev fix on win32



Bin Meng (1):
  chardev/char-win-stdio: Pass Ctrl+C to guest with a multiplexed
monitor

 chardev/char-win-stdio.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

-- 
2.38.1

[PULL 1/1] chardev/char-win-stdio: Pass Ctrl+C to guest with a multiplexed monitor

2022-11-20 Thread marcandre . lureau

From: Bin Meng 

At present when pressing Ctrl+C from a guest running on QEMU Windows
with a multiplexed monitor, e.g.: -serial mon:stdio, QEMU executable
just exits. This behavior is inconsistent with the Linux version.

Such behavior is caused by unconditionally setting the input mode
ENABLE_PROCESSED_INPUT for a console's input buffer. Fix this by
testing whether the chardev is allowed to do so.

Signed-off-by: Bin Meng 
Reviewed-by: Marc-André Lureau 
Message-Id: <20221025141015.612291-1-bin.m...@windriver.com>
---
 chardev/char-win-stdio.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/chardev/char-win-stdio.c b/chardev/char-win-stdio.c
index a4771ab82e..eb830eabd9 100644
--- a/chardev/char-win-stdio.c
+++ b/chardev/char-win-stdio.c
@@ -146,6 +146,8 @@ static void qemu_chr_open_stdio(Chardev *chr,
 bool *be_opened,
 Error **errp)
 {
+ChardevStdio *opts = backend->u.stdio.data;
+bool stdio_allow_signal = !opts->has_signal || opts->signal;
 WinStdioChardev *stdio = WIN_STDIO_CHARDEV(chr);
 DWORD  dwMode;
 intis_console = 0;
@@ -193,7 +195,11 @@ static void qemu_chr_open_stdio(Chardev *chr,
 if (is_console) {
 /* set the terminal in raw mode */
 /* ENABLE_QUICK_EDIT_MODE | ENABLE_EXTENDED_FLAGS */
-dwMode |= ENABLE_PROCESSED_INPUT;
+if (stdio_allow_signal) {
+dwMode |= ENABLE_PROCESSED_INPUT;
+} else {
+dwMode &= ~ENABLE_PROCESSED_INPUT;
+}
 }
 
 SetConsoleMode(stdio->hStdIn, dwMode);
-- 
2.38.1

Re: [PATCH 0/3] kvm: fix two svm pmu virtualization bugs

2022-11-20 Thread Dongli Zhang

Hi Like,

On 11/20/22 22:42, Like Xu wrote:
> On 19/11/2022 8:28 pm, Dongli Zhang wrote:
>> This patchset is to fix two svm pmu virtualization bugs.
>>
>> 1. The 1st bug is that "-cpu,-pmu" cannot disable svm pmu virtualization.
>>
>> To use "-cpu EPYC" or "-cpu host,-pmu" cannot disable the pmu
>> virtualization. There is still below at the VM linux side ...
> 
> Many QEMU vendor forks already have similar fixes, and
> thanks for bringing this issue back to the mainline.

Would you mind helping point to if there used to be any prior patchset for
mainline to resolve the issue?

> 
>>
>> [    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
>>
>> ... although we expect something like below.
>>
>> [    0.596381] Performance Events: PMU not available due to virtualization,
>> using software events only.
>> [    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
>>
>> The patch 1-2 is to disable the pmu virtualization via KVM_PMU_CAP_DISABLE
>> if the per-vcpu "pmu" property is disabled.
>>
>> I considered 'KVM_X86_SET_MSR_FILTER' initially.
>> Since both KVM_X86_SET_MSR_FILTER and KVM_PMU_CAP_DISABLE are VM ioctl. I
>> finally used the latter because it is easier to use.
>>
>>
>> 2. The 2nd bug is that un-reclaimed perf events (after QEMU system_reset)
>> at the KVM side may inject random unwanted/unknown NMIs to the VM.
>>
>> The svm pmu registers are not reset during QEMU system_reset.
>>
>> (1). The VM resets (e.g., via QEMU system_reset or VM kdump/kexec) while it
>> is running "perf top". The pmu registers are not disabled gracefully.
>>
>> (2). Although the x86_cpu_reset() resets many registers to zero, the
>> kvm_put_msrs() does not puts AMD pmu registers to KVM side. As a result,
>> some pmu events are still enabled at the KVM side.
>>
>> (3). The KVM pmc_speculative_in_use() always returns true so that the events
>> will not be reclaimed. The kvm_pmc->perf_event is still active.
> 
> I'm not sure if you're saying KVM doing something wrong, I don't think so
> because KVM doesn't sense the system_reset defined by QEME or other user 
> space,
> AMD's vPMC will continue to be enabled (if it was enabled before), generating 
> pmi
> injection into the guest, and the newly started guest doesn't realize the
> counter is still
> enabled and blowing up the error log.

I were not saying KVM was buggy.

I was trying to explain how the issue impacts KVM side and VM side.

> 
>>
>> (4). After the reboot, the VM kernel reports below error:
>>
>> [    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS 
>> detected,
>> complain to your hardware vendor.
>> [    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR
>> c0010200 is 530076)
>>
>> (5). In a worse case, the active kvm_pmc->perf_event is still able to
>> inject unknown NMIs randomly to the VM kernel.
>>
>> [...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
>>
>> The patch 3 is to fix the issue by resetting AMD pmu registers as well as
>> Intel registers.
> 
> This fix idea looks good, it does require syncing the new changed device state
> of QEMU to KVM.

Thank you very much!

Dongli Zhang

> 
>>
>>
>> This patchset does cover does not cover PerfMonV2, until the below patchset
>> is merged into the KVM side.
>>
>> [PATCH v3 0/8] KVM: x86: Add AMD Guest PerfMonV2 PMU support
>> https://lore.kernel.org/all/2022102645.82001-1-lik...@tencent.com/
>>
>>
>> Dongli Zhang (3):
>>    kvm: introduce a helper before creating the 1st vcpu
>>    i386: kvm: disable KVM_CAP_PMU_CAPABILITY if "pmu" is disabled
>>    target/i386/kvm: get and put AMD pmu registers
>>
>>   accel/kvm/kvm-all.c    |   7 ++-
>>   include/sysemu/kvm.h   |   2 +
>>   target/arm/kvm64.c |   4 ++
>>   target/i386/cpu.h  |   5 +++
>>   target/i386/kvm/kvm.c  | 104 +++-
>>   target/mips/kvm.c  |   4 ++
>>   target/ppc/kvm.c   |   4 ++
>>   target/riscv/kvm.c |   4 ++
>>   target/s390x/kvm/kvm.c |   4 ++
>>   9 files changed, 134 insertions(+), 4 deletions(-)
>>
>> Thank you very much!
>>
>> Dongli Zhang
>>
>>
>>

Re: [PATCH 1/2] remove DEC 21154 PCI bridge

2022-11-20 Thread Igor Mammedov

On Sun, 20 Nov 2022 22:08:54 +
Mark Cave-Ayland  wrote:

> On 16/11/2022 19:39, BALATON Zoltan wrote:
> 
> > On Wed, 16 Nov 2022, Igor Mammedov wrote:
> >   
> >> Code has not been used practically since its inception (2004)
> >>  f2aa58c6f4a20 UniNorth PCI bridge support
> >> or maybe even earlier, but it was consuming contributors time
> >> as QEMU was being rewritten.
> >> Drop it for now. Whomever would like to actually
> >> use the thing, can make sure it actually works/reintroduce
> >> it back when there is a user.
> >>
> >> PS:
> >> I've stumbled upon this when replacing PCIDeviceClass::is_bridge
> >> field with QOM cast to PCI_BRIDGE type. Unused DEC 21154
> >> was the only one trying to use the field with plain PCIDevice.
> >> It's not worth keeping the field around for the sake of the code
> >> that was commented out 'forever'.
> >>
> >> Signed-off-by: Igor Mammedov 
> >> ---
> >> hw/pci-bridge/dec.h   |   9 ---
> >> include/hw/pci/pci_ids.h  |   1 -
> >> hw/pci-bridge/dec.c   | 164 --
> >> hw/pci-bridge/meson.build |   2 -
> >> hw/pci-host/uninorth.c    |   6 --
> >> 5 files changed, 182 deletions(-)
> >> delete mode 100644 hw/pci-bridge/dec.h
> >> delete mode 100644 hw/pci-bridge/dec.c
> >>
> >> diff --git a/hw/pci-bridge/dec.h b/hw/pci-bridge/dec.h
> >> deleted file mode 100644
> >> index 869e90b136..00
> >> --- a/hw/pci-bridge/dec.h
> >> +++ /dev/null
> >> @@ -1,9 +0,0 @@
> >> -#ifndef HW_PCI_BRIDGE_DEC_H
> >> -#define HW_PCI_BRIDGE_DEC_H
> >> -
> >> -
> >> -#define TYPE_DEC_21154 "dec-21154-sysbus"
> >> -
> >> -PCIBus *pci_dec_21154_init(PCIBus *parent_bus, int devfn);
> >> -
> >> -#endif
> >> diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
> >> index bc9f834fd1..e4386ebb20 100644
> >> --- a/include/hw/pci/pci_ids.h
> >> +++ b/include/hw/pci/pci_ids.h
> >> @@ -169,7 +169,6 @@
> >>
> >> #define PCI_VENDOR_ID_DEC    0x1011
> >> #define PCI_DEVICE_ID_DEC_21143  0x0019
> >> -#define PCI_DEVICE_ID_DEC_21154  0x0026
> >>
> >> #define PCI_VENDOR_ID_CIRRUS 0x1013
> >>
> >> diff --git a/hw/pci-bridge/dec.c b/hw/pci-bridge/dec.c
> >> deleted file mode 100644
> >> index 4773d07e6d..00
> >> --- a/hw/pci-bridge/dec.c
> >> +++ /dev/null
> >> @@ -1,164 +0,0 @@
> >> -/*
> >> - * QEMU DEC 21154 PCI bridge
> >> - *
> >> - * Copyright (c) 2006-2007 Fabrice Bellard
> >> - * Copyright (c) 2007 Jocelyn Mayer
> >> - *
> >> - * Permission is hereby granted, free of charge, to any person obtaining 
> >> a copy
> >> - * of this software and associated documentation files (the "Software"), 
> >> to deal
> >> - * in the Software without restriction, including without limitation the 
> >> rights
> >> - * to use, copy, modify, merge, publish, distribute, sublicense, and/or 
> >> sell
> >> - * copies of the Software, and to permit persons to whom the Software is
> >> - * furnished to do so, subject to the following conditions:
> >> - *
> >> - * The above copyright notice and this permission notice shall be 
> >> included in
> >> - * all copies or substantial portions of the Software.
> >> - *
> >> - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
> >> EXPRESS OR
> >> - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
> >> MERCHANTABILITY,
> >> - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> >> - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
> >> OTHER
> >> - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 
> >> ARISING FROM,
> >> - * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS 
> >> IN
> >> - * THE SOFTWARE.
> >> - */
> >> -
> >> -#include "qemu/osdep.h"
> >> -#include "dec.h"
> >> -#include "hw/sysbus.h"
> >> -#include "qapi/error.h"
> >> -#include "qemu/module.h"
> >> -#include "hw/pci/pci.h"
> >> -#include "hw/pci/pci_host.h"
> >> -#include "hw/pci/pci_bridge.h"
> >> -#include "hw/pci/pci_bus.h"
> >> -#include "qom/object.h"
> >> -
> >> -OBJECT_DECLARE_SIMPLE_TYPE(DECState, DEC_21154)
> >> -
> >> -struct DECState {
> >> -    PCIHostState parent_obj;
> >> -};
> >> -
> >> -static int dec_map_irq(PCIDevice *pci_dev, int irq_num)
> >> -{
> >> -    return irq_num;
> >> -}
> >> -
> >> -static void dec_pci_bridge_realize(PCIDevice *pci_dev, Error **errp)
> >> -{
> >> -    pci_bridge_initfn(pci_dev, TYPE_PCI_BUS);
> >> -}
> >> -
> >> -static void dec_21154_pci_bridge_class_init(ObjectClass *klass, void 
> >> *data)
> >> -{
> >> -    DeviceClass *dc = DEVICE_CLASS(klass);
> >> -    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> >> -
> >> -    set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
> >> -    k->realize = dec_pci_bridge_realize;
> >> -    k->exit = pci_bridge_exitfn;
> >> -    k->vendor_id = PCI_VENDOR_ID_DEC;
> >> -    k->device_id = PCI_DEVICE_ID_DEC_21154;
> >> -    k->config_write = pci_bridge_write_config;
> >> -    k->is_bridge = true;
> >> -    dc->desc = "DEC 21154

Re: [PULL v4 46/83] acpi: pc/q35: drop ad-hoc PCI-ISA bridge AML routines and let bus ennumeration generate AML

2022-11-20 Thread Michael S. Tsirkin

On Mon, Nov 21, 2022 at 08:23:15AM +0100, Igor Mammedov wrote:
> On Sat, 19 Nov 2022 12:22:13 -0500
> "Michael S. Tsirkin"  wrote:
> 
> > On Fri, Nov 18, 2022 at 03:55:17PM +0100, Igor Mammedov wrote:
> > > On Fri, 18 Nov 2022 14:08:36 +0100
> > > Igor Mammedov  wrote:
> > >   
> > > > On Thu, 17 Nov 2022 22:51:46 +0100
> > > > Volker Rümelin  wrote:  
> > > [...]  
> > > > > since this patch SeaBIOS no longer detects the PS/2 keyboard. This 
> > > > > means 
> > > > > there's no keyboard in SeaBIOS, GRUB or FreeDOS. OVMF and Linux 
> > > > > detect 
> > > > > the PS/2 keyboard without issues.
> > > > > 
> > > > > Here are a few lines from the SeaBIOS debug log.
> > > > > 
> > > > > table(50434146)=0x007e1971 (via rsdt)
> > > > > ACPI: parse DSDT at 0x007e0040 (len 6449)
> > > > > parse_termlist: parse error, skip from 92/465
> > > > > Scan for VGA option rom
> > > > > Running option rom at c000:0003
> > > > > Start SeaVGABIOS (version 
> > > > > rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org)
> > > > > 
> > > > > and later
> > > > > 
> > > > > SeaBIOS (version rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org)
> > > > > ACPI: no PS/2 keyboard present
> > > it was a bug on SeaBIOS side, we need it to parse Alias term in AML
> > > instead of choking on it
> > > 
> > > proposed patch:
> > >  
> > > https://mail.coreboot.org/hyperkitty/list/seab...@seabios.org/thread/RGPL7HESH5U5JRLEO6FP77CZVHZK5J65/
> > > 
> > > PS:
> > > it's probably too late for it to make into 7.2  
> > 
> > Right. So revert?
> 
> let me check first what happens with migration case,
> and maybe I can come up with a temporary hack to avoid aliases on QEMU side,
> probably it will be something ugly but should do the job
> 
> > 

Given the timing I'd prefer the revert. But if you insist let's see how
small that turns out to be.

-- 
MST

47 matches

Mail list logo