date:20220802

Re: [PATCH v3 1/1] monitor: Support specified vCPU registers

2022-08-02 Thread Markus Armbruster

zhenwei pi  writes:

> Originally we have to get all the vCPU registers and parse the
> specified one. To improve the performance of this usage, allow user
> specified vCPU id to query registers.
>
> Run a VM with 16 vCPU, use bcc tool to track the latency of
> 'hmp_info_registers':
> 'info registers -a' uses about 3ms;
> 'info registers 12' uses about 150us.
>
> Cc: Darren Kenny 
> Signed-off-by: zhenwei pi 
> ---
>  hmp-commands-info.hx |  8 +---
>  monitor/misc.c   | 10 --
>  2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
> index 188d9ece3b..dee072ac37 100644
> --- a/hmp-commands-info.hx
> +++ b/hmp-commands-info.hx
> @@ -100,9 +100,11 @@ ERST
>  
>  {
>  .name   = "registers",
> -.args_type  = "cpustate_all:-a",
> -.params = "[-a]",
> -.help   = "show the cpu registers (-a: all - show register info 
> for all cpus)",
> +.args_type  = "cpustate_all:-a,vcpu:i?",
> +.params = "[-a|vcpu]",
> +.help   = "show the cpu registers (-a: all - show register info 
> for all cpus;"

Suggest to drop "all - ".

> +  " vcpu: specific vCPU to query; show the current CPU's 
> registers if"
> +  " no argument is specified)",
>  .cmd= hmp_info_registers,
>  },
>  
> diff --git a/monitor/misc.c b/monitor/misc.c
> index 3d2312ba8d..74f7c4ea36 100644
> --- a/monitor/misc.c
> +++ b/monitor/misc.c
> @@ -307,6 +307,7 @@ int monitor_get_cpu_index(Monitor *mon)
>  static void hmp_info_registers(Monitor *mon, const QDict *qdict)
>  {
>  bool all_cpus = qdict_get_try_bool(qdict, "cpustate_all", false);
> +int vcpu = qdict_get_try_int(qdict, "vcpu", -1);
>  CPUState *cs;
>  
>  if (all_cpus) {
> @@ -315,13 +316,18 @@ static void hmp_info_registers(Monitor *mon, const 
> QDict *qdict)
>  cpu_dump_state(cs, NULL, CPU_DUMP_FPU);
>  }
>  } else {
> -cs = mon_get_cpu(mon);
> +cs = vcpu >= 0 ? qemu_get_cpu(vcpu) : mon_get_cpu(mon);
>  
>  if (!cs) {
> -monitor_printf(mon, "No CPU available\n");
> +if (vcpu >= 0) {
> +monitor_printf(mon, "\nCPU#%d not available\n", vcpu);

Please drop the initial '\n'.

> +} else {
> +monitor_printf(mon, "No CPU available\n");
> +}
>  return;
>  }
>  
> +monitor_printf(mon, "\nCPU#%d\n", cs->cpu_index);
>  cpu_dump_state(cs, NULL, CPU_DUMP_FPU);
>  }
>  }

With the error message tweaked:
Reviewed-by: Markus Armbruster

[PATCH v7 04/12] multifd: Count the number of bytes sent correctly

2022-08-02 Thread Juan Quintela

Current code assumes that all pages are whole.  That is already not
true, for example, for compression.  Fix it by creating a new field
->sent_bytes that accounts for it.

All ram_counters are used only from the migration thread, so we have
two options:
- put a mutex and fill everything when we send it (not only
ram_counters, also qemu_file->xfer_bytes).
- Create a local variable that tracks how much has been sent
through each channel.  And when we push another packet, we "add" the
previous stats.

I chose the second option because it needs fewer changes overall.  The
previous code increased transferred and then sent.  The current code
goes the other way around: it sends the data and updates the counters
after the fact.  Note that each channel can have at most half a
megabyte of data not yet counted, so this is not very important.

Signed-off-by: Juan Quintela 
---
 migration/multifd.h |  2 ++
 migration/multifd.c | 14 ++
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index e2802a9ce2..36f899c56f 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -102,6 +102,8 @@ typedef struct {
 uint32_t flags;
 /* global number of generated multifd packets */
 uint64_t packet_num;
+/* How many bytes have we sent on the last packet */
+uint64_t sent_bytes;
 /* thread has work to do */
 int pending_job;
 /* array of pages to sent.
diff --git a/migration/multifd.c b/migration/multifd.c
index aa3808a6f4..e25b529235 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -394,7 +394,6 @@ static int multifd_send_pages(QEMUFile *f)
 static int next_channel;
 MultiFDSendParams *p = NULL; /* make happy gcc */
 MultiFDPages_t *pages = multifd_send_state->pages;
-uint64_t transferred;
 
 if (qatomic_read(&multifd_send_state->exiting)) {
 return -1;
@@ -429,10 +428,10 @@ static int multifd_send_pages(QEMUFile *f)
 p->packet_num = multifd_send_state->packet_num++;
 multifd_send_state->pages = p->pages;
 p->pages = pages;
-transferred = ((uint64_t) pages->num) * p->page_size + p->packet_len;
-qemu_file_acct_rate_limit(f, transferred);
-ram_counters.multifd_bytes += transferred;
-ram_counters.transferred += transferred;
+ram_transferred_add(p->sent_bytes);
+ram_counters.multifd_bytes += p->sent_bytes;
+qemu_file_acct_rate_limit(f, p->sent_bytes);
+p->sent_bytes = 0;
 qemu_mutex_unlock(&p->mutex);
 qemu_sem_post(&p->sem);
 
@@ -605,9 +604,6 @@ int multifd_send_sync_main(QEMUFile *f)
 p->packet_num = multifd_send_state->packet_num++;
 p->flags |= MULTIFD_FLAG_SYNC;
 p->pending_job++;
-qemu_file_acct_rate_limit(f, p->packet_len);
-ram_counters.multifd_bytes += p->packet_len;
-ram_counters.transferred += p->packet_len;
 qemu_mutex_unlock(&p->mutex);
 qemu_sem_post(&p->sem);
 
@@ -714,6 +710,8 @@ static void *multifd_send_thread(void *opaque)
 }
 
 qemu_mutex_lock(&p->mutex);
+p->sent_bytes += p->packet_len;
+p->sent_bytes += p->next_packet_size;
 p->pending_job--;
 qemu_mutex_unlock(&p->mutex);
 
-- 
2.37.1

[PATCH v7 11/12] multifd: Zero pages transmission

2022-08-02 Thread Juan Quintela

This implements the zero page detection and handling.

Signed-off-by: Juan Quintela 

---

Add comment for offset (dave)
Use local variables for offset/block to have shorter lines
---
 migration/multifd.h |  5 +
 migration/multifd.c | 41 +++--
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index a1b852200d..5931de6f86 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -52,6 +52,11 @@ typedef struct {
 uint32_t unused32[1];/* Reserved for future use */
 uint64_t unused64[3];/* Reserved for future use */
 char ramblock[256];
+/*
+ * This array contains the pointers to:
+ *  - normal pages (initial normal_pages entries)
+ *  - zero pages (following zero_pages entries)
+ */
 uint64_t offset[];
 } __attribute__((packed)) MultiFDPacket_t;
 
diff --git a/migration/multifd.c b/migration/multifd.c
index 4473d9f834..89811619d8 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -11,6 +11,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/cutils.h"
 #include "qemu/rcu.h"
 #include "exec/target_page.h"
 #include "sysemu/sysemu.h"
@@ -275,6 +276,12 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
 
 packet->offset[i] = cpu_to_be64(temp);
 }
+for (i = 0; i < p->zero_num; i++) {
+/* there are architectures where ram_addr_t is 32 bit */
+uint64_t temp = p->zero[i];
+
+packet->offset[p->normal_num + i] = cpu_to_be64(temp);
+}
 }
 
 static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
@@ -358,6 +365,18 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams 
*p, Error **errp)
 p->normal[i] = offset;
 }
 
+for (i = 0; i < p->zero_num; i++) {
+uint64_t offset = be64_to_cpu(packet->offset[p->normal_num + i]);
+
+if (offset > (block->used_length - p->page_size)) {
+error_setg(errp, "multifd: offset too long %" PRIu64
+   " (max " RAM_ADDR_FMT ")",
+   offset, block->used_length);
+return -1;
+}
+p->zero[i] = offset;
+}
+
 return 0;
 }
 
@@ -648,6 +667,8 @@ static void *multifd_send_thread(void *opaque)
 {
 MultiFDSendParams *p = opaque;
 Error *local_err = NULL;
+/* qemu older than 7.0 don't understand zero page on multifd channel */
+bool use_zero_page = migrate_use_multifd_zero_page();
 int ret = 0;
 bool use_zero_copy_send = migrate_use_zero_copy_send();
 
@@ -670,6 +691,7 @@ static void *multifd_send_thread(void *opaque)
 qemu_mutex_lock(&p->mutex);
 
 if (p->pending_job) {
+RAMBlock *rb = p->pages->block;
 uint64_t packet_num = p->packet_num;
 p->flags = 0;
 if (p->sync_needed) {
@@ -688,8 +710,16 @@ static void *multifd_send_thread(void *opaque)
 }
 
 for (int i = 0; i < p->pages->num; i++) {
-p->normal[p->normal_num] = p->pages->offset[i];
-p->normal_num++;
+uint64_t offset = p->pages->offset[i];
+if (use_zero_page &&
+buffer_is_zero(rb->host + offset, p->page_size)) {
+p->zero[p->zero_num] = offset;
+p->zero_num++;
+ram_release_page(rb->idstr, offset);
+} else {
+p->normal[p->normal_num] = offset;
+p->normal_num++;
+}
 }
 
 if (p->normal_num) {
@@ -1152,6 +1182,13 @@ static void *multifd_recv_thread(void *opaque)
 }
 }
 
+for (int i = 0; i < p->zero_num; i++) {
+void *page = p->host + p->zero[i];
+if (!buffer_is_zero(page, p->page_size)) {
+memset(page, 0, p->page_size);
+}
+}
+
 if (sync_needed) {
 qemu_sem_post(&multifd_recv_state->sem_sync);
 qemu_sem_wait(&p->sem_sync);
-- 
2.37.1
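
The offset[] layout (normal pages first, zero pages after them) and the receive-side clearing can be sketched outside QEMU as follows. This is a simplified illustration under stated assumptions: page_is_zero() is a naive stand-in for buffer_is_zero(), pack_offsets() and apply_zero_pages() are hypothetical helpers, and the big-endian conversion and bounds checks from the patch are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naive stand-in for QEMU's buffer_is_zero(). */
static bool page_is_zero(const uint8_t *page, size_t page_size)
{
    for (size_t i = 0; i < page_size; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Sender side: store the offsets of normal pages first, followed by the
 * offsets of zero pages. out must have room for num entries.
 * Returns the number of normal pages; the rest of out[] are zero pages.
 */
static size_t pack_offsets(const uint8_t *host, const uint64_t *offsets,
                           size_t num, size_t page_size, uint64_t *out)
{
    size_t normal_num = 0;
    size_t pos;

    assert(page_size > 0);
    for (size_t i = 0; i < num; i++) {
        if (!page_is_zero(host + offsets[i], page_size)) {
            out[normal_num++] = offsets[i];
        }
    }
    pos = normal_num;
    for (size_t i = 0; i < num; i++) {
        if (page_is_zero(host + offsets[i], page_size)) {
            out[pos++] = offsets[i];
        }
    }
    return normal_num;
}

/* Receiver side: clear each zero page, skipping pages already zero. */
static void apply_zero_pages(uint8_t *host, const uint64_t *zero,
                             size_t zero_num, size_t page_size)
{
    for (size_t i = 0; i < zero_num; i++) {
        uint8_t *page = host + zero[i];

        if (!page_is_zero(page, page_size)) {
            memset(page, 0, page_size);
        }
    }
}
```

Checking the destination page before calling memset() avoids writing to pages that are already zero, which mirrors the check in multifd_recv_thread().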

[PATCH v7 12/12] So we use multifd to transmit zero pages.

2022-08-02 Thread Juan Quintela

Signed-off-by: Juan Quintela 

---

- Check zero_page property before using new code (Dave)
---
 migration/migration.c |  4 +---
 migration/multifd.c   |  6 +++---
 migration/ram.c   | 33 -
 3 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index ce3e5cc0cd..13842f6803 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2599,9 +2599,7 @@ bool migrate_use_main_zero_page(void)
 
 s = migrate_get_current();
 
-// We will enable this when we add the right code.
-// return s->enabled_capabilities[MIGRATION_CAPABILITY_MAIN_ZERO_PAGE];
-return true;
+return s->enabled_capabilities[MIGRATION_CAPABILITY_MAIN_ZERO_PAGE];
 }
 
 bool migrate_pause_before_switchover(void)
diff --git a/migration/multifd.c b/migration/multifd.c
index 89811619d8..54acdc004c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -667,8 +667,8 @@ static void *multifd_send_thread(void *opaque)
 {
 MultiFDSendParams *p = opaque;
 Error *local_err = NULL;
-/* qemu older than 7.0 don't understand zero page on multifd channel */
-bool use_zero_page = migrate_use_multifd_zero_page();
+/* older qemu don't understand zero page on multifd channel */
+bool use_multifd_zero_page = !migrate_use_main_zero_page();
 int ret = 0;
 bool use_zero_copy_send = migrate_use_zero_copy_send();
 
@@ -711,7 +711,7 @@ static void *multifd_send_thread(void *opaque)
 
 for (int i = 0; i < p->pages->num; i++) {
 uint64_t offset = p->pages->offset[i];
-if (use_zero_page &&
+if (use_multifd_zero_page &&
 buffer_is_zero(rb->host + offset, p->page_size)) {
 p->zero[p->zero_num] = offset;
 p->zero_num++;
diff --git a/migration/ram.c b/migration/ram.c
index 2af70f517a..26e60b9cc1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2428,6 +2428,32 @@ static void postcopy_preempt_reset_channel(RAMState *rs)
 }
 }
 
+/**
+ * ram_save_target_page_multifd: save one target page
+ *
+ * Returns the number of pages written
+ *
+ * @rs: current RAM state
+ * @pss: data about the page we want to send
+ */
+static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss)
+{
+RAMBlock *block = pss->block;
+ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
+int res;
+
+if (!migration_in_postcopy()) {
+return ram_save_multifd_page(rs, block, offset);
+}
+
+res = save_zero_page(rs, block, offset);
+if (res > 0) {
+return res;
+}
+
+return ram_save_page(rs, pss);
+}
+
 /**
  * ram_save_host_page: save a whole host page
  *
@@ -3225,7 +3251,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
 ram_control_before_iterate(f, RAM_CONTROL_SETUP);
 ram_control_after_iterate(f, RAM_CONTROL_SETUP);
 
-(*rsp)->ram_save_target_page = ram_save_target_page_legacy;
+if (migrate_use_multifd() && !migrate_use_main_zero_page()) {
+(*rsp)->ram_save_target_page = ram_save_target_page_multifd;
+} else {
+(*rsp)->ram_save_target_page = ram_save_target_page_legacy;
+}
+
 ret =  multifd_send_sync_main(f);
 if (ret < 0) {
 return ret;
-- 
2.37.1

Re: [PATCH v5 2/3] job: introduce dump guest memory job

2022-08-02 Thread Markus Armbruster

Hogan Wang  writes:

> There's no way to cancel the current executing dump process, lead to the
> virtual machine manager daemon((e.g. libvirtd) cannot restore the dump
> job after daemon restart.
>
> Introduce dump guest memory job type, and add an optional 'job-id'
> argument for dump-guest-memory QMP to make use of jobs framework.
>
> Signed-off-by: Hogan Wang 
> ---
>  dump/dump-hmp-cmds.c |  7 ---
>  dump/dump.c  |  1 +
>  qapi/dump.json   | 11 +--
>  qapi/job.json|  5 -
>  4 files changed, 18 insertions(+), 6 deletions(-)
>
> diff --git a/dump/dump-hmp-cmds.c b/dump/dump-hmp-cmds.c
> index e5053b04cd..a77f31fd15 100644
> --- a/dump/dump-hmp-cmds.c
> +++ b/dump/dump-hmp-cmds.c
> @@ -21,6 +21,7 @@ void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict)
>  bool lzo = qdict_get_try_bool(qdict, "lzo", false);
>  bool snappy = qdict_get_try_bool(qdict, "snappy", false);
>  const char *file = qdict_get_str(qdict, "filename");
> +const char *job_id = qdict_get_str(qdict, "job-id");
>  bool has_begin = qdict_haskey(qdict, "begin");
>  bool has_length = qdict_haskey(qdict, "length");
>  bool has_detach = qdict_haskey(qdict, "detach");
> @@ -63,9 +64,9 @@ void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict)
>  }
>  
>  prot = g_strconcat("file:", file, NULL);
> -
> -qmp_dump_guest_memory(paging, prot, true, detach, has_begin, begin,
> -  has_length, length, true, dump_format, &err);
> +qmp_dump_guest_memory(paging, prot, !!job_id, job_id, true,
> +  detach, has_begin, begin, has_length,
> +  length, true, dump_format, &err);
>  hmp_handle_error(mon, err);
>  g_free(prot);
>  }
> diff --git a/dump/dump.c b/dump/dump.c
> index a57c580b12..cec9be30b4 100644
> --- a/dump/dump.c
> +++ b/dump/dump.c
> @@ -1895,6 +1895,7 @@ DumpQueryResult *qmp_query_dump(Error **errp)
>  }
>  
>  void qmp_dump_guest_memory(bool paging, const char *file,
> +   bool has_job_id, const char *job_id,
> bool has_detach, bool detach,
> bool has_begin, int64_t begin, bool has_length,
> int64_t length, bool has_format,
> diff --git a/qapi/dump.json b/qapi/dump.json
> index 90859c5483..ca3bd720c6 100644
> --- a/qapi/dump.json
> +++ b/qapi/dump.json
> @@ -59,9 +59,15 @@
>  #2. fd: the protocol starts with "fd:", and the following string
>  #   is the fd's name.
>  #
> +# @job-id: identifier for the newly-created memory dump job. If you want to
> +#  compatible with legacy dumping process, @job-id should omitted.
> +#  If @job-id specified, gain the ability to monitor and control dump
> +#  task with query-job, job-cancel, etc.(Since 7.2).

Space before "(Since 7.2)", please.



> +#
>  # @detach: if true, QMP will return immediately rather than
>  #  waiting for the dump to finish. The user can track progress
> -#  using "query-dump". (since 2.6).
> +#  using "query-dump". (since 2.6). If @job-id specified, @detach
> +#  argument value will be ignored (Since 7.2).

I think @detach should be rejected then.

>  #
>  # @begin: if specified, the starting physical address.
>  #
> @@ -88,7 +94,8 @@
>  #
>  ##
>  { 'command': 'dump-guest-memory',
> -  'data': { 'paging': 'bool', 'protocol': 'str', '*detach': 'bool',
> +  'data': { 'paging': 'bool', 'protocol': 'str',
> +'*job-id': 'str', '*detach': 'bool',
>  '*begin': 'int', '*length': 'int',
>  '*format': 'DumpGuestMemoryFormat'} }
>  
> diff --git a/qapi/job.json b/qapi/job.json
> index d5f84e9615..e14d2290a5 100644
> --- a/qapi/job.json
> +++ b/qapi/job.json
> @@ -28,11 +28,14 @@
>  #
>  # @snapshot-delete: snapshot delete job type, see "snapshot-delete" (since 
> 6.0)
>  #
> +# @dump-guest-memory: dump guest memory job type, see "dump-guest-memory" 
> (since 7.2)
> +#
>  # Since: 1.7
>  ##
>  { 'enum': 'JobType',
>'data': ['commit', 'stream', 'mirror', 'backup', 'create', 'amend',
> -   'snapshot-load', 'snapshot-save', 'snapshot-delete'] }
> +   'snapshot-load', 'snapshot-save', 'snapshot-delete',
> +   'dump-guest-memory'] }
>  
>  ##
>  # @JobStatus:

I'd like to hear Kevin's opinion on the alternatives I sketched
yesterday in reply to his question regarding v1.
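
One possible reading of the suggestion to reject @detach is an up-front argument check. The sketch below is only an illustration: check_dump_args() is a hypothetical helper, not an existing QEMU function, and the error text is invented.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical argument check, not QEMU code.
 * Returns NULL if the combination is acceptable, otherwise a static
 * error string describing the conflict.
 */
static const char *check_dump_args(bool has_job_id, bool has_detach)
{
    if (has_job_id && has_detach) {
        return "'detach' cannot be combined with 'job-id'";
    }
    return NULL;
}
```

Failing early makes the conflict visible to the management application instead of silently ignoring one of the arguments.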

[PATCH v4 0/1] monitor: Support specified vCPU registers

2022-08-02 Thread zhenwei pi

v3 -> v4: 
- Tweak the help text and drop the leading '\n' from the error message.

v2 -> v3: 
- Add more documentation to the help info.
- Use 'qemu_get_cpu()' to simplify code.

v1 -> v2: 
- Typo fix in commit message.
- Suggested by Darren, use '[-a|vcpu]' instead of '[-a] [vcpu]',
  because only one of these may be specified at a time.

v1:
- Support specified vCPU registers for monitor command.

Zhenwei Pi (1):
  monitor: Support specified vCPU registers

 hmp-commands-info.hx |  8 +---
 monitor/misc.c   | 10 --
 2 files changed, 13 insertions(+), 5 deletions(-)

-- 
2.20.1

[PATCH v4 1/1] monitor: Support specified vCPU registers

2022-08-02 Thread zhenwei pi

Originally we had to get all the vCPU registers and parse out the
specified one. To improve the performance of this use case, allow the
user to specify a vCPU id to query registers for.

Run a VM with 16 vCPUs and use the bcc tool to track the latency of
'hmp_info_registers':
'info registers -a' takes about 3ms;
'info registers 12' takes about 150us.

Cc: Darren Kenny 
Reviewed-by: Markus Armbruster 
Signed-off-by: zhenwei pi 
---
 hmp-commands-info.hx |  8 +---
 monitor/misc.c   | 10 --
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 188d9ece3b..e012035541 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -100,9 +100,11 @@ ERST
 
 {
 .name   = "registers",
-.args_type  = "cpustate_all:-a",
-.params = "[-a]",
-.help   = "show the cpu registers (-a: all - show register info 
for all cpus)",
+.args_type  = "cpustate_all:-a,vcpu:i?",
+.params = "[-a|vcpu]",
+.help   = "show the cpu registers (-a: show register info for all 
cpus;"
+  " vcpu: specific vCPU to query; show the current CPU's 
registers if"
+  " no argument is specified)",
 .cmd= hmp_info_registers,
 },
 
diff --git a/monitor/misc.c b/monitor/misc.c
index 3d2312ba8d..6436a8786b 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -307,6 +307,7 @@ int monitor_get_cpu_index(Monitor *mon)
 static void hmp_info_registers(Monitor *mon, const QDict *qdict)
 {
 bool all_cpus = qdict_get_try_bool(qdict, "cpustate_all", false);
+int vcpu = qdict_get_try_int(qdict, "vcpu", -1);
 CPUState *cs;
 
 if (all_cpus) {
@@ -315,13 +316,18 @@ static void hmp_info_registers(Monitor *mon, const QDict 
*qdict)
 cpu_dump_state(cs, NULL, CPU_DUMP_FPU);
 }
 } else {
-cs = mon_get_cpu(mon);
+cs = vcpu >= 0 ? qemu_get_cpu(vcpu) : mon_get_cpu(mon);
 
 if (!cs) {
-monitor_printf(mon, "No CPU available\n");
+if (vcpu >= 0) {
+monitor_printf(mon, "CPU#%d not available\n", vcpu);
+} else {
+monitor_printf(mon, "No CPU available\n");
+}
 return;
 }
 
+monitor_printf(mon, "\nCPU#%d\n", cs->cpu_index);
 cpu_dump_state(cs, NULL, CPU_DUMP_FPU);
 }
 }
-- 
2.20.1
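
The CPU selection in hmp_info_registers() can be illustrated outside QEMU with a minimal sketch. FakeCPU, find_cpu() and select_cpu() are hypothetical stand-ins for CPUState, qemu_get_cpu() and the monitor's default CPU lookup; only the control flow mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMU's CPUState; not the real type. */
typedef struct FakeCPU {
    int cpu_index;
} FakeCPU;

/* Look up a CPU by index, in the spirit of qemu_get_cpu(); NULL if absent. */
static FakeCPU *find_cpu(FakeCPU *cpus, size_t n, int index)
{
    assert(cpus != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (cpus[i].cpu_index == index) {
            return &cpus[i];
        }
    }
    return NULL;
}

/*
 * Same choice as in the patch: a non-negative vcpu selects that CPU,
 * while -1 (argument omitted) falls back to the monitor's current CPU.
 */
static FakeCPU *select_cpu(FakeCPU *cpus, size_t n, int vcpu,
                           FakeCPU *monitor_default)
{
    return vcpu >= 0 ? find_cpu(cpus, n, vcpu) : monitor_default;
}
```

A NULL result with vcpu >= 0 corresponds to the "CPU#%d not available" message, and a NULL result with vcpu == -1 to "No CPU available".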

[PATCH v1 01/40] * HACK * linux-headers: Update headers to pull in TDX API changes

2022-08-02 Thread Xiaoyao Li

Pull in recent TDX updates, which are not backwards compatible.

This is only to make the series runnable. The headers will be updated by the script

scripts/update-linux-headers.sh

once TDX support is upstreamed in the Linux kernel.

Signed-off-by: Xiaoyao Li 
Co-developed-by: Sean Christopherson 
Signed-off-by: Sean Christopherson 
---
 linux-headers/asm-x86/kvm.h | 95 +
 linux-headers/linux/kvm.h   |  2 +
 2 files changed, 97 insertions(+)

diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
index bf6e96011dfe..a5433cc71f79 100644
--- a/linux-headers/asm-x86/kvm.h
+++ b/linux-headers/asm-x86/kvm.h
@@ -525,4 +525,99 @@ struct kvm_pmu_event_filter {
 #define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */
 #define   KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */
 
+#define KVM_X86_DEFAULT_VM 0
+#define KVM_X86_TDX_VM 1
+
+/* Trust Domain eXtension sub-ioctl() commands. */
+enum kvm_tdx_cmd_id {
+   KVM_TDX_CAPABILITIES = 0,
+   KVM_TDX_INIT_VM,
+   KVM_TDX_INIT_VCPU,
+   KVM_TDX_INIT_MEM_REGION,
+   KVM_TDX_FINALIZE_VM,
+
+   KVM_TDX_CMD_NR_MAX,
+};
+
+struct kvm_tdx_cmd {
+   /* enum kvm_tdx_cmd_id */
+   __u32 id;
+   /* flags for sub-command. If sub-command doesn't use this, set zero. */
+   __u32 flags;
+   /*
+* data for each sub-command. An immediate or a pointer to the actual
+* data in process virtual address.  If sub-command doesn't use it,
+* set zero.
+*/
+   __u64 data;
+   /*
+* Auxiliary error code.  The sub-command may return TDX SEAMCALL
+* status code in addition to -Exxx.
+* Defined for consistency with struct kvm_sev_cmd.
+*/
+   __u64 error;
+   /* Reserved: Defined for consistency with struct kvm_sev_cmd. */
+   __u64 unused;
+};
+
+struct kvm_tdx_cpuid_config {
+   __u32 leaf;
+   __u32 sub_leaf;
+   __u32 eax;
+   __u32 ebx;
+   __u32 ecx;
+   __u32 edx;
+};
+
+struct kvm_tdx_capabilities {
+   __u64 attrs_fixed0;
+   __u64 attrs_fixed1;
+   __u64 xfam_fixed0;
+   __u64 xfam_fixed1;
+
+   __u32 nr_cpuid_configs;
+   __u32 padding;
+   struct kvm_tdx_cpuid_config cpuid_configs[0];
+};
+
+struct kvm_tdx_init_vm {
+   __u64 attributes;
+   __u32 max_vcpus;
+   __u32 padding;
+   __u64 mrconfigid[6];/* sha384 digest */
+   __u64 mrowner[6];   /* sha384 digest */
+   __u64 mrownerconfig[6]; /* sha384 digest */
+   union {
+   /*
+* KVM_TDX_INIT_VM is called before vcpu creation, thus before
+* KVM_SET_CPUID2.  CPUID configurations needs to be passed.
+*
+* This configuration supersedes KVM_SET_CPUID{,2}.
+* The user space VMM, e.g. qemu, should make them consistent
+* with this values.
+* sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES(256)
+* = 8KB.
+*/
+   struct {
+   struct kvm_cpuid2 cpuid;
+   /* 8KB with KVM_MAX_CPUID_ENTRIES. */
+   struct kvm_cpuid_entry2 entries[];
+   };
+   /*
+* For future extensibility.
+* The size(struct kvm_tdx_init_vm) = 16KB.
+* This should be enough given sizeof(TD_PARAMS) = 1024
+*/
+   __u64 reserved[2028];
+   };
+};
+
+#define KVM_TDX_MEASURE_MEMORY_REGION  (1UL << 0)
+
+struct kvm_tdx_init_mem_region {
+   __u64 source_addr;
+   __u64 gpa;
+   __u64 nr_pages;
+};
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index f089349149a5..054cf89fa2d6 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1151,6 +1151,8 @@ struct kvm_ppc_resize_hpt {
 /* #define KVM_CAP_VM_TSC_CONTROL 214 */
 #define KVM_CAP_SYSTEM_EVENT_DATA 215
 
+#define KVM_CAP_VM_TYPES 216
+
 #ifdef KVM_CAP_IRQ_ROUTING
 
 struct kvm_irq_routing_irqchip {
-- 
2.27.0
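
The 16KB size stated in the comment on struct kvm_tdx_init_vm can be checked with a small sketch. The struct below is a simplified mirror written with <stdint.h> types, not the kernel header: the cpuid member of the union is omitted on the assumption that the reserved array is the larger member and therefore determines the size.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified mirror of struct kvm_tdx_init_vm; illustration only. */
struct tdx_init_vm_sketch {
    uint64_t attributes;
    uint32_t max_vcpus;
    uint32_t padding;
    uint64_t mrconfigid[6];     /* sha384 digest: 6 * 8 = 48 bytes */
    uint64_t mrowner[6];        /* sha384 digest */
    uint64_t mrownerconfig[6];  /* sha384 digest */
    union {
        /* The cpuid/entries member of the real union is omitted here. */
        uint64_t reserved[2028];
    };
};

/* 16 + 3 * 48 = 160 header bytes, plus 2028 * 8 = 16224 reserved bytes. */
static_assert(sizeof(struct tdx_init_vm_sketch) == 16 * 1024,
              "sketch should be exactly 16KB");
```

The arithmetic is 160 + 16224 = 16384 bytes, which matches the 16KB figure in the header comment.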

[PATCH v1 00/40] TDX QEMU support

2022-08-02 Thread Xiaoyao Li

This is the first version that drops the RFC tag, since the last RFC got
several Acked-by tags. I hope more people and reviewers can help review it.


This patch series aims to enable TDX support to allow creating and booting a
TD (TDX VM) with QEMU. It needs to work with the corresponding KVM patch
series [1]. TDX related documents can be found in [2].

This series is also available on GitHub:

https://github.com/intel/qemu-tdx/tree/tdx-qemu-upstream-v1

Booting a TDX VM requires several changes/additional steps in the flow:

 1. specify the vm type KVM_X86_TDX_VM when creating VM with
IOCTL(KVM_CREATE_VM);
 2. initialize VM scope configuration before creating any VCPU;
 3. initialize VCPU scope configuration;
 4. initialize virtual firmware (TDVF) in guest private memory before
vcpu running;

Besides, TDX VM needs to boot with TDVF (TDX virtual firmware) and currently
upstream OVMF can serve as TDVF. This series adds the support of parsing TDVF,
loading TDVF into guest's private memory and preparing TD HOB info for TDVF.

[1] KVM TDX basic feature support v7
https://lore.kernel.org/all/cover.1656366337.git.isaku.yamah...@intel.com/

[2] 
https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html

== Limitation and future work ==
- Readonly memslot

  TDX only supports readonly (write protection) memslots for shared memory,
  not for private memory. For simplicity, readonly memslots are marked as
  not supported at all for TDX.

- CPU model

  We cannot create a TD with an arbitrary CPU model as we can for non-TDX
  VMs, because only a subset of features can be configured for a TD.

  - It's recommended to use '-cpu host' to create a TD;
  - '+feature/-feature' might not work as expected;

  Future work: introduce a specific CPU model for TDs and enhance
  +/-features for TDs.

- gdb support

  Using gdb to debug a TD in off-debug mode is future work.

== Patch organization ==
1   Manually fetch Linux UAPI changes for TDX;
2-19,29-30  Basic TDX support that parses vm-type and invokes TDX
specific IOCTLs
20-28   Load, parse and initialize TDVF for TDX VM;
31-35   Disable unsupported functions for TDX VM;
36-39   Avoid errors due to KVM's requirement on TDX;
40  Add documentation of TDX;

== Change history ==
Changes from RFC v4:
[RFC v4] 
https://lore.kernel.org/qemu-devel/20220512031803.3315890-1-xiaoyao...@intel.com/

- Add 3 more patches(9, 10, 11) to improve the tdx_get_supported_cpuid();
- make attributes of object tdx-guest not settable by user;
- improve get_tdx_capabilities() by using a known starting value and
  limiting the loop with a known size;
- clarify why isa.bios needs to be skipped;
- remove the MMIO hob setup since OVMF sets them up itself;

Changes from RFC v3:
[RFC v3] 
https://lore.kernel.org/qemu-devel/20220317135913.2166202-1-xiaoyao...@intel.com/

- Load TDVF with -bios interface;
- Adapt to KVM API changes;
- KVM_TDX_CAPABILITIES changes back to KVM-scope;
- struct kvm_tdx_init_vm changes;
- Define TDX_SUPPORTED_KVM_FEATURES;
- Drop the patch of introducing property sept-ve-disable since it's not
  public yet;
- some misc cleanups


Changes from RFC v2:
[RFC v2] 
https://lore.kernel.org/qemu-devel/cover.1625704980.git.isaku.yamah...@intel.com/

- Get vm-type from confidential-guest-support object type;
- Drop machine_init_done_late_notifiers;
- Refactor tdx_ioctl implementation;
- re-use existing pflash interface to load TDVF (i.e., OVMF binaries);
- introduce new data structure to track memory type instead of changing
  e820 table;
- Force smm to off for TDX VM;
- Drop the patches that suppress level-trigger/SMI/INIT/SIPI since KVM
  will ignore them;
- Add documentation;


Changes from RFC v1:
[RFC v1] 
https://lore.kernel.org/qemu-devel/cover.1613188118.git.isaku.yamah...@intel.com/

- suppress level trigger/SMI/INIT/SIPI related to IOAPIC.
- add VM attribute sha384 to TD measurement.
- guest TSC Hz specification


Isaku Yamahata (4):
  i386/tdvf: Introduce function to parse TDVF metadata
  i386/tdx: Add TDVF memory via KVM_TDX_INIT_MEM_REGION
  hw/i386: add option to forcibly report edge trigger in acpi tables
  i386/tdx: Don't synchronize guest tsc for TDs

Sean Christopherson (2):
  i386/kvm: Move architectural CPUID leaf generation to separate helper
  i386/tdx: Don't get/put guest state for TDX VMs

Xiaoyao Li (34):
  *** HACK *** linux-headers: Update headers to pull in TDX API changes
  i386: Introduce tdx-guest object
  target/i386: Implement mc->kvm_type() to get VM type
  target/i386: Introduce kvm_confidential_guest_init()
  i386/tdx: Implement tdx_kvm_init() to initialize TDX VM context
  i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES
  i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object
  i386/tdx: Adjust the supported CPUID based on TDX restrictions
  i386/tdx: Update tdx_fixed0/1 bits by tdx_caps.cpuid_config[]
  i386/tdx: Integrate tdx_caps-

[PATCH v1 03/40] target/i386: Implement mc->kvm_type() to get VM type

2022-08-02 Thread Xiaoyao Li

TDX VM requires VM type KVM_X86_TDX_VM to be passed to
kvm_ioctl(KVM_CREATE_VM). Hence implement mc->kvm_type() for i386
architecture.

If tdx-guest object is specified to confidential-guest-support, like,

  qemu -machine ...,confidential-guest-support=tdx0 \
   -object tdx-guest,id=tdx0,...

it parses VM type as KVM_X86_TDX_VM. Otherwise, it's KVM_X86_DEFAULT_VM.

Signed-off-by: Xiaoyao Li 
Acked-by: Gerd Hoffmann 
---
 hw/i386/x86.c  |  6 ++
 target/i386/kvm/kvm.c  | 30 ++
 target/i386/kvm/kvm_i386.h |  1 +
 3 files changed, 37 insertions(+)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 050eedc0c8e2..a15fadeb0e68 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1379,6 +1379,11 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, 
const char *name,
 qapi_free_SgxEPCList(list);
 }
 
+static int x86_kvm_type(MachineState *ms, const char *vm_type)
+{
+return kvm_get_vm_type(ms, vm_type);
+}
+
 static void x86_machine_initfn(Object *obj)
 {
 X86MachineState *x86ms = X86_MACHINE(obj);
@@ -1403,6 +1408,7 @@ static void x86_machine_class_init(ObjectClass *oc, void 
*data)
 mc->cpu_index_to_instance_props = x86_cpu_index_to_props;
 mc->get_default_cpu_node_id = x86_get_default_cpu_node_id;
 mc->possible_cpu_arch_ids = x86_possible_cpu_arch_ids;
+mc->kvm_type = x86_kvm_type;
 x86mc->save_tsc_khz = true;
 x86mc->fwcfg_dma_enabled = true;
 nc->nmi_monitor_handler = x86_nmi;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f148a6d52fa4..33e0d2948f77 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -30,6 +30,7 @@
 #include "sysemu/runstate.h"
 #include "kvm_i386.h"
 #include "sev.h"
+#include "tdx.h"
 #include "hyperv.h"
 #include "hyperv-proto.h"
 
@@ -143,6 +144,35 @@ static struct kvm_msr_list *kvm_feature_msrs;
 static RateLimit bus_lock_ratelimit_ctrl;
 static int kvm_get_one_msr(X86CPU *cpu, int index, uint64_t *value);
 
+static const char* vm_type_name[] = {
+[KVM_X86_DEFAULT_VM] = "X86_DEFAULT_VM",
+[KVM_X86_TDX_VM] = "X86_TDX_VM",
+};
+
+int kvm_get_vm_type(MachineState *ms, const char *vm_type)
+{
+int kvm_type = KVM_X86_DEFAULT_VM;
+
+if (ms->cgs && object_dynamic_cast(OBJECT(ms->cgs), TYPE_TDX_GUEST)) {
+kvm_type = KVM_X86_TDX_VM;
+}
+
+/*
+ * old KVM doesn't support KVM_CAP_VM_TYPES and KVM_X86_DEFAULT_VM
+ * is always supported
+ */
+if (kvm_type == KVM_X86_DEFAULT_VM) {
+return kvm_type;
+}
+
+if (!(kvm_check_extension(KVM_STATE(ms->accelerator), KVM_CAP_VM_TYPES) & 
BIT(kvm_type))) {
+error_report("vm-type %s not supported by KVM", 
vm_type_name[kvm_type]);
+exit(1);
+}
+
+return kvm_type;
+}
+
 int kvm_has_pit_state2(void)
 {
 return has_pit_state2;
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index 4124912c202e..b434feaa6b1d 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -37,6 +37,7 @@ bool kvm_has_adjust_clock(void);
 bool kvm_has_adjust_clock_stable(void);
 bool kvm_has_exception_payload(void);
 void kvm_synchronize_all_tsc(void);
+int kvm_get_vm_type(MachineState *ms, const char *vm_type);
 void kvm_arch_reset_vcpu(X86CPU *cs);
 void kvm_arch_do_init_vcpu(X86CPU *cs);
 
-- 
2.27.0
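
The capability check in kvm_get_vm_type() treats the value returned for KVM_CAP_VM_TYPES as a bitmap of supported VM types. A minimal sketch of that test follows; vm_type_supported() and the VM_TYPE_* constants are hypothetical names that mirror KVM_X86_DEFAULT_VM and KVM_X86_TDX_VM from the header patch, and the bitmap is passed in directly instead of being queried from KVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VM_TYPE_DEFAULT 0   /* mirrors KVM_X86_DEFAULT_VM */
#define VM_TYPE_TDX     1   /* mirrors KVM_X86_TDX_VM */

/*
 * supported_mask models the bitmap the patch expects from
 * KVM_CAP_VM_TYPES: bit N set means VM type N is supported.
 */
static bool vm_type_supported(uint32_t supported_mask, int vm_type)
{
    assert(vm_type >= 0 && vm_type < 32);

    /* The default type is always accepted, even if the cap reports 0. */
    if (vm_type == VM_TYPE_DEFAULT) {
        return true;
    }
    return (supported_mask & (UINT32_C(1) << vm_type)) != 0;
}
```

Accepting the default type unconditionally keeps older kernels working, since they do not know KVM_CAP_VM_TYPES and report 0 for it.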

[PATCH v1 04/40] target/i386: Introduce kvm_confidential_guest_init()

2022-08-02 Thread Xiaoyao Li

Introduce a separate function kvm_confidential_guest_init() for SEV (and
future TDX).

Signed-off-by: Xiaoyao Li 
Acked-by: Gerd Hoffmann 
---
 target/i386/kvm/kvm.c | 11 ++-
 target/i386/sev.c |  1 -
 target/i386/sev.h |  2 ++
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 33e0d2948f77..1f4a6a4dff28 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2448,6 +2448,15 @@ static void register_smram_listener(Notifier *n, void 
*unused)
  &smram_address_space, 1, "kvm-smram");
 }
 
+static int kvm_confidential_guest_init(MachineState *ms, Error **errp)
+{
+if (object_dynamic_cast(OBJECT(ms->cgs), TYPE_SEV_GUEST)) {
+return sev_kvm_init(ms->cgs, errp);
+}
+
+return 0;
+}
+
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
 uint64_t identity_base = 0xfffbc000;
@@ -2468,7 +2477,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
  * mechanisms are supported in future (e.g. TDX), they'll need
  * their own initialization either here or elsewhere.
  */
-ret = sev_kvm_init(ms->cgs, &local_err);
+ret = kvm_confidential_guest_init(ms, &local_err);
 if (ret < 0) {
 error_report_err(local_err);
 return ret;
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 32f7dbac4efa..6089b91cc698 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -39,7 +39,6 @@
 #include "hw/i386/pc.h"
 #include "exec/address-spaces.h"
 
-#define TYPE_SEV_GUEST "sev-guest"
 OBJECT_DECLARE_SIMPLE_TYPE(SevGuestState, SEV_GUEST)
 
 
diff --git a/target/i386/sev.h b/target/i386/sev.h
index 7b1528248a54..64fbf186dbd2 100644
--- a/target/i386/sev.h
+++ b/target/i386/sev.h
@@ -20,6 +20,8 @@
 
 #include "exec/confidential-guest-support.h"
 
+#define TYPE_SEV_GUEST "sev-guest"
+
 #define SEV_POLICY_NODBG0x1
 #define SEV_POLICY_NOKS 0x2
 #define SEV_POLICY_ES   0x4
-- 
2.27.0

[PATCH v1 10/40] i386/tdx: Integrate tdx_caps->xfam_fixed0/1 into tdx_cpuid_lookup

2022-08-02 Thread Xiaoyao Li

KVM requires userspace to pass the XFAM configuration via CPUID leaf 0xD.

Convert tdx_caps->xfam_fixed0/1 into the corresponding
tdx_cpuid_lookup[].tdx_fixed0/1 fields of CPUID leaf 0xD, so that the
requirement is applied naturally.

Signed-off-by: Xiaoyao Li 
---
 target/i386/cpu.c |  3 ---
 target/i386/cpu.h |  3 +++
 target/i386/kvm/tdx.c | 24 
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 194b5a31afac..45652bb2fd7c 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1418,9 +1418,6 @@ static const X86RegisterInfo32 
x86_reg_info_32[CPU_NB_REGS32] = {
 };
 #undef REGISTER
 
-/* CPUID feature bits available in XSS */
-#define CPUID_XSTATE_XSS_MASK(XSTATE_ARCH_LBR_MASK)
-
 ExtSaveArea x86_ext_save_areas[XSAVE_STATE_AREA_COUNT] = {
 [XSTATE_FP_BIT] = {
 /* x87 FP state component is always enabled if XSAVE is supported */
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index cc9da9fc4318..90f403aecd8b 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -583,6 +583,9 @@ typedef enum X86Seg {
  XSTATE_Hi16_ZMM_MASK | XSTATE_PKRU_MASK | \
  XSTATE_XTILE_CFG_MASK | 
XSTATE_XTILE_DATA_MASK)
 
+/* CPUID feature bits available in XSS */
+#define CPUID_XSTATE_XSS_MASK(XSTATE_ARCH_LBR_MASK)
+
 /* CPUID feature words */
 typedef enum FeatureWord {
 FEAT_1_EDX, /* CPUID[1].EDX */
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index d12b03fa05c9..dffaa533f899 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -395,6 +395,30 @@ static void update_tdx_cpuid_lookup_by_tdx_caps(void)
 entry->tdx_fixed0 &= ~config;
 entry->tdx_fixed1 &= ~config;
 }
+
+/*
+ * Because KVM gets XFAM settings via CPUID leaves 0xD,  map
+ * tdx_caps->xfam_fixed{0, 1} into tdx_cpuid_lookup[].tdx_fixed{0, 1}.
+ *
+ * Then the enforcement applies in tdx_get_configurable_cpuid() naturally.
+ */
+tdx_cpuid_lookup[FEAT_XSAVE_XCR0_LO].tdx_fixed0 =
+(uint32_t)~tdx_caps->xfam_fixed0 & CPUID_XSTATE_XCR0_MASK;
+tdx_cpuid_lookup[FEAT_XSAVE_XCR0_LO].tdx_fixed1 =
+(uint32_t)tdx_caps->xfam_fixed1 & CPUID_XSTATE_XCR0_MASK;
+tdx_cpuid_lookup[FEAT_XSAVE_XCR0_HI].tdx_fixed0 =
+(~tdx_caps->xfam_fixed0 & CPUID_XSTATE_XCR0_MASK) >> 32;
+tdx_cpuid_lookup[FEAT_XSAVE_XCR0_HI].tdx_fixed1 =
+(tdx_caps->xfam_fixed1 & CPUID_XSTATE_XCR0_MASK) >> 32;
+
+tdx_cpuid_lookup[FEAT_XSAVE_XSS_LO].tdx_fixed0 =
+(uint32_t)~tdx_caps->xfam_fixed0 & CPUID_XSTATE_XSS_MASK;
+tdx_cpuid_lookup[FEAT_XSAVE_XSS_LO].tdx_fixed1 =
+(uint32_t)tdx_caps->xfam_fixed1 & CPUID_XSTATE_XSS_MASK;
+tdx_cpuid_lookup[FEAT_XSAVE_XSS_HI].tdx_fixed0 =
+(~tdx_caps->xfam_fixed0 & CPUID_XSTATE_XSS_MASK) >> 32;
+tdx_cpuid_lookup[FEAT_XSAVE_XSS_HI].tdx_fixed1 =
+(tdx_caps->xfam_fixed1 & CPUID_XSTATE_XSS_MASK) >> 32;
 }
 
 int tdx_kvm_init(MachineState *ms, Error **errp)
-- 
2.27.0
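
The hunk above splits two 64-bit XFAM masks into 32-bit low/high halves for the lookup table. A minimal sketch of that split, under stated assumptions: the FixedBits type and split_xfam() are hypothetical names, and the polarity (a clear bit in xfam_fixed0 means "must be 0", a set bit in xfam_fixed1 means "must be 1") is inferred from the patch rather than from a spec.

```c
#include <stdint.h>

/* Hypothetical per-register result: set bits are forced to 0 or to 1. */
typedef struct FixedBits {
    uint32_t fixed0;   /* bits that must be 0 */
    uint32_t fixed1;   /* bits that must be 1 */
} FixedBits;

/*
 * valid_mask limits the result to the bits that are meaningful for the
 * register being described (XCR0 or XSS in the patch).
 */
static void split_xfam(uint64_t xfam_fixed0, uint64_t xfam_fixed1,
                       uint64_t valid_mask, FixedBits *lo, FixedBits *hi)
{
    /* Do the masking in 64 bits first, then narrow each half. */
    uint64_t must0 = ~xfam_fixed0 & valid_mask;
    uint64_t must1 = xfam_fixed1 & valid_mask;

    lo->fixed0 = (uint32_t)must0;
    lo->fixed1 = (uint32_t)must1;
    hi->fixed0 = (uint32_t)(must0 >> 32);
    hi->fixed1 = (uint32_t)(must1 >> 32);
}
```

Masking in 64 bits before narrowing avoids relying on how a cast and a bitwise AND combine in a single expression.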

[PATCH v1 02/40] i386: Introduce tdx-guest object

2022-08-02 Thread Xiaoyao Li

Introduce tdx-guest object which implements the interface of
CONFIDENTIAL_GUEST_SUPPORT, and will be used to create TDX VMs (TDs) by

  qemu -machine ...,confidential-guest-support=tdx0 \
       -object tdx-guest,id=tdx0

So far it has a single property, 'attributes', which is fixed to 0 and
is not user-configurable.

Signed-off-by: Xiaoyao Li 
Acked-by: Gerd Hoffmann 
---
changes from RFC-V4
- make @attributes not user-settable
---
 configs/devices/i386-softmmu/default.mak |  1 +
 hw/i386/Kconfig  |  5 +++
 qapi/qom.json| 12 +++
 target/i386/kvm/meson.build  |  2 ++
 target/i386/kvm/tdx.c| 40 
 target/i386/kvm/tdx.h| 19 +++
 6 files changed, 79 insertions(+)
 create mode 100644 target/i386/kvm/tdx.c
 create mode 100644 target/i386/kvm/tdx.h

diff --git a/configs/devices/i386-softmmu/default.mak 
b/configs/devices/i386-softmmu/default.mak
index 598c6646dfc0..9b5ec59d65b0 100644
--- a/configs/devices/i386-softmmu/default.mak
+++ b/configs/devices/i386-softmmu/default.mak
@@ -18,6 +18,7 @@
 #CONFIG_QXL=n
 #CONFIG_SEV=n
 #CONFIG_SGA=n
+#CONFIG_TDX=n
 #CONFIG_TEST_DEVICES=n
 #CONFIG_TPM_CRB=n
 #CONFIG_TPM_TIS_ISA=n
diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index d22ac4a4b952..9e40ff79fc2d 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -10,6 +10,10 @@ config SGX
 bool
 depends on KVM
 
+config TDX
+bool
+depends on KVM
+
 config PC
 bool
 imply APPLESMC
@@ -26,6 +30,7 @@ config PC
 imply QXL
 imply SEV
 imply SGX
+imply TDX
 imply SGA
 imply TEST_DEVICES
 imply TPM_CRB
diff --git a/qapi/qom.json b/qapi/qom.json
index 80dd419b3925..38177848abc1 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -830,6 +830,16 @@
 'reduced-phys-bits': 'uint32',
 '*kernel-hashes': 'bool' } }
 
+##
+# @TdxGuestProperties:
+#
+# Properties for tdx-guest objects.
+#
+# Since: 7.2
+##
+{ 'struct': 'TdxGuestProperties',
+  'data': { }}
+
 ##
 # @ObjectType:
 #
@@ -883,6 +893,7 @@
   'if': 'CONFIG_SECRET_KEYRING' },
 'sev-guest',
 's390-pv-guest',
+'tdx-guest',
 'throttle-group',
 'tls-creds-anon',
 'tls-creds-psk',
@@ -948,6 +959,7 @@
   'secret_keyring': { 'type': 'SecretKeyringProperties',
   'if': 'CONFIG_SECRET_KEYRING' },
   'sev-guest':  'SevGuestProperties',
+  'tdx-guest':  'TdxGuestProperties',
   'throttle-group': 'ThrottleGroupProperties',
   'tls-creds-anon': 'TlsCredsAnonProperties',
   'tls-creds-psk':  'TlsCredsPskProperties',
diff --git a/target/i386/kvm/meson.build b/target/i386/kvm/meson.build
index 736df8b72e3f..b2d7d41acde2 100644
--- a/target/i386/kvm/meson.build
+++ b/target/i386/kvm/meson.build
@@ -9,6 +9,8 @@ i386_softmmu_kvm_ss.add(files(
 
 i386_softmmu_kvm_ss.add(when: 'CONFIG_SEV', if_false: files('sev-stub.c'))
 
+i386_softmmu_kvm_ss.add(when: 'CONFIG_TDX', if_true: files('tdx.c'))
+
 i386_softmmu_ss.add(when: 'CONFIG_HYPERV', if_true: files('hyperv.c'), 
if_false: files('hyperv-stub.c'))
 
 i386_softmmu_ss.add_all(when: 'CONFIG_KVM', if_true: i386_softmmu_kvm_ss)
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
new file mode 100644
index ..d3792d4a3d56
--- /dev/null
+++ b/target/i386/kvm/tdx.c
@@ -0,0 +1,40 @@
+/*
+ * QEMU TDX support
+ *
+ * Copyright Intel
+ *
+ * Author:
+ *  Xiaoyao Li 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qom/object_interfaces.h"
+
+#include "tdx.h"
+
+/* tdx guest */
+OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
+   tdx_guest,
+   TDX_GUEST,
+   CONFIDENTIAL_GUEST_SUPPORT,
+   { TYPE_USER_CREATABLE },
+   { NULL })
+
+static void tdx_guest_init(Object *obj)
+{
+TdxGuest *tdx = TDX_GUEST(obj);
+
+tdx->attributes = 0;
+}
+
+static void tdx_guest_finalize(Object *obj)
+{
+}
+
+static void tdx_guest_class_init(ObjectClass *oc, void *data)
+{
+}
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
new file mode 100644
index ..415aeb5af746
--- /dev/null
+++ b/target/i386/kvm/tdx.h
@@ -0,0 +1,19 @@
+#ifndef QEMU_I386_TDX_H
+#define QEMU_I386_TDX_H
+
+#include "exec/confidential-guest-support.h"
+
+#define TYPE_TDX_GUEST "tdx-guest"
+#define TDX_GUEST(obj)  OBJECT_CHECK(TdxGuest, (obj), TYPE_TDX_GUEST)
+
+typedef struct TdxGuestClass {
+ConfidentialGuestSupportClass parent_class;
+} TdxGuestClass;
+
+typedef struct TdxGuest {
+ConfidentialGuestSupport parent_obj;
+
+uint64_t attributes;/* TD attributes */
+} TdxGuest;
+
+#endif /* QEMU_I386_TDX_H */

[PATCH v1 05/40] i386/tdx: Implement tdx_kvm_init() to initialize TDX VM context

2022-08-02 Thread Xiaoyao Li

Introduce tdx_kvm_init() and invoke it in kvm_confidential_guest_init()
if it's a TDX VM. More initialization will be added later.

Signed-off-by: Xiaoyao Li 
Acked-by: Gerd Hoffmann 
---
 target/i386/kvm/kvm.c   | 15 ++-
 target/i386/kvm/meson.build |  2 +-
 target/i386/kvm/tdx-stub.c  |  9 +
 target/i386/kvm/tdx.c   |  7 +++
 target/i386/kvm/tdx.h   |  2 ++
 5 files changed, 25 insertions(+), 10 deletions(-)
 create mode 100644 target/i386/kvm/tdx-stub.c

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 1f4a6a4dff28..335f87e6cc59 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -54,6 +54,7 @@
 #include "migration/blocker.h"
 #include "exec/memattrs.h"
 #include "trace.h"
+#include "tdx.h"
 
 #include CONFIG_DEVICES
 
@@ -2452,6 +2453,8 @@ static int kvm_confidential_guest_init(MachineState *ms, 
Error **errp)
 {
 if (object_dynamic_cast(OBJECT(ms->cgs), TYPE_SEV_GUEST)) {
 return sev_kvm_init(ms->cgs, errp);
+} else if (object_dynamic_cast(OBJECT(ms->cgs), TYPE_TDX_GUEST)) {
+return tdx_kvm_init(ms, errp);
 }
 
 return 0;
@@ -2466,16 +2469,10 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 Error *local_err = NULL;
 
 /*
- * Initialize SEV context, if required
+ * Initialize confidential guest (SEV/TDX) context, if required
  *
- * If no memory encryption is requested (ms->cgs == NULL) this is
- * a no-op.
- *
- * It's also a no-op if a non-SEV confidential guest support
- * mechanism is selected.  SEV is the only mechanism available to
- * select on x86 at present, so this doesn't arise, but if new
- * mechanisms are supported in future (e.g. TDX), they'll need
- * their own initialization either here or elsewhere.
+ * It's a no-op if a non-SEV/non-tdx confidential guest support
+ * mechanism is selected, i.e., ms->cgs == NULL
  */
 ret = kvm_confidential_guest_init(ms, &local_err);
 if (ret < 0) {
diff --git a/target/i386/kvm/meson.build b/target/i386/kvm/meson.build
index b2d7d41acde2..fd30b93ecec9 100644
--- a/target/i386/kvm/meson.build
+++ b/target/i386/kvm/meson.build
@@ -9,7 +9,7 @@ i386_softmmu_kvm_ss.add(files(
 
 i386_softmmu_kvm_ss.add(when: 'CONFIG_SEV', if_false: files('sev-stub.c'))
 
-i386_softmmu_kvm_ss.add(when: 'CONFIG_TDX', if_true: files('tdx.c'))
+i386_softmmu_kvm_ss.add(when: 'CONFIG_TDX', if_true: files('tdx.c'), if_false: 
files('tdx-stub.c'))
 
 i386_softmmu_ss.add(when: 'CONFIG_HYPERV', if_true: files('hyperv.c'), 
if_false: files('hyperv-stub.c'))
 
diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
new file mode 100644
index ..1df24735201e
--- /dev/null
+++ b/target/i386/kvm/tdx-stub.c
@@ -0,0 +1,9 @@
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "tdx.h"
+
+int tdx_kvm_init(MachineState *ms, Error **errp)
+{
+return -EINVAL;
+}
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index d3792d4a3d56..77e33ae01147 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -12,10 +12,17 @@
  */
 
 #include "qemu/osdep.h"
+#include "qapi/error.h"
 #include "qom/object_interfaces.h"
 
+#include "hw/i386/x86.h"
 #include "tdx.h"
 
+int tdx_kvm_init(MachineState *ms, Error **errp)
+{
+return 0;
+}
+
 /* tdx guest */
 OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
tdx_guest,
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 415aeb5af746..c8a23d95258d 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -16,4 +16,6 @@ typedef struct TdxGuest {
 uint64_t attributes;/* TD attributes */
 } TdxGuest;
 
+int tdx_kvm_init(MachineState *ms, Error **errp);
+
 #endif /* QEMU_I386_TDX_H */
-- 
2.27.0

[PATCH v1 06/40] i386/tdx: Get tdx_capabilities via KVM_TDX_CAPABILITIES

2022-08-02 Thread Xiaoyao Li

KVM provides TDX capabilities via sub command KVM_TDX_CAPABILITIES of
IOCTL(KVM_MEMORY_ENCRYPT_OP). Get the capabilities when initializing
TDX context. It will be used to validate user's setting later.

Since no interface reports how many cpuid configs KVM_TDX_CAPABILITIES
contains, QEMU starts with a known number, doubles it on -E2BIG, and
aborts once it exceeds KVM_MAX_CPUID_ENTRIES.

Besides, introduce the interfaces to invoke TDX "ioctls" at different
scopes (KVM, VM and VCPU) in preparation.

Signed-off-by: Xiaoyao Li 
---
changes from RFC v4:
  - start from nr_cpuid_configs = 6 for the loop;
  - stop the loop when nr_cpuid_configs exceeds KVM_MAX_CPUID_ENTRIES;
---
 target/i386/kvm/kvm.c  |  2 -
 target/i386/kvm/kvm_i386.h |  2 +
 target/i386/kvm/tdx.c  | 92 ++
 3 files changed, 94 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 335f87e6cc59..9e30fa9f4eb5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1704,8 +1704,6 @@ static int hyperv_init_vcpu(X86CPU *cpu)
 
 static Error *invtsc_mig_blocker;
 
-#define KVM_MAX_CPUID_ENTRIES  100
-
 static void kvm_init_xsave(CPUX86State *env)
 {
 if (has_xsave2) {
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index b434feaa6b1d..6b24ab2a7813 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -13,6 +13,8 @@
 
 #include "sysemu/kvm.h"
 
+#define KVM_MAX_CPUID_ENTRIES  100
+
 #define kvm_apic_in_kernel() (kvm_irqchip_in_kernel())
 
 #ifdef CONFIG_KVM
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 77e33ae01147..89f81f7d7082 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -14,12 +14,104 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qom/object_interfaces.h"
+#include "sysemu/kvm.h"
 
 #include "hw/i386/x86.h"
+#include "kvm_i386.h"
 #include "tdx.h"
 
+static struct kvm_tdx_capabilities *tdx_caps;
+
+enum tdx_ioctl_level{
+TDX_PLATFORM_IOCTL,
+TDX_VM_IOCTL,
+TDX_VCPU_IOCTL,
+};
+
+static int __tdx_ioctl(void *state, enum tdx_ioctl_level level, int cmd_id,
+__u32 flags, void *data)
+{
+struct kvm_tdx_cmd tdx_cmd;
+int r;
+
+memset(&tdx_cmd, 0x0, sizeof(tdx_cmd));
+
+tdx_cmd.id = cmd_id;
+tdx_cmd.flags = flags;
+tdx_cmd.data = (__u64)(unsigned long)data;
+
+switch (level) {
+case TDX_PLATFORM_IOCTL:
+r = kvm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
+break;
+case TDX_VM_IOCTL:
+r = kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
+break;
+case TDX_VCPU_IOCTL:
+r = kvm_vcpu_ioctl(state, KVM_MEMORY_ENCRYPT_OP, &tdx_cmd);
+break;
+default:
+error_report("Invalid tdx_ioctl_level %d", level);
+exit(1);
+}
+
+return r;
+}
+
+static inline int tdx_platform_ioctl(int cmd_id, __u32 flags, void *data)
+{
+return __tdx_ioctl(NULL, TDX_PLATFORM_IOCTL, cmd_id, flags, data);
+}
+
+static inline int tdx_vm_ioctl(int cmd_id, __u32 flags, void *data)
+{
+return __tdx_ioctl(NULL, TDX_VM_IOCTL, cmd_id, flags, data);
+}
+
+static inline int tdx_vcpu_ioctl(void *vcpu_fd, int cmd_id, __u32 flags,
+ void *data)
+{
+return  __tdx_ioctl(vcpu_fd, TDX_VCPU_IOCTL, cmd_id, flags, data);
+}
+
+static void get_tdx_capabilities(void)
+{
+struct kvm_tdx_capabilities *caps;
+/* 1st generation of TDX reports 6 cpuid configs */
+int nr_cpuid_configs = 6;
+int r, size;
+
+do {
+size = sizeof(struct kvm_tdx_capabilities) +
+   nr_cpuid_configs * sizeof(struct kvm_tdx_cpuid_config);
+caps = g_malloc0(size);
+caps->nr_cpuid_configs = nr_cpuid_configs;
+
+r = tdx_platform_ioctl(KVM_TDX_CAPABILITIES, 0, caps);
+if (r == -E2BIG) {
+g_free(caps);
+nr_cpuid_configs *= 2;
+if (nr_cpuid_configs > KVM_MAX_CPUID_ENTRIES) {
+error_report("KVM TDX seems broken");
+exit(1);
+}
+} else if (r < 0) {
+g_free(caps);
+error_report("KVM_TDX_CAPABILITIES failed: %s\n", strerror(-r));
+exit(1);
+}
+}
+while (r == -E2BIG);
+
+tdx_caps = caps;
+}
+
 int tdx_kvm_init(MachineState *ms, Error **errp)
 {
+if (!tdx_caps) {
+get_tdx_capabilities();
+}
+
 return 0;
 }
 
-- 
2.27.0

[PATCH v1 09/40] i386/tdx: Update tdx_fixed0/1 bits by tdx_caps.cpuid_config[]

2022-08-02 Thread Xiaoyao Li

tdx_cpuid_lookup[].tdx_fixed0/1 is QEMU-maintained data that reflects
the TDX restrictions on how some CPUID bits are virtualized by TDX. It is
taken from the TDX spec. However, TDX may change some fixed fields to
configurable in the future. Update the tdx_cpuid_lookup[].tdx_fixed0/1
fields by removing the bits that the TDX module reports as configurable,
so that QEMU adapts to an updated TDX module automatically.

Signed-off-by: Xiaoyao Li 
---
 target/i386/kvm/tdx.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index e3e9a424512e..d12b03fa05c9 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -369,6 +369,34 @@ static void get_tdx_capabilities(void)
 tdx_caps = caps;
 }
 
+static void update_tdx_cpuid_lookup_by_tdx_caps(void)
+{
+KvmTdxCpuidLookup *entry;
+FeatureWordInfo *fi;
+uint32_t config;
+FeatureWord w;
+
+/*
+ * Patch tdx_fixed0/1 by tdx_caps that what TDX module reports as
+ * configurable is not fixed.
+ */
+for (w = 0; w < FEATURE_WORDS; w++) {
+fi = &feature_word_info[w];
+entry = &tdx_cpuid_lookup[w];
+
+if (fi->type != CPUID_FEATURE_WORD) {
+continue;
+}
+
+config = tdx_cap_cpuid_config(fi->cpuid.eax,
+  fi->cpuid.needs_ecx ? fi->cpuid.ecx : 
~0u,
+  fi->cpuid.reg);
+
+entry->tdx_fixed0 &= ~config;
+entry->tdx_fixed1 &= ~config;
+}
+}
+
 int tdx_kvm_init(MachineState *ms, Error **errp)
 {
 TdxGuest *tdx = (TdxGuest *)object_dynamic_cast(OBJECT(ms->cgs),
@@ -378,6 +406,8 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
 get_tdx_capabilities();
 }
 
+update_tdx_cpuid_lookup_by_tdx_caps();
+
 tdx_guest = tdx;
 
 return 0;
-- 
2.27.0

[PATCH v1 07/40] i386/tdx: Introduce is_tdx_vm() helper and cache tdx_guest object

2022-08-02 Thread Xiaoyao Li

TDX VMs need special handling in many places across QEMU. Introduce the
is_tdx_vm() helper to query whether the VM is a TDX VM.

Cache the tdx_guest object so there is no need to cast from ms->cgs every
time.

Signed-off-by: Xiaoyao Li 
---
 target/i386/kvm/tdx.c | 13 +
 target/i386/kvm/tdx.h | 10 ++
 2 files changed, 23 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 89f81f7d7082..fdd6bec58758 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -20,8 +20,16 @@
 #include "kvm_i386.h"
 #include "tdx.h"
 
+static TdxGuest *tdx_guest;
+
 static struct kvm_tdx_capabilities *tdx_caps;
 
+/* It's valid after kvm_confidential_guest_init()->tdx_kvm_init() */
+bool is_tdx_vm(void)
+{
+return !!tdx_guest;
+}
+
 enum tdx_ioctl_level{
 TDX_PLATFORM_IOCTL,
 TDX_VM_IOCTL,
@@ -108,10 +116,15 @@ static void get_tdx_capabilities(void)
 
 int tdx_kvm_init(MachineState *ms, Error **errp)
 {
+TdxGuest *tdx = (TdxGuest *)object_dynamic_cast(OBJECT(ms->cgs),
+TYPE_TDX_GUEST);
+
 if (!tdx_caps) {
 get_tdx_capabilities();
 }
 
+tdx_guest = tdx;
+
 return 0;
 }
 
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index c8a23d95258d..4036ca2f3f99 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -1,6 +1,10 @@
 #ifndef QEMU_I386_TDX_H
 #define QEMU_I386_TDX_H
 
+#ifndef CONFIG_USER_ONLY
+#include CONFIG_DEVICES /* CONFIG_TDX */
+#endif
+
 #include "exec/confidential-guest-support.h"
 
 #define TYPE_TDX_GUEST "tdx-guest"
@@ -16,6 +20,12 @@ typedef struct TdxGuest {
 uint64_t attributes;/* TD attributes */
 } TdxGuest;
 
+#ifdef CONFIG_TDX
+bool is_tdx_vm(void);
+#else
+#define is_tdx_vm() 0
+#endif /* CONFIG_TDX */
+
 int tdx_kvm_init(MachineState *ms, Error **errp);
 
 #endif /* QEMU_I386_TDX_H */
-- 
2.27.0

[PATCH v1 12/40] i386/kvm: Move architectural CPUID leaf generation to separate helper

2022-08-02 Thread Xiaoyao Li

From: Sean Christopherson 

Move the architectural (for lack of a better term) CPUID leaf generation
to a separate helper so that the generation code can be reused by TDX,
which needs to generate a canonical VM-scoped configuration.

Signed-off-by: Sean Christopherson 
Signed-off-by: Xiaoyao Li 
---
 target/i386/kvm/kvm.c  | 220 +++--
 target/i386/kvm/kvm_i386.h |   3 +
 2 files changed, 118 insertions(+), 105 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 9930902ae890..9c0d5be5cc23 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1728,115 +1728,21 @@ static void kvm_init_xsave(CPUX86State *env)
env->xsave_buf_len);
 }
 
-int kvm_arch_init_vcpu(CPUState *cs)
+uint32_t kvm_x86_arch_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
+uint32_t cpuid_i)
 {
-struct {
-struct kvm_cpuid2 cpuid;
-struct kvm_cpuid_entry2 entries[KVM_MAX_CPUID_ENTRIES];
-} cpuid_data;
-/*
- * The kernel defines these structs with padding fields so there
- * should be no extra padding in our cpuid_data struct.
- */
-QEMU_BUILD_BUG_ON(sizeof(cpuid_data) !=
-  sizeof(struct kvm_cpuid2) +
-  sizeof(struct kvm_cpuid_entry2) * KVM_MAX_CPUID_ENTRIES);
-
-X86CPU *cpu = X86_CPU(cs);
-CPUX86State *env = &cpu->env;
-uint32_t limit, i, j, cpuid_i;
+uint32_t limit, i, j;
 uint32_t unused;
 struct kvm_cpuid_entry2 *c;
-uint32_t signature[3];
-int kvm_base = KVM_CPUID_SIGNATURE;
-int max_nested_state_len;
-int r;
-Error *local_err = NULL;
-
-memset(&cpuid_data, 0, sizeof(cpuid_data));
-
-cpuid_i = 0;
-
-has_xsave2 = kvm_check_extension(cs->kvm_state, KVM_CAP_XSAVE2);
-
-r = kvm_arch_set_tsc_khz(cs);
-if (r < 0) {
-return r;
-}
-
-/* vcpu's TSC frequency is either specified by user, or following
- * the value used by KVM if the former is not present. In the
- * latter case, we query it from KVM and record in env->tsc_khz,
- * so that vcpu's TSC frequency can be migrated later via this field.
- */
-if (!env->tsc_khz) {
-r = kvm_check_extension(cs->kvm_state, KVM_CAP_GET_TSC_KHZ) ?
-kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ) :
--ENOTSUP;
-if (r > 0) {
-env->tsc_khz = r;
-}
-}
-
-env->apic_bus_freq = KVM_APIC_BUS_FREQUENCY;
-
-/*
- * kvm_hyperv_expand_features() is called here for the second time in case
- * KVM_CAP_SYS_HYPERV_CPUID is not supported. While we can't possibly 
handle
- * 'query-cpu-model-expansion' in this case as we don't have a KVM vCPU to
- * check which Hyper-V enlightenments are supported and which are not, we
- * can still proceed and check/expand Hyper-V enlightenments here so legacy
- * behavior is preserved.
- */
-if (!kvm_hyperv_expand_features(cpu, &local_err)) {
-error_report_err(local_err);
-return -ENOSYS;
-}
-
-if (hyperv_enabled(cpu)) {
-r = hyperv_init_vcpu(cpu);
-if (r) {
-return r;
-}
-
-cpuid_i = hyperv_fill_cpuids(cs, cpuid_data.entries);
-kvm_base = KVM_CPUID_SIGNATURE_NEXT;
-has_msr_hv_hypercall = true;
-}
-
-if (cpu->expose_kvm) {
-memcpy(signature, "KVMKVMKVM\0\0\0", 12);
-c = &cpuid_data.entries[cpuid_i++];
-c->function = KVM_CPUID_SIGNATURE | kvm_base;
-c->eax = KVM_CPUID_FEATURES | kvm_base;
-c->ebx = signature[0];
-c->ecx = signature[1];
-c->edx = signature[2];
-
-c = &cpuid_data.entries[cpuid_i++];
-c->function = KVM_CPUID_FEATURES | kvm_base;
-c->eax = env->features[FEAT_KVM];
-c->edx = env->features[FEAT_KVM_HINTS];
-}
 
 cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);
 
-if (cpu->kvm_pv_enforce_cpuid) {
-r = kvm_vcpu_enable_cap(cs, KVM_CAP_ENFORCE_PV_FEATURE_CPUID, 0, 1);
-if (r < 0) {
-fprintf(stderr,
-"failed to enable KVM_CAP_ENFORCE_PV_FEATURE_CPUID: %s",
-strerror(-r));
-abort();
-}
-}
-
 for (i = 0; i <= limit; i++) {
 if (cpuid_i == KVM_MAX_CPUID_ENTRIES) {
 fprintf(stderr, "unsupported level value: 0x%x\n", limit);
 abort();
 }
-c = &cpuid_data.entries[cpuid_i++];
+c = &entries[cpuid_i++];
 
 switch (i) {
 case 2: {
@@ -1855,7 +1761,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
 "cpuid(eax:2):eax & 0xf = 0x%x\n", times);
 abort();
 }
-c = &cpuid_data.entries[cpuid_i++];
+c = &entries[cpuid_i++];
 c->function = i;
 c->flags = KVM_CPUID_FLAG_STATEFUL_FUNC;
 cpu_x86_cpuid(env, i, 0, &c->eax

[PATCH v1 19/40] i386/tdx: Set kvm_readonly_mem_enabled to false for TDX VM

2022-08-02 Thread Xiaoyao Li

TDX only supports readonly for shared memory but not for private memory.

From QEMU's point of view, there is no way to tell whether a memslot is
used as shared or private memory. Thus just set kvm_readonly_mem_allowed
to false for TDX VMs for simplicity.

Signed-off-by: Xiaoyao Li 
Acked-by: Gerd Hoffmann 
---
 target/i386/kvm/tdx.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 0162d7cc9df4..3aa0e374a514 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -461,6 +461,15 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
 
 update_tdx_cpuid_lookup_by_tdx_caps();
 
+/*
+ * Set kvm_readonly_mem_allowed to false, because TDX only supports 
readonly
+ * memory for shared memory but not for private memory. Besides, whether a
+ * memslot is private or shared is not determined by QEMU.
+ *
+ * Thus, just mark readonly memory not supported for simplicity.
+ */
+kvm_readonly_mem_allowed = false;
+
 tdx_guest = tdx;
 
 return 0;
-- 
2.27.0

[PATCH v1 08/40] i386/tdx: Adjust the supported CPUID based on TDX restrictions

2022-08-02 Thread Xiaoyao Li

According to Chapter "CPUID Virtualization" in TDX module spec, CPUID
bits of TD can be classified into 6 types:


 # | Type                      | Description
---+---------------------------+----------------------------------------
 1 | As configured             | configurable by VMM, independent of
   |                           | native value
 2 | As configured (if native) | configurable by VMM if the bit is
   |                           | supported natively; otherwise it
   |                           | equals the native value (0)
 3 | Fixed                     | fixed to 0/1
 4 | Native                    | reflects the native value
 5 | Calculated                | calculated by TDX module
 6 | Inducing #VE              | TD gets a #VE exception


Note:
1. All the configurable XFAM related features and TD attributes related
   features fall into type #2. And fixed0/1 bits of XFAM and TD
   attributes fall into type #3.

2. For CPUID leaves not listed in the "CPUID virtualization Overview"
   table in the TDX module spec, the TDX module injects #VE into TDs when
   they are queried. In this case, TDs can request CPUID emulation from
   the VMM via TDVMCALL and the values are fully controlled by the VMM.

Because the TDX module has its own virtualization policy for CPUID bits,
what KVM_GET_SUPPORTED_CPUID reports diverges from the CPUID bits
supported for TDs. To keep the CPUID configuration consistent between VMM
and TDs, adjust the supported CPUID for TDs based on TDX restrictions.

Currently this only covers the CPUID leaves recognized by QEMU's
feature_word_info[] that are indexed by a FeatureWord.

Introduce a TDX CPUID lookup table, which maintains 1 entry for each
FeatureWord. Each entry has the fields below:

 - tdx_fixed0/1: The bits that are fixed as 0/1;

 - vmm_fixup:   The bits that are configurable from the view of TDX module,
                but require emulation by the VMM when they are configured
                as enabled. They are not supported if the VMM doesn't
                report them as supported, so they need to be fixed up by
                checking whether the VMM supports them.

 - inducing_ve: TD gets #VE when querying this CPUID leaf. The result is
                totally configurable by VMM.

 - supported_on_ve: It's valid only when @inducing_ve is true. It represents
                    the maximum feature set that can be emulated for TDs.

By applying the TDX CPUID lookup table and the TDX capabilities reported
by the TDX module, the supported CPUID for TDs can be obtained with the
following steps:

- get the base of VMM supported feature set;

- if the leaf is not a FeatureWord, just return VMM's value without
  modification;

- if the leaf is an inducing_ve type, apply the supported_on_ve mask and
  return;

- include all native bits; this covers type #2, #4, and parts of type #1.
  (It also includes some unsupported bits. The following step will
   correct it.)

- apply fixed0/1 to it (it covers #3, and rectifies the previous step);

- add configurable bits (it covers the other part of type #1);

- fix the ones in vmm_fixup;

- filter the ones that have a valid .supported field;

(Calculated type is ignored since it's determined at runtime).
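
The steps above can be sketched for a single 32-bit CPUID register as follows. This is a minimal illustration under stated assumptions, not the code of this patch: TdCpuidRule and td_supported_bits() are hypothetical names, the native and configurable values are passed in as plain arguments, the "not a FeatureWord" early return is left to the caller, and the final .supported filtering is omitted because the message does not describe it in enough detail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced form of one lookup-table entry. */
typedef struct TdCpuidRule {
    uint32_t fixed0;            /* bits fixed to 0 */
    uint32_t fixed1;            /* bits fixed to 1 */
    uint32_t vmm_fixup;         /* configurable, but need VMM emulation */
    bool inducing_ve;           /* leaf causes #VE in the TD */
    uint32_t supported_on_ve;   /* only meaningful when inducing_ve */
} TdCpuidRule;

static uint32_t td_supported_bits(const TdCpuidRule *rule,
                                  uint32_t vmm_supported,
                                  uint32_t native,
                                  uint32_t configurable)
{
    uint32_t ret = vmm_supported;            /* base: what the VMM supports */

    if (rule->inducing_ve) {
        return ret & rule->supported_on_ve;  /* fully VMM-controlled leaf */
    }

    ret |= native;                           /* include all native bits */
    ret &= ~rule->fixed0;                    /* apply fixed0 ... */
    ret |= rule->fixed1;                     /* ... and fixed1 */
    ret |= configurable;                     /* add configurable bits */

    /* Drop bits that need VMM emulation but that the VMM did not report. */
    ret &= ~(rule->vmm_fixup & ~vmm_supported);

    return ret;
}
```

With fixed0 = 0x01, fixed1 = 0x02, vmm_fixup = 0x10, vmm_supported = 0x04, native = 0x09 and configurable = 0x30, the intermediate values are 0x0D, 0x0C, 0x0E and 0x3E, and the fixup step leaves 0x2E.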

Co-developed-by: Chenyi Qiang 
Signed-off-by: Chenyi Qiang 
Signed-off-by: Xiaoyao Li 
---
 target/i386/cpu.h |  16 +++
 target/i386/kvm/kvm.c |   4 +
 target/i386/kvm/tdx.c | 255 ++
 target/i386/kvm/tdx.h |   2 +
 4 files changed, 277 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 82004b65b944..cc9da9fc4318 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -771,6 +771,8 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 
 /* Support RDFSBASE/RDGSBASE/WRFSBASE/WRGSBASE */
 #define CPUID_7_0_EBX_FSGSBASE  (1U << 0)
+/* Support for TSC adjustment MSR 0x3B */
+#define CPUID_7_0_EBX_TSC_ADJUST(1U << 1)
 /* Support SGX */
 #define CPUID_7_0_EBX_SGX   (1U << 2)
 /* 1st Group of Advanced Bit Manipulation Extensions */
@@ -789,8 +791,12 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_7_0_EBX_INVPCID   (1U << 10)
 /* Restricted Transactional Memory */
 #define CPUID_7_0_EBX_RTM   (1U << 11)
+/* Cache QoS Monitoring */
+#define CPUID_7_0_EBX_PQM   (1U << 12)
 /* Memory Protection Extension */
 #define CPUID_7_0_EBX_MPX   (1U << 14)
+/* Resource Director Technology Allocation */
+#define CPUID_7_0_EBX_RDT_A (1U << 15)
 /* AVX-512 Foundation */
 #define CPUID_7_0_EBX_AVX512F   (1U << 16)
 /* AVX-512 Doubleword & Quadword Instruction */
@@ -846,10 +852,16 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_7

[PATCH v1 11/40] i386/tdx: Integrate tdx_caps->attrs_fixed0/1 to tdx_cpuid_lookup

2022-08-02 Thread Xiaoyao Li

Some bits in TD attributes have corresponding CPUID feature bits. Reflect
the fixed0/1 restriction on TD attributes to their corresponding CPUID
bits in tdx_cpuid_lookup[] as well.

Signed-off-by: Xiaoyao Li 
---
 target/i386/cpu-internal.h |  9 +
 target/i386/cpu.c  |  9 -
 target/i386/cpu.h  |  2 ++
 target/i386/kvm/tdx.c  | 21 +
 4 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/target/i386/cpu-internal.h b/target/i386/cpu-internal.h
index 9baac5c0b450..e980f6e3147f 100644
--- a/target/i386/cpu-internal.h
+++ b/target/i386/cpu-internal.h
@@ -20,6 +20,15 @@
 #ifndef I386_CPU_INTERNAL_H
 #define I386_CPU_INTERNAL_H
 
+typedef struct FeatureMask {
+FeatureWord index;
+uint64_t mask;
+} FeatureMask;
+
+typedef struct FeatureDep {
+FeatureMask from, to;
+} FeatureDep;
+
 typedef enum FeatureWordType {
CPUID_FEATURE_WORD,
MSR_FEATURE_WORD,
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 45652bb2fd7c..e5c1ffcb138a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1289,15 +1289,6 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 },
 };
 
-typedef struct FeatureMask {
-FeatureWord index;
-uint64_t mask;
-} FeatureMask;
-
-typedef struct FeatureDep {
-FeatureMask from, to;
-} FeatureDep;
-
 static FeatureDep feature_dependencies[] = {
 {
 .from = { FEAT_7_0_EDX, CPUID_7_0_EDX_ARCH_CAPABILITIES },
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 90f403aecd8b..8f4de62b02e9 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -867,6 +867,8 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define CPUID_7_0_ECX_MAWAU (31U << 17)
 /* Read Processor ID */
 #define CPUID_7_0_ECX_RDPID (1U << 22)
+/* KeyLocker */
+#define CPUID_7_0_ECX_KeyLocker (1U << 23)
 /* Bus Lock Debug Exception */
 #define CPUID_7_0_ECX_BUS_LOCK_DETECT   (1U << 24)
 /* Cache Line Demote Instruction */
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index dffaa533f899..6fe47cf4e29e 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -30,6 +30,13 @@
  (1U << KVM_FEATURE_PV_SCHED_YIELD) | \
  (1U << KVM_FEATURE_MSI_EXT_DEST_ID))
 
+#define TDX_ATTRIBUTES_MAX_BITS  64
+
+static FeatureMask tdx_attrs_ctrl_fields[TDX_ATTRIBUTES_MAX_BITS] = {
+[30] = { .index = FEAT_7_0_ECX, .mask = CPUID_7_0_ECX_PKS },
+[31] = { .index = FEAT_7_0_ECX, .mask = CPUID_7_0_ECX_KeyLocker},
+};
+
 typedef struct KvmTdxCpuidLookup {
 uint32_t tdx_fixed0;
 uint32_t tdx_fixed1;
@@ -375,6 +382,8 @@ static void update_tdx_cpuid_lookup_by_tdx_caps(void)
 FeatureWordInfo *fi;
 uint32_t config;
 FeatureWord w;
+FeatureMask *fm;
+int i;
 
 /*
  * Patch tdx_fixed0/1 by tdx_caps that what TDX module reports as
@@ -396,6 +405,18 @@ static void update_tdx_cpuid_lookup_by_tdx_caps(void)
 entry->tdx_fixed1 &= ~config;
 }
 
+for (i = 0; i < ARRAY_SIZE(tdx_attrs_ctrl_fields); i++) {
+fm = &tdx_attrs_ctrl_fields[i];
+
+if (tdx_caps->attrs_fixed0 & (1ULL << i)) {
+tdx_cpuid_lookup[fm->index].tdx_fixed0 |= fm->mask;
+}
+
+if (tdx_caps->attrs_fixed1 & (1ULL << i)) {
+tdx_cpuid_lookup[fm->index].tdx_fixed1 |= fm->mask;
+}
+}
+
 /*
  * Because KVM gets XFAM settings via CPUID leaves 0xD,  map
  * tdx_caps->xfam_fixed{0, 1} into tdx_cpuid_lookup[].tdx_fixed{0, 1}.
-- 
2.27.0

[PATCH v1 13/40] KVM: Introduce kvm_arch_pre_create_vcpu()

2022-08-02 Thread Xiaoyao Li

Introduce kvm_arch_pre_create_vcpu() to perform arch-dependent work
before any vcpu is created. i386 TDX needs this because it must call
TDX_INIT_VM before creating any vcpu.
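
The hook pattern described above can be sketched in isolation as follows.
This is a minimal sketch, not QEMU code: demo_arch_pre_create_vcpu(),
demo_init_vcpu() and demo_vcpu_count() are hypothetical names, and the weak
attribute assumes a GCC- or Clang-compatible toolchain.

```c
/* Number of vCPUs "created" so far; stands in for the real creation step. */
static int demo_created_vcpus;

/*
 * Hypothetical default hook. It is declared weak so that an
 * architecture-specific file can provide a strong definition that
 * replaces it at link time.
 */
__attribute__((weak)) int demo_arch_pre_create_vcpu(int vcpu_id)
{
    (void)vcpu_id;
    return 0;
}

/* Run the hook first; only create the vCPU if the hook succeeded. */
int demo_init_vcpu(int vcpu_id)
{
    int ret = demo_arch_pre_create_vcpu(vcpu_id);

    if (ret < 0) {
        return ret; /* propagate the negative errno; nothing was created */
    }
    demo_created_vcpus++;
    return 0;
}

int demo_vcpu_count(void)
{
    return demo_created_vcpus;
}
```

With only the weak default linked in, every call succeeds; a strong
override that returns a negative errno stops vCPU creation.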

Signed-off-by: Xiaoyao Li 
---
 accel/kvm/kvm-all.c  | 12 
 include/sysemu/kvm.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 46e609570ce1..c26d602f5476 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -460,6 +460,11 @@ static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
 return kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)vcpu_id);
 }
 
+int __attribute__ ((weak)) kvm_arch_pre_create_vcpu(CPUState *cpu)
+{
+return 0;
+}
+
 int kvm_init_vcpu(CPUState *cpu, Error **errp)
 {
 KVMState *s = kvm_state;
@@ -468,6 +473,13 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 
 trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
 
+ret = kvm_arch_pre_create_vcpu(cpu);
+if (ret < 0) {
+error_setg_errno(errp, -ret,
+ "kvm_init_vcpu: kvm_arch_pre_create_vcpu() failed");
+goto err;
+}
+
 ret = kvm_get_vcpu(s, kvm_arch_vcpu_id(cpu));
 if (ret < 0) {
 error_setg_errno(errp, -ret, "kvm_init_vcpu: kvm_get_vcpu failed 
(%lu)",
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index efd6dee818f2..e3159e1e711d 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -373,6 +373,7 @@ int kvm_arch_put_registers(CPUState *cpu, int level);
 
 int kvm_arch_init(MachineState *ms, KVMState *s);
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu);
 int kvm_arch_init_vcpu(CPUState *cpu);
 int kvm_arch_destroy_vcpu(CPUState *cpu);
 
-- 
2.27.0

[PATCH v1 14/40] i386/tdx: Initialize TDX before creating TD vcpus

2022-08-02 Thread Xiaoyao Li

Invoke KVM_TDX_INIT in kvm_arch_pre_create_vcpu(). KVM_TDX_INIT
configures global TD state, e.g. the canonical CPUID config, and must
be executed prior to creating vCPUs.
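
Because the hook runs once per vCPU while the VM-wide initialization must
run only once, the patch guards it with an "initialized" flag. A minimal
sketch of that guard, with hypothetical names (struct demo_vm,
demo_vm_wide_init(), demo_pre_create_vcpu()) and with the mutex that the
patch takes around the check left out for brevity:

```c
#include <stdbool.h>

/* Hypothetical per-VM state; the patch keeps the equivalent in TdxGuest. */
struct demo_vm {
    bool initialized;
    int init_calls; /* how many times the VM-wide init really ran */
};

/* Stand-in for the VM-scope ioctl; it always succeeds in this sketch. */
static int demo_vm_wide_init(struct demo_vm *vm)
{
    vm->init_calls++;
    return 0;
}

/* Called once per vCPU, but performs the VM-wide init only the first time. */
int demo_pre_create_vcpu(struct demo_vm *vm)
{
    int r;

    if (vm->initialized) {
        return 0;
    }

    r = demo_vm_wide_init(vm);
    if (r < 0) {
        return r; /* leave the flag clear so a later call can retry */
    }

    vm->initialized = true;
    return 0;
}
```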

Use kvm_x86_arch_cpuid() to set up the CPUID settings for the TDX VM and
tie x86cpu->enable_pmu to the TD's attributes.

Note, this doesn't address the fact that QEMU may change the CPUID
configuration when creating vCPUs, i.e. punts on refactoring QEMU to
provide a stable CPUID config prior to kvm_arch_init().

Signed-off-by: Xiaoyao Li 
---
 accel/kvm/kvm-all.c|  9 -
 target/i386/kvm/kvm.c  |  8 
 target/i386/kvm/tdx-stub.c |  5 +
 target/i386/kvm/tdx.c  | 34 ++
 target/i386/kvm/tdx.h  |  4 
 5 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index c26d602f5476..c1348c380680 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -473,10 +473,17 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 
 trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
 
+/*
+ * tdx_pre_create_vcpu() may call cpu_x86_cpuid(). It in turn may call
+ * kvm_vm_ioctl(). Set cpu->kvm_state in advance to avoid NULL pointer
+ * dereference.
+ */
+cpu->kvm_state = s;
 ret = kvm_arch_pre_create_vcpu(cpu);
 if (ret < 0) {
 error_setg_errno(errp, -ret,
  "kvm_init_vcpu: kvm_arch_pre_create_vcpu() failed");
+cpu->kvm_state = NULL;
 goto err;
 }
 
@@ -484,11 +491,11 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 if (ret < 0) {
 error_setg_errno(errp, -ret, "kvm_init_vcpu: kvm_get_vcpu failed 
(%lu)",
  kvm_arch_vcpu_id(cpu));
+cpu->kvm_state = NULL;
 goto err;
 }
 
 cpu->kvm_fd = ret;
-cpu->kvm_state = s;
 cpu->vcpu_dirty = true;
 cpu->dirty_pages = 0;
 cpu->throttle_us_per_full = 0;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 9c0d5be5cc23..4f491f871f3e 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2197,6 +2197,14 @@ int kvm_arch_init_vcpu(CPUState *cs)
 return r;
 }
 
+int kvm_arch_pre_create_vcpu(CPUState *cpu)
+{
+if (is_tdx_vm())
+return tdx_pre_create_vcpu(cpu);
+
+return 0;
+}
+
 int kvm_arch_destroy_vcpu(CPUState *cs)
 {
 X86CPU *cpu = X86_CPU(cs);
diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
index 1df24735201e..2871de9d7b56 100644
--- a/target/i386/kvm/tdx-stub.c
+++ b/target/i386/kvm/tdx-stub.c
@@ -7,3 +7,8 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
 {
 return -EINVAL;
 }
+
+int tdx_pre_create_vcpu(CPUState *cpu)
+{
+return -EINVAL;
+}
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 6fe47cf4e29e..ecb0205651bd 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -458,6 +458,38 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
 return 0;
 }
 
+int tdx_pre_create_vcpu(CPUState *cpu)
+{
+MachineState *ms = MACHINE(qdev_get_machine());
+X86CPU *x86cpu = X86_CPU(cpu);
+CPUX86State *env = &x86cpu->env;
+struct kvm_tdx_init_vm init_vm;
+int r = 0;
+
+qemu_mutex_lock(&tdx_guest->lock);
+if (tdx_guest->initialized) {
+goto out;
+}
+
+memset(&init_vm, 0, sizeof(init_vm));
+init_vm.cpuid.nent = kvm_x86_arch_cpuid(env, init_vm.entries, 0);
+
+init_vm.attributes = tdx_guest->attributes;
+init_vm.max_vcpus = ms->smp.cpus;
+
+r = tdx_vm_ioctl(KVM_TDX_INIT_VM, 0, &init_vm);
+if (r < 0) {
+error_report("KVM_TDX_INIT_VM failed %s", strerror(-r));
+goto out;
+}
+
+tdx_guest->initialized = true;
+
+out:
+qemu_mutex_unlock(&tdx_guest->lock);
+return r;
+}
+
 /* tdx guest */
 OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
tdx_guest,
@@ -470,6 +502,8 @@ static void tdx_guest_init(Object *obj)
 {
 TdxGuest *tdx = TDX_GUEST(obj);
 
+qemu_mutex_init(&tdx->lock);
+
 tdx->attributes = 0;
 }
 
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 06599b65b827..46a24ee8c7cc 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -17,6 +17,9 @@ typedef struct TdxGuestClass {
 typedef struct TdxGuest {
 ConfidentialGuestSupport parent_obj;
 
+QemuMutex lock;
+
+bool initialized;
 uint64_t attributes;/* TD attributes */
 } TdxGuest;
 
@@ -29,5 +32,6 @@ bool is_tdx_vm(void);
 int tdx_kvm_init(MachineState *ms, Error **errp);
 void tdx_get_supported_cpuid(uint32_t function, uint32_t index, int reg,
  uint32_t *ret);
+int tdx_pre_create_vcpu(CPUState *cpu);
 
 #endif /* QEMU_I386_TDX_H */
-- 
2.27.0

[PATCH v1 29/40] i386/tdx: Call KVM_TDX_INIT_VCPU to initialize TDX vcpu

2022-08-02 Thread Xiaoyao Li

A TDX vcpu needs to be initialized by SEAMCALL(TDH.VP.INIT), and KVM
provides the vcpu-level ioctl KVM_TDX_INIT_VCPU for it.

KVM_TDX_INIT_VCPU needs the address of the HOB as input. Invoke it for
each vcpu after the HOB list is created.

Signed-off-by: Xiaoyao Li 
Acked-by: Gerd Hoffmann 
---
 target/i386/kvm/tdx.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index d0bbe06f5504..2dbe26f2e950 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -571,6 +571,22 @@ static void tdx_init_ram_entries(void)
 tdx_guest->nr_ram_entries = j;
 }
 
+static void tdx_post_init_vcpus(void)
+{
+TdxFirmwareEntry *hob;
+CPUState *cpu;
+int r;
+
+hob = tdx_get_hob_entry(tdx_guest);
+CPU_FOREACH(cpu) {
+r = tdx_vcpu_ioctl(cpu, KVM_TDX_INIT_VCPU, 0, (void *)hob->address);
+if (r < 0) {
+error_report("KVM_TDX_INIT_VCPU failed %s", strerror(-r));
+exit(1);
+}
+}
+}
+
 static void tdx_finalize_vm(Notifier *notifier, void *unused)
 {
 TdxFirmware *tdvf = &tdx_guest->tdvf;
@@ -602,6 +618,8 @@ static void tdx_finalize_vm(Notifier *notifier, void 
*unused)
 
 tdvf_hob_create(tdx_guest, tdx_get_hob_entry(tdx_guest));
 
+tdx_post_init_vcpus();
+
 for_each_tdx_fw_entry(tdvf, entry) {
 struct kvm_tdx_init_mem_region mem_region = {
 .source_addr = (__u64)entry->mem_ptr,
-- 
2.27.0

[PATCH v1 20/40] i386/tdvf: Introduce function to parse TDVF metadata

2022-08-02 Thread Xiaoyao Li

From: Isaku Yamahata 

A TDX VM needs to boot with its specialized firmware, Trust Domain
Virtual Firmware (TDVF). QEMU needs to parse TDVF and map it into TD
guest memory prior to running the TDX VM.

The TDVF metadata in the TDVF image describes the structure of the
firmware. QEMU refers to it to set up memory for TDVF. Introduce
tdvf_parse_metadata() to parse the metadata from the TDVF image and
store the info of each TDVF section.

TDX metadata is located via a TDX metadata offset block, which is a
GUID-ed structure. The data portion of the GUID structure contains
only a 4-byte field: the offset of the TDX metadata from the end
of the firmware file.
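
The offset arithmetic can be sketched as below. This is only an
illustration under stated assumptions: demo_le32() and
demo_metadata_offset() are hypothetical helpers, the 4-byte field is taken
to be little-endian (as the le32_to_cpu() call in the patch suggests), and
header_size stands for the size of the fixed metadata header.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit little-endian value regardless of host byte order. */
static uint32_t demo_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Convert the "distance from the end of the file" field into an offset
 * from the start of the image. Returns 0 on success, or -1 if the field
 * points outside the image or the header would not fit before the end.
 */
int demo_metadata_offset(size_t image_size, const uint8_t field[4],
                         size_t header_size, size_t *offset)
{
    uint32_t from_end = demo_le32(field);

    if (from_end > image_size || header_size > from_end) {
        return -1;
    }

    *offset = image_size - from_end;
    return 0;
}
```

Checking the field against the image size before subtracting avoids the
unsigned wrap-around that a plain "size - offset" would allow.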

Select X86_FW_OVMF when TDX is enabled to leverage existing functions
to parse and search OVMF's GUID-ed structures.

Signed-off-by: Isaku Yamahata 
Co-developed-by: Xiaoyao Li 
Signed-off-by: Xiaoyao Li 

---
Changes from RFC v4:
 - rename tdvf_parse_section_entry() to
   tdvf_parse_and_check_section_entry()
Changes in v4:
 - rename TDX_METADATA_GUID to TDX_METADATA_OFFSET_GUID
---
 hw/i386/Kconfig|   1 +
 hw/i386/meson.build|   1 +
 hw/i386/tdvf.c | 197 +
 include/hw/i386/tdvf.h |  51 +++
 4 files changed, 250 insertions(+)
 create mode 100644 hw/i386/tdvf.c
 create mode 100644 include/hw/i386/tdvf.h

diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index 9e40ff79fc2d..0c3e3a464012 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -12,6 +12,7 @@ config SGX
 
 config TDX
 bool
+select X86_FW_OVMF
 depends on KVM
 
 config PC
diff --git a/hw/i386/meson.build b/hw/i386/meson.build
index 213e2e82b3d7..97f3b50503b0 100644
--- a/hw/i386/meson.build
+++ b/hw/i386/meson.build
@@ -28,6 +28,7 @@ i386_ss.add(when: 'CONFIG_PC', if_true: files(
   'port92.c'))
 i386_ss.add(when: 'CONFIG_X86_FW_OVMF', if_true: files('pc_sysfw_ovmf.c'),
 if_false: 
files('pc_sysfw_ovmf-stubs.c'))
+i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdvf.c'))
 
 subdir('kvm')
 subdir('xen')
diff --git a/hw/i386/tdvf.c b/hw/i386/tdvf.c
new file mode 100644
index 000000000000..a40198f9407a
--- /dev/null
+++ b/hw/i386/tdvf.c
@@ -0,0 +1,197 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+
+ * Copyright (c) 2020 Intel Corporation
+ * Author: Isaku Yamahata 
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/i386/pc.h"
+#include "hw/i386/tdvf.h"
+#include "sysemu/kvm.h"
+
+#define TDX_METADATA_OFFSET_GUID "e47a6535-984a-4798-865e-4685a7bf8ec2"
+#define TDX_METADATA_VERSION 1
+#define TDVF_SIGNATURE 0x46564454 /* TDVF as little endian */
+
+typedef struct {
+uint32_t DataOffset;
+uint32_t RawDataSize;
+uint64_t MemoryAddress;
+uint64_t MemoryDataSize;
+uint32_t Type;
+uint32_t Attributes;
+} TdvfSectionEntry;
+
+typedef struct {
+uint32_t Signature;
+uint32_t Length;
+uint32_t Version;
+uint32_t NumberOfSectionEntries;
+TdvfSectionEntry SectionEntries[];
+} TdvfMetadata;
+
+struct tdx_metadata_offset {
+uint32_t offset;
+};
+
+static TdvfMetadata *tdvf_get_metadata(void *flash_ptr, int size)
+{
+TdvfMetadata *metadata;
+uint32_t offset = 0;
+uint8_t *data;
+
+if ((uint32_t) size != size) {
+return NULL;
+}
+
+if (pc_system_ovmf_table_find(TDX_METADATA_OFFSET_GUID, &data, NULL)) {
+offset = size - le32_to_cpu(((struct tdx_metadata_offset *)data)->offset);
+
+if (offset + sizeof(*metadata) > size) {
+return NULL;
+}
+} else {
+error_report("Cannot find TDX_METADATA_OFFSET_GUID");
+return NULL;
+}
+
+metadata = flash_ptr + offset;
+
+/* Finally, verify the signature to determine if this is a TDVF image. */
+metadata->Signature = le32_to_cpu(metadata->Signature);
+if (metadata->Signature != TDVF_SIGNATURE) {
+error_report("Invalid TDVF signature in metadata!");
+return NULL;
+}
+
+/* Sanity check that the TDVF doesn't overlap its own metadata. */
+metadata->Length = le32_to_cpu(metadata->Length);
+if (offset + metadata->Length > size) {
+return NULL;
+}
+
+/* Only version 1 is supported/defined. */
+metadata->Version = le32_to_cpu(metadata->Version);
+if (metadata->Version != TDX_METADATA_

[PATCH v1 15/40] i386/tdx: Add property sept-ve-disable for tdx-guest object

2022-08-02 Thread Xiaoyao Li

Bit 28 of the TD attributes, named SEPT_VE_DISABLE, disables conversion
of EPT violations to #VE on guest TD access to PENDING pages when set to
1. Some guest OSes (e.g., the Linux TD guest) may require this bit to be
set to 1 and refuse to boot otherwise.

Add a sept-ve-disable property to the tdx-guest object so that the user
can configure this bit.
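
The getter and setter amount to testing and updating a single bit in a
64-bit attributes word. A minimal standalone sketch, with hypothetical
names (DEMO_ATTR_SEPT_VE_DISABLE, demo_get_sept_ve_disable(),
demo_set_sept_ve_disable()) that mirror but are not the QEMU functions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 28 of the TD attributes, as stated in the commit message. */
#define DEMO_ATTR_SEPT_VE_DISABLE (UINT64_C(1) << 28)

/* Report whether the bit is currently set. */
bool demo_get_sept_ve_disable(uint64_t attributes)
{
    return (attributes & DEMO_ATTR_SEPT_VE_DISABLE) != 0;
}

/* Return a copy of attributes with the bit set or cleared. */
uint64_t demo_set_sept_ve_disable(uint64_t attributes, bool value)
{
    if (value) {
        return attributes | DEMO_ATTR_SEPT_VE_DISABLE;
    }
    return attributes & ~DEMO_ATTR_SEPT_VE_DISABLE;
}
```

Assuming the usual QEMU syntax for boolean object properties, the bit
would presumably be selected with something like
"-object tdx-guest,id=tdx0,sept-ve-disable=on".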

Signed-off-by: Xiaoyao Li 
---
 qapi/qom.json |  4 +++-
 target/i386/kvm/tdx.c | 24 
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/qapi/qom.json b/qapi/qom.json
index 38177848abc1..2a5486bfed3e 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -835,10 +835,12 @@
 #
 # Properties for tdx-guest objects.
 #
+# @sept-ve-disable: bit 28 of TD attributes (default: 0)
+#
 # Since: 7.2
 ##
 { 'struct': 'TdxGuestProperties',
-  'data': { }}
+  'data': { '*sept-ve-disable': 'bool' } }
 
 ##
 # @ObjectType:
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index ecb0205651bd..bf57f270ac9d 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -30,6 +30,8 @@
  (1U << KVM_FEATURE_PV_SCHED_YIELD) | \
  (1U << KVM_FEATURE_MSI_EXT_DEST_ID))
 
+#define TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE   BIT_ULL(28)
+
 #define TDX_ATTRIBUTES_MAX_BITS  64
 
 static FeatureMask tdx_attrs_ctrl_fields[TDX_ATTRIBUTES_MAX_BITS] = {
@@ -490,6 +492,24 @@ out:
 return r;
 }
 
+static bool tdx_guest_get_sept_ve_disable(Object *obj, Error **errp)
+{
+TdxGuest *tdx = TDX_GUEST(obj);
+
+return !!(tdx->attributes & TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE);
+}
+
+static void tdx_guest_set_sept_ve_disable(Object *obj, bool value, Error **errp)
+{
+TdxGuest *tdx = TDX_GUEST(obj);
+
+if (value) {
+tdx->attributes |= TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE;
+} else {
+tdx->attributes &= ~TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE;
+}
+}
+
 /* tdx guest */
 OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
tdx_guest,
@@ -505,6 +525,10 @@ static void tdx_guest_init(Object *obj)
 qemu_mutex_init(&tdx->lock);
 
 tdx->attributes = 0;
+
+object_property_add_bool(obj, "sept-ve-disable",
+ tdx_guest_get_sept_ve_disable,
+ tdx_guest_set_sept_ve_disable);
 }
 
 static void tdx_guest_finalize(Object *obj)
-- 
2.27.0

[PATCH v1 39/40] i386/tdx: Don't get/put guest state for TDX VMs

2022-08-02 Thread Xiaoyao Li

From: Sean Christopherson 

Don't get/put state of TDX VMs since accessing/mutating guest state of
production TDs is not supported.

Note, it will be allowed for a debug TD. Corresponding support will be
introduced when debug TD support is implemented in the future.

Signed-off-by: Sean Christopherson 
Signed-off-by: Xiaoyao Li 
---
 target/i386/kvm/kvm.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 948c87ebdb97..95afbbac7116 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4584,6 +4584,11 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
 
 assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
 
+/* TODO: Allow accessing guest state for debug TDs. */
+if (is_tdx_vm()) {
+return 0;
+}
+
 /* must be before kvm_put_nested_state so that EFER.SVME is set */
 ret = has_sregs2 ? kvm_put_sregs2(x86_cpu) : kvm_put_sregs(x86_cpu);
 if (ret < 0) {
@@ -4678,6 +4683,12 @@ int kvm_arch_get_registers(CPUState *cs)
 if (ret < 0) {
 goto out;
 }
+
+/* TODO: Allow accessing guest state for debug TDs. */
+if (is_tdx_vm()) {
+return 0;
+}
+
 ret = kvm_getput_regs(cpu, 0);
 if (ret < 0) {
 goto out;
-- 
2.27.0

[PATCH v1 16/40] i386/tdx: Wire CPU features up with attributes of TD guest

2022-08-02 Thread Xiaoyao Li

For QEMU VMs, PKS is configured via CPUID_7_0_ECX_PKS and PMU is
configured by x86cpu->enable_pmu. Reuse the existing configuration
interface for TDX VMs.
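
As an illustration of the mapping, the sketch below derives TD attribute
bits from two boolean inputs. The bit positions (30 for PKS, 63 for
PERFMON) come from the patch; the function name and the idea of passing
the features in as plain booleans are assumptions made to keep the example
standalone.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_ATTR_PKS     (UINT64_C(1) << 30)
#define DEMO_ATTR_PERFMON (UINT64_C(1) << 63)

/*
 * OR the feature-controlled bits into an existing attributes word.
 * Like the patch, this only ever sets bits; it never clears a bit that
 * was already set in the input.
 */
uint64_t demo_attrs_from_features(uint64_t attributes, bool pks, bool pmu)
{
    if (pks) {
        attributes |= DEMO_ATTR_PKS;
    }
    if (pmu) {
        attributes |= DEMO_ATTR_PERFMON;
    }
    return attributes;
}
```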

Signed-off-by: Xiaoyao Li 
---
 target/i386/kvm/tdx.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index bf57f270ac9d..f2372002077d 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -31,6 +31,8 @@
  (1U << KVM_FEATURE_MSI_EXT_DEST_ID))
 
 #define TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE   BIT_ULL(28)
+#define TDX_TD_ATTRIBUTES_PKS   BIT_ULL(30)
+#define TDX_TD_ATTRIBUTES_PERFMON   BIT_ULL(63)
 
 #define TDX_ATTRIBUTES_MAX_BITS  64
 
@@ -460,6 +462,15 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
 return 0;
 }
 
+static void setup_td_guest_attributes(X86CPU *x86cpu)
+{
+CPUX86State *env = &x86cpu->env;
+
+tdx_guest->attributes |= (env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_PKS) ?
+ TDX_TD_ATTRIBUTES_PKS : 0;
+tdx_guest->attributes |= x86cpu->enable_pmu ? TDX_TD_ATTRIBUTES_PERFMON : 0;
+}
+
 int tdx_pre_create_vcpu(CPUState *cpu)
 {
 MachineState *ms = MACHINE(qdev_get_machine());
@@ -473,6 +484,8 @@ int tdx_pre_create_vcpu(CPUState *cpu)
 goto out;
 }
 
+setup_td_guest_attributes(x86cpu);
+
 memset(&init_vm, 0, sizeof(init_vm));
 init_vm.cpuid.nent = kvm_x86_arch_cpuid(env, init_vm.entries, 0);
 
-- 
2.27.0

[PATCH v1 17/40] i386/tdx: Validate TD attributes

2022-08-02 Thread Xiaoyao Li

Validate TD attributes against tdx_caps: fixed-0 bits must be zero and
fixed-1 bits must be set.

Also reject attribute bits that QEMU does not support yet, e.g. the
debug bit, which will be allowed once debug TD support lands in QEMU.
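
The fixed-0/fixed-1 check can be written as one expression. The sketch
below assumes the convention implied by the patch's expression: a zero bit
in fixed0 means the attribute bit must be zero, and a one bit in fixed1
means the attribute bit must be one. demo_attrs_valid() is a hypothetical
name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Masking with fixed0 clears every bit that is required to be zero, and
 * OR-ing with fixed1 forces every bit that is required to be one. If the
 * result differs from the input, the input violated a constraint.
 */
bool demo_attrs_valid(uint64_t attrs, uint64_t fixed0, uint64_t fixed1)
{
    return ((attrs & fixed0) | fixed1) == attrs;
}
```

For example, with bit 5 required to be zero and bit 2 required to be one,
only values that have bit 2 set and bit 5 clear pass the check.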

Signed-off-by: Xiaoyao Li 
---
 target/i386/kvm/tdx.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index f2372002077d..42cef484c574 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -30,6 +30,7 @@
  (1U << KVM_FEATURE_PV_SCHED_YIELD) | \
  (1U << KVM_FEATURE_MSI_EXT_DEST_ID))
 
+#define TDX_TD_ATTRIBUTES_DEBUG BIT_ULL(0)
 #define TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE   BIT_ULL(28)
 #define TDX_TD_ATTRIBUTES_PKS   BIT_ULL(30)
 #define TDX_TD_ATTRIBUTES_PERFMON   BIT_ULL(63)
@@ -462,13 +463,32 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
 return 0;
 }
 
-static void setup_td_guest_attributes(X86CPU *x86cpu)
+static int tdx_validate_attributes(TdxGuest *tdx)
+{
+if (((tdx->attributes & tdx_caps->attrs_fixed0) | tdx_caps->attrs_fixed1) !=
+tdx->attributes) {
+error_report("Invalid attributes 0x%" PRIx64 " for TDX VM (fixed0 0x%llx, fixed1 0x%llx)",
+  tdx->attributes, tdx_caps->attrs_fixed0, tdx_caps->attrs_fixed1);
+return -EINVAL;
+}
+
+if (tdx->attributes & TDX_TD_ATTRIBUTES_DEBUG) {
+error_report("Current QEMU doesn't support attributes.debug[bit 0] for 
TDX VM");
+return -EINVAL;
+}
+
+return 0;
+}
+
+static int setup_td_guest_attributes(X86CPU *x86cpu)
 {
 CPUX86State *env = &x86cpu->env;
 
 tdx_guest->attributes |= (env->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_PKS) 
?
  TDX_TD_ATTRIBUTES_PKS : 0;
 tdx_guest->attributes |= x86cpu->enable_pmu ? TDX_TD_ATTRIBUTES_PERFMON : 
0;
+
+return tdx_validate_attributes(tdx_guest);
 }
 
 int tdx_pre_create_vcpu(CPUState *cpu)
@@ -484,7 +504,10 @@ int tdx_pre_create_vcpu(CPUState *cpu)
 goto out;
 }
 
-setup_td_guest_attributes(x86cpu);
+r = setup_td_guest_attributes(x86cpu);
+if (r) {
+goto out;
+}
 
 memset(&init_vm, 0, sizeof(init_vm));
 init_vm.cpuid.nent = kvm_x86_arch_cpuid(env, init_vm.entries, 0);
-- 
2.27.0

[PATCH v1 27/40] i386/tdx: Setup the TD HOB list

2022-08-02 Thread Xiaoyao Li

The TD HOB list is used to pass information from the VMM to TDVF. The TD
HOB list must include a PHIT HOB and a Resource Descriptor HOB. More
details can be found in the TDVF specification and the PI specification.

Build the TD HOB in TDX's machine_init_done callback.

Co-developed-by: Isaku Yamahata 
Signed-off-by: Isaku Yamahata 
Co-developed-by: Sean Christopherson 
Signed-off-by: Sean Christopherson 
Signed-off-by: Xiaoyao Li 

---
Changes from RFC v4:
  - drop the code that adds MMIO resources, since OVMF prepares all the
    MMIO HOBs itself.
---
 hw/i386/meson.build   |   2 +-
 hw/i386/tdvf-hob.c| 146 ++
 hw/i386/tdvf-hob.h|  24 +++
 target/i386/kvm/tdx.c |  16 +
 4 files changed, 187 insertions(+), 1 deletion(-)
 create mode 100644 hw/i386/tdvf-hob.c
 create mode 100644 hw/i386/tdvf-hob.h

diff --git a/hw/i386/meson.build b/hw/i386/meson.build
index 97f3b50503b0..b59e0d35bba3 100644
--- a/hw/i386/meson.build
+++ b/hw/i386/meson.build
@@ -28,7 +28,7 @@ i386_ss.add(when: 'CONFIG_PC', if_true: files(
   'port92.c'))
 i386_ss.add(when: 'CONFIG_X86_FW_OVMF', if_true: files('pc_sysfw_ovmf.c'),
 if_false: 
files('pc_sysfw_ovmf-stubs.c'))
-i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdvf.c'))
+i386_ss.add(when: 'CONFIG_TDX', if_true: files('tdvf.c', 'tdvf-hob.c'))
 
 subdir('kvm')
 subdir('xen')
diff --git a/hw/i386/tdvf-hob.c b/hw/i386/tdvf-hob.c
new file mode 100644
index 000000000000..bdf3b4823340
--- /dev/null
+++ b/hw/i386/tdvf-hob.c
@@ -0,0 +1,146 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+
+ * Copyright (c) 2020 Intel Corporation
+ * Author: Isaku Yamahata 
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "e820_memory_layout.h"
+#include "hw/i386/pc.h"
+#include "hw/i386/x86.h"
+#include "hw/pci/pcie_host.h"
+#include "sysemu/kvm.h"
+#include "standard-headers/uefi/uefi.h"
+#include "tdvf-hob.h"
+
+typedef struct TdvfHob {
+hwaddr hob_addr;
+void *ptr;
+int size;
+
+/* working area */
+void *current;
+void *end;
+} TdvfHob;
+
+static uint64_t tdvf_current_guest_addr(const TdvfHob *hob)
+{
+return hob->hob_addr + (hob->current - hob->ptr);
+}
+
+static void tdvf_align(TdvfHob *hob, size_t align)
+{
+hob->current = QEMU_ALIGN_PTR_UP(hob->current, align);
+}
+
+static void *tdvf_get_area(TdvfHob *hob, uint64_t size)
+{
+void *ret;
+
+if (hob->current + size > hob->end) {
+error_report("TD_HOB overrun, size = 0x%" PRIx64, size);
+exit(1);
+}
+
+ret = hob->current;
+hob->current += size;
+tdvf_align(hob, 8);
+return ret;
+}
+
+static void tdvf_hob_add_memory_resources(TdxGuest *tdx, TdvfHob *hob)
+{
+EFI_HOB_RESOURCE_DESCRIPTOR *region;
+EFI_RESOURCE_ATTRIBUTE_TYPE attr;
+EFI_RESOURCE_TYPE resource_type;
+
+TdxRamEntry *e;
+int i;
+
+for (i = 0; i < tdx->nr_ram_entries; i++) {
+e = &tdx->ram_entries[i];
+
+if (e->type == TDX_RAM_UNACCEPTED) {
+resource_type = EFI_RESOURCE_MEMORY_UNACCEPTED;
+attr = EFI_RESOURCE_ATTRIBUTE_TDVF_UNACCEPTED;
+} else if (e->type == TDX_RAM_ADDED){
+resource_type = EFI_RESOURCE_SYSTEM_MEMORY;
+attr = EFI_RESOURCE_ATTRIBUTE_TDVF_PRIVATE;
+} else {
+error_report("unknown TDX_RAM_ENTRY type %d", e->type);
+exit(1);
+}
+
+region = tdvf_get_area(hob, sizeof(*region));
+*region = (EFI_HOB_RESOURCE_DESCRIPTOR) {
+.Header = {
+.HobType = EFI_HOB_TYPE_RESOURCE_DESCRIPTOR,
+.HobLength = cpu_to_le16(sizeof(*region)),
+.Reserved = cpu_to_le32(0),
+},
+.Owner = EFI_HOB_OWNER_ZERO,
+.ResourceType = cpu_to_le32(resource_type),
+.ResourceAttribute = cpu_to_le32(attr),
+.PhysicalStart = cpu_to_le64(e->address),
+.ResourceLength = cpu_to_le64(e->length),
+};
+}
+}
+
+void tdvf_hob_create(TdxGuest *tdx, TdxFirmwareEntry *td_hob)
+{
+TdvfHob hob = {
+.hob_addr = td_hob->address,
+.size = td_hob->size,
+.ptr = td_hob->mem_ptr,
+
+.current = td_hob->mem_ptr,
+.end = td_hob->mem_ptr + td_hob->size,
+};
+
+

[PATCH v1 18/40] i386/tdx: Implement user specified tsc frequency

2022-08-02 Thread Xiaoyao Li

Reuse "-cpu,tsc-frequency=" to get the TSC frequency requested by the
user, and call the VM-scope KVM_SET_TSC_KHZ to set the TSC frequency of
the TD before KVM_TDX_INIT_VM.

Also check that the TSC frequency is within the legal range and has the
legal granularity (required by the TDX module).
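
The range and granularity rules can be sketched as a small predicate. The
limits (100 MHz to 10 GHz, in steps of 25 MHz, all expressed in kHz) are
the ones used in the patch; demo_tsc_khz_valid() and the DEMO_* macros are
hypothetical names. A value of 0 is treated as "not specified", in which
case, according to the patch, KVM uses the host's TSC frequency.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MIN_TSC_KHZ  (100 * 1000)
#define DEMO_MAX_TSC_KHZ  (10 * 1000 * 1000)
#define DEMO_TSC_STEP_KHZ (25 * 1000)

/* Return true if tsc_khz is 0 (unset) or a legal TD TSC frequency. */
bool demo_tsc_khz_valid(int64_t tsc_khz)
{
    if (tsc_khz == 0) {
        return true;
    }
    if (tsc_khz < DEMO_MIN_TSC_KHZ || tsc_khz > DEMO_MAX_TSC_KHZ) {
        return false;
    }
    return tsc_khz % DEMO_TSC_STEP_KHZ == 0;
}
```

For instance, 2500000 kHz (2.5 GHz) is accepted, while 2510000 kHz is
rejected because it is not a multiple of 25 MHz.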

Signed-off-by: Xiaoyao Li 
---
Changes from RFC v4:
  - Use the VM-scope KVM_SET_TSC_KHZ to set the TSC frequency of the TD,
    since the KVM side dropped the @tsc_khz field in struct kvm_tdx_init_vm
---
 target/i386/kvm/kvm.c |  9 +
 target/i386/kvm/tdx.c | 24 
 2 files changed, 33 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 4f491f871f3e..1545b6f870f5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -812,6 +812,15 @@ static int kvm_arch_set_tsc_khz(CPUState *cs)
 int r, cur_freq;
 bool set_ioctl = false;
 
+/*
+ * The TSC of a TD vcpu is immutable. It cannot be set or changed via the
+ * vcpu-scope KVM_SET_TSC_KHZ; it can only be initialized via the VM-scope
+ * KVM_SET_TSC_KHZ before KVM_TDX_INIT_VM in tdx_pre_create_vcpu().
+ */
+if (is_tdx_vm()) {
+return 0;
+}
+
 if (!env->tsc_khz) {
 return 0;
 }
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 42cef484c574..0162d7cc9df4 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -30,6 +30,9 @@
  (1U << KVM_FEATURE_PV_SCHED_YIELD) | \
  (1U << KVM_FEATURE_MSI_EXT_DEST_ID))
 
+#define TDX_MIN_TSC_FREQUENCY_KHZ   (100 * 1000)
+#define TDX_MAX_TSC_FREQUENCY_KHZ   (10 * 1000 * 1000)
+
 #define TDX_TD_ATTRIBUTES_DEBUG BIT_ULL(0)
 #define TDX_TD_ATTRIBUTES_SEPT_VE_DISABLE   BIT_ULL(28)
 #define TDX_TD_ATTRIBUTES_PKS   BIT_ULL(30)
@@ -504,6 +507,27 @@ int tdx_pre_create_vcpu(CPUState *cpu)
 goto out;
 }
 
+r = -EINVAL;
+if (env->tsc_khz && (env->tsc_khz < TDX_MIN_TSC_FREQUENCY_KHZ ||
+ env->tsc_khz > TDX_MAX_TSC_FREQUENCY_KHZ)) {
+error_report("Invalid TSC %" PRId64 " kHz, must specify tsc-frequency between [%d, %d] kHz",
+  env->tsc_khz, TDX_MIN_TSC_FREQUENCY_KHZ,
+  TDX_MAX_TSC_FREQUENCY_KHZ);
+goto out;
+}
+
+if (env->tsc_khz % (25 * 1000)) {
+error_report("Invalid TSC %" PRId64 " kHz, it must be a multiple of 25 MHz", env->tsc_khz);
+goto out;
+}
+
+/* It's safe even if env->tsc_khz is 0: KVM uses the host's tsc_khz in this case. */
+r = kvm_vm_ioctl(kvm_state, KVM_SET_TSC_KHZ, env->tsc_khz);
+if (r < 0) {
+error_report("Unable to set TSC frequency to %" PRId64 " kHz", 
env->tsc_khz);
+goto out;
+}
+
 r = setup_td_guest_attributes(x86cpu);
 if (r) {
 goto out;
-- 
2.27.0

[PATCH v1 40/40] docs: Add TDX documentation

2022-08-02 Thread Xiaoyao Li

Add docs/system/i386/tdx.rst for TDX support, and add tdx in
confidential-guest-support.rst

Signed-off-by: Xiaoyao Li 

---
changes in v5:
 - add the restriction that kernel-irqchip must be split
---
 docs/system/confidential-guest-support.rst |   1 +
 docs/system/i386/tdx.rst   | 105 +
 docs/system/target-i386.rst|   1 +
 3 files changed, 107 insertions(+)
 create mode 100644 docs/system/i386/tdx.rst

diff --git a/docs/system/confidential-guest-support.rst b/docs/system/confidential-guest-support.rst
index 0c490dbda2b7..66129fbab64c 100644
--- a/docs/system/confidential-guest-support.rst
+++ b/docs/system/confidential-guest-support.rst
@@ -38,6 +38,7 @@ Supported mechanisms
 Currently supported confidential guest mechanisms are:
 
 * AMD Secure Encrypted Virtualization (SEV) (see :doc:`i386/amd-memory-encryption`)
+* Intel Trust Domain Extensions (TDX) (see :doc:`i386/tdx`)
 * POWER Protected Execution Facility (PEF) (see :ref:`power-papr-protected-execution-facility-pef`)
 * s390x Protected Virtualization (PV) (see :doc:`s390x/protvirt`)
 
diff --git a/docs/system/i386/tdx.rst b/docs/system/i386/tdx.rst
new file mode 100644
index ..1f95e742f75c
--- /dev/null
+++ b/docs/system/i386/tdx.rst
@@ -0,0 +1,105 @@
+Intel Trust Domain Extensions (TDX)
+
+
+Intel Trust Domain Extensions (TDX) refers to an Intel technology that extends
+Virtual Machine Extensions (VMX) and Multi-Key Total Memory Encryption (MKTME)
+with a new kind of virtual machine guest called a Trust Domain (TD). A TD runs
+in a CPU mode that is designed to protect the confidentiality of its memory
+contents and its CPU state from any other software, including the hosting
+Virtual Machine Monitor (VMM), unless explicitly shared by the TD itself.
+
+Prerequisites
+-------------
+
+To run a TD, the physical machine needs to have the TDX module loaded and
+initialized, and the KVM hypervisor needs to have TDX support enabled. If those
+requirements are met, ``KVM_CAP_VM_TYPES`` will report support for ``KVM_X86_TDX_VM``.
+
+Trust Domain Virtual Firmware (TDVF)
+
+
+Trust Domain Virtual Firmware (TDVF) is required to provide TD services to boot
+the TD guest OS. TDVF needs to be copied to guest private memory and measured before
+a TD boots.
+
+The VM scope ``MEMORY_ENCRYPT_OP`` ioctl provides command ``KVM_TDX_INIT_MEM_REGION``
+to copy the TDVF image to the TD's private memory space.
+
+Since TDX doesn't support read-only memslots, TDVF cannot be mapped as a pflash
+device; it actually works as RAM. The ``-bios`` option is used to load TDVF.
+
+OVMF is the open-source firmware that implements TDVF support. Thus the
+command line to specify and load TDVF is ``-bios OVMF.fd``.
+
+Feature Control
+---------------
+
+Unlike a non-TDX VM, the CPU features (enumerated by CPUID or MSR) of a TD are not
+under full control of the VMM. The VMM can only configure some features of a TD via
+the ``KVM_TDX_INIT_VM`` command of the VM scope ``MEMORY_ENCRYPT_OP`` ioctl.
+
+The configurable features have three types:
+
+- Attributes:
+  - PKS (bit 30) controls whether Supervisor Protection Keys is exposed to TD,
+  which determines related CPUID bit and CR4 bit;
+  - PERFMON (bit 63) controls whether PMU is exposed to TD.
+
+- XSAVE related features (XFAM):
+  XFAM is a 64-bit mask, which has the same format as XCR0 or the IA32_XSS MSR. It
+  determines the set of extended features available for use by the guest TD.
+
+- CPUID features:
+  Only some bits of some CPUID leaves are directly configurable by the VMM.
+
+Which features can be configured is reported via the TDX capabilities.
+
+TDX capabilities
+
+
+The VM scope ``MEMORY_ENCRYPT_OP`` ioctl provides command ``KVM_TDX_CAPABILITIES``
+to get the TDX capabilities from KVM. It returns a data structure of type
+``struct kvm_tdx_capabilities``, which describes the supported configuration of
+attributes, XFAM and CPUIDs.
+
+Launching a TD (TDX VM)
+-----------------------
+
+To launch a TDX guest:
+
+.. parsed-literal::
+
+|qemu_system_x86| \\
+-machine ...,kernel-irqchip=split,confidential-guest-support=tdx0 \\
+-object tdx-guest,id=tdx0 \\
+-bios OVMF.fd
+
+Debugging
+---------
+
+Bit 0 of the TD attributes is the DEBUG bit, which decides whether the TD runs in
+off-TD debug mode. When in off-TD debug mode, the TD's vCPU state and private memory
+are accessible via given SEAMCALLs. This requires KVM to expose APIs to invoke those
+SEAMCALLs, and corresponding QEMU changes.
+
+It's targeted as future work.
+
+Restrictions
+
+
+ - kernel-irqchip must be split;
+
+ - No readonly support for private memory;
+
+ - No SMM support: SMM support requires manipulating the guest register states,
+   which is not allowed;
+
+Live Migration
+--------------
+
+TODO
+
+References
+----------
+
+- `TDX Homepage

[PATCH v1 23/40] i386/tdx: Don't initialize pc.rom for TDX VMs

2022-08-02 Thread Xiaoyao Li

For TDX, the addresses below 1MB are entirely general RAM, so there is
no need to initialize the pc.rom memory region for TDs.

Signed-off-by: Xiaoyao Li 
---
This is more of a workaround for the issue that, for the q35 machine type, the
real memslot update (which requires memslot deletion) for pc.rom happens
after tdx_init_memory_region. That causes the private memory ADD'ed
earlier to get lost. I haven't worked out a good solution to resolve the
ordering issue, so just skip the pc.rom setup to avoid memslot deletion.
---
 hw/i386/pc.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 1f62971759bf..c089dc49485d 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -62,6 +62,7 @@
 #include "sysemu/reset.h"
 #include "sysemu/runstate.h"
 #include "kvm/kvm_i386.h"
+#include "kvm/tdx.h"
 #include "hw/xen/xen.h"
 #include "hw/xen/start_info.h"
 #include "ui/qemu-spice.h"
@@ -1084,16 +1085,18 @@ void pc_memory_init(PCMachineState *pcms,
 /* Initialize PC system firmware */
 pc_system_firmware_init(pcms, rom_memory);
 
-option_rom_mr = g_malloc(sizeof(*option_rom_mr));
-memory_region_init_ram(option_rom_mr, NULL, "pc.rom", PC_ROM_SIZE,
-   &error_fatal);
-if (pcmc->pci_enabled) {
-memory_region_set_readonly(option_rom_mr, true);
+if (!is_tdx_vm()) {
+option_rom_mr = g_malloc(sizeof(*option_rom_mr));
+memory_region_init_ram(option_rom_mr, NULL, "pc.rom", PC_ROM_SIZE,
+&error_fatal);
+if (pcmc->pci_enabled) {
+memory_region_set_readonly(option_rom_mr, true);
+}
+memory_region_add_subregion_overlap(rom_memory,
+PC_ROM_MIN_VGA,
+option_rom_mr,
+1);
 }
-memory_region_add_subregion_overlap(rom_memory,
-PC_ROM_MIN_VGA,
-option_rom_mr,
-1);
 
 fw_cfg = fw_cfg_arch_create(machine,
 x86ms->boot_cpus, x86ms->apic_id_limit);
-- 
2.27.0

[PATCH v1 22/40] i386/tdx: Skip BIOS shadowing setup

2022-08-02 Thread Xiaoyao Li

TDX doesn't support mapping different GPAs to the same private memory. Thus,
aliasing the top 128KB of the BIOS as isa-bios is not supported.

On the other hand, a TDX guest cannot enter real mode, so it works fine
without isa-bios.
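
To illustrate why the alias conflicts with the single-GPA rule, here is a
minimal sketch of the address arithmetic. It is not part of the patch. It
assumes the conventional PC layout in which the ISA alias ends at the 1 MiB
boundary and the full image ends at 4 GiB; all helper names are hypothetical.

```c
#include <stdint.h>

#define KiB 1024ULL
#define MiB (1024ULL * KiB)
#define GiB (1024ULL * MiB)

/* Size of the legacy alias: at most the last 128 KiB of the firmware image. */
static inline uint64_t isa_bios_alias_size(uint64_t bios_size)
{
    return bios_size < 128 * KiB ? bios_size : 128 * KiB;
}

/* Guest-physical base of the alias (assumption: it ends at 1 MiB). */
static inline uint64_t isa_bios_alias_base(uint64_t bios_size)
{
    return 1 * MiB - isa_bios_alias_size(bios_size);
}

/* Offset inside the firmware image where the aliased part begins. */
static inline uint64_t isa_bios_image_offset(uint64_t bios_size)
{
    return bios_size - isa_bios_alias_size(bios_size);
}

/* Guest-physical base of the full image (assumption: it ends at 4 GiB). */
static inline uint64_t high_bios_base(uint64_t bios_size)
{
    return 4 * GiB - bios_size;
}
```

With a 4 MiB image, the same image offset 0x3E0000 would be reachable at both
GPA 0xE0000 (alias) and GPA 0xFFFE0000 (top of memory), which is the
two-GPAs-to-one-page situation described above.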

Signed-off-by: Xiaoyao Li 
---
Changes from RFC v4:
 - update commit message and comment to clarify
---
 hw/i386/x86.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 006b0e670e4d..a389ee26265a 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1172,17 +1172,20 @@ void x86_bios_rom_init(MachineState *ms, const char *default_firmware,
 }
 g_free(filename);
 
-/* map the last 128KB of the BIOS in ISA space */
-isa_bios_size = MIN(bios_size, 128 * KiB);
-isa_bios = g_malloc(sizeof(*isa_bios));
-memory_region_init_alias(isa_bios, NULL, "isa-bios", bios,
- bios_size - isa_bios_size, isa_bios_size);
-memory_region_add_subregion_overlap(rom_memory,
-0x10 - isa_bios_size,
-isa_bios,
-1);
-if (!isapc_ram_fw) {
-memory_region_set_readonly(isa_bios, true);
+/* For TDX, alias different GPAs to same private memory is not supported */
+if (!is_tdx_vm()) {
+/* map the last 128KB of the BIOS in ISA space */
+isa_bios_size = MIN(bios_size, 128 * KiB);
+isa_bios = g_malloc(sizeof(*isa_bios));
+memory_region_init_alias(isa_bios, NULL, "isa-bios", bios,
+bios_size - isa_bios_size, isa_bios_size);
+memory_region_add_subregion_overlap(rom_memory,
+0x10 - isa_bios_size,
+isa_bios,
+1);
+if (!isapc_ram_fw) {
+memory_region_set_readonly(isa_bios, true);
+}
 }
 
 /* map all the bios at the top of memory */
-- 
2.27.0

[PATCH v1 21/40] i386/tdx: Parse TDVF metadata for TDX VM

2022-08-02 Thread Xiaoyao Li

TDX cannot support the pflash device since it supports neither read-only
memslots nor emulation. Load TDVF (OVMF) with the -bios option for TDs.

When booting a TD, besides loading TDVF to an address below 4G, QEMU needs to
parse the TDVF metadata.

Signed-off-by: Xiaoyao Li 
Acked-by: Gerd Hoffmann 
---
 hw/i386/pc_sysfw.c | 7 +++
 hw/i386/x86.c  | 3 ++-
 target/i386/kvm/tdx-stub.c | 5 +
 target/i386/kvm/tdx.c  | 5 +
 target/i386/kvm/tdx.h  | 4 
 5 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
index c8d9e71b889b..cf63434ba89d 100644
--- a/hw/i386/pc_sysfw.c
+++ b/hw/i386/pc_sysfw.c
@@ -37,6 +37,7 @@
 #include "hw/block/flash.h"
 #include "sysemu/kvm.h"
 #include "sev.h"
+#include "kvm/tdx.h"
 
 #define FLASH_SECTOR_SIZE 4096
 
@@ -265,5 +266,11 @@ void x86_firmware_configure(void *ptr, int size)
 }
 
 sev_encrypt_flash(ptr, size, &error_fatal);
+} else if (is_tdx_vm()) {
+ret = tdx_parse_tdvf(ptr, size);
+if (ret) {
+error_report("failed to parse TDVF for TDX VM");
+exit(1);
+}
 }
 }
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index a15fadeb0e68..006b0e670e4d 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -49,6 +49,7 @@
 #include "hw/intc/i8259.h"
 #include "hw/rtc/mc146818rtc.h"
 #include "target/i386/sev.h"
+#include "kvm/tdx.h"
 
 #include "hw/acpi/cpu_hotplug.h"
 #include "hw/irq.h"
@@ -1149,7 +1150,7 @@ void x86_bios_rom_init(MachineState *ms, const char *default_firmware,
 }
 bios = g_malloc(sizeof(*bios));
 memory_region_init_ram(bios, NULL, "pc.bios", bios_size, &error_fatal);
-if (sev_enabled()) {
+if (sev_enabled() || is_tdx_vm()) {
 /*
  * The concept of a "reset" simply doesn't exist for
  * confidential computing guests, we have to destroy and
diff --git a/target/i386/kvm/tdx-stub.c b/target/i386/kvm/tdx-stub.c
index 2871de9d7b56..395a59721266 100644
--- a/target/i386/kvm/tdx-stub.c
+++ b/target/i386/kvm/tdx-stub.c
@@ -12,3 +12,8 @@ int tdx_pre_create_vcpu(CPUState *cpu)
 {
 return -EINVAL;
 }
+
+int tdx_parse_tdvf(void *flash_ptr, int size)
+{
+return -EINVAL;
+}
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 3aa0e374a514..25b3e2058cb3 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -561,6 +561,11 @@ out:
 return r;
 }
 
+int tdx_parse_tdvf(void *flash_ptr, int size)
+{
+return tdvf_parse_metadata(&tdx_guest->tdvf, flash_ptr, size);
+}
+
 static bool tdx_guest_get_sept_ve_disable(Object *obj, Error **errp)
 {
 TdxGuest *tdx = TDX_GUEST(obj);
diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
index 46a24ee8c7cc..12bcf25bb95b 100644
--- a/target/i386/kvm/tdx.h
+++ b/target/i386/kvm/tdx.h
@@ -6,6 +6,7 @@
 #endif
 
 #include "exec/confidential-guest-support.h"
+#include "hw/i386/tdvf.h"
 
 #define TYPE_TDX_GUEST "tdx-guest"
 #define TDX_GUEST(obj)  OBJECT_CHECK(TdxGuest, (obj), TYPE_TDX_GUEST)
@@ -21,6 +22,8 @@ typedef struct TdxGuest {
 
 bool initialized;
 uint64_t attributes;/* TD attributes */
+
+TdxFirmware tdvf;
 } TdxGuest;
 
 #ifdef CONFIG_TDX
@@ -33,5 +36,6 @@ int tdx_kvm_init(MachineState *ms, Error **errp);
 void tdx_get_supported_cpuid(uint32_t function, uint32_t index, int reg,
  uint32_t *ret);
 int tdx_pre_create_vcpu(CPUState *cpu);
+int tdx_parse_tdvf(void *flash_ptr, int size);
 
 #endif /* QEMU_I386_TDX_H */
-- 
2.27.0

[PATCH v1 38/40] i386/tdx: Skip kvm_put_apicbase() for TDs

2022-08-02 Thread Xiaoyao Li

KVM doesn't allow writing to MSR_IA32_APICBASE for TDs.

Signed-off-by: Xiaoyao Li 
---
 target/i386/kvm/kvm.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 53ab539e7e4d..948c87ebdb97 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2949,6 +2949,11 @@ void kvm_put_apicbase(X86CPU *cpu, uint64_t value)
 {
 int ret;
 
+/* TODO: Allow accessing guest state for debug TDs. */
+if (is_tdx_vm()) {
+return;
+}
+
 ret = kvm_put_one_msr(cpu, MSR_IA32_APICBASE, value);
 assert(ret == 1);
 }
-- 
2.27.0

[PATCH v1 24/40] i386/tdx: Track mem_ptr for each firmware entry of TDVF

2022-08-02 Thread Xiaoyao Li

For each TDVF section, QEMU needs to copy the content to guest
private memory via KVM API (KVM_TDX_INIT_MEM_REGION).

Introduce a field @mem_ptr for TdxFirmwareEntry to track the memory
pointer of each TDVF section, so that QEMU can add/copy them to guest
private memory later.

TDVF sections can be classified into two groups:
 - Firmware itself, e.g., TDVF BFV and CFV, that is located separately from
   guest RAM. Its memory pointer is the bios pointer.

 - Sections located in guest RAM, e.g., TEMP_MEM and TD_HOB.
   mmap a new memory range for them.

Register a machine_init_done callback to set up these pointers.

Signed-off-by: Xiaoyao Li 
Acked-by: Gerd Hoffmann 
---
 hw/i386/tdvf.c |  1 +
 include/hw/i386/tdvf.h |  7 +++
 target/i386/kvm/tdx.c  | 32 
 3 files changed, 40 insertions(+)

diff --git a/hw/i386/tdvf.c b/hw/i386/tdvf.c
index a40198f9407a..dca209098f7a 100644
--- a/hw/i386/tdvf.c
+++ b/hw/i386/tdvf.c
@@ -187,6 +187,7 @@ int tdvf_parse_metadata(TdxFirmware *fw, void *flash_ptr, int size)
 }
 g_free(sections);
 
+fw->mem_ptr = flash_ptr;
 return 0;
 
 err:
diff --git a/include/hw/i386/tdvf.h b/include/hw/i386/tdvf.h
index 593341eb2e93..d880af245a73 100644
--- a/include/hw/i386/tdvf.h
+++ b/include/hw/i386/tdvf.h
@@ -39,13 +39,20 @@ typedef struct TdxFirmwareEntry {
 uint64_t size;
 uint32_t type;
 uint32_t attributes;
+
+void *mem_ptr;
 } TdxFirmwareEntry;
 
 typedef struct TdxFirmware {
+void *mem_ptr;
+
 uint32_t nr_entries;
 TdxFirmwareEntry *entries;
 } TdxFirmware;
 
+#define for_each_tdx_fw_entry(fw, e)\
+for (e = (fw)->entries; e != (fw)->entries + (fw)->nr_entries; e++)
+
 int tdvf_parse_metadata(TdxFirmware *fw, void *flash_ptr, int size);
 
 #endif /* HW_I386_TDVF_H */
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 25b3e2058cb3..95a9c2b26516 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -12,12 +12,15 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/mmap-alloc.h"
 #include "qapi/error.h"
 #include "qom/object_interfaces.h"
 #include "standard-headers/asm-x86/kvm_para.h"
 #include "sysemu/kvm.h"
+#include "sysemu/sysemu.h"
 
 #include "hw/i386/x86.h"
+#include "hw/i386/tdvf.h"
 #include "kvm_i386.h"
 #include "tdx.h"
 #include "../cpu-internal.h"
@@ -450,6 +453,33 @@ static void update_tdx_cpuid_lookup_by_tdx_caps(void)
 (tdx_caps->xfam_fixed1 & CPUID_XSTATE_XSS_MASK) >> 32;
 }
 
+static void tdx_finalize_vm(Notifier *notifier, void *unused)
+{
+TdxFirmware *tdvf = &tdx_guest->tdvf;
+TdxFirmwareEntry *entry;
+
+for_each_tdx_fw_entry(tdvf, entry) {
+switch (entry->type) {
+case TDVF_SECTION_TYPE_BFV:
+case TDVF_SECTION_TYPE_CFV:
+entry->mem_ptr = tdvf->mem_ptr + entry->data_offset;
+break;
+case TDVF_SECTION_TYPE_TD_HOB:
+case TDVF_SECTION_TYPE_TEMP_MEM:
+entry->mem_ptr = qemu_ram_mmap(-1, entry->size,
+   qemu_real_host_page_size(), 0, 0);
+break;
+default:
+error_report("Unsupported TDVF section %d", entry->type);
+exit(1);
+}
+}
+}
+
+static Notifier tdx_machine_done_notify = {
+.notify = tdx_finalize_vm,
+};
+
 int tdx_kvm_init(MachineState *ms, Error **errp)
 {
 TdxGuest *tdx = (TdxGuest *)object_dynamic_cast(OBJECT(ms->cgs),
@@ -470,6 +500,8 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
  */
 kvm_readonly_mem_allowed = false;
 
+qemu_add_machine_init_done_notifier(&tdx_machine_done_notify);
+
 tdx_guest = tdx;
 
 return 0;
-- 
2.27.0

[PATCH v4 0/4] Enable unix socket support on Windows

2022-08-02 Thread Bin Meng

Support for the unix socket has existed both in BSD and Linux for the
longest time, but not on Windows. Since Windows 10 build 17063 [1],
native support for the unix socket has come to Windows. Starting with
this build, two Win32 processes can use the AF_UNIX address family
over the Winsock API to communicate with each other.

[1] https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/

Changes in v4:
- instead of introducing CONFIG_AF_UNIX, add fallback afunix.h header
  in os-win32.h, and compile the AF_UNIX stuff for all Windows hosts
- drop CONFIG_AF_UNIX
- introduce a new helper socket_check_afunix_support() to runtime-check
  the availability of AF_UNIX sockets, and skip the affected tests
  appropriately
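
For readers unfamiliar with this kind of runtime check, a minimal sketch of
the idea follows. It is not the helper from this series (that one lives in
the tests and is not reproduced here). The function name is hypothetical and
the code is written against POSIX; on Windows the same probe would use
SOCKET, INVALID_SOCKET and closesocket() instead.

```c
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical probe: report whether the host can create an AF_UNIX
 * stream socket. The socket is closed again immediately; nothing is
 * bound or connected.
 */
static inline bool afunix_socket_available(void)
{
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0) {
        return false;
    }
    close(fd);
    return true;
}
```

A test can call such a probe first and skip itself when it returns false,
rather than failing on hosts without AF_UNIX support.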

Changes in v3:
- drop the run-time check afunix_available()

Changes in v2:
- move #include  to os-win32.h
- define WIN_BUILD_AF_UNIX only when CONFIG_WIN32
- drop #include  as it is now already included in osdep.h
- new patch: tests/unit: Update test-io-channel-socket.c for Windows

Bin Meng (4):
  util/qemu-sockets: Replace the call to close a socket with
closesocket()
  util/qemu-sockets: Enable unix socket support on Windows
  chardev/char-socket: Update AF_UNIX for Windows
  tests/unit: Update test-io-channel-socket.c for Windows

 meson.build |  3 +++
 include/sysemu/os-win32.h   | 17 +
 tests/unit/socket-helpers.h |  9 +++
 chardev/char-socket.c   |  4 ++--
 tests/unit/socket-helpers.c | 16 +
 tests/unit/test-io-channel-socket.c | 37 ++---
 util/qemu-sockets.c | 29 ++
 7 files changed, 72 insertions(+), 43 deletions(-)

-- 
2.34.1

[PATCH v4 2/4] util/qemu-sockets: Enable unix socket support on Windows

2022-08-02 Thread Bin Meng

From: Bin Meng 

Support for the unix socket has existed both in BSD and Linux for the
longest time, but not on Windows. Since Windows 10 build 17063 [1],
native support for the unix socket has come to Windows. Starting with
this build, two Win32 processes can use the AF_UNIX address family
over the Winsock API to communicate with each other.

[1] https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/

Signed-off-by: Xuzhou Cheng 
Signed-off-by: Bin Meng 
---

Changes in v4:
- instead of introducing CONFIG_AF_UNIX, add fallback afunix.h header
  in os-win32.h, and compile the AF_UNIX stuff for all Windows hosts

Changes in v3:
- drop the run-time check afunix_available()

Changes in v2:
- move #include  to os-win32.h
- define WIN_BUILD_AF_UNIX only when CONFIG_WIN32

 meson.build   |  3 +++
 include/sysemu/os-win32.h | 17 +
 util/qemu-sockets.c   | 25 -
 3 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/meson.build b/meson.build
index 294e9a8f32..6749223f1a 100644
--- a/meson.build
+++ b/meson.build
@@ -1890,6 +1890,9 @@ config_host_data.set('HAVE_PTY_H', cc.has_header('pty.h'))
 config_host_data.set('HAVE_SYS_DISK_H', cc.has_header('sys/disk.h'))
 config_host_data.set('HAVE_SYS_IOCCOM_H', cc.has_header('sys/ioccom.h'))
 config_host_data.set('HAVE_SYS_KCOV_H', cc.has_header('sys/kcov.h'))
+if targetos == 'windows'
+  config_host_data.set('HAVE_AFUNIX_H', cc.has_header('afunix.h'))
+endif
 
 # has_function
 config_host_data.set('CONFIG_ACCEPT4', cc.has_function('accept4'))
diff --git a/include/sysemu/os-win32.h b/include/sysemu/os-win32.h
index edc3b38a57..5b38c7bd04 100644
--- a/include/sysemu/os-win32.h
+++ b/include/sysemu/os-win32.h
@@ -30,6 +30,23 @@
 #include 
 #include 
 
+#ifdef HAVE_AFUNIX_H
+#include 
+#else
+/*
+ * Fallback definitions of things we need in afunix.h, if not available from
+ * the used Windows SDK or MinGW headers.
+ */
+#define UNIX_PATH_MAX 108
+
+typedef struct sockaddr_un {
+ADDRESS_FAMILY sun_family;
+char sun_path[UNIX_PATH_MAX];
+} SOCKADDR_UN, *PSOCKADDR_UN;
+
+#define SIO_AF_UNIX_GETPEERPID _WSAIOR(IOC_VENDOR, 256)
+#endif
+
 #ifdef __cplusplus
 extern "C" {
 #endif
diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 0e2298278f..83f4bd6fd2 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -880,8 +880,6 @@ static int vsock_parse(VsockSocketAddress *addr, const char *str,
 }
 #endif /* CONFIG_AF_VSOCK */
 
-#ifndef _WIN32
-
 static bool saddr_is_abstract(UnixSocketAddress *saddr)
 {
 #ifdef CONFIG_LINUX
@@ -1054,25 +1052,6 @@ static int unix_connect_saddr(UnixSocketAddress *saddr, Error **errp)
 return -1;
 }
 
-#else
-
-static int unix_listen_saddr(UnixSocketAddress *saddr,
- int num,
- Error **errp)
-{
-error_setg(errp, "unix sockets are not available on windows");
-errno = ENOTSUP;
-return -1;
-}
-
-static int unix_connect_saddr(UnixSocketAddress *saddr, Error **errp)
-{
-error_setg(errp, "unix sockets are not available on windows");
-errno = ENOTSUP;
-return -1;
-}
-#endif
-
 /* compatibility wrapper */
 int unix_listen(const char *str, Error **errp)
 {
@@ -1335,7 +1314,6 @@ socket_sockaddr_to_address_inet(struct sockaddr_storage *sa,
 }
 
 
-#ifndef WIN32
 static SocketAddress *
 socket_sockaddr_to_address_unix(struct sockaddr_storage *sa,
 socklen_t salen,
@@ -1362,7 +1340,6 @@ socket_sockaddr_to_address_unix(struct sockaddr_storage *sa,
 addr->u.q_unix.path = g_strndup(su->sun_path, salen);
 return addr;
 }
-#endif /* WIN32 */
 
 #ifdef CONFIG_AF_VSOCK
 static SocketAddress *
@@ -1394,10 +1371,8 @@ socket_sockaddr_to_address(struct sockaddr_storage *sa,
 case AF_INET6:
 return socket_sockaddr_to_address_inet(sa, salen, errp);
 
-#ifndef WIN32
 case AF_UNIX:
 return socket_sockaddr_to_address_unix(sa, salen, errp);
-#endif /* WIN32 */
 
 #ifdef CONFIG_AF_VSOCK
 case AF_VSOCK:
-- 
2.34.1

[PATCH v1 26/40] headers: Add definitions from UEFI spec for volumes, resources, etc...

2022-08-02 Thread Xiaoyao Li

Add UEFI definitions for literals, enums, structs, GUIDs, etc... that
will be used by TDX to build the UEFI Hand-Off Block (HOB) that is passed
to the Trusted Domain Virtual Firmware (TDVF).

All values come from the UEFI specification and TDVF design guide. [1]

Note, EFI_RESOURCE_MEMORY_UNACCEPTED will be added in a future UEFI spec.

[1] https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.pdf

Signed-off-by: Xiaoyao Li 
---
 include/standard-headers/uefi/uefi.h | 198 +++
 1 file changed, 198 insertions(+)
 create mode 100644 include/standard-headers/uefi/uefi.h

diff --git a/include/standard-headers/uefi/uefi.h b/include/standard-headers/uefi/uefi.h
new file mode 100644
index ..b15aba796156
--- /dev/null
+++ b/include/standard-headers/uefi/uefi.h
@@ -0,0 +1,198 @@
+/*
+ * Copyright (C) 2020 Intel Corporation
+ *
+ * Author: Isaku Yamahata 
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ *
+ */
+
+#ifndef HW_I386_UEFI_H
+#define HW_I386_UEFI_H
+
+/***/
+/*
+ * basic EFI definitions
+ * supplemented with UEFI Specification Version 2.8 (Errata A)
+ * released February 2020
+ */
+/* UEFI integer is little endian */
+
+typedef struct {
+uint32_t Data1;
+uint16_t Data2;
+uint16_t Data3;
+uint8_t Data4[8];
+} EFI_GUID;
+
+typedef enum {
+EfiReservedMemoryType,
+EfiLoaderCode,
+EfiLoaderData,
+EfiBootServicesCode,
+EfiBootServicesData,
+EfiRuntimeServicesCode,
+EfiRuntimeServicesData,
+EfiConventionalMemory,
+EfiUnusableMemory,
+EfiACPIReclaimMemory,
+EfiACPIMemoryNVS,
+EfiMemoryMappedIO,
+EfiMemoryMappedIOPortSpace,
+EfiPalCode,
+EfiPersistentMemory,
+EfiUnacceptedMemoryType,
+EfiMaxMemoryType
+} EFI_MEMORY_TYPE;
+
+#define EFI_HOB_HANDOFF_TABLE_VERSION 0x0009
+
+#define EFI_HOB_TYPE_HANDOFF  0x0001
+#define EFI_HOB_TYPE_MEMORY_ALLOCATION0x0002
+#define EFI_HOB_TYPE_RESOURCE_DESCRIPTOR  0x0003
+#define EFI_HOB_TYPE_GUID_EXTENSION   0x0004
+#define EFI_HOB_TYPE_FV   0x0005
+#define EFI_HOB_TYPE_CPU  0x0006
+#define EFI_HOB_TYPE_MEMORY_POOL  0x0007
+#define EFI_HOB_TYPE_FV2  0x0009
+#define EFI_HOB_TYPE_LOAD_PEIM_UNUSED 0x000A
+#define EFI_HOB_TYPE_UEFI_CAPSULE 0x000B
+#define EFI_HOB_TYPE_FV3  0x000C
+#define EFI_HOB_TYPE_UNUSED   0xFFFE
+#define EFI_HOB_TYPE_END_OF_HOB_LIST  0x
+
+typedef struct {
+uint16_t HobType;
+uint16_t HobLength;
+uint32_t Reserved;
+} EFI_HOB_GENERIC_HEADER;
+
+typedef uint64_t EFI_PHYSICAL_ADDRESS;
+typedef uint32_t EFI_BOOT_MODE;
+
+typedef struct {
+EFI_HOB_GENERIC_HEADER Header;
+uint32_t Version;
+EFI_BOOT_MODE BootMode;
+EFI_PHYSICAL_ADDRESS EfiMemoryTop;
+EFI_PHYSICAL_ADDRESS EfiMemoryBottom;
+EFI_PHYSICAL_ADDRESS EfiFreeMemoryTop;
+EFI_PHYSICAL_ADDRESS EfiFreeMemoryBottom;
+EFI_PHYSICAL_ADDRESS EfiEndOfHobList;
+} EFI_HOB_HANDOFF_INFO_TABLE;
+
+#define EFI_RESOURCE_SYSTEM_MEMORY  0x
+#define EFI_RESOURCE_MEMORY_MAPPED_IO   0x0001
+#define EFI_RESOURCE_IO 0x0002
+#define EFI_RESOURCE_FIRMWARE_DEVICE0x0003
+#define EFI_RESOURCE_MEMORY_MAPPED_IO_PORT  0x0004
+#define EFI_RESOURCE_MEMORY_RESERVED0x0005
+#define EFI_RESOURCE_IO_RESERVED0x0006
+#define EFI_RESOURCE_MEMORY_UNACCEPTED  0x0007
+#define EFI_RESOURCE_MAX_MEMORY_TYPE0x0008
+
+#define EFI_RESOURCE_ATTRIBUTE_PRESENT  0x0001
+#define EFI_RESOURCE_ATTRIBUTE_INITIALIZED  0x0002
+#define EFI_RESOURCE_ATTRIBUTE_TESTED   0x0004
+#define EFI_RESOURCE_ATTRIBUTE_SINGLE_BIT_ECC   0x0008
+#define EFI_RESOURCE_ATTRIBUTE_MULTIPLE_BIT_ECC 0x0010
+#define EFI_RESOURCE_ATTRIBUTE_ECC_RESERVED_1   0x0020
+#define EFI_RESOURCE_ATTRIBUTE_ECC_RESERVED_2   0x0040
+#define EFI_RESOURCE_ATTRIBUTE_READ_PROTECTED   0x0080
+#define EFI_RESOURCE_ATTRIBUTE_WRITE_PROTECTED  0x0100
+#define EFI_RESOURCE_ATTRIBUTE_EXECUTION_PROTECTED  0x0200
+#define EFI_RESOURCE_ATTRIBUTE_U

[PATCH v1 34/40] hw/i386: add eoi_intercept_unsupported member to X86MachineState

2022-08-02 Thread Xiaoyao Li

Add a new bool member, eoi_intercept_unsupported, to X86MachineState
with default value false. Set it to true for TDX VMs.

When EOI cannot be intercepted, it is impossible to emulate re-injection of a
level-triggered interrupt while the level is still kept active, which
affects interrupt controller emulation.

Signed-off-by: Xiaoyao Li 
---
 hw/i386/x86.c | 1 +
 include/hw/i386/x86.h | 1 +
 target/i386/kvm/tdx.c | 2 ++
 3 files changed, 4 insertions(+)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index a389ee26265a..6ab023713bf1 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1401,6 +1401,7 @@ static void x86_machine_initfn(Object *obj)
 x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
 x86ms->bus_lock_ratelimit = 0;
 x86ms->above_4g_mem_start = 4 * GiB;
+x86ms->eoi_intercept_unsupported = false;
 }
 
 static void x86_machine_class_init(ObjectClass *oc, void *data)
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 62fa5774f849..0a294f9c3176 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -61,6 +61,7 @@ struct X86MachineState {
 
 /* CPU and apic information: */
 bool apic_xrupt_override;
+bool eoi_intercept_unsupported;
 unsigned pci_irq_mask;
 unsigned apic_id_limit;
 uint16_t boot_cpus;
diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 2f317a6bb55b..c734772200d0 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -675,6 +675,8 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
 return -EINVAL;
 }
 
+x86ms->eoi_intercept_unsupported = true;
+
 if (!tdx_caps) {
 get_tdx_capabilities();
 }
-- 
2.27.0

[PATCH v1 25/40] i386/tdx: Track RAM entries for TDX VM

2022-08-02 Thread Xiaoyao Li

The RAM of a TDX VM can be classified into two types:

 - TDX_RAM_UNACCEPTED: default type of TDX memory, which needs to be
   accepted by the TDX guest before it can be used and will be all-zeros
   after being accepted.

 - TDX_RAM_ADDED: the RAM that is ADD'ed to the TD guest before running, and
   can be used directly. E.g., TD HOB and TEMP MEM that are needed by TDVF.

Maintain TdxRamEntries[], which grabs the initial RAM info from the e820 table
and marks each RAM range as the default type TDX_RAM_UNACCEPTED.

Then turn the ranges of TD HOB and TEMP MEM into TDX_RAM_ADDED, since these
ranges will be ADD'ed before the TD runs and do not need to be accepted at
runtime.

The TdxRamEntries[] are later used to set up the memory TD resource HOB
that passes memory info from QEMU to TDVF.

Signed-off-by: Xiaoyao Li 

---
Changes from RFC v4:
  - simplify the algorithm of tdx_accept_ram_range() (Suggested-by: Gerd Hoffmann)
    (1) Change the existing entry to cover the accepted ram range.
    (2) If there is room before the accepted ram range, add a
        TDX_RAM_UNACCEPTED entry for that.
    (3) If there is room after the accepted ram range, add a
        TDX_RAM_UNACCEPTED entry for that.
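
The three steps can be sketched as a small standalone program fragment. This
is only an illustration under stated assumptions, not the code from the
patch: the RamEntry/RamMap types and all function names are hypothetical, it
uses plain realloc() instead of glib, and it does not guard against
address + length overflowing 64 bits.

```c
#include <stdint.h>
#include <stdlib.h>

enum { RAM_UNACCEPTED, RAM_ADDED };   /* stand-ins for TDX_RAM_* */

typedef struct {
    uint64_t address;
    uint64_t length;
    uint32_t type;
} RamEntry;

typedef struct {
    RamEntry *entries;
    size_t nr;
} RamMap;

/* Append one entry, growing the array by one slot. Returns 0 on success. */
static int ram_map_append(RamMap *m, uint64_t address, uint64_t length,
                          uint32_t type)
{
    RamEntry *p = realloc(m->entries, (m->nr + 1) * sizeof(*p));

    if (!p) {
        return -1;
    }
    m->entries = p;
    p[m->nr++] = (RamEntry){ address, length, type };
    return 0;
}

/* Mark [address, address + length) as added, splitting the entry it is in. */
static int ram_map_mark_added(RamMap *m, uint64_t address, uint64_t length)
{
    uint64_t end = address + length;

    for (size_t i = 0; i < m->nr; i++) {
        RamEntry old = m->entries[i];   /* copy: append may move the array */
        uint64_t old_end = old.address + old.length;

        if (end <= old.address || old_end <= address) {
            continue;                   /* no overlap with this entry */
        }
        /* The range must sit fully inside one not-yet-added entry. */
        if (old.type != RAM_UNACCEPTED ||
            address < old.address || end > old_end) {
            return -1;
        }

        /* (1) Reuse the existing entry for the added range. */
        m->entries[i] = (RamEntry){ address, length, RAM_ADDED };
        /* (2) Room before the range stays unaccepted. */
        if (address > old.address &&
            ram_map_append(m, old.address, address - old.address,
                           RAM_UNACCEPTED)) {
            return -1;
        }
        /* (3) Room after the range stays unaccepted. */
        if (end < old_end &&
            ram_map_append(m, end, old_end - end, RAM_UNACCEPTED)) {
            return -1;
        }
        return 0;
    }
    return -1;                          /* range is not covered by any entry */
}

/* qsort() comparator: order entries by start address. */
static int ram_entry_cmp(const void *a, const void *b)
{
    const RamEntry *l = a, *r = b;

    return (l->address > r->address) - (l->address < r->address);
}
```

Because the leftover head and tail pieces are appended at the end of the
array, the entries are no longer address-ordered after a split, which is why
a sort by address is needed before the entries are turned into resource HOBs.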
---
 target/i386/kvm/tdx.c | 110 ++
 target/i386/kvm/tdx.h |  14 ++
 2 files changed, 124 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 95a9c2b26516..59cff141b4f3 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -19,6 +19,7 @@
 #include "sysemu/kvm.h"
 #include "sysemu/sysemu.h"
 
+#include "hw/i386/e820_memory_layout.h"
 #include "hw/i386/x86.h"
 #include "hw/i386/tdvf.h"
 #include "kvm_i386.h"
@@ -453,11 +454,116 @@ static void update_tdx_cpuid_lookup_by_tdx_caps(void)
 (tdx_caps->xfam_fixed1 & CPUID_XSTATE_XSS_MASK) >> 32;
 }
 
+static void tdx_add_ram_entry(uint64_t address, uint64_t length, uint32_t type)
+{
+uint32_t nr_entries = tdx_guest->nr_ram_entries;
+tdx_guest->ram_entries = g_renew(TdxRamEntry, tdx_guest->ram_entries,
+ nr_entries + 1);
+
+tdx_guest->ram_entries[nr_entries].address = address;
+tdx_guest->ram_entries[nr_entries].length = length;
+tdx_guest->ram_entries[nr_entries].type = type;
+tdx_guest->nr_ram_entries++;
+}
+
+static int tdx_accept_ram_range(uint64_t address, uint64_t length)
+{
+uint64_t head_start, tail_start, head_length, tail_length;
+uint64_t tmp_address, tmp_length;
+TdxRamEntry *e;
+int i;
+
+for (i = 0; i < tdx_guest->nr_ram_entries; i++) {
+e = &tdx_guest->ram_entries[i];
+
+if (address + length <= e->address ||
+e->address + e->length <= address) {
+continue;
+}
+
+/*
+ * The to-be-accepted ram range must be fully contained by one
+ * RAM entry.
+ */
+if (e->address > address ||
+e->address + e->length < address + length) {
+return -EINVAL;
+}
+
+if (e->type == TDX_RAM_ADDED) {
+return -EINVAL;
+}
+
+break;
+}
+
+if (i == tdx_guest->nr_ram_entries) {
+return -1;
+}
+
+tmp_address = e->address;
+tmp_length = e->length;
+
+e->address = address;
+e->length = length;
+e->type = TDX_RAM_ADDED;
+
+head_length = address - tmp_address;
+if (head_length > 0) {
+head_start = tmp_address;
+tdx_add_ram_entry(head_start, head_length, TDX_RAM_UNACCEPTED);
+}
+
+tail_start = address + length;
+if (tail_start < tmp_address + tmp_length) {
+tail_length = tmp_address + tmp_length - tail_start;
+tdx_add_ram_entry(tail_start, tail_length, TDX_RAM_UNACCEPTED);
+}
+
+return 0;
+}
+
+static int tdx_ram_entry_compare(const void *lhs_, const void* rhs_)
+{
+const TdxRamEntry *lhs = lhs_;
+const TdxRamEntry *rhs = rhs_;
+
+if (lhs->address == rhs->address) {
+return 0;
+}
+if (le64_to_cpu(lhs->address) > le64_to_cpu(rhs->address)) {
+return 1;
+}
+return -1;
+}
+
+static void tdx_init_ram_entries(void)
+{
+unsigned i, j, nr_e820_entries;
+
+nr_e820_entries = e820_get_num_entries();
+tdx_guest->ram_entries = g_new(TdxRamEntry, nr_e820_entries);
+
+for (i = 0, j = 0; i < nr_e820_entries; i++) {
+uint64_t addr, len;
+
+if (e820_get_entry(i, E820_RAM, &addr, &len)) {
+tdx_guest->ram_entries[j].address = addr;
+tdx_guest->ram_entries[j].length = len;
+tdx_guest->ram_entries[j].type = TDX_RAM_UNACCEPTED;
+j++;
+}
+}
+tdx_guest->nr_ram_entries = j;
+}
+
 static void tdx_finalize_vm(Notifier *notifier, void *unused)
 {
 TdxFirmware *tdvf = &tdx_guest->tdvf;
 TdxFirmwareEntry *entry;
 
+tdx_init_ram_entries();
+
 for_each_tdx_fw_entry(tdvf, entry) {
 switch (entry->type) {
 case TDVF_SECTION_TYPE_BFV:
@@ -468,12 +574,16 @@ static void tdx_finaliz

[PATCH v1 28/40] i386/tdx: Add TDVF memory via KVM_TDX_INIT_MEM_REGION

2022-08-02 Thread Xiaoyao Li

From: Isaku Yamahata 

TDVF firmware (CODE and VARS) needs to be added/copied to TD's private
memory via KVM_TDX_INIT_MEM_REGION, as well as TD HOB and TEMP memory.

Signed-off-by: Isaku Yamahata 
Signed-off-by: Xiaoyao Li 
Acked-by: Gerd Hoffmann 

---
Changes from RFC v4:
  - rename variable @metadata to @flags
---
 target/i386/kvm/tdx.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 944f2f5b6921..d0bbe06f5504 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -575,6 +575,7 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
 {
 TdxFirmware *tdvf = &tdx_guest->tdvf;
 TdxFirmwareEntry *entry;
+int r;
 
 tdx_init_ram_entries();
 
@@ -600,6 +601,29 @@ static void tdx_finalize_vm(Notifier *notifier, void *unused)
   sizeof(TdxRamEntry), &tdx_ram_entry_compare);
 
 tdvf_hob_create(tdx_guest, tdx_get_hob_entry(tdx_guest));
+
+for_each_tdx_fw_entry(tdvf, entry) {
+struct kvm_tdx_init_mem_region mem_region = {
+.source_addr = (__u64)entry->mem_ptr,
+.gpa = entry->address,
+.nr_pages = entry->size / 4096,
+};
+
+__u32 flags = entry->attributes & TDVF_SECTION_ATTRIBUTES_MR_EXTEND ?
+  KVM_TDX_MEASURE_MEMORY_REGION : 0;
+
+r = tdx_vm_ioctl(KVM_TDX_INIT_MEM_REGION, flags, &mem_region);
+if (r < 0) {
+ error_report("KVM_TDX_INIT_MEM_REGION failed %s", strerror(-r));
+ exit(1);
+}
+
+if (entry->type == TDVF_SECTION_TYPE_TD_HOB ||
+entry->type == TDVF_SECTION_TYPE_TEMP_MEM) {
+qemu_ram_munmap(-1, entry->mem_ptr, entry->size);
+entry->mem_ptr = NULL;
+}
+}
 }
 
 static Notifier tdx_machine_done_notify = {
-- 
2.27.0

[PATCH v1 35/40] hw/i386: add option to forcibly report edge trigger in acpi tables

2022-08-02 Thread Xiaoyao Li

From: Isaku Yamahata 

When level trigger isn't supported on the x86 platform,
forcibly report edge trigger in the ACPI tables.

Signed-off-by: Isaku Yamahata 
Signed-off-by: Xiaoyao Li 
---
 hw/i386/acpi-build.c  | 99 ---
 hw/i386/acpi-common.c | 50 --
 2 files changed, 104 insertions(+), 45 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 0355bd3ddaad..83d4777ca9ad 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -894,7 +894,8 @@ static void build_dbg_aml(Aml *table)
 aml_append(table, scope);
 }
 
-static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
+static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg,
+   bool level_trigger_unsupported)
 {
 Aml *dev;
 Aml *crs;
@@ -906,7 +907,10 @@ static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
 aml_append(dev, aml_name_decl("_UID", aml_int(uid)));
 
 crs = aml_resource_template();
-aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
+aml_append(crs, aml_interrupt(AML_CONSUMER,
+  level_trigger_unsupported ?
+  AML_EDGE : AML_LEVEL,
+  AML_ACTIVE_HIGH,
   AML_SHARED, irqs, ARRAY_SIZE(irqs)));
 aml_append(dev, aml_name_decl("_PRS", crs));
 
@@ -930,7 +934,8 @@ static Aml *build_link_dev(const char *name, uint8_t uid, Aml *reg)
 return dev;
  }
 
-static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
+static Aml *build_gsi_link_dev(const char *name, uint8_t uid,
+   uint8_t gsi, bool level_trigger_unsupported)
 {
 Aml *dev;
 Aml *crs;
@@ -943,7 +948,10 @@ static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
 
 crs = aml_resource_template();
 irqs = gsi;
-aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL, AML_ACTIVE_HIGH,
+aml_append(crs, aml_interrupt(AML_CONSUMER,
+  level_trigger_unsupported ?
+  AML_EDGE : AML_LEVEL,
+  AML_ACTIVE_HIGH,
   AML_SHARED, &irqs, 1));
 aml_append(dev, aml_name_decl("_PRS", crs));
 
@@ -962,7 +970,7 @@ static Aml *build_gsi_link_dev(const char *name, uint8_t uid, uint8_t gsi)
 }
 
 /* _CRS method - get current settings */
-static Aml *build_iqcr_method(bool is_piix4)
+static Aml *build_iqcr_method(bool is_piix4, bool level_trigger_unsupported)
 {
 Aml *if_ctx;
 uint32_t irqs;
@@ -970,7 +978,9 @@ static Aml *build_iqcr_method(bool is_piix4)
 Aml *crs = aml_resource_template();
 
 irqs = 0;
-aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL,
+aml_append(crs, aml_interrupt(AML_CONSUMER,
+  level_trigger_unsupported ?
+  AML_EDGE : AML_LEVEL,
   AML_ACTIVE_HIGH, AML_SHARED, &irqs, 1));
 aml_append(method, aml_name_decl("PRR0", crs));
 
@@ -1004,7 +1014,7 @@ static Aml *build_irq_status_method(void)
 return method;
 }
 
-static void build_piix4_pci0_int(Aml *table)
+static void build_piix4_pci0_int(Aml *table, bool level_trigger_unsupported)
 {
 Aml *dev;
 Aml *crs;
@@ -1025,12 +1035,16 @@ static void build_piix4_pci0_int(Aml *table)
 aml_append(sb_scope, field);
 
 aml_append(sb_scope, build_irq_status_method());
-aml_append(sb_scope, build_iqcr_method(true));
+aml_append(sb_scope, build_iqcr_method(true, level_trigger_unsupported));
 
-aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQ0")));
-aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQ1")));
-aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQ2")));
-aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQ3")));
+aml_append(sb_scope, build_link_dev("LNKA", 0, aml_name("PRQ0"),
+level_trigger_unsupported));
+aml_append(sb_scope, build_link_dev("LNKB", 1, aml_name("PRQ1"),
+level_trigger_unsupported));
+aml_append(sb_scope, build_link_dev("LNKC", 2, aml_name("PRQ2"),
+level_trigger_unsupported));
+aml_append(sb_scope, build_link_dev("LNKD", 3, aml_name("PRQ3"),
+level_trigger_unsupported));
 
 dev = aml_device("LNKS");
 {
@@ -1039,7 +1053,9 @@ static void build_piix4_pci0_int(Aml *table)
 
 crs = aml_resource_template();
 irqs = 9;
-aml_append(crs, aml_interrupt(AML_CONSUMER, AML_LEVEL,
+aml_append(crs, aml_interrupt(AML_CONSUMER,
+  level_trigger_unsupported ?
+  AML_EDGE : AML_LEVEL,
   AM

Re: [PATCH for-7.1] Revert "migration: Simplify unqueue_page()"

2022-08-02 Thread Dr. David Alan Gilbert

* Thomas Huth (th...@redhat.com) wrote:
> This reverts commit cfd66f30fb0f735df06ff4220e5000290a43dad3.
> 
> The simplification of unqueue_page() introduced a bug that sometimes
> breaks migration on s390x hosts. Seems like there are still pages here
> that do not have their dirty bit set.

I don't think it's about 'not having their dirty bit set' - it's
perfectly fine to have the bits clear (which indicates the page has
already been sent to the destination, sometime in between the page request
being sent from the destination and it being unqueued).
My suspicion is that either:
  * you're hitting a case where the removal of the loop
causes it to stop when it hits a !dirty page, even though there are some
dirty pages left behind it.
  * We're retransmitting a page that is already marked !dirty
which would cause overwriting of a now modified page on the
destination.

I have no idea why either of these would be s390 specific.
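
To make the first suspicion concrete, here is a toy model of a request
queue being drained against a dirty bitmap. It is not QEMU code: the
names toy_queue, toy_unqueue and toy_get_queued_page are made up for
illustration, and the bitmap is just an array of bool. It only sketches
the "keep popping until a still-dirty page turns up" behaviour that the
revert restores in get_queued_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the source page request queue. */
struct toy_queue {
    const unsigned long *pages; /* requested page numbers, in order */
    size_t len;                 /* number of requests */
    size_t head;                /* index of the next request to pop */
};

/* Pop one request; returns false once the queue is empty. */
static bool toy_unqueue(struct toy_queue *q, unsigned long *page)
{
    if (q->head >= q->len) {
        return false;
    }
    *page = q->pages[q->head++];
    return true;
}

/*
 * Keep popping until a page that is still dirty is found.
 * Requests for pages that were already sent (!dirty) are skipped,
 * so dirty pages queued behind them are not left stranded.
 */
static bool toy_get_queued_page(struct toy_queue *q, const bool *dirty,
                                unsigned long *page)
{
    while (toy_unqueue(q, page)) {
        if (dirty[*page]) {
            return true;
        }
    }
    return false;
}

/* Small self-check: only page 7 is still dirty, so it must be returned. */
static void toy_queue_self_check(void)
{
    static const unsigned long pages[] = { 3, 5, 7 };
    bool dirty[8] = { false };
    struct toy_queue q = { pages, 3, 0 };
    unsigned long page = 0;

    dirty[7] = true;
    assert(toy_get_queued_page(&q, dirty, &page));
    assert(page == 7);
}
```

In this model a variant that pops exactly one request per call would see
page 3 first, which is already clean, while page 7 is still waiting
behind it.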


> The problem is not fully understood yet, but since we are already in
> the freeze for QEMU 7.1 and we need something working there, let's
> revert this patch for the upcoming release. The optimization can be
> redone later again in a proper way if necessary.

Yeh OK

Reviewed-by: Dr. David Alan Gilbert 

> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2099934
> Signed-off-by: Thomas Huth 
> ---
>  migration/ram.c| 37 ++---
>  migration/trace-events |  3 ++-
>  2 files changed, 28 insertions(+), 12 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index b94669ba5d..dc1de9ddbc 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1612,7 +1612,6 @@ static RAMBlock *unqueue_page(RAMState *rs, ram_addr_t 
> *offset)
>  {
>  struct RAMSrcPageRequest *entry;
>  RAMBlock *block = NULL;
> -size_t page_size;
>  
>  if (!postcopy_has_request(rs)) {
>  return NULL;
> @@ -1629,13 +1628,10 @@ static RAMBlock *unqueue_page(RAMState *rs, 
> ram_addr_t *offset)
>  entry = QSIMPLEQ_FIRST(&rs->src_page_requests);
>  block = entry->rb;
>  *offset = entry->offset;
> -page_size = qemu_ram_pagesize(block);
> -/* Each page request should only be multiple page size of the ramblock */
> -assert((entry->len % page_size) == 0);
>  
> -if (entry->len > page_size) {
> -entry->len -= page_size;
> -entry->offset += page_size;
> +if (entry->len > TARGET_PAGE_SIZE) {
> +entry->len -= TARGET_PAGE_SIZE;
> +entry->offset += TARGET_PAGE_SIZE;
>  } else {
>  memory_region_unref(block->mr);
>  QSIMPLEQ_REMOVE_HEAD(&rs->src_page_requests, next_req);
> @@ -1643,9 +1639,6 @@ static RAMBlock *unqueue_page(RAMState *rs, ram_addr_t 
> *offset)
>  migration_consume_urgent_request();
>  }
>  
> -trace_unqueue_page(block->idstr, *offset,
> -   test_bit((*offset >> TARGET_PAGE_BITS), block->bmap));
> -
>  return block;
>  }
>  
> @@ -2069,8 +2062,30 @@ static bool get_queued_page(RAMState *rs, 
> PageSearchStatus *pss)
>  {
>  RAMBlock  *block;
>  ram_addr_t offset;
> +bool dirty;
> +
> +do {
> +block = unqueue_page(rs, &offset);
> +/*
> + * We're sending this page, and since it's postcopy nothing else
> + * will dirty it, and we must make sure it doesn't get sent again
> + * even if this queue request was received after the background
> + * search already sent it.
> + */
> +if (block) {
> +unsigned long page;
> +
> +page = offset >> TARGET_PAGE_BITS;
> +dirty = test_bit(page, block->bmap);
> +if (!dirty) {
> +trace_get_queued_page_not_dirty(block->idstr, 
> (uint64_t)offset,
> +page);
> +} else {
> +trace_get_queued_page(block->idstr, (uint64_t)offset, page);
> +}
> +}
>  
> -block = unqueue_page(rs, &offset);
> +} while (block && !dirty);
>  
>  if (block) {
>  /* See comment above postcopy_preempted_contains() */
> diff --git a/migration/trace-events b/migration/trace-events
> index a34afe7b85..57003edcbd 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -85,6 +85,8 @@ put_qlist_end(const char *field_name, const char 
> *vmsd_name) "%s(%s)"
>  qemu_file_fclose(void) ""
>  
>  # ram.c
> +get_queued_page(const char *block_name, uint64_t tmp_offset, unsigned long 
> page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
> +get_queued_page_not_dirty(const char *block_name, uint64_t tmp_offset, 
> unsigned long page_abs) "%s/0x%" PRIx64 " page_abs=0x%lx"
>  migration_bitmap_sync_start(void) ""
>  migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64
>  migration_bitmap_clear_dirty(char *str, uint64_t start, uint64_t size, 
> unsigned long page) "rb %s start 0x%"PRIx64" size 0x%"PRIx64" page 0x%lx"
> @@ -110,7 +112,6 @@ ram_sa

[PATCH v1 37/40] i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() for TDs

2022-08-02 Thread Xiaoyao Li

For TDs, only MSR_IA32_UCODE_REV in kvm_init_msrs() can be configured
by the VMM, while the features enumerated/controlled by the other MSRs
in kvm_init_msrs() are not under the VMM's control.

Only configure MSR_IA32_UCODE_REV for TDs.

Signed-off-by: Xiaoyao Li 
Acked-by: Gerd Hoffmann 
---
 target/i386/kvm/kvm.c | 44 ++-
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 8999f64eeaf1..53ab539e7e4d 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -3167,32 +3167,34 @@ static void kvm_init_msrs(X86CPU *cpu)
 CPUX86State *env = &cpu->env;
 
 kvm_msr_buf_reset(cpu);
-if (has_msr_arch_capabs) {
-kvm_msr_entry_add(cpu, MSR_IA32_ARCH_CAPABILITIES,
-  env->features[FEAT_ARCH_CAPABILITIES]);
-}
-
-if (has_msr_core_capabs) {
-kvm_msr_entry_add(cpu, MSR_IA32_CORE_CAPABILITY,
-  env->features[FEAT_CORE_CAPABILITY]);
-}
-
-if (has_msr_perf_capabs && cpu->enable_pmu) {
-kvm_msr_entry_add_perf(cpu, env->features);
+
+if (!is_tdx_vm()) {
+if (has_msr_arch_capabs) {
+kvm_msr_entry_add(cpu, MSR_IA32_ARCH_CAPABILITIES,
+env->features[FEAT_ARCH_CAPABILITIES]);
+}
+
+if (has_msr_core_capabs) {
+kvm_msr_entry_add(cpu, MSR_IA32_CORE_CAPABILITY,
+env->features[FEAT_CORE_CAPABILITY]);
+}
+
+if (has_msr_perf_capabs && cpu->enable_pmu) {
+kvm_msr_entry_add_perf(cpu, env->features);
+}
+
+/*
+ * Older kernels do not include VMX MSRs in KVM_GET_MSR_INDEX_LIST, but
+ * all kernels with MSR features should have them.
+ */
+if (kvm_feature_msrs && cpu_has_vmx(env)) {
+kvm_msr_entry_add_vmx(cpu, env->features);
+}
 }
 
 if (has_msr_ucode_rev) {
 kvm_msr_entry_add(cpu, MSR_IA32_UCODE_REV, cpu->ucode_rev);
 }
-
-/*
- * Older kernels do not include VMX MSRs in KVM_GET_MSR_INDEX_LIST, but
- * all kernels with MSR features should have them.
- */
-if (kvm_feature_msrs && cpu_has_vmx(env)) {
-kvm_msr_entry_add_vmx(cpu, env->features);
-}
-
 assert(kvm_buf_set_msrs(cpu) == 0);
 }
 
-- 
2.27.0

Re: [PATCH v7 05/14] qapi: net: add stream and dgram netdevs

2022-08-02 Thread Markus Armbruster

Laurent Vivier  writes:

> Copied from socket netdev file and modified to use SocketAddress
> to be able to introduce new features like unix socket.
>
> "udp" and "mcast" are squashed into dgram netdev, multicast is detected
> according to the IP address type.
> "listen" and "connect" modes are managed by stream netdev. An optional
> parameter "server" defines the mode (server by default)
>
> The two new types need to be parsed the modern way with -netdev, because
> with the traditional way, the "type" field of netdev structure collides with
> the "type" field of SocketAddress and prevents the correct evaluation of the
> command line option. Moreover the traditional way doesn't allow to use
> the same type (SocketAddress) several times with the -netdev option
> (needed to specify "local" and "remote" addresses).
>
> The previous commit paved the way for parsing the modern way, but
> omitted one detail: how to pick modern vs. traditional, in
> netdev_is_modern().
>
> We want to pick based on the value of parameter "type".  But how to
> extract it from the option argument?
>
> Parsing the option argument, either the modern or the traditional way,
> extracts it for us, but only if parsing succeeds.
>
> If parsing fails, there is no good option.  No matter which parser we
> pick, it'll be the wrong one for some arguments, and the error
> reporting will be confusing.
>
> Fortunately, the traditional parser accepts *anything* when called in
> a certain way.  This maximizes our chance to extract the value of
> "type", and in turn minimizes the risk of confusing error reporting.
>
> Signed-off-by: Laurent Vivier 
> Reviewed-by: Stefano Brivio 

[...]

> diff --git a/qapi/net.json b/qapi/net.json
> index 75ba2cb98901..a7506a40ff12 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -7,6 +7,7 @@
>  ##
>  
>  { 'include': 'common.json' }
> +{ 'include': 'sockets.json' }
>  
>  ##
>  # @set_link:
> @@ -573,6 +574,61 @@
>  '*isolated':  'bool' },
>'if': 'CONFIG_VMNET' }
>  
> +##
> +# @NetdevStreamOptions:
> +#
> +# Configuration info for stream socket netdev
> +#
> +# @addr: socket address to listen on (server=true)
> +#or connect to (server=false)
> +# @server: create server socket (default: true)
> +#
> +# Only SocketAddress types 'inet' and 'fd' are supported.
> +#
> +# Since: 7.1
> +##
> +{ 'struct': 'NetdevStreamOptions',
> +  'data': {
> +'addr':   'SocketAddress',
> +'*server': 'bool' } }
> +
> +##
> +# @NetdevDgramOptions:
> +#
> +# Configuration info for datagram socket netdev.
> +#
> +# @remote: remote address
> +# @local: local address
> +#
> +# Only SocketAddress types 'inet' and 'fd' are supported.
> +#
> +# The code checks there is at least one of these options and reports an error
> +# if not. If remote address is present and it's a multicast address, local
> +# address is optional. Otherwise local address is required and remote address
> +# is optional.
> +#
> +# .. table:: Valid parameters combination table
> +#    :widths: auto
> +#
> +#    =============  ========  =====
> +#    remote         local     okay?
> +#    =============  ========  =====
> +#    absent         absent    no
> +#    absent         not fd    no
> +#    absent         fd        yes
> +#    multicast      absent    yes
> +#    multicast      present   yes
> +#    not multicast  absent    no
> +#    not multicast  present   yes
> +#    =============  ========  =====

Looks good now.
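
For readers who prefer code to tables, the combinations above can be
written as a small predicate. This is only a sketch of the documented
rule, not the check from the patch: the enums and the function name
toy_dgram_params_ok are invented here, and "present" in the table is
taken to mean any non-absent local address.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of the optional "remote" address. */
enum toy_remote { TOY_REMOTE_ABSENT, TOY_REMOTE_MULTICAST, TOY_REMOTE_UNICAST };

/* Hypothetical classification of the optional "local" address. */
enum toy_local { TOY_LOCAL_ABSENT, TOY_LOCAL_FD, TOY_LOCAL_NOT_FD };

/* Returns true for the rows marked "yes" in the table. */
static bool toy_dgram_params_ok(enum toy_remote remote, enum toy_local local)
{
    switch (remote) {
    case TOY_REMOTE_ABSENT:
        /* Without a remote address only an fd-type local address works. */
        return local == TOY_LOCAL_FD;
    case TOY_REMOTE_MULTICAST:
        /* With a multicast remote, the local address is optional. */
        return true;
    case TOY_REMOTE_UNICAST:
        /* A non-multicast remote needs a local address. */
        return local != TOY_LOCAL_ABSENT;
    }
    return false;
}

/* Self-check covering the "absent remote" rows of the table. */
static void toy_dgram_self_check(void)
{
    assert(!toy_dgram_params_ok(TOY_REMOTE_ABSENT, TOY_LOCAL_ABSENT));
    assert(!toy_dgram_params_ok(TOY_REMOTE_ABSENT, TOY_LOCAL_NOT_FD));
    assert(toy_dgram_params_ok(TOY_REMOTE_ABSENT, TOY_LOCAL_FD));
}
```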

> +#
> +# Since: 7.1
> +##
> +{ 'struct': 'NetdevDgramOptions',
> +  'data': {
> +'*local':  'SocketAddress',
> +'*remote': 'SocketAddress' } }
> +
>  ##
>  # @NetClientDriver:
>  #
> @@ -586,8 +642,9 @@
>  #@vmnet-bridged since 7.1
>  ##
>  { 'enum': 'NetClientDriver',
> -  'data': [ 'none', 'nic', 'user', 'tap', 'l2tpv3', 'socket', 'vde',
> -'bridge', 'hubport', 'netmap', 'vhost-user', 'vhost-vdpa',
> +  'data': [ 'none', 'nic', 'user', 'tap', 'l2tpv3', 'socket', 'stream',
> +'dgram', 'vde', 'bridge', 'hubport', 'netmap', 'vhost-user',
> +'vhost-vdpa',
>  { 'name': 'vmnet-host', 'if': 'CONFIG_VMNET' },
>  { 'name': 'vmnet-shared', 'if': 'CONFIG_VMNET' },
>  { 'name': 'vmnet-bridged', 'if': 'CONFIG_VMNET' }] }
> @@ -617,6 +674,8 @@
>  'tap':  'NetdevTapOptions',
>  'l2tpv3':   'NetdevL2TPv3Options',
>  'socket':   'NetdevSocketOptions',
> +'stream':   'NetdevStreamOptions',
> +'dgram':'NetdevDgramOptions',
>  'vde':  'NetdevVdeOptions',
>  'bridge':   'NetdevBridgeOptions',
>  'hubport':  'NetdevHubPortOptions',
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 79e00916a11f..170117e1adf0 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -2726,6 +2726,18 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
>  "-netdev socket,id=str[,fd=h][,udp=host:port][,localaddr=host:port]\n"
>  "configure a network backend to connect to another 
> network\n"
>  "

Re: [PATCH for-7.1] icount: Take iothread lock when running QEMU timers

2022-08-02 Thread Pavel Dovgalyuk


Tested-by: Pavel Dovgalyuk 

On 01.08.2022 19:45, Peter Maydell wrote:

The function icount_prepare_for_run() is called with the iothread
unlocked, but it can call icount_notify_aio_contexts() which will
run qemu timer handlers. Those are supposed to be run only with
the iothread lock held, so take the lock while we do that.

Since icount mode runs everything on a single thread anyway,
not holding the lock is likely mostly not going to introduce
races, but it can cause us to trip over assertions that we
do hold the lock, such as the one reported in issue 1130.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1130
Signed-off-by: Peter Maydell 
---
  accel/tcg/tcg-accel-ops-icount.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/accel/tcg/tcg-accel-ops-icount.c b/accel/tcg/tcg-accel-ops-icount.c
index 8f1dda4344c..84cc7421be8 100644
--- a/accel/tcg/tcg-accel-ops-icount.c
+++ b/accel/tcg/tcg-accel-ops-icount.c
@@ -109,7 +109,13 @@ void icount_prepare_for_run(CPUState *cpu)
  replay_mutex_lock();
  
  if (cpu->icount_budget == 0) {

+/*
+ * We're called without the iothread lock, so must take it while
+ * we're calling timer handlers.
+ */
+qemu_mutex_lock_iothread();
  icount_notify_aio_contexts();
+qemu_mutex_unlock_iothread();
  }
  }

[PATCH v1 30/40] i386/tdx: Finalize TDX VM

2022-08-02 Thread Xiaoyao Li

Invoke KVM_TDX_FINALIZE_VM to finalize the TD's measurement and make
the TD vCPUs runnable once machine initialization is complete.

Signed-off-by: Xiaoyao Li 
Acked-by: Gerd Hoffmann 
---
 target/i386/kvm/tdx.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 2dbe26f2e950..1de767a990ba 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -642,6 +642,13 @@ static void tdx_finalize_vm(Notifier *notifier, void 
*unused)
 entry->mem_ptr = NULL;
 }
 }
+
+r = tdx_vm_ioctl(KVM_TDX_FINALIZE_VM, 0, NULL);
+if (r < 0) {
+error_report("KVM_TDX_FINALIZE_VM failed %s", strerror(-r));
+exit(0);
+}
+tdx_guest->parent_obj.ready = true;
 }
 
 static Notifier tdx_machine_done_notify = {
-- 
2.27.0

[PATCH v4 3/4] chardev/char-socket: Update AF_UNIX for Windows

2022-08-02 Thread Bin Meng

From: Bin Meng 

Now that AF_UNIX has come to Windows, update the existing logic in
qemu_chr_compute_filename() and qmp_chardev_open_socket() for Windows.

Signed-off-by: Bin Meng 
Reviewed-by: Marc-André Lureau 
---

Changes in v4:
- drop CONFIG_AF_UNIX

Changes in v2:
- drop #include  as it is now already included in osdep.h

 chardev/char-socket.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index dc4e218eeb..879564aa8a 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -557,12 +557,10 @@ static char *qemu_chr_compute_filename(SocketChardev *s)
 const char *left = "", *right = "";
 
 switch (ss->ss_family) {
-#ifndef _WIN32
 case AF_UNIX:
 return g_strdup_printf("unix:%s%s",
((struct sockaddr_un *)(ss))->sun_path,
s->is_listen ? ",server=on" : "");
-#endif
 case AF_INET6:
 left  = "[";
 right = "]";
@@ -1372,10 +1370,12 @@ static void qmp_chardev_open_socket(Chardev *chr,
 }
 
 qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_RECONNECTABLE);
+#ifndef _WIN32
 /* TODO SOCKET_ADDRESS_FD where fd has AF_UNIX */
 if (addr->type == SOCKET_ADDRESS_TYPE_UNIX) {
 qemu_chr_set_feature(chr, QEMU_CHAR_FEATURE_FD_PASS);
 }
+#endif
 
 /*
  * In the chardev-change special-case, we shouldn't register a new yank
-- 
2.34.1

Re: [PATCH] hw/nvme: Add helper functions for qid-db conversion

2022-08-02 Thread Klaus Jensen

On Aug  2 16:31, Jinhao Fan wrote:
> at 2:02 PM, Klaus Jensen  wrote:
> 
> > On Jul 28 16:07, Jinhao Fan wrote:
> >> With the introduction of shadow doorbell and ioeventfd, we need to do
> >> frequent conversion between qid and its doorbell offset. The original
> >> hard-coded calculation is confusing and error-prone. Add several helper
> >> functions to do this task.
> >> 
> >> Signed-off-by: Jinhao Fan 
> >> ---
> >> hw/nvme/ctrl.c | 61 --
> >> 1 file changed, 39 insertions(+), 22 deletions(-)
> >> 
> >> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> >> index 533ad14e7a..6116c0e660 100644
> >> --- a/hw/nvme/ctrl.c
> >> +++ b/hw/nvme/ctrl.c
> >> @@ -487,6 +487,29 @@ static int nvme_check_cqid(NvmeCtrl *n, uint16_t cqid)
> >> {
> >> return cqid < n->conf_ioqpairs + 1 && n->cq[cqid] != NULL ? 0 : -1;
> >> }
> >> +static inline bool nvme_db_offset_is_cq(NvmeCtrl *n, hwaddr offset)
> >> +{
> >> +hwaddr stride = 4 << NVME_CAP_DSTRD(ldq_le_p(&n->bar.cap));
> >> +return (offset / stride) & 1;
> >> +}
> > 
> > > This can be morphed into `(offset >> (2 + dstrd)) & 1` if I am not
> > > mistaken.
> > 
> 
> Yes. But my current code looks more readable to me. Is it necessary to
> change to `(offset >> (2 + dstrd)) & 1`?
> 

I am unsure if the compiler will transform that division into the shift
if it can infer that the divisor is a power of two (it most likely
will be able to).

But I see no reason to have a potential division here when we can do
without and to me it is just as readable when you know the doorbell
stride is defined as `2 ^ (2 + DSTRD)`.

> >> +
> >> +static inline uint16_t nvme_db_offset_to_qid(NvmeCtrl *n, hwaddr offset)
> >> +{
> >> +hwaddr stride = 4 << NVME_CAP_DSTRD(ldq_le_p(&n->bar.cap));
> >> +return offset / (2 * stride);
> >> +}
> > 
> > Same, should be able to do `offset >> (2 * dstrd + 1)`, no?
> 
> Same as above.
> 

-- 
One of us - No more doubt, silence or taboo about mental illness.



[PATCH v1 31/40] i386/tdx: Disable SMM for TDX VMs

2022-08-02 Thread Xiaoyao Li

TDX doesn't support SMM and VMM cannot emulate SMM for TDX VMs because
VMM cannot manipulate TDX VM's memory.

Disable SMM for TDX VMs and error out if user requests to enable SMM.

Signed-off-by: Xiaoyao Li 
Acked-by: Gerd Hoffmann 
---
 target/i386/kvm/tdx.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 1de767a990ba..70c56b7ba32c 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -657,9 +657,17 @@ static Notifier tdx_machine_done_notify = {
 
 int tdx_kvm_init(MachineState *ms, Error **errp)
 {
+X86MachineState *x86ms = X86_MACHINE(ms);
 TdxGuest *tdx = (TdxGuest *)object_dynamic_cast(OBJECT(ms->cgs),
 TYPE_TDX_GUEST);
 
+if (x86ms->smm == ON_OFF_AUTO_AUTO) {
+x86ms->smm = ON_OFF_AUTO_OFF;
+} else if (x86ms->smm == ON_OFF_AUTO_ON) {
+error_setg(errp, "TDX VM doesn't support SMM");
+return -EINVAL;
+}
+
 if (!tdx_caps) {
 get_tdx_capabilities();
 }
-- 
2.27.0

Re: [PATCH v2 2/7] multifd: modifying 'migrate' qmp command to add multifd socket on particular src and dest pair

2022-08-02 Thread Markus Armbruster

Het Gala  writes:

> On 26/07/22 4:43 pm, Daniel P. Berrangé wrote:
>> On Thu, Jul 21, 2022 at 07:56:15PM +, Het Gala wrote:
>>> i) Modified the format of the qemu monitor command : 'migrate' by adding a 
>>> list,
>>> each element in the list consisting of multifd connection parameters: 
>>> source
>>> uri, destination uri and of the number of multifd channels between each 
>>> pair.
>>>
>>> ii) Information of all multifd connection parameters' list and length of the
>>>  list is stored in 'OutgoingMigrateParams' struct.
>>>
>>> Suggested-by: Manish Mishra 
>>> Signed-off-by: Het Gala 
>>> ---
>>>   migration/migration.c | 52 +
>>>   migration/socket.c| 60 ---
>>>   migration/socket.h| 19 +-
>>>   monitor/hmp-cmds.c|  1 +
>>>   qapi/migration.json   | 47 +
>>>   5 files changed, 160 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>> index 81185d4311..456247af8f 100644
>>> --- a/qapi/migration.json
>>> +++ b/qapi/migration.json
>>> @@ -1449,12 +1449,37 @@
>>>   ##
>>>   { 'command': 'migrate-continue', 'data': {'state': 'MigrationStatus'} }
>>>   +##
>>> +# @MigrateUriParameter:
>>> +#
>>> +# Information regarding which source interface is connected to which
>>> +# destination interface and number of multifd channels over each interface.
>>> +#
>>> +# @source-uri: uri of the source VM. Default port number is 0.
>>> +#
>>> +# @destination-uri: uri of the destination VM
>>> +#
>>> +# @multifd-channels: number of parallel multifd channels used to migrate 
>>> data
>>> +#for specific source-uri and destination-uri. Default 
>>> value
>>> +#in this case is 2 (Since 7.1)
>>> +#
>>> +##
>>> +{ 'struct' : 'MigrateUriParameter',
>>> +  'data' : { 'source-uri' : 'str',
>>> + 'destination-uri' : 'str',
>>> + '*multifd-channels' : 'uint8'} }
>>> +
>>>   ##
>>>   # @migrate:
>>>   #
>>>   # Migrates the current running guest to another Virtual Machine.
>>>   #
>>>   # @uri: the Uniform Resource Identifier of the destination VM
>>> +#   for migration thread
>>> +#
>>> +# @multi-fd-uri-list: list of pair of source and destination VM Uniform
>>> +# Resource Identifiers with number of multifd-channels
>>> +# for each pair
>>>   #
>>>   # @blk: do block migration (full disk copy)
>>>   #
>>> @@ -1474,20 +1499,32 @@
>>>   # 1. The 'query-migrate' command should be used to check migration's 
>>> progress
>>>   #and final result (this information is provided by the 'status' 
>>> member)
>>>   #
>>> -# 2. All boolean arguments default to false
>>> +# 2. The uri argument should have the Uniform Resource Identifier of 
>>> default
>>> +#destination VM. This connection will be bound to default network
>>>   #
>>> -# 3. The user Monitor's "detach" argument is invalid in QMP and should not
>>> +# 3. All boolean arguments default to false
>>> +#
>>> +# 4. The user Monitor's "detach" argument is invalid in QMP and should not
>>>   #be used
>>>   #
>>>   # Example:
>>>   #
>>> -# -> { "execute": "migrate", "arguments": { "uri": "tcp:0:4446" } }
>>> +# -> { "execute": "migrate",
>>> +#  "arguments": {
>>> +#  "uri": "tcp:0:4446",
>>> +#  "multi-fd-uri-list": [ { "source-uri": "tcp::6900",
>>> +#   "destination-uri": "tcp:0:4480",
>>> +#   "multifd-channels": 4},
>>> +# { "source-uri": "tcp:10.0.0.0: ",
>>> +#   "destination-uri": "tcp:11.0.0.0:7789",
>>> +#   "multifd-channels": 5} ] } }
>>>   # <- { "return": {} }
>>>   #
>>>   ##
>>>   { 'command': 'migrate',
>>> -  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool',
>>> -   '*detach': 'bool', '*resume': 'bool' } }
>>> +  'data': {'uri': 'str', '*multi-fd-uri-list': ['MigrateUriParameter'],
>>> +   '*blk': 'bool', '*inc': 'bool', '*detach': 'bool',
>>> +   '*resume': 'bool' } }
>>
>> Considering the existing migrate API from a QAPI design POV, I
>> think there are several significant flaws with it
>>
>> The use of URIs is the big red flag. It is basically a data encoding
>> scheme within a data encoding scheme.  QEMU code should be able to
>> directly work with the results from QAPI, without having todo a
>> second level of parsing.

Concur.

>> URIs made sense in the context of HMP or the QemuOpts CLI, but do not
>> make sense in QMP. We made a mistake in this respect when we first
>> introduced QMP and implemented 'migrate'.
>>
>> If we going to extend the migrate API I think we should stop using URIs
>> for the new fields, and instead define a QAPI discriminated union for
>> the different data transport backends we offer.
>>
>>   { 'enum': 'MigrateTransport',
>>

[PATCH v1 32/40] i386/tdx: Disable PIC for TDX VMs

2022-08-02 Thread Xiaoyao Li

Legacy PIC (8259) cannot be supported for TDX VMs since the TDX module
doesn't allow direct interrupt injection.  Using posted interrupts
for the PIC is not a viable option as the guest BIOS/kernel will not
do EOI for PIC IRQs, i.e. will leave the vIRR bit set.

Hence disable PIC for TDX VMs and error out if user wants PIC.

Signed-off-by: Xiaoyao Li 
Acked-by: Gerd Hoffmann 
---
 target/i386/kvm/tdx.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
index 70c56b7ba32c..2f317a6bb55b 100644
--- a/target/i386/kvm/tdx.c
+++ b/target/i386/kvm/tdx.c
@@ -668,6 +668,13 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
 return -EINVAL;
 }
 
+if (x86ms->pic == ON_OFF_AUTO_AUTO) {
+x86ms->pic = ON_OFF_AUTO_OFF;
+} else if (x86ms->pic == ON_OFF_AUTO_ON) {
+error_setg(errp, "TDX VM doesn't support PIC");
+return -EINVAL;
+}
+
 if (!tdx_caps) {
 get_tdx_capabilities();
 }
-- 
2.27.0

Re: [PATCH for-7.1] Revert "migration: Simplify unqueue_page()"

2022-08-02 Thread Thomas Huth


On 02/08/2022 10.47, Dr. David Alan Gilbert wrote:

* Thomas Huth (th...@redhat.com) wrote:

This reverts commit cfd66f30fb0f735df06ff4220e5000290a43dad3.

The simplification of unqueue_page() introduced a bug that sometimes
breaks migration on s390x hosts. Seems like there are still pages here
that do not have their dirty bit set.


I don't think it's about 'not having their dirty bit set' - it's
perfectly fine to have the bits clear (which indicates the page has
already been sent to the destination, sometime in between the page request
being sent from the destination and it being unqueued).


Ok, could you maybe simply drop that sentence from the commit description 
when picking up the patch? Or shall I resend a v2?


 Thomas

[PATCH v1 33/40] i386/tdx: Don't allow system reset for TDX VMs

2022-08-02 Thread Xiaoyao Li

TDX CPU state is protected and thus vCPU state can't be reset by the VMM.

Signed-off-by: Xiaoyao Li 
Acked-by: Gerd Hoffmann 
---
 target/i386/kvm/kvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 1545b6f870f5..8c282122ed67 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -5455,7 +5455,7 @@ bool kvm_has_waitpkg(void)
 
 bool kvm_arch_cpu_check_are_resettable(void)
 {
-return !sev_es_enabled();
+return !sev_es_enabled() && !is_tdx_vm();
 }
 
 #define ARCH_REQ_XCOMP_GUEST_PERM   0x1025
-- 
2.27.0

[PATCH] net/colo.c: Fix the pointer issue reported by Coverity.

2022-08-02 Thread Zhang Chen

When virtio-net-pci is enabled, guest network packets carry the
vnet_hdr. In COLO status, the primary VM's network packets may be
redirected to another VM, so filter-redirect needs to enable the
vnet_hdr flag at the same time for COLO-proxy to parse the original
network packet correctly. If this is misconfigured, vnet_hdr_len is
wrong when parsing the packet, and data+offset will point to the
wrong place.
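
The idea of checking the length before applying the offset can be
sketched in isolation as below. This is a simplified illustration, not
the function from the patch: toy_skip_vnet_hdr and the TOY_* constants
are made-up names, and the minimum size (Ethernet header plus one VLAN
tag plus vnet_hdr_len) follows the check added in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ETH_HLEN  14 /* destination MAC + source MAC + ethertype */
#define TOY_VLAN_HLEN 4  /* one 802.1Q tag */

/*
 * Return a pointer just past the vnet header, or NULL if the packet is
 * missing or too short for the offset to be trusted.
 */
static const uint8_t *toy_skip_vnet_hdr(const uint8_t *data, size_t size,
                                        size_t vnet_hdr_len)
{
    if (data == NULL) {
        return NULL;
    }
    /* Written as a subtraction so a huge vnet_hdr_len cannot wrap around. */
    if (vnet_hdr_len > size ||
        size - vnet_hdr_len < TOY_ETH_HLEN + TOY_VLAN_HLEN) {
        return NULL;
    }
    return data + vnet_hdr_len;
}

/* Self-check: a 64-byte packet with a 10-byte vnet header is accepted. */
static void toy_vnet_self_check(void)
{
    static const uint8_t buf[64];

    assert(toy_skip_vnet_hdr(buf, sizeof(buf), 10) == buf + 10);
    assert(toy_skip_vnet_hdr(NULL, sizeof(buf), 10) == NULL);
}
```

The point of the ordering is that the pointer is only advanced once the
claimed header length has been checked against the packet size.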

Signed-off-by: Zhang Chen 
---
 net/colo.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/net/colo.c b/net/colo.c
index 6b0ff562ad..dfb15b4c14 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -44,21 +44,25 @@ int parse_packet_early(Packet *pkt)
 {
 int network_length;
 static const uint8_t vlan[] = {0x81, 0x00};
-uint8_t *data = pkt->data + pkt->vnet_hdr_len;
+uint8_t *data = pkt->data;
 uint16_t l3_proto;
 ssize_t l2hdr_len;
 
 if (data == NULL) {
-trace_colo_proxy_main_vnet_info("This packet is not parsed correctly, "
-"pkt->vnet_hdr_len", 
pkt->vnet_hdr_len);
+trace_colo_proxy_main("COLO-proxy got NULL data packet ");
 return 1;
 }
-l2hdr_len = eth_get_l2_hdr_length(data);
 
-if (pkt->size < ETH_HLEN + pkt->vnet_hdr_len) {
-trace_colo_proxy_main("pkt->size < ETH_HLEN");
+/* Check the received vnet_hdr_len then add the offset */
+if (pkt->size < sizeof(struct eth_header) + sizeof(struct vlan_header)
++ pkt->vnet_hdr_len) {
+trace_colo_proxy_main_vnet_info("This packet may be load wrong "
+"pkt->vnet_hdr_len", 
pkt->vnet_hdr_len);
 return 1;
 }
+data += pkt->vnet_hdr_len;
+
+l2hdr_len = eth_get_l2_hdr_length(data);
 
 /*
  * TODO: support vlan.
-- 
2.25.1

[PATCH v1 36/40] i386/tdx: Don't synchronize guest tsc for TDs

2022-08-02 Thread Xiaoyao Li

From: Isaku Yamahata 

The TSC of TDs is not accessible and KVM doesn't allow access to
MSR_IA32_TSC for TDs. To avoid the assert() in kvm_get_tsc, make
kvm_synchronize_all_tsc() a noop for TDs.

Signed-off-by: Isaku Yamahata 
Reviewed-by: Connor Kuehl 
Signed-off-by: Xiaoyao Li 
---
 target/i386/kvm/kvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 8c282122ed67..8999f64eeaf1 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -270,7 +270,7 @@ void kvm_synchronize_all_tsc(void)
 {
 CPUState *cpu;
 
-if (kvm_enabled()) {
+if (kvm_enabled() && !is_tdx_vm()) {
 CPU_FOREACH(cpu) {
 run_on_cpu(cpu, do_kvm_synchronize_tsc, RUN_ON_CPU_NULL);
 }
-- 
2.27.0

Re: [PATCH] hw/nvme: Add helper functions for qid-db conversion

2022-08-02 Thread Jinhao Fan

at 2:02 PM, Klaus Jensen  wrote:

> On Jul 28 16:07, Jinhao Fan wrote:
>> With the introduction of shadow doorbell and ioeventfd, we need to do
>> frequent conversion between qid and its doorbell offset. The original
>> hard-coded calculation is confusing and error-prone. Add several helper
>> functions to do this task.
>> 
>> Signed-off-by: Jinhao Fan 
>> ---
>> hw/nvme/ctrl.c | 61 --
>> 1 file changed, 39 insertions(+), 22 deletions(-)
>> 
>> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
>> index 533ad14e7a..6116c0e660 100644
>> --- a/hw/nvme/ctrl.c
>> +++ b/hw/nvme/ctrl.c
>> @@ -487,6 +487,29 @@ static int nvme_check_cqid(NvmeCtrl *n, uint16_t cqid)
>> {
>> return cqid < n->conf_ioqpairs + 1 && n->cq[cqid] != NULL ? 0 : -1;
>> }
>> +static inline bool nvme_db_offset_is_cq(NvmeCtrl *n, hwaddr offset)
>> +{
>> +hwaddr stride = 4 << NVME_CAP_DSTRD(ldq_le_p(&n->bar.cap));
>> +return (offset / stride) & 1;
>> +}
> 
> This can be morphed into `(offset >> (2 + dstrd)) & 1` if I am not
> mistaken.
> 

Yes. But my current code looks more readable to me. Is it necessary to
change to `(offset >> (2 + dstrd)) & 1`?

>> +
>> +static inline uint16_t nvme_db_offset_to_qid(NvmeCtrl *n, hwaddr offset)
>> +{
>> +hwaddr stride = 4 << NVME_CAP_DSTRD(ldq_le_p(&n->bar.cap));
>> +return offset / (2 * stride);
>> +}
> 
> Same, should be able to do `offset >> (2 * dstrd + 1)`, no?

Same as above.

[PATCH v4 1/4] util/qemu-sockets: Replace the call to close a socket with closesocket()

2022-08-02 Thread Bin Meng

From: Bin Meng 

close() is a *nix function. It works on any file descriptor, and
sockets in *nix are an example of a file descriptor.

closesocket() is a Windows-specific function, which works only
specifically with sockets. Sockets on Windows do not use *nix-style
file descriptors, and socket() returns a handle to a kernel object
instead, so it must be closed with closesocket().

In QEMU there is already logic to handle this platform difference
in os-posix.h and os-win32.h:

  * closesocket maps to close on POSIX
  * closesocket maps to a wrapper that calls the real closesocket()
    on Windows

Replace the call to close a socket with closesocket() instead.
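
The POSIX half of that mapping can be sketched roughly as follows. This
is an illustration only and builds on POSIX hosts alone: the macro name
toy_closesocket and the helper toy_open_and_close are made up here and
are not the definitions from os-posix.h.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef _WIN32
/* On POSIX a socket is an ordinary file descriptor, so close() suffices. */
#define toy_closesocket(s) close(s)
#endif

/* Create a socket and release it through the portable name. */
static int toy_open_and_close(void)
{
    int sock = socket(AF_UNIX, SOCK_STREAM, 0);

    if (sock < 0) {
        return -1;
    }
    return toy_closesocket(sock);
}

/* Self-check: assumes the host allows creating an AF_UNIX socket. */
static void toy_socket_self_check(void)
{
    assert(toy_open_and_close() == 0);
}
```

Code written against the single name then compiles unchanged on both
kinds of host, which is why the patch switches the call sites rather
than adding #ifdefs around them.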

Signed-off-by: Bin Meng 
Reviewed-by: Marc-André Lureau 
---

(no changes since v1)

 util/qemu-sockets.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 13b5b197f9..0e2298278f 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -487,7 +487,7 @@ int inet_connect_saddr(InetSocketAddress *saddr, Error 
**errp)
 
 if (ret < 0) {
 error_setg_errno(errp, errno, "Unable to set KEEPALIVE");
-close(sock);
+closesocket(sock);
 return -1;
 }
 }
@@ -1050,7 +1050,7 @@ static int unix_connect_saddr(UnixSocketAddress *saddr, 
Error **errp)
 return sock;
 
  err:
-close(sock);
+closesocket(sock);
 return -1;
 }
 
-- 
2.34.1

[PATCH v4 4/4] tests/unit: Update test-io-channel-socket.c for Windows

2022-08-02 Thread Bin Meng

From: Bin Meng 

Change to dynamically include the test cases by checking AF_UNIX
availability using a new helper socket_check_afunix_support().
With such changes testing on a Windows host can be covered as well.

Signed-off-by: Bin Meng 
---

Changes in v4:
- introduce a new helper socket_check_afunix_support() to runtime-check
  the availability of AF_UNIX socket, and skip those appropriately

Changes in v2:
- new patch: tests/unit: Update test-io-channel-socket.c for Windows

 tests/unit/socket-helpers.h |  9 +++
 tests/unit/socket-helpers.c | 16 +
 tests/unit/test-io-channel-socket.c | 37 ++---
 3 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/tests/unit/socket-helpers.h b/tests/unit/socket-helpers.h
index 512a004811..ed8477ceb3 100644
--- a/tests/unit/socket-helpers.h
+++ b/tests/unit/socket-helpers.h
@@ -32,4 +32,13 @@
  */
 int socket_check_protocol_support(bool *has_ipv4, bool *has_ipv6);
 
+/*
+ * @has_afunix: set to true on return if unix socket support is available
+ *
+ * Check whether unix domain socket support is available for use.
+ * On success, @has_afunix will be set to indicate whether AF_UNIX protocol
+ * is available.
+ */
+void socket_check_afunix_support(bool *has_afunix);
+
 #endif
diff --git a/tests/unit/socket-helpers.c b/tests/unit/socket-helpers.c
index 5af4de513b..eecadf3a3c 100644
--- a/tests/unit/socket-helpers.c
+++ b/tests/unit/socket-helpers.c
@@ -154,3 +154,19 @@ int socket_check_protocol_support(bool *has_ipv4, bool 
*has_ipv6)
 
 return 0;
 }
+
+void socket_check_afunix_support(bool *has_afunix)
+{
+int fd;
+
+fd = socket(PF_UNIX, SOCK_STREAM, 0);
+closesocket(fd);
+
+#ifdef _WIN32
+*has_afunix = (fd != (int)INVALID_SOCKET);
+#else
+*has_afunix = (fd >= 0);
+#endif
+
+return;
+}
diff --git a/tests/unit/test-io-channel-socket.c 
b/tests/unit/test-io-channel-socket.c
index 6713886d02..b36a5d972a 100644
--- a/tests/unit/test-io-channel-socket.c
+++ b/tests/unit/test-io-channel-socket.c
@@ -179,10 +179,12 @@ static void test_io_channel(bool async,
 test_io_channel_setup_async(listen_addr, connect_addr,
 &srv, &src, &dst);
 
+#ifndef _WIN32
 g_assert(!passFD ||
  qio_channel_has_feature(src, QIO_CHANNEL_FEATURE_FD_PASS));
 g_assert(!passFD ||
  qio_channel_has_feature(dst, QIO_CHANNEL_FEATURE_FD_PASS));
+#endif
 g_assert(qio_channel_has_feature(src, QIO_CHANNEL_FEATURE_SHUTDOWN));
 g_assert(qio_channel_has_feature(dst, QIO_CHANNEL_FEATURE_SHUTDOWN));
 
@@ -206,10 +208,12 @@ static void test_io_channel(bool async,
 test_io_channel_setup_async(listen_addr, connect_addr,
 &srv, &src, &dst);
 
+#ifndef _WIN32
 g_assert(!passFD ||
  qio_channel_has_feature(src, QIO_CHANNEL_FEATURE_FD_PASS));
 g_assert(!passFD ||
  qio_channel_has_feature(dst, QIO_CHANNEL_FEATURE_FD_PASS));
+#endif
 g_assert(qio_channel_has_feature(src, QIO_CHANNEL_FEATURE_SHUTDOWN));
 g_assert(qio_channel_has_feature(dst, QIO_CHANNEL_FEATURE_SHUTDOWN));
 
@@ -236,10 +240,12 @@ static void test_io_channel(bool async,
 test_io_channel_setup_sync(listen_addr, connect_addr,
&srv, &src, &dst);
 
+#ifndef _WIN32
 g_assert(!passFD ||
  qio_channel_has_feature(src, QIO_CHANNEL_FEATURE_FD_PASS));
 g_assert(!passFD ||
  qio_channel_has_feature(dst, QIO_CHANNEL_FEATURE_FD_PASS));
+#endif
 g_assert(qio_channel_has_feature(src, QIO_CHANNEL_FEATURE_SHUTDOWN));
 g_assert(qio_channel_has_feature(dst, QIO_CHANNEL_FEATURE_SHUTDOWN));
 
@@ -263,10 +269,12 @@ static void test_io_channel(bool async,
 test_io_channel_setup_sync(listen_addr, connect_addr,
&srv, &src, &dst);
 
+#ifndef _WIN32
 g_assert(!passFD ||
  qio_channel_has_feature(src, QIO_CHANNEL_FEATURE_FD_PASS));
 g_assert(!passFD ||
  qio_channel_has_feature(dst, QIO_CHANNEL_FEATURE_FD_PASS));
+#endif
 g_assert(qio_channel_has_feature(src, QIO_CHANNEL_FEATURE_SHUTDOWN));
 g_assert(qio_channel_has_feature(dst, QIO_CHANNEL_FEATURE_SHUTDOWN));
 
@@ -367,7 +375,6 @@ static void test_io_channel_ipv6_async(void)
 }
 
 
-#ifndef _WIN32
 static void test_io_channel_unix(bool async)
 {
 SocketAddress *listen_addr = g_new0(SocketAddress, 1);
@@ -398,6 +405,7 @@ static void test_io_channel_unix_async(void)
 return test_io_channel_unix(true);
 }
 
+#ifndef _WIN32
 static void test_io_channel_unix_fd_pass(void)
 {
 SocketAddress *listen_addr = g_new0(SocketAddress, 1);
@@ -491,6 +499,7 @@ static void test_io_channel_unix_fd_pass(void)
 }
 g_free(fdrecv);
 }
+#endif /* _WIN32 */
 
 static void test_io_channel_unix_listen_cleanup(void)
 {
@@ -522,9
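
The runtime probe added to socket-helpers.c above can be sketched in
isolation. This is a minimal sketch, assuming a POSIX or Winsock host (on
Windows, WSAStartup() is assumed to have been called already);
afunix_probe() is an illustrative name. Unlike the helper in the patch, this
variant closes the descriptor only when creation succeeded.

```c
#include <stdbool.h>
#ifdef _WIN32
#include <winsock2.h>
#else
#include <sys/socket.h>
#include <unistd.h>
#endif

/* Return true if the host can create an AF_UNIX stream socket. */
static bool afunix_probe(void)
{
#ifdef _WIN32
    SOCKET s = socket(AF_UNIX, SOCK_STREAM, 0);

    if (s == INVALID_SOCKET) {
        return false;
    }
    closesocket(s);
    return true;
#else
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0) {
        return false;
    }
    close(fd);
    return true;
#endif
}
```

A test program would call afunix_probe() once at start-up and register the
unix socket test cases only when it returns true.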

[PATCH] docs/about/removed-features: Move the -soundhw into the right section

2022-08-02 Thread Thomas Huth

The note about the removal of '-soundhw' has been accidentally added
to the section of removed "linux-user mode CPUs" ... it should reside
in the section about removed "System emulator command line arguments"
instead.

Fixes: 039a68373c ("introduce -audio as a replacement for -soundhw")
Signed-off-by: Thomas Huth 
---
 docs/about/removed-features.rst | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index c7b9dadd5d..925e22016f 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -396,6 +396,13 @@ Use ``-display sdl`` instead.
 
 Use ``-display curses`` instead.
 
+Creating sound card devices using ``-soundhw`` (removed in 7.1)
+'''
+
+Sound card devices should be created using ``-device`` or ``-audio``.
+The exception is ``pcspk`` which can be activated using ``-machine
+pcspk-audiodev=``.
+
 
 QEMU Machine Protocol (QMP) commands
 
@@ -681,13 +688,6 @@ tripped up the CI testing and was suspected to be quite 
broken. For that
 reason the maintainers strongly suspected no one actually used it.
 
 
-Creating sound card devices using ``-soundhw`` (removed in 7.1)
-'''
-
-Sound card devices should be created using ``-device`` or ``-audio``.
-The exception is ``pcspk`` which can be activated using ``-machine
-pcspk-audiodev=``.
-
 TCG introspection features
 --
 
-- 
2.31.1

Re: [PATCH for-7.1] Revert "migration: Simplify unqueue_page()"

2022-08-02 Thread Dr. David Alan Gilbert

* Thomas Huth (th...@redhat.com) wrote:
> On 02/08/2022 10.47, Dr. David Alan Gilbert wrote:
> > * Thomas Huth (th...@redhat.com) wrote:
> > > This reverts commit cfd66f30fb0f735df06ff4220e5000290a43dad3.
> > > 
> > > The simplification of unqueue_page() introduced a bug that sometimes
> > > breaks migration on s390x hosts. Seems like there are still pages here
> > > that do not have their dirty bit set.
> > 
> > I don't think it's about 'not having their dirty bit set' - it's
> > perfectly fine to have the bits clear (which indicates the page has
> > already been sent to the destination, sometime inbetween the page request
> > being sent from the destination and it being unqueued).
> 
> Ok, could you maybe simply drop that sentence from the commit description
> when picking up the patch? Or shall I resend a v2?

Sure, I'll reword

>  Thomas
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH v2 0/3] Fix some coverity issues on VDUSE

2022-08-02 Thread Kevin Wolf

Am 06.07.2022 um 11:56 hat Xie Yongji geschrieben:
> This series fixes some issues reported by coverity.
> 
> Patch 1 fixes an incorrect function name.
> 
> Patch 2 fixes Coverity CID 1490224.
> 
> Patch 3 fixes Coverity CID 1490226, 1490223.
> 
> V1 to V2:
> - Drop the patch to fix Coverity CID 1490222, 1490227 [Markus]
> - Add some commit log to explain why we don't use g_strlcpy() [Markus]

Thanks, applied to the block branch.

Kevin

Re: [PATCH v7 05/14] qapi: net: add stream and dgram netdevs

2022-08-02 Thread Laurent Vivier


On 02/08/2022 10:37, Markus Armbruster wrote:

Laurent Vivier  writes:


...

diff --git a/qemu-options.hx b/qemu-options.hx
index 79e00916a11f..170117e1adf0 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2726,6 +2726,18 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
  "-netdev socket,id=str[,fd=h][,udp=host:port][,localaddr=host:port]\n"
  "configure a network backend to connect to another 
network\n"
  "using an UDP tunnel\n"
+"-netdev 
stream,id=str[,server=on|off],addr.type=inet,addr.host=host,addr.port=port\n"
+"-netdev stream,id=str[,server=on|off],addr.type=fd,addr.str=h\n"
+"configure a network backend to connect to another 
network\n"
+"using a socket connection in stream mode.\n"


From v6:

This part needs to match NetdevStreamOptions above.

Missing here: the optional members of InetSocketAddress: numeric, to,
ipv4, ...  Do we care?


At this patch level, no, because we decode them manually and not using 
socket_connect()/socket_listen(). But the doc should be updated for PATCH 13/14 as I move 
stream.c to QIO.




The next part needs to match NetdevDgramOptions above.




+"-netdev 
dgram,id=str,remote.type=inet,remote.host=maddr,remote.port=port[,local.type=inet,local.host=addr]\n"
+"-netdev 
dgram,id=str,remote.type=inet,remote.host=maddr,remote.port=port[,local.type=fd,local.str=h]\n"
+"configure a network backend to connect to a multicast maddr 
and port\n"
+"use ``local.host=addr`` to specify the host address to send 
packets from\n"


From v6:

I figure this covers table rows

   # @remote     @local   |  okay?
   # ---------------------+-------
   # multicast   absent   |  yes
   # multicast   present  |  yes

for remote.type=inet and any local.type.

What about remote.type=fd?


Multicast is only supported with remote.type=inet, not fd or unix.

In net_dgram_init(), we initiate a multicast connection if remote.type is inet and the
address type is multicast; otherwise it's a unicast connection.



+"-netdev 
dgram,id=str,local.type=inet,local.host=addr,local.port=port[,remote.type=inet,remote.host=addr,remote.port=port]\n"


From v6:

I figure this covers table rows

#absent  present |   yes
#not multicast   present |   yes

for *.type=inet.




+"-netdev dgram,id=str,local.type=fd,local.str=h\n"
+"configure a network backend to connect to another 
network\n"
+"using an UDP tunnel\n"


From v6:

I figure this covers table row

   #absent  present |   yes

for local.type=fd.

Together, they cover table row

#absent  present |   yes

for any local.type.  Good.

Table row

#not multicast   present |   yes

is only covered for *.type=inet.  Could either of the types be fd?


In v7, I've updated the table to include the case of fd:

=============  ========  =====
remote         local     okay?
=============  ========  =====
absent         absent    no
absent         not fd    no
absent         fd        yes
multicast      absent    yes
multicast      present   yes
not multicast  absent    no
not multicast  present   yes
=============  ========  =====

For local, fd is supported unless the table says otherwise.
Remote and local types must be the same (inet or unix); if local is fd, remote must not be
provided.
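
The v7 table can be expressed as a small predicate. This is a sketch of the
table rows only, with hypothetical enum and function names; it does not
model the additional constraints on matching inet/unix types, and it is not
the validation code in net_dgram_init().

```c
#include <stdbool.h>

enum remote_kind { REMOTE_ABSENT, REMOTE_MULTICAST, REMOTE_NOT_MULTICAST };
enum local_kind  { LOCAL_ABSENT, LOCAL_FD, LOCAL_NOT_FD };

/* Return true if the remote/local combination is marked "yes" in the table. */
static bool dgram_combination_ok(enum remote_kind r, enum local_kind l)
{
    switch (r) {
    case REMOTE_ABSENT:
        /* Without a remote address, only a pre-opened fd is usable. */
        return l == LOCAL_FD;
    case REMOTE_MULTICAST:
        /* Multicast works with or without a local address. */
        return true;
    case REMOTE_NOT_MULTICAST:
        /* A unicast remote needs some local endpoint. */
        return l != LOCAL_ABSENT;
    }
    return false;
}
```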


Thanks,
Laurent

Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions

2022-08-02 Thread Eric Auger

Hi Gavin,

On 8/2/22 08:45, Gavin Shan wrote:
> There are 3 highmem IO regions as below. They can be disabled in
> two situations: (a) The specific region is disabled by user. (b)
> The specific region doesn't fit in the PA space. However, the base
> address and highest_gpa are still updated no matter if the region
> is enabled or disabled. It's incorrectly incurring waste in the PA
> space.
If I am not wrong, highmem_redists and highmem_mmio are not user selectable.

Only highmem ecam depends on machine type & ACPI setup. But I would say
that in the server use case it is always set. So is that optimization really
needed?
>
> Improve address assignment for highmem IO regions to avoid the waste
> in the PA space by putting the logic into virt_memmap_fits().
>
> Signed-off-by: Gavin Shan 
> ---
>  hw/arm/virt.c | 54 +--
>  1 file changed, 31 insertions(+), 23 deletions(-)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 9633f822f3..bc0cd218f9 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1688,6 +1688,34 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
> *vms, int idx)
>  return arm_cpu_mp_affinity(idx, clustersz);
>  }
>  
> +static void virt_memmap_fits(VirtMachineState *vms, int index,
> + bool *enabled, hwaddr *base, int pa_bits)
> +{
> +hwaddr size = extended_memmap[index].size;
> +
> +/* The region will be disabled if its size isn't given */
> +if (!*enabled || !size) {
In which case do you have !size?
> +*enabled = false;
> +vms->memmap[index].base = 0;
> +vms->memmap[index].size = 0;
It looks dangerous to me to reset the region's base and size like that.
For instance, fdt_add_gic_node() will add dummy data to the DT.
> +return;
> +}
> +
> +/*
> + * Check if the memory region fits in the PA space. The memory map
> + * and highest_gpa are updated if it fits. Otherwise, it's disabled.
> + */
> +*enabled = (ROUND_UP(*base, size) + size <= BIT_ULL(pa_bits));
using a 'fits' local variable would make the code more obvious I think
> +if (*enabled) {
> +*base = ROUND_UP(*base, size);
> +vms->memmap[index].base = *base;
> +vms->memmap[index].size = size;
> +vms->highest_gpa = *base + size - 1;
> +
> + *base = *base + size;
> +}
> +}
> +
>  static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
>  {
>  MachineState *ms = MACHINE(vms);
> @@ -1744,37 +1772,17 @@ static void virt_set_memmap(VirtMachineState *vms, 
> int pa_bits)
>  vms->highest_gpa = memtop - 1;
>  
>  for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
> -hwaddr size = extended_memmap[i].size;
> -bool fits;
> -
> -base = ROUND_UP(base, size);
> -vms->memmap[i].base = base;
> -vms->memmap[i].size = size;
> -
> -/*
> - * Check each device to see if they fit in the PA space,
> - * moving highest_gpa as we go.
> - *
> - * For each device that doesn't fit, disable it.
> - */
> -fits = (base + size) <= BIT_ULL(pa_bits);
> -if (fits) {
> -vms->highest_gpa = base + size - 1;
> -}
> -

We could avoid running the code below when highmem is not set. We would need
to reset those flags though.

>  switch (i) {
>  case VIRT_HIGH_GIC_REDIST2:
> -vms->highmem_redists &= fits;
> +virt_memmap_fits(vms, i, &vms->highmem_redists, &base, pa_bits);
>  break;
>  case VIRT_HIGH_PCIE_ECAM:
> -vms->highmem_ecam &= fits;
> +virt_memmap_fits(vms, i, &vms->highmem_ecam, &base, pa_bits);
>  break;
>  case VIRT_HIGH_PCIE_MMIO:
> -vms->highmem_mmio &= fits;
> +virt_memmap_fits(vms, i, &vms->highmem_mmio, &base, pa_bits);
>  break;
>  }
> -
> -base += size;
>  }
>  
>  if (device_memory_size > 0) {
Thanks

Eric
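
To illustrate the suggestion about a local 'fits' variable, here is a
stand-alone sketch of the placement logic. The struct, the function names
and the choice to leave the output untouched on failure are assumptions made
for the example; this is not the virt.c code, and it deliberately does not
zero the region's base and size when the region is disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct region {
    uint64_t base;
    uint64_t size;
};

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
    assert(align && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/*
 * Try to place a region of 'size' bytes at the next naturally aligned
 * address at or above *next_base. On success, fill in *out, advance
 * *next_base and *highest, and return true. On failure nothing is
 * modified, so a disabled region consumes no address space.
 */
static bool place_high_region(uint64_t size, unsigned pa_bits,
                              uint64_t *next_base, uint64_t *highest,
                              struct region *out)
{
    uint64_t base, limit;
    bool fits;

    assert(pa_bits < 64);
    if (size == 0) {
        return false;
    }

    base = round_up_pow2(*next_base, size);
    limit = UINT64_C(1) << pa_bits;
    fits = base <= limit && size <= limit - base;

    if (fits) {
        out->base = base;
        out->size = size;
        *highest = base + size - 1;
        *next_base = base + size;
    }
    return fits;
}
```

A caller would combine the result with the user-visible switch, for example
`enabled = enabled && place_high_region(...)`, so that the "disabled by the
user" and "does not fit" cases stay separate.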

Re: [PATCH 1/1] vfio-user: update submodule to latest

2022-08-02 Thread Daniel P . Berrangé

On Mon, Aug 01, 2022 at 09:24:04PM -0400, Jagannathan Raman wrote:
> Update libvfio-user submodule to the latest
> 
> Signed-off-by: Jagannathan Raman 
> ---
>  subprojects/libvfio-user | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/subprojects/libvfio-user b/subprojects/libvfio-user
> index 0b28d20557..1305f161b7 16
> --- a/subprojects/libvfio-user
> +++ b/subprojects/libvfio-user
> @@ -1 +1 @@
> -Subproject commit 0b28d205572c80b568a1003db2c8f37ca333e4d7
> +Subproject commit 1305f161b7e0dd2c2a420c17efcb0bd49b94dad4

Only 3 changes in the submodule with this:

1305f161b7e0dd2c2a420c17efcb0bd49b94dad4 disable client-server test by default 
(#700)
36beb63be45ad1412562a98d9373a4c0bd91ab3d support for shadow ioeventfd (#698)
1c274027bb4f9d68eee846036e8d50dcde2fd7e9 improve README.md (#696)

That fixes the testing bug we see; the other change affects an API that
QEMU doesn't use. Overall looks safe for applying during soft freeze.

Reviewed-by: Daniel P. Berrangé 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH] hw/block/hd-geometry: Do not override specified bios-chs-trans

2022-08-02 Thread Kevin Wolf

Am 07.07.2022 um 22:40 hat Lev Kujawski geschrieben:
> For small disk images (<4 GiB), QEMU and SeaBIOS default to the
> LARGE/ECHS disk translation method, but it is not uncommon for other
> BIOS software to use LBA in these cases as well.  Some operating
> system boot loaders (e.g., NT 4) do not handle LARGE translations
> outside of fixed configurations.  See, e.g., Q154052:

I wonder if this means that we should just always use LBA by default
instead of using LARGE for smaller disks, or if this would break other
cases that are working well with the current default.

> "When starting an x86 based computer, Ntdetect.com retrieves and
> stores Interrupt 13 information. . . If the disk controller is using a
> 32 sector/64 head translation scheme, this boundary will be 1 GB. If
> the controller uses 63 sector/255 head translation [AUTHOR: i.e.,
> LBA], the limit will be 4 GB."
> 
> To accommodate these situations, hd_geometry_guess() now follows the
> disk translation specified by the user even when the ATA disk geometry
> is guessed.
> 
> hd_geometry_guess():
> * Only set the disk translation when translation is AUTO.
> * Show the soon-to-be active translation (*ptrans) in the trace rather
>   than what was guessed.
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/56
> Buglink: https://bugs.launchpad.net/qemu/+bug/1745312
> 
> Signed-off-by: Lev Kujawski 

Thanks, irrespective of my wondering above, the fix looks right, so I've
applied it to my block branch.

Kevin
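
The rule "only set the disk translation when translation is AUTO" amounts to
the following. This is a sketch with a hypothetical enum and function name,
not the hd_geometry_guess() code itself.

```c
/* Illustrative stand-in for the BIOS translation modes discussed above. */
enum chs_translation {
    TRANS_AUTO,
    TRANS_NONE,
    TRANS_LBA,
    TRANS_LARGE
};

/*
 * Pick the translation that will actually be active: an explicit user
 * choice always wins, and the guess is used only for AUTO.
 */
static enum chs_translation
effective_translation(enum chs_translation requested,
                      enum chs_translation guessed)
{
    return requested == TRANS_AUTO ? guessed : requested;
}
```

With this rule, a user who asks for LBA on a small disk keeps LBA even when
the guess would have been LARGE.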

Re: [PATCH 2/2] hw/arm/virt: Warn when high memory region is disabled

2022-08-02 Thread Eric Auger

Hi Gavin,

On 8/2/22 08:45, Gavin Shan wrote:
> When one specific high memory region is disabled due to the PA
> limit, it's better to warn the user about that. The warning messages
> help to identify the cause in some cases. For example, PCIe device
> that has large MMIO bar, to be covered by PCIE_MMIO high memory
> region, won't work properly if PCIE_MMIO high memory region is
> disabled due to the PA limit.
>
> Signed-off-by: Gavin Shan 
> ---
>  hw/arm/virt.c | 18 ++
>  1 file changed, 18 insertions(+)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index bc0cd218f9..c91756e33d 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1691,6 +1691,7 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
> *vms, int idx)
>  static void virt_memmap_fits(VirtMachineState *vms, int index,
>   bool *enabled, hwaddr *base, int pa_bits)
>  {
> +const char *region_name;
>  hwaddr size = extended_memmap[index].size;
>  
>  /* The region will be disabled if its size isn't given */
> @@ -1713,6 +1714,23 @@ static void virt_memmap_fits(VirtMachineState *vms, 
> int index,
>  vms->highest_gpa = *base + size - 1;
>  
>   *base = *base + size;
> +} else {
> +switch (index) {
> +case VIRT_HIGH_GIC_REDIST2:
> +region_name = "GIC_REDIST2";
> +break;
> +case VIRT_HIGH_PCIE_ECAM:
> +region_name = "PCIE_ECAM";
> +break;
> +case VIRT_HIGH_PCIE_MMIO:
> +region_name = "PCIE_MMIO";
> +break;
> +default:
> +region_name = "unknown";
> +}
when highmem is turned off I don't think we want those warnings because
it is obvious that highmem regions are not meant to be used.

On the other hand I am afraid some users may complain about warnings
that do not affect them. If you miss high MMIO don't you get a warning
on guest side?

Thanks

Eric
> +
> +warn_report("Disabled %s high memory region due to PA limit",
> +region_name);
>  }
>  }
>

Re: [PATCH v1 01/40] * HACK * linux-headers: Update headers to pull in TDX API changes

2022-08-02 Thread Daniel P . Berrangé

On Tue, Aug 02, 2022 at 03:47:11PM +0800, Xiaoyao Li wrote:
> Pull in recent TDX updates, which are not backwards compatible.
> 
> It's just to make this series runnable. It will be updated by script
> 
>   scripts/update-linux-headers.sh
> 
> once TDX support is upstreamed in linux kernel.

I saw a bunch of TDX support merged in 5.19:

commit 3a755ebcc2557e22b895b8976257f682c653db1d
Merge: 5b828263b180 c796f02162e4
Author: Linus Torvalds 
Date:   Mon May 23 17:51:12 2022 -0700

Merge tag 'x86_tdx_for_v5.19_rc1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull Intel TDX support from Borislav Petkov:
 "Intel Trust Domain Extensions (TDX) support.

  This is the Intel version of a confidential computing solution called
  Trust Domain Extensions (TDX). This series adds support to run the
  kernel as part of a TDX guest. It provides similar guest protections
  to AMD's SEV-SNP like guest memory and register state encryption,
  memory integrity protection and a lot more.

  Design-wise, it differs from AMD's solution considerably: it uses a
  software module which runs in a special CPU mode called (Secure
  Arbitration Mode) SEAM. As the name suggests, this module serves as
  sort of an arbiter which the confidential guest calls for services it
  needs during its lifetime.

  Just like AMD's SNP set, this series reworks and streamlines certain
  parts of x86 arch code so that this feature can be properly
  accomodated"

Is that sufficient for this patch, or is there more pending out of
tree that QEMU still depends on ?

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

[PATCH v4 for 7.2 00/22] virtio-gpio and various virtio cleanups

2022-08-02 Thread Alex Bennée

Hi,

This is an update to the previous series which fixes the last few
niggling CI failures I was seeing.

   Subject: [PATCH v3 for 7.2 00/21] virtio-gpio and various virtio cleanups
   Date: Tue, 26 Jul 2022 20:21:29 +0100
   Message-Id: <20220726192150.2435175-1-alex.ben...@linaro.org>

The CI failures were tricky to track down because they didn't occur
locally, but after patching to dump backtraces they all seem to involve
updates to virtio_set_status() as the machine was torn down. I think
the patch that switches all users to virtio_device_started(), along
with consistent checking of vhost_dev->started, stops this from
happening. The clean-up seems worthwhile in reducing boilerplate
anyway.

The following patches still need review:

  - tests/qtest: enable tests for virtio-gpio
  - tests/qtest: add a get_features op to vhost-user-test
  - tests/qtest: implement stub for VHOST_USER_GET_CONFIG
  - tests/qtest: add assert to catch bad features
  - tests/qtest: plain g_assert for VHOST_USER_F_PROTOCOL_FEATURES
  - tests/qtest: catch unhandled vhost-user messages
  - tests/qtest: use qos_printf instead of g_test_message
  - tests/qtest: pass stdout/stderr down to subtests
  - hw/virtio: move vhd->started check into helper and add FIXME
  - hw/virtio: move vm_running check to virtio_device_started
  - hw/virtio: add some vhost-user trace events
  - hw/virtio: log potentially buggy guest drivers
  - hw/virtio: fix some coding style issues
  - include/hw: document vhost_dev feature life-cycle
  - include/hw/virtio: more comment for VIRTIO_F_BAD_FEATURE
  - hw/virtio: fix vhost_user_read tracepoint
  - hw/virtio: handle un-configured shutdown in virtio-pci
  - hw/virtio: gracefully handle unset vhost_dev vdev
  - hw/virtio: incorporate backend features in features


Alex Bennée (20):
  hw/virtio: incorporate backend features in features
  hw/virtio: gracefully handle unset vhost_dev vdev
  hw/virtio: handle un-configured shutdown in virtio-pci
  hw/virtio: fix vhost_user_read tracepoint
  include/hw/virtio: more comment for VIRTIO_F_BAD_FEATURE
  include/hw: document vhost_dev feature life-cycle
  hw/virtio: fix some coding style issues
  hw/virtio: log potentially buggy guest drivers
  hw/virtio: add some vhost-user trace events
  hw/virtio: move vm_running check to virtio_device_started
  hw/virtio: move vhd->started check into helper and add FIXME
  tests/qtest: pass stdout/stderr down to subtests
  tests/qtest: add a timeout for subprocess_run_one_test
  tests/qtest: use qos_printf instead of g_test_message
  tests/qtest: catch unhandled vhost-user messages
  tests/qtest: plain g_assert for VHOST_USER_F_PROTOCOL_FEATURES
  tests/qtest: add assert to catch bad features
  tests/qtest: implement stub for VHOST_USER_GET_CONFIG
  tests/qtest: add a get_features op to vhost-user-test
  tests/qtest: enable tests for virtio-gpio

Viresh Kumar (2):
  hw/virtio: add boilerplate for vhost-user-gpio device
  hw/virtio: add vhost-user-gpio-pci boilerplate

 include/hw/virtio/vhost-user-gpio.h |  35 +++
 include/hw/virtio/vhost.h   |  15 +
 include/hw/virtio/virtio.h  |  12 +-
 tests/qtest/libqos/virtio-gpio.h|  35 +++
 hw/block/vhost-user-blk.c   |  10 +-
 hw/scsi/vhost-scsi.c|   4 +-
 hw/scsi/vhost-user-scsi.c   |   2 +-
 hw/virtio/vhost-user-fs.c   |   9 +-
 hw/virtio/vhost-user-gpio-pci.c |  69 +
 hw/virtio/vhost-user-gpio.c | 411 
 hw/virtio/vhost-user-i2c.c  |  10 +-
 hw/virtio/vhost-user-rng.c  |  10 +-
 hw/virtio/vhost-user-vsock.c|   8 +-
 hw/virtio/vhost-user.c  |  20 +-
 hw/virtio/vhost-vsock-common.c  |   3 +-
 hw/virtio/vhost-vsock.c |   8 +-
 hw/virtio/vhost.c   |  16 +-
 hw/virtio/virtio-pci.c  |   9 +-
 hw/virtio/virtio.c  |   7 +
 tests/qtest/libqos/virtio-gpio.c| 171 
 tests/qtest/libqos/virtio.c |   4 +-
 tests/qtest/qos-test.c  |   9 +-
 tests/qtest/vhost-user-test.c   | 175 ++--
 MAINTAINERS |   8 +
 hw/virtio/Kconfig   |   5 +
 hw/virtio/meson.build   |   2 +
 hw/virtio/trace-events  |   9 +
 tests/qtest/libqos/meson.build  |   1 +
 28 files changed, 1007 insertions(+), 70 deletions(-)
 create mode 100644 include/hw/virtio/vhost-user-gpio.h
 create mode 100644 tests/qtest/libqos/virtio-gpio.h
 create mode 100644 hw/virtio/vhost-user-gpio-pci.c
 create mode 100644 hw/virtio/vhost-user-gpio.c
 create mode 100644 tests/qtest/libqos/virtio-gpio.c

-- 
2.30.2

[PATCH v4 03/22] hw/virtio: handle un-configured shutdown in virtio-pci

2022-08-02 Thread Alex Bennée

The assert() protecting against leakage is a little aggressive and
causes needless crashes if a device is shut down without having been
configured. In this case no descriptors are lost because none have
been assigned.

Signed-off-by: Alex Bennée 
Message-Id: <20220726192150.2435175-9-alex.ben...@linaro.org>
---
 hw/virtio/virtio-pci.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 45327f0b31..5ce61f9b45 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -996,9 +996,14 @@ static int virtio_pci_set_guest_notifiers(DeviceState *d, 
int nvqs, bool assign)
 
 nvqs = MIN(nvqs, VIRTIO_QUEUE_MAX);
 
-/* When deassigning, pass a consistent nvqs value
- * to avoid leaking notifiers.
+/*
+ * When deassigning, pass a consistent nvqs value to avoid leaking
+ * notifiers. But first check we've actually been configured, exit
+ * early if we haven't.
  */
+if (!assign && !proxy->nvqs_with_notifiers) {
+return 0;
+}
 assert(assign || nvqs == proxy->nvqs_with_notifiers);
 
 proxy->nvqs_with_notifiers = nvqs;
-- 
2.30.2
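
The effect of the early exit can be seen in a reduced model. This is a
sketch only: `struct proxy_state` and set_guest_notifiers() are hypothetical
stand-ins that keep just the counter and the assertion from the hunk above,
with the notifier setup itself left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the proxy state; only the counter from the hunk is kept. */
struct proxy_state {
    int nvqs_with_notifiers;
};

static int set_guest_notifiers(struct proxy_state *p, int nvqs, bool assign)
{
    /* Deassigning a device that was never configured: nothing to undo. */
    if (!assign && !p->nvqs_with_notifiers) {
        return 0;
    }

    /* When deassigning, the count must match what was assigned earlier. */
    assert(assign || nvqs == p->nvqs_with_notifiers);

    p->nvqs_with_notifiers = nvqs;
    return 0;
}
```

Without the first check, a deassign call with a non-zero nvqs on a freshly
created proxy would trip the assertion even though no notifiers exist.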

[PATCH v4 02/22] hw/virtio: gracefully handle unset vhost_dev vdev

2022-08-02 Thread Alex Bennée

I've noticed asserts firing because we query the status of vdev after
a vhost connection is closed down. Rather than faulting on the NULL
dereference, just quietly return false.

Signed-off-by: Alex Bennée 
Message-Id: <20220726192150.2435175-8-alex.ben...@linaro.org>
---
 hw/virtio/vhost.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 0827d631c0..f758f177bb 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -306,7 +306,7 @@ static inline void vhost_dev_log_resize(struct vhost_dev 
*dev, uint64_t size)
 dev->log_size = size;
 }
 
-static int vhost_dev_has_iommu(struct vhost_dev *dev)
+static bool vhost_dev_has_iommu(struct vhost_dev *dev)
 {
 VirtIODevice *vdev = dev->vdev;
 
@@ -316,8 +316,12 @@ static int vhost_dev_has_iommu(struct vhost_dev *dev)
  * does not have IOMMU, there's no need to enable this feature
  * which may cause unnecessary IOTLB miss/update transactions.
  */
-return virtio_bus_device_iommu_enabled(vdev) &&
-   virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
+if (vdev) {
+return virtio_bus_device_iommu_enabled(vdev) &&
+virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
+} else {
+return false;
+}
 }
 
 static void *vhost_memory_map(struct vhost_dev *dev, hwaddr addr,
-- 
2.30.2
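
The same guard pattern, reduced to a self-contained sketch. The stub structs
and field names are invented for the example and only mirror the shape of
the check in the hunk above; they are not the QEMU types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stub types: just enough state to express the predicate. */
struct vdev_stub {
    bool bus_iommu_enabled;
    bool host_has_iommu_platform;
};

struct vhost_dev_stub {
    struct vdev_stub *vdev;     /* NULL once the connection is torn down */
};

/* Report IOMMU use, treating a missing vdev as "no IOMMU". */
static bool dev_has_iommu(const struct vhost_dev_stub *dev)
{
    const struct vdev_stub *vdev = dev->vdev;

    if (!vdev) {
        return false;
    }
    return vdev->bus_iommu_enabled && vdev->host_has_iommu_platform;
}
```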

[PATCH v4 07/22] hw/virtio: fix some coding style issues

2022-08-02 Thread Alex Bennée

Signed-off-by: Alex Bennée 
Acked-by: Jason Wang 
---
 hw/virtio/vhost-user.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index c0b50deaf2..b7c13e7e16 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -200,7 +200,7 @@ typedef struct {
 VhostUserRequest request;
 
 #define VHOST_USER_VERSION_MASK (0x3)
-#define VHOST_USER_REPLY_MASK   (0x1<<2)
+#define VHOST_USER_REPLY_MASK   (0x1 << 2)
 #define VHOST_USER_NEED_REPLY_MASK  (0x1 << 3)
 uint32_t flags;
 uint32_t size; /* the following payload size */
@@ -208,7 +208,7 @@ typedef struct {
 
 typedef union {
 #define VHOST_USER_VRING_IDX_MASK   (0xff)
-#define VHOST_USER_VRING_NOFD_MASK  (0x1<<8)
+#define VHOST_USER_VRING_NOFD_MASK  (0x1 << 8)
 uint64_t u64;
 struct vhost_vring_state state;
 struct vhost_vring_addr addr;
@@ -248,7 +248,8 @@ struct vhost_user {
 size_t region_rb_len;
 /* RAMBlock associated with a given region */
 RAMBlock **region_rb;
-/* The offset from the start of the RAMBlock to the start of the
+/*
+ * The offset from the start of the RAMBlock to the start of the
  * vhost region.
  */
 ram_addr_t*region_rb_offset;
-- 
2.30.2

Re: [PATCH v1 00/40] TDX QEMU support

2022-08-02 Thread Daniel P . Berrangé

On Tue, Aug 02, 2022 at 03:47:10PM +0800, Xiaoyao Li wrote:
> This is the first version without the RFC tag, since the last RFC got
> several Acked-bys. I hope more people and reviewers can help review it.
> 
> 
> This patch series aims to enable TDX support to allow creating and booting a
> TD (TDX VM) with QEMU. It needs to work with corresponding KVM patch [1].
> TDX related documents can be found in [2].
> 
> This series is also available on GitHub:
> 
> https://github.com/intel/qemu-tdx/tree/tdx-qemu-upstream-v1
> 
> To boot a TDX VM, it requires several changes/additional steps in the flow:
> 
>  1. specify the vm type KVM_X86_TDX_VM when creating VM with
> IOCTL(KVM_CREATE_VM);
>  2. initialize VM scope configuration before creating any VCPU;
>  3. initialize VCPU scope configuration;
>  4. initialize virtual firmware (TDVF) in guest private memory before
> vcpu running;
> 
> Besides, TDX VM needs to boot with TDVF (TDX virtual firmware) and currently
> upstream OVMF can serve as TDVF. This series adds the support of parsing TDVF,
> loading TDVF into guest's private memory and preparing TD HOB info for TDVF.
> 
> [1] KVM TDX basic feature support v7
> https://lore.kernel.org/all/cover.1656366337.git.isaku.yamah...@intel.com/
> 
> [2] 
> https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html
> 
> == Limitation and future work ==


> - CPU model
> 
>   We cannot create a TD with an arbitrary CPU model as we can for non-TDX VMs,
>   because only a subset of features can be configured for a TD.
>   
>   - It's recommended to use '-cpu host' to create TD;
>   - '+feature/-feature' might not work as expected;
> 
>   Future work: introduce a specific CPU model for TDs and enhance +/-features
>   for TDs.

Which features are incompatible with TDX ?

Presumably you have such a list, so that KVM can block them when
using '-cpu host' ? If so, we should be able to sanity check the
use of these features in QEMU for the named CPU models / feature
selection too.


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

[PATCH v4 10/22] hw/virtio: move vm_running check to virtio_device_started

2022-08-02 Thread Alex Bennée

All the boilerplate virtio code does (or at least should do) the same
thing: check whether the VM is running before attempting to start
VirtIO. Push the logic up into the common function to avoid
copy-and-paste mistakes.

Signed-off-by: Alex Bennée 
---
 include/hw/virtio/virtio.h   | 5 +++++
 hw/virtio/vhost-user-fs.c    | 6 +-----
 hw/virtio/vhost-user-i2c.c   | 6 +-----
 hw/virtio/vhost-user-rng.c   | 6 +-----
 hw/virtio/vhost-user-vsock.c | 6 +-----
 hw/virtio/vhost-vsock.c      | 6 +-----
 6 files changed, 10 insertions(+), 25 deletions(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 9bb2485415..74e7ad5a92 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -100,6 +100,7 @@ struct VirtIODevice
 VirtQueue *vq;
 MemoryListener listener;
 uint16_t device_id;
+/* @vm_running: current VM running state via virtio_vmstate_change() */
 bool vm_running;
 bool broken; /* device in invalid state, needs reset */
 bool use_disabled_flag; /* allow use of 'disable' flag when needed */
@@ -376,6 +377,10 @@ static inline bool virtio_device_started(VirtIODevice *vdev, uint8_t status)
 return vdev->started;
 }
 
+if (!vdev->vm_running) {
+return false;
+}
+
 return status & VIRTIO_CONFIG_S_DRIVER_OK;
 }
 
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index e513e4fdda..d2bebba785 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -122,11 +122,7 @@ static void vuf_stop(VirtIODevice *vdev)
 static void vuf_set_status(VirtIODevice *vdev, uint8_t status)
 {
 VHostUserFS *fs = VHOST_USER_FS(vdev);
-bool should_start = status & VIRTIO_CONFIG_S_DRIVER_OK;
-
-if (!vdev->vm_running) {
-should_start = false;
-}
+bool should_start = virtio_device_started(vdev, status);
 
 if (fs->vhost_dev.started == should_start) {
 return;
diff --git a/hw/virtio/vhost-user-i2c.c b/hw/virtio/vhost-user-i2c.c
index 6020eee093..b930cf6d5e 100644
--- a/hw/virtio/vhost-user-i2c.c
+++ b/hw/virtio/vhost-user-i2c.c
@@ -93,11 +93,7 @@ static void vu_i2c_stop(VirtIODevice *vdev)
 static void vu_i2c_set_status(VirtIODevice *vdev, uint8_t status)
 {
 VHostUserI2C *i2c = VHOST_USER_I2C(vdev);
-bool should_start = status & VIRTIO_CONFIG_S_DRIVER_OK;
-
-if (!vdev->vm_running) {
-should_start = false;
-}
+bool should_start = virtio_device_started(vdev, status);
 
 if (i2c->vhost_dev.started == should_start) {
 return;
diff --git a/hw/virtio/vhost-user-rng.c b/hw/virtio/vhost-user-rng.c
index 3a7bf8e32d..a9c1c4bc79 100644
--- a/hw/virtio/vhost-user-rng.c
+++ b/hw/virtio/vhost-user-rng.c
@@ -90,11 +90,7 @@ static void vu_rng_stop(VirtIODevice *vdev)
 static void vu_rng_set_status(VirtIODevice *vdev, uint8_t status)
 {
 VHostUserRNG *rng = VHOST_USER_RNG(vdev);
-bool should_start = status & VIRTIO_CONFIG_S_DRIVER_OK;
-
-if (!vdev->vm_running) {
-should_start = false;
-}
+bool should_start = virtio_device_started(vdev, status);
 
 if (rng->vhost_dev.started == should_start) {
 return;
diff --git a/hw/virtio/vhost-user-vsock.c b/hw/virtio/vhost-user-vsock.c
index 0f8ff99f85..22c1616ebd 100644
--- a/hw/virtio/vhost-user-vsock.c
+++ b/hw/virtio/vhost-user-vsock.c
@@ -55,11 +55,7 @@ const VhostDevConfigOps vsock_ops = {
 static void vuv_set_status(VirtIODevice *vdev, uint8_t status)
 {
 VHostVSockCommon *vvc = VHOST_VSOCK_COMMON(vdev);
-bool should_start = status & VIRTIO_CONFIG_S_DRIVER_OK;
-
-if (!vdev->vm_running) {
-should_start = false;
-}
+bool should_start = virtio_device_started(vdev, status);
 
 if (vvc->vhost_dev.started == should_start) {
 return;
diff --git a/hw/virtio/vhost-vsock.c b/hw/virtio/vhost-vsock.c
index 0338de892f..8031c164a5 100644
--- a/hw/virtio/vhost-vsock.c
+++ b/hw/virtio/vhost-vsock.c
@@ -70,13 +70,9 @@ static int vhost_vsock_set_running(VirtIODevice *vdev, int start)
 static void vhost_vsock_set_status(VirtIODevice *vdev, uint8_t status)
 {
 VHostVSockCommon *vvc = VHOST_VSOCK_COMMON(vdev);
-bool should_start = status & VIRTIO_CONFIG_S_DRIVER_OK;
+bool should_start = virtio_device_started(vdev, status);
 int ret;
 
-if (!vdev->vm_running) {
-should_start = false;
-}
-
 if (vvc->vhost_dev.started == should_start) {
 return;
 }
-- 
2.30.2
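
For readers following along, the consolidated check can be modelled outside
QEMU roughly as below. This is only a stand-alone sketch, not the QEMU code:
DemoVirtIODevice and demo_device_started() are hypothetical stand-ins, and
the use_started branch is an assumption based on the context lines of the
virtio.h hunk. Only VIRTIO_CONFIG_S_DRIVER_OK (value 4 in the VirtIO
specification) is taken as-is.

```c
#include <stdbool.h>
#include <stdint.h>

/* DRIVER_OK status bit as defined by the VirtIO specification. */
#define VIRTIO_CONFIG_S_DRIVER_OK 4

/* Hypothetical, trimmed-down stand-in for QEMU's VirtIODevice. */
typedef struct DemoVirtIODevice {
    bool use_started; /* assumed: device tracks its own 'started' flag */
    bool started;
    bool vm_running;  /* updated whenever the VM run state changes */
} DemoVirtIODevice;

/* A device only counts as started if the VM runs and DRIVER_OK is set. */
static inline bool demo_device_started(const DemoVirtIODevice *vdev,
                                       uint8_t status)
{
    if (vdev->use_started) {
        return vdev->started;
    }

    if (!vdev->vm_running) {
        return false;
    }

    return status & VIRTIO_CONFIG_S_DRIVER_OK;
}
```

With the check in one place, each set_status callback only has to compare
the helper's result against its own vhost_dev.started flag.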

[PATCH v4 09/22] hw/virtio: add some vhost-user trace events

2022-08-02 Thread Alex Bennée

These are useful for tracing the lifetime of vhost-user connections.

Signed-off-by: Alex Bennée 
---
 hw/virtio/vhost.c      | 6 ++++++
 hw/virtio/trace-events | 4 ++++
 2 files changed, 10 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index f758f177bb..5185c15295 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1477,6 +1477,8 @@ void vhost_dev_cleanup(struct vhost_dev *hdev)
 {
 int i;
 
+trace_vhost_dev_cleanup(hdev);
+
 for (i = 0; i < hdev->nvqs; ++i) {
 vhost_virtqueue_cleanup(hdev->vqs + i);
 }
@@ -1783,6 +1785,8 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
 /* should only be called after backend is connected */
 assert(hdev->vhost_ops);
 
+trace_vhost_dev_start(hdev, vdev->name);
+
 vdev->vhost_started = true;
 hdev->started = true;
 hdev->vdev = vdev;
@@ -1869,6 +1873,8 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
 /* should only be called after backend is connected */
 assert(hdev->vhost_ops);
 
+trace_vhost_dev_stop(hdev, vdev->name);
+
 if (hdev->vhost_ops->vhost_dev_start) {
 hdev->vhost_ops->vhost_dev_start(hdev, false);
 }
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 20af2e7ebd..887ca7afa8 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -8,6 +8,10 @@ vhost_region_add_section_aligned(const char *name, uint64_t gpa, uint64_t size,
 vhost_section(const char *name) "%s"
 vhost_reject_section(const char *name, int d) "%s:%d"
 vhost_iotlb_miss(void *dev, int step) "%p step %d"
+vhost_dev_cleanup(void *dev) "%p"
+vhost_dev_start(void *dev, const char *name) "%p:%s"
+vhost_dev_stop(void *dev, const char *name) "%p:%s"
+
 
 # vhost-user.c
 vhost_user_postcopy_end_entry(void) ""
-- 
2.30.2

[PATCH v4 04/22] hw/virtio: fix vhost_user_read tracepoint

2022-08-02 Thread Alex Bennée

As reads happen in the callback we were never seeing them. We only
really care about the header so move the tracepoint to when the header
is complete.

Fixes: 6ca6d8ee9d (hw/virtio: add vhost_user_[read|write] trace points)
Signed-off-by: Alex Bennée 
Acked-by: Jason Wang 
Message-Id: <20220726192150.2435175-10-alex.ben...@linaro.org>
---
 hw/virtio/vhost-user.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 1936a44e82..c0b50deaf2 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -295,6 +295,8 @@ static int vhost_user_read_header(struct vhost_dev *dev, VhostUserMsg *msg)
 return -EPROTO;
 }
 
+trace_vhost_user_read(msg->hdr.request, msg->hdr.flags);
+
 return 0;
 }
 
@@ -544,8 +546,6 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
 }
 }
 
-trace_vhost_user_read(msg.hdr.request, msg.hdr.flags);
-
 return 0;
 }
 
-- 
2.30.2

[PATCH v4 05/22] include/hw/virtio: more comment for VIRTIO_F_BAD_FEATURE

2022-08-02 Thread Alex Bennée

When debugging a new vhost user you may be surprised to see
VHOST_USER_F_PROTOCOL getting squashed in the maze of
backend_features, acked_features and guest_features. Expand the
description here to help the next poor soul trying to work through
this.

Signed-off-by: Alex Bennée 

---
v3
  - s/vhost/vhost-user/
---
 include/hw/virtio/virtio.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index db1c0ddf6b..9bb2485415 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -24,7 +24,12 @@
 #include "qom/object.h"
 #include "hw/virtio/vhost.h"
 
-/* A guest should never accept this.  It implies negotiation is broken. */
+/*
+ * A guest should never accept this. It implies negotiation is broken
+ * between the driver frontend and the device. This bit is re-used for
+ * vhost-user to advertise VHOST_USER_F_PROTOCOL_FEATURES between QEMU
+ * and a vhost-user backend.
+ */
 #define VIRTIO_F_BAD_FEATURE   30
 
 #define VIRTIO_LEGACY_FEATURES ((0x1ULL << VIRTIO_F_BAD_FEATURE) | \
-- 
2.30.2

[PATCH v4 06/22] include/hw: document vhost_dev feature life-cycle

2022-08-02 Thread Alex Bennée

Try to explicitly document the various states of the feature bits as
they relate to the vhost_dev structure. Importantly, backend_features
can advertise things like VHOST_USER_F_PROTOCOL_FEATURES, which is
never exposed to the driver and is only present in the vhost-user
feature negotiation.

Signed-off-by: Alex Bennée 
Acked-by: Jason Wang 
---
 include/hw/virtio/vhost.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index a346f23d13..586c5457e2 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -86,8 +86,11 @@ struct vhost_dev {
 /* if non-zero, minimum required value for max_queues */
 int num_queues;
 uint64_t features;
+/** @acked_features: final set of negotiated features */
 uint64_t acked_features;
+/** @backend_features: backend specific feature bits */
 uint64_t backend_features;
+/** @protocol_features: final negotiated protocol features */
 uint64_t protocol_features;
 uint64_t max_queues;
 uint64_t backend_cap;
-- 
2.30.2

[PATCH v4 18/22] tests/qtest: plain g_assert for VHOST_USER_F_PROTOCOL_FEATURES

2022-08-02 Thread Alex Bennée

checkpatch.pl warns that non-plain asserts should be avoided so
convert the check to a plain g_assert.

Signed-off-by: Alex Bennée 
---
 tests/qtest/vhost-user-test.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index f2c19839e0..4af031c971 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -983,8 +983,7 @@ static void test_multiqueue(void *obj, void *arg, QGuestAllocator *alloc)
 static void vu_net_set_features(TestServer *s, CharBackend *chr,
 VhostUserMsg *msg)
 {
-g_assert_cmpint(msg->payload.u64 &
-(0x1ULL << VHOST_USER_F_PROTOCOL_FEATURES), !=, 0ULL);
+g_assert(msg->payload.u64 & (0x1ULL << VHOST_USER_F_PROTOCOL_FEATURES));
 if (s->test_flags == TEST_FLAGS_DISCONNECT) {
 qemu_chr_fe_disconnect(chr);
 s->test_flags = TEST_FLAGS_BAD;
-- 
2.30.2

Re: [PATCH] qemu-iotests: Discard stderr when probing devices

2022-08-02 Thread Kevin Wolf

Am 05.06.2022 um 16:57 hat Cole Robinson geschrieben:
> ./configure --enable-modules --enable-smartcard \
> --target-list=x86_64-softmmu,s390x-softmmu
> make
> cd build
> QEMU_PROG=`pwd`/s390x-softmmu/qemu-system-s390x \
> ../tests/check-block.sh qcow2
> ...
> --- /home/crobinso/src/qemu/tests/qemu-iotests/127.out
> +++ /home/crobinso/src/qemu/build/tests/qemu-iotests/scratch/127.out.bad
> @@ -1,4 +1,18 @@
>  QA output created by 127
> +Failed to open module: /home/crobinso/src/qemu/build/hw-usb-smartcard.so: undefined symbol: ccid_card_ccid_attach
> ...
> --- /home/crobinso/src/qemu/tests/qemu-iotests/267.out
> +++ /home/crobinso/src/qemu/build/tests/qemu-iotests/scratch/267.out.bad
> @@ -1,4 +1,11 @@
>  QA output created by 267
> +Failed to open module: /home/crobinso/src/qemu/build/hw-usb-smartcard.so: undefined symbol: ccid_card_ccid_attach
> 
> The stderr spew is its own known issue, but seems like iotests should
> be discarding stderr in this case.
> 
> Signed-off-by: Cole Robinson 

Oops, that took a while on my side... Thanks, applied to the block
branch.

By the way, putting diffs in the commit message is a great way to
confuse 'git am'. :-) Indenting your test scenario fixed it for me.

Kevin

[PATCH v4 08/22] hw/virtio: log potentially buggy guest drivers

2022-08-02 Thread Alex Bennée

If the guest driver attempts to use the UNUSED(30) bit it is
potentially buggy, as section 6.3 "Legacy Interface: Reserved Feature
Bits" states that it "SHOULD NOT be negotiated". For now just log this
guest error.

Signed-off-by: Alex Bennée 
Acked-by: Jason Wang 
---
 hw/virtio/virtio.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 5d607aeaa0..97a6307c0f 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2980,6 +2980,13 @@ int virtio_set_features(VirtIODevice *vdev, uint64_t val)
 if (vdev->status & VIRTIO_CONFIG_S_FEATURES_OK) {
 return -EINVAL;
 }
+
+if (val & (1ull << VIRTIO_F_BAD_FEATURE)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: guest driver for %s has enabled UNUSED(30) feature bit!\n",
+  __func__, vdev->name);
+}
+
 ret = virtio_set_features_nocheck(vdev, val);
 if (virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX)) {
 /* VIRTIO_RING_F_EVENT_IDX changes the size of the caches.  */
-- 
2.30.2

[PATCH v4 14/22] tests/qtest: pass stdout/stderr down to subtests

2022-08-02 Thread Alex Bennée

When trying to work out what the virtio-net-tests were doing it was
hard because g_test_trap_subprocess redirects all output to
/dev/null. Lift this restriction by using the appropriate flags so you
can see something similar to what the vhost-user-blk tests show when
running.

Signed-off-by: Alex Bennée 
Acked-by: Thomas Huth 
Message-Id: <20220407150042.2338562-1-alex.ben...@linaro.org>

---
v2
  - keep dumping of CLI behind the g_test_verbose flag
v4
  - fix overly long line
---
 tests/qtest/qos-test.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/qos-test.c b/tests/qtest/qos-test.c
index f97d0a08fd..01a9393399 100644
--- a/tests/qtest/qos-test.c
+++ b/tests/qtest/qos-test.c
@@ -185,7 +185,9 @@ static void run_one_test(const void *arg)
 static void subprocess_run_one_test(const void *arg)
 {
 const gchar *path = arg;
-g_test_trap_subprocess(path, 0, 0);
+g_test_trap_subprocess(path, 0,
+   G_TEST_SUBPROCESS_INHERIT_STDOUT |
+   G_TEST_SUBPROCESS_INHERIT_STDERR);
 g_test_trap_assert_passed();
 }
 
-- 
2.30.2

[PATCH v4 01/22] hw/virtio: incorporate backend features in features

2022-08-02 Thread Alex Bennée

There are some extra bits used over a vhost-user connection which are
hidden from the device itself. We need to set them here to ensure we
enable things like the protocol extensions.

Currently net/vhost-user.c has its own inscrutable way of persisting
this data but it really should live in the core vhost_user code.

Signed-off-by: Alex Bennée 
Message-Id: <20220726192150.2435175-7-alex.ben...@linaro.org>
---
 hw/virtio/vhost-user.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 75b8df21a4..1936a44e82 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1460,7 +1460,14 @@ static int vhost_user_set_features(struct vhost_dev *dev,
  */
 bool log_enabled = features & (0x1ULL << VHOST_F_LOG_ALL);
 
-return vhost_user_set_u64(dev, VHOST_USER_SET_FEATURES, features,
+/*
+ * We need to include any extra backend only feature bits that
+ * might be needed by our device. Currently this includes the
+ * VHOST_USER_F_PROTOCOL_FEATURES bit for enabling protocol
+ * features.
+ */
+return vhost_user_set_u64(dev, VHOST_USER_SET_FEATURES,
+  features | dev->backend_features,
   log_enabled);
 }
 
-- 
2.30.2
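
The effect of the change can be shown with a small stand-alone sketch. It
assumes the usual bit numbers (VHOST_F_LOG_ALL is 26 and
VHOST_USER_F_PROTOCOL_FEATURES is 30); DemoVhostDev and
demo_features_to_send() are hypothetical names used only for this example.

```c
#include <stdint.h>

#define VHOST_F_LOG_ALL                26
#define VHOST_USER_F_PROTOCOL_FEATURES 30

/* Hypothetical stand-in holding only the field relevant here. */
typedef struct DemoVhostDev {
    uint64_t backend_features; /* bits visible only on the vhost-user link */
} DemoVhostDev;

/*
 * Merge the backend-only bits into the feature word that would be sent
 * with SET_FEATURES, so they are not lost during negotiation.
 */
static uint64_t demo_features_to_send(const DemoVhostDev *dev,
                                      uint64_t features)
{
    return features | dev->backend_features;
}
```

The device-visible features are left untouched; the extra bits are only
added at the point where the message to the backend is built.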

[PATCH v4 17/22] tests/qtest: catch unhandled vhost-user messages

2022-08-02 Thread Alex Bennée

We don't need to act on every message, but let's document the ones we
are expecting to consume so future tests don't get confused about
unhandled bits.

Signed-off-by: Alex Bennée 

---
v1
  - drop g_test_fail() when we get unexpected result, that just hangs
v4
  - include ring addresses in set_vring_addr output
---
 tests/qtest/vhost-user-test.c | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index 968113d591..f2c19839e0 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -358,12 +358,44 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
 }
 break;
 
+case VHOST_USER_SET_OWNER:
+/*
+ * We don't need to do anything here, the remote is just
+ * letting us know it is in charge. Just log it.
+ */
+qos_printf("set_owner: start of session\n");
+break;
+
 case VHOST_USER_GET_PROTOCOL_FEATURES:
 if (s->vu_ops->get_protocol_features) {
 s->vu_ops->get_protocol_features(s, chr, &msg);
 }
 break;
 
+case VHOST_USER_SET_PROTOCOL_FEATURES:
+/*
+ * We did set VHOST_USER_F_PROTOCOL_FEATURES so its valid for
+ * the remote end to send this. There is no handshake reply so
+ * just log the details for debugging.
+ */
+qos_printf("set_protocol_features: 0x%"PRIx64 "\n", msg.payload.u64);
+break;
+
+/*
+ * A real vhost-user backend would actually set the size and
+ * address of the vrings but we can simply report them.
+ */
+case VHOST_USER_SET_VRING_NUM:
+qos_printf("set_vring_num: %d/%d\n",
+   msg.payload.state.index, msg.payload.state.num);
+break;
+case VHOST_USER_SET_VRING_ADDR:
+qos_printf("set_vring_addr: 0x%"PRIx64"/0x%"PRIx64"/0x%"PRIx64"\n",
+   msg.payload.addr.avail_user_addr,
+   msg.payload.addr.desc_user_addr,
+   msg.payload.addr.used_user_addr);
+break;
+
 case VHOST_USER_GET_VRING_BASE:
 /* send back vring base to qemu */
 msg.flags |= VHOST_USER_REPLY_MASK;
@@ -428,7 +460,18 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
 qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
 break;
 
+case VHOST_USER_SET_VRING_ENABLE:
+/*
+ * Another case we ignore as we don't need to respond. With a
+ * fully functioning vhost-user we would enable/disable the
+ * vring monitoring.
+ */
+qos_printf("set_vring(%d)=%s\n", msg.payload.state.index,
+   msg.payload.state.num ? "enabled" : "disabled");
+break;
+
 default:
+qos_printf("vhost-user: un-handled message: %d\n", msg.request);
 break;
 }
 
-- 
2.30.2

Re: [PATCH] docs/about/removed-features: Move the -soundhw into the right section

2022-08-02 Thread Daniel P . Berrangé

On Tue, Aug 02, 2022 at 09:56:11AM +0200, Thomas Huth wrote:
> The note about the removal of '-soundhw' has been accidentally added
> to the section of removed "linux-user mode CPUs" ... it should reside
> in the section about removed "System emulator command line arguments"
> instead.
> 
> Fixes: 039a68373c ("introduce -audio as a replacement for -soundhw")
> Signed-off-by: Thomas Huth 
> ---
>  docs/about/removed-features.rst | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

[PATCH v4 21/22] tests/qtest: add a get_features op to vhost-user-test

2022-08-02 Thread Alex Bennée

As we expand this test for more virtio devices we will need to support
different feature sets. Add a mandatory op field to fetch the list of
features needed for the test itself.

Signed-off-by: Alex Bennée 
---
 tests/qtest/vhost-user-test.c | 37 +--
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index 61980bfc6a..fe46e28cf2 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -171,10 +171,11 @@ struct vhost_user_ops {
 const char *chr_opts);
 
 /* VHOST-USER commands. */
+uint64_t (*get_features)(TestServer *s);
 void (*set_features)(TestServer *s, CharBackend *chr,
-VhostUserMsg *msg);
+ VhostUserMsg *msg);
 void (*get_protocol_features)(TestServer *s,
-CharBackend *chr, VhostUserMsg *msg);
+  CharBackend *chr, VhostUserMsg *msg);
 };
 
 static const char *init_hugepagefs(void);
@@ -338,20 +339,22 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
 
 switch (msg.request) {
 case VHOST_USER_GET_FEATURES:
+/* Mandatory for tests to define get_features */
+g_assert(s->vu_ops->get_features);
+
 /* send back features to qemu */
 msg.flags |= VHOST_USER_REPLY_MASK;
 msg.size = sizeof(m.payload.u64);
-msg.payload.u64 = 0x1ULL << VHOST_F_LOG_ALL |
-0x1ULL << VHOST_USER_F_PROTOCOL_FEATURES;
-if (s->queues > 1) {
-msg.payload.u64 |= 0x1ULL << VIRTIO_NET_F_MQ;
-}
+
 if (s->test_flags >= TEST_FLAGS_BAD) {
 msg.payload.u64 = 0;
 s->test_flags = TEST_FLAGS_END;
+} else {
+msg.payload.u64 = s->vu_ops->get_features(s);
 }
-p = (uint8_t *) &msg;
-qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
+
+qemu_chr_fe_write_all(chr, (uint8_t *) &msg,
+  VHOST_USER_HDR_SIZE + msg.size);
 break;
 
 case VHOST_USER_SET_FEATURES:
@@ -993,8 +996,21 @@ static void test_multiqueue(void *obj, void *arg, QGuestAllocator *alloc)
 wait_for_rings_started(s, s->queues * 2);
 }
 
+
+static uint64_t vu_net_get_features(TestServer *s)
+{
+uint64_t features = 0x1ULL << VHOST_F_LOG_ALL |
+0x1ULL << VHOST_USER_F_PROTOCOL_FEATURES;
+
+if (s->queues > 1) {
+features |= 0x1ULL << VIRTIO_NET_F_MQ;
+}
+
+return features;
+}
+
 static void vu_net_set_features(TestServer *s, CharBackend *chr,
-VhostUserMsg *msg)
+VhostUserMsg *msg)
 {
 g_assert(msg->payload.u64 & (0x1ULL << VHOST_USER_F_PROTOCOL_FEATURES));
 if (s->test_flags == TEST_FLAGS_DISCONNECT) {
@@ -1023,6 +1039,7 @@ static struct vhost_user_ops g_vu_net_ops = {
 
 .append_opts = append_vhost_net_opts,
 
+.get_features = vu_net_get_features,
 .set_features = vu_net_set_features,
 .get_protocol_features = vu_net_get_protocol_features,
 };
-- 
2.30.2

[PATCH v4 13/22] hw/virtio: add vhost-user-gpio-pci boilerplate

2022-08-02 Thread Alex Bennée

From: Viresh Kumar 

This allows us to instantiate a vhost-user-gpio device as part of a PCI
bus. It is mostly boilerplate which looks pretty similar to the
vhost-user-fs-pci device.

Signed-off-by: Viresh Kumar 
Reviewed-by: Alex Bennée 
Message-Id: <5f560cab92d0d789b1c94295ec74b9952907d69d.1641987128.git.viresh.ku...@linaro.org>
Signed-off-by: Alex Bennée 

---
v4
  - tweak MAINTAINER
---
 hw/virtio/vhost-user-gpio-pci.c | 69 +
 MAINTAINERS |  2 +-
 hw/virtio/meson.build   |  1 +
 3 files changed, 71 insertions(+), 1 deletion(-)
 create mode 100644 hw/virtio/vhost-user-gpio-pci.c

diff --git a/hw/virtio/vhost-user-gpio-pci.c b/hw/virtio/vhost-user-gpio-pci.c
new file mode 100644
index 0000000000..b3028a24a1
--- /dev/null
+++ b/hw/virtio/vhost-user-gpio-pci.c
@@ -0,0 +1,69 @@
+/*
+ * Vhost-user gpio virtio device PCI glue
+ *
+ * Copyright (c) 2022 Viresh Kumar 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "hw/qdev-properties.h"
+#include "hw/virtio/vhost-user-gpio.h"
+#include "hw/virtio/virtio-pci.h"
+
+struct VHostUserGPIOPCI {
+VirtIOPCIProxy parent_obj;
+VHostUserGPIO vdev;
+};
+
+typedef struct VHostUserGPIOPCI VHostUserGPIOPCI;
+
+#define TYPE_VHOST_USER_GPIO_PCI "vhost-user-gpio-pci-base"
+
+DECLARE_INSTANCE_CHECKER(VHostUserGPIOPCI, VHOST_USER_GPIO_PCI,
+ TYPE_VHOST_USER_GPIO_PCI)
+
+static void vhost_user_gpio_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
+{
+VHostUserGPIOPCI *dev = VHOST_USER_GPIO_PCI(vpci_dev);
+DeviceState *vdev = DEVICE(&dev->vdev);
+
+vpci_dev->nvectors = 1;
+qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
+}
+
+static void vhost_user_gpio_pci_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
+PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
+k->realize = vhost_user_gpio_pci_realize;
+set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
+pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
+pcidev_k->device_id = 0; /* Set by virtio-pci based on virtio id */
+pcidev_k->revision = 0x00;
+pcidev_k->class_id = PCI_CLASS_COMMUNICATION_OTHER;
+}
+
+static void vhost_user_gpio_pci_instance_init(Object *obj)
+{
+VHostUserGPIOPCI *dev = VHOST_USER_GPIO_PCI(obj);
+
+virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+TYPE_VHOST_USER_GPIO);
+}
+
+static const VirtioPCIDeviceTypeInfo vhost_user_gpio_pci_info = {
+.base_name = TYPE_VHOST_USER_GPIO_PCI,
+.non_transitional_name = "vhost-user-gpio-pci",
+.instance_size = sizeof(VHostUserGPIOPCI),
+.instance_init = vhost_user_gpio_pci_instance_init,
+.class_init = vhost_user_gpio_pci_class_init,
+};
+
+static void vhost_user_gpio_pci_register(void)
+{
+virtio_pci_types_register(&vhost_user_gpio_pci_info);
+}
+
+type_init(vhost_user_gpio_pci_register);
diff --git a/MAINTAINERS b/MAINTAINERS
index 2c4749a110..bb526df674 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2102,7 +2102,7 @@ vhost-user-gpio
 M: Alex Bennée 
 R: Viresh Kumar 
 S: Maintained
-F: hw/virtio/vhost-user-gpio.c
+F: hw/virtio/vhost-user-gpio*
 F: include/hw/virtio/vhost-user-gpio.h
 
 virtio-crypto
diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
index 33c8e71fab..c14e3db10a 100644
--- a/hw/virtio/meson.build
+++ b/hw/virtio/meson.build
@@ -30,6 +30,7 @@ virtio_ss.add(when: 'CONFIG_VIRTIO_MEM', if_true: files('virtio-mem.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_USER_I2C', if_true: files('vhost-user-i2c.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_USER_RNG', if_true: files('vhost-user-rng.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_USER_GPIO', if_true: files('vhost-user-gpio.c'))
+virtio_ss.add(when: ['CONFIG_VIRTIO_PCI', 'CONFIG_VHOST_USER_GPIO'], if_true: files('vhost-user-gpio-pci.c'))
 
 virtio_pci_ss = ss.source_set()
 virtio_pci_ss.add(when: 'CONFIG_VHOST_VSOCK', if_true: files('vhost-vsock-pci.c'))
-- 
2.30.2

RE: [PATCH 0/1] Update vfio-user module to the latest

2022-08-02 Thread Zhang, Chen




> -Original Message-
> From: Qemu-devel  bounces+chen.zhang=intel@nongnu.org> On Behalf Of Jagannathan
> Raman
> Sent: Tuesday, August 2, 2022 9:24 AM
> To: qemu-devel@nongnu.org
> Cc: stefa...@gmail.com; berra...@redhat.com
> Subject: [PATCH 0/1] Update vfio-user module to the latest
> 
> Hi,
> 
> This patch updates the libvfio-user submodule to the latest.

Just a rough idea: why not depend on the Linux distribution for
libvfio-user.so?
It looks like there is no libvfio-user package in the distributions' repos.

Hi Thomas/Daniel:

For the RFC QEMU user space eBPF support,
https://lore.kernel.org/all/20220617073630.535914-6-chen.zh...@intel.com/T/
would it be more appropriate to introduce libubpf.so as a subproject,
like libvfio-user.so?

Thanks
Chen

> 
> Passed 'make check' & GitLab CI.
> 
> Thank you!
> --
> Jag
> 
> Jagannathan Raman (1):
>   vfio-user: update submodule to latest
> 
>  subprojects/libvfio-user | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --
> 2.20.1
>

[PATCH v4 12/22] hw/virtio: add boilerplate for vhost-user-gpio device

2022-08-02 Thread Alex Bennée

From: Viresh Kumar 

This creates the QEMU side of the vhost-user-gpio device which connects
to the remote daemon. It is based on the vhost-user-i2c code.

Signed-off-by: Viresh Kumar 
Reviewed-by: Alex Bennée 
Message-Id: <5390324a748194a21bc99b1538e19761a8c64092.1641987128.git.viresh.ku...@linaro.org>
[AJB: fixes for qtest, tweaks to feature bits]
Signed-off-by: Alex Bennée 
Cc: Vincent Whitchurch 

---
v2
  - set VIRTIO_F_VERSION_1
  - set VHOST_USER_F_PROTOCOL_FEATURES
  - terminate feature_bits with VHOST_INVALID_FEATURE_BIT
  - ensure vdev->backend_features set
  - ensure vhost_dev.acked_features set
v3
  - break out vhost_dev structure for code flow reasons
  - use the vhost-user-blk connection lifecycle code
  - follow ->parent_obj style for VHostUserGPIO object
  - add all feature bits supported by the rust-vmm backend
  - clean-up errp propagation to avoid local_err and use ERRP_GUARD
  - s/vhost_dev->features/vdev->guest_features/ when calling vhost_ack_features
  - drop VHOST_USER_F_PROTOCOL_FEATURES definition (pushed to vhost-user)
  - explicitly call vhost_set_vring_enable due to properly negotiated
    VHOST_USER_F_PROTOCOL_FEATURES
  - use virtio_device_started() check instead of open code.
  - update MAINTAINERS
---
 include/hw/virtio/vhost-user-gpio.h |  35 +++
 hw/virtio/vhost-user-gpio.c | 411 
 MAINTAINERS |   7 +
 hw/virtio/Kconfig   |   5 +
 hw/virtio/meson.build   |   1 +
 hw/virtio/trace-events  |   5 +
 6 files changed, 464 insertions(+)
 create mode 100644 include/hw/virtio/vhost-user-gpio.h
 create mode 100644 hw/virtio/vhost-user-gpio.c

diff --git a/include/hw/virtio/vhost-user-gpio.h b/include/hw/virtio/vhost-user-gpio.h
new file mode 100644
index 0000000000..4fe9aeecc0
--- /dev/null
+++ b/include/hw/virtio/vhost-user-gpio.h
@@ -0,0 +1,35 @@
+/*
+ * Vhost-user GPIO virtio device
+ *
+ * Copyright (c) 2021 Viresh Kumar 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef _QEMU_VHOST_USER_GPIO_H
+#define _QEMU_VHOST_USER_GPIO_H
+
+#include "hw/virtio/virtio.h"
+#include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-user.h"
+#include "standard-headers/linux/virtio_gpio.h"
+#include "chardev/char-fe.h"
+
+#define TYPE_VHOST_USER_GPIO "vhost-user-gpio-device"
+OBJECT_DECLARE_SIMPLE_TYPE(VHostUserGPIO, VHOST_USER_GPIO);
+
+struct VHostUserGPIO {
+/*< private >*/
+VirtIODevice parent_obj;
+CharBackend chardev;
+struct virtio_gpio_config config;
+struct vhost_virtqueue *vhost_vq;
+struct vhost_dev vhost_dev;
+VhostUserState vhost_user;
+VirtQueue *command_vq;
+VirtQueue *interrupt_vq;
+bool connected;
+/*< public >*/
+};
+
+#endif /* _QEMU_VHOST_USER_GPIO_H */
diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c
new file mode 100644
index 0000000000..8b40fe450c
--- /dev/null
+++ b/hw/virtio/vhost-user-gpio.c
@@ -0,0 +1,411 @@
+/*
+ * Vhost-user GPIO virtio device
+ *
+ * Copyright (c) 2022 Viresh Kumar 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/qdev-properties.h"
+#include "hw/virtio/virtio-bus.h"
+#include "hw/virtio/vhost-user-gpio.h"
+#include "qemu/error-report.h"
+#include "standard-headers/linux/virtio_ids.h"
+#include "trace.h"
+
+#define REALIZE_CONNECTION_RETRIES 3
+
+/* Features required from VirtIO */
+static const int feature_bits[] = {
+VIRTIO_F_VERSION_1,
+VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_RING_F_INDIRECT_DESC,
+VIRTIO_RING_F_EVENT_IDX,
+VIRTIO_GPIO_F_IRQ,
+VHOST_INVALID_FEATURE_BIT
+};
+
+static void vu_gpio_get_config(VirtIODevice *vdev, uint8_t *config)
+{
+VHostUserGPIO *gpio = VHOST_USER_GPIO(vdev);
+
+memcpy(config, &gpio->config, sizeof(gpio->config));
+}
+
+static int vu_gpio_config_notifier(struct vhost_dev *dev)
+{
+VHostUserGPIO *gpio = VHOST_USER_GPIO(dev->vdev);
+
+memcpy(dev->vdev->config, &gpio->config, sizeof(gpio->config));
+virtio_notify_config(dev->vdev);
+
+return 0;
+}
+
+const VhostDevConfigOps gpio_ops = {
+.vhost_dev_config_notifier = vu_gpio_config_notifier,
+};
+
+static int vu_gpio_start(VirtIODevice *vdev)
+{
+BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
+VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+VHostUserGPIO *gpio = VHOST_USER_GPIO(vdev);
+struct vhost_dev *vhost_dev = &gpio->vhost_dev;
+int ret, i;
+
+if (!k->set_guest_notifiers) {
+error_report("binding does not support guest notifiers");
+return -ENOSYS;
+}
+
+ret = vhost_dev_enable_notifiers(vhost_dev, vdev);
+if (ret < 0) {
+error_report("Error enabling host notifiers: %d", ret);
+return ret;
+}
+
+ret = k->set_guest_notifiers(qbus->parent, vhost_dev->nvqs, true);
+if (ret < 0) {
+error_report("Error binding guest notifier: %d", ret);
+goto err_host_notifiers;
+}
+
+

[PATCH v4 16/22] tests/qtest: use qos_printf instead of g_test_message

2022-08-02 Thread Alex Bennée

The vhost-user tests respawn qos-test as a standalone process. As a
result the gtester framework squashes all messages coming out of it,
which makes it hard to debug. As the test does not care about asserting
certain messages, just convert the tests to use qos_printf directly.

Signed-off-by: Alex Bennée 
---
 tests/qtest/qos-test.c        |  5 +++++
 tests/qtest/vhost-user-test.c | 13 +++++++------
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/tests/qtest/qos-test.c b/tests/qtest/qos-test.c
index d958ef4be3..b30bbb30f7 100644
--- a/tests/qtest/qos-test.c
+++ b/tests/qtest/qos-test.c
@@ -321,6 +321,11 @@ static void walk_path(QOSGraphNode *orig_path, int len)
 int main(int argc, char **argv, char** envp)
 {
 g_test_init(&argc, &argv, NULL);
+
+if (g_test_subprocess()) {
+qos_printf("qos_test running single test in subprocess\n");
+}
+
 if (g_test_verbose()) {
 qos_printf("ENVIRONMENT VARIABLES: {\n");
 for (char **env = envp; *env != 0; env++) {
diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index 8bf390be20..968113d591 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -26,6 +26,7 @@
 #include "libqos/virtio-pci.h"
 
 #include "libqos/malloc-pc.h"
+#include "libqos/qgraph_internal.h"
 #include "hw/virtio/virtio-net.h"
 
 #include "standard-headers/linux/vhost_types.h"
@@ -316,7 +317,7 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
 }
 
 if (size != VHOST_USER_HDR_SIZE) {
-g_test_message("Wrong message size received %d", size);
+qos_printf("%s: Wrong message size received %d\n", __func__, size);
 return;
 }
 
@@ -327,8 +328,8 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
 p += VHOST_USER_HDR_SIZE;
 size = qemu_chr_fe_read_all(chr, p, msg.size);
 if (size != msg.size) {
-g_test_message("Wrong message size received %d != %d",
-   size, msg.size);
+qos_printf("%s: Wrong message size received %d != %d\n",
+   __func__, size, msg.size);
 return;
 }
 }
@@ -450,7 +451,7 @@ static const char *init_hugepagefs(void)
 }
 
 if (access(path, R_OK | W_OK | X_OK)) {
-g_test_message("access on path (%s): %s", path, strerror(errno));
+qos_printf("access on path (%s): %s", path, strerror(errno));
 g_test_fail();
 return NULL;
 }
@@ -460,13 +461,13 @@ static const char *init_hugepagefs(void)
 } while (ret != 0 && errno == EINTR);
 
 if (ret != 0) {
-g_test_message("statfs on path (%s): %s", path, strerror(errno));
+qos_printf("statfs on path (%s): %s", path, strerror(errno));
 g_test_fail();
 return NULL;
 }
 
 if (fs.f_type != HUGETLBFS_MAGIC) {
-g_test_message("Warning: path not on HugeTLBFS: %s", path);
+qos_printf("Warning: path not on HugeTLBFS: %s", path);
 g_test_fail();
 return NULL;
 }
-- 
2.30.2

[PATCH v4 11/22] hw/virtio: move vhd->started check into helper and add FIXME

2022-08-02 Thread Alex Bennée

The `started` field is manipulated internally within the vhost code
except for one place, vhost-user-blk via f5b22d06fb (vhost: recheck
dev state in the vhost_migration_log routine). Mark that as a FIXME
because it introduces a potential race. I think the referenced fix
should be tracking its state locally.

Signed-off-by: Alex Bennée 
---
 include/hw/virtio/vhost.h  | 12 
 hw/block/vhost-user-blk.c  | 10 --
 hw/scsi/vhost-scsi.c   |  4 ++--
 hw/scsi/vhost-user-scsi.c  |  2 +-
 hw/virtio/vhost-user-fs.c  |  3 ++-
 hw/virtio/vhost-user-i2c.c |  4 ++--
 hw/virtio/vhost-user-rng.c |  4 ++--
 hw/virtio/vhost-user-vsock.c   |  2 +-
 hw/virtio/vhost-vsock-common.c |  3 ++-
 hw/virtio/vhost-vsock.c|  2 +-
 10 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 586c5457e2..61b957e927 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -94,6 +94,7 @@ struct vhost_dev {
 uint64_t protocol_features;
 uint64_t max_queues;
 uint64_t backend_cap;
+/* @started: is the vhost device started? */
 bool started;
 bool log_enabled;
 uint64_t log_size;
@@ -165,6 +166,17 @@ int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
  */
 void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev);
 
+/**
+ * vhost_dev_is_started() - report status of vhost device
+ * @hdev: common vhost_dev structure
+ *
+ * Return the started status of the vhost device
+ */
+static inline bool vhost_dev_is_started(struct vhost_dev *hdev)
+{
+return hdev->started;
+}
+
 /**
  * vhost_dev_start() - start the vhost device
  * @hdev: common vhost_dev structure
diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 9117222456..2bba42478d 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -229,7 +229,7 @@ static void vhost_user_blk_set_status(VirtIODevice *vdev, uint8_t status)
 return;
 }
 
-if (s->dev.started == should_start) {
+if (vhost_dev_is_started(&s->dev) == should_start) {
 return;
 }
 
@@ -286,7 +286,7 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 return;
 }
 
-if (s->dev.started) {
+if (vhost_dev_is_started(&s->dev)) {
 return;
 }
 
@@ -415,6 +415,12 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
  * the vhost migration code. If disconnect was caught there is an
  * option for the general vhost code to get the dev state without
  * knowing its type (in this case vhost-user).
+ *
+ * FIXME: this is sketchy to be reaching into vhost_dev
+ * now because we are forcing something that implies we
+ * have executed vhost_dev_stop() but that won't happen
+ * until vhost_user_blk_stop() gets called from the bh.
+ * Really this state check should be tracked locally.
  */
 s->dev.started = false;
 }
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 3059068175..bdf337a7a2 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -120,7 +120,7 @@ static void vhost_scsi_set_status(VirtIODevice *vdev, uint8_t val)
 start = false;
 }
 
-if (vsc->dev.started == start) {
+if (vhost_dev_is_started(&vsc->dev) == start) {
 return;
 }
 
@@ -147,7 +147,7 @@ static int vhost_scsi_pre_save(void *opaque)
 
 /* At this point, backend must be stopped, otherwise
  * it might keep writing to memory. */
-assert(!vsc->dev.started);
+assert(!vhost_dev_is_started(&vsc->dev));
 
 return 0;
 }
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index 1b2f7eed98..bc37317d55 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -49,7 +49,7 @@ static void vhost_user_scsi_set_status(VirtIODevice *vdev, uint8_t status)
 VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
 bool start = (status & VIRTIO_CONFIG_S_DRIVER_OK) && vdev->vm_running;
 
-if (vsc->dev.started == start) {
+if (vhost_dev_is_started(&vsc->dev) == start) {
 return;
 }
 
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index d2bebba785..ad0f91c607 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -20,6 +20,7 @@
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
 #include "qemu/error-report.h"
+#include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-user-fs.h"
 #include "monitor/monitor.h"
 #include "sysemu/sysemu.h"
@@ -124,7 +125,7 @@ static void vuf_set_status(VirtIODevice *vdev, uint8_t status)
 VHostUserFS *fs = VHOST_USER_FS(vdev);
 bool should_start = virtio_device_started(vdev, status);
 
-if (fs->vhost_dev.started == should_start) {
+if (vhost_dev_is_started(&fs->vhost_dev) == should

Re: [PATCH] main loop: add missing documentation links to GS/IO macros

2022-08-02 Thread Kevin Wolf

Am 09.06.2022 um 14:22 hat Emanuele Giuseppe Esposito geschrieben:
> If we go directly to GLOBAL_STATE_CODE, IO_CODE or IO_OR_GS_CODE
> definition, we just find that they "mark and check that the function
> is part of the {category} API".
> However, there is no definition on what {category} API is, they are
> in include/block/block-*.h
> Therefore, add a comment that refers to such documentation.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 

Thanks, applied to the block branch.

Kevin

[PATCH v4 20/22] tests/qtest: implement stub for VHOST_USER_GET_CONFIG

2022-08-02 Thread Alex Bennée

We don't implement the full solution because frankly none of the tests
need to at the moment. We may end up re-implementing libvhostuser in
the end.

Signed-off-by: Alex Bennée 
---
 tests/qtest/vhost-user-test.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index 4af031c971..61980bfc6a 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -79,6 +79,8 @@ typedef enum VhostUserRequest {
 VHOST_USER_SET_PROTOCOL_FEATURES = 16,
 VHOST_USER_GET_QUEUE_NUM = 17,
 VHOST_USER_SET_VRING_ENABLE = 18,
+VHOST_USER_GET_CONFIG = 24,
+VHOST_USER_SET_CONFIG = 25,
 VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -372,6 +374,17 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
 }
 break;
 
+case VHOST_USER_GET_CONFIG:
+/*
+ * Treat GET_CONFIG as a NOP and just reply and let the guest
+ * consider we have updated its memory. Tests currently don't
+ * require working configs.
+ */
+msg.flags |= VHOST_USER_REPLY_MASK;
+p = (uint8_t *) &msg;
+qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
+break;
+
 case VHOST_USER_SET_PROTOCOL_FEATURES:
 /*
  * We did set VHOST_USER_F_PROTOCOL_FEATURES so its valid for
-- 
2.30.2
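
The reply path of the GET_CONFIG stub above can be sketched in isolation.
This is a minimal illustration, not the test's code: ExampleMsg,
EXAMPLE_REPLY_MASK, EXAMPLE_HDR_SIZE and example_make_reply() are
hypothetical names, and the values assumed here (reply flag in bit 2 of
the flags word, a header of three 32-bit fields) reflect my reading of
the vhost-user message format rather than anything stated in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VHOST_USER_REPLY_MASK (assumed to be bit 2). */
#define EXAMPLE_REPLY_MASK (0x1u << 2)

/* Hypothetical message layout: fixed header followed by the payload. */
typedef struct ExampleMsg {
    uint32_t request;
    uint32_t flags;
    uint32_t size;          /* number of payload bytes after the header */
    uint8_t payload[256];
} ExampleMsg;

/* Header length: everything that precedes the payload. */
#define EXAMPLE_HDR_SIZE offsetof(ExampleMsg, payload)

/*
 * Turn a received request into its reply without touching the payload,
 * as the NOP stub does. Returns the number of bytes to write back.
 */
static size_t example_make_reply(ExampleMsg *msg)
{
    msg->flags |= EXAMPLE_REPLY_MASK;
    return EXAMPLE_HDR_SIZE + msg->size;
}
```

Echoing the request back with only the reply flag set is what lets the
guest side proceed even though no real config space is provided.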

[PATCH v4 19/22] tests/qtest: add assert to catch bad features

2022-08-02 Thread Alex Bennée

No device driver (which is what the qvirtio_ access functions
represent) should be setting UNUSED(30) in the feature space. Although
existing libqos users mask it out, let's ensure nothing sneaks through.

Signed-off-by: Alex Bennée 
---
 tests/qtest/libqos/virtio.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/qtest/libqos/virtio.c b/tests/qtest/libqos/virtio.c
index 09ec09b655..03056e5187 100644
--- a/tests/qtest/libqos/virtio.c
+++ b/tests/qtest/libqos/virtio.c
@@ -101,6 +101,8 @@ uint64_t qvirtio_get_features(QVirtioDevice *d)
 
 void qvirtio_set_features(QVirtioDevice *d, uint64_t features)
 {
+g_assert(!(features & QVIRTIO_F_BAD_FEATURE));
+
 d->features = features;
 d->bus->set_features(d, features);
 
-- 
2.30.2
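
The mask-then-assert pattern this patch enforces can be sketched as
follows. It is a standalone illustration under stated assumptions:
EXAMPLE_F_BAD_FEATURE is a hypothetical stand-in for
QVIRTIO_F_BAD_FEATURE and is assumed to be bit 30, going by the
"UNUSED(30)" wording in the commit message; the two helpers are not
libqos functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the bit a driver must never set is bit 30. */
#define EXAMPLE_F_BAD_FEATURE (UINT64_C(1) << 30)

/* What a well-behaved caller does: drop the bad bit before negotiating. */
static uint64_t example_mask_features(uint64_t offered)
{
    return offered & ~EXAMPLE_F_BAD_FEATURE;
}

/* What the setter now does: refuse to store a set that contains it. */
static void example_set_features(uint64_t *stored, uint64_t features)
{
    assert(!(features & EXAMPLE_F_BAD_FEATURE));
    *stored = features;
}
```

A caller that forgets the masking step would trip the assertion
immediately instead of silently negotiating a bogus feature.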

[PATCH v4 15/22] tests/qtest: add a timeout for subprocess_run_one_test

2022-08-02 Thread Alex Bennée

Hangs have been observed in the tests and currently we don't time out
if a subprocess hangs. Rectify that.

Signed-off-by: Alex Bennée 
Reviewed-by: Thomas Huth 

---
v3
  - expand timeout to 180 at Thomas' suggestion
v4
  - fix merge conflict with earlier patch
---
 tests/qtest/qos-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/qos-test.c b/tests/qtest/qos-test.c
index 01a9393399..d958ef4be3 100644
--- a/tests/qtest/qos-test.c
+++ b/tests/qtest/qos-test.c
@@ -185,7 +185,7 @@ static void run_one_test(const void *arg)
 static void subprocess_run_one_test(const void *arg)
 {
 const gchar *path = arg;
-g_test_trap_subprocess(path, 0,
+g_test_trap_subprocess(path, 180 * G_USEC_PER_SEC,
G_TEST_SUBPROCESS_INHERIT_STDOUT |
G_TEST_SUBPROCESS_INHERIT_STDERR);
 g_test_trap_assert_passed();
-- 
2.30.2
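
The timeout argument of g_test_trap_subprocess() is given in
microseconds, and a value of 0 means no timeout, which is why the old
call could hang forever. The unit conversion can be sketched without
GLib; EXAMPLE_USEC_PER_SEC and example_timeout_usec() are hypothetical
names, with the constant assumed to match G_USEC_PER_SEC (1000000).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to equal GLib's G_USEC_PER_SEC. */
#define EXAMPLE_USEC_PER_SEC UINT64_C(1000000)

/* Convert a timeout in whole seconds to microseconds, in 64 bits. */
static uint64_t example_timeout_usec(unsigned int seconds)
{
    return (uint64_t)seconds * EXAMPLE_USEC_PER_SEC;
}
```

With this helper the 180 second limit chosen in the patch corresponds
to example_timeout_usec(180).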

[PATCH v4 22/22] tests/qtest: enable tests for virtio-gpio

2022-08-02 Thread Alex Bennée

We don't have a virtio-gpio implementation in QEMU and only
support a vhost-user backend. The QEMU side of the code is minimal so
it should be enough to instantiate the device and pass some vhost-user
messages over the control socket. To do this we hook into the existing
vhost-user-test code and just add the bits required for gpio.

Signed-off-by: Alex Bennée 
Cc: Viresh Kumar 
Cc: Paolo Bonzini 
Cc: Eric Auger 
Message-Id: <20220408155704.2777166-1-alex.ben...@linaro.org>

---
v2
  - add more of the missing boilerplate
  - don't request LOG_SHMD
  - use get_features op
  - report VIRTIO_F_VERSION_1
  - more comments
v4
  - update MAINTAINERS
---
 tests/qtest/libqos/virtio-gpio.h |  35 +++
 tests/qtest/libqos/virtio-gpio.c | 171 +++
 tests/qtest/libqos/virtio.c  |   2 +-
 tests/qtest/vhost-user-test.c|  66 
 MAINTAINERS  |   1 +
 tests/qtest/libqos/meson.build   |   1 +
 6 files changed, 275 insertions(+), 1 deletion(-)
 create mode 100644 tests/qtest/libqos/virtio-gpio.h
 create mode 100644 tests/qtest/libqos/virtio-gpio.c

diff --git a/tests/qtest/libqos/virtio-gpio.h b/tests/qtest/libqos/virtio-gpio.h
new file mode 100644
index 00..f11d41bd19
--- /dev/null
+++ b/tests/qtest/libqos/virtio-gpio.h
@@ -0,0 +1,35 @@
+/*
+ * virtio-gpio structures
+ *
+ * Copyright (c) 2022 Linaro Ltd
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef TESTS_LIBQOS_VIRTIO_GPIO_H
+#define TESTS_LIBQOS_VIRTIO_GPIO_H
+
+#include "qgraph.h"
+#include "virtio.h"
+#include "virtio-pci.h"
+
+typedef struct QVhostUserGPIO QVhostUserGPIO;
+typedef struct QVhostUserGPIOPCI QVhostUserGPIOPCI;
+typedef struct QVhostUserGPIODevice QVhostUserGPIODevice;
+
+struct QVhostUserGPIO {
+QVirtioDevice *vdev;
+QVirtQueue **queues;
+};
+
+struct QVhostUserGPIOPCI {
+QVirtioPCIDevice pci_vdev;
+QVhostUserGPIO gpio;
+};
+
+struct QVhostUserGPIODevice {
+QOSGraphObject obj;
+QVhostUserGPIO gpio;
+};
+
+#endif
diff --git a/tests/qtest/libqos/virtio-gpio.c b/tests/qtest/libqos/virtio-gpio.c
new file mode 100644
index 00..762aa6695b
--- /dev/null
+++ b/tests/qtest/libqos/virtio-gpio.c
@@ -0,0 +1,171 @@
+/*
+ * virtio-gpio nodes for testing
+ *
+ * Copyright (c) 2022 Linaro Ltd
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "standard-headers/linux/virtio_config.h"
+#include "../libqtest.h"
+#include "qemu/module.h"
+#include "qgraph.h"
+#include "virtio-gpio.h"
+
+static QGuestAllocator *alloc;
+
+static void virtio_gpio_cleanup(QVhostUserGPIO *gpio)
+{
+QVirtioDevice *vdev = gpio->vdev;
+int i;
+
+for (i = 0; i < 2; i++) {
+qvirtqueue_cleanup(vdev->bus, gpio->queues[i], alloc);
+}
+g_free(gpio->queues);
+}
+
+/*
+ * This handles the VirtIO setup from the point of view of the driver
+ * frontend and therefore doesn't present any vhost specific features
+ * and in fact masks off the re-used bit.
+ */
+static void virtio_gpio_setup(QVhostUserGPIO *gpio)
+{
+QVirtioDevice *vdev = gpio->vdev;
+uint64_t features;
+int i;
+
+features = qvirtio_get_features(vdev);
+features &= ~QVIRTIO_F_BAD_FEATURE;
+qvirtio_set_features(vdev, features);
+
+gpio->queues = g_new(QVirtQueue *, 2);
+for (i = 0; i < 2; i++) {
+gpio->queues[i] = qvirtqueue_setup(vdev, alloc, i);
+}
+qvirtio_set_driver_ok(vdev);
+}
+
+static void *qvirtio_gpio_get_driver(QVhostUserGPIO *v_gpio,
+ const char *interface)
+{
+if (!g_strcmp0(interface, "vhost-user-gpio")) {
+return v_gpio;
+}
+if (!g_strcmp0(interface, "virtio")) {
+return v_gpio->vdev;
+}
+
+g_assert_not_reached();
+}
+
+static void *qvirtio_gpio_device_get_driver(void *object,
+const char *interface)
+{
+QVhostUserGPIODevice *v_gpio = object;
+return qvirtio_gpio_get_driver(&v_gpio->gpio, interface);
+}
+
+/* virtio-gpio (mmio) */
+static void qvirtio_gpio_device_destructor(QOSGraphObject *obj)
+{
+QVhostUserGPIODevice *gpio_dev = (QVhostUserGPIODevice *) obj;
+virtio_gpio_cleanup(&gpio_dev->gpio);
+}
+
+static void qvirtio_gpio_device_start_hw(QOSGraphObject *obj)
+{
+QVhostUserGPIODevice *gpio_dev = (QVhostUserGPIODevice *) obj;
+virtio_gpio_setup(&gpio_dev->gpio);
+}
+
+static void *virtio_gpio_device_create(void *virtio_dev,
+   QGuestAllocator *t_alloc,
+   void *addr)
+{
+QVhostUserGPIODevice *virtio_device = g_new0(QVhostUserGPIODevice, 1);
+QVhostUserGPIO *interface = &virtio_device->gpio;
+
+interface->vdev = virtio_dev;
+alloc = t_alloc;
+
+virtio_device->obj.get_driver = qvirtio_gpio_device_get_driver;
+virtio_device->obj.start_hw = qvirtio_gpio_device_start_hw;
+virtio_device->obj.destructor = qvirtio_gpio_device_destructor;
+
+return &virtio_d

Re: [PATCH] virtiofsd: Disable killpriv_v2 by default

2022-08-02 Thread Dr. David Alan Gilbert

* Vivek Goyal (vgo...@redhat.com) wrote:
> We are having a bunch of issues with killpriv_v2 enabled by default. First
> of all it relies on clearing suid/sgid bits as needed by dropping
> capability CAP_FSETID. This does not work for remote filesystems like
> NFS (and possibly others). 
> 
> Secondly, we are noticing other issues related to clearing of SGID
> which leads to failures for xfstests generic/355 and generic/193.
> 
> Thirdly, there are other issues w.r.t caching of metadata (suid/sgid)
> bits in fuse client with killpriv_v2 enabled. Guest can cache that
> data for some time even if cleared on server.
> 
> The second and third issues are fixable. Just that it might take a little
> while to get them fixed in kernel. The first one will probably not see
> any movement for a long time.
> 
> Given these issues, killpriv_v2 does not seem to be a good candidate
> for enabling by default. We have already disabled it by default in
> rust version of virtiofsd.
> 
> Hence this patch disables killpriv_v2 by default. User can choose to
> enable it by passing option "-o killpriv_v2".
> 
> Signed-off-by: Vivek Goyal 

OK, yes I see the corresponding 9b03f65d commit in the Rust version.

Reviewed-by: Dr. David Alan Gilbert 


> ---
>  tools/virtiofsd/passthrough_ll.c |   13 ++---
>  1 file changed, 2 insertions(+), 11 deletions(-)
> 
> Index: rhvgoyal-qemu/tools/virtiofsd/passthrough_ll.c
> ===
> --- rhvgoyal-qemu.orig/tools/virtiofsd/passthrough_ll.c   2022-07-29 08:19:05.925119947 -0400
> +++ rhvgoyal-qemu/tools/virtiofsd/passthrough_ll.c    2022-07-29 08:27:08.048049096 -0400
> @@ -767,19 +767,10 @@ static void lo_init(void *userdata, stru
>  fuse_log(FUSE_LOG_DEBUG, "lo_init: enabling killpriv_v2\n");
>  conn->want |= FUSE_CAP_HANDLE_KILLPRIV_V2;
>  lo->killpriv_v2 = 1;
> -} else if (lo->user_killpriv_v2 == -1 &&
> -   conn->capable & FUSE_CAP_HANDLE_KILLPRIV_V2) {
> -/*
> - * User did not specify a value for killpriv_v2. By default enable it
> - * if connection offers this capability
> - */
> -fuse_log(FUSE_LOG_DEBUG, "lo_init: enabling killpriv_v2\n");
> -conn->want |= FUSE_CAP_HANDLE_KILLPRIV_V2;
> -lo->killpriv_v2 = 1;
>  } else {
>  /*
> - * Either user specified to disable killpriv_v2, or connection does
> - * not offer this capability. Disable killpriv_v2 in both the cases
> + * Either user specified to disable killpriv_v2, or did not
> + * specify anything. Disable killpriv_v2 in both the cases.
>   */
>  fuse_log(FUSE_LOG_DEBUG, "lo_init: disabling killpriv_v2\n");
>  conn->want &= ~FUSE_CAP_HANDLE_KILLPRIV_V2;
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
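
The change in default behaviour can be summarised with a small sketch.
It is an illustration only: the function names are hypothetical, the
tri-state convention (-1 unset, 0 off, 1 on) is inferred from the
removed `user_killpriv_v2 == -1` check, and the condition of the first
branch is not visible in the hunk, so it is assumed here to be an
explicit user request.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * user_opt: -1 = not specified, 0 = disabled, 1 = enabled (assumed).
 * capable:  whether the connection offers the capability.
 */

/* Before the patch: an unset option was enabled whenever possible. */
static bool example_old_policy(int user_opt, bool capable)
{
    return user_opt == 1 || (user_opt == -1 && capable);
}

/* After the patch: only an explicit request enables the feature. */
static bool example_new_policy(int user_opt, bool capable)
{
    (void)capable;  /* capability no longer causes an implicit enable */
    return user_opt == 1;
}
```

The only input on which the two policies differ is an unset option on a
capable connection, which is exactly the case the patch removes.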

Re: [PATCH v1 01/40] * HACK * linux-headers: Update headers to pull in TDX API changes

2022-08-02 Thread Xiaoyao Li


On 8/2/2022 5:47 PM, Daniel P. Berrangé wrote:
> On Tue, Aug 02, 2022 at 03:47:11PM +0800, Xiaoyao Li wrote:
>> Pull in recent TDX updates, which are not backwards compatible.
>>
>> It's just to make this series runnable. It will be updated by script
>>
>> scripts/update-linux-headers.sh
>>
>> once TDX support is upstreamed in linux kernel.
>
> I saw a bunch of TDX support merged in 5.19:
>
> commit 3a755ebcc2557e22b895b8976257f682c653db1d
> Merge: 5b828263b180 c796f02162e4
> Author: Linus Torvalds 
> Date:   Mon May 23 17:51:12 2022 -0700
>
>     Merge tag 'x86_tdx_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>
>     Pull Intel TDX support from Borislav Petkov:
>
>      "Intel Trust Domain Extensions (TDX) support.
>
>       This is the Intel version of a confidential computing solution called
>       Trust Domain Extensions (TDX). This series adds support to run the
>       kernel as part of a TDX guest. It provides similar guest protections
>       to AMD's SEV-SNP like guest memory and register state encryption,
>       memory integrity protection and a lot more.
>
>       Design-wise, it differs from AMD's solution considerably: it uses a
>       software module which runs in a special CPU mode called (Secure
>       Arbitration Mode) SEAM. As the name suggests, this module serves as
>       sort of an arbiter which the confidential guest calls for services it
>       needs during its lifetime.
>
>       Just like AMD's SNP set, this series reworks and streamlines certain
>       parts of x86 arch code so that this feature can be properly
>       accommodated"
>
> Is that sufficient for this patch, or is there more pending out of
> tree that QEMU still depends on ?

That's TDX guest support, i.e., running Linux as TDX guest OS.

What QEMU needs is TDX KVM support and that hasn't been merged yet.

> With regards,
> Daniel
