date:20240828

Re: [External] Re: [PATCH v9 09/10] hw/nvme: add reservation protocal command

2024-08-28 Thread 卢长奇

Hi,

I want to know if I understand it correctly.

```
static void nvme_aio_err(NvmeRequest *req, int ret)
{
uint16_t status = NVME_SUCCESS;
Error *local_err = NULL;

switch (req->cmd.opcode) {
case NVME_CMD_READ:
case NVME_CMD_RESV_REPORT:
status = NVME_UNRECOVERED_READ;
break;
case NVME_CMD_FLUSH:
case NVME_CMD_WRITE:
case NVME_CMD_WRITE_ZEROES:
case NVME_CMD_ZONE_APPEND:
case NVME_CMD_COPY:
case NVME_CMD_RESV_REGISTER:
case NVME_CMD_RESV_ACQUIRE:
case NVME_CMD_RESV_RELEASE:
status = NVME_WRITE_FAULT;
break;
default:
status = NVME_INTERNAL_DEV_ERROR;
break;
}

trace_pci_nvme_err_aio(nvme_cid(req), strerror(-ret), status);

error_setg_errno(&local_err, -ret, "aio failed");
error_report_err(local_err);

/*
* Set the command status code to the first encountered error but allow a
* subsequent Internal Device Error to trump it.
*/
if (req->status && status != NVME_INTERNAL_DEV_ERROR) {
return;
}

req->status = status;
}
```
In the code above, if the command is PR-related (persistent reservation)
and the error is "not supported" (ENOTSUP), an Invalid status code should
be returned instead of Write Fault.
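
A rough sketch of that mapping, to make the question concrete. Everything with an SK_ or sk_ prefix is local to this sketch (the numeric values are illustrative and are not taken from include/block/nvme.h), and whether Invalid Command Opcode is the right status is exactly what is being asked:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Opcodes local to this sketch; the real ones live in include/block/nvme.h. */
enum {
    SK_CMD_FLUSH         = 0x00,
    SK_CMD_WRITE         = 0x01,
    SK_CMD_READ          = 0x02,
    SK_CMD_RESV_REGISTER = 0x0d,
    SK_CMD_RESV_REPORT   = 0x0e,
    SK_CMD_RESV_ACQUIRE  = 0x11,
    SK_CMD_RESV_RELEASE  = 0x15,
};

/* Status codes local to this sketch. */
enum {
    SK_INVALID_OPCODE     = 0x0001,
    SK_INTERNAL_DEV_ERROR = 0x0006,
    SK_WRITE_FAULT        = 0x0280,
    SK_UNRECOVERED_READ   = 0x0281,
};

static bool sk_is_resv_opcode(uint8_t opcode)
{
    return opcode == SK_CMD_RESV_REGISTER || opcode == SK_CMD_RESV_REPORT ||
           opcode == SK_CMD_RESV_ACQUIRE || opcode == SK_CMD_RESV_RELEASE;
}

/* Map a failed AIO (ret is a negative errno) to a status code. */
static uint16_t sk_aio_err_status(uint8_t opcode, int ret)
{
    assert(ret < 0);

    /* Reservation command on a backend without PR support. */
    if (sk_is_resv_opcode(opcode) && ret == -ENOTSUP) {
        return SK_INVALID_OPCODE;
    }

    switch (opcode) {
    case SK_CMD_READ:
    case SK_CMD_RESV_REPORT:
        return SK_UNRECOVERED_READ;
    case SK_CMD_FLUSH:
    case SK_CMD_WRITE:
    case SK_CMD_RESV_REGISTER:
    case SK_CMD_RESV_ACQUIRE:
    case SK_CMD_RESV_RELEASE:
        return SK_WRITE_FAULT;
    default:
        return SK_INTERNAL_DEV_ERROR;
    }
}
```

With this, a reservation command that fails with -ENOTSUP no longer falls through to the Write Fault / Unrecovered Read cases, while any other errno keeps the existing mapping.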


On 2024/8/28 14:51, Klaus Jensen wrote:
> On Jul 12 10:36, Changqi Lu wrote:
>> Add reservation acquire, reservation register,
>> reservation release and reservation report commands
>> in the nvme device layer.
>>
>> By introducing these commands, this enables the nvme
>> device to perform reservation-related tasks, including
>> querying keys, querying reservation status, registering
>> reservation keys, initiating and releasing reservations,
>> as well as clearing and preempting reservations held by
>> other keys.
>>
>> These commands are crucial for management and control of
>> shared storage resources in a persistent manner.
>> Signed-off-by: Changqi Lu
>> Signed-off-by: zhenwei pi
>> Acked-by: Klaus Jensen
>> ---
>> hw/nvme/ctrl.c | 318 +++
>> hw/nvme/nvme.h | 4 +
>> include/block/nvme.h | 37 +
>> 3 files changed, 359 insertions(+)
>>
>
>> +static uint16_t nvme_resv_register(NvmeCtrl *n, NvmeRequest *req)
>> +{
>> + int ret;
>> + NvmeKeyInfo key_info;
>> + NvmeNamespace *ns = req->ns;
>> + uint32_t cdw10 = le32_to_cpu(req->cmd.cdw10);
>> + bool ignore_key = cdw10 >> 3 & 0x1;
>> + uint8_t action = cdw10 & 0x7;
>> + uint8_t ptpl = cdw10 >> 30 & 0x3;
>> + bool aptpl;
>> +
>> + switch (ptpl) {
>> + case NVME_RESV_PTPL_NO_CHANGE:
>> + aptpl = (ns->id_ns.rescap & NVME_PR_CAP_PTPL) ? true : false;
>> + break;
>> + case NVME_RESV_PTPL_DISABLE:
>> + aptpl = false;
>> + break;
>> + case NVME_RESV_PTPL_ENABLE:
>> + aptpl = true;
>> + break;
>> + default:
>> + return NVME_INVALID_FIELD;
>> + }
>> +
>> + ret = nvme_h2c(n, (uint8_t *)&key_info, sizeof(NvmeKeyInfo), req);
>> + if (ret) {
>> + return ret;
>> + }
>> +
>> + switch (action) {
>> + case NVME_RESV_REGISTER_ACTION_REGISTER:
>> + req->aiocb = blk_aio_pr_register(ns->blkconf.blk, 0,
>> + key_info.nr_key, 0, aptpl,
>> + ignore_key, nvme_misc_cb,
>> + req);
>> + break;
>> + case NVME_RESV_REGISTER_ACTION_UNREGISTER:
>> + req->aiocb = blk_aio_pr_register(ns->blkconf.blk, key_info.cr_key, 0,
>> + 0, aptpl, ignore_key,
>> + nvme_misc_cb, req);
>> + break;
>> + case NVME_RESV_REGISTER_ACTION_REPLACE:
>> + req->aiocb = blk_aio_pr_register(ns->blkconf.blk, key_info.cr_key,
>> + key_info.nr_key, 0, aptpl, ignore_key,
>> + nvme_misc_cb, req);
>> + break;
>
> There should be some check on rescap I think. On a setup without
> reservation support from the block layer, these functions end up
> returning ENOTSUP which causes hw/nvme to end up returning a Write Fault
> (which is a little wonky).
>
> Should they return invalid field, invalid opcode?

Re: [PATCH] hw/ppc: fix decrementer with BookE timers

2024-08-28 Thread Clément Chigot

On Tue, Aug 27, 2024 at 7:40 PM Cédric Le Goater  wrote:
>
> Hello Clément,
>
> On 7/15/24 10:46, Clément Chigot wrote:
> > The BookE decrementer stops at 0, meaning that it won't be decremented
> > towards "negative" values.
> > However, the current logic is inverted: decr is updated solely when
> > the resulting value would be negative.
>
> How did you hit the issue ? which machine ? I didn't see any error
> when booting Linux 6.6.3 on mpc8544ds, e500mc, e5500 and e6500.

I hit this issue while running some version of VxWorks on a custom
machine: p3041ds (description [1] and our local implementation [2]).
So, I'm not that surprised you were not able to reproduce.

> > Signed-off-by: Clément Chigot 
> > Fixed: 8e0a5ac87800 ("hw/ppc: Avoid decrementer rounding errors")
>
> LGTM,
>
> Reviewed-by: Cédric Le Goater 
>
> We have some automated tests with the ppce500 machine which it would be
> interesting  to extend to have a better coverage of booke.

Thanks for the pointer, I'll see if I can extend them.

> Thanks,
>
> C.
>

[1] https://www.nxp.com/design/design-center/software/qoriq-developer-resources/p3041-qoriq-development-system:P3041DS
[2] https://github.com/AdaCore/qemu/blob/qemu-stable-9.0.0/hw/ppc/p3041ds.c

>
> > ---
> >   hw/ppc/ppc.c | 4 +++-
> >   1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> > index e6fa5580c0..9fc85c7de0 100644
> > --- a/hw/ppc/ppc.c
> > +++ b/hw/ppc/ppc.c
> > @@ -729,7 +729,9 @@ static inline int64_t __cpu_ppc_load_decr(CPUPPCState *env, int64_t now,
> >   int64_t decr;
> >
> >   n = ns_to_tb(tb_env->decr_freq, now);
> > -if (next > n && tb_env->flags & PPC_TIMER_BOOKE) {
> > +
> > +/* BookE timers stop when reaching 0.  */
> > +if (next < n && tb_env->flags & PPC_TIMER_BOOKE) {
> >   decr = 0;
> >   } else {
> >   decr = next - n;
>
>
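
For readers following along, the corrected condition can be looked at in isolation. This is only a sketch: sk_load_decr() and its parameters are made-up names, "next" stands for the timebase value at which the decrementer expires and "n" for the current timebase value, both assumed non-negative here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Remaining decrementer ticks. On BookE the decrementer stops at 0 once
 * the expiry point has passed; other timers keep counting into
 * "negative" values.
 */
static int64_t sk_load_decr(int64_t next, int64_t n, bool booke)
{
    /* Non-negative inputs also rule out overflow in the subtraction. */
    assert(next >= 0 && n >= 0);

    if (booke && next < n) {
        return 0;           /* already expired: clamp */
    }
    return next - n;        /* still running, or non-BookE */
}
```

The old condition (next > n) clamped to 0 while the decrementer was still running and returned a negative value after expiry, which is the inversion described in the commit message.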

Re: [PATCH v1] softmmu/physmem: fix memory leak in dirty_memory_extend()

2024-08-28 Thread David Hildenbrand


On 27.08.24 20:41, Peter Xu wrote:

On Tue, Aug 27, 2024 at 08:00:07PM +0200, David Hildenbrand wrote:

On 27.08.24 19:57, Peter Xu wrote:

On Tue, Aug 27, 2024 at 10:37:15AM +0200, David Hildenbrand wrote:

   /* Called with ram_list.mutex held */
-static void dirty_memory_extend(ram_addr_t old_ram_size,
-ram_addr_t new_ram_size)
+static void dirty_memory_extend(ram_addr_t new_ram_size)
   {
-ram_addr_t old_num_blocks = DIV_ROUND_UP(old_ram_size,
- DIRTY_MEMORY_BLOCK_SIZE);
   ram_addr_t new_num_blocks = DIV_ROUND_UP(new_ram_size,
DIRTY_MEMORY_BLOCK_SIZE);
   int i;
-/* Only need to extend if block count increased */
-if (new_num_blocks <= old_num_blocks) {
-return;
-}


One nitpick here: IMHO we could move the n_blocks cache in ram_list
instead, then we keep the check here and avoid caching it three times with
the same value.


yes, as written in the patch description: "We'll store the number of blocks
along with the actual pointer to keep it simple."

It's cleaner to me to store it along the RCU-freed data structure that has
this size.


Yep, I can get that.

I think one reason I had my current preference is to avoid things like:

   for (...) {
 if (...)
return;
   }

I'd at least want to sanity check before "return" to make sure all three
bitmap chunks are having the same size.  It gave me the feeling that we
could process "blocks[]" differently but we actually couldn't - In our case
it has the ram_list mutex when update, so it must be guaranteed.  However
due to the same reason, I see it cleaner to just keep the counter there
too.


I'll move it to the higher level because I have more important stuff to 
work on and want to get this off my plate.


"num_blocks" does not quite make sense in RAMList (where we have a 
different "blocks" variable) so I'll call it "num_dirty_blocks" or sth 
like that.
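
Roughly, the idea is the following. This is a simplified sketch, not the actual patch: it uses plain realloc() instead of the RCU-managed DirtyMemoryBlocks, one byte per block instead of a bitmap pointer, and all sk_ names are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SK_DIRTY_MEMORY_NUM 3

struct sk_ram_list {
    unsigned char *dirty[SK_DIRTY_MEMORY_NUM]; /* one array per client */
    unsigned int num_dirty_blocks;             /* shared by all three */
};

/* Grow all arrays to new_num blocks; never shrink, never forget the size. */
static void sk_dirty_memory_extend(struct sk_ram_list *rl,
                                   unsigned int new_num)
{
    unsigned int old_num;
    int i;

    assert(rl != NULL);
    old_num = rl->num_dirty_blocks;

    /* Only need to extend if the block count increased. */
    if (new_num <= old_num) {
        return;
    }

    for (i = 0; i < SK_DIRTY_MEMORY_NUM; i++) {
        unsigned char *p = realloc(rl->dirty[i], new_num);

        if (!p) {
            abort();
        }
        memset(p + old_num, 0, new_num - old_num); /* zero the new tail */
        rl->dirty[i] = p;
    }

    rl->num_dirty_blocks = new_num;
}
```

Because the remembered count only ever grows, removing the highest RAMBlock and adding a smaller one later takes the early-return path instead of re-allocating and dropping the old pointers.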


--
Cheers,

David / dhildenb

Re: [PULL 17/20] target/arm: Do memory type alignment check when translation disabled

2024-08-28 Thread Michael Tokarev


05.03.2024 16:52, Peter Maydell wrote:

From: Richard Henderson 

If translation is disabled, the default memory type is Device, which
requires alignment checking.  This is more optimally done early via
the MemOp given to the TCG memory operation.

Reviewed-by: Philippe Mathieu-Daudé 
Reported-by: Idan Horowitz 
Signed-off-by: Richard Henderson 
Message-id: 20240301204110.656742-6-richard.hender...@linaro.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1204
Signed-off-by: Richard Henderson 
Signed-off-by: Peter Maydell 


Hi!

Apparently this change also breaks picolibc testsuite (between
8.2 and 9.0, bisect points to this commit).

For example:

./qemu-system-arm \
  -m 1G \
  -chardev stdio,mux=on,id=stdio0 \
  -semihosting-config enable=on,chardev=stdio0,arg=program-name \
  -monitor none \
  -serial none \
  -machine none,accel=tcg \
  -cpu cortex-a8 \
  -device loader,file=/tmp/picolibc-1.8.6/arm-none-eabi/test/printf_scanf_thumb_v7_fp_softfp,cpu-num=0 \
  -nographic

(yes, this testsuite uses qemu-system as a substitute of
qemu-user, sort of, (ab)using -device loader)

Before this change:

hello world 1
checking floating point
checking pos args
checking long long
checking c99 formats

(exit code = 0)

After this change:

hello world 1
checking floating point
checking pos args
ARM fault: undef
R0:   0x0002
R1:   0x5c90
R2:   0x201ffeac
R3:   0x2020
R4:   0x
R5:   0x2004
R6:   0x201ffec4
PC:   0x0364



Another test from the same picolibc:

timeout 1s ./qemu-system-arm \
  -m 1G \
  -chardev stdio,mux=on,id=stdio0 \
  -semihosting-config enable=on,chardev=stdio0,arg=program-name \
  -monitor none \
  -serial none \
  -machine none,accel=tcg \
  -cpu cortex-a7 \
  -device loader,file=/tmp/picolibc-1.8.6/arm-none-eabi/newlib/testsuite/newlib.string/tstring_thumb_v7_nofp,cpu-num=0 \
  -nographic

This one succeeds immediately before this change, and
just times out (qemu is basically doing nothing, according to
strace) after this commit.



Exactly the same happens up to current qemu master (ie, 9.1-tobe).
So is not https://gitlab.com/qemu-project/qemu/-/issues/2326
and is not fixed by 4c2c0474693229c1f533239bb983495c5427784d
"target/arm: Fix usage of MMU indexes when EL3 is AArch32".



picolibc is built this way:

picolibc-1.8.6$ meson setup . arm-none-eabi \
  --prefix=/usr \
  -Dc_args='-Wdate-time' \
  -Dtests=true \
  --cross-file scripts/cross-arm-none-eabi.txt \
  -Dspecsdir=/usr/lib/picolibc/arm-none-eabi \
  -Dincludedir=lib/picolibc/arm-none-eabi/include \
  -Dlibdir=lib/picolibc/arm-none-eabi/lib


Thanks,

/mjt

Re: [External] Re: [PATCH v9 09/10] hw/nvme: add reservation protocal command

2024-08-28 Thread Klaus Jensen

On Aug 28 00:20, 卢长奇 wrote:
> Hi,
> 
> I want to know if I understand it correctly.
> 
> ```
> static void nvme_aio_err(NvmeRequest *req, int ret)
> {
> uint16_t status = NVME_SUCCESS;
> Error *local_err = NULL;
> 
> switch (req->cmd.opcode) {
> case NVME_CMD_READ:
> case NVME_CMD_RESV_REPORT:
> status = NVME_UNRECOVERED_READ;
> break;
> case NVME_CMD_FLUSH:
> case NVME_CMD_WRITE:
> case NVME_CMD_WRITE_ZEROES:
> case NVME_CMD_ZONE_APPEND:
> case NVME_CMD_COPY:
> case NVME_CMD_RESV_REGISTER:
> case NVME_CMD_RESV_ACQUIRE:
> case NVME_CMD_RESV_RELEASE:
> status = NVME_WRITE_FAULT;
> break;
> default:
> status = NVME_INTERNAL_DEV_ERROR;
> break;
> }
> 
> trace_pci_nvme_err_aio(nvme_cid(req), strerror(-ret), status);
> 
> error_setg_errno(&local_err, -ret, "aio failed");
> error_report_err(local_err);
> 
> /*
> * Set the command status code to the first encountered error but allow a
> * subsequent Internal Device Error to trump it.
> */
> if (req->status && status != NVME_INTERNAL_DEV_ERROR) {
> return;
> }
> 
> req->status = status;
> }
> ```
> In the above use case, if it is a pr-related command and the error code
> is not supported, the invalid error code should be returned instead of
> the Fault error code.
> 

Yes, as far as I can tell from the spec, if a Reservations related
command is issued on a controller/namespace that does not BOTH support
Reservations (i.e., in ONCS and RESCAP), then return Invalid Command
Opcode.
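
Something along these lines, checked before the command is dispatched, is what I have in mind. It is only a sketch: the sk_ names are made up, the ONCS bit position is an assumption to be double-checked against the spec, and treating any non-zero RESCAP as "supported" is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_ONCS_RESERVATIONS (1u << 5)  /* assumed bit position */
#define SK_SUCCESS           0x0000
#define SK_INVALID_OPCODE    0x0001

static_assert(SK_ONCS_RESERVATIONS <= UINT16_MAX, "ONCS is a 16-bit field");

/* Reservations must be advertised by both controller and namespace. */
static bool sk_resv_supported(uint16_t oncs, uint8_t rescap)
{
    return (oncs & SK_ONCS_RESERVATIONS) != 0 && rescap != 0;
}

/* Status to complete a reservation command with before doing any I/O. */
static uint16_t sk_resv_precheck(uint16_t oncs, uint8_t rescap)
{
    return sk_resv_supported(oncs, rescap) ? SK_SUCCESS : SK_INVALID_OPCODE;
}
```

With a check like this up front, the ENOTSUP from the block layer should not be reachable through nvme_aio_err() in the first place.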



Re: [PATCH v4 19/34] modules: check arch on qom lookup

2024-08-28 Thread Gerd Hoffmann

On Tue, Aug 27, 2024 at 05:37:00PM GMT, Philippe Mathieu-Daudé wrote:
> Hi,
> 
> (old patch)
> 
> On 24/6/21 12:38, Gerd Hoffmann wrote:
> > With target-specific modules we can have multiple modules implementing
> > the same object.  Therefore we have to check the target arch on lookup
> > to find the correct module.
> 
> "multiple modules implementing the same object." seems a design
> mistake to me.

IIRC that is (or was?) a problem with tcg or kvm modules, not fully sure
that ever happened in mainline qemu as the tcg modularization effort
stalled at some point.  But some object had the same name on all
architectures.  Which is not a problem when linked into
qemu-system-${arch} but is a problem when built as module.

> Assuming we clean the tree of target-specific modules "implementing
> the same object" -- due to heterogeneous emulation --,

Oh, yes, when linking multiple archs into one qemu binary the name
duplication is a problem even in the non-modular case.

> is there
> another use case for this check?

I don't think so.

take care,
  Gerd

Re: [PATCH v10 1/8] memory: prevent dma-reentracy issues

2024-08-28 Thread Gerd Hoffmann

  Hi,

> But I think unexpected access shouldn't be there in the 1st place,
> so guard looks pretty legit at this point.
> Lets see what Gerd finds out from edk2 point of view.

CPU eject happens /after/ SMM synchronisation, when CPUs are on their way
back into normal mode:

 * The boot processor will do the cpu hotplug register writes, from SMM
   mode, so it obviously will be in SMM mode still.
 * The processor to be unplugged will be parked in a halt loop in SMM
   mode until the unplug completed, so that processor will be in SMM
   mode too.
 * All other processors may or may not be in SMM mode.

So parallel access is possible.

take care,
  Gerd

[PATCH v5 1/2] kvm: replace fprintf with error_report()/printf() in kvm_init()

2024-08-28 Thread Ani Sinha

error_report() is more appropriate for error situations. Replace fprintf with
error_report() and error_printf() as appropriate. Cosmetic. No functional
change.

CC: qemu-triv...@nongnu.org
CC: zhao1@intel.com
CC: arm...@redhat.com
Reviewed-by: Zhao Liu 
Signed-off-by: Ani Sinha 
---
 accel/kvm/kvm-all.c | 40 ++--
 1 file changed, 18 insertions(+), 22 deletions(-)

changelog:
v2: fix a bug.
v3: replace one instance of error_report() with error_printf(). added tags.
v4: changes suggested by Markus.
v5: more changes from Markus's comments on v4.

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 75d11a07b2..fcc157f0e6 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2427,7 +2427,7 @@ static int kvm_init(MachineState *ms)
 QLIST_INIT(&s->kvm_parked_vcpus);
 s->fd = qemu_open_old(s->device ?: "/dev/kvm", O_RDWR);
 if (s->fd == -1) {
-fprintf(stderr, "Could not access KVM kernel module: %m\n");
+error_report("Could not access KVM kernel module: %m");
 ret = -errno;
 goto err;
 }
@@ -2437,13 +2437,13 @@ static int kvm_init(MachineState *ms)
 if (ret >= 0) {
 ret = -EINVAL;
 }
-fprintf(stderr, "kvm version too old\n");
+error_report("kvm version too old");
 goto err;
 }
 
 if (ret > KVM_API_VERSION) {
 ret = -EINVAL;
-fprintf(stderr, "kvm version not supported\n");
+error_report("kvm version not supported");
 goto err;
 }
 
@@ -2488,26 +2488,22 @@ static int kvm_init(MachineState *ms)
 } while (ret == -EINTR);
 
 if (ret < 0) {
-fprintf(stderr, "ioctl(KVM_CREATE_VM) failed: %d %s\n", -ret,
-strerror(-ret));
+error_report("ioctl(KVM_CREATE_VM) failed: %s", strerror(-ret));
 
 #ifdef TARGET_S390X
 if (ret == -EINVAL) {
-fprintf(stderr,
-"Host kernel setup problem detected. Please verify:\n");
-fprintf(stderr, "- for kernels supporting the switch_amode or"
-" user_mode parameters, whether\n");
-fprintf(stderr,
-"  user space is running in primary address space\n");
-fprintf(stderr,
-"- for kernels supporting the vm.allocate_pgste sysctl, "
-"whether it is enabled\n");
+error_printf("Host kernel setup problem detected."
+ " Please verify:\n");
+error_printf("- for kernels supporting the"
+" switch_amode or user_mode parameters, whether");
+error_printf(" user space is running in primary address space\n");
+error_printf("- for kernels supporting the vm.allocate_pgste"
+ " sysctl, whether it is enabled\n");
 }
 #elif defined(TARGET_PPC)
 if (ret == -EINVAL) {
-fprintf(stderr,
-"PPC KVM module is not loaded. Try modprobe kvm_%s.\n",
-(type == 2) ? "pr" : "hv");
+error_printf("PPC KVM module is not loaded. Try modprobe kvm_%s.\n",
+ (type == 2) ? "pr" : "hv");
 }
 #endif
 goto err;
@@ -2526,9 +2522,9 @@ static int kvm_init(MachineState *ms)
 nc->name, nc->num, soft_vcpus_limit);
 
 if (nc->num > hard_vcpus_limit) {
-fprintf(stderr, "Number of %s cpus requested (%d) exceeds "
-"the maximum cpus supported by KVM (%d)\n",
-nc->name, nc->num, hard_vcpus_limit);
+error_report("Number of %s cpus requested (%d) exceeds "
+ "the maximum cpus supported by KVM (%d)",
+ nc->name, nc->num, hard_vcpus_limit);
 exit(1);
 }
 }
@@ -2542,8 +2538,8 @@ static int kvm_init(MachineState *ms)
 }
 if (missing_cap) {
 ret = -EINVAL;
-fprintf(stderr, "kvm does not support %s\n%s",
-missing_cap->name, upgrade_note);
+error_report("kvm does not support %s", missing_cap->name);
+error_printf("%s", upgrade_note);
 goto err;
 }
 
-- 
2.42.0

[PATCH v5 0/2] Some refactoring

2024-08-28 Thread Ani Sinha

replace fprintf() with error_report() in kvm_init()
refactor code in kvm_init() to move core vm creation operation to its
own function.

CC: qemu-triv...@nongnu.org
CC: qemu-devel@nongnu.org
CC: zhao1@intel.com
CC: pbonz...@redhat.com
CC: arm...@redhat.com


Ani Sinha (2):
  kvm: replace fprintf with error_report()/printf() in kvm_init()
  kvm: refactor core virtual machine creation into its own function

 accel/kvm/kvm-all.c | 106 +---
 1 file changed, 61 insertions(+), 45 deletions(-)

-- 
2.42.0

[PATCH v5 2/2] kvm: refactor core virtual machine creation into its own function

2024-08-28 Thread Ani Sinha

Refactoring the core logic around KVM_CREATE_VM into its own separate function
so that it can be called from other functions in future patches. There is
no functional change in this patch.

CC: pbonz...@redhat.com
CC: zhao1@intel.com
CC: cfont...@suse.de
CC: arm...@redhat.com
CC: qemu-triv...@nongnu.org
Reviewed-by: Zhao Liu 
Reviewed-by: Claudio Fontana 
Signed-off-by: Ani Sinha 
---
 accel/kvm/kvm-all.c | 86 -
 1 file changed, 53 insertions(+), 33 deletions(-)

v2: s/fprintf/warn_report as suggested by zhao
v3: s/warn_report/error_report. function names adjusted to conform to
other names. fprintf -> error_report() moved to its own patch.
v4: added tags and rebased.
v5: rebased.

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index fcc157f0e6..cf3d820b94 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2385,6 +2385,57 @@ uint32_t kvm_dirty_ring_size(void)
 return kvm_state->kvm_dirty_ring_size;
 }
 
+static int kvm_create_vm(MachineState *ms, KVMState *s, int type)
+{
+int ret;
+
+do {
+ret = kvm_ioctl(s, KVM_CREATE_VM, type);
+} while (ret == -EINTR);
+
+if (ret < 0) {
+error_report("ioctl(KVM_CREATE_VM) failed: %s", strerror(-ret));
+
+#ifdef TARGET_S390X
+if (ret == -EINVAL) {
+error_printf("Host kernel setup problem detected."
+ " Please verify:\n");
+error_printf("- for kernels supporting the"
+" switch_amode or user_mode parameters, whether");
+error_printf(" user space is running in primary address space\n");
+error_printf("- for kernels supporting the vm.allocate_pgste"
+ " sysctl, whether it is enabled\n");
+}
+#elif defined(TARGET_PPC)
+if (ret == -EINVAL) {
+error_printf("PPC KVM module is not loaded. Try modprobe kvm_%s.\n",
+ (type == 2) ? "pr" : "hv");
+}
+#endif
+}
+
+return ret;
+}
+
+static int kvm_machine_type(MachineState *ms)
+{
+MachineClass *mc = MACHINE_GET_CLASS(ms);
+int type;
+
+if (object_property_find(OBJECT(current_machine), "kvm-type")) {
+g_autofree char *kvm_type;
+kvm_type = object_property_get_str(OBJECT(current_machine),
+   "kvm-type",
+   &error_abort);
+type = mc->kvm_type(ms, kvm_type);
+} else if (mc->kvm_type) {
+type = mc->kvm_type(ms, NULL);
+} else {
+type = kvm_arch_get_default_type(ms);
+}
+return type;
+}
+
 static int kvm_init(MachineState *ms)
 {
 MachineClass *mc = MACHINE_GET_CLASS(ms);
@@ -2467,45 +2518,14 @@ static int kvm_init(MachineState *ms)
 }
 s->as = g_new0(struct KVMAs, s->nr_as);
 
-if (object_property_find(OBJECT(current_machine), "kvm-type")) {
-g_autofree char *kvm_type = object_property_get_str(OBJECT(current_machine),
-"kvm-type",
-&error_abort);
-type = mc->kvm_type(ms, kvm_type);
-} else if (mc->kvm_type) {
-type = mc->kvm_type(ms, NULL);
-} else {
-type = kvm_arch_get_default_type(ms);
-}
-
+type = kvm_machine_type(ms);
 if (type < 0) {
 ret = -EINVAL;
 goto err;
 }
 
-do {
-ret = kvm_ioctl(s, KVM_CREATE_VM, type);
-} while (ret == -EINTR);
-
+ret = kvm_create_vm(ms, s, type);
 if (ret < 0) {
-error_report("ioctl(KVM_CREATE_VM) failed: %s", strerror(-ret));
-
-#ifdef TARGET_S390X
-if (ret == -EINVAL) {
-error_printf("Host kernel setup problem detected."
- " Please verify:\n");
-error_printf("- for kernels supporting the"
-" switch_amode or user_mode parameters, whether");
-error_printf(" user space is running in primary address space\n");
-error_printf("- for kernels supporting the vm.allocate_pgste"
- " sysctl, whether it is enabled\n");
-}
-#elif defined(TARGET_PPC)
-if (ret == -EINVAL) {
-error_printf("PPC KVM module is not loaded. Try modprobe kvm_%s.\n",
- (type == 2) ? "pr" : "hv");
-}
-#endif
 goto err;
 }
 
-- 
2.42.0

Re: [PATCH for-9.2 00/10] s390: Convert virtio-ccw, cpu to three-phase reset, and followup cleanup

2024-08-28 Thread Nico Boehr

Quoting Nico Boehr (2024-08-26 14:08:20)
> There was a little hiccup without the fixup to patch 2, but after Nina
> pushed the fixup, we did not observe any failures related to your
> changes in our CI. Thanks!

Peter, after a few CI runs, we unfortunately did find some issues with your
patch :-(

Rebooting a guest in a loop sometimes fails. Michael was able to bisect it
to your series.

The problem is intermittent. The guest is unable to load its initramfs:

  [0.560674] rootfs image is not initramfs (no cpio magic); looks like an initrd
  [0.588605] Freeing initrd memory: 95680K
  [0.593143] md: Waiting for all devices to be available before autodetect
  [0.593144] md: If you don't use raid, use raid=noautodetect
  [0.593145] md: Autodetecting RAID arrays.
  [0.593146] md: autorun ...
  [0.593147] md: ... autorun DONE.
  [0.593156] RAMDISK: gzip image found at block 0
  [0.609110] RAMDISK: incomplete write (29120 != 32768)
  [0.609113] write error

...and then a panic because the kernel doesn't find a rootfs.

It seems like the compressed initramfs is corrupted somehow, since "rootfs
image is not initramfs" doesn't appear on a successful boot.

initramfs and kernel are loaded via direct kernel boot. Running under KVM.

Some vhost error messages do appear before the guest panics, but it is not
entirely clear to me whether they are related:

  [...]
  2024-08-28T06:56:29.765324Z qemu-system-s390x: vhost vring error in virtqueue 0: Invalid argument (22)
  2024-08-28T06:56:32.210982Z qemu-system-s390x: vhost vring error in virtqueue 0: Invalid argument (22)
  2024-08-28 06:56:35.430+: panic s390: core='0' psw-mask='0x000200018000' psw-addr='0x0387b028c67e' reason='disabled-wait'

Any idea?

Re: [PULL 10/11] crypto: push error reporting into TLS session I/O APIs

2024-08-28 Thread Thomas Huth


On 27/08/2024 09.05, Markus Armbruster wrote:

Daniel P. Berrangé  writes:


On Mon, Aug 12, 2024 at 05:38:41PM +0200, Thomas Huth wrote:

On 24/07/2024 11.47, Daniel P. Berrangé wrote:

The current TLS session I/O APIs just return a synthetic errno
value on error, which has been translated from a gnutls error
value. This loses a large amount of valuable information that
distinguishes different scenarios.

Pushing population of the "Error *errp" object into the TLS
session I/O APIs gives more detailed error information.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Daniel P. Berrangé 
---


  Hi Daniel!

iotest 233 is failing for me with -raw now, and bisection
points to this commit. Output is:

--- .../qemu/tests/qemu-iotests/233.out
+++ /tmp/qemu/tests/qemu-iotests/scratch/raw-file-233/233.out.bad
@@ -69,8 +69,8 @@
  1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)

  == check TLS with authorization ==
-qemu-img: Could not open 'driver=nbd,host=127.0.0.1,port=PORT,tls-creds=tls0': Failed to read option reply: Cannot read from TLS channel: Software caused connection abort
-qemu-img: Could not open 'driver=nbd,host=127.0.0.1,port=PORT,tls-creds=tls0': Failed to read option reply: Cannot read from TLS channel: Software caused connection abort
+qemu-img: Could not open 'driver=nbd,host=127.0.0.1,port=PORT,tls-creds=tls0': Failed to read option reply: Cannot read from TLS channel: The TLS connection was non-properly terminated.
+qemu-img: Could not open 'driver=nbd,host=127.0.0.1,port=PORT,tls-creds=tls0': Failed to read option reply: Cannot read from TLS channel: The TLS connection was non-properly terminated.


This is an expected change. Previously we squashed the real GNUTLS error
into ECONNABORTED:

-case GNUTLS_E_PREMATURE_TERMINATION:
-errno = ECONNABORTED;
-break;


now we report the original gnutls root cause.

IOW, we need to update the expected output files.


Has this been done?


No, I think the problem still persists.

 Thomas

[PATCH v3] target: riscv: Add Svvptc extension support

2024-08-28 Thread Alexandre Ghiti

The Svvptc extension describes a uarch that does not cache invalid TLB
entries: that's the case for qemu so there is nothing particular to
implement other than the introduction of this extension.

Since qemu already exposes Svvptc behaviour, let's enable it by default
since it allows to drastically reduce the number of sfence.vma emitted
by S-mode.

Signed-off-by: Alexandre Ghiti 
---

Changes in v3:
- Rebase on top of master
- Change 1.12 to 1.13 spec version (drew)

Changes in v2:
- Rebase on top of master
- Enable Svvptc by default

 target/riscv/cpu.c | 2 ++
 target/riscv/cpu_cfg.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index a90808a3ba..cabe698f2f 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -197,6 +197,7 @@ const RISCVIsaExtData isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(svinval, PRIV_VERSION_1_12_0, ext_svinval),
 ISA_EXT_DATA_ENTRY(svnapot, PRIV_VERSION_1_12_0, ext_svnapot),
 ISA_EXT_DATA_ENTRY(svpbmt, PRIV_VERSION_1_12_0, ext_svpbmt),
+ISA_EXT_DATA_ENTRY(svvptc, PRIV_VERSION_1_13_0, ext_svvptc),
 ISA_EXT_DATA_ENTRY(xtheadba, PRIV_VERSION_1_11_0, ext_xtheadba),
 ISA_EXT_DATA_ENTRY(xtheadbb, PRIV_VERSION_1_11_0, ext_xtheadbb),
 ISA_EXT_DATA_ENTRY(xtheadbs, PRIV_VERSION_1_11_0, ext_xtheadbs),
@@ -1509,6 +1510,7 @@ const RISCVCPUMultiExtConfig riscv_cpu_extensions[] = {
 MULTI_EXT_CFG_BOOL("svinval", ext_svinval, false),
 MULTI_EXT_CFG_BOOL("svnapot", ext_svnapot, false),
 MULTI_EXT_CFG_BOOL("svpbmt", ext_svpbmt, false),
+MULTI_EXT_CFG_BOOL("svvptc", ext_svvptc, true),
 
 MULTI_EXT_CFG_BOOL("zicntr", ext_zicntr, true),
 MULTI_EXT_CFG_BOOL("zihpm", ext_zihpm, true),
diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
index 8b272fb826..7d16048a76 100644
--- a/target/riscv/cpu_cfg.h
+++ b/target/riscv/cpu_cfg.h
@@ -81,6 +81,7 @@ struct RISCVCPUConfig {
 bool ext_svinval;
 bool ext_svnapot;
 bool ext_svpbmt;
+bool ext_svvptc;
 bool ext_zdinx;
 bool ext_zaamo;
 bool ext_zacas;
-- 
2.39.2

Re: [PATCH] mark <zlib.h> with for-crc32 in a consistent manner

2024-08-28 Thread Mark Cave-Ayland


On 27/08/2024 18:51, Michael Tokarev wrote:


27.08.2024 15:09, Mark Cave-Ayland wrote:

On 27/08/2024 11:02, Michael Tokarev wrote:


in many cases, <zlib.h> is only included for crc32 function,
and in some of them, there's a comment saying that, but in
a different way.  In one place (hw/net/rtl8139.c), there was
another #include added between the comment and <zlib.h> include.

Make all such comments to be on the same line as #include, make
it consistent, and also add a few missing comments, including
hw/nvram/mac_nvram.c which uses adler32 instead.

...

  //#define DEBUG_STELLARIS_ENET 1


For the hw/net devices there are separate net_crc32() and net_crc32_le() functions 
from net/net.c which are intended for (most) network devices where the "standard" 
polynomials are used.


In many hw/net files I touched in this patch, *both*
plain crc32() and qemu's net_crc32() are used.

For now I just marked the #include, nothing more, we
can finish the refactoring later if need be.

Speaking of crc32 from zlib, I don't really see a point
in re-implementing it in this context (it is re-implemented
in net/net.c:net_crc32(), with comment "XXX: optimize").
Implementation from zlib is quite a good one. Not the best
possible but definitely not the worst and is better than
net_crc32().

(See also https://create.stephan-brumme.com/crc32/)


Right, I was just wondering that since you were already changing the zlib.h includes 
if you were interested to update the net devices and switch net_crc32() to use the 
zlib implementation whilst you were working in that area ;)



What we definitely *can* optimize is the two cases in
tcg (arm and loong iirc) - they have hardware insns for
crc32, but these operate in fixed 4 or 8-bytes integers,
and there, implementing a function in qemu would be
nice - not much code but significant speedup due to
fixed size of the argument.

I don't see any issue with using crc32 from zlib, since
zlib is used for other things anyway and is mandatory
dependency.  In case of qemu-user binaries, even static
link, it is tiny (since only crc32 stuff is linked to),
but there it would be more interesting to have in-qemu
implementation for static-size insns.


Agreed, I can certainly see the need for an implementation in the QEMU core. In which 
case the net_crc32_*() functions will then be simple wrappers onto those 
implementations...
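
To make the fixed-size idea concrete, here is a small sketch of a bitwise CRC-32 (reflected polynomial 0xEDB88320, the same CRC that zlib's crc32() computes) that folds a whole 8-, 16- or 32-bit value in one call. The sk_ names are made up, this is not code from the tree, and a real helper would more likely be table-driven. Pre- and post-inversion are left to the caller.

```c
#include <assert.h>
#include <stdint.h>

#define SK_CRC32_POLY_REFLECTED 0xEDB88320u

/*
 * Fold the low nbits of val into the running CRC, least significant bit
 * first. No initial or final inversion is applied here.
 */
static uint32_t sk_crc32_fold(uint32_t acc, uint32_t val, unsigned nbits)
{
    unsigned i;

    assert(nbits >= 1 && nbits <= 32);
    if (nbits < 32) {
        val &= (1u << nbits) - 1u;
    }

    acc ^= val;
    for (i = 0; i < nbits; i++) {
        /* Shift out one bit; apply the polynomial if that bit was set. */
        acc = (acc >> 1) ^ (SK_CRC32_POLY_REFLECTED & (0u - (acc & 1u)));
    }
    return acc;
}

/* Fixed-size variant: one 32-bit word, equivalent to four byte steps. */
static uint32_t sk_crc32_u32(uint32_t acc, uint32_t val)
{
    return sk_crc32_fold(acc, val, 32);
}
```

Folding a word at once gives the same result as feeding its four bytes in little-endian order, because each shift step is linear over GF(2); that is what makes a fixed-width helper a drop-in for the byte loop.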



ATB,

Mark.

[PATCH v2] softmmu/physmem: fix memory leak in dirty_memory_extend()

2024-08-28 Thread David Hildenbrand

As reported by Peter, we might be leaking memory when removing the
highest RAMBlock (in the weird ram_addr_t space), and adding a new one.

We will fail to realize that we already allocated bitmaps for more
dirty memory blocks, and effectively discard the pointers to them.

Fix it by getting rid of last_ram_page() and by remembering the number
of dirty memory blocks that have been allocated already.

While at it, let's use "unsigned int" for the number of blocks, which
should be sufficient until we reach ~32 exabytes.

Looks like this leak was introduced as we switched from using a single
bitmap_zero_extend() to allocating multiple bitmaps:
bitmap_zero_extend() relies on g_renew() which should have taken care of
this.

Resolves: https://lkml.kernel.org/r/CAFEAcA-k7a+VObGAfCFNygQNfCKL=AfX6A4kScq=vssk0pe...@mail.gmail.com
Reported-by: Peter Maydell 
Fixes: 5b82b703b69a ("memory: RCU ram_list.dirty_memory[] for safe RAM hotplug")
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Peter Xu 
Tested-by: Peter Maydell 
Cc: qemu-sta...@nongnu.org
Cc: Stefan Hajnoczi 
Cc: Paolo Bonzini 
Cc: Peter Xu 
Cc: "Philippe Mathieu-Daudé" 
Signed-off-by: David Hildenbrand 
---

v1 -> v2:
* Move the counter to RAMList
* Use "unsigned int" instead of "ram_addr_t" as type for the number of
  blocks

---
 include/exec/ramlist.h |  1 +
 system/physmem.c   | 35 +--
 2 files changed, 10 insertions(+), 26 deletions(-)

diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
index 2ad2a81acc..d9cfe530be 100644
--- a/include/exec/ramlist.h
+++ b/include/exec/ramlist.h
@@ -50,6 +50,7 @@ typedef struct RAMList {
 /* RCU-enabled, writes protected by the ramlist lock. */
 QLIST_HEAD(, RAMBlock) blocks;
 DirtyMemoryBlocks *dirty_memory[DIRTY_MEMORY_NUM];
+unsigned int num_dirty_blocks;
 uint32_t version;
 QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
 } RAMList;
diff --git a/system/physmem.c b/system/physmem.c
index 94600a33ec..5e7f066762 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -1534,18 +1534,6 @@ static ram_addr_t find_ram_offset(ram_addr_t size)
 return offset;
 }
 
-static unsigned long last_ram_page(void)
-{
-RAMBlock *block;
-ram_addr_t last = 0;
-
-RCU_READ_LOCK_GUARD();
-RAMBLOCK_FOREACH(block) {
-last = MAX(last, block->offset + block->max_length);
-}
-return last >> TARGET_PAGE_BITS;
-}
-
 static void qemu_ram_setup_dump(void *addr, ram_addr_t size)
 {
 int ret;
@@ -1799,13 +1787,11 @@ void qemu_ram_msync(RAMBlock *block, ram_addr_t start, 
ram_addr_t length)
 }
 
 /* Called with ram_list.mutex held */
-static void dirty_memory_extend(ram_addr_t old_ram_size,
-ram_addr_t new_ram_size)
+static void dirty_memory_extend(ram_addr_t new_ram_size)
 {
-ram_addr_t old_num_blocks = DIV_ROUND_UP(old_ram_size,
- DIRTY_MEMORY_BLOCK_SIZE);
-ram_addr_t new_num_blocks = DIV_ROUND_UP(new_ram_size,
- DIRTY_MEMORY_BLOCK_SIZE);
+unsigned int old_num_blocks = ram_list.num_dirty_blocks;
+unsigned int new_num_blocks = DIV_ROUND_UP(new_ram_size,
+   DIRTY_MEMORY_BLOCK_SIZE);
 int i;
 
 /* Only need to extend if block count increased */
@@ -1837,6 +1823,8 @@ static void dirty_memory_extend(ram_addr_t old_ram_size,
 g_free_rcu(old_blocks, rcu);
 }
 }
+
+ram_list.num_dirty_blocks = new_num_blocks;
 }
 
 static void ram_block_add(RAMBlock *new_block, Error **errp)
@@ -1846,11 +1834,9 @@ static void ram_block_add(RAMBlock *new_block, Error 
**errp)
 RAMBlock *block;
 RAMBlock *last_block = NULL;
 bool free_on_error = false;
-ram_addr_t old_ram_size, new_ram_size;
+ram_addr_t ram_size;
 Error *err = NULL;
 
-old_ram_size = last_ram_page();
-
 qemu_mutex_lock_ramlist();
 new_block->offset = find_ram_offset(new_block->max_length);
 
@@ -1901,11 +1887,8 @@ static void ram_block_add(RAMBlock *new_block, Error 
**errp)
 }
 }
 
-new_ram_size = MAX(old_ram_size,
-  (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS);
-if (new_ram_size > old_ram_size) {
-dirty_memory_extend(old_ram_size, new_ram_size);
-}
+ram_size = (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS;
+dirty_memory_extend(ram_size);
 /* Keep the list sorted from biggest to smallest block.  Unlike QTAILQ,
  * QLIST (which has an RCU-friendly variant) does not have insertion at
  * tail, so save the last element in last_block.
-- 
2.46.0

[PATCH] qemu-timer: check for timerlist being initialised

2024-08-28 Thread Ben Dooks

If you create a new timer before the timer lists have been
initialised then you will end up with an abort due to trying
to access an illegal timer list struct. Add an assert() that
the timer list is non-NULL.

Signed-off-by: Ben Dooks 
---
 util/qemu-timer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/util/qemu-timer.c b/util/qemu-timer.c
index 213114be68..5c0c6be56b 100644
--- a/util/qemu-timer.c
+++ b/util/qemu-timer.c
@@ -365,6 +365,7 @@ void timer_init_full(QEMUTimer *ts,
 timer_list_group = &main_loop_tlg;
 }
 ts->timer_list = timer_list_group->tl[type];
+assert(ts->timer_list != NULL);
 ts->cb = cb;
 ts->opaque = opaque;
 ts->scale = scale;
-- 
2.37.2.352.g3c44437643

[PATCH v4 00/17] bsd-user: Comprehensive RISCV Support

2024-08-28 Thread Ajeet Singh

Key Changes Compared to Version 3:
Minor comment changes only. All patches have now been
reviewed by Richard Henderson.


Mark Corbin (15):
  bsd-user: Implement RISC-V CPU initialization and main loop
  bsd-user: Add RISC-V CPU execution loop and syscall handling
  bsd-user: Implement RISC-V CPU register cloning and reset functions
  bsd-user: Implement RISC-V TLS register setup
  bsd-user: Add RISC-V ELF definitions and hardware capability detection
  bsd-user: Define RISC-V register structures and register copying
  bsd-user: Add RISC-V signal trampoline setup function
  bsd-user: Implement RISC-V sysarch system call emulation
  bsd-user: Add RISC-V thread setup and initialization support
  bsd-user: Define RISC-V VM parameters and helper functions
  bsd-user: Define RISC-V system call structures and constants
  bsd-user: Define RISC-V signal handling structures and constants
  bsd-user: Implement RISC-V signal trampoline setup functions
  bsd-user: Implement 'get_mcontext' for RISC-V
  bsd-user: Implement set_mcontext and get_ucontext_sigreturn for RISCV

Warner Losh (2):
  bsd-user: Add generic RISC-V64 target definitions
  bsd-user: Add RISC-V 64-bit Target Configuration and Debug XML Files

 bsd-user/riscv/signal.c   | 170 ++
 bsd-user/riscv/target.h   |  20 +++
 bsd-user/riscv/target_arch.h  |  27 
 bsd-user/riscv/target_arch_cpu.c  |  29 +
 bsd-user/riscv/target_arch_cpu.h  | 147 ++
 bsd-user/riscv/target_arch_elf.h  |  42 +++
 bsd-user/riscv/target_arch_reg.h  |  88 +
 bsd-user/riscv/target_arch_signal.h   |  75 
 bsd-user/riscv/target_arch_sigtramp.h |  42 +++
 bsd-user/riscv/target_arch_sysarch.h  |  41 +++
 bsd-user/riscv/target_arch_thread.h   |  47 +++
 bsd-user/riscv/target_arch_vmparam.h  |  53 
 bsd-user/riscv/target_syscall.h   |  38 ++
 configs/targets/riscv64-bsd-user.mak  |   4 +
 14 files changed, 823 insertions(+)
 create mode 100644 bsd-user/riscv/signal.c
 create mode 100644 bsd-user/riscv/target.h
 create mode 100644 bsd-user/riscv/target_arch.h
 create mode 100644 bsd-user/riscv/target_arch_cpu.c
 create mode 100644 bsd-user/riscv/target_arch_cpu.h
 create mode 100644 bsd-user/riscv/target_arch_elf.h
 create mode 100644 bsd-user/riscv/target_arch_reg.h
 create mode 100644 bsd-user/riscv/target_arch_signal.h
 create mode 100644 bsd-user/riscv/target_arch_sigtramp.h
 create mode 100644 bsd-user/riscv/target_arch_sysarch.h
 create mode 100644 bsd-user/riscv/target_arch_thread.h
 create mode 100644 bsd-user/riscv/target_arch_vmparam.h
 create mode 100644 bsd-user/riscv/target_syscall.h
 create mode 100644 configs/targets/riscv64-bsd-user.mak

-- 
2.34.1

[PATCH v4 09/17] bsd-user: Add RISC-V thread setup and initialization support

2024-08-28 Thread Ajeet Singh

From: Mark Corbin 

Implemented functions for setting up and initializing threads in the
RISC-V architecture.
The 'target_thread_set_upcall' function sets up the stack pointer,
program counter, and function argument for new threads.
The 'target_thread_init' function initializes thread registers based on
the provided image information.
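
A small sketch of the stack pointer computation, assuming only what the diff below shows: the initial stack pointer is the top of the stack region rounded down to a 16-byte boundary. The names `round_down_pow2` and `initial_sp` are hypothetical and merely mimic the effect of `ROUND_DOWN(x, 16)`.

```c
#include <assert.h>
#include <stdint.h>

/* Round n down to a multiple of d; d must be a power of two. */
static inline uint64_t round_down_pow2(uint64_t n, uint64_t d)
{
    assert(d != 0 && (d & (d - 1)) == 0);
    return n & ~(d - 1);
}

/*
 * Initial stack pointer for a new thread: the stack grows down, so start
 * at the top of the region and align down to 16 bytes.
 */
static inline uint64_t initial_sp(uint64_t stack_base, uint64_t stack_size)
{
    return round_down_pow2(stack_base + stack_size, 16);
}
```

For example, a stack at base 0x1000 with size 0x2008 ends at 0x3008, so the aligned initial stack pointer is 0x3000.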

Signed-off-by: Mark Corbin 
Signed-off-by: Ajeet Singh 
Co-authored-by: Jessica Clarke 
Co-authored-by: Kyle Evans 
Reviewed-by: Richard Henderson 
---
 bsd-user/riscv/target_arch_thread.h | 47 +
 1 file changed, 47 insertions(+)
 create mode 100644 bsd-user/riscv/target_arch_thread.h

diff --git a/bsd-user/riscv/target_arch_thread.h 
b/bsd-user/riscv/target_arch_thread.h
new file mode 100644
index 00..db0f9eb52c
--- /dev/null
+++ b/bsd-user/riscv/target_arch_thread.h
@@ -0,0 +1,47 @@
+/*
+ *  RISC-V thread support
+ *
+ *  Copyright (c) 2019 Mark Corbin
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#ifndef TARGET_ARCH_THREAD_H
+#define TARGET_ARCH_THREAD_H
+
+/* Compare with cpu_set_upcall() in riscv/riscv/vm_machdep.c */
+static inline void target_thread_set_upcall(CPURISCVState *regs,
+abi_ulong entry, abi_ulong arg, abi_ulong stack_base,
+abi_ulong stack_size)
+{
+abi_ulong sp;
+
+sp = ROUND_DOWN(stack_base + stack_size,16);
+
+regs->gpr[xSP] = sp;
+regs->pc = entry;
+regs->gpr[xA0] = arg;
+}
+
+/* Compare with exec_setregs() in riscv/riscv/machdep.c */
+static inline void target_thread_init(struct target_pt_regs *regs,
+struct image_info *infop)
+{
+regs->sepc = infop->entry;
+regs->regs[xRA] = infop->entry;
+regs->regs[xA0] = infop->start_stack;   
+regs->regs[xSP] = ROUND_DOWN(infop->start_stack,16);
+}
+
+#endif /* TARGET_ARCH_THREAD_H */
-- 
2.34.1

[PATCH v4 10/17] bsd-user: Define RISC-V VM parameters and helper functions

2024-08-28 Thread Ajeet Singh

From: Mark Corbin 

Added definitions for RISC-V VM parameters, including maximum and
default sizes for text, data, and stack, as well as address space
limits.
Implemented helper functions for retrieving and setting specific
values in the CPU state, such as stack pointer and return values.

Signed-off-by: Mark Corbin 
Signed-off-by: Ajeet Singh 
Reviewed-by: Richard Henderson 
---
 bsd-user/riscv/target_arch_vmparam.h | 53 
 1 file changed, 53 insertions(+)
 create mode 100644 bsd-user/riscv/target_arch_vmparam.h

diff --git a/bsd-user/riscv/target_arch_vmparam.h 
b/bsd-user/riscv/target_arch_vmparam.h
new file mode 100644
index 00..0f2486def1
--- /dev/null
+++ b/bsd-user/riscv/target_arch_vmparam.h
@@ -0,0 +1,53 @@
+/*
+ *  RISC-V VM parameters definitions
+ *
+ *  Copyright (c) 2019 Mark Corbin
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#ifndef TARGET_ARCH_VMPARAM_H
+#define TARGET_ARCH_VMPARAM_H
+
+#include "cpu.h"
+
+/* Compare with riscv/include/vmparam.h */
+#define TARGET_MAXTSIZ  (1 * GiB)   /* max text size */
+#define TARGET_DFLDSIZ  (128 * MiB) /* initial data size limit */
+#define TARGET_MAXDSIZ  (1 * GiB)   /* max data size */
+#define TARGET_DFLSSIZ  (128 * MiB) /* initial stack size limit */
+#define TARGET_MAXSSIZ  (1 * GiB)   /* max stack size */
+#define TARGET_SGROWSIZ (128 * KiB) /* amount to grow stack */
+
+#define TARGET_VM_MINUSER_ADDRESS   (0xUL)
+#define TARGET_VM_MAXUSER_ADDRESS   (0x0040UL)
+
+#define TARGET_USRSTACK (TARGET_VM_MAXUSER_ADDRESS - TARGET_PAGE_SIZE)
+
+static inline abi_ulong get_sp_from_cpustate(CPURISCVState *state)
+{
+return state->gpr[xSP];
+}
+
+static inline void set_second_rval(CPURISCVState *state, abi_ulong retval2)
+{
+state->gpr[xA1] = retval2;
+}
+
+static inline abi_ulong get_second_rval(CPURISCVState *state)
+{
+return state->gpr[xA1];
+}
+
+#endif /* TARGET_ARCH_VMPARAM_H */
-- 
2.34.1

[PATCH v4 16/17] bsd-user: Implement set_mcontext and get_ucontext_sigreturn for RISCV

2024-08-28 Thread Ajeet Singh

From: Mark Corbin 

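Added implementations of the 'set_mcontext' and 'get_ucontext_sigreturn'
functions for the RISC-V architecture.
'set_mcontext' restores the general-purpose registers and program counter
from a 'target_mcontext_t'; 'get_ucontext_sigreturn' reports the signal
frame address as the location of the user context.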

Signed-off-by: Mark Corbin 
Signed-off-by: Warner Losh 
Signed-off-by: Ajeet Singh 
Co-authored-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/riscv/signal.c | 54 +
 1 file changed, 54 insertions(+)

diff --git a/bsd-user/riscv/signal.c b/bsd-user/riscv/signal.c
index 072ad821d2..10c940cd49 100644
--- a/bsd-user/riscv/signal.c
+++ b/bsd-user/riscv/signal.c
@@ -114,3 +114,57 @@ abi_long get_mcontext(CPURISCVState *regs, 
target_mcontext_t *mcp,
 
 return 0;
 }
+
+/* Compare with set_mcontext() in riscv/riscv/exec_machdep.c */
+abi_long set_mcontext(CPURISCVState *regs, target_mcontext_t *mcp,
+int srflag)
+{
+
+regs->gpr[5] = tswap64(mcp->mc_gpregs.gp_t[0]);
+regs->gpr[6] = tswap64(mcp->mc_gpregs.gp_t[1]);
+regs->gpr[7] = tswap64(mcp->mc_gpregs.gp_t[2]);
+regs->gpr[28] = tswap64(mcp->mc_gpregs.gp_t[3]);
+regs->gpr[29] = tswap64(mcp->mc_gpregs.gp_t[4]);
+regs->gpr[30] = tswap64(mcp->mc_gpregs.gp_t[5]);
+regs->gpr[31] = tswap64(mcp->mc_gpregs.gp_t[6]);
+
+regs->gpr[8] = tswap64(mcp->mc_gpregs.gp_s[0]);
+regs->gpr[9] = tswap64(mcp->mc_gpregs.gp_s[1]);
+regs->gpr[18] = tswap64(mcp->mc_gpregs.gp_s[2]);
+regs->gpr[19] = tswap64(mcp->mc_gpregs.gp_s[3]);
+regs->gpr[20] = tswap64(mcp->mc_gpregs.gp_s[4]);
+regs->gpr[21] = tswap64(mcp->mc_gpregs.gp_s[5]);
+regs->gpr[22] = tswap64(mcp->mc_gpregs.gp_s[6]);
+regs->gpr[23] = tswap64(mcp->mc_gpregs.gp_s[7]);
+regs->gpr[24] = tswap64(mcp->mc_gpregs.gp_s[8]);
+regs->gpr[25] = tswap64(mcp->mc_gpregs.gp_s[9]);
+regs->gpr[26] = tswap64(mcp->mc_gpregs.gp_s[10]);
+regs->gpr[27] = tswap64(mcp->mc_gpregs.gp_s[11]);
+
+regs->gpr[10] = tswap64(mcp->mc_gpregs.gp_a[0]);
+regs->gpr[11] = tswap64(mcp->mc_gpregs.gp_a[1]);
+regs->gpr[12] = tswap64(mcp->mc_gpregs.gp_a[2]);
+regs->gpr[13] = tswap64(mcp->mc_gpregs.gp_a[3]);
+regs->gpr[14] = tswap64(mcp->mc_gpregs.gp_a[4]);
+regs->gpr[15] = tswap64(mcp->mc_gpregs.gp_a[5]);
+regs->gpr[16] = tswap64(mcp->mc_gpregs.gp_a[6]);
+regs->gpr[17] = tswap64(mcp->mc_gpregs.gp_a[7]);
+
+
+regs->gpr[1] = tswap64(mcp->mc_gpregs.gp_ra);
+regs->gpr[2] = tswap64(mcp->mc_gpregs.gp_sp);
+regs->gpr[3] = tswap64(mcp->mc_gpregs.gp_gp);
+regs->gpr[4] = tswap64(mcp->mc_gpregs.gp_tp);
+regs->pc = tswap64(mcp->mc_gpregs.gp_sepc);
+
+return 0;
+}
+
+/* Compare with sys_sigreturn() in riscv/riscv/machdep.c */
+abi_long get_ucontext_sigreturn(CPURISCVState *regs,
+abi_ulong target_sf, abi_ulong *target_uc)
+{
+
+*target_uc = target_sf;
+return 0;
+}
-- 
2.34.1

[PATCH v4 17/17] bsd-user: Add RISC-V 64-bit Target Configuration and Debug XML Files

2024-08-28 Thread Ajeet Singh

From: Warner Losh 

Added configuration for RISC-V 64-bit target to the build system.

Signed-off-by: Warner Losh 
Signed-off-by: Ajeet Singh 
Reviewed-by: Richard Henderson 
---
 configs/targets/riscv64-bsd-user.mak | 4 
 1 file changed, 4 insertions(+)
 create mode 100644 configs/targets/riscv64-bsd-user.mak

diff --git a/configs/targets/riscv64-bsd-user.mak 
b/configs/targets/riscv64-bsd-user.mak
new file mode 100644
index 00..191c2c483f
--- /dev/null
+++ b/configs/targets/riscv64-bsd-user.mak
@@ -0,0 +1,4 @@
+TARGET_ARCH=riscv64
+TARGET_BASE_ARCH=riscv
+TARGET_ABI_DIR=riscv
+TARGET_XML_FILES= gdb-xml/riscv-64bit-cpu.xml gdb-xml/riscv-32bit-fpu.xml 
gdb-xml/riscv-64bit-fpu.xml gdb-xml/riscv-64bit-virtual.xml
-- 
2.34.1

[PATCH v4 01/17] bsd-user: Implement RISC-V CPU initialization and main loop

2024-08-28 Thread Ajeet Singh

From: Mark Corbin 

Added the initial implementation for RISC-V CPU initialization and main
loop. This includes setting up the general-purpose registers and
program counter based on the provided target architecture definitions.

Signed-off-by: Mark Corbin 
Signed-off-by: Ajeet Singh 
Co-authored-by: Jessica Clarke 
Reviewed-by: Richard Henderson 
---
 bsd-user/riscv/target_arch_cpu.h | 39 
 1 file changed, 39 insertions(+)
 create mode 100644 bsd-user/riscv/target_arch_cpu.h

diff --git a/bsd-user/riscv/target_arch_cpu.h b/bsd-user/riscv/target_arch_cpu.h
new file mode 100644
index 00..e17c910ae9
--- /dev/null
+++ b/bsd-user/riscv/target_arch_cpu.h
@@ -0,0 +1,39 @@
+/*
+ *  RISC-V CPU init and loop
+ *
+ *  Copyright (c) 2019 Mark Corbin
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#ifndef TARGET_ARCH_CPU_H
+#define TARGET_ARCH_CPU_H
+
+#include "target_arch.h"
+
+#define TARGET_DEFAULT_CPU_MODEL "max"
+
+static inline void target_cpu_init(CPURISCVState *env,
+struct target_pt_regs *regs)
+{
+int i;
+
+for (i = 1; i < 32; i++) {
+env->gpr[i] = regs->regs[i];
+}
+
+env->pc = regs->sepc;
+}
+
+#endif /* TARGET_ARCH_CPU_H */
-- 
2.34.1

[PATCH v4 03/17] bsd-user: Implement RISC-V CPU register cloning and reset functions

2024-08-28 Thread Ajeet Singh

From: Mark Corbin 

Added functions for cloning CPU registers and resetting the CPU state
for the RISC-V architecture.
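
As a rough sketch of what the clone helper in the diff below does, assuming a stripped-down CPU state (`struct cpu_sketch` and `clone_regs` are hypothetical names): the child optionally gets a new stack pointer, and a0 and t0 are zeroed. Reading a0 as the child's return value and t0 as the syscall error indicator is an interpretation, based on the "clear syscall error" comment on gp_t[0] in get_mcontext() elsewhere in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { X_SP = 2, X_T0 = 5, X_A0 = 10 };  /* RISC-V x-register numbers */

/* Hypothetical minimal CPU state: only the 32 integer registers. */
struct cpu_sketch {
    uint64_t gpr[32];
};

static void clone_regs(struct cpu_sketch *child, uint64_t newsp)
{
    assert(child != NULL);

    if (newsp) {
        child->gpr[X_SP] = newsp;  /* child runs on its own stack */
    }
    child->gpr[X_A0] = 0;  /* presumably: child sees a return value of 0 */
    child->gpr[X_T0] = 0;  /* presumably: no syscall error pending */
}
```

Passing 0 for `newsp` leaves the inherited stack pointer untouched, which matches the `if (newsp)` guard in the patch.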

Signed-off-by: Mark Corbin 
Signed-off-by: Ajeet Singh 
Reviewed-by: Richard Henderson 
---
 bsd-user/riscv/target_arch_cpu.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/bsd-user/riscv/target_arch_cpu.h b/bsd-user/riscv/target_arch_cpu.h
index ba589909e2..b1575aab20 100644
--- a/bsd-user/riscv/target_arch_cpu.h
+++ b/bsd-user/riscv/target_arch_cpu.h
@@ -130,4 +130,18 @@ static inline void target_cpu_loop(CPURISCVState *env)
 }
 }
 
+static inline void target_cpu_clone_regs(CPURISCVState *env, target_ulong 
newsp)
+{
+if (newsp) {
+env->gpr[xSP] = newsp;
+}
+
+env->gpr[xA0] = 0; 
+env->gpr[xT0] = 0; 
+}
+
+static inline void target_cpu_reset(CPUArchState *env)
+{
+}
+
 #endif /* TARGET_ARCH_CPU_H */
-- 
2.34.1

[PATCH v4 11/17] bsd-user: Define RISC-V system call structures and constants

2024-08-28 Thread Ajeet Singh

From: Mark Corbin 

Introduced definitions for the RISC-V system call interface, including
the 'target_pt_regs' structure that outlines the register storage
layout during a system call.
Added constants for hardware machine identifiers.

Signed-off-by: Mark Corbin 
Signed-off-by: Ajeet Singh 
Co-authored-by: Jessica Clarke 
Reviewed-by: Richard Henderson 
---
 bsd-user/riscv/target_syscall.h | 38 +
 1 file changed, 38 insertions(+)
 create mode 100644 bsd-user/riscv/target_syscall.h

diff --git a/bsd-user/riscv/target_syscall.h b/bsd-user/riscv/target_syscall.h
new file mode 100644
index 00..e7e5231309
--- /dev/null
+++ b/bsd-user/riscv/target_syscall.h
@@ -0,0 +1,38 @@
+/*
+ *  RISC-V system call definitions
+ *
+ *  Copyright (c) Mark Corbin
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#ifndef BSD_USER_RISCV_TARGET_SYSCALL_H
+#define BSD_USER_RISCV_TARGET_SYSCALL_H
+
+/*
+ * struct target_pt_regs defines the way the registers are stored on the stack
+ * during a system call.
+ */
+
+struct target_pt_regs {
+abi_ulong regs[32];
+abi_ulong sepc;
+};
+
+#define UNAME_MACHINE "riscv64"
+
+#define TARGET_HW_MACHINE   "riscv"
+#define TARGET_HW_MACHINE_ARCH  UNAME_MACHINE
+
+#endif /* BSD_USER_RISCV_TARGET_SYSCALL_H */
-- 
2.34.1

[PATCH v4 13/17] bsd-user: Define RISC-V signal handling structures and constants

2024-08-28 Thread Ajeet Singh

From: Mark Corbin 

Added definitions for RISC-V signal handling, including structures
and constants for managing signal frames and context.

Signed-off-by: Mark Corbin 
Signed-off-by: Ajeet Singh 
Co-authored-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/riscv/target_arch_signal.h | 75 +
 1 file changed, 75 insertions(+)
 create mode 100644 bsd-user/riscv/target_arch_signal.h

diff --git a/bsd-user/riscv/target_arch_signal.h 
b/bsd-user/riscv/target_arch_signal.h
new file mode 100644
index 00..1a634b865b
--- /dev/null
+++ b/bsd-user/riscv/target_arch_signal.h
@@ -0,0 +1,75 @@
+/*
+ *  RISC-V signal definitions
+ *
+ *  Copyright (c) 2019 Mark Corbin
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#ifndef TARGET_ARCH_SIGNAL_H
+#define TARGET_ARCH_SIGNAL_H
+
+#include "cpu.h"
+
+
+#define TARGET_INSN_SIZE 4  /* riscv instruction size */
+
+/* Size of the signal trampoline code placed on the stack. */
+#define TARGET_SZSIGCODE ((abi_ulong)(7 * TARGET_INSN_SIZE))
+
+/* Compare with riscv/include/_limits.h */
+#define TARGET_MINSIGSTKSZ  (1024 * 4)
+#define TARGET_SIGSTKSZ (TARGET_MINSIGSTKSZ + 32768)
+
+struct target_gpregs {
+uint64_t gp_ra;
+uint64_t gp_sp;
+uint64_t gp_gp;
+uint64_t gp_tp;
+uint64_t gp_t[7];
+uint64_t gp_s[12];
+uint64_t gp_a[8];
+uint64_t gp_sepc;
+uint64_t gp_sstatus;
+};
+
+struct target_fpregs {
+uint64_t fp_x[32][2];
+uint64_t fp_fcsr;
+uint32_t fp_flags;
+uint32_t pad;
+};
+
+typedef struct target_mcontext {
+struct target_gpregs   mc_gpregs;
+struct target_fpregs   mc_fpregs;
+uint32_t   mc_flags;
+#define TARGET_MC_FP_VALID 0x01
+uint32_t   mc_pad;
+uint64_t   mc_spare[8];
+} target_mcontext_t;
+
+#define TARGET_MCONTEXT_SIZE 864
+#define TARGET_UCONTEXT_SIZE 936
+
+#include "target_os_ucontext.h"
+
+struct target_sigframe {
+target_ucontext_t   sf_uc; /* = *sf_uncontext */
+target_siginfo_t sf_si; /* = *sf_siginfo (SA_SIGINFO case)*/
+};
+
+#define TARGET_SIGSTACK_ALIGN 16
+
+#endif /* TARGET_ARCH_SIGNAL_H */
-- 
2.34.1

[PATCH v4 12/17] bsd-user: Add generic RISC-V64 target definitions

2024-08-28 Thread Ajeet Singh

From: Warner Losh 

Added a generic definition for RISC-V64 target-specific details.
Implemented the 'regpairs_aligned' function, which returns 'false'
to indicate that register pairs are not aligned in the RISC-V64 ABI.

Signed-off-by: Warner Losh 
Signed-off-by: Ajeet Singh 
Reviewed-by: Richard Henderson 
---
 bsd-user/riscv/target.h | 20 
 1 file changed, 20 insertions(+)
 create mode 100644 bsd-user/riscv/target.h

diff --git a/bsd-user/riscv/target.h b/bsd-user/riscv/target.h
new file mode 100644
index 00..036ddd185e
--- /dev/null
+++ b/bsd-user/riscv/target.h
@@ -0,0 +1,20 @@
+/*
+ * Riscv64 general target stuff that's common to all aarch details
+ *
+ * Copyright (c) 2022 M. Warner Losh 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef TARGET_H
+#define TARGET_H
+
+/*
+ * riscv64 ABI does not 'lump' the registers for 64-bit args.
+ */
+static inline bool regpairs_aligned(void *cpu_env)
+{
+return false;
+}
+
+#endif /* TARGET_H */
-- 
2.34.1

[PATCH v4 14/17] bsd-user: Implement RISC-V signal trampoline setup functions

2024-08-28 Thread Ajeet Singh

From: Mark Corbin 

Added functions for setting up the RISC-V signal trampoline and signal
frame:

'set_sigtramp_args()': Configures the RISC-V CPU state with arguments
for the signal handler. It sets up the registers with the signal
number, pointers to the signal info and user context, the signal handler
address, and the signal frame pointer.

'setup_sigframe_arch()': Initializes the signal frame with the current
machine context. This function copies the context from the CPU state to
the signal frame, preparing it for the signal handler.
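
The argument setup can be sketched as follows. This is a simplified illustration, not the patch itself: the `*_sketch` structures, `struct handler_args` and `make_handler_args` are hypothetical, and the real frame layout comes from 'struct target_sigframe'. The point is that the handler's second and third arguments are guest addresses computed from the frame address plus field offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much smaller stand-ins for the real frame members. */
struct ucontext_sketch { uint64_t data[4]; };
struct siginfo_sketch  { int32_t signo; int32_t code; };

struct sigframe_sketch {
    struct ucontext_sketch sf_uc;
    struct siginfo_sketch  sf_si;
};

/* Register values the handler starts with. */
struct handler_args {
    uint64_t a0, a1, a2, pc, sp;
};

static struct handler_args make_handler_args(int sig, uint64_t frame_addr,
                                             uint64_t handler)
{
    struct handler_args r;

    assert(frame_addr % 16 == 0);  /* frame is expected to be 16-byte aligned */

    r.a0 = (uint64_t)sig;                                         /* signal number */
    r.a1 = frame_addr + offsetof(struct sigframe_sketch, sf_si);  /* siginfo */
    r.a2 = frame_addr + offsetof(struct sigframe_sketch, sf_uc);  /* ucontext */
    r.pc = handler;                                               /* handler entry */
    r.sp = frame_addr;                                            /* sigframe */
    return r;
}
```

The return address register is left out here because it depends on where the trampoline is placed, which the patch derives from TARGET_PS_STRINGS and TARGET_SZSIGCODE.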

Signed-off-by: Mark Corbin 
Signed-off-by: Ajeet Singh 
Signed-off-by: Warner Losh 
Co-authored-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/riscv/signal.c | 63 +
 1 file changed, 63 insertions(+)
 create mode 100644 bsd-user/riscv/signal.c

diff --git a/bsd-user/riscv/signal.c b/bsd-user/riscv/signal.c
new file mode 100644
index 00..2597fec2fd
--- /dev/null
+++ b/bsd-user/riscv/signal.c
@@ -0,0 +1,63 @@
+/*
+ *  RISC-V signal definitions
+ *
+ *  Copyright (c) 2019 Mark Corbin
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+#include "qemu/osdep.h"
+
+#include "qemu.h"
+
+/*
+ * Compare with sendsig() in riscv/riscv/exec_machdep.c
+ * Assumes that target stack frame memory is locked.
+ */
+abi_long
+set_sigtramp_args(CPURISCVState *regs, int sig, struct target_sigframe *frame,
+abi_ulong frame_addr, struct target_sigaction *ka)
+{
+/*
+ * Arguments to signal handler:
+ *  a0 (10) = signal number
+ *  a1 (11) = siginfo pointer
+ *  a2 (12) = ucontext pointer
+ *  pc  = signal pointer handler
+ *  sp (2)  = sigframe pointer
+ *  ra (1)  = sigtramp at base of user stack
+ */
+
+ regs->gpr[xA0] = sig;
+ regs->gpr[xA1] = frame_addr +
+ offsetof(struct target_sigframe, sf_si);
+ regs->gpr[xA2] = frame_addr +
+ offsetof(struct target_sigframe, sf_uc);
+ regs->pc = ka->_sa_handler;
+ regs->gpr[xSP] = frame_addr;
+ regs->gpr[xRA] = TARGET_PS_STRINGS - TARGET_SZSIGCODE;
+ return 0;
+}
+
+/*
+ * Compare to riscv/riscv/exec_machdep.c sendsig()
+ * Assumes that the memory is locked if frame points to user memory.
+ */
+abi_long setup_sigframe_arch(CPURISCVState *env, abi_ulong frame_addr,
+ struct target_sigframe *frame, int flags)
+{
+target_mcontext_t *mcp = &frame->sf_uc.uc_mcontext;
+
+get_mcontext(env, mcp, flags);
+return 0;
+}
-- 
2.34.1

[PATCH v4 04/17] bsd-user: Implement RISC-V TLS register setup

2024-08-28 Thread Ajeet Singh

From: Mark Corbin 

Added the prototype for the 'target_cpu_set_tls' function to the
'target_arch.h' header file and its implementation to
'target_arch_cpu.c'. This function is responsible for setting
the Thread Local Storage (TLS) register for the RISC-V architecture.

Signed-off-by: Mark Corbin 
Signed-off-by: Ajeet Singh 
Reviewed-by: Richard Henderson 
---
 bsd-user/riscv/target_arch.h | 27 +++
 bsd-user/riscv/target_arch_cpu.c | 29 +
 2 files changed, 56 insertions(+)
 create mode 100644 bsd-user/riscv/target_arch.h
 create mode 100644 bsd-user/riscv/target_arch_cpu.c

diff --git a/bsd-user/riscv/target_arch.h b/bsd-user/riscv/target_arch.h
new file mode 100644
index 00..26ce07f343
--- /dev/null
+++ b/bsd-user/riscv/target_arch.h
@@ -0,0 +1,27 @@
+/*
+ * RISC-V specific prototypes
+ *
+ * Copyright (c) 2019 Mark Corbin 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#ifndef TARGET_ARCH_H
+#define TARGET_ARCH_H
+
+#include "qemu.h"
+
+void target_cpu_set_tls(CPURISCVState *env, target_ulong newtls);
+
+#endif /* TARGET_ARCH_H */
diff --git a/bsd-user/riscv/target_arch_cpu.c b/bsd-user/riscv/target_arch_cpu.c
new file mode 100644
index 00..44e25d2ddf
--- /dev/null
+++ b/bsd-user/riscv/target_arch_cpu.c
@@ -0,0 +1,29 @@
+/*
+ *  RISC-V CPU related code
+ *
+ *  Copyright (c) 2019 Mark Corbin
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+#include "qemu/osdep.h"
+
+#include "target_arch.h"
+
+#define TP_OFFSET   16
+
+/* Compare with cpu_set_user_tls() in riscv/riscv/vm_machdep.c */
+void target_cpu_set_tls(CPURISCVState *env, target_ulong newtls)
+{
+env->gpr[xTP] = newtls + TP_OFFSET;
+}
-- 
2.34.1

[PATCH v4 08/17] bsd-user: Implement RISC-V sysarch system call emulation

2024-08-28 Thread Ajeet Singh

From: Mark Corbin 

Added the 'do_freebsd_arch_sysarch' function to emulate the 'sysarch'
system call for the RISC-V architecture.
Currently, this function returns '-TARGET_EOPNOTSUPP' to indicate that
the operation is not supported.

Signed-off-by: Mark Corbin 
Signed-off-by: Ajeet Singh 
Reviewed-by: Richard Henderson 
---
 bsd-user/riscv/target_arch_sysarch.h | 41 
 1 file changed, 41 insertions(+)
 create mode 100644 bsd-user/riscv/target_arch_sysarch.h

diff --git a/bsd-user/riscv/target_arch_sysarch.h 
b/bsd-user/riscv/target_arch_sysarch.h
new file mode 100644
index 00..9af42331b4
--- /dev/null
+++ b/bsd-user/riscv/target_arch_sysarch.h
@@ -0,0 +1,41 @@
+/*
+ *  RISC-V sysarch() system call emulation
+ *
+ *  Copyright (c) 2019 Mark Corbin
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#ifndef TARGET_ARCH_SYSARCH_H
+#define TARGET_ARCH_SYSARCH_H
+
+#include "target_syscall.h"
+#include "target_arch.h"
+
+static inline abi_long do_freebsd_arch_sysarch(CPURISCVState *env, int op,
+abi_ulong parms)
+{
+
+return -TARGET_EOPNOTSUPP;
+}
+
+static inline void do_freebsd_arch_print_sysarch(
+const struct syscallname *name, abi_long arg1, abi_long arg2,
+abi_long arg3, abi_long arg4, abi_long arg5, abi_long arg6)
+{
+
+gemu_log("UNKNOWN OP: %d, " TARGET_ABI_FMT_lx ")", (int)arg1, arg2);
+}
+
+#endif /* TARGET_ARCH_SYSARCH_H */
-- 
2.34.1

[PATCH v4 15/17] bsd-user: Implement 'get_mcontext' for RISC-V

2024-08-28 Thread Ajeet Singh

From: Mark Corbin 

Added the 'get_mcontext' function to extract and populate
the RISC-V machine context from the CPU state.
This function is used to gather the current state of the
general-purpose registers and store it in a 'target_mcontext_t'
structure.
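
The register grouping used by the diff below can also be written as lookup tables. The following sketch is only an illustration under simplifying assumptions: `struct gpregs_sketch` and `save_gprs` are hypothetical, and the byte swapping that the patch performs with tswap64() is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RISC-V x-register numbers for each ABI group, in array order. */
static const int T_REGS[7]  = { 5, 6, 7, 28, 29, 30, 31 };          /* t0-t6  */
static const int S_REGS[12] = { 8, 9, 18, 19, 20, 21,
                                22, 23, 24, 25, 26, 27 };           /* s0-s11 */
static const int A_REGS[8]  = { 10, 11, 12, 13, 14, 15, 16, 17 };   /* a0-a7  */

/* Hypothetical, simplified stand-in for 'struct target_gpregs'. */
struct gpregs_sketch {
    uint64_t ra, sp, gp, tp;
    uint64_t t[7], s[12], a[8];
    uint64_t sepc;
};

/* Copy the integer registers and pc into the grouped layout. */
static void save_gprs(const uint64_t gpr[32], uint64_t pc,
                      struct gpregs_sketch *out)
{
    int i;

    assert(gpr != NULL && out != NULL);

    for (i = 0; i < 7; i++) {
        out->t[i] = gpr[T_REGS[i]];
    }
    for (i = 0; i < 12; i++) {
        out->s[i] = gpr[S_REGS[i]];
    }
    for (i = 0; i < 8; i++) {
        out->a[i] = gpr[A_REGS[i]];
    }

    out->ra = gpr[1];
    out->sp = gpr[2];
    out->gp = gpr[3];
    out->tp = gpr[4];
    out->sepc = pc;
}
```

The same tables could drive the reverse copy for set_mcontext(); the patch spells out each assignment instead, which keeps it easy to compare line by line with the FreeBSD kernel source.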

Signed-off-by: Mark Corbin 
Signed-off-by: Ajeet Singh 
Signed-off-by: Warner Losh 
Co-authored-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/riscv/signal.c | 53 +
 1 file changed, 53 insertions(+)

diff --git a/bsd-user/riscv/signal.c b/bsd-user/riscv/signal.c
index 2597fec2fd..072ad821d2 100644
--- a/bsd-user/riscv/signal.c
+++ b/bsd-user/riscv/signal.c
@@ -61,3 +61,56 @@ abi_long setup_sigframe_arch(CPURISCVState *env, abi_ulong frame_addr,
 get_mcontext(env, mcp, flags);
 return 0;
 }
+
+/*
+ * Compare with get_mcontext() in riscv/riscv/machdep.c
+ * Assumes that the memory is locked if mcp points to user memory.
+ */
+abi_long get_mcontext(CPURISCVState *regs, target_mcontext_t *mcp,
+int flags)
+{
+
+mcp->mc_gpregs.gp_t[0] = tswap64(regs->gpr[5]);
+mcp->mc_gpregs.gp_t[1] = tswap64(regs->gpr[6]);
+mcp->mc_gpregs.gp_t[2] = tswap64(regs->gpr[7]);
+mcp->mc_gpregs.gp_t[3] = tswap64(regs->gpr[28]);
+mcp->mc_gpregs.gp_t[4] = tswap64(regs->gpr[29]);
+mcp->mc_gpregs.gp_t[5] = tswap64(regs->gpr[30]);
+mcp->mc_gpregs.gp_t[6] = tswap64(regs->gpr[31]);
+
+mcp->mc_gpregs.gp_s[0] = tswap64(regs->gpr[8]);
+mcp->mc_gpregs.gp_s[1] = tswap64(regs->gpr[9]);
+mcp->mc_gpregs.gp_s[2] = tswap64(regs->gpr[18]);
+mcp->mc_gpregs.gp_s[3] = tswap64(regs->gpr[19]);
+mcp->mc_gpregs.gp_s[4] = tswap64(regs->gpr[20]);
+mcp->mc_gpregs.gp_s[5] = tswap64(regs->gpr[21]);
+mcp->mc_gpregs.gp_s[6] = tswap64(regs->gpr[22]);
+mcp->mc_gpregs.gp_s[7] = tswap64(regs->gpr[23]);
+mcp->mc_gpregs.gp_s[8] = tswap64(regs->gpr[24]);
+mcp->mc_gpregs.gp_s[9] = tswap64(regs->gpr[25]);
+mcp->mc_gpregs.gp_s[10] = tswap64(regs->gpr[26]);
+mcp->mc_gpregs.gp_s[11] = tswap64(regs->gpr[27]);
+
+mcp->mc_gpregs.gp_a[0] = tswap64(regs->gpr[10]);
+mcp->mc_gpregs.gp_a[1] = tswap64(regs->gpr[11]);
+mcp->mc_gpregs.gp_a[2] = tswap64(regs->gpr[12]);
+mcp->mc_gpregs.gp_a[3] = tswap64(regs->gpr[13]);
+mcp->mc_gpregs.gp_a[4] = tswap64(regs->gpr[14]);
+mcp->mc_gpregs.gp_a[5] = tswap64(regs->gpr[15]);
+mcp->mc_gpregs.gp_a[6] = tswap64(regs->gpr[16]);
+mcp->mc_gpregs.gp_a[7] = tswap64(regs->gpr[17]);
+
+if (flags & TARGET_MC_GET_CLEAR_RET) {
+mcp->mc_gpregs.gp_a[0] = 0; /* a0 */
+mcp->mc_gpregs.gp_a[1] = 0; /* a1 */
+mcp->mc_gpregs.gp_t[0] = 0; /* clear syscall error */
+}
+
+mcp->mc_gpregs.gp_ra = tswap64(regs->gpr[1]);
+mcp->mc_gpregs.gp_sp = tswap64(regs->gpr[2]);
+mcp->mc_gpregs.gp_gp = tswap64(regs->gpr[3]);
+mcp->mc_gpregs.gp_tp = tswap64(regs->gpr[4]);
+mcp->mc_gpregs.gp_sepc = tswap64(regs->pc);
+
+return 0;
+}
-- 
2.34.1
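
The register mapping in get_mcontext() above can also be read as a
table. The following is only a sketch for readers, not part of the patch:
'struct demo_gpregs' and the demo_* names are hypothetical stand-ins for
the 'mc_gpregs' fields, and the byte swapping done by tswap64() is left out
(this assumes host and target share the same endianness).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the general-purpose part of the mcontext. */
struct demo_gpregs {
    uint64_t ra, sp, gp, tp;
    uint64_t t[7];   /* temporaries t0..t6 */
    uint64_t s[12];  /* saved registers s0..s11 */
    uint64_t a[8];   /* argument registers a0..a7 */
};

/* x-register numbers per ABI group, in the order used by the patch. */
static const uint8_t demo_t_regs[7]  = { 5, 6, 7, 28, 29, 30, 31 };
static const uint8_t demo_s_regs[12] = { 8, 9, 18, 19, 20, 21,
                                         22, 23, 24, 25, 26, 27 };
static const uint8_t demo_a_regs[8]  = { 10, 11, 12, 13, 14, 15, 16, 17 };

/* Copy the 32 integer registers into the grouped layout (no byte swap). */
static void demo_copy_gpregs(struct demo_gpregs *out, const uint64_t gpr[32])
{
    size_t i;

    assert(out != NULL && gpr != NULL);

    out->ra = gpr[1];
    out->sp = gpr[2];
    out->gp = gpr[3];
    out->tp = gpr[4];

    for (i = 0; i < 7; i++) {
        out->t[i] = gpr[demo_t_regs[i]];
    }
    for (i = 0; i < 12; i++) {
        out->s[i] = gpr[demo_s_regs[i]];
    }
    for (i = 0; i < 8; i++) {
        out->a[i] = gpr[demo_a_regs[i]];
    }
}
```

The table form makes it easier to check that t3..t6 come from x28..x31 and
s2..s11 from x18..x27, which is the part that is easy to get wrong by hand.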

[PATCH v4 05/17] bsd-user: Add RISC-V ELF definitions and hardware capability detection

2024-08-28 Thread Ajeet Singh

From: Mark Corbin 

Introduced RISC-V specific ELF definitions and hardware capability
detection.
Additionally, a function to retrieve hardware capabilities
('get_elf_hwcap') is implemented, which returns the common bits set in
each CPU's ISA strings.

Signed-off-by: Mark Corbin 
Signed-off-by: Ajeet Singh 
Co-authored-by: Kyle Evans 
Reviewed-by: Richard Henderson 
---
 bsd-user/riscv/target_arch_elf.h | 42 
 1 file changed, 42 insertions(+)
 create mode 100644 bsd-user/riscv/target_arch_elf.h

diff --git a/bsd-user/riscv/target_arch_elf.h b/bsd-user/riscv/target_arch_elf.h
new file mode 100644
index 00..4eb915e61e
--- /dev/null
+++ b/bsd-user/riscv/target_arch_elf.h
@@ -0,0 +1,42 @@
+/*
+ *  RISC-V ELF definitions
+ *
+ *  Copyright (c) 2019 Mark Corbin
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#ifndef TARGET_ARCH_ELF_H
+#define TARGET_ARCH_ELF_H
+
+#define elf_check_arch(x) ((x) == EM_RISCV)
+#define ELF_START_MMAP 0x8000
+#define ELF_ET_DYN_LOAD_ADDR 0x10
+#define ELF_CLASS   ELFCLASS64
+
+#define ELF_DATA ELFDATA2LSB
+#define ELF_ARCH EM_RISCV
+
+#define ELF_HWCAP get_elf_hwcap()
+static uint32_t get_elf_hwcap(void)
+{
+RISCVCPU *cpu = RISCV_CPU(thread_cpu);
+
+return cpu->env.misa_ext_mask;
+}
+
+#define USE_ELF_CORE_DUMP
+#define ELF_EXEC_PAGESIZE 4096
+
+#endif /* TARGET_ARCH_ELF_H */
-- 
2.34.1
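
For readers unfamiliar with how such a capability mask is consumed, here is
a small sketch. It assumes the value returned by get_elf_hwcap() follows the
RISC-V misa layout, with one bit per single-letter extension ('A' is bit 0,
'Z' is bit 25). The demo_* helpers are hypothetical and not part of the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit for a single-letter extension: 'A' -> bit 0, ..., 'Z' -> bit 25. */
static uint32_t demo_misa_bit(char ext)
{
    assert(ext >= 'A' && ext <= 'Z');
    return UINT32_C(1) << (ext - 'A');
}

/* Returns non-zero if the capability mask advertises the extension. */
static int demo_has_ext(uint32_t hwcap, char ext)
{
    return (hwcap & demo_misa_bit(ext)) != 0;
}
```

A guest-side consumer would typically test one letter at a time, for
example demo_has_ext(hwcap, 'C') for compressed instructions.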

[PATCH v4 02/17] bsd-user: Add RISC-V CPU execution loop and syscall handling

2024-08-28 Thread Ajeet Singh

From: Mark Corbin 

Implemented the RISC-V CPU execution loop, including handling various
exceptions and system calls. The loop continuously executes CPU
instructions, processes exceptions, and handles system calls by invoking
FreeBSD syscall handlers.

Signed-off-by: Mark Corbin 
Signed-off-by: Ajeet Singh 
Co-authored-by: Jessica Clarke 
Co-authored-by: Kyle Evans 
Reviewed-by: Richard Henderson 
---
 bsd-user/riscv/target_arch_cpu.h | 94 
 1 file changed, 94 insertions(+)

diff --git a/bsd-user/riscv/target_arch_cpu.h b/bsd-user/riscv/target_arch_cpu.h
index e17c910ae9..ba589909e2 100644
--- a/bsd-user/riscv/target_arch_cpu.h
+++ b/bsd-user/riscv/target_arch_cpu.h
@@ -36,4 +36,98 @@ static inline void target_cpu_init(CPURISCVState *env,
 env->pc = regs->sepc;
 }
 
+static inline void target_cpu_loop(CPURISCVState *env)
+{
+CPUState *cs = env_cpu(env);
+int trapnr;
+abi_long ret;
+unsigned int syscall_num;
+int32_t signo, code;
+
+for (;;) {
+cpu_exec_start(cs);
+trapnr = cpu_exec(cs);
+cpu_exec_end(cs);
+process_queued_cpu_work(cs);
+
+signo = 0;
+
+switch (trapnr) {
+case EXCP_INTERRUPT:
+/* just indicate that signals should be handled asap */
+break;
+case EXCP_ATOMIC:
+cpu_exec_step_atomic(cs);
+break;
+case RISCV_EXCP_U_ECALL:
+syscall_num = env->gpr[xT0]; 
+env->pc += TARGET_INSN_SIZE;
+/* Compare to cpu_fetch_syscall_args() in riscv/riscv/trap.c */
+if (TARGET_FREEBSD_NR___syscall == syscall_num ||
+TARGET_FREEBSD_NR_syscall == syscall_num) {
+ret = do_freebsd_syscall(env,
+ env->gpr[xA0], 
+ env->gpr[xA1], 
+ env->gpr[xA2], 
+ env->gpr[xA3], 
+ env->gpr[xA4], 
+ env->gpr[xA5], 
+ env->gpr[xA6], 
+ env->gpr[xA7], 
+ 0);
+} else {
+ret = do_freebsd_syscall(env,
+ syscall_num,
+ env->gpr[xA0], 
+ env->gpr[xA1], 
+ env->gpr[xA2], 
+ env->gpr[xA3], 
+ env->gpr[xA4], 
+ env->gpr[xA5], 
+ env->gpr[xA6], 
+ env->gpr[xA7]  
+);
+}
+
+/*
+ * Compare to cpu_set_syscall_retval() in
+ * riscv/riscv/vm_machdep.c
+ */
+if (ret >= 0) {
+env->gpr[xA0] = ret; 
+env->gpr[xT0] = 0;   
+} else if (ret == -TARGET_ERESTART) {
+env->pc -= TARGET_INSN_SIZE;
+} else if (ret != -TARGET_EJUSTRETURN) {
+env->gpr[xA0] = -ret; 
+env->gpr[xT0] = 1;   
+}
+break;
+case RISCV_EXCP_ILLEGAL_INST:
+signo = TARGET_SIGILL;
+code = TARGET_ILL_ILLOPC;
+break;
+case RISCV_EXCP_BREAKPOINT:
+signo = TARGET_SIGTRAP;
+code = TARGET_TRAP_BRKPT;
+break;
+case EXCP_DEBUG:
+signo = TARGET_SIGTRAP;
+code = TARGET_TRAP_BRKPT;
+break;
+default:
+fprintf(stderr, "qemu: unhandled CPU exception "
+"0x%x - aborting\n", trapnr);
+cpu_dump_state(cs, stderr, 0);
+abort();
+}
+
+if (signo) {
+force_sig_fault(signo, code, env->pc);
+}
+
+process_pending_signals(env);
+}
+}
+
 #endif /* TARGET_ARCH_CPU_H */
-- 
2.34.1
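
The return-value handling at the end of the RISCV_EXCP_U_ECALL case can be
summarised in isolation. This is a reader's sketch, not code from the
patch: 'struct demo_cpu' is a hypothetical three-register model, and the
DEMO_* constants are placeholder values standing in for TARGET_ERESTART,
TARGET_EJUSTRETURN and TARGET_INSN_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values, chosen only for this sketch. */
enum {
    DEMO_ERESTART    = 255,
    DEMO_EJUSTRETURN = 254,
    DEMO_INSN_SIZE   = 4
};

/* Hypothetical model: only the registers the convention touches. */
struct demo_cpu {
    uint64_t pc;
    uint64_t a0;  /* result or errno */
    uint64_t t0;  /* 0 on success, 1 on error */
};

/* Apply a syscall result; pc is assumed to already point past the ecall. */
static void demo_set_syscall_retval(struct demo_cpu *cpu, int64_t ret)
{
    assert(cpu != NULL);

    if (ret >= 0) {
        cpu->a0 = (uint64_t)ret;
        cpu->t0 = 0;
    } else if (ret == -DEMO_ERESTART) {
        /* Back up so the ecall is executed again. */
        cpu->pc -= DEMO_INSN_SIZE;
    } else if (ret != -DEMO_EJUSTRETURN) {
        cpu->a0 = (uint64_t)(-ret);
        cpu->t0 = 1;
    }
    /* EJUSTRETURN: leave all registers untouched. */
}
```

Note that errors are reported as a positive errno in a0 with t0 set to 1,
rather than as a negative return value.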

[PATCH v4 07/17] bsd-user: Add RISC-V signal trampoline setup function

2024-08-28 Thread Ajeet Singh

From: Mark Corbin 

Implemented the 'setup_sigtramp' function for setting up the signal
trampoline code in the RISC-V architecture.

Signed-off-by: Mark Corbin 
Signed-off-by: Ajeet Singh 
Reviewed-by: Richard Henderson 
---
 bsd-user/riscv/target_arch_sigtramp.h | 42 +++
 1 file changed, 42 insertions(+)
 create mode 100644 bsd-user/riscv/target_arch_sigtramp.h

diff --git a/bsd-user/riscv/target_arch_sigtramp.h b/bsd-user/riscv/target_arch_sigtramp.h
new file mode 100644
index 00..fce673e65a
--- /dev/null
+++ b/bsd-user/riscv/target_arch_sigtramp.h
@@ -0,0 +1,42 @@
+/*
+ * RISC-V sigcode
+ *
+ * Copyright (c) 2019 Mark Corbin
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#ifndef TARGET_ARCH_SIGTRAMP_H
+#define TARGET_ARCH_SIGTRAMP_H
+
+/* Compare with sigcode() in riscv/riscv/locore.S */
+static inline abi_long setup_sigtramp(abi_ulong offset, unsigned sigf_uc,
+unsigned sys_sigreturn)
+{
+int i;
+uint32_t sys_exit = TARGET_FREEBSD_NR_exit;
+
+static const uint32_t sigtramp_code[] = {
+/* 1 */ const_le32(0x00010513), /* mv a0, sp */
+/* 2 */ const_le32(0x00050513 + (sigf_uc << 20)),   /* addi a0, a0, sigf_uc */
+/* 3 */ const_le32(0x0293 + (sys_sigreturn << 20)), /* li t0, sys_sigreturn */
+/* 4 */ const_le32(0x0073), /* ecall */
+/* 5 */ const_le32(0x0293 + (sys_exit << 20)),  /* li t0, sys_exit */
+/* 6 */ const_le32(0x0073), /* ecall */
+/* 7 */ const_le32(0xFF1FF06F)  /* b -16 */
+};
+
+return memcpy_to_target(offset, sigtramp_code, TARGET_SZSIGCODE);
+}
+#endif /* TARGET_ARCH_SIGTRAMP_H */
-- 
2.34.1
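
The constants in the trampoline above are RISC-V I-type 'addi' encodings
with the immediate added into bits 31:20. The sketch below shows how they
can be derived. It is an illustration only; demo_encode_addi() is a
hypothetical helper, and it is restricted to non-negative immediates that
fit in the 12-bit signed field (0..2047).

```c
#include <assert.h>
#include <stdint.h>

/* ecall has a fixed encoding. */
#define DEMO_ECALL UINT32_C(0x00000073)

/*
 * Encode "addi rd, rs1, imm12":
 *   imm[11:0] | rs1 | funct3 = 000 | rd | opcode = 0010011
 * "mv rd, rs" is addi rd, rs, 0 and "li rd, imm" is addi rd, x0, imm.
 */
static uint32_t demo_encode_addi(unsigned rd, unsigned rs1, unsigned imm12)
{
    assert(rd < 32 && rs1 < 32 && imm12 < 0x800);
    return (uint32_t)((imm12 << 20) | (rs1 << 15) | (rd << 7) | 0x13u);
}
```

With a0 = x10, sp = x2 and t0 = x5, demo_encode_addi(10, 2, 0) gives the
'mv a0, sp' word and demo_encode_addi(5, 0, n) gives 'li t0, n', which is
what adding (n << 20) to the base constant achieves in the patch.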

[PATCH v4 06/17] bsd-user: Define RISC-V register structures and register copying

2024-08-28 Thread Ajeet Singh

From: Mark Corbin 

Added definitions for RISC-V register structures, including
general-purpose registers and floating-point registers, in
'target_arch_reg.h'. Implemented the 'target_copy_regs' function to
copy register values from the CPU state to the target register
structure, ensuring proper endianness handling using 'tswapreg'.

Signed-off-by: Mark Corbin 
Signed-off-by: Ajeet Singh 
Reviewed-by: Richard Henderson 
---
 bsd-user/riscv/target_arch_reg.h | 88 
 1 file changed, 88 insertions(+)
 create mode 100644 bsd-user/riscv/target_arch_reg.h

diff --git a/bsd-user/riscv/target_arch_reg.h b/bsd-user/riscv/target_arch_reg.h
new file mode 100644
index 00..12b1c96b61
--- /dev/null
+++ b/bsd-user/riscv/target_arch_reg.h
@@ -0,0 +1,88 @@
+/*
+ *  RISC-V register structures
+ *
+ *  Copyright (c) 2019 Mark Corbin
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#ifndef TARGET_ARCH_REG_H
+#define TARGET_ARCH_REG_H
+
+/* Compare with riscv/include/reg.h */
+typedef struct target_reg {
+uint64_t ra;      /* return address */
+uint64_t sp;      /* stack pointer */
+uint64_t gp;      /* global pointer */
+uint64_t tp;      /* thread pointer */
+uint64_t t[7];    /* temporaries */
+uint64_t s[12];   /* saved registers */
+uint64_t a[8];    /* function arguments */
+uint64_t sepc;    /* exception program counter */
+uint64_t sstatus; /* status register */
+} target_reg_t;
+
+typedef struct target_fpreg {
+uint64_t fp_x[32][2]; /* Floating point registers */
+uint64_t fp_fcsr;     /* Floating point control reg */
+} target_fpreg_t;
+
+#define tswapreg(ptr)   tswapal(ptr)
+
+/* Compare with struct trapframe in riscv/include/frame.h */
+static inline void target_copy_regs(target_reg_t *regs,
+const CPURISCVState *env)
+{
+
+regs->ra = tswapreg(env->gpr[1]);
+regs->sp = tswapreg(env->gpr[2]);
+regs->gp = tswapreg(env->gpr[3]);
+regs->tp = tswapreg(env->gpr[4]);
+
+regs->t[0] = tswapreg(env->gpr[5]);
+regs->t[1] = tswapreg(env->gpr[6]);
+regs->t[2] = tswapreg(env->gpr[7]);
+regs->t[3] = tswapreg(env->gpr[28]);
+regs->t[4] = tswapreg(env->gpr[29]);
+regs->t[5] = tswapreg(env->gpr[30]);
+regs->t[6] = tswapreg(env->gpr[31]);
+
+regs->s[0] = tswapreg(env->gpr[8]);
+regs->s[1] = tswapreg(env->gpr[9]);
+regs->s[2] = tswapreg(env->gpr[18]);
+regs->s[3] = tswapreg(env->gpr[19]);
+regs->s[4] = tswapreg(env->gpr[20]);
+regs->s[5] = tswapreg(env->gpr[21]);
+regs->s[6] = tswapreg(env->gpr[22]);
+regs->s[7] = tswapreg(env->gpr[23]);
+regs->s[8] = tswapreg(env->gpr[24]);
+regs->s[9] = tswapreg(env->gpr[25]);
+regs->s[10] = tswapreg(env->gpr[26]);
+regs->s[11] = tswapreg(env->gpr[27]);
+
+regs->a[0] = tswapreg(env->gpr[10]);
+regs->a[1] = tswapreg(env->gpr[11]);
+regs->a[2] = tswapreg(env->gpr[12]);
+regs->a[3] = tswapreg(env->gpr[13]);
+regs->a[4] = tswapreg(env->gpr[14]);
+regs->a[5] = tswapreg(env->gpr[15]);
+regs->a[6] = tswapreg(env->gpr[16]);
+regs->a[7] = tswapreg(env->gpr[17]);
+
+regs->sepc = tswapreg(env->pc);
+}
+
+#undef tswapreg
+
+#endif /* TARGET_ARCH_REG_H */
-- 
2.34.1

[PATCH v2 0/2] Postcopy migration and vhost-user errors

2024-08-28 Thread Prasad Pandit

From: Prasad Pandit 

Hello,

* virsh(1) offers multiple options to initiate Postcopy migration:

1) virsh migrate --postcopy --postcopy-after-precopy
2) virsh migrate --postcopy + virsh migrate-postcopy
3) virsh migrate --postcopy --timeout  --timeout-postcopy

When Postcopy migration is invoked via method (2) or (3) above,
the migrated guest on the destination host hangs sometimes.

* During Postcopy migration, multiple threads are spawned on the destination
host to start the guest and set up devices. One such thread starts the vhost
device via the vhost_dev_start() function, and another, called fault_thread,
handles page faults in user space using the kernel's userfaultfd(2) system.

* When fault_thread exits upon completion of Postcopy migration, it sends a
'postcopy_end' message to the vhost-user device. But sometimes the
'postcopy_end' message is sent while the vhost device is still being set up
via vhost_dev_start().

 Thread-1                            Thread-2

 vhost_dev_start                     postcopy_ram_incoming_cleanup
 vhost_device_iotlb_miss             postcopy_notify
 vhost_backend_update_device_iotlb   vhost_user_postcopy_notifier
 vhost_user_send_device_iotlb_msg    vhost_user_postcopy_end
 process_message_reply               process_message_reply
 vhost_user_read                     vhost_user_read
 vhost_user_read_header              vhost_user_read_header
 "Fail to update device iotlb"       "Failed to receive reply to postcopy_end"

This creates confusion when the vhost device receives the 'postcopy_end'
message while it is still trying to update IOTLB entries.

This seems to leave the guest in a stranded/hung state because fault_thread
has exited saying Postcopy migration has ended, but the vhost device is
probably still expecting updates. QEMU logs the following errors on the
destination host:
===
...
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
qemu-kvm: vhost_user_postcopy_end: 700871,700900: Failed to receive reply to postcopy_end
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x8 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x16 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
qemu-kvm: vhost_user_read_header: 700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
qemu-kvm: vhost_device_iotlb_miss: 700871,700871: Fail to update device iotlb
===

* A couple of patches here help to fix/handle these errors.

Thank you.
---
Prasad Pandit (2):
  vhost: fail device start if iotlb update fails
  vhost-user: add a request-reply lock

 hw/virtio/vhost-user.c | 74 ++
 hw/virtio/vhost.c  |  6 ++-
 include/hw/virtio/vhost-user.h |  3 ++
 3 files changed, 82 insertions(+), 1 deletion(-)

--
2.46.0
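
The idea behind the request-reply lock in patch 2/2 can be sketched outside
QEMU. This is a simplified illustration under stated assumptions, not the
patch itself: the "device" is modelled as a single slot that echoes the last
request, all demo_* names are hypothetical, and a plain pthread mutex with
explicit lock/unlock is used where the patch uses QEMU_LOCK_GUARD().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical channel: the device simply echoes the last request. */
struct demo_channel {
    pthread_mutex_t lock;   /* held for one request-reply cycle */
    uint32_t pending;       /* last request written to the device */
};

static void demo_write(struct demo_channel *ch, uint32_t req)
{
    ch->pending = req;
}

static uint32_t demo_read(struct demo_channel *ch)
{
    return ch->pending;
}

/* Send a request and read its reply as one atomic cycle. */
static uint32_t demo_request_reply(struct demo_channel *ch, uint32_t req)
{
    uint32_t reply;

    pthread_mutex_lock(&ch->lock);
    demo_write(ch, req);
    reply = demo_read(ch);
    pthread_mutex_unlock(&ch->lock);
    return reply;
}

struct demo_worker {
    struct demo_channel *ch;
    uint32_t id;
    unsigned mismatches;    /* replies that belonged to the other thread */
};

static void *demo_worker_main(void *opaque)
{
    struct demo_worker *w = opaque;
    int i;

    for (i = 0; i < 100000; i++) {
        if (demo_request_reply(w->ch, w->id) != w->id) {
            w->mismatches++;
        }
    }
    return NULL;
}

/* Run two threads against one channel; returns the number of mix-ups. */
static unsigned demo_run_two_threads(void)
{
    struct demo_channel ch;
    struct demo_worker w1, w2;
    pthread_t t1, t2;
    int rc;

    ch.pending = 0;
    rc = pthread_mutex_init(&ch.lock, NULL);
    assert(rc == 0);

    w1.ch = &ch; w1.id = 1; w1.mismatches = 0;
    w2.ch = &ch; w2.id = 2; w2.mismatches = 0;

    rc = pthread_create(&t1, NULL, demo_worker_main, &w1);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, demo_worker_main, &w2);
    assert(rc == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_mutex_destroy(&ch.lock);
    (void)rc;

    return w1.mismatches + w2.mismatches;
}
```

Because the lock covers both the write and the read, neither thread can
observe the other's reply. Without it, the two accesses would race in the
same way as the two call chains shown in the cover letter.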

[PATCH v2 2/2] vhost-user: add a request-reply lock

2024-08-28 Thread Prasad Pandit

From: Prasad Pandit 

QEMU threads use vhost_user_write/read calls to send
and receive request/reply messages from a vhost-user
device. When multiple threads communicate with the
same vhost-user device, they can receive each other's
messages, resulting in an erroneous state.

When fault_thread exits upon completion of Postcopy
migration, it sends a 'postcopy_end' message to the
vhost-user device. But sometimes the 'postcopy_end' message
is sent while the vhost device is being set up via
vhost_dev_start().

 Thread-1                            Thread-2

 vhost_dev_start                     postcopy_ram_incoming_cleanup
 vhost_device_iotlb_miss             postcopy_notify
 vhost_backend_update_device_iotlb   vhost_user_postcopy_notifier
 vhost_user_send_device_iotlb_msg    vhost_user_postcopy_end
 process_message_reply               process_message_reply
 vhost_user_read                     vhost_user_read
 vhost_user_read_header              vhost_user_read_header
 "Fail to update device iotlb"       "Failed to receive reply to postcopy_end"

This creates confusion when vhost-user device receives
'postcopy_end' message while it is trying to update IOTLB entries.

 vhost_user_read_header:
  700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
 vhost_device_iotlb_miss:
  700871,700871: Fail to update device iotlb
 vhost_user_postcopy_end:
  700871,700900: Failed to receive reply to postcopy_end
 vhost_user_read_header:
  700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.

Here fault thread seems to end the postcopy migration
while another thread is starting the vhost-user device.

Add a mutex lock to hold for one request-reply cycle
and avoid such race condition.

Fixes: 46343570c06e ("vhost+postcopy: Wire up POSTCOPY_END notify")
Suggested-by: Peter Xu 
Signed-off-by: Prasad Pandit 
---
 hw/virtio/vhost-user.c | 74 ++
 include/hw/virtio/vhost-user.h |  3 ++
 2 files changed, 77 insertions(+)

v2:
 - Place QEMU_LOCK_GUARD near the vhost_user_write() calls, holding
   the lock for longer fails some tests during rpmbuild(8).
 - rpmbuild(8) fails for some SRPMs, not all. RHEL-9 SRPM builds with
   this patch, whereas Fedora SRPM does not build.
 - The host OS also seems to affect rpmbuild(8). Some SRPMs build well
   on RHEL-9, but not on Fedora-40 machine.
 - koji builds successful with this patch
   https://koji.fedoraproject.org/koji/taskinfo?taskID=122254011
   https://koji.fedoraproject.org/koji/taskinfo?taskID=122252369

v1: Use QEMU_LOCK_GUARD(), rename lock variable
 - https://lore.kernel.org/qemu-devel/20240808095147.291626-3-ppan...@redhat.com/

v0:
 - https://lore.kernel.org/all/Zo_9OlX0pV0paFj7@x1n/
 - https://lore.kernel.org/all/20240720153808-mutt-send-email-...@kernel.org/

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 00561daa06..7b030ae2cd 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -24,6 +24,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/uuid.h"
 #include "qemu/sockets.h"
+#include "qemu/lockable.h"
 #include "sysemu/runstate.h"
 #include "sysemu/cryptodev.h"
 #include "migration/postcopy-ram.h"
@@ -446,6 +447,10 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
 .hdr.size = sizeof(msg.payload.log),
 };

+struct vhost_user *u = dev->opaque;
+struct VhostUserState *us = u->user;
+QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
 /* Send only once with first queue pair */
 if (dev->vq_index != 0) {
 return 0;
@@ -664,6 +669,7 @@ static int send_remove_regions(struct vhost_dev *dev,
bool reply_supported)
 {
 struct vhost_user *u = dev->opaque;
+struct VhostUserState *us = u->user;
 struct vhost_memory_region *shadow_reg;
 int i, fd, shadow_reg_idx, ret;
 ram_addr_t offset;
@@ -685,6 +691,8 @@ static int send_remove_regions(struct vhost_dev *dev,
 vhost_user_fill_msg_region(&region_buffer, shadow_reg, 0);
 msg->payload.mem_reg.region = region_buffer;

+QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
 ret = vhost_user_write(dev, msg, NULL, 0);
 if (ret < 0) {
 return ret;
@@ -718,6 +726,7 @@ static int send_add_regions(struct vhost_dev *dev,
 bool reply_supported, bool track_ramblocks)
 {
 struct vhost_user *u = dev->opaque;
+struct VhostUserState *us = u->user;
 int i, fd, ret, reg_idx, reg_fd_idx;
 struct vhost_memory_region *reg;
 MemoryRegion *mr;
@@ -746,6 +755,8 @@ static int send_add_regions(struct vhost_dev *dev,
 vhost_user_fill_msg_region(&region_buffer, reg, offset);
 msg->payload.mem_reg.region = region_buffer;

+QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
 ret = vhost_user_write(dev, msg, &fd, 1);
 if (ret < 0) {
 return ret;
@@ -893,6 +904,7 @@ stati

[PATCH v2 1/2] vhost: fail device start if iotlb update fails

2024-08-28 Thread Prasad Pandit

From: Prasad Pandit 

While starting a vhost device, updating iotlb entries
via 'vhost_device_iotlb_miss' may return an error.

  qemu-kvm: vhost_device_iotlb_miss:
700871,700871: Fail to update device iotlb

Fail device start when such an error occurs.

Signed-off-by: Prasad Pandit 
---
 hw/virtio/vhost.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

v2:
 - Needs review/ack

v1:
 - https://lore.kernel.org/qemu-devel/20240808095147.291626-2-ppan...@redhat.com/

v0:
 - https://lore.kernel.org/all/20240711131424.181615-3-ppan...@redhat.com/

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 06fc71746e..a70b7422b5 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -2151,7 +2151,11 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
  * vhost-kernel code requires for this.*/
 for (i = 0; i < hdev->nvqs; ++i) {
 struct vhost_virtqueue *vq = hdev->vqs + i;
-vhost_device_iotlb_miss(hdev, vq->used_phys, true);
+r = vhost_device_iotlb_miss(hdev, vq->used_phys, true);
+if (r) {
+VHOST_OPS_DEBUG(r, "vhost_device_iotlb_miss failed");
+goto fail_start;
+}
 }
 }
 vhost_start_config_intr(hdev);
--
2.46.0

Re: [PATCH 1/1] allow using a higher icount

2024-08-28 Thread Elisha Hollander

Although it gives `undefined symbol: qemu_plugin_scoreboard_free`. But
probably I messed something up...

On Tue, Aug 27, 2024, 14:59 Elisha Hollander 
wrote:

> Oh nice, I didn't know that
>
> On Tue, Aug 27, 2024, 12:39 Alex Bennée  wrote:
>
>> Elisha Hollander  writes:
>>
>> > Signed-off-by: Elisha Hollander 
>>
>> What is the use-case for this patch?
>>
>> If you are simply looking to slow the emulated system down please have a
>> look at:
>>
>>
>> https://qemu.readthedocs.io/en/master/about/emulation.html#limit-instructions-per-second
>>
>> which uses the plugin system to limit the run rate and sleep if its
>> running too fast. The longer term goal is to deprecate the icount clock
>> alignment feature from the core code and leave icount to just provide
>> the deterministic execution needed for record/replay and reverse
>> debugging.
>>
>>
>> > ---
>> >  accel/tcg/cpu-exec.c  | 4 +---
>> >  accel/tcg/icount-common.c | 4 ++--
>> >  2 files changed, 3 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
>> > index 8163295f34..4c2baf8ed4 100644
>> > --- a/accel/tcg/cpu-exec.c
>> > +++ b/accel/tcg/cpu-exec.c
>> > @@ -95,11 +95,10 @@ static void align_clocks(SyncClocks *sc, CPUState *cpu)
>> >  static void print_delay(const SyncClocks *sc)
>> >  {
>> >  static float threshold_delay;
>> > -static int64_t last_realtime_clock;
>> >  static int nb_prints;
>> >
>> >  if (icount_align_option &&
>> > -sc->realtime_clock - last_realtime_clock >= MAX_DELAY_PRINT_RATE &&
>> > +sc->diff_clk >= MAX_DELAY_PRINT_RATE &&
>> >  nb_prints < MAX_NB_PRINTS) {
>> >  if ((-sc->diff_clk / (float)10LL > threshold_delay) ||
>> >  (-sc->diff_clk / (float)10LL <
>> > @@ -109,7 +108,6 @@ static void print_delay(const SyncClocks *sc)
>> >  threshold_delay - 1,
>> >  threshold_delay);
>> >  nb_prints++;
>> > -last_realtime_clock = sc->realtime_clock;
>> >  }
>> >  }
>> >  }
>> > diff --git a/accel/tcg/icount-common.c b/accel/tcg/icount-common.c
>> > index 8d3d3a7e9d..f07f8baf4d 100644
>> > --- a/accel/tcg/icount-common.c
>> > +++ b/accel/tcg/icount-common.c
>> > @@ -46,8 +46,8 @@
>> >   * is TCG-specific, and does not need to be built for other accels.
>> >   */
>> >  static bool icount_sleep = true;
>> > -/* Arbitrarily pick 1MIPS as the minimum allowable speed.  */
>> > -#define MAX_ICOUNT_SHIFT 10
>> > +/* Arbitrarily pick the minimum allowable speed.  */
>> > +#define MAX_ICOUNT_SHIFT 30
>> >
>> >  /* Do not count executed instructions */
>> >  ICountMode use_icount = ICOUNT_DISABLED;
>>
>> --
>> Alex Bennée
>> Virtualisation Tech Lead @ Linaro
>>
>

Re: [PATCH RESEND v9 1/9] Require meson version 1.5.0

2024-08-28 Thread Alex Bennée

Manos Pitsidianakis  writes:

> From: Paolo Bonzini 
>
> This is needed for Rust support.

Just a note that b4 will fail to apply this as lore hasn't archived the
binary patch. However it applies fine manually.

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 1/1] allow using a higher icount

2024-08-28 Thread Alex Bennée

Elisha Hollander  writes:

> Although it gives `undefined symbol: qemu_plugin_scoreboard_free`. But
> probably I messed something up...

Are you using an older QEMU? We should trigger an API warning if they
are mismatched, but maybe that's not working.

>
> On Tue, Aug 27, 2024, 14:59 Elisha Hollander  wrote:
>
>  Oh nice, I didn't know that
>
>  On Tue, Aug 27, 2024, 12:39 Alex Bennée  wrote:
>
>  Elisha Hollander  writes:
>
>  > Signed-off-by: Elisha Hollander 
>
>  What is the use-case for this patch?
>
>  If you are simply looking to slow the emulated system down please have a
>  look at:
>
>
> https://qemu.readthedocs.io/en/master/about/emulation.html#limit-instructions-per-second
>
>  which uses the plugin system to limit the run rate and sleep if its
>  running too fast. The longer term goal is to deprecate the icount clock
>  alignment feature from the core code and leave icount to just provide
>  the deterministic execution needed for record/replay and reverse
>  debugging.
>
>  > ---
>  >  accel/tcg/cpu-exec.c  | 4 +---
>  >  accel/tcg/icount-common.c | 4 ++--
>  >  2 files changed, 3 insertions(+), 5 deletions(-)
>  >
>  > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
>  > index 8163295f34..4c2baf8ed4 100644
>  > --- a/accel/tcg/cpu-exec.c
>  > +++ b/accel/tcg/cpu-exec.c
>  > @@ -95,11 +95,10 @@ static void align_clocks(SyncClocks *sc, CPUState *cpu)
>  >  static void print_delay(const SyncClocks *sc)
>  >  {
>  >  static float threshold_delay;
>  > -static int64_t last_realtime_clock;
>  >  static int nb_prints;
>  >  
>  >  if (icount_align_option &&
>  > -sc->realtime_clock - last_realtime_clock >= MAX_DELAY_PRINT_RATE &&
>  > +sc->diff_clk >= MAX_DELAY_PRINT_RATE &&
>  >  nb_prints < MAX_NB_PRINTS) {
>  >  if ((-sc->diff_clk / (float)10LL > threshold_delay) ||
>  >  (-sc->diff_clk / (float)10LL <
>  > @@ -109,7 +108,6 @@ static void print_delay(const SyncClocks *sc)
>  >  threshold_delay - 1,
>  >  threshold_delay);
>  >  nb_prints++;
>  > -last_realtime_clock = sc->realtime_clock;
>  >  }
>  >  }
>  >  }
>  > diff --git a/accel/tcg/icount-common.c b/accel/tcg/icount-common.c
>  > index 8d3d3a7e9d..f07f8baf4d 100644
>  > --- a/accel/tcg/icount-common.c
>  > +++ b/accel/tcg/icount-common.c
>  > @@ -46,8 +46,8 @@
>  >   * is TCG-specific, and does not need to be built for other accels.
>  >   */
>  >  static bool icount_sleep = true;
>  > -/* Arbitrarily pick 1MIPS as the minimum allowable speed.  */
>  > -#define MAX_ICOUNT_SHIFT 10
>  > +/* Arbitrarily pick the minimum allowable speed.  */
>  > +#define MAX_ICOUNT_SHIFT 30
>  >  
>  >  /* Do not count executed instructions */
>  >  ICountMode use_icount = ICOUNT_DISABLED;
>
>  -- 
>  Alex Bennée
>  Virtualisation Tech Lead @ Linaro

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
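
For context on the MAX_ICOUNT_SHIFT change quoted above: assuming the
documented meaning of '-icount shift=N' (one instruction per 2^N ns of
virtual time), the effect of raising the limit from 10 to 30 can be
estimated as follows. The demo_* helpers are hypothetical and only do the
arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Virtual nanoseconds charged per instruction for a given shift. */
static uint64_t demo_ns_per_insn(unsigned shift)
{
    assert(shift < 63);
    return UINT64_C(1) << shift;
}

/* Instructions per virtual second, rounded down. */
static uint64_t demo_insns_per_second(unsigned shift)
{
    return UINT64_C(1000000000) / demo_ns_per_insn(shift);
}
```

A shift of 10 gives 1024 ns per instruction, which is where the "1MIPS"
wording in the removed comment comes from. A shift of 30 gives roughly one
instruction per second.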

Re: [PATCH v2 2/4] reset: Add RESET_TYPE_WAKEUP

2024-08-28 Thread Juraj Marcin

Hi Peter,

On Tue, Aug 20, 2024 at 1:56 PM Peter Maydell  wrote:
>
> On Tue, 20 Aug 2024 at 12:40, David Hildenbrand  wrote:
> >
> > On 14.08.24 14:32, Juraj Marcin wrote:
> > > > On Tue, Aug 13, 2024 at 6:37 PM Peter Maydell wrote:
> > >>
> > >> On Tue, 13 Aug 2024 at 16:39, Juraj Marcin  wrote:
> > >>>
> > >>> Some devices need to distinguish cold start reset from waking up from a
> > >>> suspended state. This patch adds new value to the enum, and updates the
> > >>> i386 wakeup method to use this new reset type.
> > >>>
> > >>> Signed-off-by: Juraj Marcin 
> > >>> Reviewed-by: David Hildenbrand 
> > >>> ---
> > >>>   docs/devel/reset.rst| 8 
> > >>>   hw/i386/pc.c| 2 +-
> > >>>   include/hw/resettable.h | 2 ++
> > >>>   3 files changed, 11 insertions(+), 1 deletion(-)
> > >>>
> > >>> diff --git a/docs/devel/reset.rst b/docs/devel/reset.rst
> > >>> index 9746a4e8a0..a7c9467313 100644
> > >>> --- a/docs/devel/reset.rst
> > >>> +++ b/docs/devel/reset.rst
> > >>> @@ -44,6 +44,14 @@ The Resettable interface handles reset types with an enum ``ResetType``:
> > >>> value on each cold reset, such as RNG seed information, and which they
> > >>> must not reinitialize on a snapshot-load reset.
> > >>>
> > >>> +``RESET_TYPE_WAKEUP``
> > >>> +  This type is called for a reset when the system is being woken-up from a
> > >>> +  suspended state using the ``qemu_system_wakeup()`` function. If the machine
> > >>> +  needs to reset its devices in its ``MachineClass::wakeup()`` method, this
> > >>> +  reset type should be used, so devices can differentiate system wake-up from
> > >>> +  other reset types. For example, a virtio-mem device must not unplug its
> > >>> +  memory during wake-up as that would clear the guest RAM.
> > >>> +
> > >>>   Devices which implement reset methods must treat any unknown ``ResetType``
> > >>>   as equivalent to ``RESET_TYPE_COLD``; this will reduce the amount of
> > >>>   existing code we need to change if we add more types in future.
> > >>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > >>> index ccb9731c91..49efd0a997 100644
> > >>> --- a/hw/i386/pc.c
> > >>> +++ b/hw/i386/pc.c
> > >>> @@ -1716,7 +1716,7 @@ static void pc_machine_reset(MachineState *machine, ResetType type)
> > >>>   static void pc_machine_wakeup(MachineState *machine)
> > >>>   {
> > >>>   cpu_synchronize_all_states();
> > >>> -pc_machine_reset(machine, RESET_TYPE_COLD);
> > >>> +pc_machine_reset(machine, RESET_TYPE_WAKEUP);
> > >>>   cpu_synchronize_all_post_reset();
> > >>>   }
> > >>
> > >> I'm happy (following discussion in the previous thread)
> > >> that 'wakeup' is the right reset event to be using here.
> > >> But looking at the existing code for qemu_system_wakeup()
> > >> something seems odd here. qemu_system_wakeup() calls
> > >> the MachineClass::wakeup method if it's set, and does
> > >> nothing if it's not. The PC implementation of that calls
> > >> pc_machine_reset(), which does a qemu_devices_reset(),
> > >> which does a complete three-phase reset of the system.
> > >> But if the machine doesn't implement wakeup then we
> > >> never reset the system at all.
> > >>
> > >> Shouldn't qemu_system_wakeup() do a qemu_devices_reset()
> > >> if there's no MachineClass::wakeup, in a similar way to
> > >> how qemu_system_reset() does a qemu_devices_reset()
> > >> if there's no MachineClass::reset method ? Having the
> > >> wakeup event be "sometimes this will do a RESET_TYPE_WAKEUP
> > >> but sometimes it won't" doesn't seem right to me...
> >
> > One thing one could consider would probably be to send a WARM reset to
> > all devices. The main issue here is that other devices will default to a
> > COLD device then, and that's precisely what the other machines that
> > implement suspend+resume do not want. And ...
> >
> > >
> > >  From my understanding that I have gathered from the code (but please,
> > > someone correct me if I am wrong), this is machine specific. Some
> > > machine types might not support suspend+wake-up at all. The support
> > > has to be explicitly advertised through qemu_register_wakeup_support()
> > > (for example, aarch64 with a generic virt machine type does not
> > > advertise support). Even if the machine type advertises
> > > suspend+wake-up support, it might not need to do anything machine
> > > specific. This is the case of pSeries PowerPC machine (sPAPR) that
> > > advertises support, but does not implement MachineClass::wakeup()
> > > method as nothing needs to change in the machine state. [1]
> > >
> > > So, if a restart during wake-up happens, it can be differentiated with
> > > the wake-up reset type, and if the machine type does not need to reset
> > > its devices during wake-up, there is no reset that needs to be
> > > differentiated.
> >
> > ... if the machine does not do any resets during suspend+wakeup, this
> > implies that there is not even a w

Re: [PULL 17/20] target/arm: Do memory type alignment check when translation disabled

2024-08-28 Thread Richard Henderson


On 8/28/24 17:22, Michael Tokarev wrote:

05.03.2024 16:52, Peter Maydell wrote:

From: Richard Henderson 

If translation is disabled, the default memory type is Device, which
requires alignment checking.  This is more optimally done early via
the MemOp given to the TCG memory operation.

Reviewed-by: Philippe Mathieu-Daudé 
Reported-by: Idan Horowitz 
Signed-off-by: Richard Henderson 
Message-id: 20240301204110.656742-6-richard.hender...@linaro.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1204
Signed-off-by: Richard Henderson 
Signed-off-by: Peter Maydell 


Hi!

Apparently this change also breaks picolibc testsuite (between
8.2 and 9.0, bisect points to this commit).

For example:

./qemu-system-arm \
   -m 1G \
   -chardev stdio,mux=on,id=stdio0 \
   -semihosting-config enable=on,chardev=stdio0,arg=program-name \
   -monitor none \
   -serial none \
   -machine none,accel=tcg \
   -cpu cortex-a8 \
   -device loader,file=/tmp/picolibc-1.8.6/arm-none-eabi/test/printf_scanf_thumb_v7_fp_softfp,cpu-num=0 \
   -nographic


Almost certainly a duplicate of #2326, fixed in master by 
4c2c0474693229c1f533239bb983495c5427784d.



r~

Re: [PATCH v2 2/2] vhost-user: add a request-reply lock

2024-08-28 Thread Michael S. Tsirkin

On Wed, Aug 28, 2024 at 03:39:14PM +0530, Prasad Pandit wrote:
> From: Prasad Pandit 
> 
> QEMU threads use vhost_user_write/read calls to send
> and receive request/reply messages from a vhost-user
> device. When multiple threads communicate with the
> same vhost-user device, they can receive each other's
> messages, resulting in an erroneous state.
> 
> When fault_thread exits upon completion of Postcopy
> migration, it sends a 'postcopy_end' message to the
> vhost-user device. But sometimes 'postcopy_end' message
> is sent while vhost device is being setup via
> vhost_dev_start().
> 
>  Thread-1                            Thread-2
>
>  vhost_dev_start                     postcopy_ram_incoming_cleanup
>  vhost_device_iotlb_miss             postcopy_notify
>  vhost_backend_update_device_iotlb   vhost_user_postcopy_notifier
>  vhost_user_send_device_iotlb_msg    vhost_user_postcopy_end
>  process_message_reply               process_message_reply
>  vhost_user_read                     vhost_user_read
>  vhost_user_read_header              vhost_user_read_header
>  "Fail to update device iotlb"       "Failed to receive reply to postcopy_end"
> 
> This creates confusion when vhost-user device receives
> 'postcopy_end' message while it is trying to update IOTLB entries.
> 
>  vhost_user_read_header:
>   700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
>  vhost_device_iotlb_miss:
>   700871,700871: Fail to update device iotlb
>  vhost_user_postcopy_end:
>   700871,700900: Failed to receive reply to postcopy_end
>  vhost_user_read_header:
>   700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
> 
> Here fault thread seems to end the postcopy migration
> while another thread is starting the vhost-user device.
> 
> Add a mutex lock to hold for one request-reply cycle
> and avoid such race condition.
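A minimal sketch of the idea of holding one lock across a whole request-reply cycle. The patch itself uses QEMU_LOCK_GUARD; this sketch swaps in a plain pthread mutex, and the Channel type and the channel_write/channel_read helpers are hypothetical stand-ins for the vhost-user socket, not real QEMU code.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical channel: one slot standing in for the vhost-user socket. */
typedef struct {
    pthread_mutex_t request_reply_lock;
    int pending;   /* last request written, not yet answered */
} Channel;

static void channel_init(Channel *ch)
{
    pthread_mutex_init(&ch->request_reply_lock, NULL);
    ch->pending = 0;
}

/* Toy transport: the "device" answers request N with N + 1000. */
static void channel_write(Channel *ch, int request)
{
    ch->pending = request;
}

static int channel_read(Channel *ch)
{
    return ch->pending + 1000;
}

/*
 * Hold the lock from the write until the matching read completes, so
 * another thread cannot slip its own request in between and consume
 * (or be handed) the wrong reply.
 */
static int request_reply(Channel *ch, int request)
{
    int reply;

    assert(ch != NULL);
    pthread_mutex_lock(&ch->request_reply_lock);
    channel_write(ch, request);
    reply = channel_read(ch);
    pthread_mutex_unlock(&ch->request_reply_lock);
    return reply;
}
```

Any thread that talks to the same channel would call request_reply() rather than the write and read helpers directly; locking only around the write would still leave the reply open to being read by the wrong thread.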
> 
> Fixes: 46343570c06e ("vhost+postcopy: Wire up POSTCOPY_END notify")
> Suggested-by: Peter Xu 
> Signed-off-by: Prasad Pandit 
> ---
>  hw/virtio/vhost-user.c | 74 ++
>  include/hw/virtio/vhost-user.h |  3 ++
>  2 files changed, 77 insertions(+)
> 
> v2:
>  - Place QEMU_LOCK_GUARD near the vhost_user_write() calls, holding
>the lock for longer fails some tests during rpmbuild(8).

What do you mean by "fails rpmbuild"? That QEMU with this
patch cannot be compiled?

>  - rpmbuild(8) fails for some SRPMs, not all. RHEL-9 SRPM builds with
>this patch, whereas Fedora SRPM does not build.
>  - The host OS also seems to affect rpmbuild(8). Some SRPMs build well
>on RHEL-9, but not on Fedora-40 machine.
>  - koji builds successful with this patch
>https://koji.fedoraproject.org/koji/taskinfo?taskID=122254011
>https://koji.fedoraproject.org/koji/taskinfo?taskID=122252369
> 
> v1: Use QEMU_LOCK_GUARD(), rename lock variable
>  - https://lore.kernel.org/qemu-devel/20240808095147.291626-3-ppan...@redhat.com/
> 
> v0:
>  - https://lore.kernel.org/all/Zo_9OlX0pV0paFj7@x1n/
>  - https://lore.kernel.org/all/20240720153808-mutt-send-email-...@kernel.org/
> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 00561daa06..7b030ae2cd 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -24,6 +24,7 @@
>  #include "qemu/main-loop.h"
>  #include "qemu/uuid.h"
>  #include "qemu/sockets.h"
> +#include "qemu/lockable.h"
>  #include "sysemu/runstate.h"
>  #include "sysemu/cryptodev.h"
>  #include "migration/postcopy-ram.h"
> @@ -446,6 +447,10 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
>  .hdr.size = sizeof(msg.payload.log),
>  };
> 
> +struct vhost_user *u = dev->opaque;
> +struct VhostUserState *us = u->user;
> +QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>  /* Send only once with first queue pair */
>  if (dev->vq_index != 0) {
>  return 0;
> @@ -664,6 +669,7 @@ static int send_remove_regions(struct vhost_dev *dev,
> bool reply_supported)
>  {
>  struct vhost_user *u = dev->opaque;
> +struct VhostUserState *us = u->user;
>  struct vhost_memory_region *shadow_reg;
>  int i, fd, shadow_reg_idx, ret;
>  ram_addr_t offset;
> @@ -685,6 +691,8 @@ static int send_remove_regions(struct vhost_dev *dev,
>  vhost_user_fill_msg_region(&region_buffer, shadow_reg, 0);
>  msg->payload.mem_reg.region = region_buffer;
> 
> +QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>  ret = vhost_user_write(dev, msg, NULL, 0);
>  if (ret < 0) {
>  return ret;
> @@ -718,6 +726,7 @@ static int send_add_regions(struct vhost_dev *dev,
>  bool reply_supported, bool track_ramblocks)
>  {
>  struct vhost_user *u = dev->opaque;
> +struct VhostUserState *us = u->user;
>  int i, fd, ret, reg_idx, reg_fd_idx;
>  struct vhost_memory_region *reg;
>  MemoryRegion *mr;
> @@ -746,6 +755,8

[PATCH v2 1/3] hw/boards: Add hvf_get_physical_address_range to MachineClass

2024-08-28 Thread Danny Canter

This addition will be necessary for some HVF related work to follow.
For HVF on ARM there exists a set of APIs in macOS 13 to be able to
adjust the IPA size for a given VM. This is useful as by default HVF
uses 36 bits as the IPA size, so to support guests with > 64GB of RAM
we'll need to reach for this.

To have all the info necessary to carry this out however, we need some
plumbing to be able to grab the memory map and compute the highest GPA
prior to creating the VM. This is almost exactly like what kvm_type is
used for on ARM today, and is also what this will be used for. We will
compute the highest GPA and find what IPA size we'd need to satisfy this,
and if it's valid (macOS today caps at 40b) we'll set this to be the IPA
size in coming patches. This new method is only needed (today at least)
on ARM, and obviously only for HVF/macOS, so admittedly it is much less
generic than kvm_type today, but it seemed a somewhat sane way to get
the information we need from the memmap at VM creation time.

Signed-off-by: Danny Canter 
---
 hw/arm/virt.c   | 9 -
 hw/i386/x86.c   | 2 ++
 include/hw/boards.h | 5 +
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 687fe0bb8b..62ee5f849b 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2107,7 +2107,8 @@ static void machvirt_init(MachineState *machine)
 
 /*
  * In accelerated mode, the memory map is computed earlier in kvm_type()
- * to create a VM with the right number of IPA bits.
+ * for Linux, or hvf_get_physical_address_range() for macOS to create a
+ * VM with the right number of IPA bits.
  */
 if (!vms->memmap) {
 Object *cpuobj;
@@ -3027,6 +3028,11 @@ static int virt_kvm_type(MachineState *ms, const char *type_str)
 return fixed_ipa ? 0 : requested_pa_size;
 }
 
+static int virt_hvf_get_physical_address_range(MachineState *ms)
+{
+return 0;
+}
+
 static void virt_machine_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
@@ -3086,6 +3092,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
 mc->valid_cpu_types = valid_cpu_types;
 mc->get_default_cpu_node_id = virt_get_default_cpu_node_id;
 mc->kvm_type = virt_kvm_type;
+mc->hvf_get_physical_address_range = virt_hvf_get_physical_address_range;
 assert(!mc->get_hotplug_handler);
 mc->get_hotplug_handler = virt_machine_get_hotplug_handler;
 hc->pre_plug = virt_machine_device_pre_plug_cb;
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 01fc5e6562..fa7a0f6b98 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -382,6 +382,8 @@ static void x86_machine_class_init(ObjectClass *oc, void *data)
 mc->get_default_cpu_node_id = x86_get_default_cpu_node_id;
 mc->possible_cpu_arch_ids = x86_possible_cpu_arch_ids;
 mc->kvm_type = x86_kvm_type;
+/* Not needed for x86 */
+mc->hvf_get_physical_address_range = NULL;
 x86mc->save_tsc_khz = true;
 x86mc->fwcfg_dma_enabled = true;
 nc->nmi_monitor_handler = x86_nmi;
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 48ff6d8b93..bfc7cc7f90 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -215,6 +215,10 @@ typedef struct {
  *Return the type of KVM corresponding to the kvm-type string option or
  *computed based on other criteria such as the host kernel capabilities.
  *kvm-type may be NULL if it is not needed.
+ * @hvf_get_physical_address_range:
+ *Returns the physical address range in bits to use for the HVF virtual
+ *machine based on the current boards memory map. This may be NULL if it
+ *is not needed.
  * @numa_mem_supported:
  *true if '--numa node.mem' option is supported and false otherwise
  * @hotplug_allowed:
@@ -256,6 +260,7 @@ struct MachineClass {
 void (*reset)(MachineState *state, ShutdownCause reason);
 void (*wakeup)(MachineState *state);
 int (*kvm_type)(MachineState *machine, const char *arg);
+int (*hvf_get_physical_address_range)(MachineState *machine);
 
 BlockInterfaceType block_default_type;
 int units_per_default_bus;
-- 
2.39.5 (Apple Git-154)

[PATCH v2 3/3] hvf: arm: Implement and use hvf_get_physical_address_range

2024-08-28 Thread Danny Canter

This patch's main focus is to use the previously added
hvf_get_physical_address_range to inform VM creation
about the IPA size we need for the VM, so we can extend
the default 36b IPA size and support VMs with 64+GB of
RAM. This is done by freezing the memory map, computing
the highest GPA and then (depending on if the platform
supports an IPA size that large) telling the kernel to
use a size at least that large for the VM. In pursuit of
this, a couple of things related to how we handle the physical
address range we expose to guests were altered. For context,
here is what we were doing:

Today, to get the IPA size we read id_aa64mmfr0_el1's
PARange field from a newly made vcpu. Unfortunately, HVF just
returns the host's PARange directly for the initial value and
not the IPA size that will actually back the VM, so it seems we
currently believe we have much more address space than we actually do.

Starting in macOS 13.0 some APIs were introduced to be able to
query the maximum IPA size the kernel supports, and to set the IPA
size for a given VM. However, this still has a couple of issues
on < macOS 15. Up until macOS 15 (and if the hardware supported
it) the max IPA size was 39 bits which is not a valid PARange
value, so we can't clamp down what we advertise in the vcpu's
id_aa64mmfr0_el1 to our IPA size. Starting in macOS 15 however,
the maximum IPA size is 40 bits (if it's supported in the hardware
as well) which is also a valid PARange value so we can set our IPA
size to the maximum as well as clamp down the PARange we advertise
to the guest. This allows VMs with 64+ GB of RAM and should fix the
oddness of the PARange situation as well.
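The arithmetic behind this can be sketched as follows. This is an illustrative, self-contained example and not the code from this patch: bits_needed, choose_ipa_bits and is_valid_parange_bits are hypothetical helper names, and the list of sizes reflects the PARange encodings for 32 through 52 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Bits needed to address highest_gpa: index of the top set bit, plus one. */
static int bits_needed(uint64_t highest_gpa)
{
    int bits = 0;

    while (highest_gpa) {
        bits++;
        highest_gpa >>= 1;
    }
    return bits;
}

/*
 * Pick an IPA size: the default if it suffices, the maximum if the
 * default is too small, or -1 if even the maximum cannot cover the map.
 */
static int choose_ipa_bits(uint64_t highest_gpa, int default_bits, int max_bits)
{
    int needed = bits_needed(highest_gpa);

    assert(default_bits <= max_bits);
    if (needed <= default_bits) {
        return default_bits;
    }
    if (needed <= max_bits) {
        return max_bits;
    }
    return -1;
}

/* Sizes that have a PARange encoding; note that 39 is not among them. */
static int is_valid_parange_bits(int bits)
{
    switch (bits) {
    case 32: case 36: case 40: case 42: case 44: case 48: case 52:
        return 1;
    default:
        return 0;
    }
}
```

A 36-bit space covers addresses below 2^36 bytes (64 GiB), so any memory map whose highest address reaches 2^36 needs the next valid size up, which is 40 bits.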

Signed-off-by: Danny Canter 
---
 accel/hvf/hvf-accel-ops.c | 12 -
 hw/arm/virt.c | 31 +-
 target/arm/hvf/hvf.c  | 56 ++-
 target/arm/hvf_arm.h  | 19 +
 target/arm/internals.h| 19 +
 target/arm/ptw.c  | 15 +++
 6 files changed, 149 insertions(+), 3 deletions(-)

diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index dbebf209f4..d60874d3e6 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -53,6 +53,7 @@
 #include "exec/address-spaces.h"
 #include "exec/exec-all.h"
 #include "gdbstub/enums.h"
+#include "hw/boards.h"
 #include "sysemu/cpus.h"
 #include "sysemu/hvf.h"
 #include "sysemu/hvf_int.h"
@@ -319,8 +320,17 @@ static int hvf_accel_init(MachineState *ms)
 int x;
 hv_return_t ret;
 HVFState *s;
+int pa_range = 36;
+MachineClass *mc = MACHINE_GET_CLASS(ms);
 
-ret = hvf_arch_vm_create(ms, 0);
+if (mc->hvf_get_physical_address_range) {
+pa_range = mc->hvf_get_physical_address_range(ms);
+if (pa_range < 0) {
+return -EINVAL;
+}
+}
+
+ret = hvf_arch_vm_create(ms, (uint32_t)pa_range);
 assert_hvf_ok(ret);
 
 s = g_new0(HVFState, 1);
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 62ee5f849b..b39c7924a0 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -66,6 +66,7 @@
 #include "hw/intc/arm_gicv3_its_common.h"
 #include "hw/irq.h"
 #include "kvm_arm.h"
+#include "hvf_arm.h"
 #include "hw/firmware/smbios.h"
 #include "qapi/visitor.h"
 #include "qapi/qapi-visit-common.h"
@@ -3030,7 +3031,35 @@ static int virt_kvm_type(MachineState *ms, const char *type_str)
 
 static int virt_hvf_get_physical_address_range(MachineState *ms)
 {
-return 0;
+VirtMachineState *vms = VIRT_MACHINE(ms);
+
+int default_ipa_size = hvf_arm_get_default_ipa_bit_size();
+int max_ipa_size = hvf_arm_get_max_ipa_bit_size();
+
+/* We freeze the memory map to compute the highest gpa */
+virt_set_memmap(vms, max_ipa_size);
+
+int requested_ipa_size = 64 - clz64(vms->highest_gpa);
+
+/*
+ * If we're <= the default IPA size just use the default.
+ * If we're above the default but below the maximum, round up to
+ * the maximum. hvf_arm_get_max_ipa_bit_size() conveniently only
+ * returns values that are valid ARM PARange values.
+ */
+if (requested_ipa_size <= default_ipa_size) {
+requested_ipa_size = default_ipa_size;
+} else if (requested_ipa_size <= max_ipa_size) {
+requested_ipa_size = max_ipa_size;
+} else {
+error_report("-m and ,maxmem option values "
+ "require an IPA range (%d bits) larger than "
+ "the one supported by the host (%d bits)",
+ requested_ipa_size, max_ipa_size);
+return -1;
+}
+
+return requested_ipa_size;
 }
 
 static void virt_machine_class_init(ObjectClass *oc, void *data)
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 19964d241e..6cea483d42 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -22,6 +22,7 @@
 #include 
 
 #include "exec/address-spaces.h"
+#include "hw/boards.h"
 #include "hw/irq.h"
 #include "qemu/main-loop.h"
 #include "sysemu/cpus.h"
@@ -297,6 +298

[PATCH v2 2/3] hvf: Split up hv_vm_create logic per arch

2024-08-28 Thread Danny Canter

This is preliminary work to split up hv_vm_create
logic per platform so we can support creating VMs
with > 64GB of RAM on Apple Silicon machines. This
is done via ARM HVF's hv_vm_config_create() (and
other APIs that modify this config that will be
coming in future patches). This should have no
behavioral difference at all as hv_vm_config_create()
just assigns the same default values as if you just
passed NULL to the function.

Signed-off-by: Danny Canter 
---
 accel/hvf/hvf-accel-ops.c | 6 +-
 include/sysemu/hvf_int.h  | 1 +
 target/arm/hvf/hvf.c  | 9 +
 target/i386/hvf/hvf.c | 5 +
 4 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index ac08cfb9f3..dbebf209f4 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -61,10 +61,6 @@
 
 HVFState *hvf_state;
 
-#ifdef __aarch64__
-#define HV_VM_DEFAULT NULL
-#endif
-
 /* Memory slots */
 
 hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
@@ -324,7 +320,7 @@ static int hvf_accel_init(MachineState *ms)
 hv_return_t ret;
 HVFState *s;
 
-ret = hv_vm_create(HV_VM_DEFAULT);
+ret = hvf_arch_vm_create(ms, 0);
 assert_hvf_ok(ret);
 
 s = g_new0(HVFState, 1);
diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
index 5b28d17ba1..42ae18433f 100644
--- a/include/sysemu/hvf_int.h
+++ b/include/sysemu/hvf_int.h
@@ -65,6 +65,7 @@ void assert_hvf_ok_impl(hv_return_t ret, const char *file, unsigned int line,
 #define assert_hvf_ok(EX) assert_hvf_ok_impl((EX), __FILE__, __LINE__, #EX)
 const char *hvf_return_string(hv_return_t ret);
 int hvf_arch_init(void);
+hv_return_t hvf_arch_vm_create(MachineState *ms, uint32_t pa_range);
 int hvf_arch_init_vcpu(CPUState *cpu);
 void hvf_arch_vcpu_destroy(CPUState *cpu);
 int hvf_vcpu_exec(CPUState *);
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index ace83671b5..19964d241e 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -929,6 +929,15 @@ void hvf_arch_vcpu_destroy(CPUState *cpu)
 {
 }
 
+hv_return_t hvf_arch_vm_create(MachineState *ms, uint32_t pa_range)
+{
+hv_vm_config_t config = hv_vm_config_create();
+hv_return_t ret = hv_vm_create(config);
+os_release(config);
+
+return ret;
+}
+
 int hvf_arch_init_vcpu(CPUState *cpu)
 {
 ARMCPU *arm_cpu = ARM_CPU(cpu);
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index c9c64e2978..68dc5d9cf7 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -223,6 +223,11 @@ int hvf_arch_init(void)
 return 0;
 }
 
+hv_return_t hvf_arch_vm_create(MachineState *ms, uint32_t pa_range)
+{
+return hv_vm_create(HV_VM_DEFAULT);
+}
+
 int hvf_arch_init_vcpu(CPUState *cpu)
 {
 X86CPU *x86cpu = X86_CPU(cpu);
-- 
2.39.5 (Apple Git-154)

[PATCH v2 0/3] hvf: arm: Support creating VMs with 64+GB of RAM on macOS 15+

2024-08-28 Thread Danny Canter

This patchset's focus is on lighting up the ability to create VMs with 64+GB
of RAM through using some new APIs introduced in macOS 13. Due to the IPA sizes
supported in macOS, the first version we can properly support this requirement
is macOS 15 as (if the hardware supports it also) the kernel adds support for a
40b IPA size, which is the first supported ARM PARange value after 36b, so we
can advertise this to the guest properly as well in id_aa64mmfr0_el1.

Today if you asked for a > 64GB VM you'd be met with a pretty unwieldy
HV_BAD_ARGUMENT. On machines without 40b IPA support this patchset also
improves this, and the message mirrors the kvm_type error you'd get on ARM:

"qemu-system-aarch64: -accel hvf: Addressing limited to 36 bits, but memory
exceeds it by 18253611008 bytes"

Changes from V1 to V2 (Thanks Peter for review!):

- Added a new function pointer to MachineClass to be able to freeze the memory
map and compute the highest guest physical address. We use this to inform VM
creation on what IPA size we should ask the kernel for. This is very similar to
what ARM's kvm_type() does.

- Fixed redundant loop in `round_down_to_parange_bit_size`

- Move the splitting up of hv_vm_create logic per platform to a separate patch.
This is mostly for readability.

Danny Canter (3):
  hw/boards: Add hvf_get_physical_address_range to MachineClass
  hvf: Split up hv_vm_create logic per arch
  hvf: arm: Allow creating VMs with 64+GB of RAM on macOS 15+

 accel/hvf/hvf-accel-ops.c | 16 +++---
 hw/arm/virt.c | 42 -
 hw/i386/x86.c |  2 ++
 include/hw/boards.h   |  5 +++
 include/sysemu/hvf_int.h  |  1 +
 target/arm/hvf/hvf.c  | 66 +++
 target/arm/hvf_arm.h  |  3 ++
 target/arm/internals.h| 19 +++
 target/arm/ptw.c  | 15 +
 target/i386/hvf/hvf.c |  5 +++
 10 files changed, 168 insertions(+), 6 deletions(-)

-- 
2.39.5 (Apple Git-154)

Re: [PATCH v5 1/2] kvm: replace fprintf with error_report()/printf() in kvm_init()

2024-08-28 Thread Markus Armbruster

Ani Sinha  writes:

> error_report() is more appropriate for error situations. Replace fprintf with
> error_report() and error_printf() as appropriate. Cosmetic. No functional
> change.

Uh, I missed this last time around: the change is more than just
cosmetics!  The error messages change, e.g. from

$ qemu-system-x86_64 -nodefaults -S -display none --accel kvm
Could not access KVM kernel module: Permission denied
qemu-system-x86_64: --accel kvm: failed to initialize kvm: Permission denied

to

$ qemu-system-x86_64 -nodefaults -S -display none --accel kvm
qemu-system-x86_64: --accel kvm: Could not access KVM kernel module: Permission denied
qemu-system-x86_64: --accel kvm: failed to initialize kvm: Permission denied

Note: the second message is from kvm_init()'s caller.  Reporting the
same error twice is wrong, but not this patch's problem.

Moreover, the patch tweaks an error message at [*].

Suggest something like

  Replace fprintf() with error_report() and error_printf() where
  appropriate.  Error messages improve, e.g. from

  Could not access KVM kernel module: Permission denied

  to

  qemu-system-x86_64: --accel kvm: Could not access KVM kernel module: Permission denied
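To illustrate where the extra prefix comes from, here is a minimal sketch of a reporter that prepends the program name and the option being processed. format_error is a hypothetical helper written for this example; it is not QEMU's error_report() and makes no claim about how that function is implemented.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not QEMU's error_report(): format
 * "prog: location: msg" into buf. location may be NULL when no
 * command-line option is being processed. Returns snprintf()'s result.
 */
static int format_error(char *buf, size_t len, const char *prog,
                        const char *location, const char *msg)
{
    assert(buf != NULL && prog != NULL && msg != NULL);
    if (location) {
        return snprintf(buf, len, "%s: %s: %s", prog, location, msg);
    }
    return snprintf(buf, len, "%s: %s", prog, msg);
}
```

A plain fprintf(stderr, ...) emits only the message text, which is why switching reporters changes what the user sees even when the message string itself is untouched.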

> CC: qemu-triv...@nongnu.org
> CC: zhao1@intel.com
> CC: arm...@redhat.com
> Reviewed-by: Zhao Liu 
> Signed-off-by: Ani Sinha 
> ---
>  accel/kvm/kvm-all.c | 40 ++--
>  1 file changed, 18 insertions(+), 22 deletions(-)
>
> changelog:
> v2: fix a bug.
> v3: replace one instance of error_report() with error_printf(). added tags.
> v4: changes suggested by Markus.
> v5: more changes from Markus's comments on v4.
>
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 75d11a07b2..fcc157f0e6 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -2427,7 +2427,7 @@ static int kvm_init(MachineState *ms)
>  QLIST_INIT(&s->kvm_parked_vcpus);
>  s->fd = qemu_open_old(s->device ?: "/dev/kvm", O_RDWR);
>  if (s->fd == -1) {
> -fprintf(stderr, "Could not access KVM kernel module: %m\n");
> +error_report("Could not access KVM kernel module: %m");
>  ret = -errno;
>  goto err;
>  }
> @@ -2437,13 +2437,13 @@ static int kvm_init(MachineState *ms)
>  if (ret >= 0) {
>  ret = -EINVAL;
>  }
> -fprintf(stderr, "kvm version too old\n");
> +error_report("kvm version too old");
>  goto err;
>  }
>  
>  if (ret > KVM_API_VERSION) {
>  ret = -EINVAL;
> -fprintf(stderr, "kvm version not supported\n");
> +error_report("kvm version not supported");
>  goto err;
>  }
>  
> @@ -2488,26 +2488,22 @@ static int kvm_init(MachineState *ms)
>  } while (ret == -EINTR);
>  
>  if (ret < 0) {
> -fprintf(stderr, "ioctl(KVM_CREATE_VM) failed: %d %s\n", -ret,
> -strerror(-ret));
> +error_report("ioctl(KVM_CREATE_VM) failed: %s", strerror(-ret));

[*] This is where you change an error message.

>  
>  #ifdef TARGET_S390X
>  if (ret == -EINVAL) {
> -fprintf(stderr,
> -"Host kernel setup problem detected. Please verify:\n");
> -fprintf(stderr, "- for kernels supporting the switch_amode or"
> -" user_mode parameters, whether\n");
> -fprintf(stderr,
> -"  user space is running in primary address space\n");
> -fprintf(stderr,
> -"- for kernels supporting the vm.allocate_pgste sysctl, "
> -"whether it is enabled\n");
> +error_printf("Host kernel setup problem detected."
> + " Please verify:\n");
> +error_printf("- for kernels supporting the"
> +" switch_amode or user_mode parameters, whether");
> +error_printf(" user space is running in primary address space\n");
> +error_printf("- for kernels supporting the vm.allocate_pgste"
> + " sysctl, whether it is enabled\n");
>  }
>  #elif defined(TARGET_PPC)
>  if (ret == -EINVAL) {
> -fprintf(stderr,
> -"PPC KVM module is not loaded. Try modprobe kvm_%s.\n",
> -(type == 2) ? "pr" : "hv");
> +error_printf("PPC KVM module is not loaded. Try modprobe kvm_%s.\n",
> + (type == 2) ? "pr" : "hv");
>  }
>  #endif
>  goto err;
> @@ -2526,9 +2522,9 @@ static int kvm_init(MachineState *ms)
>  nc->name, nc->num, soft_vcpus_limit);
>  
>  if (nc->num > hard_vcpus_limit) {
> -fprintf(stderr, "Number of %s cpus requested (%d) exceeds "
> -"the maximum cpus supported by KVM (%d)\n",
> -nc->name, nc->num, hard_vcpus_limit);
> +error_report("Number of %s cpus request

Re: [PATCH RESEND v9 2/9] build-sys: Add rust feature option

2024-08-28 Thread Alex Bennée

Manos Pitsidianakis  writes:

> Add rust feature in meson.build, configure, to prepare for adding Rust
> code in the followup commits.
>
> Signed-off-by: Manos Pitsidianakis 
> ---
>  MAINTAINERS   |  5 +
>  meson.build   | 25 -
>  Kconfig   |  1 +
>  Kconfig.host  |  3 +++
>  meson_options.txt |  3 +++
>  rust/Kconfig  |  0
>  scripts/meson-buildoptions.sh |  3 +++
>  7 files changed, 39 insertions(+), 1 deletion(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3584d6a6c6da9a3210150534d640d29ddf329dce..0bc8e515daf7e63320620b52b42a799b99dbe035 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -4243,6 +4243,11 @@ F: docs/sphinx/
>  F: docs/_templates/
>  F: docs/devel/docs.rst
>  
> +Rust build system integration
> +M: Manos Pitsidianakis 
> +S: Maintained
> +F: rust/Kconfig
> +
>  Miscellaneous
>  -
>  Performance Tools and Tests
> diff --git a/meson.build b/meson.build
> index 7eb4b8a41c0a667cacf693cfa2764f326ba72b1f..67eb4eda649d5f0566de2b75466b5a9d9ca87ab4 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -70,6 +70,22 @@ if host_os == 'darwin' and \
>all_languages += ['objc']
>objc = meson.get_compiler('objc')
>  endif
> +if get_option('rust').enabled() and meson.version().version_compare('<1.0.0')
> +  error('Rust support requires Meson version >=1.0.0')
> +endif

Isn't this test obsolete as patch 1 ensures we have at least 1.5.0 anyway?

> +have_rust = false
> +if not get_option('rust').disabled() and add_languages('rust', required: get_option('rust'), native: false)
> +  rustc = meson.get_compiler('rust')
> +  have_rust = true
> +  if rustc.version().version_compare('<1.80.0')
> +if get_option('rust').enabled()
> +  error('rustc version ' + rustc.version() + ' is unsupported: Please upgrade to at least 1.80.0')
> +else
> +  warning('rustc version ' + rustc.version() + ' is unsupported: Disabling Rust compilation. Please upgrade to at least 1.80.0 to use Rust.')
> +  have_rust = false
> +endif
> +  endif
> +endif
>  
>  dtrace = not_found
>  stap = not_found
> @@ -2131,6 +2147,7 @@ endif
>  
>  config_host_data = configuration_data()
>  
> +config_host_data.set('CONFIG_HAVE_RUST', have_rust)
>  audio_drivers_selected = []
>  if have_system
>audio_drivers_available = {
> @@ -3076,7 +3093,8 @@ host_kconfig = \
>(host_os == 'linux' ? ['CONFIG_LINUX=y'] : []) + \
>(multiprocess_allowed ? ['CONFIG_MULTIPROCESS_ALLOWED=y'] : []) + \
>(vfio_user_server_allowed ? ['CONFIG_VFIO_USER_SERVER_ALLOWED=y'] : []) + \
> -  (hv_balloon ? ['CONFIG_HV_BALLOON_POSSIBLE=y'] : [])
> +  (hv_balloon ? ['CONFIG_HV_BALLOON_POSSIBLE=y'] : []) + \
> +  (have_rust ? ['CONFIG_HAVE_RUST=y'] : [])
>  
>  ignored = [ 'TARGET_XML_FILES', 'TARGET_ABI_DIR', 'TARGET_ARCH' ]
>  
> @@ -4287,6 +4305,11 @@ if 'objc' in all_languages
>  else
>summary_info += {'Objective-C compiler': false}
>  endif
> +summary_info += {'Rust support':  have_rust}
> +if have_rust
> +  summary_info += {'rustc version':  rustc.version()}
> +  summary_info += {'rustc':  ' '.join(rustc.cmd_array())}
> +endif
>  option_cflags = (get_option('debug') ? ['-g'] : [])
>  if get_option('optimization') != 'plain'
>option_cflags += ['-O' + get_option('optimization')]
> diff --git a/Kconfig b/Kconfig
> index fb6a24a2de8c3ff11d4ee432d65ad000ba9d6c4d..63ca7f46df788144864b26ef5a64b29ad6547435 100644
> --- a/Kconfig
> +++ b/Kconfig
> @@ -4,3 +4,4 @@ source accel/Kconfig
>  source target/Kconfig
>  source hw/Kconfig
>  source semihosting/Kconfig
> +source rust/Kconfig

I was wondering if we should call out the directory structure in
docs/devel but we don't currently. docs/devel/build-system.rst is
probably the wrong place for it so maybe we need a new section for code
layout. Anyway not a problem for this patch.

> diff --git a/Kconfig.host b/Kconfig.host
> index 17f405004b3bc765890688304322a1937ca8c01c..4ade7899d67a5ed91928f8ee1e287f5ba3331949 100644
> --- a/Kconfig.host
> +++ b/Kconfig.host
> @@ -52,3 +52,6 @@ config VFIO_USER_SERVER_ALLOWED
>  
>  config HV_BALLOON_POSSIBLE
>  bool
> +
> +config HAVE_RUST
> +bool

Not this patch's fault, but the top of this Kconfig states:

  # These are "proxy" symbols used to pass config-host.mak values
  # down to Kconfig.  See also kconfig_external_symbols in
  # meson.build: these two need to be kept in sync.

but that was updated in 0a18911074 (meson: cleanup Kconfig.host
handling) and I can see host_kconfig has been updated. It would be nice
to put a patch in before just cleaning up that comment.


> diff --git a/meson_options.txt b/meson_options.txt
> index 0269fa0f16ed6b6f734fcefa2cfa94aa029fa837..fa94a5ce97bb14ab108e21cccb651923ac6a58f8 100644
> --- a/meson_options.txt
> +++ b/meson_options.txt
> @@ -371,3 +371,6 @@ option('hexagon_idef_parser', type : 'boolean', value : true,

Re: [PULL 17/20] target/arm: Do memory type alignment check when translation disabled

2024-08-28 Thread Michael Tokarev


28.08.2024 14:07, Richard Henderson wrote:

On 8/28/24 17:22, Michael Tokarev wrote:

05.03.2024 16:52, Peter Maydell wrote:

From: Richard Henderson 

If translation is disabled, the default memory type is Device, which
requires alignment checking.  This is more optimally done early via
the MemOp given to the TCG memory operation.

Reviewed-by: Philippe Mathieu-Daudé 
Reported-by: Idan Horowitz 
Signed-off-by: Richard Henderson 
Message-id: 20240301204110.656742-6-richard.hender...@linaro.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1204
Signed-off-by: Richard Henderson 
Signed-off-by: Peter Maydell 


Hi!

Apparently this change also breaks picolibc testsuite (between
8.2 and 9.0, bisect points to this commit).

For example:

./qemu-system-arm \
   -m 1G \
   -chardev stdio,mux=on,id=stdio0 \
   -semihosting-config enable=on,chardev=stdio0,arg=program-name \
   -monitor none \
   -serial none \
   -machine none,accel=tcg \
   -cpu cortex-a8 \
   -device loader,file=/tmp/picolibc-1.8.6/arm-none-eabi/test/printf_scanf_thumb_v7_fp_softfp,cpu-num=0 \
   -nographic


Almost certainly a duplicate of #2326, fixed in master by 
4c2c0474693229c1f533239bb983495c5427784d.


Hi Richard!

You can read my email to the end, where I mentioned that this problem
is NOT fixed in current master and by this commit in particular.

Thanks,

/mjt

Re: [PATCH v10 02/21] linux-user/riscv: set priv for qemu-user and defaults for *envcfg

2024-08-28 Thread Richard Henderson


On 8/28/24 10:16, Deepak Gupta wrote:

This should be handled by a CPU reset, which is still called for linux
user mode.


Is it the right place for setting priv to PRV_U,
or do you want me to place it elsewhere?


Sure
for reset values of *envcfg, I can rely on `riscv_cpu_reset_hold`


Doing this in reset_hold seems correct to me.

Compare target/arm/cpu.c, arm_cpu_reset_hold:


if (arm_feature(env, ARM_FEATURE_AARCH64)) {
/* 64 bit CPUs always start in 64 bit mode */
env->aarch64 = true;
#if defined(CONFIG_USER_ONLY)
env->pstate = PSTATE_MODE_EL0t;
/* Userspace expects access to DC ZVA, CTL_EL0 and the cache ops */
env->cp15.sctlr_el[1] |= SCTLR_UCT | SCTLR_UCI | SCTLR_DZE;
/* Enable all PAC keys.  */
env->cp15.sctlr_el[1] |= (SCTLR_EnIA | SCTLR_EnIB |
  SCTLR_EnDA | SCTLR_EnDB);

...

That assignment to pstate is equivalent to "priv = PRV_U", and sctlr_el[] fills roughly 
the same role as [ms]envcfg.



r~

Re: [PATCH 2/2] hw/cxl/cxl_event: Fix interrupt triggering for dynamic capacity events grouped via More flag

2024-08-28 Thread Jonathan Cameron via

On Tue, 27 Aug 2024 09:40:05 -0700
nifan@gmail.com wrote:

> From: Fan Ni 
> 
> When inserting multiple dynamic capacity event records grouped via More flag,
> we should only trigger interrupt after the last record is inserted into the
> event log. Achieving the goal by letting cxl_event_insert return true only
> for the insertion of the last dynamic capacity event record in the sequence.

I'm not sure this one is accurate.  We might well have a slow
system provisioning capacity one extent at a time (and interrupting).

The event buffer might also not be large enough to hold all records so
the device might 'wait' before figuring out the next extent for there
to be somewhere to put the record.

Overall I think we can interrupt on each one and it should 'work',
as should interrupting only once there are lots of them, or on
every nth record.

Interrupt only fires on a 0 to >= 1 transition anyway, not
on repeats after that unless the log has been cleared.
It's up to the OS to keep clearing records until it at least
momentarily hits 0 if it wants to get any more interrupts.
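
The 0 to >= 1 transition rule can be sketched in isolation. This is a minimal illustration only, assuming a log that tracks nothing but a record count; ToyEventLog, toy_log_insert() and toy_log_clear() are hypothetical names and not part of the QEMU CXL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an event log; not QEMU's CXLEventLog. */
typedef struct {
    size_t count;   /* records currently held in the log */
} ToyEventLog;

/* Insert one record; returns true only when the log was empty before. */
static bool toy_log_insert(ToyEventLog *log)
{
    assert(log != NULL);
    log->count++;
    return log->count == 1;
}

/* Clear up to n records, as the OS would after reading them. */
static void toy_log_clear(ToyEventLog *log, size_t n)
{
    assert(log != NULL);
    log->count -= (n < log->count) ? n : log->count;
}
```

With this shape, a second insert into a non-empty log never signals, and the caller only sees another "true" once the log has been drained back to zero.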

Jonathan

> 
> Signed-off-by: Fan Ni 
> ---
>  hw/cxl/cxl-events.c | 8 
>  include/hw/cxl/cxl_events.h | 1 +
>  2 files changed, 9 insertions(+)
> 
> diff --git a/hw/cxl/cxl-events.c b/hw/cxl/cxl-events.c
> index 12dee2e467..90536c0e68 100644
> --- a/hw/cxl/cxl-events.c
> +++ b/hw/cxl/cxl-events.c
> @@ -135,6 +135,14 @@ bool cxl_event_insert(CXLDeviceState *cxlds, CXLEventLogType log_type,
>  QSIMPLEQ_INSERT_TAIL(&log->events, entry, node);
>  cxl_event_set_status(cxlds, log_type, true);
>  
> +/*
> + * For dynamic capacity event records grouped via More flag,
> + * Only raise interrupt after inserting the last record in the log.
> + */
> +if (log_type == CXL_EVENT_TYPE_DYNAMIC_CAP) {
> +CXLEventDynamicCapacity *dCap = (CXLEventDynamicCapacity *)event;
> +return (dCap->flags & MORE_FLAG) ? false : true;
> +}
>  /* Count went from 0 to 1 */
>  return cxl_event_count(log) == 1;

If there are multiple, this will fail I think, as cxl_event_count(log) will go
from 0 to X, not 1.

>  }
> diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
> index 38cadaa0f3..b0e5cc89c0 100644
> --- a/include/hw/cxl/cxl_events.h
> +++ b/include/hw/cxl/cxl_events.h
> @@ -170,6 +170,7 @@ typedef struct CXLEventMemoryModule {
>   * CXL r3.1 section Table 8-50: Dynamic Capacity Event Record
>   * All fields little endian.
>   */
> +#define MORE_FLAG BIT_MASK(0)
>  typedef struct CXLEventDynamicCapacity {
>  CXLEventRecordHdr hdr;
>  uint8_t type;

Re: [PATCH 1/2] hw/mem/cxl_type3: Fix More flag setting for dynamic capacity event records

2024-08-28 Thread Jonathan Cameron via

On Tue, 27 Aug 2024 09:40:04 -0700
nifan@gmail.com wrote:

> From: Fan Ni 
> 
> Per cxl spec r3.1, for multiple dynamic capacity event records grouped via
> the More flag, the last record in the sequence should clear the More flag.
> 
> Before the change, the More flag of the event record is cleared before
> the loop of inserting records into the event log, which will leave the flag
> always set once it is set in the loop.
> 
> Signed-off-by: Fan Ni 
Oops.  I'll queue this up, though not sure I'll get a fixes series
out this week so it might only hit after the QEMU release.

Jonathan

> ---
>  hw/mem/cxl_type3.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index d648192ab9..e616801c81 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -2060,11 +2060,11 @@ static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
>  stw_le_p(&dCap.host_id, hid);
>  /* only valid for DC_REGION_CONFIG_UPDATED event */
>  dCap.updated_region_id = 0;
> -dCap.flags = 0;
>  for (i = 0; i < num_extents; i++) {
>  memcpy(&dCap.dynamic_capacity_extent, &extents[i],
> sizeof(CXLDCExtentRaw));
>  
> +dCap.flags = 0;
>  if (i < num_extents - 1) {
>  /* Set "More" flag */
>  dCap.flags |= BIT(0);
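
The fix described in the commit message above can be sketched on its own. This is a minimal illustration, assuming a record that carries only a flags byte; ToyRecord, TOY_MORE_FLAG and toy_build_records() are hypothetical names, and the real code fills in a full CXLEventDynamicCapacity record instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MORE_FLAG (1u << 0)   /* bit 0, matching BIT(0) in the patch */

/* Hypothetical record type; only the flags field matters here. */
typedef struct {
    uint8_t flags;
} ToyRecord;

/* Fill out[0..n-1]; every record but the last carries the More flag. */
static void toy_build_records(ToyRecord *out, size_t n)
{
    ToyRecord rec;   /* reused across iterations, like dCap in the patch */

    assert(out != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        rec.flags = 0;                    /* reset per record, inside the loop */
        if (i + 1 < n) {
            rec.flags |= TOY_MORE_FLAG;   /* more records follow */
        }
        out[i] = rec;
    }
}
```

Moving the reset out of the loop reproduces the original bug: once the flag is set for the first record it stays set, so the last record would still advertise More.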

Re: [PATCH 13/13] pnv/xive2: TIMA CI ops using alternative offsets or byte lengths

2024-08-28 Thread Cédric Le Goater


On 8/1/24 22:30, Michael Kowal wrote:

Some of the TIMA Special CI operations perform the same operation at
alternative byte offsets and lengths.  The following
xive2_tm_operations[] table entries were missing, although entries exist
for other offsets/sizes, and have been added:
- lwz@0x810 Pull/Invalidate O/S Context to register    added
  lwz@0x818                                            exists
  ld @0x818                                            exists
- lwz@0x820 Pull Pool Context to register              added
  lwz@0x828                                            exists
  ld @0x828                                            exists
- lwz@0x830 Pull Thread Context to register            added
  lbz@0x838                                            exists

Signed-off-by: Michael Kowal 



Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  include/hw/ppc/xive_regs.h | 7 ++-
  hw/intc/xive.c | 6 ++
  2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/hw/ppc/xive_regs.h b/include/hw/ppc/xive_regs.h
index 5b11463777..326327fc79 100644
--- a/include/hw/ppc/xive_regs.h
+++ b/include/hw/ppc/xive_regs.h
@@ -124,12 +124,17 @@
  #define TM_SPC_PULL_USR_CTX     0x808   /* Load32 Pull/Invalidate user        */
                                          /* context                            */
  #define TM_SPC_SET_OS_PENDING   0x812   /* Store8 Set OS irq pending bit      */
+#define TM_SPC_PULL_OS_CTX_G2   0x810   /* Load32/Load64 Pull/Invalidate OS   */
+                                        /* context to reg                     */
  #define TM_SPC_PULL_OS_CTX      0x818   /* Load32/Load64 Pull/Invalidate OS   */
                                          /* context to reg                     */
+#define TM_SPC_PULL_POOL_CTX_G2 0x820   /* Load32/Load64 Pull/Invalidate Pool */
+                                        /* context to reg                     */
  #define TM_SPC_PULL_POOL_CTX    0x828   /* Load32/Load64 Pull/Invalidate Pool */
                                          /* context to reg                     */
  #define TM_SPC_ACK_HV_REG       0x830   /* Load16 ack HV irq to reg           */
-#define TM_SPC_PULL_PHYS_CTX    0x838   /* Pull phys ctx to reg               */
+#define TM_SPC_PULL_PHYS_CTX_G2 0x830   /* Load32 Pull phys ctx to reg        */
+#define TM_SPC_PULL_PHYS_CTX    0x838   /* Load8  Pull phys ctx to reg        */
  #define TM_SPC_PULL_USR_CTX_OL  0xc08   /* Store8 Pull/Inval usr ctx to odd   */
                                          /* line                               */
  #define TM_SPC_ACK_OS_EL        0xc10   /* Store8 ack OS irq to even line     */
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 99c8bea598..ce1504fbed 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -614,18 +614,24 @@ static const XiveTmOp xive2_tm_operations[] = {
   xive_tm_ack_os_reg },
  { XIVE_TM_OS_PAGE, TM_SPC_SET_OS_PENDING, 1, xive_tm_set_os_pending,
   NULL },
+{ XIVE_TM_HV_PAGE, TM_SPC_PULL_OS_CTX_G2, 4, NULL,
+ xive2_tm_pull_os_ctx },
  { XIVE_TM_HV_PAGE, TM_SPC_PULL_OS_CTX,4, NULL,
   xive2_tm_pull_os_ctx },
  { XIVE_TM_HV_PAGE, TM_SPC_PULL_OS_CTX,8, NULL,
   xive2_tm_pull_os_ctx },
  { XIVE_TM_HV_PAGE, TM_SPC_ACK_HV_REG, 2, NULL,
   xive_tm_ack_hv_reg },
+{ XIVE_TM_HV_PAGE, TM_SPC_PULL_POOL_CTX_G2,   4, NULL,
+ xive_tm_pull_pool_ctx },
  { XIVE_TM_HV_PAGE, TM_SPC_PULL_POOL_CTX,  4, NULL,
   xive_tm_pull_pool_ctx },
  { XIVE_TM_HV_PAGE, TM_SPC_PULL_POOL_CTX,  8, NULL,
   xive_tm_pull_pool_ctx },
  { XIVE_TM_HV_PAGE, TM_SPC_PULL_OS_CTX_OL, 1, xive2_tm_pull_os_ctx_ol,
   NULL },
+{ XIVE_TM_HV_PAGE, TM_SPC_PULL_PHYS_CTX_G2,   4, NULL,
+ xive_tm_pull_phys_ctx },
  { XIVE_TM_HV_PAGE, TM_SPC_PULL_PHYS_CTX,  1, NULL,
   xive_tm_pull_phys_ctx },
  { XIVE_TM_HV_PAGE, TM_SPC_PULL_PHYS_CTX_OL,   1, xive2_tm_pull_phys_ctx_ol,

Re: [PATCH 12/13] pnv/xive2: TIMA support for 8-byte OS context push for PHYP

2024-08-28 Thread Cédric Le Goater


On 8/1/24 22:30, Michael Kowal wrote:

From: Glenn Miles 

PHYP uses 8-byte writes to the 2nd doubleword of the OS context
line when dispatching an OS level virtual processor.  This
support was not used by OPAL/Linux and so was never added.

Without this support, the XIVE code doesn't notice that a new
context is being pushed and fails to check for unpresented
pending interrupts for that context.

Signed-off-by: Glenn Miles 
Signed-off-by: Michael Kowal 



Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  hw/intc/xive.c  |  2 ++
  hw/intc/xive2.c | 24 +++-
  2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index d951aac3a0..99c8bea598 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -596,6 +596,8 @@ static const XiveTmOp xive2_tm_operations[] = {
   NULL },
  { XIVE_TM_HV_PAGE, TM_QW1_OS + TM_WORD2,  4, xive2_tm_push_os_ctx,
   NULL },
+{ XIVE_TM_HV_PAGE, TM_QW1_OS + TM_WORD2,  8, xive2_tm_push_os_ctx,
+ NULL },
  { XIVE_TM_OS_PAGE, TM_QW1_OS + TM_LGS,1, xive_tm_set_os_lgs,
   NULL },
  { XIVE_TM_HV_PAGE, TM_QW3_HV_PHYS + TM_CPPR,  1, xive_tm_set_hv_cppr,
diff --git a/hw/intc/xive2.c b/hw/intc/xive2.c
index af9699ec88..76f2f36973 100644
--- a/hw/intc/xive2.c
+++ b/hw/intc/xive2.c
@@ -612,17 +612,31 @@ static void xive2_tctx_need_resend(Xive2Router *xrtr, XiveTCTX *tctx,
  void xive2_tm_push_os_ctx(XivePresenter *xptr, XiveTCTX *tctx,
hwaddr offset, uint64_t value, unsigned size)
  {
-uint32_t cam = value;
-uint32_t qw1w2 = cpu_to_be32(cam);
+uint32_t cam;
+uint32_t qw1w2;
+uint64_t qw1dw1;
  uint8_t nvp_blk;
  uint32_t nvp_idx;
  bool vo;
  bool do_restore;
  
-xive2_cam_decode(cam, &nvp_blk, &nvp_idx, &vo, &do_restore);

-
  /* First update the thead context */
-memcpy(&tctx->regs[TM_QW1_OS + TM_WORD2], &qw1w2, 4);
+switch (size) {
+case 4:
+cam = value;
+qw1w2 = cpu_to_be32(cam);
+memcpy(&tctx->regs[TM_QW1_OS + TM_WORD2], &qw1w2, 4);
+break;
+case 8:
+cam = value >> 32;
+qw1dw1 = cpu_to_be64(value);
+memcpy(&tctx->regs[TM_QW1_OS + TM_WORD2], &qw1dw1, 8);
+break;
+default:
+g_assert_not_reached();
+}
+
+xive2_cam_decode(cam, &nvp_blk, &nvp_idx, &vo, &do_restore);
  
  /* Check the interrupt pending bits */

  if (vo) {
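
Why the 8-byte case takes the CAM word from the upper half of the value can be shown with a small sketch. It assumes a big-endian register image, as implied by the cpu_to_be32()/cpu_to_be64() calls in the patch; toy_cam_from_store() and toy_store_be() are hypothetical helpers, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the 32-bit CAM word from a TIMA store.
 * For an 8-byte store the CAM word is the upper 32 bits, mirroring
 * "cam = value >> 32" in the patch.
 */
static uint32_t toy_cam_from_store(uint64_t value, unsigned size)
{
    switch (size) {
    case 4:
        return (uint32_t)value;
    case 8:
        return (uint32_t)(value >> 32);
    default:
        assert(!"unsupported store size");
        return 0;
    }
}

/* Write the low 'size' bytes of value to dst, most significant byte first. */
static void toy_store_be(uint8_t *dst, uint64_t value, unsigned size)
{
    for (unsigned i = 0; i < size; i++) {
        dst[i] = (uint8_t)(value >> (8 * (size - 1 - i)));
    }
}
```

Storing the full doubleword big-endian places its upper 32 bits in the first four bytes, which is where the 4-byte path would have written the CAM word, so both paths leave the same bytes at the start of the doubleword.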

Re: [PATCH RESEND v9 3/9] configure, meson: detect Rust toolchain

2024-08-28 Thread Alex Bennée

Manos Pitsidianakis  writes:

> From: Paolo Bonzini 
>
> Include the correct path and arguments to rustc in the native
> and cross files (native compilation is needed for procedural
> macros).
>
> Signed-off-by: Paolo Bonzini 
> ---
>  configure   | 50 --
>  meson.build |  8 +++-
>  2 files changed, 51 insertions(+), 7 deletions(-)
>
> diff --git a/configure b/configure
> index 019fcbd0ef7b07e7b0280b358099cae72c73aa98..9ef6005c557fc627c7c6c732b4c92ed1b934f474 100755
> --- a/configure
> +++ b/configure
> @@ -207,6 +207,8 @@ for opt do
>;;
>--objcc=*) objcc="$optarg"
>;;
> +  --rustc=*) RUSTC="$optarg"
> +  ;;
>--cpu=*) cpu="$optarg"
>;;
>--extra-cflags=*)
> @@ -252,6 +254,9 @@ python=
>  download="enabled"
>  skip_meson=no
>  use_containers="yes"
> +# do not enable by default because cross compilation requires --rust-target-triple
> +rust="disabled"
> +rust_target_triple=""
>  gdb_bin=$(command -v "gdb-multiarch" || command -v "gdb")
>  gdb_arches=""
>  
> @@ -317,6 +322,8 @@ windmc="${WINDMC-${cross_prefix}windmc}"
>  pkg_config="${PKG_CONFIG-${cross_prefix}pkg-config}"
>  sdl2_config="${SDL2_CONFIG-${cross_prefix}sdl2-config}"
>  
> +rustc="${RUSTC-rustc}"
> +
>  check_define() {
>  cat > $TMPC <<EOF
>  #if !defined($1)
> @@ -636,6 +643,8 @@ for opt do
>;;
>--objcc=*)
>;;
> +  --rustc=*)
> +  ;;
>--make=*)
>;;
>--install=*)
> @@ -755,8 +764,14 @@ for opt do
>;;
>--container-engine=*) container_engine="$optarg"
>;;
> +  --rust-target-triple=*) rust_target_triple="$optarg"
> +  ;;

Could we not map from --cross-prefix to the target triple? Having to
specify both seems prone to failure.

>--gdb=*) gdb_bin="$optarg"
>;;
> +  --enable-rust) rust=enabled
> +  ;;
> +  --disable-rust) rust=disabled
> +  ;;
># everything else has the same name in configure and meson
>--*) meson_option_parse "$opt" "$optarg"
>;;
> @@ -859,6 +874,7 @@ Advanced options (experts only):
>                             at build time [$host_cc]
>    --cxx=CXX                use C++ compiler CXX [$cxx]
>    --objcc=OBJCC            use Objective-C compiler OBJCC [$objcc]
> +  --rustc=RUSTC            use Rust compiler RUSTC [$rustc]
>    --extra-cflags=CFLAGS    append extra C compiler flags CFLAGS
>    --extra-cxxflags=CXXFLAGS append extra C++ compiler flags CXXFLAGS
>    --extra-objcflags=OBJCFLAGS append extra Objective C compiler flags OBJCFLAGS
> @@ -869,8 +885,9 @@ Advanced options (experts only):
>    --python=PYTHON          use specified python [$python]
>    --ninja=NINJA            use specified ninja [$ninja]
>    --static                 enable static build [$static]
> -  --without-default-features default all --enable-* options to "disabled"
> -  --without-default-devices  do not include any device that is not needed to
> +  --rust-target-triple=TRIPLE  target for Rust cross compilation
> +  --without-default-features   default all --enable-* options to "disabled"
> +  --without-default-devices    do not include any device that is not needed to
>                             start the emulator (only use if you are including
>                             desired devices in configs/devices/)
>    --with-devices-ARCH=NAME override default configs/devices
> @@ -1139,6 +1156,21 @@ EOF
>  fi
>  
>  ##
> +# detect rust triple
> +
> +if test "$rust" != disabled && has "$rustc" && $rustc -vV > "${TMPDIR1}/${TMPB}.out"; then
> +  rust_host_triple=$(sed -n 's/^host: //p' "${TMPDIR1}/${TMPB}.out")
> +else
> +  if test "$rust" = enabled; then
> +error_exit "could not execute rustc binary \"$rustc\""
> +  fi
> +  rust=disabled
> +fi
> +if test "$rust" != disabled && test -z "$rust_target_triple"; then
> +  rust_target_triple=$rust_host_triple
> +fi
> +
> +##
>  # functions to probe cross compilers
>  
>  container="no"
> @@ -1604,6 +1636,9 @@ if test "$container" != no; then
>  echo "RUNC=$runc" >> $config_host_mak
>  fi
>  echo "SUBDIRS=$subdirs" >> $config_host_mak
> +if test "$rust" != disabled; then
> +  echo "RUST_TARGET_TRIPLE=$rust_target_triple" >> $config_host_mak
> +fi
>  echo "PYTHON=$python" >> $config_host_mak
>  echo "MKVENV_ENSUREGROUP=$mkvenv ensuregroup $mkvenv_online_flag" >> $config_host_mak
>  echo "GENISOIMAGE=$genisoimage" >> $config_host_mak
> @@ -1735,6 +1770,13 @@ if test "$skip_meson" = no; then
>echo "c = [$(meson_quote $cc $CPU_CFLAGS)]" >> $cross
>test -n "$cxx" && echo "cpp = [$(meson_quote $cxx $CPU_CFLAGS)]" >> $cross
>test -n "$objcc" && echo "objc = [$(meson_quote $objcc $CPU_CFLAGS)]" >> $cross
> +  if test "$rust" != disabled; then
> +if test "$rust_host_triple" != "$rust_target_triple"; then
> +  echo "rust = [$(meson_quote $rustc --target "$rust_target_triple")]" >> $cross
> +else
> +  echo "rust = [$(meson_quote $rustc)]" >> $cross
> +

[PATCH 3/9] fifo8: add skip parameter to fifo8_peekpop_bufptr()

2024-08-28 Thread Mark Cave-Ayland

The skip parameter specifies the number of bytes to be skipped from the current
FIFO head before the peek or pop operation.

Signed-off-by: Mark Cave-Ayland 
---
 util/fifo8.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/util/fifo8.c b/util/fifo8.c
index 5faa814a6e..62d6430b05 100644
--- a/util/fifo8.c
+++ b/util/fifo8.c
@@ -72,18 +72,20 @@ uint8_t fifo8_pop(Fifo8 *fifo)
 }
 
 static const uint8_t *fifo8_peekpop_bufptr(Fifo8 *fifo, uint32_t max,
-   uint32_t *numptr, bool do_pop)
+   uint32_t skip, uint32_t *numptr,
+   bool do_pop)
 {
 uint8_t *ret;
 uint32_t num, head;
 
 assert(max > 0 && max <= fifo->num);
-head = fifo->head;
+assert(skip <= fifo->num);
+head = (fifo->head + skip) % fifo->capacity;
 num = MIN(fifo->capacity - head, max);
 ret = &fifo->data[head];
 
 if (do_pop) {
-fifo->head += num;
+fifo->head = head + num;
 fifo->head %= fifo->capacity;
 fifo->num -= num;
 }
@@ -95,12 +97,12 @@ static const uint8_t *fifo8_peekpop_bufptr(Fifo8 *fifo, uint32_t max,
 
 const uint8_t *fifo8_peek_bufptr(Fifo8 *fifo, uint32_t max, uint32_t *numptr)
 {
-return fifo8_peekpop_bufptr(fifo, max, numptr, false);
+return fifo8_peekpop_bufptr(fifo, max, 0, numptr, false);
 }
 
 const uint8_t *fifo8_pop_bufptr(Fifo8 *fifo, uint32_t max, uint32_t *numptr)
 {
-return fifo8_peekpop_bufptr(fifo, max, numptr, true);
+return fifo8_peekpop_bufptr(fifo, max, 0, numptr, true);
 }
 
 uint32_t fifo8_pop_buf(Fifo8 *fifo, uint8_t *dest, uint32_t destlen)
-- 
2.39.2
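
The effect of the skip parameter from the patch above can be tried out on a toy ring buffer. This is a minimal sketch under stated assumptions: ToyFifo, toy_push() and toy_peekpop_bufptr() are hypothetical stand-ins with a fixed capacity, not the Fifo8 API, and the asserts simply mirror the ones in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_CAP 8u

/* Hypothetical fixed-capacity byte ring; a stand-in for Fifo8. */
typedef struct {
    uint8_t data[TOY_CAP];
    uint32_t head;   /* index of the oldest byte */
    uint32_t num;    /* bytes currently stored */
} ToyFifo;

static void toy_push(ToyFifo *f, uint8_t byte)
{
    assert(f->num < TOY_CAP);
    f->data[(f->head + f->num) % TOY_CAP] = byte;
    f->num++;
}

/*
 * Return a pointer to the longest contiguous run of at most max bytes
 * that starts skip bytes past the head; *numptr receives the run length.
 * As in the patch, popping is only exercised here with skip == 0.
 */
static const uint8_t *toy_peekpop_bufptr(ToyFifo *f, uint32_t max,
                                         uint32_t skip, uint32_t *numptr,
                                         bool do_pop)
{
    uint32_t head, num;

    assert(max > 0 && max <= f->num);
    assert(skip <= f->num);
    head = (f->head + skip) % TOY_CAP;
    num = (TOY_CAP - head < max) ? TOY_CAP - head : max;

    if (do_pop) {
        f->head = (head + num) % TOY_CAP;
        f->num -= num;
    }
    *numptr = num;
    return &f->data[head];
}
```

A peek with skip 0 stops at the end of the backing array; a second peek with skip set to the length of the first run picks up the wrapped remainder at index 0.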

[PATCH 5/9] fifo8: rename fifo8_pop_buf() to fifo8_peekpop_buf()

2024-08-28 Thread Mark Cave-Ayland

The fifo8_pop_buf() function will soon also be used for peek operations, so
rename the function accordingly. Create a new fifo8_pop_buf() wrapper function
that can be used by existing callers.

Signed-off-by: Mark Cave-Ayland 
---
 util/fifo8.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/util/fifo8.c b/util/fifo8.c
index efe0117b1f..5453cbc1b0 100644
--- a/util/fifo8.c
+++ b/util/fifo8.c
@@ -105,7 +105,8 @@ const uint8_t *fifo8_pop_bufptr(Fifo8 *fifo, uint32_t max, uint32_t *numptr)
 return fifo8_peekpop_bufptr(fifo, max, 0, numptr, true);
 }
 
-uint32_t fifo8_pop_buf(Fifo8 *fifo, uint8_t *dest, uint32_t destlen)
+static uint32_t fifo8_peekpop_buf(Fifo8 *fifo, uint8_t *dest, uint32_t destlen,
+  bool do_pop)
 {
 const uint8_t *buf;
 uint32_t n1, n2 = 0;
@@ -134,6 +135,11 @@ uint32_t fifo8_pop_buf(Fifo8 *fifo, uint8_t *dest, uint32_t destlen)
 return n1 + n2;
 }
 
+uint32_t fifo8_pop_buf(Fifo8 *fifo, uint8_t *dest, uint32_t destlen)
+{
+return fifo8_peekpop_buf(fifo, dest, destlen, true);
+}
+
 void fifo8_drop(Fifo8 *fifo, uint32_t len)
 {
 len -= fifo8_pop_buf(fifo, NULL, len);
-- 
2.39.2

[PATCH 1/9] fifo8: rename fifo8_peekpop_buf() to fifo8_peekpop_bufptr()

2024-08-28 Thread Mark Cave-Ayland

This is to emphasise that the function returns a pointer to the internal FIFO
buffer.

Signed-off-by: Mark Cave-Ayland 
---
 util/fifo8.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/util/fifo8.c b/util/fifo8.c
index 1ffa19d900..61bce9d9a0 100644
--- a/util/fifo8.c
+++ b/util/fifo8.c
@@ -71,8 +71,8 @@ uint8_t fifo8_pop(Fifo8 *fifo)
 return ret;
 }
 
-static const uint8_t *fifo8_peekpop_buf(Fifo8 *fifo, uint32_t max,
-uint32_t *numptr, bool do_pop)
+static const uint8_t *fifo8_peekpop_bufptr(Fifo8 *fifo, uint32_t max,
+   uint32_t *numptr, bool do_pop)
 {
 uint8_t *ret;
 uint32_t num;
@@ -94,12 +94,12 @@ static const uint8_t *fifo8_peekpop_buf(Fifo8 *fifo, uint32_t max,
 
 const uint8_t *fifo8_peek_bufptr(Fifo8 *fifo, uint32_t max, uint32_t *numptr)
 {
-return fifo8_peekpop_buf(fifo, max, numptr, false);
+return fifo8_peekpop_bufptr(fifo, max, numptr, false);
 }
 
 const uint8_t *fifo8_pop_bufptr(Fifo8 *fifo, uint32_t max, uint32_t *numptr)
 {
-return fifo8_peekpop_buf(fifo, max, numptr, true);
+return fifo8_peekpop_bufptr(fifo, max, numptr, true);
 }
 
 uint32_t fifo8_pop_buf(Fifo8 *fifo, uint8_t *dest, uint32_t destlen)
-- 
2.39.2

[PATCH 6/9] fifo8: honour do_pop argument in fifo8_peekpop_buf()

2024-08-28 Thread Mark Cave-Ayland

Pass the do_pop value from fifo8_peekpop_buf() to fifo8_peekpop_bufptr() to
allow peeks to the FIFO buffer, including adjusting the skip parameter to
handle the case where the internal FIFO buffer wraps around.

Signed-off-by: Mark Cave-Ayland 
---
 util/fifo8.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/util/fifo8.c b/util/fifo8.c
index 5453cbc1b0..1031ffbe7e 100644
--- a/util/fifo8.c
+++ b/util/fifo8.c
@@ -117,7 +117,7 @@ static uint32_t fifo8_peekpop_buf(Fifo8 *fifo, uint8_t *dest, uint32_t destlen,
 }
 
 len = destlen;
-buf = fifo8_peekpop_bufptr(fifo, len, 0, &n1, true);
+buf = fifo8_peekpop_bufptr(fifo, len, 0, &n1, do_pop);
 if (dest) {
 memcpy(dest, buf, n1);
 }
@@ -126,7 +126,7 @@ static uint32_t fifo8_peekpop_buf(Fifo8 *fifo, uint8_t *dest, uint32_t destlen,
 len -= n1;
 len = MIN(len, fifo8_num_used(fifo));
 if (len) {
-buf = fifo8_peekpop_bufptr(fifo, len, 0, &n2, true);
+buf = fifo8_peekpop_bufptr(fifo, len, do_pop ? 0 : n1, &n2, do_pop);
 if (dest) {
 memcpy(&dest[n1], buf, n2);
 }
-- 
2.39.2
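
The two-chunk copy that a wraparound-aware peek has to perform can be sketched as follows. This is an illustration only: ToyRing and toy_peek_buf() are hypothetical names, the capacity is fixed, and unlike the patch (which asserts on oversized requests via fifo8_peekpop_bufptr()) this sketch clamps destlen to the number of stored bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_CAP 8u

/* Hypothetical fixed-capacity byte ring; a stand-in for Fifo8. */
typedef struct {
    uint8_t data[TOY_CAP];
    uint32_t head;   /* index of the oldest byte */
    uint32_t num;    /* bytes currently stored */
} ToyRing;

/* Copy up to destlen bytes into dest without consuming them. */
static uint32_t toy_peek_buf(const ToyRing *r, uint8_t *dest, uint32_t destlen)
{
    uint32_t len = (destlen < r->num) ? destlen : r->num;
    uint32_t n1 = (TOY_CAP - r->head < len) ? TOY_CAP - r->head : len;
    uint32_t n2 = len - n1;

    assert(dest != NULL);
    memcpy(dest, &r->data[r->head], n1);   /* run up to the end of the array */
    memcpy(dest + n1, &r->data[0], n2);    /* wrapped remainder, may be 0 */
    return n1 + n2;
}
```

The second copy starts at index 0 of the backing array, which plays the same role as passing n1 as the skip value for the second fifo8_peekpop_bufptr() call in the peek case.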

[PATCH 9/9] tests/unit: add test-fifo unit test

2024-08-28 Thread Mark Cave-Ayland

This tests the Fifo8 implementation for basic operations, and also tests the
*_bufptr() functions for correct behaviour, including handling of wraparound of
the internal FIFO buffer.

Signed-off-by: Mark Cave-Ayland 
---
 tests/unit/meson.build |   1 +
 tests/unit/test-fifo.c | 256 +
 2 files changed, 257 insertions(+)
 create mode 100644 tests/unit/test-fifo.c

diff --git a/tests/unit/meson.build b/tests/unit/meson.build
index 490ab8182d..89f9633cd6 100644
--- a/tests/unit/meson.build
+++ b/tests/unit/meson.build
@@ -47,6 +47,7 @@ tests = {
   'test-logging': [],
   'test-qapi-util': [],
   'test-interval-tree': [],
+  'test-fifo': [],
 }
 
 if have_system or have_tools
diff --git a/tests/unit/test-fifo.c b/tests/unit/test-fifo.c
new file mode 100644
index 00..1e54cde871
--- /dev/null
+++ b/tests/unit/test-fifo.c
@@ -0,0 +1,256 @@
+/*
+ * Fifo8 tests
+ *
+ * Copyright 2024 Mark Cave-Ayland
+ *
+ * Authors:
+ *  Mark Cave-Ayland
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "migration/vmstate.h"
+#include "qemu/fifo8.h"
+
+const VMStateInfo vmstate_info_uint32;
+const VMStateInfo vmstate_info_buffer;
+
+
+static void test_fifo8_pop_bufptr_wrap(void)
+{
+Fifo8 fifo;
+uint8_t data_in1[] = { 0x1, 0x2, 0x3, 0x4 };
+uint8_t data_in2[] = { 0x5, 0x6, 0x7, 0x8, 0x1, 0x2 };
+const uint8_t *buf;
+uint32_t count;
+
+fifo8_create(&fifo, 8);
+
+fifo8_push_all(&fifo, data_in1, sizeof(data_in1));
+buf = fifo8_pop_bufptr(&fifo, 2, &count);
+g_assert(count == 2);
+g_assert(buf[0] == 0x1 && buf[1] == 0x2);
+
+fifo8_push_all(&fifo, data_in2, sizeof(data_in2));
+buf = fifo8_pop_bufptr(&fifo, 8, &count);
+g_assert(count == 6);
+g_assert(buf[0] == 0x3 && buf[1] == 0x4 && buf[2] == 0x5 &&
+ buf[3] == 0x6 && buf[4] == 0x7 && buf[5] == 0x8);
+
+g_assert(fifo8_num_used(&fifo) == 2);
+fifo8_destroy(&fifo);
+}
+
+static void test_fifo8_pop_bufptr(void)
+{
+Fifo8 fifo;
+uint8_t data_in[] = { 0x1, 0x2, 0x3, 0x4 };
+const uint8_t *buf;
+uint32_t count;
+
+fifo8_create(&fifo, 8);
+
+fifo8_push_all(&fifo, data_in, sizeof(data_in));
+buf = fifo8_pop_bufptr(&fifo, 2, &count);
+g_assert(count == 2);
+g_assert(buf[0] == 0x1 && buf[1] == 0x2);
+
+g_assert(fifo8_num_used(&fifo) == 2);
+fifo8_destroy(&fifo);
+}
+
+static void test_fifo8_peek_bufptr_wrap(void)
+{
+Fifo8 fifo;
+uint8_t data_in1[] = { 0x1, 0x2, 0x3, 0x4 };
+uint8_t data_in2[] = { 0x5, 0x6, 0x7, 0x8, 0x1, 0x2 };
+const uint8_t *buf;
+uint32_t count;
+
+fifo8_create(&fifo, 8);
+
+fifo8_push_all(&fifo, data_in1, sizeof(data_in1));
+buf = fifo8_peek_bufptr(&fifo, 2, &count);
+g_assert(count == 2);
+g_assert(buf[0] == 0x1 && buf[1] == 0x2);
+
+buf = fifo8_pop_bufptr(&fifo, 2, &count);
+g_assert(count == 2);
+g_assert(buf[0] == 0x1 && buf[1] == 0x2);
+fifo8_push_all(&fifo, data_in2, sizeof(data_in2));
+
+buf = fifo8_peek_bufptr(&fifo, 8, &count);
+g_assert(count == 6);
+g_assert(buf[0] == 0x3 && buf[1] == 0x4 && buf[2] == 0x5 &&
+ buf[3] == 0x6 && buf[4] == 0x7 && buf[5] == 0x8);
+
+g_assert(fifo8_num_used(&fifo) == 8);
+fifo8_destroy(&fifo);
+}
+
+static void test_fifo8_peek_bufptr(void)
+{
+Fifo8 fifo;
+uint8_t data_in[] = { 0x1, 0x2, 0x3, 0x4 };
+const uint8_t *buf;
+uint32_t count;
+
+fifo8_create(&fifo, 8);
+
+fifo8_push_all(&fifo, data_in, sizeof(data_in));
+buf = fifo8_peek_bufptr(&fifo, 2, &count);
+g_assert(count == 2);
+g_assert(buf[0] == 0x1 && buf[1] == 0x2);
+
+g_assert(fifo8_num_used(&fifo) == 4);
+fifo8_destroy(&fifo);
+}
+
+static void test_fifo8_pop_buf_wrap(void)
+{
+Fifo8 fifo;
+uint8_t data_in1[] = { 0x1, 0x2, 0x3, 0x4 };
+uint8_t data_in2[] = { 0x5, 0x6, 0x7, 0x8, 0x1, 0x2, 0x3, 0x4 };
+uint8_t data_out[4];
+int count;
+
+fifo8_create(&fifo, 8);
+
+fifo8_push_all(&fifo, data_in1, sizeof(data_in1));
+fifo8_pop_buf(&fifo, NULL, 4);
+
+fifo8_push_all(&fifo, data_in2, sizeof(data_in2));
+count = fifo8_pop_buf(&fifo, NULL, 4);
+g_assert(count == 4);
+count = fifo8_pop_buf(&fifo, data_out, 4);
+g_assert(count == 4);
+g_assert(data_out[0] == 0x1 && data_out[1] == 0x2 &&
+ data_out[2] == 0x3 && data_out[3] == 0x4);
+
+g_assert(fifo8_num_used(&fifo) == 0);
+fifo8_destroy(&fifo);
+}
+
+static void test_fifo8_pop_buf(void)
+{
+Fifo8 fifo;
+uint8_t data_in[] = { 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8 };
+uint8_t data_out[] = { 0xff, 0xff, 0xff, 0xff };
+int count;
+
+fifo8_create(&fifo, 8);
+
+fifo8_push_all(&fifo, data_in, sizeof(data_in));
+count = fifo8_pop_buf(&fifo, NULL, 4);
+g_assert(count == 4);
+count = fifo8_pop_buf(&fifo, data_out, 4);
+g_ass

[PATCH 7/9] fifo8: add fifo8_peek_buf() function

2024-08-28 Thread Mark Cave-Ayland

This is a wrapper function around fifo8_peekpop_buf() that allows the caller to
peek into the FIFO, including handling the case where there is a wraparound of
the internal FIFO buffer.

Signed-off-by: Mark Cave-Ayland 
---
 include/qemu/fifo8.h | 14 ++
 util/fifo8.c |  5 +
 2 files changed, 19 insertions(+)

diff --git a/include/qemu/fifo8.h b/include/qemu/fifo8.h
index d1d06754d8..d09984b146 100644
--- a/include/qemu/fifo8.h
+++ b/include/qemu/fifo8.h
@@ -76,6 +76,20 @@ uint8_t fifo8_pop(Fifo8 *fifo);
  */
 uint32_t fifo8_pop_buf(Fifo8 *fifo, uint8_t *dest, uint32_t destlen);
 
+/**
+ * fifo8_peek_buf:
+ * @fifo: FIFO to read from
+ * @dest: the buffer to write the data into (can be NULL)
+ * @destlen: size of @dest and maximum number of bytes to peek
+ *
+ * Peek a number of elements from the FIFO up to a maximum of @destlen.
+ * The peeked data is copied into the @dest buffer.
+ * Care is taken when the data wraps around in the ring buffer.
+ *
+ * Returns: number of bytes peeked.
+ */
+uint32_t fifo8_peek_buf(Fifo8 *fifo, uint8_t *dest, uint32_t destlen);
+
 /**
  * fifo8_pop_bufptr:
  * @fifo: FIFO to pop from
diff --git a/util/fifo8.c b/util/fifo8.c
index 1031ffbe7e..a8f5cea158 100644
--- a/util/fifo8.c
+++ b/util/fifo8.c
@@ -140,6 +140,11 @@ uint32_t fifo8_pop_buf(Fifo8 *fifo, uint8_t *dest, uint32_t destlen)
 return fifo8_peekpop_buf(fifo, dest, destlen, true);
 }
 
+uint32_t fifo8_peek_buf(Fifo8 *fifo, uint8_t *dest, uint32_t destlen)
+{
+return fifo8_peekpop_buf(fifo, dest, destlen, false);
+}
+
 void fifo8_drop(Fifo8 *fifo, uint32_t len)
 {
 len -= fifo8_pop_buf(fifo, NULL, len);
-- 
2.39.2

[PATCH 4/9] fifo8: replace fifo8_pop_bufptr() with fifo8_peekpop_bufptr() in fifo8_pop_buf()

2024-08-28 Thread Mark Cave-Ayland

The upcoming peek functionality will require passing a non-zero skip value to
fifo8_peekpop_bufptr().

Signed-off-by: Mark Cave-Ayland 
---
 util/fifo8.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/util/fifo8.c b/util/fifo8.c
index 62d6430b05..efe0117b1f 100644
--- a/util/fifo8.c
+++ b/util/fifo8.c
@@ -116,7 +116,7 @@ uint32_t fifo8_pop_buf(Fifo8 *fifo, uint8_t *dest, uint32_t destlen)
 }
 
 len = destlen;
-buf = fifo8_pop_bufptr(fifo, len, &n1);
+buf = fifo8_peekpop_bufptr(fifo, len, 0, &n1, true);
 if (dest) {
 memcpy(dest, buf, n1);
 }
@@ -125,7 +125,7 @@ uint32_t fifo8_pop_buf(Fifo8 *fifo, uint8_t *dest, uint32_t destlen)
 len -= n1;
 len = MIN(len, fifo8_num_used(fifo));
 if (len) {
-buf = fifo8_pop_bufptr(fifo, len, &n2);
+buf = fifo8_peekpop_bufptr(fifo, len, 0, &n2, true);
 if (dest) {
 memcpy(&dest[n1], buf, n2);
 }
-- 
2.39.2

[PATCH 0/9] fifo8: add fifo8_peek(), fifo8_peek_buf() and tests

2024-08-28 Thread Mark Cave-Ayland

This is something I've had lying around for a little while as a follow on from
Phil's recent work on Fifo8 with a few updates, but also adding the missing
fifo8_peek() and fifo8_peek_buf() functions along with some relevant tests.

The reason for sending this now is that there are a couple of recent series
(https://patchew.org/QEMU/20240819113148.3007047-1-alistair.fran...@wdc.com/ and
https://patchew.org/QEMU/20240817102606.3996242-1-ta...@google.com/) which can
benefit from these changes: in particular the fifo8_peek_buf() function, unlike
the existing fifo8_peek_bufptr() function, will correctly handle FIFO wraparound.
This occurs when the FIFO head drifts due to not popping the entire FIFO content
in one go, which often happens when trying to send FIFO data to a chardev.

Signed-off-by: Mark Cave-Ayland 


Mark Cave-Ayland (9):
  fifo8: rename fifo8_peekpop_buf() to fifo8_peekpop_bufptr()
  fifo8: introduce head variable for fifo8_peekpop_bufptr()
  fifo8: add skip parameter to fifo8_peekpop_bufptr()
  fifo8: replace fifo8_pop_bufptr() with fifo8_peekpop_bufptr() in
fifo8_pop_buf()
  fifo8: rename fifo8_pop_buf() to fifo8_peekpop_buf()
  fifo8: honour do_pop argument in fifo8_peekpop_buf()
  fifo8: add fifo8_peek_buf() function
  fifo8: introduce fifo8_peek() function
  tests/unit: add test-fifo unit test

 include/qemu/fifo8.h   |  25 
 tests/unit/meson.build |   1 +
 tests/unit/test-fifo.c | 256 +
 util/fifo8.c   |  42 +--
 4 files changed, 313 insertions(+), 11 deletions(-)
 create mode 100644 tests/unit/test-fifo.c

-- 
2.39.2

[PATCH 2/9] fifo8: introduce head variable for fifo8_peekpop_bufptr()

2024-08-28 Thread Mark Cave-Ayland

Rather than operate on fifo->head directly, introduce a new head variable which
is set to the value of fifo->head and use it instead. This is to allow future
adjustment of the head position within the internal FIFO buffer.

Signed-off-by: Mark Cave-Ayland 
---
 util/fifo8.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/util/fifo8.c b/util/fifo8.c
index 61bce9d9a0..5faa814a6e 100644
--- a/util/fifo8.c
+++ b/util/fifo8.c
@@ -75,11 +75,12 @@ static const uint8_t *fifo8_peekpop_bufptr(Fifo8 *fifo, uint32_t max,
uint32_t *numptr, bool do_pop)
 {
 uint8_t *ret;
-uint32_t num;
+uint32_t num, head;
 
 assert(max > 0 && max <= fifo->num);
-num = MIN(fifo->capacity - fifo->head, max);
-ret = &fifo->data[fifo->head];
+head = fifo->head;
+num = MIN(fifo->capacity - head, max);
+ret = &fifo->data[head];
 
 if (do_pop) {
 fifo->head += num;
-- 
2.39.2

[PATCH 8/9] fifo8: introduce fifo8_peek() function

2024-08-28 Thread Mark Cave-Ayland

This allows users to peek the byte at the current head of the FIFO.

Signed-off-by: Mark Cave-Ayland 
---
 include/qemu/fifo8.h | 11 +++
 util/fifo8.c |  6 ++
 2 files changed, 17 insertions(+)

diff --git a/include/qemu/fifo8.h b/include/qemu/fifo8.h
index d09984b146..4f768d4ee3 100644
--- a/include/qemu/fifo8.h
+++ b/include/qemu/fifo8.h
@@ -62,6 +62,17 @@ void fifo8_push_all(Fifo8 *fifo, const uint8_t *data, uint32_t num);
  */
 uint8_t fifo8_pop(Fifo8 *fifo);
 
+/**
+ * fifo8_peek:
+ * @fifo: fifo to peek from
+ *
+ * Peek the data byte at the current head of the FIFO. Clients are responsible
+ * for checking for emptyness using fifo8_is_empty().
+ *
+ * Returns: The peeked data byte.
+ */
+uint8_t fifo8_peek(Fifo8 *fifo);
+
 /**
  * fifo8_pop_buf:
  * @fifo: FIFO to pop from
diff --git a/util/fifo8.c b/util/fifo8.c
index a8f5cea158..a26da66ad2 100644
--- a/util/fifo8.c
+++ b/util/fifo8.c
@@ -71,6 +71,12 @@ uint8_t fifo8_pop(Fifo8 *fifo)
 return ret;
 }
 
+uint8_t fifo8_peek(Fifo8 *fifo)
+{
+assert(fifo->num > 0);
+return fifo->data[fifo->head];
+}
+
 static const uint8_t *fifo8_peekpop_bufptr(Fifo8 *fifo, uint32_t max,
uint32_t skip, uint32_t *numptr,
bool do_pop)
-- 
2.39.2

Re: [PATCH RESEND v9 3/9] configure, meson: detect Rust toolchain

2024-08-28 Thread Daniel P . Berrangé

On Wed, Aug 28, 2024 at 07:11:44AM +0300, Manos Pitsidianakis wrote:
> From: Paolo Bonzini 
> 
> Include the correct path and arguments to rustc in the native
> and cross files (native compilation is needed for procedural
> macros).
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  configure   | 50 --
>  meson.build |  8 +++-
>  2 files changed, 51 insertions(+), 7 deletions(-)
> 
> diff --git a/configure b/configure
> index 019fcbd0ef7b07e7b0280b358099cae72c73aa98..9ef6005c557fc627c7c6c732b4c92ed1b934f474 100755
> --- a/configure
> +++ b/configure
> @@ -207,6 +207,8 @@ for opt do
>;;
>--objcc=*) objcc="$optarg"
>;;
> +  --rustc=*) RUSTC="$optarg"
> +  ;;
>--cpu=*) cpu="$optarg"
>;;
>--extra-cflags=*)
> @@ -252,6 +254,9 @@ python=
>  download="enabled"
>  skip_meson=no
>  use_containers="yes"
> +# do not enable by default because cross compilation requires --rust-target-triple
> +rust="disabled"
> +rust_target_triple=""
>  gdb_bin=$(command -v "gdb-multiarch" || command -v "gdb")
>  gdb_arches=""
>  
> @@ -317,6 +322,8 @@ windmc="${WINDMC-${cross_prefix}windmc}"
>  pkg_config="${PKG_CONFIG-${cross_prefix}pkg-config}"
>  sdl2_config="${SDL2_CONFIG-${cross_prefix}sdl2-config}"
>  
> +rustc="${RUSTC-rustc}"
> +
>  check_define() {
>  cat > $TMPC <<EOF
>  #if !defined($1)
> @@ -636,6 +643,8 @@ for opt do
>;;
>--objcc=*)
>;;
> +  --rustc=*)
> +  ;;
>--make=*)
>;;
>--install=*)
> @@ -755,8 +764,14 @@ for opt do
>;;
>--container-engine=*) container_engine="$optarg"
>;;
> +  --rust-target-triple=*) rust_target_triple="$optarg"
> +  ;;
>--gdb=*) gdb_bin="$optarg"
>;;
> +  --enable-rust) rust=enabled
> +  ;;
> +  --disable-rust) rust=disabled
> +  ;;
># everything else has the same name in configure and meson
>--*) meson_option_parse "$opt" "$optarg"
>;;
> @@ -859,6 +874,7 @@ Advanced options (experts only):
> at build time [$host_cc]
>--cxx=CXXuse C++ compiler CXX [$cxx]
>--objcc=OBJCCuse Objective-C compiler OBJCC [$objcc]
> +  --rustc=RUSTCuse Rust compiler RUSTC [$rustc]
>--extra-cflags=CFLAGSappend extra C compiler flags CFLAGS
>--extra-cxxflags=CXXFLAGS append extra C++ compiler flags CXXFLAGS
>--extra-objcflags=OBJCFLAGS append extra Objective C compiler flags OBJCFLAGS
> @@ -869,8 +885,9 @@ Advanced options (experts only):
>--python=PYTHON  use specified python [$python]
>--ninja=NINJAuse specified ninja [$ninja]
>--static enable static build [$static]
> -  --without-default-features default all --enable-* options to "disabled"
> -  --without-default-devices  do not include any device that is not needed to
> +  --rust-target-triple=TRIPLE  target for Rust cross compilation
> +  --without-default-features   default all --enable-* options to "disabled"
> +  --without-default-devices    do not include any device that is not needed to
> start the emulator (only use if you are including
> desired devices in configs/devices/)
>--with-devices-ARCH=NAME override default configs/devices
> @@ -1139,6 +1156,21 @@ EOF
>  fi
>  
>  ##
> +# detect rust triple
> +
> +if test "$rust" != disabled && has "$rustc" && $rustc -vV > "${TMPDIR1}/${TMPB}.out"; then
> +  rust_host_triple=$(sed -n 's/^host: //p' "${TMPDIR1}/${TMPB}.out")
> +else
> +  if test "$rust" = enabled; then
> +error_exit "could not execute rustc binary \"$rustc\""
> +  fi
> +  rust=disabled
> +fi
> +if test "$rust" != disabled && test -z "$rust_target_triple"; then
> +  rust_target_triple=$rust_host_triple
> +fi

Defaulting to the $rust_host_triple is incorrect when QEMU has been
told to build for a non-host target.

Either we need to do the right thing and auto-set rust target based
on QEMU target (preferred), or we need to make it a fatal error
when rust target is omitted & QEMU is building a non-host target.


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v2] aspeed: Deprecate the tacoma-bmc machine

2024-08-28 Thread Cédric Le Goater


On 8/26/24 20:50, Guenter Roeck wrote:

Hi,

On 8/26/24 02:58, Cédric Le Goater wrote:

Hello Guenter,

On 8/9/24 00:05, Guenter Roeck wrote:

Hi,

On Tue, Jun 25, 2024 at 09:08:30AM +0200, Cédric Le Goater wrote:

The tacoma-bmc machine was a board including an AST2600 SoC based BMC
and a witherspoon like OpenPOWER system. It was used for bring up of
the AST2600 SoC in labs. It can be easily replaced by the rainier-bmc
machine which is part of a real product offering.

Signed-off-by: Cédric Le Goater 


I have been using tacoma-bmc to test tcg,tpm-tis-i2c functionality
on arm. rainier-bmc doesn't support that, and other IBM BMCs which
do support it (bonnell, everest, system1) are not supported in qemu.

Do you have a suggested alternative ?


Could you use the ast2600-evb machine instead ? as done in
machine_aspeed.py, see routine test_arm_ast2600_evb_buildroot_tpm.



Unfortunately, that does not work for me because that requires instantiating
the tpm chip from the CLI by writing into the new_device sysfs attribute,
and I can not do that in my test environment.


Ah. too bad.


We can't add a "tpm-tis-i2c" device to the tacoma-bmc machine init
routine because a TPM backend is required.



Not sure I understand; tacoma-bmc instantiates the TPM chip through its
devicetree file which is what I was looking for.


I meant at the "HW" board level in QEMU.

We can not instantiate the TPM I2C chip device model in the tacoma-bmc
machine init routine and attach it to the I2C bus because of the required
TPM backend. This means that the device is necessarily defined on the QEMU
command line and this makes the ast2600-evb and tacoma-bmc machine very
similar in terms of HW definitions.
 

I solved the problem by adding support for IBM Bonnell (which instantiates
the TPM chip through its devicetree file, similar to tacoma-bmc) to my local
copy of qemu. 


Hmm, did you copy the rainier-bmc machine definition ?


It isn't perfect since I don't know the correct HW pin strapping
and reused the strapping from Rainier, but it works for me.


Keeping the tacoma-bmc machine is fine if there is a use for it. Testing
the TPM I2C device driver is certainly a good use but we should reflect
that in QEMU also (so that we don't forget). Could we change the test in
machine_aspeed.py to use the tacoma-bmc machine instead ? and revert the
deprecation patch of course.

Thanks,

C.

[PATCH v2 1/7] vfio/igd: return an invalid generation for unknown devices

2024-08-28 Thread Corvin Köhne

Intel changes its specification quite often, e.g. the location and size
of the BDSM register has changed for gen 11 devices and later. This
causes our emulation to fail on those devices. So, it's impossible for
us to use a suitable default value for unknown devices. Instead of
returning a random generation value and hoping that everything works
fine, we should verify that different devices are working and add them
to our list of known devices.

Signed-off-by: Corvin Köhne 
---
 hw/vfio/igd.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
index d320d032a7..650a323dda 100644
--- a/hw/vfio/igd.c
+++ b/hw/vfio/igd.c
@@ -90,7 +90,11 @@ static int igd_gen(VFIOPCIDevice *vdev)
 return 8;
 }
 
-return 8; /* Assume newer is compatible */
+/*
+ * Unfortunately, Intel changes its specification quite often. This makes
+ * it impossible to use a suitable default value for unknown devices.
+ */
+return -1;
 }
 
 typedef struct VFIOIGDQuirk {
-- 
2.46.0
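
The idea of refusing to guess a generation can be sketched as a small lookup. This is a hypothetical stand-in for `igd_gen()`, not the real function: it lists only the device ID prefixes that are visible in this series, and the `0xff00` mask is an assumption about how IDs are grouped. Unknown IDs fall through to `-1`, and a caller then treats `-1` as "not supported in legacy mode".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for igd_gen(). Only the ID prefixes shown in this
 * series are listed; the 0xff00 grouping mask is an assumption.
 */
static int sketch_igd_gen(uint16_t device_id)
{
    switch (device_id & 0xff00) {
    case 0x2200:
    case 0x5900:
        return 8;
    case 0x4500: /* ElkhartLake */
        return 11;
    case 0x9a00: /* TigerLake */
        return 12;
    }

    return -1; /* unknown device: do not guess a generation */
}

/* Legacy mode is accepted for every known generation and nothing else. */
static bool sketch_legacy_mode_allowed(uint16_t device_id)
{
    return sketch_igd_gen(device_id) != -1;
}
```

With this shape, adding support for a new device is an explicit decision: it needs a new `case` after the device has actually been tested.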

This email contains confidential information. If you have received it in error, 
you must not read, use, copy or pass on this e-mail or its attachments. If you 
have received the e-mail in error, please inform me immediately by reply e-mail 
and then delete this e-mail from your system. Thank you
 
Diese E-Mail enthält vertrauliche Informationen. Sollten Sie sie irrtümlich 
erhalten haben, dürfen Sie diese E-Mail oder ihre Anhänge nicht lesen, 
verwenden, kopieren oder weitergeben. Sollten Sie die Mail versehentlich 
erhalten haben, teilen Sie mir dies bitte umgehend per Antwort-E-Mail mit und 
löschen Sie diese E-Mail dann aus Ihrem System. Vielen Dank

Beckhoff Automation GmbH & Co. KG | Managing Director: Dipl. Phys. Hans Beckhoff
Registered office: Verl, Germany | Register court: Guetersloh HRA 7075

[PATCH v2 4/7] vfio/igd: add new bar0 quirk to emulate BDSM mirror

2024-08-28 Thread Corvin Köhne

The BDSM register is mirrored into MMIO space at least for gen 11 and
later devices. Unfortunately, the Windows driver reads the register
value from MMIO space instead of PCI config space for those devices [1].
Therefore, we either have to keep a 1:1 mapping for the host and guest
address or we have to emulate the MMIO register too. Using the igd in
legacy mode is already hard due to its many constraints. Keeping a 1:1
mapping may not work in all cases and makes it even harder to use. An
MMIO emulation has to trap the whole MMIO page. This makes accesses to
this page slower compared to using second level address translation.
Nevertheless, it doesn't have any constraints and I haven't noticed any
performance degradation yet, making it a better solution.

[1] 
https://github.com/projectacrn/acrn-hypervisor/blob/5c351bee0f6ae46250eefc07f44b4a31e770f3cf/devicemodel/hw/pci/passthrough.c#L650-L653
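
As a rough illustration of what the trapped MMIO accesses do, the following self-contained sketch redirects sized reads and writes to a byte array standing in for the emulated PCI config space. The names prefixed with `sketch_`/`SKETCH_` are hypothetical; only the `0xc0` offset comes from this series (`IGD_BDSM_GEN11`). Values are stored little-endian, which is what I understand the `pci_get_*()`/`pci_set_*()` helpers used by the patch to do.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CONFIG_SIZE 256u
#define SKETCH_BDSM_GEN11  0xc0u /* same value as IGD_BDSM_GEN11 in the series */

/* Read `size` bytes (1, 2, 4 or 8), little-endian, from the config mirror. */
static uint64_t sketch_mirror_read(const uint8_t *config, uint64_t addr,
                                   unsigned size)
{
    uint64_t offset = SKETCH_BDSM_GEN11 + addr;
    uint64_t val = 0;

    assert(size == 1 || size == 2 || size == 4 || size == 8);
    assert(offset + size <= SKETCH_CONFIG_SIZE);
    for (unsigned i = 0; i < size; i++) {
        val |= (uint64_t)config[offset + i] << (8 * i);
    }
    return val;
}

/* Write the low `size` bytes of `data`, little-endian, into the mirror. */
static void sketch_mirror_write(uint8_t *config, uint64_t addr,
                                uint64_t data, unsigned size)
{
    uint64_t offset = SKETCH_BDSM_GEN11 + addr;

    assert(size == 1 || size == 2 || size == 4 || size == 8);
    assert(offset + size <= SKETCH_CONFIG_SIZE);
    for (unsigned i = 0; i < size; i++) {
        config[offset + i] = (uint8_t)(data >> (8 * i));
    }
}
```

Because both paths end up in the same backing bytes, a guest that writes BDSM through config space and reads it back through the MMIO mirror (or the other way round) sees one consistent value.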

Signed-off-by: Corvin Köhne 
---
v2:
* omit unnecessary leXX_to_cpu calls
* make use of IGD_BDSM_MMIO_OFFSET define

 hw/vfio/igd.c| 98 
 hw/vfio/pci-quirks.c |  1 +
 hw/vfio/pci.h|  1 +
 3 files changed, 100 insertions(+)

diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
index 0b6533bbf7..0d68c6a451 100644
--- a/hw/vfio/igd.c
+++ b/hw/vfio/igd.c
@@ -374,6 +374,104 @@ static const MemoryRegionOps vfio_igd_index_quirk = {
 .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
+#define IGD_BDSM_MMIO_OFFSET 0x1080C0
+
+static uint64_t vfio_igd_quirk_bdsm_read(void *opaque,
+  hwaddr addr, unsigned size)
+{
+VFIOPCIDevice *vdev = opaque;
+uint64_t offset;
+
+offset = IGD_BDSM_GEN11 + addr;
+
+switch (size) {
+case 1:
+return pci_get_byte(vdev->pdev.config + offset);
+case 2:
+return pci_get_word(vdev->pdev.config + offset);
+case 4:
+return pci_get_long(vdev->pdev.config + offset);
+case 8:
+return pci_get_quad(vdev->pdev.config + offset);
+default:
+hw_error("igd: unsupported read size, %u bytes", size);
+break;
+}
+
+return 0;
+}
+
+static void vfio_igd_quirk_bdsm_write(void *opaque, hwaddr addr,
+   uint64_t data, unsigned size)
+{
+VFIOPCIDevice *vdev = opaque;
+uint64_t offset;
+
+offset = IGD_BDSM_GEN11 + addr;
+
+switch (size) {
+case 1:
+pci_set_byte(vdev->pdev.config + offset, data);
+break;
+case 2:
+pci_set_word(vdev->pdev.config + offset, data);
+break;
+case 4:
+pci_set_long(vdev->pdev.config + offset, data);
+break;
+case 8:
+pci_set_quad(vdev->pdev.config + offset, data);
+break;
+default:
+hw_error("igd: unsupported write size, %u bytes", size);
+break;
+}
+}
+
+static const MemoryRegionOps vfio_igd_bdsm_quirk = {
+.read = vfio_igd_quirk_bdsm_read,
+.write = vfio_igd_quirk_bdsm_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+};
+
+void vfio_probe_igd_bar0_quirk(VFIOPCIDevice *vdev, int nr)
+{
+VFIOQuirk *quirk;
+int gen;
+
+/*
+ * This must be an Intel VGA device at address 00:02.0 for us to even
+ * consider enabling legacy mode. Some drivers have dependencies on the PCI
+ * bus address.
+ */
+if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
+!vfio_is_vga(vdev) || nr != 0 ||
+&vdev->pdev != pci_find_device(pci_device_root_bus(&vdev->pdev),
+   0, PCI_DEVFN(0x2, 0))) {
+return;
+}
+
+/*
+ * Only on IGD devices of gen 11 and above, the BDSM register is mirrored
+ * into MMIO space and read from MMIO space by the Windows driver.
+ */
+gen = igd_gen(vdev);
+if (gen < 11) {
+return;
+}
+
+quirk = vfio_quirk_alloc(1);
+quirk->data = vdev;
+
+memory_region_init_io(&quirk->mem[0], OBJECT(vdev), &vfio_igd_bdsm_quirk,
+  vdev, "vfio-igd-bdsm-quirk", 8);
+memory_region_add_subregion_overlap(vdev->bars[0].region.mem,
+IGD_BDSM_MMIO_OFFSET, &quirk->mem[0],
+1);
+
+QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
+}
+
 void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
 {
 g_autofree struct vfio_region_info *rom = NULL;
diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index 39dae72497..d37f722cce 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -1259,6 +1259,7 @@ void vfio_bar_quirk_setup(VFIOPCIDevice *vdev, int nr)
 vfio_probe_nvidia_bar0_quirk(vdev, nr);
 vfio_probe_rtl8168_bar2_quirk(vdev, nr);
 #ifdef CONFIG_VFIO_IGD
+vfio_probe_igd_bar0_quirk(vdev, nr);
 vfio_probe_igd_bar4_quirk(vdev, nr);
 #endif
 }
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index bf67df2fbc..5ad090a229 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -215,6 +215,7 @@ void

[PATCH v2 2/7] vfio/igd: support legacy mode for all known generations

2024-08-28 Thread Corvin Köhne

We're soon going to add support for legacy mode to ElkhartLake and
TigerLake devices. Those are gen 11 and 12 devices. At the moment, all
devices identified by our igd_gen function do support legacy mode. This
won't change when adding our new devices of gen 11 and 12. Therefore, it
makes more sense to accept legacy mode for all known devices instead of
maintaining a long list of known good generations. If we add a new
generation to igd_gen which doesn't support legacy mode for some reason,
it'll be easy to advance the check to reject legacy mode for this
specific generation.

Signed-off-by: Corvin Köhne 
---
 hw/vfio/igd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
index 650a323dda..d5e57656a8 100644
--- a/hw/vfio/igd.c
+++ b/hw/vfio/igd.c
@@ -416,7 +416,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
  * devices maintain compatibility with generation 8.
  */
 gen = igd_gen(vdev);
-if (gen != 6 && gen != 8) {
+if (gen == -1) {
 error_report("IGD device %s is unsupported in legacy mode, "
  "try SandyBridge or newer", vdev->vbasedev.name);
 return;
-- 
2.46.0


[PATCH v2 0/7] vfio/igd: add passthrough support for IGDs of gen 11 and later

2024-08-28 Thread Corvin Köhne

Hi,

QEMU has experimental support for GPU passthrough of Intel's integrated graphics
devices. Unfortunately, Intel has changed some bits for their gen 11 devices
and later. To support these devices, we have to account for those changes. This
patch series adds the missing bits on the QEMU side.

I've tested the patch series on an ElkhartLake and a TigerLake device. On the
guest side, I've tested an EFI environment (GOP driver), a Linux guest and a
Windows VM. The drivers of all guests are able to use the GPU and produce
output on the connected display.

Corvin Köhne (7):
  vfio/igd: return an invalid generation for unknown devices
  vfio/igd: support legacy mode for all known generations
  vfio/igd: use new BDSM register location and size for gen 11 and later
  vfio/igd: add new bar0 quirk to emulate BDSM mirror
  vfio/igd: add ID's for ElkhartLake and TigerLake
  vfio/igd: don't set stolen memory size to zero
  vfio/igd: correctly calculate stolen memory size for gen 9 and later

 hw/vfio/igd.c| 185 +--
 hw/vfio/pci-quirks.c |   1 +
 hw/vfio/pci.h|   1 +
 3 files changed, 161 insertions(+), 26 deletions(-)

-- 
2.46.0


[PATCH v2 3/7] vfio/igd: use new BDSM register location and size for gen 11 and later

2024-08-28 Thread Corvin Köhne

Intel changed the location and size of the BDSM register for gen 11
devices and later. We have to adjust our emulation for these devices to
properly support them.

Signed-off-by: Corvin Köhne 
---
 hw/vfio/igd.c | 31 ---
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
index d5e57656a8..0b6533bbf7 100644
--- a/hw/vfio/igd.c
+++ b/hw/vfio/igd.c
@@ -100,11 +100,12 @@ static int igd_gen(VFIOPCIDevice *vdev)
 typedef struct VFIOIGDQuirk {
 struct VFIOPCIDevice *vdev;
 uint32_t index;
-uint32_t bdsm;
+uint64_t bdsm;
 } VFIOIGDQuirk;
 
 #define IGD_GMCH 0x50 /* Graphics Control Register */
 #define IGD_BDSM 0x5c /* Base Data of Stolen Memory */
+#define IGD_BDSM_GEN11 0xc0 /* Base Data of Stolen Memory of gen 11 and later */
 
 
 /*
@@ -313,9 +314,13 @@ static void vfio_igd_quirk_data_write(void *opaque, hwaddr addr,
  */
 if ((igd->index % 4 == 1) && igd->index < vfio_igd_gtt_max(vdev)) {
 if (gen < 8 || (igd->index % 8 == 1)) {
-uint32_t base;
+uint64_t base;
 
-base = pci_get_long(vdev->pdev.config + IGD_BDSM);
+if (gen < 11) {
+base = pci_get_long(vdev->pdev.config + IGD_BDSM);
+} else {
+base = pci_get_quad(vdev->pdev.config + IGD_BDSM_GEN11);
+}
 if (!base) {
 hw_error("vfio-igd: Guest attempted to program IGD GTT before "
  "BIOS reserved stolen memory.  Unsupported BIOS?");
@@ -519,7 +524,13 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
 igd = quirk->data = g_malloc0(sizeof(*igd));
 igd->vdev = vdev;
 igd->index = ~0;
-igd->bdsm = vfio_pci_read_config(&vdev->pdev, IGD_BDSM, 4);
+if (gen < 11) {
+igd->bdsm = vfio_pci_read_config(&vdev->pdev, IGD_BDSM, 4);
+} else {
+igd->bdsm = vfio_pci_read_config(&vdev->pdev, IGD_BDSM_GEN11, 4);
+igd->bdsm |=
+(uint64_t)vfio_pci_read_config(&vdev->pdev, IGD_BDSM_GEN11 + 4, 4) << 32;
+}
 igd->bdsm &= ~((1 * MiB) - 1); /* 1MB aligned */
 
 memory_region_init_io(&quirk->mem[0], OBJECT(vdev), &vfio_igd_index_quirk,
@@ -577,9 +588,15 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
 pci_set_long(vdev->emulated_config_bits + IGD_GMCH, ~0);
 
 /* BDSM is read-write, emulated.  The BIOS needs to be able to write it */
-pci_set_long(vdev->pdev.config + IGD_BDSM, 0);
-pci_set_long(vdev->pdev.wmask + IGD_BDSM, ~0);
-pci_set_long(vdev->emulated_config_bits + IGD_BDSM, ~0);
+if (gen < 11) {
+pci_set_long(vdev->pdev.config + IGD_BDSM, 0);
+pci_set_long(vdev->pdev.wmask + IGD_BDSM, ~0);
+pci_set_long(vdev->emulated_config_bits + IGD_BDSM, ~0);
+} else {
+pci_set_quad(vdev->pdev.config + IGD_BDSM_GEN11, 0);
+pci_set_quad(vdev->pdev.wmask + IGD_BDSM_GEN11, ~0);
+pci_set_quad(vdev->emulated_config_bits + IGD_BDSM_GEN11, ~0);
+}
 
 /*
  * This IOBAR gives us access to GTTADR, which allows us to write to
-- 
2.46.0
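
The 64-bit BDSM handling for gen 11 and later boils down to combining two 32-bit config reads and masking to a 1 MiB boundary. A minimal sketch, with hypothetical names (`sketch_bdsm_from_halves`, `SKETCH_MIB`) and plain integers in place of the real config-space reads:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MIB (1024ULL * 1024ULL)

/* Combine two 32-bit config reads into the 64-bit gen 11+ BDSM base. */
static uint64_t sketch_bdsm_from_halves(uint32_t low, uint32_t high)
{
    uint64_t bdsm = low;

    bdsm |= (uint64_t)high << 32;    /* widen before shifting */
    return bdsm & ~(SKETCH_MIB - 1); /* keep the 1 MiB aligned base */
}
```

The cast before the shift matters: shifting a 32-bit value left by 32 is undefined in C, so the high half has to be widened to 64 bits first, as the patch does.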


[PATCH v2 6/7] vfio/igd: don't set stolen memory size to zero

2024-08-28 Thread Corvin Köhne

The stolen memory is required for the GOP (EFI) driver and the Windows
driver. While the GOP driver seems to work with any stolen memory size,
the Windows driver will crash if the size doesn't match the size
allocated by the host BIOS. For that reason, it doesn't make sense to
overwrite the stolen memory size. It's true that this wastes some VM
memory. In the worst case, the stolen memory can take up more than a GB.
However, that's uncommon. Additionally, it's likely that a bunch of RAM
is assigned to VMs making use of GPU passthrough.

Signed-off-by: Corvin Köhne 
---
 hw/vfio/igd.c | 35 ++-
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
index 8a41b16421..0751c43eae 100644
--- a/hw/vfio/igd.c
+++ b/hw/vfio/igd.c
@@ -478,6 +478,23 @@ void vfio_probe_igd_bar0_quirk(VFIOPCIDevice *vdev, int nr)
 QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
 }
 
+static int igd_get_stolen_mb(int gen, uint32_t gmch)
+{
+int gms;
+
+if (gen < 8) {
+gms = (gmch >> 3) & 0x1f;
+} else {
+gms = (gmch >> 8) & 0xff;
+}
+
+if (gms > 0x10) {
+error_report("Unsupported IGD GMS value 0x%x", gms);
+return 0;
+}
+return gms * 32;
+}
+
 void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
 {
 g_autofree struct vfio_region_info *rom = NULL;
@@ -655,23 +672,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
 ggms_mb = 1 << ggms_mb;
 }
 
-/*
- * Assume we have no GMS memory, but allow it to be overridden by device
- * option (experimental).  The spec doesn't actually allow zero GMS when
- * when IVD (IGD VGA Disable) is clear, but the claim is that it's unused,
- * so let's not waste VM memory for it.
- */
-gmch &= ~((gen < 8 ? 0x1f : 0xff) << (gen < 8 ? 3 : 8));
-
-if (vdev->igd_gms) {
-if (vdev->igd_gms <= 0x10) {
-gms_mb = vdev->igd_gms * 32;
-gmch |= vdev->igd_gms << (gen < 8 ? 3 : 8);
-} else {
-error_report("Unsupported IGD GMS value 0x%x", vdev->igd_gms);
-vdev->igd_gms = 0;
-}
-}
+gms_mb = igd_get_stolen_mb(gen, gmch);
 
 /*
  * Request reserved memory for stolen memory via fw_cfg.  VM firmware
-- 
2.46.0
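
For readers unfamiliar with the GMCH layout, the field extraction in `igd_get_stolen_mb()` can be isolated as follows. The function name is hypothetical; the shifts and masks are the ones used in the patch above.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the Graphics Mode Select (GMS) field from the GMCH register. */
static unsigned sketch_gmch_gms(int gen, uint32_t gmch)
{
    assert(gen >= 0); /* callers are expected to reject unknown devices */
    if (gen < 8) {
        return (gmch >> 3) & 0x1f; /* 5-bit field at bits 7:3 */
    }
    return (gmch >> 8) & 0xff;     /* 8-bit field at bits 15:8 */
}
```

For example, a gen 6 GMCH value of `0x28` yields a GMS of `0x5`, and a gen 8 value of `0x200` yields `0x2`. How the GMS value maps to a size in MB is a separate step.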


Re: [PATCH v5 1/2] kvm: replace fprintf with error_report()/printf() in kvm_init()

2024-08-28 Thread Ani Sinha




> On 28 Aug 2024, at 4:53 PM, Markus Armbruster  wrote:
> 
> Ani Sinha  writes:
> 
>> error_report() is more appropriate for error situations. Replace fprintf with
>> error_report() and error_printf() as appropriate. Cosmetic. No functional
>> change.
> 
> Uh, I missed this last time around: the change is more than just
> cosmetics!  The error messages change, e.g. from
> 
>$ qemu-system-x86_64 -nodefaults -S -display none --accel kvm
>qemu-system-x86_64: --accel kvm: Could not access KVM kernel module: Permission denied
>qemu-system-x86_64: --accel kvm: failed to initialize kvm: Permission denied
> 
> to
> 
>$ qemu-system-x86_64 -nodefaults -S -display none --accel kvm
>Could not access KVM kernel module: Permission denied
>qemu-system-x86_64: --accel kvm: failed to initialize kvm: Permission denied

You got this backwards. This is what I have:

Before:
$ ./qemu-system-x86_64 --accel kvm
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: --accel kvm: failed to initialize kvm: No such file or directory

Now:
$ ./qemu-system-x86_64 --accel kvm
qemu-system-x86_64: --accel kvm: Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: --accel kvm: failed to initialize kvm: No such file or directory


> 
> Note: the second message is from kvm_init()'s caller.  Reporting the
> same error twice is wrong, but not this patch's problem.
> 
> Moreover, the patch tweaks an error message at [*].
> 
> Suggest something like
> 
>  Replace fprintf() with error_report() and error_printf() where
>  appropriate.  Error messages improve, e.g. from
> 
>  Could not access KVM kernel module: Permission denied
> 
>  to
> 
>  qemu-system-x86_64: --accel kvm: Could not access KVM kernel module: Permission denied

Yes this seems correct.

> 
>> CC: qemu-triv...@nongnu.org
>> CC: zhao1@intel.com
>> CC: arm...@redhat.com
>> Reviewed-by: Zhao Liu 
>> Signed-off-by: Ani Sinha 
>> ---
>> accel/kvm/kvm-all.c | 40 ++--
>> 1 file changed, 18 insertions(+), 22 deletions(-)
>> 
>> changelog:
>> v2: fix a bug.
>> v3: replace one instance of error_report() with error_printf(). added tags.
>> v4: changes suggested by Markus.
>> v5: more changes from Markus's comments on v4.
>> 
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index 75d11a07b2..fcc157f0e6 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -2427,7 +2427,7 @@ static int kvm_init(MachineState *ms)
>> QLIST_INIT(&s->kvm_parked_vcpus);
>> s->fd = qemu_open_old(s->device ?: "/dev/kvm", O_RDWR);
>> if (s->fd == -1) {
>> -fprintf(stderr, "Could not access KVM kernel module: %m\n");
>> +error_report("Could not access KVM kernel module: %m");
>> ret = -errno;
>> goto err;
>> }
>> @@ -2437,13 +2437,13 @@ static int kvm_init(MachineState *ms)
>> if (ret >= 0) {
>> ret = -EINVAL;
>> }
>> -fprintf(stderr, "kvm version too old\n");
>> +error_report("kvm version too old");
>> goto err;
>> }
>> 
>> if (ret > KVM_API_VERSION) {
>> ret = -EINVAL;
>> -fprintf(stderr, "kvm version not supported\n");
>> +error_report("kvm version not supported");
>> goto err;
>> }
>> 
>> @@ -2488,26 +2488,22 @@ static int kvm_init(MachineState *ms)
>> } while (ret == -EINTR);
>> 
>> if (ret < 0) {
>> -fprintf(stderr, "ioctl(KVM_CREATE_VM) failed: %d %s\n", -ret,
>> -strerror(-ret));
>> +error_report("ioctl(KVM_CREATE_VM) failed: %s", strerror(-ret));
> 
> [*] This is where you change an error message.
> 
>> 
>> #ifdef TARGET_S390X
>> if (ret == -EINVAL) {
>> -fprintf(stderr,
>> -"Host kernel setup problem detected. Please verify:\n");
>> -fprintf(stderr, "- for kernels supporting the switch_amode or"
>> -" user_mode parameters, whether\n");
>> -fprintf(stderr,
>> -"  user space is running in primary address space\n");
>> -fprintf(stderr,
>> -"- for kernels supporting the vm.allocate_pgste sysctl, "
>> -"whether it is enabled\n");
>> +error_printf("Host kernel setup problem detected."
>> + " Please verify:\n");
>> +error_printf("- for kernels supporting the"
>> +" switch_amode or user_mode parameters, whether");
>> +error_printf(" user space is running in primary address space\n");
>> +error_printf("- for kernels supporting the vm.allocate_pgste"
>> + " sysctl, whether it is enabled\n");
>> }
>> #elif defined(TARGET_PPC)
>> if (ret == -EINVAL) {
>> -fprintf(stderr,
>> -"PPC KVM module is not loaded. Try modprobe kvm_%s.\n",
>> -

[PATCH v2 7/7] vfio/igd: correctly calculate stolen memory size for gen 9 and later

2024-08-28 Thread Corvin Köhne

We have to update the calculation of the stolen memory size because
we've seen devices using values of 0xf0 and above for the graphics mode
select field. The new calculation was taken from the Linux kernel [1].

[1] 
https://github.com/torvalds/linux/blob/7c626ce4bae1ac14f60076d00eafe71af30450ba/arch/x86/kernel/early-quirks.c#L455-L460

Signed-off-by: Corvin Köhne 
---
 hw/vfio/igd.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
index 0751c43eae..a95d441f68 100644
--- a/hw/vfio/igd.c
+++ b/hw/vfio/igd.c
@@ -488,11 +488,18 @@ static int igd_get_stolen_mb(int gen, uint32_t gmch)
 gms = (gmch >> 8) & 0xff;
 }
 
-if (gms > 0x10) {
-error_report("Unsupported IGD GMS value 0x%x", gms);
-return 0;
+if (gen < 9) {
+if (gms > 0x10) {
+error_report("Unsupported IGD GMS value 0x%x", gms);
+return 0;
+}
+return gms * 32;
+} else {
+if (gms < 0xf0)
+return gms * 32;
+else
+return (gms - 0xf0) * 4 + 4;
 }
-return gms * 32;
 }
 
 void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
-- 
2.46.0
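
The size mapping can be written out as a standalone function. This is a sketch with a hypothetical name, taking an already extracted GMS value. For gen 9 and later it follows my reading of the Linux helper cited in the commit message: values below `0xf0` count in 32 MB steps, and values from `0xf0` upwards count in 4 MB steps starting at 4 MB, which is why `0xf0` is subtracted before scaling.

```c
#include <assert.h>

/* Map a GMS field value to the stolen memory size in MB (0 if unsupported). */
static int sketch_stolen_mb(int gen, unsigned gms)
{
    assert(gms <= 0xff);

    if (gen < 9) {
        /* Older generations: 32 MB steps, values above 0x10 are rejected. */
        return gms > 0x10 ? 0 : (int)gms * 32;
    }

    if (gms < 0xf0) {
        return (int)gms * 32;         /* 32 MB steps starting at 0 MB */
    }
    return (int)(gms - 0xf0) * 4 + 4; /* 4 MB steps starting at 4 MB */
}
```

Under these assumptions a gen 12 device reporting `0xf0` has 4 MB of stolen memory and one reporting `0xfe` has 60 MB, while `0x02` means 64 MB on every generation.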


[PATCH v2 5/7] vfio/igd: add ID's for ElkhartLake and TigerLake

2024-08-28 Thread Corvin Köhne

ElkhartLake and TigerLake devices were tested in legacy mode with Linux
and Windows VMs. Both are working properly. It's likely that other Intel
GPUs of gen 11 and 12, like IceLake devices, are working too. However,
we're only adding known good devices for now.

Signed-off-by: Corvin Köhne 
---
 hw/vfio/igd.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
index 0d68c6a451..8a41b16421 100644
--- a/hw/vfio/igd.c
+++ b/hw/vfio/igd.c
@@ -88,6 +88,12 @@ static int igd_gen(VFIOPCIDevice *vdev)
 case 0x2200:
 case 0x5900:
 return 8;
+/* ElkhartLake */
+case 0x4500:
+return 11;
+/* TigerLake */
+case 0x9A00:
+return 12;
 }
 
 /*
-- 
2.46.0


[PATCH v6] kvm: replace fprintf with error_report()/printf() in kvm_init()

2024-08-28 Thread Ani Sinha

error_report() is more appropriate for error situations. Replace fprintf with
error_report() and error_printf() as appropriate. Some improvement in error
reporting also happens as a part of this change. For example:

From:
$ ./qemu-system-x86_64 --accel kvm
Could not access KVM kernel module: No such file or directory

To:
$ ./qemu-system-x86_64 --accel kvm
qemu-system-x86_64: --accel kvm: Could not access KVM kernel module: No such file or directory
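
The visible difference between the two outputs is the prefix. The following sketch only mimics that formatting with a hypothetical helper built on `snprintf()`; it is not QEMU's `error_report()`, which obtains the program name and option context by itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical formatter reproducing the prefix seen in the "To:" output:
 * "<program>: <option context>: <message>". Not QEMU's error_report().
 */
static int sketch_format_error(char *buf, size_t len, const char *prog,
                               const char *context, const char *msg)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%s: %s: %s", prog, context, msg);
}
```

A bare `fprintf(stderr, ...)` emits only the message part, so the reader cannot tell which program or which command line option the error belongs to.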

CC: qemu-triv...@nongnu.org
CC: zhao1@intel.com
CC: arm...@redhat.com
Reviewed-by: Zhao Liu 
Reviewed-by: Markus Armbruster 
Signed-off-by: Ani Sinha 
---
 accel/kvm/kvm-all.c | 40 ++--
 1 file changed, 18 insertions(+), 22 deletions(-)

changelog:
v2: fix a bug.
v3: replace one instance of error_report() with error_printf(). added tags.
v4: changes suggested by Markus.
v5: more changes from Markus's comments on v4.
v6: commit message update as per suggestion from Markus. Tag added.

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 75d11a07b2..fcc157f0e6 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2427,7 +2427,7 @@ static int kvm_init(MachineState *ms)
 QLIST_INIT(&s->kvm_parked_vcpus);
 s->fd = qemu_open_old(s->device ?: "/dev/kvm", O_RDWR);
 if (s->fd == -1) {
-fprintf(stderr, "Could not access KVM kernel module: %m\n");
+error_report("Could not access KVM kernel module: %m");
 ret = -errno;
 goto err;
 }
@@ -2437,13 +2437,13 @@ static int kvm_init(MachineState *ms)
 if (ret >= 0) {
 ret = -EINVAL;
 }
-fprintf(stderr, "kvm version too old\n");
+error_report("kvm version too old");
 goto err;
 }
 
 if (ret > KVM_API_VERSION) {
 ret = -EINVAL;
-fprintf(stderr, "kvm version not supported\n");
+error_report("kvm version not supported");
 goto err;
 }
 
@@ -2488,26 +2488,22 @@ static int kvm_init(MachineState *ms)
 } while (ret == -EINTR);
 
 if (ret < 0) {
-fprintf(stderr, "ioctl(KVM_CREATE_VM) failed: %d %s\n", -ret,
-strerror(-ret));
+error_report("ioctl(KVM_CREATE_VM) failed: %s", strerror(-ret));
 
 #ifdef TARGET_S390X
 if (ret == -EINVAL) {
-fprintf(stderr,
-"Host kernel setup problem detected. Please verify:\n");
-fprintf(stderr, "- for kernels supporting the switch_amode or"
-" user_mode parameters, whether\n");
-fprintf(stderr,
-"  user space is running in primary address space\n");
-fprintf(stderr,
-"- for kernels supporting the vm.allocate_pgste sysctl, "
-"whether it is enabled\n");
+error_printf("Host kernel setup problem detected."
+ " Please verify:\n");
+error_printf("- for kernels supporting the"
+" switch_amode or user_mode parameters, whether");
+error_printf(" user space is running in primary address space\n");
+error_printf("- for kernels supporting the vm.allocate_pgste"
+ " sysctl, whether it is enabled\n");
 }
 #elif defined(TARGET_PPC)
 if (ret == -EINVAL) {
-fprintf(stderr,
-"PPC KVM module is not loaded. Try modprobe kvm_%s.\n",
-(type == 2) ? "pr" : "hv");
+error_printf("PPC KVM module is not loaded. Try modprobe kvm_%s.\n",
+ (type == 2) ? "pr" : "hv");
 }
 #endif
 goto err;
@@ -2526,9 +2522,9 @@ static int kvm_init(MachineState *ms)
 nc->name, nc->num, soft_vcpus_limit);
 
 if (nc->num > hard_vcpus_limit) {
-fprintf(stderr, "Number of %s cpus requested (%d) exceeds "
-"the maximum cpus supported by KVM (%d)\n",
-nc->name, nc->num, hard_vcpus_limit);
+error_report("Number of %s cpus requested (%d) exceeds "
+ "the maximum cpus supported by KVM (%d)",
+ nc->name, nc->num, hard_vcpus_limit);
 exit(1);
 }
 }
@@ -2542,8 +2538,8 @@ static int kvm_init(MachineState *ms)
 }
 if (missing_cap) {
 ret = -EINVAL;
-fprintf(stderr, "kvm does not support %s\n%s",
-missing_cap->name, upgrade_note);
+error_report("kvm does not support %s", missing_cap->name);
+error_printf("%s", upgrade_note);
 goto err;
 }
 
-- 
2.42.0

Re: [PULL 3/6] qemu/osdep: Split qemu_close_all_open_fd() and add fallback

2024-08-28 Thread Daniel P . Berrangé

This is already merged, but I have two comments - one improvement
and one bug which we should probably fix before release.

On Mon, Aug 05, 2024 at 10:31:26AM +1000, Richard Henderson wrote:
> From: Clément Léger 
> 
> In order to make it cleaner, split qemu_close_all_open_fd() logic into
> multiple subfunctions (close with close_range(), with /proc/self/fd and
> fallback).
> 
> Signed-off-by: Clément Léger 
> Reviewed-by: Richard Henderson 
> Message-ID: <20240802145423.3232974-3-cle...@rivosinc.com>
> Signed-off-by: Richard Henderson 
> ---
>  util/oslib-posix.c | 50 ++
>  1 file changed, 37 insertions(+), 13 deletions(-)
> 
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 1e867efa47..9b79fc7cff 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -808,27 +808,16 @@ int qemu_msync(void *addr, size_t length, int fd)
>  return msync(addr, length, MS_SYNC);
>  }
>  
> -/*
> - * Close all open file descriptors.
> - */
> -void qemu_close_all_open_fd(void)
> +static bool qemu_close_all_open_fd_proc(void)
>  {
>  struct dirent *de;
>  int fd, dfd;
>  DIR *dir;
>  
> -#ifdef CONFIG_CLOSE_RANGE
> -int r = close_range(0, ~0U, 0);
> -if (!r) {
> -/* Success, no need to try other ways. */
> -return;
> -}
> -#endif
> -
>  dir = opendir("/proc/self/fd");

IIUC from previous threads this is valid on Linux and on Solaris.

On FreeBSD & macOS, you need /dev/fd though.
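
A minimal sketch of how the directory could be chosen per platform. The
helper name fd_dir_path() is hypothetical and not part of the patch, and
the platform list is only what is stated above. On FreeBSD, /dev/fd
lists all descriptors only when fdescfs is mounted, as far as I know.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not in the patch): return the directory that
 * lists the open file descriptors of the current process, or NULL if
 * the platform is not one of those mentioned in this thread.
 */
static const char *fd_dir_path(void)
{
#if defined(__linux__) || defined(__sun)
    return "/proc/self/fd";
#elif defined(__FreeBSD__) || defined(__APPLE__)
    return "/dev/fd";
#else
    return NULL; /* unknown platform: caller should use the fallback loop */
#endif
}
```

qemu_close_all_open_fd_proc() could then pass the result to opendir()
and return false when it is NULL.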

>  if (!dir) {
>  /* If /proc is not mounted, there is nothing that can be done. */
> -return;
> +return false;
>  }
>  /* Avoid closing the directory. */
>  dfd = dirfd(dir);
> @@ -840,4 +829,39 @@ void qemu_close_all_open_fd(void)
>  }
>  }
>  closedir(dir);
> +
> +return true;
> +}
> +
> +static bool qemu_close_all_open_fd_close_range(void)
> +{
> +#ifdef CONFIG_CLOSE_RANGE
> +int r = close_range(0, ~0U, 0);
> +if (!r) {
> +/* Success, no need to try other ways. */
> +return true;
> +}
> +#endif
> +return false;
> +}
> +
> +static void qemu_close_all_open_fd_fallback(void)
> +{
> +int open_max = sysconf(_SC_OPEN_MAX), i;
> +
> +/* Fallback */
> +for (i = 0; i < open_max; i++) {
> +close(i);
> +}

I'm told that sysconf(_SC_OPEN_MAX) returns -1 on some versions of
macOS. "Luckily" since we assigned to 'int' rather than 'unsigned int'
this will result in us not closing any FDs in this fallback path,
rather than trying to close several billion FDs (an effective hang).

If _SC_OPEN_MAX returns -1, we should fall back to the OPEN_MAX
constant on macOS (see commit de448e0f26e710e9d2b7fc91393c40ac24b75847
which tackled a similar issue wrt getrlimit), and fall back to perhaps
a hardcoded 1024 on non-macOS.
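
Roughly what I have in mind, as an untested sketch. The helper name
close_all_fd_limit() and the FALLBACK_OPEN_MAX macro are made up for
illustration; 1024 is just the value suggested above.

```c
#include <limits.h>
#include <unistd.h>

/* Assumed default when neither sysconf() nor OPEN_MAX provides a limit. */
#define FALLBACK_OPEN_MAX 1024

/*
 * Hypothetical helper: turn the sysconf(_SC_OPEN_MAX) result into a
 * usable upper bound for the close() loop. Non-positive or oversized
 * values are replaced by OPEN_MAX on macOS, else by FALLBACK_OPEN_MAX.
 */
static int close_all_fd_limit(long sysconf_result)
{
    if (sysconf_result > 0 && sysconf_result <= INT_MAX) {
        return (int)sysconf_result;
    }
#if defined(__APPLE__) && defined(OPEN_MAX)
    return OPEN_MAX;
#else
    return FALLBACK_OPEN_MAX;
#endif
}

/* Fallback path: close every descriptor below the computed limit. */
void close_all_open_fd_fallback(void)
{
    int open_max = close_all_fd_limit(sysconf(_SC_OPEN_MAX));
    int i;

    for (i = 0; i < open_max; i++) {
        close(i);
    }
}
```

Taking the sysconf() result as a parameter keeps the -1 case easy to
exercise without depending on the host's behaviour.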


> +}
> +
> +/*
> + * Close all open file descriptors.
> + */
> +void qemu_close_all_open_fd(void)
> +{
> +if (!qemu_close_all_open_fd_close_range() &&
> +!qemu_close_all_open_fd_proc()) {
> +qemu_close_all_open_fd_fallback();
> +}
>  }
> -- 
> 2.43.0
> 
> 

With regards,
Daniel

[1] https://github.com/open-mpi/ompi/issues/10358
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v6 19/19] migration/multifd: Add documentation for multifd methods

2024-08-28 Thread Fabiano Rosas

Peter Xu  writes:

> On Tue, Aug 27, 2024 at 05:22:32PM -0300, Fabiano Rosas wrote:
>> Peter Xu  writes:
>> 
>> > On Tue, Aug 27, 2024 at 04:17:59PM -0300, Fabiano Rosas wrote:
>> >> Peter Xu  writes:
>> >> 
>> >> > On Tue, Aug 27, 2024 at 03:54:51PM -0300, Fabiano Rosas wrote:
>> >> >> Peter Xu  writes:
>> >> >> 
>> >> >> > On Tue, Aug 27, 2024 at 02:46:06PM -0300, Fabiano Rosas wrote:
>> >> >> >> Add documentation clarifying the usage of the multifd methods. The
>> >> >> >> general idea is that the client code calls into multifd to trigger
>> >> >> >> send/recv of data and multifd then calls these hooks back from the
>> >> >> >> worker threads at opportune moments so the client can process a
>> >> >> >> portion of the data.
>> >> >> >> 
>> >> >> >> Suggested-by: Peter Xu 
>> >> >> >> Signed-off-by: Fabiano Rosas 
>> >> >> >> ---
>> >> >> >> Note that the doc is not symmetrical among send/recv because the recv
>> >> >> >> side is still wonky. It doesn't give the packet to the hooks, which
>> >> >> >> forces the p->normal, p->zero, etc. to be processed at the top level
>> >> >> >> of the threads, where no client-specific information should be.
>> >> >> >> ---
>> >> >> >>  migration/multifd.h | 76 
>> >> >> >> +
>> >> >> >>  1 file changed, 70 insertions(+), 6 deletions(-)
>> >> >> >> 
>> >> >> >> diff --git a/migration/multifd.h b/migration/multifd.h
>> >> >> >> index 13e7a88c01..ebb17bdbcf 100644
>> >> >> >> --- a/migration/multifd.h
>> >> >> >> +++ b/migration/multifd.h
>> >> >> >> @@ -229,17 +229,81 @@ typedef struct {
>> >> >> >>  } MultiFDRecvParams;
>> >> >> >>  
>> >> >> >>  typedef struct {
>> >> >> >> -/* Setup for sending side */
>> >> >> >> +/*
>> >> >> >> + * The send_setup, send_cleanup, send_prepare are only called on
>> >> >> >> + * the QEMU instance at the migration source.
>> >> >> >> + */
>> >> >> >> +
>> >> >> >> +/*
>> >> >> >> + * Setup for sending side. Called once per channel during channel
>> >> >> >> + * setup phase.
>> >> >> >> + *
>> >> >> >> + * Must allocate p->iov. If packets are in use (default), one
>> >> >> >
>> >> >> > Pure thoughts: wonder whether we can assert(p->iov) that after the hook
>> >> >> > returns in code to match this line.
>> >> >> 
>> >> >> Not worth the extra instructions in my opinion. It would crash
>> >> >> immediately once the thread touches p->iov anyway.
>> >> >
>> >> > It might still be good IMHO to have that assert(), not only to abort
>> >> > earlier, but also as a code-styled comment.  Your call when resend.
>> >> >
>> >> > PS: feel free to queue existing patches into your own tree without
>> >> > resending the whole series!
>> >> >
>> >> >> 
>> >> >> >
>> >> >> >> + * extra iovec must be allocated for the packet header. Any memory
>> >> >> >> + * allocated in this hook must be released at send_cleanup.
>> >> >> >> + *
>> >> >> >> + * p->write_flags may be used for passing flags to the QIOChannel.
>> >> >> >> + *
>> >> >> >> + * p->compression_data may be used by compression methods to store
>> >> >> >> + * compression data.
>> >> >> >> + */
>> >> >> >>  int (*send_setup)(MultiFDSendParams *p, Error **errp);
>> >> >> >> -/* Cleanup for sending side */
>> >> >> >> +
>> >> >> >> +/*
>> >> >> >> + * Cleanup for sending side. Called once per channel during
>> >> >> >> + * channel cleanup phase. May be empty.
>> >> >> >
>> >> >> > Hmm, if we require p->iov allocation per-ops, then they must free it here?
>> >> >> > I wonder whether we leaked it in most compressors.
>> >> >> 
>> >> >> Sorry, this one shouldn't have that text.
>> >> >
>> >> > I still want to double check with you: we leaked iov[] in most compressors
>> >> > here, or did I overlook something?
>> >> 
>> >> They have their own send_cleanup function where p->iov is freed.
>> >
>> > Oh, so I guess I just accidentally stumbled upon
>> > multifd_uadk_send_cleanup() when looking..
>> 
>> Yeah, this is a bit worrying. The reason this has not shown on valgrind
>> or the asan that Peter ran recently is that uadk, qpl and soon qat are
>> never enabled in a regular build. I have myself introduced compilation
>> errors in those files that I only caught by accident at a later point
>> (before sending to the ml).
>
> I tried to manually install qpl and uadk just now but neither of them is
> trivial to compile and install..  I hit random errors here and there in my
> first shot.
>
> OTOH, qatzip packages are around at least in Fedora repositories, so that
> might be the easiest to reach.  Not sure how that is with OpenSUSE.
>
> Shall we perhaps draft an email and check with them? E.g., would it be
> better if they planned to provide RPMs for the libraries at some point,
> so that we could somehow integrate that into CI routines?

We merged

Re: [PATCH 4/7] vfio/igd: add new bar0 quirk to emulate BDSM mirror

2024-08-28 Thread Cédric Le Goater


On 8/28/24 14:50, Corvin Köhne wrote:
> On Wed, 2024-08-28 at 12:40 +0200, Corvin Köhne wrote:
>> On Mon, 2024-08-26 at 10:35 -0600, Alex Williamson wrote:
>>>
>>> PS - please drop the confidential email warning signature when posting
>>> to public lists.
>>
>> Sry for the noise. I can't drop it, so I'm going to use another mail
>> address to post my patches.
>
> Argh, forgot updating my send-email config when resending the patch
> series. Should I resend it again?

Please do because the result is not compatible with the tools we use
to extract patches (b4).

Thanks,

C.

Re: [PATCH RESEND v9 7/9] rust: add crate to expose bindings and interfaces

2024-08-28 Thread Alex Bennée

Manos Pitsidianakis  writes:

> Add rust/qemu-api, which exposes rust-bindgen generated FFI bindings and
> provides some declaration macros for symbols visible to the rest of
> QEMU.

As mentioned on IRC I'm hitting a compilation error that bisects to this
commit:

  [148/1010] Generating bindings for Rust rustmod-bindgen-rust_wrapper.h
  FAILED: bindings.rs
  /home/alex/.cargo/bin/bindgen ../../rust/wrapper.h --output 
/home/alex/lsrc/qemu.git/builds/rust/bindings.rs --disable-header-comment 
--raw-line '// @generated' --ctypes-prefix core::ffi --formatter rustfmt 
--generate-block --generate-cstr --impl-debug --merge-extern-blocks 
--no-doc-comments --use-core --with-derive-default --allowlist-file 
'/home/alex/lsrc/qemu.git/include/.*' --allowlist-file 
'/home/alex/lsrc/qemu.git/.*' --allowlist-file 
'/home/alex/lsrc/qemu.git/builds/rust/.*' -- -I/home/alex/lsrc/qemu.git/. 
-I/home/alex/lsrc/qemu.git/builds/rust/. -I/home/alex/lsrc/qemu.git/include 
-I/home/alex/lsrc/qemu.git/builds/rust/include -I/usr/include/capstone 
-I/usr/include/p11-kit-1 -I/usr/include/pixman-1 -I/usr/include/libpng16 
-I/usr/include/spice-server -I/usr/include/spice-1 -I/usr/include/spice-1 
-DSTRUCT_IOVEC_DEFINED -I/usr/include/libusb-1.0 -I/usr/include/SDL2 
-D_REENTRANT -I/usr/include/glib-2.0 
-I/usr/lib/x86_64-linux-gnu/glib-2.0/include -pthread -I/usr/include/libmount 
-I/usr/include/blkid -I/usr/include/gio-unix-2.0 -I/usr/include/glib-2.0 
-I/usr/lib/x86_64-linux-gnu/glib-2.0/include -pthread -I/usr/include/libmount 
-I/usr/include/blkid -I/usr/include/slirp -I/usr/include/glib-2.0 
-I/usr/lib/x86_64-linux-gnu/glib-2.0/include -DNCURSES_WIDECHAR=1 
-D_DEFAULT_SOURCE -D_XOPEN_SOURCE=600 -I/usr/include/glib-2.0 
-I/usr/lib/x86_64-linux-gnu/glib-2.0/include -I/usr/include/gtk-3.0 
-I/usr/include/pango-1.0 -I/usr/include/glib-2.0 
-I/usr/lib/x86_64-linux-gnu/glib-2.0/include -I/usr/include/harfbuzz 
-I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/libmount 
-I/usr/include/blkid -I/usr/include/fribidi -I/usr/include/cairo 
-I/usr/include/pixman-1 -I/usr/include/gdk-pixbuf-2.0 
-I/usr/include/x86_64-linux-gnu -I/usr/include/gio-unix-2.0 
-I/usr/include/atk-1.0 -I/usr/include/at-spi2-atk/2.0 -I/usr/include/at-spi-2.0 
-I/usr/include/dbus-1.0 -I/usr/lib/x86_64-linux-gnu/dbus-1.0/include -pthread 
-I/usr/include/gtk-3.0 -I/usr/include/pango-1.0 -I/usr/include/glib-2.0 
-I/usr/lib/x86_64-linux-gnu/glib-2.0/include -I/usr/include/harfbuzz 
-I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/libmount 
-I/usr/include/blkid -I/usr/include/fribidi -I/usr/include/cairo 
-I/usr/include/pixman-1 -I/usr/include/gdk-pixbuf-2.0 
-I/usr/include/x86_64-linux-gnu -I/usr/include/gio-unix-2.0 
-I/usr/include/atk-1.0 -I/usr/include/at-spi2-atk/2.0 -I/usr/include/at-spi-2.0 
-I/usr/include/dbus-1.0 -I/usr/lib/x86_64-linux-gnu/dbus-1.0/include -pthread 
-I/usr/include/vte-2.91 -I/usr/include/glib-2.0 
-I/usr/lib/x86_64-linux-gnu/glib-2.0/include -I/usr/include/libmount 
-I/usr/include/blkid -I/usr/include/pango-1.0 -I/usr/include/harfbuzz 
-I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/fribidi 
-I/usr/include/cairo -I/usr/include/pixman-1 -I/usr/include/gtk-3.0 
-I/usr/include/gdk-pixbuf-2.0 -I/usr/include/x86_64-linux-gnu 
-I/usr/include/gio-unix-2.0 -I/usr/include/atk-1.0 
-I/usr/include/at-spi2-atk/2.0 -I/usr/include/at-spi-2.0 
-I/usr/include/dbus-1.0 -I/usr/lib/x86_64-linux-gnu/dbus-1.0/include -pthread 
-I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include 
-I/usr/include/spice-server -I/usr/include/spice-1 -I/usr/include/cacard 
-I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include 
-I/usr/include/nss -I/usr/include/nspr -I/usr/include/PCSC -pthread 
-D_REENTRANT -I/usr/include/pipewire-0.3 -I/usr/include/spa-0.2 -D_REENTRANT 
-I/usr/include/p11-kit-1 -I/usr/include/fuse3 -I/usr/include/x86_64-linux-gnu 
-D_FILE_OFFSET_BITS=64 -D__USE_FILE_OFFSET64 -D__USE_LARGEFILE64 
-DUSE_POSIX_ACLS=1 -I/usr/include/uuid -I/usr/include/glib-2.0 
-I/usr/lib/x86_64-linux-gnu/glib-2.0/include -I/usr/include/p11-kit-1 
-I/usr/include/p11-kit-1 -I/usr/include/p11-kit-1 -I/usr/include/p11-kit-1 
-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE 
-I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -std=gnu11 
-MD -MQ ../../rust/wrapper.h -MF wrapper.h.d
  /usr/include/liburing.h:296:3: error: use of undeclared identifier 'memory_order_release'
  /usr/include/liburing.h:1080:11: error: use of undeclared identifier 'memory_order_acquire'
  /usr/include/liburing.h:1116:9: error: use of undeclared identifier 'memory_order_acquire'
  /usr/include/liburing.h:1125:9: error: use of undeclared identifier 'memory_order_relaxed'
  /usr/include/liburing.h:1161:2: error: use of undeclared identifier 'memory_order_relaxed'
  /usr/include/liburing.h:1197:19: error: use of undeclared identifier 'memory_order_acquire'
  /usr/include/liburing.h:1267:10: error: use of undeclared

Re: [PATCH v4 6/7] memory: Do not create circular reference with subregion

2024-08-28 Thread Peter Xu

On Wed, Aug 28, 2024 at 02:33:59PM +0900, Akihiko Odaki wrote:
> On 2024/08/28 1:11, Peter Xu wrote:
> > On Tue, Aug 27, 2024 at 01:14:51PM +0900, Akihiko Odaki wrote:
> > > On 2024/08/27 4:42, Peter Xu wrote:
> > > > On Mon, Aug 26, 2024 at 06:10:25PM +0100, Peter Maydell wrote:
> > > > > On Mon, 26 Aug 2024 at 16:22, Peter Xu  wrote:
> > > > > > 
> > > > > > On Fri, Aug 23, 2024 at 03:13:11PM +0900, Akihiko Odaki wrote:
> > > > > > > memory_region_update_container_subregions() used to call
> > > > > > > memory_region_ref(), which creates a reference to the owner of the
> > > > > > > > subregion, on behalf of the owner of the container. This results in a
> > > > > > > > circular reference if the subregion and container have the same owner.
> > > > > > > > 
> > > > > > > > memory_region_ref() creates a reference to the owner instead of the
> > > > > > > > memory region to match the lifetime of the owner and memory region. We
> > > > > > > > do not need such a hack if the subregion and container have the same
> > > > > > > > owner because the owner will be alive as long as the container is.
> > > > > > > > Therefore, create a reference to the subregion itself instead of its
> > > > > > > > owner in such a case; the reference to the subregion is still necessary
> > > > > > > > to ensure that the subregion gets finalized after the container.
> > > > > > > 
> > > > > > > Signed-off-by: Akihiko Odaki 
> > > > > > > ---
> > > > > > >system/memory.c | 8 ++--
> > > > > > >1 file changed, 6 insertions(+), 2 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/system/memory.c b/system/memory.c
> > > > > > > index 5e6eb459d5de..e4d3e9d1f427 100644
> > > > > > > --- a/system/memory.c
> > > > > > > +++ b/system/memory.c
> > > > > > > @@ -2612,7 +2612,9 @@ static void 
> > > > > > > memory_region_update_container_subregions(MemoryRegion *subregion)
> > > > > > > 
> > > > > > >memory_region_transaction_begin();
> > > > > > > 
> > > > > > > -memory_region_ref(subregion);
> > > > > > > +object_ref(mr->owner == subregion->owner ?
> > > > > > > +   OBJECT(subregion) : subregion->owner);
> > > > > > 
> > > > > > > The only place that mr->refcount is used so far is the owner with the
> > > > > > > object property attached to the mr, am I right (ignoring name-less MRs)?
> > > > > > > 
> > > > > > > I worry this will further complicate refcounting, now we're actively using
> > > > > > > two refcounts for MRs..
> > > 
> > > The actor of object_ref() is the owner of the memory region also in this
> > > case. We are calling object_ref() on behalf of mr->owner so we use
> > > > mr->refcount iff mr->owner == subregion->owner. In this sense there is only
> > > > one user of mr->refcount even after this change.
> > 
> > Yes it's still one user, but it's not that straightforward to see, also
> > it's still an extension to how we use mr->refcount right now.  Currently
> > it's about "true / false" just to describe, now it's a real counter.
> > 
> > I wished that counter doesn't even exist if we'd like to stick with device
> > / owner's counter.  Adding this can definitely also make further effort
> > harder if we want to remove mr->refcount.
> 
> I don't think it will make removing mr->refcount harder. With this change,
> mr->refcount will count the parent and container. If we remove mr->refcount,
> we need to trigger object_finalize() in a way other than checking
> mr->refcount, which can be achieved by simply evaluating OBJECT(mr)->parent
> && mr->container.
> 
> > 
> > > 
> > > > > > 
> > > > > > Continue discussion there:
> > > > > > 
> > > > > > https://lore.kernel.org/r/067b17a4-cdfc-4f7e-b7e4-28c38e1c1...@daynix.com
> > > > > > 
> > > > > > > What I don't see is how mr->subregions differs from mr->container, so we
> > > > > > > allow subregions to be attached but not the container when finalize()
> > > > > > > (which is, afaict, the other way round).
> > > > > > > 
> > > > > > > It seems easier to me that we allow both container and subregions to exist
> > > > > > > as long as within the owner itself, rather than start heavier use of
> > > > > > > mr->refcount.
> > > > > 
> > > > > I don't think just "same owner" necessarily will be workable --
> > > > > you can have a setup like:
> > > > > * device A has a container C_A
> > > > > * device A has a child-device B
> > > > > * device B has a memory region R_B
> > > > > * device A's realize method puts R_B into C_A
> > > > > 
> > > > > R_B's owner is B, and the container's owner is A,
> > > > > but we still want to be able to get rid of A (in the process
> > > > > getting rid of B because it gets unparented and unreffed,
> > > > > and R_B and C_A also).
> > > > 
> > > > For cross-device references, should we rely on an explicit call to
> > > > memory_region_del_subregion(), so as to detac

Re: [PULL 3/6] qemu/osdep: Split qemu_close_all_open_fd() and add fallback

2024-08-28 Thread Clément Léger




On 28/08/2024 14:48, Daniel P. Berrangé wrote:
> This is already merged, but I have two comments - one improvement
> and one bug which we should probably fix before release.
> 
> On Mon, Aug 05, 2024 at 10:31:26AM +1000, Richard Henderson wrote:
>> From: Clément Léger 
>>
>> In order to make it cleaner, split qemu_close_all_open_fd() logic into
>> multiple subfunctions (close with close_range(), with /proc/self/fd and
>> fallback).
>>
>> Signed-off-by: Clément Léger 
>> Reviewed-by: Richard Henderson 
>> Message-ID: <20240802145423.3232974-3-cle...@rivosinc.com>
>> Signed-off-by: Richard Henderson 
>> ---
>>  util/oslib-posix.c | 50 ++
>>  1 file changed, 37 insertions(+), 13 deletions(-)
>>
>> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
>> index 1e867efa47..9b79fc7cff 100644
>> --- a/util/oslib-posix.c
>> +++ b/util/oslib-posix.c
>> @@ -808,27 +808,16 @@ int qemu_msync(void *addr, size_t length, int fd)
>>  return msync(addr, length, MS_SYNC);
>>  }
>>  
>> -/*
>> - * Close all open file descriptors.
>> - */
>> -void qemu_close_all_open_fd(void)
>> +static bool qemu_close_all_open_fd_proc(void)
>>  {
>>  struct dirent *de;
>>  int fd, dfd;
>>  DIR *dir;
>>  
>> -#ifdef CONFIG_CLOSE_RANGE
>> -int r = close_range(0, ~0U, 0);
>> -if (!r) {
>> -/* Success, no need to try other ways. */
>> -return;
>> -}
>> -#endif
>> -
>>  dir = opendir("/proc/self/fd");
> 
> IIUC from previous threads this is valid on Linux and on Solaris.
> 
> On FreeBSD & macOS, you need /dev/fd though.

Acked.

> 
>>  if (!dir) {
>>  /* If /proc is not mounted, there is nothing that can be done. */
>> -return;
>> +return false;
>>  }
>>  /* Avoid closing the directory. */
>>  dfd = dirfd(dir);
>> @@ -840,4 +829,39 @@ void qemu_close_all_open_fd(void)
>>  }
>>  }
>>  closedir(dir);
>> +
>> +return true;
>> +}
>> +
>> +static bool qemu_close_all_open_fd_close_range(void)
>> +{
>> +#ifdef CONFIG_CLOSE_RANGE
>> +int r = close_range(0, ~0U, 0);
>> +if (!r) {
>> +/* Success, no need to try other ways. */
>> +return true;
>> +}
>> +#endif
>> +return false;
>> +}
>> +
>> +static void qemu_close_all_open_fd_fallback(void)
>> +{
>> +int open_max = sysconf(_SC_OPEN_MAX), i;
>> +
>> +/* Fallback */
>> +for (i = 0; i < open_max; i++) {
>> +close(i);
>> +}
> 
> I'm told that sysconf(_SC_OPEN_MAX) returns -1 on some versions of
> macOS. "Luckily" since we assigned to 'int' rather than 'unsigned int'
> this will result in us not closing any FDs in this fallback path,
> rather than trying to close several billion FDs (an effective hang).
> 
> If _SC_OPEN_MAX returns -1, we should fallback to the OPEN_MAX
> constant on macOS (see commit de448e0f26e710e9d2b7fc91393c40ac24b75847
> which tackled a similar issue wrt getrlimit), and fallback to perhaps
> a hardcoded 1024 on non-macOS.

Thanks for catching this. I can submit these fixes, unless you have
already prepared something.

Clément

> 
> 
>> +}
>> +
>> +/*
>> + * Close all open file descriptors.
>> + */
>> +void qemu_close_all_open_fd(void)
>> +{
>> +if (!qemu_close_all_open_fd_close_range() &&
>> +!qemu_close_all_open_fd_proc()) {
>> +qemu_close_all_open_fd_fallback();
>> +}
>>  }
>> -- 
>> 2.43.0
>>
>>
> 
> With regards,
> Daniel
> 
> [1] https://github.com/open-mpi/ompi/issues/10358

Re: [PATCH v6 19/19] migration/multifd: Add documentation for multifd methods

2024-08-28 Thread Peter Xu

On Wed, Aug 28, 2024 at 10:04:47AM -0300, Fabiano Rosas wrote:
> We merged most of these things already. Now even if rpms show up at some
> point we still have to deal with not being able to build that code until
> then. Perhaps we could have a container that has all of these
> pre-installed just to exercise the code a bit. But it still wouldn't
> catch some issues because we cannot run the code due to the lack of
> hardware.

Yes, ultimately we may need help from the relevant people..

One last fallback plan is that we consult them for help, at least to make
sure it's working at the end of each release; it would help if they
verified the code at soft-freeze for each release.  Then we can keep the
development as usual, ignoring them during dev cycles.

If we find some feature broken (e.g. fail to compile..) for multiple
releases, it may mean that upstream has nobody using it, and then we can
suggest obsoleting it.

-- 
Peter Xu

Re: [PATCH RESEND v9 6/9] meson.build: add HAVE_GLIB_WITH_ALIGNED_ALLOC flag

2024-08-28 Thread Alex Bennée

Manos Pitsidianakis  writes:

> Rust crates, introduced from the next commit onwards, use the glib
> allocator API and need to know whether g_aligned_alloc etc are
> available.
>
> This commit adds a define in config_host_data that depends on glib
> version >= 2.72.
>
> Signed-off-by: Manos Pitsidianakis 
> ---
>  meson.build | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/meson.build b/meson.build
> index 05446acbc684a7521ecc9b3f80b98e2cec1a60cf..7f05466d128776ad8dbf403179734e6831b023c0 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -979,6 +979,9 @@ glib = declare_dependency(dependencies: [glib_pc, gmodule],
>  # TODO: remove this check and the corresponding workaround (qtree) when
>  # the minimum supported glib is >= 2.75.3
>  glib_has_gslice = glib.version().version_compare('<2.75.3')
> +# Check whether glib has the aligned_alloc family of functions.
> +# 
> +glib_has_aligned_alloc = glib.version().version_compare('>=2.72.0')

Minor suggestion: you could update the comment for the main glib probe:

  # When bumping glib minimum version, please check also whether to increase
  # the _WIN32_WINNT setting in osdep.h according to the value from
  # glib. You should also check if any of the glib.version() checks
  # below can also be removed.

Anyway:

Reviewed-by: Alex Bennée 


>  
>  # override glib dep to include the above refinements
>  meson.override_dependency('glib-2.0', glib)
> @@ -2508,6 +2511,7 @@ config_host_data.set('CONFIG_TIMERFD', cc.has_function('timerfd_create'))
>  config_host_data.set('HAVE_COPY_FILE_RANGE', cc.has_function('copy_file_range'))
>  config_host_data.set('HAVE_GETIFADDRS', cc.has_function('getifaddrs'))
>  config_host_data.set('HAVE_GLIB_WITH_SLICE_ALLOCATOR', glib_has_gslice)
> +config_host_data.set('HAVE_GLIB_WITH_ALIGNED_ALLOC', glib_has_aligned_alloc)
>  config_host_data.set('HAVE_OPENPTY', cc.has_function('openpty', dependencies: util))
>  config_host_data.set('HAVE_STRCHRNUL', cc.has_function('strchrnul'))
>  config_host_data.set('HAVE_SYSTEM_FUNCTION', cc.has_function('system', 
> prefix: '#include '))

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH v5 1/2] kvm: replace fprintf with error_report()/printf() in kvm_init()

2024-08-28 Thread Markus Armbruster

Ani Sinha  writes:

>> On 28 Aug 2024, at 4:53 PM, Markus Armbruster  wrote:
>> 
>> Ani Sinha  writes:
>> 
>>> error_report() is more appropriate for error situations. Replace fprintf with
>>> error_report() and error_printf() as appropriate. Cosmetic. No functional
>>> change.
>> 
>> Uh, I missed this last time around: the change is more than just
>> cosmetics!  The error messages change, e.g. from
>> 
>>    $ qemu-system-x86_64 -nodefaults -S -display none --accel kvm
>>    qemu-system-x86_64: --accel kvm: Could not access KVM kernel module: Permission denied
>>    qemu-system-x86_64: --accel kvm: failed to initialize kvm: Permission denied
>> 
>> to
>> 
>>    $ qemu-system-x86_64 -nodefaults -S -display none --accel kvm
>>    Could not access KVM kernel module: Permission denied
>>    qemu-system-x86_64: --accel kvm: failed to initialize kvm: Permission denied
>
> You got this backwards. This is what I have:

I do!  Sorry %-}

[...]

Re: [PATCH RESEND v9 0/9] Add Rust build support, ARM PL011 device impl

2024-08-28 Thread Alex Bennée

Manos Pitsidianakis  writes:

> Hello everyone,
>
> This series adds:
>
> - build system support for the Rust compiler
> - a small Rust library, qemu-api, which includes bindings to QEMU's C
>   interface generated with bindgen, and qemu-api-macros, a procedural
>   macro library.
> - a proof of concept ARM PL011 device implementation in Rust, chosen for
>   its low complexity. The device is used in the arm virt machine if qemu
>   is compiled with rust enabled (./configure --enable-rust [...])

OK I've finished my pass through after running aground with the bindgen
problem. I shall have another go on a Trixie machine once
I've caught up with my other reviews.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH 4/7] vfio/igd: add new bar0 quirk to emulate BDSM mirror

2024-08-28 Thread Corvin Köhne

On Wed, 2024-08-28 at 12:40 +0200, Corvin Köhne wrote:
> On Mon, 2024-08-26 at 10:35 -0600, Alex Williamson wrote:
> > 
> > PS - please drop the confidential email warning signature when posting
> > to public lists.
> > 
> 
> Sry for the noise. I can't drop it, so I'm going to use another mail
> address to post my patches.
> 
> > 

Argh, forgot updating my send-email config when resending the patch
series. Should I resend it again?


-- 
Kind regards,
Corvin

Re: [PATCH 4/7] vfio/igd: add new bar0 quirk to emulate BDSM mirror

2024-08-28 Thread Corvin Köhne

On Mon, 2024-08-26 at 10:35 -0600, Alex Williamson wrote:
> On Thu, 22 Aug 2024 13:08:29 +0200
> Corvin Köhne  wrote:
> 
> > The BDSM register is mirrored into MMIO space at least for gen 11 and
> > later devices. Unfortunately, the Windows driver reads the register
> > value from MMIO space instead of PCI config space for those devices [1].
> > Therefore, we either have to keep a 1:1 mapping for the host and guest
> > address or we have to emulate the MMIO register too. Using the igd in
> > legacy mode is already hard due to its many constraints. Keeping a 1:1
> > mapping may not work in all cases and makes it even harder to use. An
> > MMIO emulation has to trap the whole MMIO page. This makes accesses to
> > this page slower compared to using second level address translation.
> > Nevertheless, it doesn't have any constraints and I haven't noticed any
> > performance degradation yet making it a better solution.
> > 
> > [1]
> > https://github.com/projectacrn/acrn-hypervisor/blob/5c351bee0f6ae46250eefc07f44b4a31e770f3cf/devicemodel/hw/pci/passthrough.c#L650-L653
> > 
> > Signed-off-by: Corvin Köhne 
> > ---
> >  hw/vfio/igd.c    | 97
> > 
> >  hw/vfio/pci-quirks.c |  1 +
> >  hw/vfio/pci.h    |  1 +
> >  3 files changed, 99 insertions(+)
> > 
> > diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
> > index 0b6533bbf7..863b58565e 100644
> > --- a/hw/vfio/igd.c
> > +++ b/hw/vfio/igd.c
> > @@ -374,6 +374,103 @@ static const MemoryRegionOps
> > vfio_igd_index_quirk = {
> >  .endianness = DEVICE_LITTLE_ENDIAN,
> >  };
> >  
> > +#define IGD_BDSM_MMIO_OFFSET 0x1080C0
> > +
> > +static uint64_t vfio_igd_quirk_bdsm_read(void *opaque,
> > +  hwaddr addr, unsigned
> > size)
> > +{
> > +    VFIOPCIDevice *vdev = opaque;
> > +    uint64_t offset;
> > +
> > +    offset = IGD_BDSM_GEN11 + addr;
> > +
> > +    switch (size) {
> > +    case 1:
> > +    return pci_get_byte(vdev->pdev.config + offset);
> > +    case 2:
> > +    return le16_to_cpu(pci_get_word(vdev->pdev.config +
> > offset));
> > +    case 4:
> > +    return le32_to_cpu(pci_get_long(vdev->pdev.config +
> > offset));
> > +    case 8:
> > +    return le64_to_cpu(pci_get_quad(vdev->pdev.config +
> > offset));
> > +    default:
> > +    hw_error("igd: unsupported read size, %u bytes", size);
> > +    break;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static void vfio_igd_quirk_bdsm_write(void *opaque, hwaddr addr,
> > +   uint64_t data, unsigned
> > size)
> > +{
> > +    VFIOPCIDevice *vdev = opaque;
> > +    uint64_t offset;
> > +
> > +    offset = IGD_BDSM_GEN11 + addr;
> > +
> > +    switch (size) {
> > +    case 1:
> > +    pci_set_byte(vdev->pdev.config + offset, data);
> > +    break;
> > +    case 2:
> > +    pci_set_word(vdev->pdev.config + offset, data);
> > +    break;
> > +    case 4:
> > +    pci_set_long(vdev->pdev.config + offset, data);
> > +    break;
> > +    case 8:
> > +    pci_set_quad(vdev->pdev.config + offset, data);
> > +    break;
> > +    default:
> > +    hw_error("igd: unsupported read size, %u bytes", size);
> > +    break;
> > +    }
> > +}
> 
> If we have the leXX_to_cpu() in the read path, don't we need
> cpu_to_leXX() in the write path?  Maybe we should in fact just get
> rid
> of all of them since we're quirking a device that's specific to a
> little endian architecture and we're defining the memory region as
> little endian, but minimally we should be consistent.
> 

Will drop leXX_to_cpu in the read path.
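
The consistency point can be illustrated with a small standalone sketch. It assumes, as QEMU's `pci_get_word()`/`pci_set_word()` family is understood to do, that the accessors themselves decode and encode little-endian bytes, so the value they hand back is already in host order and a further `leXX_to_cpu()` would swap it a second time on a big-endian host. The `sketch_*` helpers below are hypothetical stand-ins, not QEMU functions.

```c
#include <stdint.h>

/* Hypothetical stand-in for a config space read: decode 2 LE bytes. */
static uint16_t sketch_get_le16(const uint8_t *p)
{
    /* Byte-wise decoding gives the same result on any host endianness. */
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical stand-in for a config space write: encode 2 LE bytes. */
static void sketch_set_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}
```

Because the read and write helpers are symmetric, a value written with one is returned unchanged by the other, with no extra byte swapping in either path.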

> > +
> > +static const MemoryRegionOps vfio_igd_bdsm_quirk = {
> > +    .read = vfio_igd_quirk_bdsm_read,
> > +    .write = vfio_igd_quirk_bdsm_write,
> > +    .endianness = DEVICE_LITTLE_ENDIAN,
> > +};
> > +
> > +void vfio_probe_igd_bar0_quirk(VFIOPCIDevice *vdev, int nr)
> > +{
> > +    VFIOQuirk *quirk;
> > +    int gen;
> > +
> > +    /*
> > + * This must be an Intel VGA device at address 00:02.0 for us
> > to even
> > + * consider enabling legacy mode. Some driver have
> > dependencies on the PCI
> > + * bus address.
> > + */
> > +    if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
> > +    !vfio_is_vga(vdev) || nr != 0 ||
> > +    &vdev->pdev != pci_find_device(pci_device_root_bus(&vdev-
> > >pdev),
> > +   0, PCI_DEVFN(0x2, 0))) {
> > +    return;
> > +    }
> > +
> > +    /*
> > + * Only on IGD devices of gen 11 and above, the BDSM register
> > is mirrored
> > + * into MMIO space and read from MMIO space by the Windows
> > driver.
> > + */
> > +    gen = igd_gen(vdev);
> > +    if (gen < 11) {
> > +    return;
> > +    }
> > +
> > +    quirk = vfio_quirk_alloc(1);
> > +    quirk->data = vdev;
> > +
> > +    memory_region_init_io(&quirk->mem[0], OBJECT(vdev),
> > &v

Re: [PATCH v2] block/reqlist: allow adding overlapping requests

2024-08-28 Thread Vladimir Sementsov-Ogievskiy


On 11.08.24 20:55, Michael Tokarev wrote:

12.07.2024 17:07, Fiona Ebner wrote:

Allow overlapping request by removing the assert that made it
impossible. There are only two callers:

1. block_copy_task_create()

It already asserts the very same condition before calling
reqlist_init_req().

2. cbw_snapshot_read_lock()

There is no need to have read requests be non-overlapping in
copy-before-write when used for snapshot-access. In fact, there was no
protection against two callers of cbw_snapshot_read_lock() calling
reqlist_init_req() with overlapping ranges and this could lead to an
assertion failure [1].

In particular, with the reproducer script below [0], two
cbw_co_snapshot_block_status() callers could race, with the second
calling reqlist_init_req() before the first one finishes and removes
its conflicting request.

[0]:


#!/bin/bash -e
dd if=/dev/urandom of=/tmp/disk.raw bs=1M count=1024
./qemu-img create /tmp/fleecing.raw -f raw 1G
(
./qemu-system-x86_64 --qmp stdio \
--blockdev raw,node-name=node0,file.driver=file,file.filename=/tmp/disk.raw \
--blockdev raw,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.raw \
<

[1]:


#5  0x71e5f0088eb2 in __GI___assert_fail (...) at ./assert/assert.c:101
#6  0x615285438017 in reqlist_init_req (...) at ../block/reqlist.c:23
#7  0x6152853e2d98 in cbw_snapshot_read_lock (...) at ../block/copy-before-write.c:237
#8  0x6152853e3068 in cbw_co_snapshot_block_status (...) at ../block/copy-before-write.c:304
#9  0x6152853f4d22 in bdrv_co_snapshot_block_status (...) at ../block/io.c:3726
#10 0x61528543a63e in snapshot_access_co_block_status (...) at ../block/snapshot-access.c:48
#11 0x6152853f1a0a in bdrv_co_do_block_status (...) at ../block/io.c:2474
#12 0x6152853f2016 in bdrv_co_common_block_status_above (...) at ../block/io.c:2652
#13 0x6152853f22cf in bdrv_co_block_status_above (...) at ../block/io.c:2732
#14 0x6152853d9a86 in blk_co_block_status_above (...) at ../block/block-backend.c:1473
#15 0x61528538da6c in blockstatus_to_extents (...) at ../nbd/server.c:2374
#16 0x61528538deb1 in nbd_co_send_block_status (...) at ../nbd/server.c:2481
#17 0x61528538f424 in nbd_handle_request (...) at ../nbd/server.c:2978
#18 0x61528538f906 in nbd_trip (...) at ../nbd/server.c:3121
#19 0x6152855a7caf in coroutine_trampoline (...) at ../util/coroutine-ucontext.c:175


Cc: qemu-sta...@nongnu.org
Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Fiona Ebner 


Hi!

Has this been forgotten or is it not needed for 9.1?



My apologies, this was forgotten. I think rc4 is too late, so I'll send a pull
request as soon as the 9.2 window opens.

--
Best regards,
Vladimir

[PATCH v3 5/7] vfio/igd: add ID's for ElkhartLake and TigerLake

2024-08-28 Thread Corvin Köhne

ElkhartLake and TigerLake devices were tested in legacy mode with Linux
and Windows VMs. Both are working properly. It's likely that other Intel
GPUs of gen 11 and 12, such as IceLake devices, work too. However,
we're only adding known good devices for now.

Signed-off-by: Corvin Köhne 
---
 hw/vfio/igd.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
index 0d68c6a451..8a41b16421 100644
--- a/hw/vfio/igd.c
+++ b/hw/vfio/igd.c
@@ -88,6 +88,12 @@ static int igd_gen(VFIOPCIDevice *vdev)
 case 0x2200:
 case 0x5900:
 return 8;
+/* ElkhartLake */
+case 0x4500:
+return 11;
+/* TigerLake */
+case 0x9A00:
+return 12;
 }
 
 /*
-- 
2.46.0

[PATCH v3 2/7] vfio/igd: support legacy mode for all known generations

2024-08-28 Thread Corvin Köhne

We're soon going to add support for legacy mode to ElkhartLake and
TigerLake devices. Those are gen 11 and 12 devices. At the moment, all
devices identified by our igd_gen function do support legacy mode. This
won't change when adding our new devices of gen 11 and 12. Therefore, it
makes more sense to accept legacy mode for all known devices instead of
maintaining a long list of known good generations. If we add a new
generation to igd_gen which doesn't support legacy mode for some reason,
it'll be easy to advance the check to reject legacy mode for this
specific generation.

Signed-off-by: Corvin Köhne 
---
 hw/vfio/igd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
index 650a323dda..d5e57656a8 100644
--- a/hw/vfio/igd.c
+++ b/hw/vfio/igd.c
@@ -416,7 +416,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
  * devices maintain compatibility with generation 8.
  */
 gen = igd_gen(vdev);
-if (gen != 6 && gen != 8) {
+if (gen == -1) {
 error_report("IGD device %s is unsupported in legacy mode, "
  "try SandyBridge or newer", vdev->vbasedev.name);
 return;
-- 
2.46.0

[PATCH v3 7/7] vfio/igd: correctly calculate stolen memory size for gen 9 and later

2024-08-28 Thread Corvin Köhne

We have to update the calculation of the stolen memory size because
we've seen devices using values of 0xf0 and above for the graphics mode
select field. The new calculation was taken from the Linux kernel [1].

[1] https://github.com/torvalds/linux/blob/7c626ce4bae1ac14f60076d00eafe71af30450ba/arch/x86/kernel/early-quirks.c#L455-L460

Signed-off-by: Corvin Köhne 
---
 hw/vfio/igd.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
index 0751c43eae..a95d441f68 100644
--- a/hw/vfio/igd.c
+++ b/hw/vfio/igd.c
@@ -488,11 +488,18 @@ static int igd_get_stolen_mb(int gen, uint32_t gmch)
 gms = (gmch >> 8) & 0xff;
 }
 
-if (gms > 0x10) {
-error_report("Unsupported IGD GMS value 0x%x", gms);
-return 0;
+if (gen < 9) {
+if (gms > 0x10) {
+error_report("Unsupported IGD GMS value 0x%x", gms);
+return 0;
+}
+return gms * 32;
+} else {
+if (gms < 0xf0)
+return gms * 32;
+else
+return (gms - 0xf0) * 4 + 4;
 }
-return gms * 32;
 }
 
 void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
-- 
2.46.0
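
The size calculation described in this patch can be sketched as a standalone function. This is a minimal sketch under stated assumptions, not the QEMU code: `sketch_stolen_mb` is a hypothetical name, the GMS field extraction mirrors `igd_get_stolen_mb()` from this series, and the gen 9+ branch follows the comments in the kernel helper cited as [1] (values of 0xf0 and above count in 4 MB increments starting at 4 MB), which is why 0xf0 is subtracted before scaling.

```c
#include <stdint.h>

/*
 * Return the stolen memory size in MiB for a given IGD generation and
 * GMCH register value, or 0 if a pre-gen-9 GMS value is out of range.
 */
static int sketch_stolen_mb(int gen, uint32_t gmch)
{
    int gms;

    /* The GMS field moved and widened with gen 8. */
    if (gen < 8) {
        gms = (gmch >> 3) & 0x1f;
    } else {
        gms = (gmch >> 8) & 0xff;
    }

    if (gen < 9) {
        if (gms > 0x10) {
            return 0; /* unsupported value */
        }
        return gms * 32;
    }

    /* gen 9+: 0x00..0xef are 32 MiB steps, 0xf0 and up are 4 MiB steps. */
    if (gms < 0xf0) {
        return gms * 32;
    }
    return (gms - 0xf0) * 4 + 4;
}
```

For example, a gen 12 device reporting a GMS value of 0xf0 maps to 4 MiB and 0xfe maps to 60 MiB, while a GMS value of 2 still maps to 64 MiB.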

[PATCH v3 0/7] vfio/igd: add passthrough support for IGDs of gen 11 and later

2024-08-28 Thread Corvin Köhne

Hi,

QEMU has experimental support for GPU passthrough of Intel's integrated graphics
devices. Unfortunately, Intel has changed some bits for their gen 11 devices
and later. To support these devices, we have to account for those changes. This
patch series adds the missing bits on the QEMU side.

I've tested the patch series on an ElkhartLake and a TigerLake device. On the
guest side, I've tested an EFI environment (GOP driver), a Linux guest and a
Windows VM. The drivers of all guests are able to use the GPU and produce
output on the connected display.

Corvin Köhne (7):
  vfio/igd: return an invalid generation for unknown devices
  vfio/igd: support legacy mode for all known generations
  vfio/igd: use new BDSM register location and size for gen 11 and later
  vfio/igd: add new bar0 quirk to emulate BDSM mirror
  vfio/igd: add ID's for ElkhartLake and TigerLake
  vfio/igd: don't set stolen memory size to zero
  vfio/igd: correctly calculate stolen memory size for gen 9 and later

 hw/vfio/igd.c| 185 +--
 hw/vfio/pci-quirks.c |   1 +
 hw/vfio/pci.h|   1 +
 3 files changed, 161 insertions(+), 26 deletions(-)

-- 
2.46.0

[PATCH v3 3/7] vfio/igd: use new BDSM register location and size for gen 11 and later

2024-08-28 Thread Corvin Köhne

Intel changed the location and size of the BDSM register for gen 11
devices and later. We have to adjust our emulation for these devices to
properly support them.

Signed-off-by: Corvin Köhne 
---
 hw/vfio/igd.c | 31 ---
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
index d5e57656a8..0b6533bbf7 100644
--- a/hw/vfio/igd.c
+++ b/hw/vfio/igd.c
@@ -100,11 +100,12 @@ static int igd_gen(VFIOPCIDevice *vdev)
 typedef struct VFIOIGDQuirk {
 struct VFIOPCIDevice *vdev;
 uint32_t index;
-uint32_t bdsm;
+uint64_t bdsm;
 } VFIOIGDQuirk;
 
 #define IGD_GMCH 0x50 /* Graphics Control Register */
 #define IGD_BDSM 0x5c /* Base Data of Stolen Memory */
+#define IGD_BDSM_GEN11 0xc0 /* Base Data of Stolen Memory of gen 11 and later */
 
 
 /*
@@ -313,9 +314,13 @@ static void vfio_igd_quirk_data_write(void *opaque, hwaddr addr,
  */
 if ((igd->index % 4 == 1) && igd->index < vfio_igd_gtt_max(vdev)) {
 if (gen < 8 || (igd->index % 8 == 1)) {
-uint32_t base;
+uint64_t base;
 
-base = pci_get_long(vdev->pdev.config + IGD_BDSM);
+if (gen < 11) {
+base = pci_get_long(vdev->pdev.config + IGD_BDSM);
+} else {
+base = pci_get_quad(vdev->pdev.config + IGD_BDSM_GEN11);
+}
 if (!base) {
 hw_error("vfio-igd: Guest attempted to program IGD GTT before "
  "BIOS reserved stolen memory.  Unsupported BIOS?");
@@ -519,7 +524,13 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
 igd = quirk->data = g_malloc0(sizeof(*igd));
 igd->vdev = vdev;
 igd->index = ~0;
-igd->bdsm = vfio_pci_read_config(&vdev->pdev, IGD_BDSM, 4);
+if (gen < 11) {
+igd->bdsm = vfio_pci_read_config(&vdev->pdev, IGD_BDSM, 4);
+} else {
+igd->bdsm = vfio_pci_read_config(&vdev->pdev, IGD_BDSM_GEN11, 4);
+igd->bdsm |=
+(uint64_t)vfio_pci_read_config(&vdev->pdev, IGD_BDSM_GEN11 + 4, 4) << 32;
+}
 igd->bdsm &= ~((1 * MiB) - 1); /* 1MB aligned */
 
 memory_region_init_io(&quirk->mem[0], OBJECT(vdev), &vfio_igd_index_quirk,
@@ -577,9 +588,15 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
 pci_set_long(vdev->emulated_config_bits + IGD_GMCH, ~0);
 
 /* BDSM is read-write, emulated.  The BIOS needs to be able to write it */
-pci_set_long(vdev->pdev.config + IGD_BDSM, 0);
-pci_set_long(vdev->pdev.wmask + IGD_BDSM, ~0);
-pci_set_long(vdev->emulated_config_bits + IGD_BDSM, ~0);
+if (gen < 11) {
+pci_set_long(vdev->pdev.config + IGD_BDSM, 0);
+pci_set_long(vdev->pdev.wmask + IGD_BDSM, ~0);
+pci_set_long(vdev->emulated_config_bits + IGD_BDSM, ~0);
+} else {
+pci_set_quad(vdev->pdev.config + IGD_BDSM_GEN11, 0);
+pci_set_quad(vdev->pdev.wmask + IGD_BDSM_GEN11, ~0);
+pci_set_quad(vdev->emulated_config_bits + IGD_BDSM_GEN11, ~0);
+}
 
 /*
  * This IOBAR gives us access to GTTADR, which allows us to write to
-- 
2.46.0
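
How the wider register is assembled can be sketched in isolation. This is a minimal illustration, assuming the two halves have already been read as 32-bit values; `sketch_bdsm_base` and `SKETCH_MIB` are hypothetical names, and only the combine-then-align logic is taken from the patch above.

```c
#include <stdint.h>

#define SKETCH_MIB (1024ULL * 1024ULL)

/*
 * Build the stolen memory base from config space reads.
 * Before gen 11 the register is 32 bits wide, so 'hi' is ignored.
 * From gen 11 on it is 64 bits wide and is read as two 32-bit halves.
 */
static uint64_t sketch_bdsm_base(int gen, uint32_t lo, uint32_t hi)
{
    uint64_t bdsm = lo;

    if (gen >= 11) {
        /* Widen before shifting so the upper half is not lost. */
        bdsm |= (uint64_t)hi << 32;
    }

    /* The base is 1 MiB aligned, so clear the low 20 bits. */
    return bdsm & ~(SKETCH_MIB - 1);
}
```

The cast before the shift matters: shifting a 32-bit value left by 32 is undefined in C, which is why the patch casts the second read to `uint64_t` first.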

[PATCH v3 4/7] vfio/igd: add new bar0 quirk to emulate BDSM mirror

2024-08-28 Thread Corvin Köhne

The BDSM register is mirrored into MMIO space at least for gen 11 and
later devices. Unfortunately, the Windows driver reads the register
value from MMIO space instead of PCI config space for those devices [1].
Therefore, we either have to keep a 1:1 mapping for the host and guest
address or we have to emulate the MMIO register too. Using the igd in
legacy mode is already hard due to its many constraints. Keeping a 1:1
mapping may not work in all cases and makes it even harder to use. An
MMIO emulation has to trap the whole MMIO page. This makes accesses to
this page slower compared to using second level address translation.
Nevertheless, it doesn't have any constraints and I haven't noticed any
performance degradation yet, making it the better solution.

[1] https://github.com/projectacrn/acrn-hypervisor/blob/5c351bee0f6ae46250eefc07f44b4a31e770f3cf/devicemodel/hw/pci/passthrough.c#L650-L653

Signed-off-by: Corvin Köhne 
---
v2:
* omit unnecessary leXX_to_cpu calls
* make use of IGD_BDSM_MMIO_OFFSET define

 hw/vfio/igd.c| 98 
 hw/vfio/pci-quirks.c |  1 +
 hw/vfio/pci.h|  1 +
 3 files changed, 100 insertions(+)

diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
index 0b6533bbf7..0d68c6a451 100644
--- a/hw/vfio/igd.c
+++ b/hw/vfio/igd.c
@@ -374,6 +374,104 @@ static const MemoryRegionOps vfio_igd_index_quirk = {
 .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
+#define IGD_BDSM_MMIO_OFFSET 0x1080C0
+
+static uint64_t vfio_igd_quirk_bdsm_read(void *opaque,
+  hwaddr addr, unsigned size)
+{
+VFIOPCIDevice *vdev = opaque;
+uint64_t offset;
+
+offset = IGD_BDSM_GEN11 + addr;
+
+switch (size) {
+case 1:
+return pci_get_byte(vdev->pdev.config + offset);
+case 2:
+return pci_get_word(vdev->pdev.config + offset);
+case 4:
+return pci_get_long(vdev->pdev.config + offset);
+case 8:
+return pci_get_quad(vdev->pdev.config + offset);
+default:
+hw_error("igd: unsupported read size, %u bytes", size);
+break;
+}
+
+return 0;
+}
+
+static void vfio_igd_quirk_bdsm_write(void *opaque, hwaddr addr,
+   uint64_t data, unsigned size)
+{
+VFIOPCIDevice *vdev = opaque;
+uint64_t offset;
+
+offset = IGD_BDSM_GEN11 + addr;
+
+switch (size) {
+case 1:
+pci_set_byte(vdev->pdev.config + offset, data);
+break;
+case 2:
+pci_set_word(vdev->pdev.config + offset, data);
+break;
+case 4:
+pci_set_long(vdev->pdev.config + offset, data);
+break;
+case 8:
+pci_set_quad(vdev->pdev.config + offset, data);
+break;
+default:
+hw_error("igd: unsupported write size, %u bytes", size);
+break;
+}
+}
+
+static const MemoryRegionOps vfio_igd_bdsm_quirk = {
+.read = vfio_igd_quirk_bdsm_read,
+.write = vfio_igd_quirk_bdsm_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+};
+
+void vfio_probe_igd_bar0_quirk(VFIOPCIDevice *vdev, int nr)
+{
+VFIOQuirk *quirk;
+int gen;
+
+/*
+ * This must be an Intel VGA device at address 00:02.0 for us to even
+ * consider enabling legacy mode. Some drivers have dependencies on the PCI
+ * bus address.
+ */
+if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
+!vfio_is_vga(vdev) || nr != 0 ||
+&vdev->pdev != pci_find_device(pci_device_root_bus(&vdev->pdev),
+   0, PCI_DEVFN(0x2, 0))) {
+return;
+}
+
+/*
+ * Only on IGD devices of gen 11 and above, the BDSM register is mirrored
+ * into MMIO space and read from MMIO space by the Windows driver.
+ */
+gen = igd_gen(vdev);
+if (gen < 11) {
+return;
+}
+
+quirk = vfio_quirk_alloc(1);
+quirk->data = vdev;
+
+memory_region_init_io(&quirk->mem[0], OBJECT(vdev), &vfio_igd_bdsm_quirk,
+  vdev, "vfio-igd-bdsm-quirk", 8);
+memory_region_add_subregion_overlap(vdev->bars[0].region.mem,
+IGD_BDSM_MMIO_OFFSET, &quirk->mem[0],
+1);
+
+QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
+}
+
 void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
 {
 g_autofree struct vfio_region_info *rom = NULL;
diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index 39dae72497..d37f722cce 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -1259,6 +1259,7 @@ void vfio_bar_quirk_setup(VFIOPCIDevice *vdev, int nr)
 vfio_probe_nvidia_bar0_quirk(vdev, nr);
 vfio_probe_rtl8168_bar2_quirk(vdev, nr);
 #ifdef CONFIG_VFIO_IGD
+vfio_probe_igd_bar0_quirk(vdev, nr);
 vfio_probe_igd_bar4_quirk(vdev, nr);
 #endif
 }
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index bf67df2fbc..5ad090a229 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -215,6 +215,7 @@ void v

[PATCH v3 1/7] vfio/igd: return an invalid generation for unknown devices

2024-08-28 Thread Corvin Köhne

Intel changes its specification quite often, e.g. the location and size
of the BDSM register have changed for gen 11 devices and later. This
causes our emulation to fail on those devices. So, it's impossible for
us to use a suitable default value for unknown devices. Instead of
returning a random generation value and hoping that everything works
fine, we should verify that different devices are working and add them
to our list of known devices.

Signed-off-by: Corvin Köhne 
---
 hw/vfio/igd.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
index d320d032a7..650a323dda 100644
--- a/hw/vfio/igd.c
+++ b/hw/vfio/igd.c
@@ -90,7 +90,11 @@ static int igd_gen(VFIOPCIDevice *vdev)
 return 8;
 }
 
-return 8; /* Assume newer is compatible */
+/*
+ * Unfortunately, Intel changes its specification quite often. This makes
+ * it impossible to use a suitable default value for unknown devices.
+ */
+return -1;
 }
 
 typedef struct VFIOIGDQuirk {
-- 
2.46.0
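
Taken together with the legacy mode check and the new device IDs in this series, the lookup behaves roughly as sketched below. This is a simplified illustration, not the QEMU function: the `sketch_*` names are hypothetical, matching on `device_id & 0xff00` is an assumption about how the switch is keyed, and only the IDs visible in this thread are listed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Map a PCI device ID to an IGD generation, or -1 if it is unknown. */
static int sketch_igd_gen(uint16_t device_id)
{
    switch (device_id & 0xff00) {
    case 0x2200:
    case 0x5900:
        return 8;
    case 0x4500: /* ElkhartLake */
        return 11;
    case 0x9a00: /* TigerLake */
        return 12;
    }

    /* No safe default: unknown devices are reported as such. */
    return -1;
}

/* Legacy mode is accepted for every generation the lookup knows about. */
static bool sketch_legacy_mode_supported(uint16_t device_id)
{
    return sketch_igd_gen(device_id) != -1;
}
```

With an explicit -1 for unknown devices, the caller only has to reject that one value instead of maintaining a list of known good generations.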

[PATCH v3 6/7] vfio/igd: don't set stolen memory size to zero

2024-08-28 Thread Corvin Köhne

The stolen memory is required for the GOP (EFI) driver and the Windows
driver. While the GOP driver seems to work with any stolen memory size,
the Windows driver will crash if the size doesn't match the size
allocated by the host BIOS. For that reason, it doesn't make sense to
overwrite the stolen memory size. It's true that this wastes some VM
memory. In the worst case, the stolen memory can take up more than a GB.
However, that's uncommon. Additionally, it's likely that a bunch of RAM
is assigned to VMs making use of GPU passthrough.

Signed-off-by: Corvin Köhne 
---
 hw/vfio/igd.c | 35 ++-
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
index 8a41b16421..0751c43eae 100644
--- a/hw/vfio/igd.c
+++ b/hw/vfio/igd.c
@@ -478,6 +478,23 @@ void vfio_probe_igd_bar0_quirk(VFIOPCIDevice *vdev, int nr)
 QLIST_INSERT_HEAD(&vdev->bars[nr].quirks, quirk, next);
 }
 
+static int igd_get_stolen_mb(int gen, uint32_t gmch)
+{
+int gms;
+
+if (gen < 8) {
+gms = (gmch >> 3) & 0x1f;
+} else {
+gms = (gmch >> 8) & 0xff;
+}
+
+if (gms > 0x10) {
+error_report("Unsupported IGD GMS value 0x%x", gms);
+return 0;
+}
+return gms * 32;
+}
+
 void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
 {
 g_autofree struct vfio_region_info *rom = NULL;
@@ -655,23 +672,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
 ggms_mb = 1 << ggms_mb;
 }
 
-/*
- * Assume we have no GMS memory, but allow it to be overridden by device
- * option (experimental).  The spec doesn't actually allow zero GMS when
- * when IVD (IGD VGA Disable) is clear, but the claim is that it's unused,
- * so let's not waste VM memory for it.
- */
-gmch &= ~((gen < 8 ? 0x1f : 0xff) << (gen < 8 ? 3 : 8));
-
-if (vdev->igd_gms) {
-if (vdev->igd_gms <= 0x10) {
-gms_mb = vdev->igd_gms * 32;
-gmch |= vdev->igd_gms << (gen < 8 ? 3 : 8);
-} else {
-error_report("Unsupported IGD GMS value 0x%x", vdev->igd_gms);
-vdev->igd_gms = 0;
-}
-}
+gms_mb = igd_get_stolen_mb(gen, gmch);
 
 /*
  * Request reserved memory for stolen memory via fw_cfg.  VM firmware
-- 
2.46.0

Re: [PATCH v4 6/7] memory: Do not create circular reference with subregion

2024-08-28 Thread Akihiko Odaki


On 2024/08/28 22:09, Peter Xu wrote:

On Wed, Aug 28, 2024 at 02:33:59PM +0900, Akihiko Odaki wrote:

On 2024/08/28 1:11, Peter Xu wrote:

On Tue, Aug 27, 2024 at 01:14:51PM +0900, Akihiko Odaki wrote:

On 2024/08/27 4:42, Peter Xu wrote:

On Mon, Aug 26, 2024 at 06:10:25PM +0100, Peter Maydell wrote:

On Mon, 26 Aug 2024 at 16:22, Peter Xu  wrote:


On Fri, Aug 23, 2024 at 03:13:11PM +0900, Akihiko Odaki wrote:

memory_region_update_container_subregions() used to call
memory_region_ref(), which creates a reference to the owner of the
subregion, on behalf of the owner of the container. This results in a
circular reference if the subregion and container have the same owner.

memory_region_ref() creates a reference to the owner instead of the
memory region to match the lifetime of the owner and memory region. We
do not need such a hack if the subregion and container have the same
owner because the owner will be alive as long as the container is.
Therefore, create a reference to the subregion itself instead of its
owner in such a case; the reference to the subregion is still necessary
to ensure that the subregion gets finalized after the container.
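
The cycle described above can be modelled with two plain counters. This is a toy sketch, not QEMU's object model: `SketchObj` and the `sketch_*` functions are hypothetical, and finalization is reduced to "the count reaches zero". It only shows why taking the reference on the owner keeps the owner alive forever when container and subregion share that owner.

```c
#include <stdbool.h>

typedef struct SketchObj {
    int refcount;
} SketchObj;

static void sketch_ref(SketchObj *o)   { o->refcount++; }
static void sketch_unref(SketchObj *o) { o->refcount--; }

/*
 * Model a device (the owner) that owns both a container and a subregion.
 * Returns the owner's reference count after the last external user has
 * dropped its reference. A non-zero result means the owner is never
 * finalized, so the reference taken for the subregion is never released.
 */
static int sketch_owner_refs_after_release(bool ref_owner)
{
    SketchObj owner = { 1 };     /* one external reference */
    SketchObj subregion = { 1 }; /* held by the owner as its child */

    /* Adding the subregion to the container takes one reference. */
    if (ref_owner) {
        sketch_ref(&owner);      /* old behaviour: owner references itself */
    } else {
        sketch_ref(&subregion);  /* proposed: reference the subregion */
    }

    sketch_unref(&owner);        /* the external user lets go */
    return owner.refcount;
}
```

In this model the old behaviour leaves the owner with a count of 1 that nothing will ever drop, while referencing the subregion lets the owner's count reach 0.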

Signed-off-by: Akihiko Odaki 
---
system/memory.c | 8 ++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/system/memory.c b/system/memory.c
index 5e6eb459d5de..e4d3e9d1f427 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2612,7 +2612,9 @@ static void memory_region_update_container_subregions(MemoryRegion *subregion)

memory_region_transaction_begin();

-memory_region_ref(subregion);
+object_ref(mr->owner == subregion->owner ?
+   OBJECT(subregion) : subregion->owner);


The only place that mr->refcount is used so far is the owner with the
object property attached to the mr, am I right (ignoring name-less MRs)?

I worry this will further complicate refcounting, now we're actively using
two refcounts for MRs..


The actor of object_ref() is the owner of the memory region also in this
case. We are calling object_ref() on behalf of mr->owner so we use
mr->refcount iff mr->owner == subregion->owner. In this sense there is only
one user of mr->refcount even after this change.


Yes it's still one user, but it's not that straightforward to see, also
it's still an extension to how we use mr->refcount right now.  Currently
it's about "true / false" just to describe, now it's a real counter.

I wished that counter doesn't even exist if we'd like to stick with device
/ owner's counter.  Adding this can definitely also make further effort
harder if we want to remove mr->refcount.


I don't think it will make removing mr->refcount harder. With this change,
mr->refcount will count the parent and container. If we remove mr->refcount,
we need to trigger object_finalize() in a way other than checking
mr->refcount, which can be achieved by simply evaluating OBJECT(mr)->parent
&& mr->container.







Continue discussion there:

https://lore.kernel.org/r/067b17a4-cdfc-4f7e-b7e4-28c38e1c1...@daynix.com

What I don't see is how mr->subregions differs from mr->container, so we
allow subregions to be attached but not the container when finalize()
(which is, afaict, the other way round).

It seems easier to me that we allow both container and subregions to exist
as long as within the owner itself, rather than start heavier use of
mr->refcount.


I don't think just "same owner" necessarily will be workable --
you can have a setup like:
 * device A has a container C_A
 * device A has a child-device B
 * device B has a memory region R_B
 * device A's realize method puts R_B into C_A

R_B's owner is B, and the container's owner is A,
but we still want to be able to get rid of A (in the process
getting rid of B because it gets unparented and unreffed,
and R_B and C_A also).


For cross-device references, should we rely on an explicit call to
memory_region_del_subregion(), so as to detach the link between C_A and
R_B?


Yes, I agree.



My understanding so far: logically when MR finalize() it should guarantee
both (1) mr->container==NULL, and (2) mr->subregions empty.  That's before
commit 2e2b8eb70fdb7dfb and could be the ideal world (though at the very
beginning we don't assert on ->container==NULL yet).  It requires all
device emulations to do proper unrealize() to unlink all the MRs.

However what I'm guessing is QEMU probably used to have lots of devices
that are not following the rules and leaking these links.  Hence we have
had 2e2b8eb70fdb7dfb, allowing that to happen as long as it's safe, and
it's justified by comment in 2e2b8eb70fdb7dfb on why it's safe.

What I was thinking is this comment seems to apply too to mr->container, so
that it should be safe too to unlink ->container the same way as its own
subregions.
IIUC that means for device-internal MR links we should be fine leaving
whatever link between MRs owned by such device; the device->refcount
guarantees none of them will be visible in any AS.  B
