Re: [PATCH v3 1/4] hw/intc: Remove loongarch_ipi.c

2024-07-01 Thread Jiaxun Yang



On 1 Jul 2024, at 7:44 AM, maobibo wrote:
> Also this patch is problematic on LoongArch.
>
> The original patch is to search by physical cpuid rather than logical cpuid.
>
> We want to make the ipi module better and better, however now it comes back
> to its initial state :(

Isn't arch_id the "physical id" you want? "cs->cpu_index" is the logical ID
for QEMU.

arch_id is set up by arch code, like the APIC ID for x86.

I had come across the old ipi_getcpu implementation, and I'm sure we were
looking at arch_id as well.

Thanks
- Jiaxun
>
> commit 03ca348b6b9038ce284916b36c19f700ac0ce7a6
> Author: Jiaxun Yang 
> Date:   Wed Jun 5 10:04:27 2024
>
>  hw/intc/loongson_ipi: Replace ipi_getcpu with cpu_by_arch_id
>
>  cpu_by_arch_id is doing the same thing as our ipi_getcpu logic.
>
>  Signed-off-by: Jiaxun Yang 
>  Reviewed-by: Song Gao 
>  Message-ID: <20240605-loongson3-ipi-v3-4-ddd2c0e03...@flygoat.com>
>  Signed-off-by: Philippe Mathieu-Daudé 
>
>
> Regards
> Bibo Mao
>
-- 
- Jiaxun



Re: [PATCH] block/curl: rewrite http header parsing function

2024-07-01 Thread Vladimir Sementsov-Ogievskiy

On 01.07.24 09:55, Michael Tokarev wrote:

01.07.2024 09:54, Vladimir Sementsov-Ogievskiy wrote:


+    const char *t = "accept-ranges : bytes "; /* A lowercase template */


Note: you make the parser less strict: you allow "bytes" to be uppercase (this
was previously allowed only for "accept-ranges"), and you allow whitespace
before the colon.


Yes, exactly.

I should add this to the description (I wanted to do that but forgot).
I'll update the patch (without re-sending) - hopefully it's okay to
keep your S-o-b :)



Of course!

--
Best regards,
Vladimir




Re: [PATCH v3 1/4] hw/intc: Remove loongarch_ipi.c

2024-07-01 Thread Jiaxun Yang



On 1 Jul 2024, at 2:35 AM, maobibo wrote:
[...]
>
> How about splitting loongson_ipi.c into
> loongson_ipi_base.c/loongson_ipi_loongson.c/loongson_ipi_loongarch.c?
>
> File loongson_ipi_base.c contains the common code, and loongson_ipi_xxx.c
> contains the arch-specific parts. Soon we will submit the in-kernel irqchip
> functionality; it will be quite different between the architectures, since
> the ioctl commands to save/restore IPI registers differ between the two.

MIPS's in-kernel IPI ioctl interface is non-existent so far, so if you are going
to design something, I think it can be adopted by MIPS if necessary. There is
no need to create divergence between them.

That being said, you are more than welcome to draft a patch so we can discuss
based on that.

Thanks
>
> Regards
> Bibo Mao

-- 
- Jiaxun



Re: [PATCH] spapr: Migrate ail-mode-3 spapr cap

2024-07-01 Thread Michael Tokarev

06.06.2024 14:26, Michael Tokarev wrote:

06.05.2024 14:56, Nicholas Piggin wrote:

This cap did not add the migration code when it was introduced. This
results in migration failure when changing the default using the
command line.

Cc: qemu-sta...@nongnu.org
Fixes: ccc5a4c5e10 ("spapr: Add SPAPR_CAP_AIL_MODE_3 for AIL mode 3 support for 
H_SET_MODE hcall")
Signed-off-by: Nicholas Piggin 
---
  include/hw/ppc/spapr.h | 1 +
  hw/ppc/spapr.c | 1 +
  hw/ppc/spapr_caps.c    | 1 +
  3 files changed, 3 insertions(+)


Hi!

Has this change been forgotten?  It's aimed at -stable, so must be
fixing a real issue.


Ping #2 ? :)

/mjt

--
GPG Key transition (from rsa2048 to rsa4096) since 2024-04-24.
New key: rsa4096/61AD3D98ECDF2C8E  9D8B E14E 3F2A 9DD7 9199  28F1 61AD 3D98 
ECDF 2C8E
Old key: rsa2048/457CE0A0804465C5  6EE1 95D1 886E 8FFB 810D  4324 457C E0A0 
8044 65C5
Transition statement: http://www.corpit.ru/mjt/gpg-transition-2024.txt




Re: [PATCH v3 1/4] hw/intc: Remove loongarch_ipi.c

2024-07-01 Thread maobibo




On 2024/7/1 3:01 PM, Jiaxun Yang wrote:



On 1 Jul 2024, at 7:44 AM, maobibo wrote:

Also this patch is problematic on LoongArch.

The original patch is to search by physical cpuid rather than logical cpuid.

We want to make the ipi module better and better, however now it comes back
to its initial state :(


Isn't arch_id the "physical id" you want? "cs->cpu_index" is the logical ID
for QEMU.

arch_id is set up by arch code, like the APIC ID for x86.

I had come across the old ipi_getcpu implementation, and I'm sure we were
looking at arch_id as well.
So, where is the implementation of the get_arch_id() function that looks up a
vcpu by physical index?


Regards
Bibo Mao



Thanks
- Jiaxun


commit 03ca348b6b9038ce284916b36c19f700ac0ce7a6
Author: Jiaxun Yang 
Date:   Wed Jun 5 10:04:27 2024

  hw/intc/loongson_ipi: Replace ipi_getcpu with cpu_by_arch_id

  cpu_by_arch_id is doing the same thing as our ipi_getcpu logic.

  Signed-off-by: Jiaxun Yang 
  Reviewed-by: Song Gao 
  Message-ID: <20240605-loongson3-ipi-v3-4-ddd2c0e03...@flygoat.com>
  Signed-off-by: Philippe Mathieu-Daudé 


Regards
Bibo Mao






Re: [PATCH v3 1/4] hw/intc: Remove loongarch_ipi.c

2024-07-01 Thread maobibo




On 2024/7/1 3:08 PM, Jiaxun Yang wrote:



On 1 Jul 2024, at 2:35 AM, maobibo wrote:
[...]


How about splitting loongson_ipi.c into
loongson_ipi_base.c/loongson_ipi_loongson.c/loongson_ipi_loongarch.c?

File loongson_ipi_base.c contains the common code, and loongson_ipi_xxx.c
contains the arch-specific parts. Soon we will submit the in-kernel irqchip
functionality; it will be quite different between the architectures, since
the ioctl commands to save/restore IPI registers differ between the two.


MIPS's in-kernel IPI ioctl interface is non-existent so far, so if you are going
to design something, I think it can be adopted by MIPS if necessary. There is
no need to create divergence between them.

That being said, you are more than welcome to draft a patch so we can discuss
based on that.

Sure, will do.

Regards
Bibo Mao


Thanks


Regards
Bibo Mao







Re: [RFC v3 1/2] target/loongarch: Add loongson binary translation feature

2024-07-01 Thread maobibo




On 2024/7/1 2:57 PM, Jiaxun Yang wrote:



On 30 May 2024, at 7:49 AM, Bibo Mao wrote:

Loongson Binary Translation (LBT) is used to accelerate binary
translation, which contains 4 scratch registers (scr0 to scr3), x86/ARM
eflags (eflags) and x87 fpu stack pointer (ftop).

Now the LBT feature is added in KVM mode; it is not supported in TCG mode since
it is not emulated. The feature variable lbt is added with the OnOffAuto type.
If the LBT feature is not supported by the KVM host, an error is reported when
lbt=on is given on the command line.

If there is no lbt parameter on the command line, it checks whether the KVM
host supports the LBT feature and sets the corresponding value in cpucfg.

Signed-off-by: Bibo Mao 

Hi Bibo,

I was going through recent LoongArch changes and this came to my attention:


---
  target/loongarch/cpu.c| 53 +++
  target/loongarch/cpu.h|  6 +++
  target/loongarch/kvm/kvm.c| 26 +
  target/loongarch/kvm/kvm_loongarch.h  | 16 
  target/loongarch/loongarch-qmp-cmds.c |  2 +-
  5 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index b5c1ec94af..14265b6667 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -571,6 +571,30 @@ static void loongarch_cpu_disas_set_info(CPUState
*s, disassemble_info *info)
  info->print_insn = print_insn_loongarch;
  }

+static void loongarch_cpu_check_lbt(CPUState *cs, Error **errp)
+{
+CPULoongArchState *env = cpu_env(cs);
+LoongArchCPU *cpu = LOONGARCH_CPU(cs);
+bool kvm_supported;
+
+kvm_supported = kvm_feature_supported(cs, LOONGARCH_FEATURE_LBT);


IMHO if there is no global state that should be saved/restored VM-wide,
this should be handled at the per-CPU level, preferably with CPUCFG flag hints.

We should minimize non-privilege KVM feature bits to prevent hindering
asymmetry ISA system.
For "asymmetry ISA system", do you meaning some vcpus have LBT feature, 
however some vcpus does have LBT features? That does not exists at all.


It would be a big disaster for such products; how would applications use this?

Regards
Bibo


Thanks
- Jiaxun






Re: [PATCH] vhost-user: Skip unnecessary duplicated VHOST_USER_SET_LOG_BASE requests

2024-07-01 Thread BillXiang


> From: "项文成"
> Date:  Thu, Jun 13, 2024, 14:51
> Subject:  [PATCH] vhost-user: Skip unnecessary duplicated 
> VHOST_USER_SET_LOG_BASE requests
> To: 
> Cc: , "BillXiang"
> From: BillXiang 
> 
> The VHOST_USER_SET_LOG_BASE requests should be categorized into
> non-vring specific messages, and should be sent only once.
> If sent more than once, dpdk will munmap the old log_addr, which may still be
> in use, causing a segmentation fault.
> 
> Signed-off-by: BillXiang 
> ---
>  hw/virtio/vhost-user.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index cdf9af4a4b..41e34edd49 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -371,6 +371,7 @@ static bool 
> vhost_user_per_device_request(VhostUserRequest request)
>      case VHOST_USER_RESET_DEVICE:
>      case VHOST_USER_ADD_MEM_REG:
>      case VHOST_USER_REM_MEM_REG:
> +    case VHOST_USER_SET_LOG_BASE:
>          return true;
>      default:
>          return false;
> -- 
> 2.30.0
ping



[PATCH] virtio: remove virtio_tswap16s() call in vring_packed_event_read()

2024-07-01 Thread Stefano Garzarella
Commit d152cdd6f6 ("virtio: use virtio accessor to access packed event")
switched from address_space_read_cached() to virtio_lduw_phys_cached()
to access the packed descriptor event.

When we used address_space_read_cached(), we needed to call
virtio_tswap16s() to handle the endianness of the field, but
virtio_lduw_phys_cached() already handles it internally, so we no longer
need to call virtio_tswap16s() (as the commit did for `off_wrap`,
but forgot for `flags`).

Fixes: d152cdd6f6 ("virtio: use virtio accessor to access packed event")
Cc: jasow...@redhat.com
Cc: qemu-sta...@nongnu.org
Reported-by: Xoykie 
Link: 
https://lore.kernel.org/qemu-devel/cafu8rb_pjr77zmlsm0unf9xpnxfr_--tjr49f_ex32zbc5o...@mail.gmail.com
Signed-off-by: Stefano Garzarella 
---
 hw/virtio/virtio.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 893a072c9d..2e5e67bdb9 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
 /* Make sure flags is seen before off_wrap */
 smp_rmb();
 e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
-virtio_tswap16s(vdev, &e->flags);
 }
 
 static void vring_packed_off_wrap_write(VirtIODevice *vdev,
-- 
2.45.2




Re: [Bug Report] Possible Missing Endianness Conversion

2024-07-01 Thread Stefano Garzarella

On Fri, Jun 28, 2024 at 03:53:09PM GMT, Peter Maydell wrote:

On Tue, 25 Jun 2024 at 08:18, Stefano Garzarella  wrote:


On Mon, Jun 24, 2024 at 04:19:52PM GMT, Peter Maydell wrote:
>On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella  wrote:
>>
>> CCing Jason.
>>
>> On Mon, Jun 24, 2024 at 4:30 PM Xoykie  wrote:
>> >
>> > The virtio packed virtqueue support patch[1] suggests converting
>> > endianness by lines:
>> >
>> > virtio_tswap16s(vdev, &e->off_wrap);
>> > virtio_tswap16s(vdev, &e->flags);
>> >
>> > Though both of these conversion statements aren't present in the
>> > latest qemu code here[2]
>> >
>> > Is this intentional?
>>
>> Good catch!
>>
>> It looks like it was removed (maybe by mistake) by commit
>> d152cdd6f6 ("virtio: use virtio accessor to access packed event")
>
>That commit changes from:
>
>-address_space_read_cached(cache, off_off, &e->off_wrap,
>-  sizeof(e->off_wrap));
>-virtio_tswap16s(vdev, &e->off_wrap);
>
>which does a byte read of 2 bytes and then swaps the bytes
>depending on the host endianness and the value of
>virtio_access_is_big_endian()
>
>to this:
>
>+e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
>
>virtio_lduw_phys_cached() is a small function which calls
>either lduw_be_phys_cached() or lduw_le_phys_cached()
>depending on the value of virtio_access_is_big_endian().
>(And lduw_be_phys_cached() and lduw_le_phys_cached() do
>the right thing for the host-endianness to do a "load
>a specifically big or little endian 16-bit value".)
>
>Which is to say that because we use a load/store function that's
>explicit about the size of the data type it is accessing, the
>function itself can handle doing the load as big or little
>endian, rather than the calling code having to do a manual swap after
>it has done a load-as-bag-of-bytes. This is generally preferable
>as it's less error-prone.

Thanks for the details!

So, should we also remove `virtio_tswap16s(vdev, &e->flags);` ?

I mean:
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 893a072c9d..2e5e67bdb9 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
  /* Make sure flags is seen before off_wrap */
  smp_rmb();
  e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
-virtio_tswap16s(vdev, &e->flags);
  }


That definitely looks like it's probably not correct...


Yeah, I just sent that patch: 
https://lore.kernel.org/qemu-devel/20240701075208.19634-1-sgarz...@redhat.com


We can continue the discussion there.

Thanks,
Stefano




Re: [PATCH 0/7] pc-bios/s390-ccw: Merge the netboot loader into s390-ccw.img

2024-07-01 Thread Thomas Huth

On 28/06/2024 20.01, Jared Rossi wrote:



On 6/24/24 1:55 AM, Thomas Huth wrote:

[...]

I think it should be fine, both functions are basically just a wrapper 
around the write() function in sclp.c, with sclp_print() being rather dumb 
while printf() is doing the usual string formatting before writing it out. 
I think in the long run, it would be nice to get rid of sclp_print() and 
replace it by puts() or printf() in the whole code, but doing that right 
now would likely cause quite some conflicts for Jared with his patch 
series, so I'd rather postpone that to a later point in time.


Hi Thomas,

Converting the panics to returns will require me to modify/move some of the 
sclp_print() calls.  Shall I go ahead and change them to printf() and puts() 
while I'm at it, or would you rather preserve the sclp_print() for now and 
then have a dedicated patch for the all replacements later?  I'm not sure if 
we want to try to maintain some amount of consistency until we do a total 
conversion, or if you are OK with a mix of sclp_print() and printf() 
throughout in the meantime.


 Hi,

I'm ok if we mix them for a while, so I'd say go ahead and use printf or 
puts for the new code.


 Thomas





Re: [PATCH v42 06/98] hw/sd/sdcard: Do not store vendor data on block drive (CMD56)

2024-07-01 Thread Luc Michel
On 09:00 Fri 28 Jun , Philippe Mathieu-Daudé wrote:
> 
> 
> "General command" (GEN_CMD, CMD56) is described as:
> 
>   GEN_CMD is the same as the single block read or write
>   commands (CMD24 or CMD17). The difference is that [...]
>   the data block is not a memory payload data but has a
>   vendor specific format and meaning.
> 
> Thus this block must not be stored overwriting data block
> on underlying storage drive. Keep it in a dedicated
> 'vendor_data[]' array.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> Tested-by: Cédric Le Goater 
> ---
> RFC: Is it safe to reuse VMSTATE_UNUSED_V() (which happens
> to be the same size)?
> 
> Cc: Peter Xu 
> Cc: Fabiano Rosas 

I'm not sure about this migration question.

But IMHO you can simplify your implementation to avoid having to store
and migrate this vendor_data array. After some research on this command,
I came to the conclusion that it's used by manufacturers to return
device health related vendor-specific data. (E.g.,
https://images-na.ssl-images-amazon.com/images/I/91tTtUMDM3L.pdf Section
1.6.1). So I guess you can simply discard writes and return 0s on reads
(or "QEMU" in ASCII or... :)).

> ---
>  hw/sd/sd.c | 17 +
>  1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
> index 464576751a..1f3eea6e84 100644
> --- a/hw/sd/sd.c
> +++ b/hw/sd/sd.c
> @@ -142,6 +142,8 @@ struct SDState {
>  uint64_t data_start;
>  uint32_t data_offset;
>  uint8_t data[512];
> +uint8_t vendor_data[512];
> +
>  qemu_irq readonly_cb;
>  qemu_irq inserted_cb;
>  QEMUTimer *ocr_power_timer;
> @@ -656,6 +658,7 @@ static void sd_reset(DeviceState *dev)
>  sd->wp_switch = sd->blk ? !blk_is_writable(sd->blk) : false;
>  sd->wp_group_bits = sect;
>  sd->wp_group_bmap = bitmap_new(sd->wp_group_bits);
> +memset(sd->vendor_data, 0xec, sizeof(sd->vendor_data));
>  memset(sd->function_group, 0, sizeof(sd->function_group));
>  sd->erase_start = INVALID_ADDRESS;
>  sd->erase_end = INVALID_ADDRESS;
> @@ -771,7 +774,7 @@ static const VMStateDescription sd_vmstate = {
>  VMSTATE_UINT64(data_start, SDState),
>  VMSTATE_UINT32(data_offset, SDState),
>  VMSTATE_UINT8_ARRAY(data, SDState, 512),
> -VMSTATE_UNUSED_V(1, 512),
> +VMSTATE_UINT8_ARRAY(vendor_data, SDState, 512),
>  VMSTATE_BOOL(enable, SDState),
>  VMSTATE_END_OF_LIST()
>  },
> @@ -2029,9 +2032,8 @@ void sd_write_byte(SDState *sd, uint8_t value)
>  break;
> 
>  case 56:  /* CMD56:  GEN_CMD */
> -sd->data[sd->data_offset ++] = value;
> -if (sd->data_offset >= sd->blk_len) {
> -APP_WRITE_BLOCK(sd->data_start, sd->data_offset);
> +sd->vendor_data[sd->data_offset ++] = value;
> +if (sd->data_offset >= sizeof(sd->vendor_data)) {
>  sd->state = sd_transfer_state;
>  }
>  break;
> @@ -2165,12 +2167,11 @@ uint8_t sd_read_byte(SDState *sd)
>  break;
> 
>  case 56:  /* CMD56:  GEN_CMD */
> -if (sd->data_offset == 0)
> -APP_READ_BLOCK(sd->data_start, sd->blk_len);
> -ret = sd->data[sd->data_offset ++];
> +ret = sd->vendor_data[sd->data_offset ++];
> 
> -if (sd->data_offset >= sd->blk_len)
> +if (sd->data_offset >= sizeof(sd->vendor_data)) {
>  sd->state = sd_transfer_state;
> +}
>  break;
> 
>  default:
> --
> 2.41.0
> 
> 

-- 



Re: [PATCH] virtio: remove virtio_tswap16s() call in vring_packed_event_read()

2024-07-01 Thread Jason Wang
On Mon, Jul 1, 2024 at 3:52 PM Stefano Garzarella  wrote:
>
> Commit d152cdd6f6 ("virtio: use virtio accessor to access packed event")
> switched from address_space_read_cached() to virtio_lduw_phys_cached()
> to access the packed descriptor event.
>
> When we used address_space_read_cached(), we needed to call
> virtio_tswap16s() to handle the endianness of the field, but
> virtio_lduw_phys_cached() already handles it internally, so we no longer
> need to call virtio_tswap16s() (as the commit did for `off_wrap`,
> but forgot for `flags`).
>
> Fixes: d152cdd6f6 ("virtio: use virtio accessor to access packed event")
> Cc: jasow...@redhat.com
> Cc: qemu-sta...@nongnu.org
> Reported-by: Xoykie 
> Link: 
> https://lore.kernel.org/qemu-devel/cafu8rb_pjr77zmlsm0unf9xpnxfr_--tjr49f_ex32zbc5o...@mail.gmail.com
> Signed-off-by: Stefano Garzarella 

Acked-by: Jason Wang 

Thanks

> ---
>  hw/virtio/virtio.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 893a072c9d..2e5e67bdb9 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
>  /* Make sure flags is seen before off_wrap */
>  smp_rmb();
>  e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
> -virtio_tswap16s(vdev, &e->flags);
>  }
>
>  static void vring_packed_off_wrap_write(VirtIODevice *vdev,
> --
> 2.45.2
>




Re: [PATCH v42 06/98] hw/sd/sdcard: Do not store vendor data on block drive (CMD56)

2024-07-01 Thread Daniel P . Berrangé
On Fri, Jun 28, 2024 at 09:00:42AM +0200, Philippe Mathieu-Daudé wrote:
> "General command" (GEN_CMD, CMD56) is described as:
> 
>   GEN_CMD is the same as the single block read or write
>   commands (CMD24 or CMD17). The difference is that [...]
>   the data block is not a memory payload data but has a
>   vendor specific format and meaning.
> 
> Thus this block must not be stored overwriting data block
> on underlying storage drive. Keep it in a dedicated
> 'vendor_data[]' array.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> Tested-by: Cédric Le Goater 
> ---
> RFC: Is it safe to reuse VMSTATE_UNUSED_V() (which happens
> to be the same size)?

This field became unused with:

commit 12c125cba9c548929ccf4da2515e5b795c94afd9
Author: Eric Blake 
Date:   Fri May 6 10:26:39 2016 -0600

sd: Switch to byte-based block access

which was in 2.6.1 / 2.7.0


Thus if someone is using a machine type that is 2.6 or
older, I don't think it is safe to unconditionally
reuse that field.

My pending series deprecates everything up to 2.12, but
we won't remove those machine types until 2 further
releases have passed.

You could gamble that SD card usage is niche enough
that it's highly unlikely someone will be using an SD
card at the same time as these ancient machine types.

The safe thing would be a new field.
 
> Cc: Peter Xu 
> Cc: Fabiano Rosas 
> ---
>  hw/sd/sd.c | 17 +
>  1 file changed, 9 insertions(+), 8 deletions(-)

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v3 1/4] hw/intc: Remove loongarch_ipi.c

2024-07-01 Thread Jiaxun Yang



On 1 Jul 2024, at 8:22 AM, maobibo wrote:
> On 2024/7/1 下午3:01, Jiaxun Yang wrote:
>> 
>> 
>> On 1 Jul 2024, at 7:44 AM, maobibo wrote:
>>> Also this patch is problematic on LoongArch.
>>>
>>> The original patch is to search by physical cpuid rather than logical cpuid.
>>>
>>> We want to make the ipi module better and better, however now it comes back
>>> to its initial state :(
>> 
>> Isn't arch_id the "physical id" you want? "cs->cpu_index" is the logical ID
>> for QEMU.
>> 
>> arch_id is set up by arch code, like the APIC ID for x86.
>> 
>> I had come across the old ipi_getcpu implementation, and I'm sure we were
>> looking at arch_id as well.
> So, where is the implementation of the get_arch_id() function that looks up a
> vcpu by physical index?

Hi Bibo,

cpu_by_arch_id will be redirected to:

```
CPUState *cpu_by_arch_id(int64_t id)
{
CPUState *cpu;

CPU_FOREACH(cpu) {
CPUClass *cc = CPU_GET_CLASS(cpu);

if (cc->get_arch_id(cpu) == id) {
return cpu;
}
}
return NULL;
}
```

It iterates over all vCPUs and returns the CPUState with the corresponding arch_id.

Whereas, for LoongArch's get_arch_id implementation:
```
static int64_t loongarch_cpu_get_arch_id(CPUState *cs)
{
LoongArchCPU *cpu = LOONGARCH_CPU(cs);

return cpu->phy_id;
}
```
I believe it matches our intention here.

Thanks

>
> Regards
> Bibo Mao
>
>> 
>> Thanks
>> - Jiaxun
>>>
>>> commit 03ca348b6b9038ce284916b36c19f700ac0ce7a6
>>> Author: Jiaxun Yang 
>>> Date:   Wed Jun 5 10:04:27 2024
>>>
>>>   hw/intc/loongson_ipi: Replace ipi_getcpu with cpu_by_arch_id
>>>
>>>   cpu_by_arch_id is doing the same thing as our ipi_getcpu logic.
>>>
>>>   Signed-off-by: Jiaxun Yang 
>>>   Reviewed-by: Song Gao 
>>>   Message-ID: <20240605-loongson3-ipi-v3-4-ddd2c0e03...@flygoat.com>
>>>   Signed-off-by: Philippe Mathieu-Daudé 
>>>
>>>
>>> Regards
>>> Bibo Mao
>>>

-- 
- Jiaxun



Re: [v2 1/1] hw/i386/acpi-build: add OSHP method support for SHPC driver load

2024-07-01 Thread Igor Mammedov
On Fri, 28 Jun 2024 03:04:28 +
"Gao,Shiyuan"  wrote:

> > > that OS cannot get control of SHPC hotplug and hotplug device to
> > > the PCI bridge will fail when we use SHPC Native type:
> > >
> > >   [3.336059] shpchp :00:03.0: Requesting control of SHPC hotplug via 
> > >OSHP (\_SB_.PCI0.S28_)
> > >   [3.337408] shpchp :00:03.0: Requesting control of SHPC hotplug via 
> > >OSHP (\_SB_.PCI0)
> > >   [3.338710] shpchp :00:03.0: Cannot get control of SHPC hotplug
> > >
> > > Add OSHP method support for transfer control to the operating system,
> > > after this SHPC driver will be loaded success and the hotplug device to
> > > the PCI bridge will success when we use SHPC Native type.
> > >
> > >   [1.703975] shpchp :00:03.0: Requesting control of SHPC hotplug via 
> > >OSHP (\_SB_.PCI0.S18_)
> > >   [1.704934] shpchp :00:03.0: Requesting control of SHPC hotplug via 
> > >OSHP (\_SB_.PCI0)
> > >   [1.705855] shpchp :00:03.0: Gained control of SHPC hotplug 
> > >(\_SB_.PCI0)
> > >   [1.707054] shpchp :00:03.0: HPC vendor_id 1b36 device_id 1 ss_vid 0 
> > >ss_did 0  
> >
> > please describe in commit message reproducer
> > (aka QEMU CLI and guest OS and if necessary other details)  
> 
> qemu-system-x86_64 -machine pc-q35-9.0
> ...
> -global PIIX4_PM.acpi-pci-hotplug-with-bridge-support=off

please use full QEMU CLI and follow up steps to trigger the issue.

From above it's not obvious what and where you are trying to hotplug
 
> guest OS: centos7/ubuntu22.04
> 
> I will add it in the next version.
> 
> > > +/*
> > > + * PCI Firmware Specification 3.0
> > > + * 4.8. The OSHP Control Method
> > > + */
> > > +static Aml *build_oshp_method(void)
> > > +{
> > > +    Aml *method;
> > > +
> > > +    /*
> > > + * We don't use ACPI to control the SHPC, so just return
> > > + * success is enough.
> > > + */
> > > +    method = aml_method("OSHP", 0, AML_NOTSERIALIZED);
> > > +    aml_append(method, aml_return(aml_int(0x0)));
> > > +    return method;
> > > +}
> > > +
> > >  static void
> > >  build_dsdt(GArray *table_data, BIOSLinker *linker,
> > > AcpiPmInfo *pm, AcpiMiscInfo *misc,
> > > @@ -1452,6 +1469,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> > >  aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
> > >  aml_append(dev, aml_name_decl("_UID", 
> > >aml_int(pcmc->pci_root_uid)));
> > >  aml_append(dev, aml_pci_edsm());
> > > +    aml_append(dev, build_oshp_method());  
> >
> > it's global and what will happen if we have ACPI PCI hotplug enabled
> > and guest calls this NOP method?  
> 
> the OS gets control of SHPC hotplug and the SHPC driver load fails later.
> 
> [6.170345] shpchp :00:03.0: Requesting control of SHPC hotplug via 
> OSHP (\_SB_.PCI0.S18_)
> [6.171962] shpchp :00:03.0: Requesting control of SHPC hotplug via 
> OSHP (\_SB_.PCI0)
> [6.173556] shpchp :00:03.0: Gained control of SHPC hotplug 
> (\_SB_.PCI0)
> [6.175144] shpchp :00:03.0: HPC vendor_id 1b36 device_id 1 ss_vid 0 
> ss_did 0
> [6.196153] shpchp :00:03.0: irq 24 for MSI/MSI-X
> [6.197211] shpchp :00:03.0: pci_hp_register failed with error -16
> [6.198272] shpchp :00:03.0: Slot initialization failed
> 
> this looks more suitable.
> 
> +if (!pm->pcihp_bridge_en) {
> +aml_append(dev, build_i440fx_oshp_method());
> +}

We also have
 PIIX4_PM.acpi-root-pci-hotplug (default true)
though it seems that ACPI hotplug takes precedence over SHPC if both are enabled.
So I'd take it; the OSHP approach seems simpler than adding _OSC to do the same.

> 
> >  
> > >  aml_append(sb_scope, dev);
> > >  aml_append(dsdt, sb_scope);
> > >
> > > @@ -1586,6 +1604,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> > >  aml_append(dev, build_q35_osc_method(true));
> > >  } else {
> > >  aml_append(dev, aml_name_decl("_HID", 
> > >aml_eisaid("PNP0A03")));
> > > +    aml_append(dev, build_oshp_method());
> > >  }
> > >
> > >  if (numa_node != NUMA_NODE_UNASSIGNED) {  
> 
> Hot-plugging/unplugging a device using SHPC takes more time than ACPI PCI
> hotplug, because after pressing the button the operation can be cancelled
> within 5 seconds in the SHPC driver.

for SHPC on PXB see,
commit d10dda2d60 hw/pci-bridge: disable SHPC in PXB

it seems that enabling SHPC on PXB in QEMU is not enough,
UEFI needs to support that as well
(CCing Gerd to check whether it is possible at all) 

> If I want to use ACPI PCI hotplug in the pxb bridge, what else need to be 
> done?

Does it have to be hotplug directly into the pxb, or
would it be sufficient to have hotplug support
on a pci-bridge attached to a pxb?

I particularly do not like spreading ACPI hotplug 
to any host bridges, as it's quite complicated
code.

Michael,
Are there any reasons why we don't have hotplug directly
on PXBs enabled, from a PCI spec point of view?

Re: [RFC v3 1/2] target/loongarch: Add loongson binary translation feature

2024-07-01 Thread Jiaxun Yang



On 1 Jul 2024, at 8:32 AM, maobibo wrote:
[...]
>>>
>>> +static void loongarch_cpu_check_lbt(CPUState *cs, Error **errp)
>>> +{
>>> +CPULoongArchState *env = cpu_env(cs);
>>> +LoongArchCPU *cpu = LOONGARCH_CPU(cs);
>>> +bool kvm_supported;
>>> +
>>> +kvm_supported = kvm_feature_supported(cs, LOONGARCH_FEATURE_LBT);
>> 
>> IMHO if there is no global state that should be saved/restored VM-wide,
>> this should be handled at the per-CPU level, preferably with CPUCFG flag hints.
>> 
>> We should minimize non-privilege KVM feature bits to prevent hindering
>> asymmetry ISA system.
> For "asymmetry ISA system", do you meaning some vcpus have LBT feature, 
> however some vcpus does have LBT features? That does not exists at all.

Yes, we should always prepare for the future :-)

From Weiwu's presentations, I believe an asymmetric LASX product is already on the
roadmap. So for LBT it's also a possibility.

Even if such a product won't land in the near future, we should still try our best
to bind the feature to vCPU creation, not to the system.

>
> It would be a big disaster for such products; how would applications use this?

Affinity placement etc, there are many possibilities.

On the Arm side, there are already systems with big.LITTLE asymmetric CPUs where
some of the processors support 32-bit EL0 mode while others don't. They
managed that well with affinity.

See: arm64: Allow mismatched 32-bit EL0 support

Thanks
>
> Regards
> Bibo
>> 
>> Thanks
>> - Jiaxun
>>

-- 
- Jiaxun



Re: [PATCH] vhost-user: Skip unnecessary duplicated VHOST_USER_SET_LOG_BASE requests

2024-07-01 Thread Alex Bennée
项文成  writes:

> From: BillXiang 
>
> The VHOST_USER_SET_LOG_BASE requests should be categorized into
> non-vring specific messages, and should be sent only once.
> If sent more than once, dpdk will munmap the old log_addr, which may
> still be in use, causing a segmentation fault.

This looks fine to me but looking at the vhost-user.rst we don't seem to
make any explicit statements about how many times given messages should
be sent.

>
> Signed-off-by: BillXiang 
> ---
>  hw/virtio/vhost-user.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index cdf9af4a4b..41e34edd49 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -371,6 +371,7 @@ static bool 
> vhost_user_per_device_request(VhostUserRequest request)
>  case VHOST_USER_RESET_DEVICE:
>  case VHOST_USER_ADD_MEM_REG:
>  case VHOST_USER_REM_MEM_REG:
> +case VHOST_USER_SET_LOG_BASE:
>  return true;
>  default:
>  return false;

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



[PATCH v2 0/7] VIRTIO-IOMMU/HostIOMMUDevice: Fixes and page size mask rework

2024-07-01 Thread Eric Auger
The first 2 patches are fixes for
cf2647a76e ("virtio-iommu: Compute host reserved regions").
They can be taken separately from the rest.

Then the series uses the HostIOMMUDevice interface to fetch
information about the page size mask supported by the assigned
device and propagate it to the virtio-iommu. This is similar
to what was done in the

VIRTIO-IOMMU/VFIO: Fix host iommu geometry handling series

but this time for the page size mask. Using this new
infrastructure allows handling page size mask conflicts
earlier, on device hotplug, as opposed to the current QEMU
abort:

qemu-system-aarch64: virtio-iommu virtio-iommu-memory-region-8-0
does not support frozen granule 0x1
qemu: hardware error: vfio: DMA mapping failed, unable to continue

With this series the hotplug nicely fails.

This also allows removing the hacks that transiently enabled IOMMU MRs
to collect cold-plugged devices' page size masks before machine
init. Those hacks were introduced by

94df5b2180d6 ("virtio-iommu: Fix 64kB host page size VFIO device
assignment")

The series can be found at:
https://github.com/eauger/qemu/tree/virtio-iommu-psmask-rework-v2

---
v1 -> v2:
- Don't update mask if the granule is frozen (Zhenzhong)
- Collected Zhenzhong's and Cédric's R-bs

Eric Auger (7):
  virtio-iommu: Fix error handling in
virtio_iommu_set_host_iova_ranges()
  vfio-container-base: Introduce vfio_container_get_iova_ranges() helper
  HostIOMMUDevice : remove Error handle from get_iova_ranges callback
  HostIOMMUDevice: Introduce get_page_size_mask() callback
  virtio-iommu : Retrieve page size mask on
virtio_iommu_set_iommu_device()
  memory: remove IOMMU MR iommu_set_page_size_mask() callback
  virtio-iommu: Revert transient enablement of IOMMU MR in bypass mode

 include/exec/memory.h |  38 
 include/hw/vfio/vfio-container-base.h |   9 ++
 include/sysemu/host_iommu_device.h|  11 ++-
 hw/vfio/common.c  |   8 --
 hw/vfio/container-base.c  |  15 
 hw/vfio/container.c   |  16 ++--
 hw/vfio/iommufd.c |  21 +++--
 hw/virtio/virtio-iommu.c  | 123 +-
 system/memory.c   |  13 ---
 hw/virtio/trace-events|   2 +-
 10 files changed, 119 insertions(+), 137 deletions(-)

-- 
2.41.0




Re: [PATCH V12 0/8] Add architecture agnostic code to support vCPU Hotplug

2024-07-01 Thread Igor Mammedov
On Wed, 26 Jun 2024 17:53:52 +
Salil Mehta  wrote:

> Hi Gavin,
> 
> >  From: Gavin Shan 
> >  Sent: Wednesday, June 26, 2024 5:13 AM
> >  To: Salil Mehta ; Igor Mammedov
> >  
> >  
> >  Hi Salil and Igor,
> >  
> >  On 6/26/24 9:51 AM, Salil Mehta wrote:  
> >  > On Wed, Jun 5, 2024 at 3:03 PM Igor Mammedov  
> >  mailto:imamm...@redhat.com>> wrote:  
> >  > On Sun, 2 Jun 2024 18:03:05 -0400
> >  > "Michael S. Tsirkin" mailto:m...@redhat.com>>  
> >  wrote:  
> >  >  
> >  >  > On Thu, May 30, 2024 at 12:42:33AM +0100, Salil Mehta wrote:  
> >  >  > > Virtual CPU hotplug support is being added across various 
> > architectures[1][3].
> >  >  > > This series adds various code bits common across all 
> > architectures:
> >  >  > >
> >  >  > > 1. vCPU creation and Parking code refactor [Patch 1]
> >  >  > > 2. Update ACPI GED framework to support vCPU Hotplug [Patch 2,3]
> >  >  > > 3. ACPI CPUs AML code change [Patch 4,5]
> >  >  > > 4. Helper functions to support unrealization of CPU objects 
> > [Patch  6,7]
> >  >  > > 5. Docs [Patch 8]
> >  >  > >
> >  >  > >
> >  >  > > Repository:
> >  >  > >
> >  >  > > [*] https://github.com/salil-mehta/qemu.git   
> >  virt-cpuhp-armv8/rfc- 
> > v3.arch.agnostic.v12
> >  >  > >
> >  >  > > NOTE: This series is meant to work in conjunction with 
> > Architecture specific patch-set.
> >  >  > > For ARM, this will work in combination of the architecture 
> > specific part based on
> >  >  > > RFC V2 [1]. This architecture specific patch-set RFC V3 shall 
> > be floated soon and is
> >  >  > > present at below location
> >  >  > >
> >  >  > > [*] 
> > https://github.com/salil-mehta/qemu/tree/virt-cpuhp-armv8/rfc-v3-rc1 
> > 
> >  >  > >  
> >  >  >
> >  >  >
> >  >  > Igor plan to take a look?  
> >  >
> >  > Yep, I plan to review it
> >  >
> >  >
> >  > A gentle reminder on this.
> >  >  
> >  
> >  Since the latest revision for this series is v13, I guess Igor needs to
> >  do the final screening on v13 instead?
> >  
> >  v13: https://lists.nongnu.org/archive/html/qemu-arm/2024-06/msg00129.html  
> 
> 
> Yes, thanks for pointing this. 😊

I have v13 tagged.
Sorry for the delay, I'll try to review it this week.

> 
> 
> >  
> >  Thanks,
> >  Gavin
> >
> 




[PATCH v2 6/7] memory: remove IOMMU MR iommu_set_page_size_mask() callback

2024-07-01 Thread Eric Auger
Everything is now in place to use the Host IOMMU Device callbacks
to retrieve the page size mask usable with a given assigned device.
This new method brings the advantage of passing the info to the
virtual IOMMU much earlier, before the IOMMU MR gets enabled. So let's
remove the call to memory_region_iommu_set_page_size_mask in
hw/vfio/common.c and remove the single implementation of the IOMMU MR
callback in virtio-iommu.c.

Signed-off-by: Eric Auger 
Reviewed-by: Cédric Le Goater 
Reviewed-by: Zhenzhong Duan 
---
 include/exec/memory.h| 38 -
 hw/vfio/common.c |  8 ---
 hw/virtio/virtio-iommu.c | 45 
 system/memory.c  | 13 
 hw/virtio/trace-events   |  1 -
 5 files changed, 105 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 0903513d13..6f9c78cc14 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -504,32 +504,6 @@ struct IOMMUMemoryRegionClass {
  * @iommu: the IOMMUMemoryRegion
  */
 int (*num_indexes)(IOMMUMemoryRegion *iommu);
-
-/**
- * @iommu_set_page_size_mask:
- *
- * Restrict the page size mask that can be supported with a given IOMMU
- * memory region. Used for example to propagate host physical IOMMU page
- * size mask limitations to the virtual IOMMU.
- *
- * Optional method: if this method is not provided, then the default global
- * page mask is used.
- *
- * @iommu: the IOMMUMemoryRegion
- *
- * @page_size_mask: a bitmask of supported page sizes. At least one bit,
- * representing the smallest page size, must be set. Additional set bits
- * represent supported block sizes. For example a host physical IOMMU that
- * uses page tables with a page size of 4kB, and supports 2MB and 4GB
- * blocks, will set mask 0x40201000. A granule of 4kB with indiscriminate
- * block sizes is specified with mask 0xf000.
- *
- * Returns 0 on success, or a negative error. In case of failure, the error
- * object must be created.
- */
- int (*iommu_set_page_size_mask)(IOMMUMemoryRegion *iommu,
- uint64_t page_size_mask,
- Error **errp);
 };
 
 typedef struct RamDiscardListener RamDiscardListener;
@@ -1919,18 +1893,6 @@ int memory_region_iommu_attrs_to_index(IOMMUMemoryRegion 
*iommu_mr,
  */
 int memory_region_iommu_num_indexes(IOMMUMemoryRegion *iommu_mr);
 
-/**
- * memory_region_iommu_set_page_size_mask: set the supported page
- * sizes for a given IOMMU memory region
- *
- * @iommu_mr: IOMMU memory region
- * @page_size_mask: supported page size mask
- * @errp: pointer to Error*, to store an error if it happens.
- */
-int memory_region_iommu_set_page_size_mask(IOMMUMemoryRegion *iommu_mr,
-   uint64_t page_size_mask,
-   Error **errp);
-
 /**
  * memory_region_name: get a memory region's name
  *
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7cdb969fd3..6d15b36e0b 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -622,14 +622,6 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 int128_get64(llend),
 iommu_idx);
 
-ret = memory_region_iommu_set_page_size_mask(giommu->iommu_mr,
- bcontainer->pgsizes,
- &err);
-if (ret) {
-g_free(giommu);
-goto fail;
-}
-
 ret = memory_region_register_iommu_notifier(section->mr, &giommu->n,
 &err);
 if (ret) {
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 7d5db554af..dc337a6805 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -1365,50 +1365,6 @@ static int 
virtio_iommu_notify_flag_changed(IOMMUMemoryRegion *iommu_mr,
 return 0;
 }
 
-/*
- * The default mask depends on the "granule" property. For example, with
- * 4k granule, it is -(4 * KiB). When an assigned device has page size
- * restrictions due to the hardware IOMMU configuration, apply this restriction
- * to the mask.
- */
-static int virtio_iommu_set_page_size_mask(IOMMUMemoryRegion *mr,
-   uint64_t new_mask,
-   Error **errp)
-{
-IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
-VirtIOIOMMU *s = sdev->viommu;
-uint64_t cur_mask = s->config.page_size_mask;
-
-trace_virtio_iommu_set_page_size_mask(mr->parent_obj.name, cur_mask,
-  new_mask);
-
-if ((cur_mask & new_mask) == 0) {
-error_setg(errp, "virtio-iommu %s reports a page size mask 0x%"PRIx64
-   " incompatible with currently

[PATCH v2 3/7] HostIOMMUDevice : remove Error handle from get_iova_ranges callback

2024-07-01 Thread Eric Auger
The error handle argument is not used anywhere, so let's remove it.

Signed-off-by: Eric Auger 
Reviewed-by: Cédric Le Goater 
Reviewed-by: Zhenzhong Duan 
---
 include/sysemu/host_iommu_device.h | 3 +--
 hw/vfio/container.c| 2 +-
 hw/vfio/iommufd.c  | 2 +-
 hw/virtio/virtio-iommu.c   | 2 +-
 4 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/include/sysemu/host_iommu_device.h 
b/include/sysemu/host_iommu_device.h
index ee6c813c8b..05c7324a0d 100644
--- a/include/sysemu/host_iommu_device.h
+++ b/include/sysemu/host_iommu_device.h
@@ -87,9 +87,8 @@ struct HostIOMMUDeviceClass {
  * @hiod Host IOMMU device
  *
  * @hiod: handle to the host IOMMU device
- * @errp: error handle
  */
-GList* (*get_iova_ranges)(HostIOMMUDevice *hiod, Error **errp);
+GList* (*get_iova_ranges)(HostIOMMUDevice *hiod);
 };
 
 /*
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 2ad57cd845..adeab1ac89 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -1166,7 +1166,7 @@ static int hiod_legacy_vfio_get_cap(HostIOMMUDevice 
*hiod, int cap,
 }
 
 static GList *
-hiod_legacy_vfio_get_iova_ranges(HostIOMMUDevice *hiod, Error **errp)
+hiod_legacy_vfio_get_iova_ranges(HostIOMMUDevice *hiod)
 {
 VFIODevice *vdev = hiod->agent;
 
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 890d8d6a38..211e7223f1 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -644,7 +644,7 @@ static bool hiod_iommufd_vfio_realize(HostIOMMUDevice 
*hiod, void *opaque,
 }
 
 static GList *
-hiod_iommufd_vfio_get_iova_ranges(HostIOMMUDevice *hiod, Error **errp)
+hiod_iommufd_vfio_get_iova_ranges(HostIOMMUDevice *hiod)
 {
 VFIODevice *vdev = hiod->agent;
 
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index b708fb96fd..b8f75d2b1a 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -615,7 +615,7 @@ static bool virtio_iommu_set_iommu_device(PCIBus *bus, void 
*opaque, int devfn,
 
 if (hiodc->get_iova_ranges) {
 int ret;
-host_iova_ranges = hiodc->get_iova_ranges(hiod, errp);
+host_iova_ranges = hiodc->get_iova_ranges(hiod);
 if (!host_iova_ranges) {
 return true; /* some old kernels may not support that capability */
 }
-- 
2.41.0




[PATCH v2 4/7] HostIOMMUDevice: Introduce get_page_size_mask() callback

2024-07-01 Thread Eric Auger
This callback will be used to retrieve the page size mask supported
by a given Host IOMMU device.

Signed-off-by: Eric Auger 
Reviewed-by: Zhenzhong Duan 
Reviewed-by: Cédric Le Goater 
---
 include/hw/vfio/vfio-container-base.h |  7 +++
 include/sysemu/host_iommu_device.h|  8 
 hw/vfio/container.c   | 10 ++
 hw/vfio/iommufd.c | 11 +++
 4 files changed, 36 insertions(+)

diff --git a/include/hw/vfio/vfio-container-base.h 
b/include/hw/vfio/vfio-container-base.h
index 45d7c40fce..62a8b60d87 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -88,6 +88,13 @@ int vfio_container_query_dirty_bitmap(const 
VFIOContainerBase *bcontainer,
 
 GList *vfio_container_get_iova_ranges(const VFIOContainerBase *bcontainer);
 
+static inline uint64_t
+vfio_container_get_page_size_mask(const VFIOContainerBase *bcontainer)
+{
+assert(bcontainer);
+return bcontainer->pgsizes;
+}
+
 #define TYPE_VFIO_IOMMU "vfio-iommu"
 #define TYPE_VFIO_IOMMU_LEGACY TYPE_VFIO_IOMMU "-legacy"
 #define TYPE_VFIO_IOMMU_SPAPR TYPE_VFIO_IOMMU "-spapr"
diff --git a/include/sysemu/host_iommu_device.h 
b/include/sysemu/host_iommu_device.h
index 05c7324a0d..c1bf74ae2c 100644
--- a/include/sysemu/host_iommu_device.h
+++ b/include/sysemu/host_iommu_device.h
@@ -89,6 +89,14 @@ struct HostIOMMUDeviceClass {
  * @hiod: handle to the host IOMMU device
  */
 GList* (*get_iova_ranges)(HostIOMMUDevice *hiod);
+/**
+ *
+ * @get_page_size_mask: Return the page size mask supported along this
+ * @hiod Host IOMMU device
+ *
+ * @hiod: handle to the host IOMMU device
+ */
+uint64_t (*get_page_size_mask)(HostIOMMUDevice *hiod);
 };
 
 /*
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index adeab1ac89..b5ce559a0d 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -1174,6 +1174,15 @@ hiod_legacy_vfio_get_iova_ranges(HostIOMMUDevice *hiod)
 return vfio_container_get_iova_ranges(vdev->bcontainer);
 }
 
+static uint64_t
+hiod_legacy_vfio_get_page_size_mask(HostIOMMUDevice *hiod)
+{
+VFIODevice *vdev = hiod->agent;
+
+g_assert(vdev);
+return vfio_container_get_page_size_mask(vdev->bcontainer);
+}
+
 static void vfio_iommu_legacy_instance_init(Object *obj)
 {
 VFIOContainer *container = VFIO_IOMMU_LEGACY(obj);
@@ -1188,6 +1197,7 @@ static void hiod_legacy_vfio_class_init(ObjectClass *oc, 
void *data)
 hioc->realize = hiod_legacy_vfio_realize;
 hioc->get_cap = hiod_legacy_vfio_get_cap;
 hioc->get_iova_ranges = hiod_legacy_vfio_get_iova_ranges;
+hioc->get_page_size_mask = hiod_legacy_vfio_get_page_size_mask;
 };
 
 static const TypeInfo types[] = {
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 211e7223f1..7b5f87a148 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -652,12 +652,23 @@ hiod_iommufd_vfio_get_iova_ranges(HostIOMMUDevice *hiod)
 return vfio_container_get_iova_ranges(vdev->bcontainer);
 }
 
+static uint64_t
+hiod_iommufd_vfio_get_page_size_mask(HostIOMMUDevice *hiod)
+{
+VFIODevice *vdev = hiod->agent;
+
+g_assert(vdev);
+return vfio_container_get_page_size_mask(vdev->bcontainer);
+}
+
+
 static void hiod_iommufd_vfio_class_init(ObjectClass *oc, void *data)
 {
 HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_CLASS(oc);
 
 hiodc->realize = hiod_iommufd_vfio_realize;
 hiodc->get_iova_ranges = hiod_iommufd_vfio_get_iova_ranges;
+hiodc->get_page_size_mask = hiod_iommufd_vfio_get_page_size_mask;
 };
 
 static const TypeInfo types[] = {
-- 
2.41.0




Re: [PATCH v3 1/4] hw/intc: Remove loongarch_ipi.c

2024-07-01 Thread maobibo




On 2024/7/1 4:29 PM, Jiaxun Yang wrote:



On July 1, 2024 at 8:22 AM, maobibo wrote:

On 2024/7/1 3:01 PM, Jiaxun Yang wrote:



On July 1, 2024 at 7:44 AM, maobibo wrote:

Also, this patch is problematic on LoongArch.

The original patch was to search by physical cpuid rather than logical cpuid.

We want to make the ipi module better and better, however now it comes back
to its initial state :(


Isn't arch_id the "physical id" you want? "cs->cpu_index" is the logical ID
for QEMU.

arch_id is set up by arch code, like the APIC ID for x86.

I had come across the old ipi_getcpu implementation, and I'm sure we were
looking at arch_id as well.

So, where is the implementation of get_arch_id() that looks up a
vcpu by physical index?


Hi Bibo,

cpu_by_arch_id will be redirected to:

```
CPUState *cpu_by_arch_id(int64_t id)
{
 CPUState *cpu;

 CPU_FOREACH(cpu) {
 CPUClass *cc = CPU_GET_CLASS(cpu);

 if (cc->get_arch_id(cpu) == id) {
 return cpu;
 }
 }
 return NULL;
}
```

It iterates over all vcpus and returns the CPUState with the corresponding arch_id.

Whereas, for LoongArch's get_arch_id implementation:
```
static int64_t loongarch_cpu_get_arch_id(CPUState *cs)
{
 LoongArchCPU *cpu = LOONGARCH_CPU(cs);

 return cpu->phy_id;
}
```
I believe it matches our intention here.

Yes, you are right.

Got it, I missed the architecture-specific implementation, and thanks for
patiently pointing it out.


Regards


Thanks



Regards
Bibo Mao



Thanks
- Jiaxun


commit 03ca348b6b9038ce284916b36c19f700ac0ce7a6
Author: Jiaxun Yang 
Date:   Wed Jun 5 10:04:27 2024

   hw/intc/loongson_ipi: Replace ipi_getcpu with cpu_by_arch_id

   cpu_by_arch_id is doing the same thing as our ipi_getcpu logic.

   Signed-off-by: Jiaxun Yang 
   Reviewed-by: Song Gao 
   Message-ID: <20240605-loongson3-ipi-v3-4-ddd2c0e03...@flygoat.com>
   Signed-off-by: Philippe Mathieu-Daudé 


Regards
Bibo Mao








[PATCH v2 1/7] virtio-iommu: Fix error handling in virtio_iommu_set_host_iova_ranges()

2024-07-01 Thread Eric Auger
In case no IOMMUPciBus/IOMMUDevice is found, we need to properly
set the error handle and return.

Fixes: Coverity CID 1549006
Signed-off-by: Eric Auger 
Fixes: cf2647a76e ("virtio-iommu: Compute host reserved regions")
Reviewed-by: Cédric Le Goater 
Reviewed-by: Zhenzhong Duan 
---
 hw/virtio/virtio-iommu.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index b9a7ddcd14..b708fb96fd 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -543,10 +543,15 @@ static int virtio_iommu_set_host_iova_ranges(VirtIOIOMMU 
*s, PCIBus *bus,
 int ret = -EINVAL;
 
 if (!sbus) {
-error_report("%s no sbus", __func__);
+error_setg(errp, "%s: no IOMMUPciBus found!", __func__);
+return ret;
 }
 
 sdev = sbus->pbdev[devfn];
+if (!sdev) {
+error_setg(errp, "%s: no IOMMUDevice found!", __func__);
+return ret;
+}
 
 current_ranges = sdev->host_resv_ranges;
 
-- 
2.41.0




[PATCH v2 2/7] vfio-container-base: Introduce vfio_container_get_iova_ranges() helper

2024-07-01 Thread Eric Auger
Introduce vfio_container_get_iova_ranges() to retrieve the usable
IOVA regions of the base container and use it in the Host IOMMU
device implementations of get_iova_ranges() callback.

We also fix a use-after-free bug: the list was shallow copied, while
g_list_free_full() was used both at the single call site, in
virtio_iommu_set_iommu_device(), and in
vfio_container_instance_finalize(). Instead, use g_list_copy_deep().
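The shallow-vs-deep distinction is easy to demonstrate outside C (an illustrative Python analogue, not QEMU code): a shallow copy duplicates only the list cells, so both lists still point at the same Range payloads, which is exactly why freeing the payloads through both lists is a double free.

```python
import copy

# Stand-ins for heap-allocated Range structs: [low, high] IOVA bounds.
ranges = [[0x0000, 0xFFFF], [0x100000, 0x1FFFFF]]

shallow = list(ranges)           # like g_list_copy(): new cells, shared payloads
deep = copy.deepcopy(ranges)     # like g_list_copy_deep(): payloads duplicated too

shallow[0][1] = 0xDEAD           # mutating a payload through the shallow copy...
assert ranges[0][1] == 0xDEAD    # ...is visible through the original list

deep[1][0] = 0x0
assert ranges[1][0] == 0x100000  # the deep copy owns its payloads independently
```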

Fixes: cf2647a76e ("virtio-iommu: Compute host reserved regions")
Signed-off-by: Eric Auger 
Suggested-by: Cédric Le Goater 
Reviewed-by: Zhenzhong Duan 
Reviewed-by: Cédric Le Goater 
---
 include/hw/vfio/vfio-container-base.h |  2 ++
 hw/vfio/container-base.c  | 15 +++
 hw/vfio/container.c   |  8 +---
 hw/vfio/iommufd.c |  8 +---
 4 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/include/hw/vfio/vfio-container-base.h 
b/include/hw/vfio/vfio-container-base.h
index 419e45ee7a..45d7c40fce 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -86,6 +86,8 @@ int vfio_container_set_dirty_page_tracking(VFIOContainerBase 
*bcontainer,
 int vfio_container_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
VFIOBitmap *vbmap, hwaddr iova, hwaddr size, Error **errp);
 
+GList *vfio_container_get_iova_ranges(const VFIOContainerBase *bcontainer);
+
 #define TYPE_VFIO_IOMMU "vfio-iommu"
 #define TYPE_VFIO_IOMMU_LEGACY TYPE_VFIO_IOMMU "-legacy"
 #define TYPE_VFIO_IOMMU_SPAPR TYPE_VFIO_IOMMU "-spapr"
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 50b1664f89..809b157674 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -83,6 +83,21 @@ int vfio_container_query_dirty_bitmap(const 
VFIOContainerBase *bcontainer,
errp);
 }
 
+static gpointer copy_iova_range(gconstpointer src, gpointer data)
+{
+ Range *source = (Range *)src;
+ Range *dest = g_new(Range, 1);
+
+ range_set_bounds(dest, range_lob(source), range_upb(source));
+ return dest;
+}
+
+GList *vfio_container_get_iova_ranges(const VFIOContainerBase *bcontainer)
+{
+assert(bcontainer);
+return g_list_copy_deep(bcontainer->iova_ranges, copy_iova_range, NULL);
+}
+
 static void vfio_container_instance_finalize(Object *obj)
 {
 VFIOContainerBase *bcontainer = VFIO_IOMMU(obj);
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 2e7ecdf10e..2ad57cd845 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -1169,15 +1169,9 @@ static GList *
 hiod_legacy_vfio_get_iova_ranges(HostIOMMUDevice *hiod, Error **errp)
 {
 VFIODevice *vdev = hiod->agent;
-GList *l = NULL;
 
 g_assert(vdev);
-
-if (vdev->bcontainer) {
-l = g_list_copy(vdev->bcontainer->iova_ranges);
-}
-
-return l;
+return vfio_container_get_iova_ranges(vdev->bcontainer);
 }
 
 static void vfio_iommu_legacy_instance_init(Object *obj)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index c2f158e603..890d8d6a38 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -647,15 +647,9 @@ static GList *
 hiod_iommufd_vfio_get_iova_ranges(HostIOMMUDevice *hiod, Error **errp)
 {
 VFIODevice *vdev = hiod->agent;
-GList *l = NULL;
 
 g_assert(vdev);
-
-if (vdev->bcontainer) {
-l = g_list_copy(vdev->bcontainer->iova_ranges);
-}
-
-return l;
+return vfio_container_get_iova_ranges(vdev->bcontainer);
 }
 
 static void hiod_iommufd_vfio_class_init(ObjectClass *oc, void *data)
-- 
2.41.0




[PATCH v2 7/7] virtio-iommu: Revert transient enablement of IOMMU MR in bypass mode

2024-07-01 Thread Eric Auger
In 94df5b2180d6 ("virtio-iommu: Fix 64kB host page size VFIO device
assignment"), in case of bypass mode, we transiently enabled the
IOMMU MR to allow set_page_size_mask() to be called and pass
information about the page size mask constraints of cold-plugged
VFIO devices. Now that we do not use the IOMMU MR callback anymore, we
can just get rid of this hack.

Signed-off-by: Eric Auger 
Reviewed-by: Cédric Le Goater 
Reviewed-by: Zhenzhong Duan 
---
 hw/virtio/virtio-iommu.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index dc337a6805..72011d2d11 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -1387,18 +1387,6 @@ static void virtio_iommu_freeze_granule(Notifier 
*notifier, void *data)
 VirtIOIOMMU *s = container_of(notifier, VirtIOIOMMU, machine_done);
 int granule;
 
-if (likely(s->config.bypass)) {
-/*
- * Transient IOMMU MR enable to collect page_size_mask requirements
- * through memory_region_iommu_set_page_size_mask() called by
- * VFIO region_add() callback
- */
- s->config.bypass = false;
- virtio_iommu_switch_address_space_all(s);
- /* restore default */
- s->config.bypass = true;
- virtio_iommu_switch_address_space_all(s);
-}
 s->granule_frozen = true;
 granule = ctz64(s->config.page_size_mask);
 trace_virtio_iommu_freeze_granule(BIT_ULL(granule));
-- 
2.41.0




[PATCH v2 5/7] virtio-iommu : Retrieve page size mask on virtio_iommu_set_iommu_device()

2024-07-01 Thread Eric Auger
Retrieve the Host IOMMU Device page size mask when the latter is set.
This allows getting the information much sooner than when relying on
the IOMMU MR set_page_size_mask() call, which happens when the IOMMU MR
gets enabled. We introduce a check_page_size_mask() helper whose code
is inherited from the current virtio_iommu_set_page_size_mask()
implementation. That callback will be removed in a subsequent patch.

Signed-off-by: Eric Auger 

---

v1 -> v2:
- do not update the mask if the granule is frozen (Zhenzhong)
---
 hw/virtio/virtio-iommu.c | 57 ++--
 hw/virtio/trace-events   |  1 +
 2 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index b8f75d2b1a..7d5db554af 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -598,9 +598,39 @@ out:
 return ret;
 }
 
+static bool check_page_size_mask(VirtIOIOMMU *viommu, uint64_t new_mask,
+ Error **errp)
+{
+uint64_t cur_mask = viommu->config.page_size_mask;
+
+if ((cur_mask & new_mask) == 0) {
+error_setg(errp, "virtio-iommu reports a page size mask 0x%"PRIx64
+   " incompatible with currently supported mask 0x%"PRIx64,
+   new_mask, cur_mask);
+return false;
+}
+/*
+ * Once the granule is frozen we can't change the mask anymore. If by
+ * chance the hotplugged device supports the same granule, we can still
+ * accept it.
+ */
+if (viommu->granule_frozen) {
+int cur_granule = ctz64(cur_mask);
+
+if (!(BIT_ULL(cur_granule) & new_mask)) {
+error_setg(errp,
+   "virtio-iommu does not support frozen granule 0x%llx",
+   BIT_ULL(cur_granule));
+return false;
+}
+}
+return true;
+}
+
 static bool virtio_iommu_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
   HostIOMMUDevice *hiod, Error **errp)
 {
+ERRP_GUARD();
 VirtIOIOMMU *viommu = opaque;
 HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
 struct hiod_key *new_key;
@@ -623,8 +653,28 @@ static bool virtio_iommu_set_iommu_device(PCIBus *bus, 
void *opaque, int devfn,
 hiod->aliased_devfn,
 host_iova_ranges, errp);
 if (ret) {
-g_list_free_full(host_iova_ranges, g_free);
-return false;
+goto error;
+}
+}
+if (hiodc->get_page_size_mask) {
+uint64_t new_mask = hiodc->get_page_size_mask(hiod);
+
+if (check_page_size_mask(viommu, new_mask, errp)) {
+/*
+ * The default mask depends on the "granule" property. For example,
+ * with 4k granule, it is -(4 * KiB). When an assigned device has
+ * page size restrictions due to the hardware IOMMU configuration,
+ * apply this restriction to the mask.
+ */
+trace_virtio_iommu_update_page_size_mask(hiod->name,
+ 
viommu->config.page_size_mask,
+ new_mask);
+if (!viommu->granule_frozen) {
+viommu->config.page_size_mask &= new_mask;
+}
+} else {
+error_prepend(errp, "%s: ", hiod->name);
+goto error;
 }
 }
 
@@ -637,6 +687,9 @@ static bool virtio_iommu_set_iommu_device(PCIBus *bus, void 
*opaque, int devfn,
 g_list_free_full(host_iova_ranges, g_free);
 
 return true;
+error:
+g_list_free_full(host_iova_ranges, g_free);
+return false;
 }
 
 static void
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 3cf84e04a7..599d855ff6 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -132,6 +132,7 @@ virtio_iommu_notify_map(const char *name, uint64_t 
virt_start, uint64_t virt_end
 virtio_iommu_notify_unmap(const char *name, uint64_t virt_start, uint64_t 
virt_end) "mr=%s virt_start=0x%"PRIx64" virt_end=0x%"PRIx64
 virtio_iommu_remap(const char *name, uint64_t virt_start, uint64_t virt_end, 
uint64_t phys_start) "mr=%s virt_start=0x%"PRIx64" virt_end=0x%"PRIx64" 
phys_start=0x%"PRIx64
 virtio_iommu_set_page_size_mask(const char *name, uint64_t old, uint64_t new) 
"mr=%s old_mask=0x%"PRIx64" new_mask=0x%"PRIx64
+virtio_iommu_update_page_size_mask(const char *name, uint64_t old, uint64_t 
new) "host iommu device=%s old_mask=0x%"PRIx64" new_mask=0x%"PRIx64
 virtio_iommu_notify_flag_add(const char *name) "add notifier to mr %s"
 virtio_iommu_notify_flag_del(const char *name) "del notifier from mr %s"
 virtio_iommu_switch_address_space(uint8_t bus, uint8_t slot, uint8_t fn, bool 
on) "Device %02x:%02x.%x switching address space (iommu enabled=%d)"
-- 
2.41.0




Re: [PATCH v3 08/11] hw/acpi: Generic Port Affinity Structure support

2024-07-01 Thread Igor Mammedov
On Thu, 20 Jun 2024 17:03:16 +0100
Jonathan Cameron  wrote:

> These are very similar to the recently added Generic Initiators
> but instead of representing an initiator of memory traffic they
> represent an edge point beyond which may lie either targets or
> initiators.  Here we add these ports such that they may
> be targets of hmat_lb records to describe the latency and
> bandwidth from host side initiators to the port.  A discoverable
> mechanism such as UEFI CDAT read from CXL devices and switches
> is used to discover the remainder of the path, and the OS can build
> up full latency and bandwidth numbers as need for work and data
> placement decisions.
> 
> Acked-by: Markus Armbruster 
> Signed-off-by: Jonathan Cameron 
> ---
> v3: Move to hw/acpi/pci.c
> Rename the function that actually registers both types
> of generic nodes to reflect that it isn't GI-only.
> Note that the qom part is unchanged and other changes are mostly
> code movement so I've kept Markus' Ack.
> ---
>  qapi/qom.json|  34 
>  include/hw/acpi/acpi_generic_initiator.h |  35 
>  include/hw/acpi/aml-build.h  |   4 +
>  include/hw/acpi/pci.h|   3 +-
>  include/hw/pci/pci_bridge.h  |   1 +
>  hw/acpi/acpi_generic_initiator.c | 216 +++
>  hw/acpi/aml-build.c  |  40 +
>  hw/acpi/pci.c| 110 +++-
>  hw/arm/virt-acpi-build.c |   2 +-
>  hw/i386/acpi-build.c |   2 +-
>  hw/pci-bridge/pci_expander_bridge.c  |   1 -
>  11 files changed, 443 insertions(+), 5 deletions(-)

this is quite a large patch; is it possible to split it into
a set of smaller patches?

> diff --git a/qapi/qom.json b/qapi/qom.json
> index 8bd299265e..8fa6bbd9a7 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -826,6 +826,38 @@
>'data': { 'pci-dev': 'str',
>  'node': 'uint32' } }
>  
> +##
> +# @AcpiGenericPortProperties:
> +#
> +# Properties for acpi-generic-port objects.
> +#
> +# @pci-bus: QOM path of the PCI bus of the hostbridge associated with
> +# this SRAT Generic Port Affinity Structure.  This is the same as
> +# the bus parameter for the root ports attached to this host
> +# bridge.  The resulting SRAT Generic Port Affinity Structure will
> +# refer to the ACPI object in DSDT that represents the host bridge
> +# (e.g.  ACPI0016 for CXL host bridges).  See ACPI 6.5 Section
> +# 5.2.16.7 for more information.
> +#
> +# @node: Similar to a NUMA node ID, but instead of providing a
> +# reference point used for defining NUMA distances and access
> +# characteristics to memory or from an initiator (e.g. CPU), this
> +# node defines the boundary point between non-discoverable system
> +# buses which must be described by firmware, and a discoverable
> +# bus.  NUMA distances and access characteristics are defined to
> +# and from that point.  For system software to establish full
> +# initiator to target characteristics this information must be
> +# combined with information retrieved from the discoverable part
> +# of the path.  An example would use CDAT (see UEFI.org)
> +# information read from devices and switches in conjunction with
> +# link characteristics read from PCIe Configuration space.
> +#
> +# Since: 9.1
> +##
> +{ 'struct': 'AcpiGenericPortProperties',
> +  'data': { 'pci-bus': 'str',
> +'node': 'uint32' } }
> +
>  ##
>  # @RngProperties:
>  #
> @@ -1019,6 +1051,7 @@
>  { 'enum': 'ObjectType',
>'data': [
>  'acpi-generic-initiator',
> +'acpi-generic-port',
>  'authz-list',
>  'authz-listfile',
>  'authz-pam',
> @@ -1092,6 +1125,7 @@
>'discriminator': 'qom-type',
>'data': {
>'acpi-generic-initiator': 'AcpiGenericInitiatorProperties',
> +  'acpi-generic-port':  'AcpiGenericPortProperties',
>'authz-list': 'AuthZListProperties',
>'authz-listfile': 'AuthZListFileProperties',
>'authz-pam':  'AuthZPAMProperties',
> diff --git a/include/hw/acpi/acpi_generic_initiator.h 
> b/include/hw/acpi/acpi_generic_initiator.h
> new file mode 100644
> index 00..92a39ad303
> --- /dev/null
> +++ b/include/hw/acpi/acpi_generic_initiator.h
> @@ -0,0 +1,35 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#ifndef ACPI_GENERIC_INITIATOR_H
> +#define ACPI_GENERIC_INITIATOR_H
> +
> +#include "qom/object_interfaces.h"
> +
> +#define TYPE_ACPI_GENERIC_INITIATOR "acpi-generic-initiator"
> +
> +typedef struct AcpiGenericInitiator {
> +/* private */
> +Object parent;
> +
> +/* public */
> +char *pci_dev;
> +uint16_t node;
> +} AcpiGenericInitiator;
> +
> +#define TYPE_ACPI_GENERIC_PORT "acpi-generic-port"
> +
> +typedef struct AcpiGene

Re: [PATCH] tcg/optimize: Fix TCG_COND_TST* simplification of setcond2

2024-07-01 Thread Alex Bennée
Richard Henderson  writes:

> Fix a typo in the argument movement.
>
> Cc: qemu-sta...@nongnu.org
> Fixes: ceb9ee06b71 ("tcg/optimize: Handle TCG_COND_TST{EQ,NE}")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2413
> Signed-off-by: Richard Henderson 
> ---
>  tcg/optimize.c   |  2 +-
>  tests/tcg/x86_64/test-2413.c | 30 ++
>  2 files changed, 31 insertions(+), 1 deletion(-)
>  create mode 100644 tests/tcg/x86_64/test-2413.c
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 8886f7037a..ba16ec27e2 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -2384,7 +2384,7 @@ static bool fold_setcond2(OptContext *ctx, TCGOp *op)
>  
>  case TCG_COND_TSTEQ:
>  case TCG_COND_TSTNE:
> -if (arg_is_const_val(op->args[2], 0)) {
> +if (arg_is_const_val(op->args[3], 0)) {
>  goto do_setcond_high;
>  }
>  if (arg_is_const_val(op->args[4], 0)) {
> diff --git a/tests/tcg/x86_64/test-2413.c b/tests/tcg/x86_64/test-2413.c
> new file mode 100644
> index 00..a0e4d25093
> --- /dev/null
> +++ b/tests/tcg/x86_64/test-2413.c
> @@ -0,0 +1,30 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/* Copyright 2024 Linaro, Ltd. */
> +/* See https://gitlab.com/qemu-project/qemu/-/issues/2413 */
> +
> +#include 

This also needs:

X86_64_TESTS += test-2413

in the Makefile.

> +
> +void test(unsigned long *a, unsigned long *d, unsigned long c)
> +{
> +asm("xorl %%eax, %%eax\n\t"
> +"xorl %%edx, %%edx\n\t"
> +"testb $0x20, %%cl\n\t"
> +"sete %%al\n\t"
> +"setne %%dl\n\t"
> +"shll %%cl, %%eax\n\t"
> +"shll %%cl, %%edx\n\t"
> +: "=a"(*a), "=d"(*d)
> +: "c"(c));
> +}
> +
> +int main(void)
> +{
> +long a, c, d;
> +
> +for (c = 0; c < 64; c++) {
> +test(&a, &d, c);
> +assert(a == (c & 0x20 ? 0 : 1u << (c & 0x1f)));
> +assert(d == (c & 0x20 ? 1u << (c & 0x1f) : 0));
> +}
> +return 0;
> +}

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH] scripts/qcow2-to-stdout.py: Add script to write qcow2 images to stdout

2024-07-01 Thread Alberto Garcia
On Wed 12 Jun 2024 02:00:19 PM +03, Manos Pitsidianakis wrote:

Hi, thanks for the review and sorry for taking so long to reply, I was
on vacation.

>> scripts/qcow2-to-stdout.py | 330 +
>> 1 file changed, 330 insertions(+)
>> create mode 100755 scripts/qcow2-to-stdout.py
>
> I recommend running the `black` formatter on this script, it makes the
> code more diff-friendly and uniform. Also it has become the de-facto
> python style.

Hmmm, I don't like how it reformats some of the lines. However other
changes do make sense, so I'll apply those.

> Also, it's more pythonic to name constants in uppercase, like
> allocated_l2_tables. You can check such lints with pylint
> scripts/qcow2-to-stdout.py

allocated_l2_tables is not a constant :-?

>>+struct.pack_into('>I', header, 0x70, 0x6803f857)
>>+struct.pack_into('>I', header, 0x74, len(qcow2_features) * 48)
>>+cur_offset = 0x78
>
> Minor comment: extract magic values/offsets into constant globals with
> descriptive names, it'd help the code be more readable and easier to
> maintain if ported in the future to other formats.

Good idea, will do.

>>+for (feature_type, feature_bit, feature_name) in qcow2_features:
>>+struct.pack_into('>BB46s', header, cur_offset,
>>+ feature_type, feature_bit, 
>>feature_name.encode('ascii'))
>>+cur_offset += 48
>>+
>
>>From here onwards put everything under a main block like so:

Ok.

>>+# Command-line arguments
>>+parser = argparse.ArgumentParser(description='This program converts a QEMU 
>>disk image to qcow2 '
>>+ 'and writes it to the standard output')
>>+parser.add_argument('input_file', help='name of the input file')
>
> Suggestion:
>
> -parser.add_argument('input_file', help='name of the input file')
> +parser.add_argument('input_file', help='name of the input file', 
> type=pathlib.Path, required=True)

'required' is not valid in positional arguments, and I'm not sure what
benefits using pathlib brings in this case.
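
For the record, argparse itself rejects that suggestion at runtime — a minimal
sketch (plain Python, independent of the script under review):

```python
import argparse

parser = argparse.ArgumentParser()
try:
    # Positional arguments are always required; passing required= is an error.
    parser.add_argument("input_file", required=True)
except TypeError as e:
    print(e)  # 'required' is an invalid argument for positionals
```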

>>+parser.add_argument('-v', dest='qcow2_version', metavar='qcow2_version',
>
> Maybe -q instead of -v? No strong feelings on this one, it's just that 
> -v is usually version. -q is also usually --quiet so not sure...

Yeah, I thought the same but I didn't want to complicate this too much,
this is just a helper script.

>>+# If the input file is not in raw format we can use
>>+# qemu-storage-daemon to make it appear as such
>>+if input_format != 'raw':
>>+temp_dir = tempfile.mkdtemp()
>
> Consider using the tempfile.TemporaryDirectory as with... context 
> manager so that the temp dir cleanup is performed automatically

I don't think I can do that directly here because the temp dir has to
live until the very end (qemu-storage-daemon needs it).

>>+pid_file = temp_dir + "/pid"
>>+raw_file = temp_dir + "/raw"
>>+open(raw_file, 'wb').close()
>
> Consider using a with open(...) open manager for opening the file

How would that be? Like this?

with open(raw_file, 'wb'):
    pass

If so I don't see the benefit, I just need to create an empty file and
close it immediately.

>>+atexit.register(kill_storage_daemon, pid_file, raw_file, temp_dir)
>
> Hm, this too could be a context manager. Seems very C-like to use
> atexit here.

Yeah it is, but I think that using the context manager would require me
to split the main function in two, and I'm not sure that it's worth it
for this case. Other Python scripts in the QEMU repo use atexit already.
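
For illustration, the atexit pattern in question reduces to something like the
sketch below (names are hypothetical; the real script also kills
qemu-storage-daemon via its pid file before removing the directory):

```python
import atexit
import shutil
import tempfile

temp_dir = tempfile.mkdtemp()

def cleanup(path):
    # Registered once; runs at interpreter exit, i.e. after the daemon
    # (which keeps files inside the directory open) is done with it.
    shutil.rmtree(path, ignore_errors=True)

atexit.register(cleanup, temp_dir)
```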

>>+ret = subprocess.run(["qemu-storage-daemon", "--daemonize", "--pidfile", 
>>pid_file,
>>+  "--blockdev", 
>>f"driver=file,node-name=file0,driver=file,filename={input_file},read-only=on",
>>+  "--blockdev", 
>>f"driver={input_format},node-name=disk0,file=file0,read-only=on",
>>+  "--export", 
>>f"type=fuse,id=export0,node-name=disk0,mountpoint={raw_file},writable=off"])
>
> You can add shell=True, check=False arguments to subprocess.run() so 
> that it captures the outputs. (check=False is the default behavior, but 
> better make it explicit)

I'm not sure that I understand, why would I need to use a shell here?
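
(For reference: shell=True by itself does not capture anything;
capture_output=True is the flag that collects stdout/stderr, no shell needed —
a minimal sketch:)

```python
import subprocess

# capture_output=True collects stdout/stderr; check=True raises
# CalledProcessError on a non-zero exit status. No shell is involved.
result = subprocess.run(["echo", "hello"],
                        capture_output=True, text=True, check=True)
print(result.stdout.strip())  # hello
```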

>>+sys.stdout.buffer.write(cluster)
>
> Would it be a good idea to check if stdout is a tty and not a
> pipe/redirection? You can check it with isatty() and error out to
> prevent printing binary to the terminal.

Yeah this is a good idea, thanks.
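
Something along these lines, perhaps (a sketch, not the final patch):

```python
import sys

def refuse_tty_stdout():
    # Guard before streaming the qcow2 image: bail out rather than dump
    # raw binary data to an interactive terminal.
    if sys.stdout.isatty():
        sys.exit("refusing to write binary data to a terminal; redirect stdout")
```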

Berto



Re: [PATCH] virtio: remove virtio_tswap16s() call in vring_packed_event_read()

2024-07-01 Thread Peter Maydell
On Mon, 1 Jul 2024 at 08:52, Stefano Garzarella  wrote:
>
> Commit d152cdd6f6 ("virtio: use virtio accessor to access packed event")
> switched using of address_space_read_cached() to virito_lduw_phys_cached()
> to access packed descriptor event.
>
> When we used address_space_read_cached(), we needed to call
> virtio_tswap16s() to handle the endianess of the field, but
> virito_lduw_phys_cached() already handles it internally, so we no longer
> need to call virtio_tswap16s() (as the commit had done for `off_wrap`,
> but forgot for `flags`).
>
> Fixes: d152cdd6f6 ("virtio: use virtio accessor to access packed event")
> Cc: jasow...@redhat.com
> Cc: qemu-sta...@nongnu.org
> Reported-by: Xoykie 
> Link: 
> https://lore.kernel.org/qemu-devel/cafu8rb_pjr77zmlsm0unf9xpnxfr_--tjr49f_ex32zbc5o...@mail.gmail.com
> Signed-off-by: Stefano Garzarella 
> ---
>  hw/virtio/virtio.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 893a072c9d..2e5e67bdb9 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
>  /* Make sure flags is seen before off_wrap */
>  smp_rmb();
>  e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
> -virtio_tswap16s(vdev, &e->flags);
>  }

Reviewed-by: Peter Maydell 

thanks
-- PMM



Re: [PATCH v3 1/2] tests/avocado: update firmware for sbsa-ref

2024-07-01 Thread Peter Maydell
On Mon, 1 Jul 2024 at 07:49, Marcin Juszkiewicz
 wrote:
>
> W dniu 30.06.2024 o 16:37, Ard Biesheuvel pisze:
> > On Thu, 20 Jun 2024 at 12:20, Marcin Juszkiewicz
> >  wrote:
> >>
> >> Update firmware to have graphics card memory fix from EDK2 commit
> >> c1d1910be6e04a8b1a73090cf2881fb698947a6e:
> >>
> >>  OvmfPkg/QemuVideoDxe: add feature PCD to remap framebuffer W/C
> >>
> >>  Some platforms (such as SBSA-QEMU on recent builds of the emulator) 
> >> only
> >>  tolerate misaligned accesses to normal memory, and raise alignment
> >>  faults on such accesses to device memory, which is the default for 
> >> PCIe
> >>  MMIO BARs.
> >>
> >>  When emulating a PCIe graphics controller, the framebuffer is 
> >> typically
> >>  exposed via a MMIO BAR, while the disposition of the region is closer 
> >> to
> >>  memory (no side effects on reads or writes, except for the changing
> >>  picture on the screen; direct random access to any pixel in the 
> >> image).
> >>
> >>  In order to permit the use of such controllers on platforms that only
> >>  tolerate these types of accesses for normal memory, it is necessary to
> >>  remap the memory. Use the DXE services to set the desired capabilities
> >>  and attributes.
> >>
> >>  Hide this behavior under a feature PCD so only platforms that really
> >>  need it can enable it. (OVMF on x86 has no need for this)
> >>
> >> With this fix enabled we can boot sbsa-ref with more than one cpu core.
> >>
> >
> > This requires an explanation: what does the number of CPU cores have
> > to do with the memory attributes used for the framebuffer?
>
> I have no idea. Older firmware was hanging on several systems but was
> passing in QEMU tests. After closer looking I noticed that Avocado tests
> run with "-smp 1" and pass.
>
> Checked failing system with "-smp 1" and it worked. In meantime you have
> fixed problem in EDK2.
>
> So yes, updating firmware may look like hiding a bug. Which I do not
> know how to track (I can build and test QEMU, but going into its
> internals is something I never done).

My assumption was that, by random chance, TF-A's memory layout etc.
when dealing with only one CPU was such that it didn't do the
unaligned access. I don't think this is likely to be a QEMU-side
question.

-- PMM



Re: [PATCH 02/23] target/i386: fix gen_prepare_size_nz condition

2024-07-01 Thread Igor Mammedov
On Fri, 28 Jun 2024 15:34:58 +0100
Alex Bennée  wrote:

> Alex Bennée  writes:
> 
> > Incorrect brace positions causes an unintended overflow on 32 bit
> > builds and shenanigans result.
> >
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2413
> > Suggested-by: Mark Cave-Ayland 
> > Signed-off-by: Alex Bennée   
> 
> This seems to trigger regressions in:
> 
>   qtest-x86_64/bios-tables-test
>   qtest-x86_64/pxe-test
>   qtest-x86_64/vmgenid-test
> 
> Could that be down to generated test data?

Without context, I'd guess that the guest doesn't boot/get to the
rendezvous point that the tests are waiting for, and then it just
times out => fails.

> 
> > ---
> >  target/i386/tcg/translate.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
> > index ad1819815a..94f13541c3 100644
> > --- a/target/i386/tcg/translate.c
> > +++ b/target/i386/tcg/translate.c
> > @@ -877,7 +877,7 @@ static CCPrepare gen_prepare_sign_nz(TCGv src, MemOp 
> > size)
> >  return (CCPrepare) { .cond = TCG_COND_LT, .reg = src };
> >  } else {
> >  return (CCPrepare) { .cond = TCG_COND_TSTNE, .reg = src,
> > - .imm = 1ull << ((8 << size) - 1) };
> > + .imm = (1ull << (8 << size)) - 1 };
> >  }
> >  }  
> 




Re: [PATCH v2 07/14] hw/i386: convert 'q35' machine definitions to use new macros

2024-07-01 Thread Thomas Huth

On 20/06/2024 18.57, Daniel P. Berrangé wrote:

This changes the DEFINE_Q35_MACHINE macro to use the common
helpers for constructing versioned symbol names and strings,
bringing greater consistency across targets.

The added benefit is that it avoids the need to repeat the
version number thrice in three different formats in the calls
to DEFINE_Q35_MACHINE.

Due to the odd-ball '4.0.1' machine type version, this
commit introduces a DEFINE_Q35_BUGFIX helper, to allow
defining of "bugfix" machine types which have a three
digit version.

Signed-off-by: Daniel P. Berrangé 
---
  hw/i386/pc_q35.c | 215 ---
  1 file changed, 90 insertions(+), 125 deletions(-)


Reviewed-by: Thomas Huth 





Re: [PATCH v2 06/14] hw/i386: convert 'i440fx' machine definitions to use new macros

2024-07-01 Thread Thomas Huth

On 20/06/2024 18.57, Daniel P. Berrangé wrote:

This changes the DEFINE_I440FX_MACHINE macro to use the common
helpers for constructing versioned symbol names and strings,
bringing greater consistency across targets.

The added benefit is that it avoids the need to repeat the
version number thrice in three different formats in the calls
to DEFINE_I440FX_MACHINE.

Signed-off-by: Daniel P. Berrangé 
---
  hw/i386/pc_piix.c| 219 +++
  include/hw/i386/pc.h |  26 +
  2 files changed, 122 insertions(+), 123 deletions(-)



Reviewed-by: Thomas Huth 




Re: [PATCH v2 07/21] docs/qapidoc: fix nested parsing under untagged sections

2024-07-01 Thread Markus Armbruster
John Snow  writes:

> On Fri, Jun 28, 2024, 3:55 AM Markus Armbruster  wrote:
>
>> John Snow  writes:
>>
>> > Sphinx does not like sections without titles, because it wants to
>> > convert every section into a reference. When there is no title, it
>> > struggles to do this and transforms the tree inproperly.
>> >
>> > Depending on the rST used, this may result in an assertion error deep in
>> > the docutils HTMLWriter.
>> >
>> > (Observed when using ".. admonition:: Notes" under such a section - When
>> > this is transformed with its own  element, Sphinx is fooled into
>> > believing this title belongs to the section and incorrect mutates the
>> > docutils tree, leading to errors during rendering time.)
>> >
>> > When parsing an untagged section (free paragraphs), skip making a hollow
>> > section and instead append the parse results to the prior section.
>> >
>> > Many Bothans died to bring us this information.
>> >
>> > Signed-off-by: John Snow 
>> > Acked-by: Markus Armbruster 
>>
>> Generated HTML changes, but the diff is hard to review due to id
>> attribute changes all over the place.
>>
>> Generated qemu-ga-ref.7 also changes:
>>
>> diff -rup old/qemu-ga-ref.7 new/qemu-ga-ref.7
>> --- old/qemu-ga-ref.7   2024-06-27 10:42:21.466096276 +0200
>> +++ new/qemu-ga-ref.7   2024-06-27 10:45:36.502414099 +0200
>> @@ -397,6 +397,7 @@ shutdown request, with no guarantee of s
>>  .B \fBmode\fP: \fBstring\fP (optional)
>>  \(dqhalt\(dq, \(dqpowerdown\(dq (default), or \(dqreboot\(dq
>>  .UNINDENT
>> +.sp
>>  This command does NOT return a response on success.  Success
>>  condition is indicated by the VM exiting with a zero exit status or,
>>  when running with \-\-no\-shutdown, by issuing the query\-status QMP
>> @@ -1348,6 +1349,7 @@ the new password entry string, base64 en
>>  .B \fBcrypted\fP: \fBboolean\fP
>>  true if password is already crypt()d, false if raw
>>  .UNINDENT
>> +.sp
>>  If the \fBcrypted\fP flag is true, it is the caller\(aqs
>> responsibility to
>>  ensure the correct crypt() encryption scheme is used.  This command
>>  does not attempt to interpret or report on the encryption scheme.
>>
>> We add vertical space.  Visible when viewed with man.  Looks like an
>> improvement to me.
>>
>> Here's the first of these two spots in HTML:

[...]

>> The id changes muddy the waters.  With them manually removed:

[...]

>> Makes no visual difference in my browser.
>>
>> Do these differences match your expectations?
>>
>
> Yep!
>
> It does change the output just a little, but Sphinx really doesn't like
> title-less sections.
>
> I thought the change looked fine, and I'm still planning on removing this
> old generator anyway, so...

I'll add to the commit message

The resulting output changes are basically invisible.

Thanks!




Re: [RFC v3 1/2] target/loongarch: Add loongson binary translation feature

2024-07-01 Thread maobibo




On 2024/7/1 下午4:42, Jiaxun Yang wrote:



On 1 July 2024, at 08:32 AM, maobibo wrote:
[...]


+static void loongarch_cpu_check_lbt(CPUState *cs, Error **errp)
+{
+CPULoongArchState *env = cpu_env(cs);
+LoongArchCPU *cpu = LOONGARCH_CPU(cs);
+bool kvm_supported;
+
+kvm_supported = kvm_feature_supported(cs, LOONGARCH_FEATURE_LBT);


IMHO if there is no global states that should be saved/restored VM wise,
this should be handled at per CPU level, preferably with CPUCFG flags hint.

We should minimize non-privilege KVM feature bits to prevent hindering
asymmetry ISA system.

For "asymmetry ISA system", do you mean some vCPUs have the LBT feature
while other vCPUs do not? That does not exist at all.


Yes, we should always prepare for the future :-)

 From Weiwu's presentations, I believe an asymmetric LASX product is already on
the roadmap. So for LBT it's also a possibility.
I have never heard of such a product plan with different LASX/LSX
capabilities, even for a future big-little product.




Even if such a product won't land in the near future, we should still try our
best to bind this to vCPU creation, not to the system.



It will be a big disaster for such products; how would applications use this?


Affinity placement etc, there are many possibilities.

On Arm side, there are already systems with Big.Little asymmetry CPU that
some of the processor equipped 32 bit EL0 mode while others doesn't. They
managed that well with affinity.

See: arm64: Allow mismatched 32-bit EL0 support

Is it only 32-bit EL0 mode, or 32-bit compatible mode and 64-bit as well?

What is the detailed product name and how does Linux/KVM support this 
product?


Regards
Bibo Mao


Thanks


Regards
Bibo


Thanks
- Jiaxun








Re: [PATCH] vhost-user: Skip unnecessary duplicated VHOST_USER_SET_LOG_BASE requests

2024-07-01 Thread BillXiang


> From: "Alex Bennée"
> Date:  Mon, Jul 1, 2024, 16:49
> Subject:  Re: [PATCH] vhost-user: Skip unnecessary duplicated 
> VHOST_USER_SET_LOG_BASE requests
> To: "项文成"
> Cc: , 
> 项文成  writes:
> 
> > From: BillXiang 
> >
> > The VHOST_USER_SET_LOG_BASE requests should be categorized into
> > non-vring specific messages, and should be sent only once.
> > If send more than once, dpdk will munmap old log_addr which may has
> > been used and cause segmentation fault.
> 
> This looks fine to me but looking at the vhost-user.rst we don't seem to
> make any explicit statements about how many times given messages should
> be sent.
> 
There are indeed no explicit statements in vhost-user.rst about how many times
given messages should be sent, but there have already been some discussions, such as
https://lore.kernel.org/qemu-devel/20230127083027-mutt-send-email-...@kernel.org/.
> >
> > Signed-off-by: BillXiang 
> > ---
> >  hw/virtio/vhost-user.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index cdf9af4a4b..41e34edd49 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -371,6 +371,7 @@ static bool 
> > vhost_user_per_device_request(VhostUserRequest request)
> >      case VHOST_USER_RESET_DEVICE:
> >      case VHOST_USER_ADD_MEM_REG:
> >      case VHOST_USER_REM_MEM_REG:
> > +    case VHOST_USER_SET_LOG_BASE:
> >          return true;
> >      default:
> >          return false;
> 
> -- 
> Alex Bennée
> Virtualisation Tech Lead @ Linaro



Re: [v2 1/1] hw/i386/acpi-build: add OSHP method support for SHPC driver load

2024-07-01 Thread Gao,Shiyuan
> > > > the PCI bridge will fail when we use SHPC Native type:
> > > >
> > > >   [3.336059] shpchp :00:03.0: Requesting control of SHPC hotplug 
> > > >via OSHP (\_SB_.PCI0.S28_)
> > > >   [3.337408] shpchp :00:03.0: Requesting control of SHPC hotplug 
> > > >via OSHP (\_SB_.PCI0)
> > > >   [3.338710] shpchp :00:03.0: Cannot get control of SHPC hotplug
> > > >
> > > > Add OSHP method support for transfer control to the operating system,
> > > > after this SHPC driver will be loaded success and the hotplug device to
> > > > the PCI bridge will success when we use SHPC Native type.
> > > >
> > > >   [1.703975] shpchp :00:03.0: Requesting control of SHPC hotplug 
> > > >via OSHP (\_SB_.PCI0.S18_)
> > > >   [1.704934] shpchp :00:03.0: Requesting control of SHPC hotplug 
> > > >via OSHP (\_SB_.PCI0)
> > > >   [1.705855] shpchp :00:03.0: Gained control of SHPC hotplug 
> > > >(\_SB_.PCI0)
> > > >   [1.707054] shpchp :00:03.0: HPC vendor_id 1b36 device_id 1 ss_vid 
> > > >0 ss_did 0
> > >
> > > please describe in commit message reproducer
> > > (aka QEMU CLI and guest OS and if necessary other details)
> >
> > qemu-system-x86_64 -machine pc-q35-9.0
> > ...
> > -global PIIX4_PM.acpi-pci-hotplug-with-bridge-support=off
>
> please use full QEMU CLI and follow up steps to trigger the issue.
>
> From above it's not obvious what and where you are trying to hotplug

Nothing needs to be done when you start an i440fx VM; this issue will be
triggered. PIIX4_PM.acpi-pci-hotplug-with-bridge-support=off is used to verify
that the SHPC driver loads successfully.

>
> > guest OS: centos7/ubuntu22.04
> >
> > I will add it in the next version.
> >
> > > > +/*
> > > > + * PCI Firmware Specification 3.0
> > > > + * 4.8. The OSHP Control Method
> > > > + */
> > > > +static Aml *build_oshp_method(void)
> > > > +{
> > > > +    Aml *method;
> > > > +
> > > > +    /*
> > > > + * We don't use ACPI to control the SHPC, so just return
> > > > + * success is enough.
> > > > + */
> > > > +    method = aml_method("OSHP", 0, AML_NOTSERIALIZED);
> > > > +    aml_append(method, aml_return(aml_int(0x0)));
> > > > +    return method;
> > > > +}
> > > > +
> > > >  static void
> > > >  build_dsdt(GArray *table_data, BIOSLinker *linker,
> > > > AcpiPmInfo *pm, AcpiMiscInfo *misc,
> > > > @@ -1452,6 +1469,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> > > >  aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
> > > >  aml_append(dev, aml_name_decl("_UID", 
> > > >aml_int(pcmc->pci_root_uid)));
> > > >  aml_append(dev, aml_pci_edsm());
> > > > +    aml_append(dev, build_oshp_method());
> > >
> > > it's global and what will happen if we have ACPI PCI hotplug enabled
> > > and guest calls this NOP method?
> >
> > the OS gets control of SHPC hotplug and the SHPC driver fails to load later.
> >
> > [    6.170345] shpchp :00:03.0: Requesting control of SHPC hotplug via 
> > OSHP (\_SB_.PCI0.S18_)
> > [    6.171962] shpchp :00:03.0: Requesting control of SHPC hotplug via 
> > OSHP (\_SB_.PCI0)
> > [    6.173556] shpchp :00:03.0: Gained control of SHPC hotplug 
> > (\_SB_.PCI0)
> > [    6.175144] shpchp :00:03.0: HPC vendor_id 1b36 device_id 1 ss_vid 0 
> > ss_did 0
> > [    6.196153] shpchp :00:03.0: irq 24 for MSI/MSI-X
> > [    6.197211] shpchp :00:03.0: pci_hp_register failed with error -16
> > [    6.198272] shpchp :00:03.0: Slot initialization failed
> >
> > this looks more suitable.
> >
> > +    if (!pm->pcihp_bridge_en) {
> > +    aml_append(dev, build_i440fx_oshp_method());
> > +    }
>
> we also have
>  PIIX4_PM.acpi-root-pci-hotplug (default true)
> though it seems that ACPI hotplug takes precedence of SHPC if both are 
> enabled.
> So I'd take it and OSHP approach seems simpler than adding _OSC to do the 
> same.

yes, I tried to add an _OSC method, but the OS and the firmware failed to
negotiate in the guest; dmesg as follows. Analyzing the code, I found that
this process relies on pci_ext_cfg_avail in negotiate_os_control. On i440fx,
pci_ext_cfg_avail is false.

[0.631156] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI 
EDR HPX-Type3]
[0.632184] acpi PNP0A03:00: _OSC: not requesting OS control; OS requires 
[ExtendedConfig ASPM ClockPM MSI]
[0.633160] acpi PNP0A03:00: fail to add MMCONFIG information, can't access 
extended PCI configuration space under this bridge.

Therefore, I chose the OSHP method.
>
> >
> > >
> > > >  aml_append(sb_scope, dev);
> > > >  aml_append(dsdt, sb_scope);
> > > >
> > > > @@ -1586,6 +1604,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> > > >  aml_append(dev, build_q35_osc_method(true));
> > > >  } else {
> > > >  aml_append(dev, aml_name_decl("_HID", 
> > > >aml_eisaid("PNP0A03")));
> > > > +    aml_append(dev, build_oshp_method());
> > > >  }
> > > >
> > > >   

RE: [PATCH v2 5/7] virtio-iommu : Retrieve page size mask on virtio_iommu_set_iommu_device()

2024-07-01 Thread Duan, Zhenzhong



>-Original Message-
>From: Eric Auger 
>Subject: [PATCH v2 5/7] virtio-iommu : Retrieve page size mask on
>virtio_iommu_set_iommu_device()
>
>Retrieve the Host IOMMU Device page size mask when this latter is set.
>This allows to get the information much sooner than when relying on
>IOMMU MR set_page_size_mask() call, whcih happens when the IOMMU
s/whcih/which
Sorry, I missed it in last review.
>MR
>gets enabled. We introduce check_page_size_mask() helper whose code
>is inherited from current virtio_iommu_set_page_size_mask()
>implementation. This callback will be removed in a subsequent patch.
>
>Signed-off-by: Eric Auger 

Otherwise,
Reviewed-by: Zhenzhong Duan 

Thanks
Zhenzhong

>
>---
>
>v1 -> v2:
>- do not update the mask if the granule is frozen (Zhenzhong)
>---
> hw/virtio/virtio-iommu.c | 57
>++--
> hw/virtio/trace-events   |  1 +
> 2 files changed, 56 insertions(+), 2 deletions(-)
>
>diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>index b8f75d2b1a..7d5db554af 100644
>--- a/hw/virtio/virtio-iommu.c
>+++ b/hw/virtio/virtio-iommu.c
>@@ -598,9 +598,39 @@ out:
> return ret;
> }
>
>+static bool check_page_size_mask(VirtIOIOMMU *viommu, uint64_t
>new_mask,
>+ Error **errp)
>+{
>+uint64_t cur_mask = viommu->config.page_size_mask;
>+
>+if ((cur_mask & new_mask) == 0) {
>+error_setg(errp, "virtio-iommu reports a page size mask 0x%"PRIx64
>+   " incompatible with currently supported mask 0x%"PRIx64,
>+   new_mask, cur_mask);
>+return false;
>+}
>+/*
>+ * Once the granule is frozen we can't change the mask anymore. If by
>+ * chance the hotplugged device supports the same granule, we can still
>+ * accept it.
>+ */
>+if (viommu->granule_frozen) {
>+int cur_granule = ctz64(cur_mask);
>+
>+if (!(BIT_ULL(cur_granule) & new_mask)) {
>+error_setg(errp,
>+   "virtio-iommu does not support frozen granule 0x%llx",
>+   BIT_ULL(cur_granule));
>+return false;
>+}
>+}
>+return true;
>+}
>+
> static bool virtio_iommu_set_iommu_device(PCIBus *bus, void *opaque,
>int devfn,
>   HostIOMMUDevice *hiod, Error **errp)
> {
>+ERRP_GUARD();
> VirtIOIOMMU *viommu = opaque;
> HostIOMMUDeviceClass *hiodc =
>HOST_IOMMU_DEVICE_GET_CLASS(hiod);
> struct hiod_key *new_key;
>@@ -623,8 +653,28 @@ static bool
>virtio_iommu_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
> hiod->aliased_devfn,
> host_iova_ranges, errp);
> if (ret) {
>-g_list_free_full(host_iova_ranges, g_free);
>-return false;
>+goto error;
>+}
>+}
>+if (hiodc->get_page_size_mask) {
>+uint64_t new_mask = hiodc->get_page_size_mask(hiod);
>+
>+if (check_page_size_mask(viommu, new_mask, errp)) {
>+/*
>+ * The default mask depends on the "granule" property. For 
>example,
>+ * with 4k granule, it is -(4 * KiB). When an assigned device has
>+ * page size restrictions due to the hardware IOMMU configuration,
>+ * apply this restriction to the mask.
>+ */
>+trace_virtio_iommu_update_page_size_mask(hiod->name,
>+ 
>viommu->config.page_size_mask,
>+ new_mask);
>+if (!viommu->granule_frozen) {
>+viommu->config.page_size_mask &= new_mask;
>+}
>+} else {
>+error_prepend(errp, "%s: ", hiod->name);
>+goto error;
> }
> }
>
>@@ -637,6 +687,9 @@ static bool
>virtio_iommu_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
> g_list_free_full(host_iova_ranges, g_free);
>
> return true;
>+error:
>+g_list_free_full(host_iova_ranges, g_free);
>+return false;
> }
>
> static void
>diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
>index 3cf84e04a7..599d855ff6 100644
>--- a/hw/virtio/trace-events
>+++ b/hw/virtio/trace-events
>@@ -132,6 +132,7 @@ virtio_iommu_notify_map(const char *name,
>uint64_t virt_start, uint64_t virt_end
> virtio_iommu_notify_unmap(const char *name, uint64_t virt_start,
>uint64_t virt_end) "mr=%s virt_start=0x%"PRIx64" virt_end=0x%"PRIx64
> virtio_iommu_remap(const char *name, uint64_t virt_start, uint64_t
>virt_end, uint64_t phys_start) "mr=%s virt_start=0x%"PRIx64"
>virt_end=0x%"PRIx64" phys_start=0x%"PRIx64
> virtio_iommu_set_page_size_mask(const char *name, uint64_t old,
>uint64_t new) "mr=%s old_mask=0x%"PRIx64" new_mask=0x%"PRIx64
>+virtio_iommu_update_page_size_mask(const char *name, uint64_t old,
>uint64_t new) "host iommu device=%s old_mask=0x%"PRIx64"
>new_mask=0x

Re: [PATCH] tcg/optimize: Fix TCG_COND_TST* simplification of setcond2

2024-07-01 Thread Alex Bennée
Richard Henderson  writes:

> Fix a typo in the argument movement.
>
> Cc: qemu-sta...@nongnu.org
> Fixes: ceb9ee06b71 ("tcg/optimize: Handle TCG_COND_TST{EQ,NE}")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2413
> Signed-off-by: Richard Henderson 
> ---
>  tcg/optimize.c   |  2 +-
>  tests/tcg/x86_64/test-2413.c | 30 ++
>  2 files changed, 31 insertions(+), 1 deletion(-)
>  create mode 100644 tests/tcg/x86_64/test-2413.c
>
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 8886f7037a..ba16ec27e2 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -2384,7 +2384,7 @@ static bool fold_setcond2(OptContext *ctx, TCGOp *op)
>  
>  case TCG_COND_TSTEQ:
>  case TCG_COND_TSTNE:
> -if (arg_is_const_val(op->args[2], 0)) {
> +if (arg_is_const_val(op->args[3], 0)) {
>  goto do_setcond_high;
>  }
>  if (arg_is_const_val(op->args[4], 0)) {
> diff --git a/tests/tcg/x86_64/test-2413.c b/tests/tcg/x86_64/test-2413.c
> new file mode 100644
> index 00..a0e4d25093
> --- /dev/null
> +++ b/tests/tcg/x86_64/test-2413.c
> @@ -0,0 +1,30 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/* Copyright 2024 Linaro, Ltd. */
> +/* See https://gitlab.com/qemu-project/qemu/-/issues/2413 */
> +
> +#include 
> +
> +void test(unsigned long *a, unsigned long *d, unsigned long c)
> +{
> +asm("xorl %%eax, %%eax\n\t"
> +"xorl %%edx, %%edx\n\t"
> +"testb $0x20, %%cl\n\t"
> +"sete %%al\n\t"
> +"setne %%dl\n\t"
> +"shll %%cl, %%eax\n\t"
> +"shll %%cl, %%edx\n\t"
> +: "=a"(*a), "=d"(*d)
> +: "c"(c));
> +}
> +
> +int main(void)
> +{
> +long a, c, d;

The compiler complains about the mismatch between long and the unsigned
long of the test function.

Anyway dropping the previous fix and just using this:

Tested-by: Alex Bennée 


> +
> +for (c = 0; c < 64; c++) {
> +test(&a, &d, c);
> +assert(a == (c & 0x20 ? 0 : 1u << (c & 0x1f)));
> +assert(d == (c & 0x20 ? 1u << (c & 0x1f) : 0));
> +}
> +return 0;
> +}

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH] virtio: remove virtio_tswap16s() call in vring_packed_event_read()

2024-07-01 Thread Eugenio Perez Martin
On Mon, Jul 1, 2024 at 9:52 AM Stefano Garzarella  wrote:
>
> Commit d152cdd6f6 ("virtio: use virtio accessor to access packed event")
> switched using of address_space_read_cached() to virito_lduw_phys_cached()
> to access packed descriptor event.
>
> When we used address_space_read_cached(), we needed to call
> virtio_tswap16s() to handle the endianess of the field, but
> virito_lduw_phys_cached() already handles it internally, so we no longer
> need to call virtio_tswap16s() (as the commit had done for `off_wrap`,
> but forgot for `flags`).
>
> Fixes: d152cdd6f6 ("virtio: use virtio accessor to access packed event")
> Cc: jasow...@redhat.com
> Cc: qemu-sta...@nongnu.org
> Reported-by: Xoykie 
> Link: 
> https://lore.kernel.org/qemu-devel/cafu8rb_pjr77zmlsm0unf9xpnxfr_--tjr49f_ex32zbc5o...@mail.gmail.com
> Signed-off-by: Stefano Garzarella 

Reviewed-by: Eugenio Pérez 

I think it would be great to test the patches using a big endian host
just in case.

Thanks!
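
To see why the removed call was a bug, here is the double swap in miniature
(Python used as a stand-in for the C helpers; the bswap16 below models what
virtio_tswap16s() amounts to on a host whose endianness differs from the
guest's):

```python
import struct

def bswap16(x):
    # Models the 16-bit byte swap the removed virtio_tswap16s() performed.
    return ((x & 0xff) << 8) | ((x >> 8) & 0xff)

raw = (0x1234).to_bytes(2, "little")   # little-endian guest memory
val = struct.unpack("<H", raw)[0]      # what the lduw helper returns: 0x1234
buggy = bswap16(val)                   # the extra swap corrupts it: 0x3412
```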

> ---
>  hw/virtio/virtio.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 893a072c9d..2e5e67bdb9 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
>  /* Make sure flags is seen before off_wrap */
>  smp_rmb();
>  e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
> -virtio_tswap16s(vdev, &e->flags);
>  }
>
>  static void vring_packed_off_wrap_write(VirtIODevice *vdev,
> --
> 2.45.2
>




[PATCH] virtio-iommu: Clear IOMMUDevice when VFIO device is unplugged

2024-07-01 Thread Cédric Le Goater
When a VFIO device is hotplugged in a VM using virtio-iommu, IOMMUPciBus
and IOMMUDevice cache entries are created in the .get_address_space()
handler of the machine IOMMU device. However, these entries are never
destroyed, not even when the VFIO device is detached from the machine.
This can lead to an assert if the device is reattached again.

When reattached, the .get_address_space() handler reuses an
IOMMUDevice entry allocated when the VFIO device was first attached.
virtio_iommu_set_host_iova_ranges() is called later on from the
.set_iommu_device() handler and fails with an assert on 'probe_done'
because the device appears to have been already probed when this is
not the case.

The IOMMUDevice entry is allocated in pci_device_iommu_address_space()
called from under vfio_realize(), the VFIO PCI realize handler. Since
pci_device_unset_iommu_device() is called from vfio_exitfn(), a sub
function of the PCIDevice unrealize() handler, it seems that the
.unset_iommu_device() handler is the best place to release resources
allocated at realize time. Clear the IOMMUDevice cache entry there to
fix hotplug.

Fixes: 817ef10da23c ("virtio-iommu: Implement [set|unset]_iommu_device()
callbacks")
Signed-off-by: Cédric Le Goater 
---
 hw/virtio/virtio-iommu.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 
72011d2d11ebf1da343b5924f5514ccfe6b2580d..57f53f0fa79cb34bfb75f80bcb9301b523b2a6ab
 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -467,6 +467,26 @@ static AddressSpace *virtio_iommu_find_add_as(PCIBus *bus, 
void *opaque,
 return &sdev->as;
 }
 
+static void virtio_iommu_device_clear(VirtIOIOMMU *s, PCIBus *bus, int devfn)
+{
+IOMMUPciBus *sbus = g_hash_table_lookup(s->as_by_busptr, bus);
+IOMMUDevice *sdev;
+
+if (!sbus) {
+return;
+}
+
+sdev = sbus->pbdev[devfn];
+if (!sdev) {
+return;
+}
+
+g_list_free_full(sdev->resv_regions, g_free);
+sdev->resv_regions = NULL;
+g_free(sdev);
+sbus->pbdev[devfn] = NULL;
+}
+
 static gboolean hiod_equal(gconstpointer v1, gconstpointer v2)
 {
 const struct hiod_key *key1 = v1;
@@ -708,6 +728,7 @@ virtio_iommu_unset_iommu_device(PCIBus *bus, void *opaque, 
int devfn)
 }
 
 g_hash_table_remove(viommu->host_iommu_devices, &key);
+virtio_iommu_device_clear(viommu, bus, devfn);
 }
 
 static const PCIIOMMUOps virtio_iommu_ops = {
-- 
2.45.2




Re: [PATCH v2 04/15] hw/isa/vt82c686: Define a GPIO line between vt82c686 and i8259

2024-07-01 Thread Akihiko Odaki

On 2024/06/29 22:08, BALATON Zoltan wrote:

On Thu, 27 Jun 2024, Akihiko Odaki wrote:

This fixes qemu_irq array leak.

Signed-off-by: Akihiko Odaki 
---
hw/isa/vt82c686.c | 7 ---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
index 8582ac0322eb..629d2d568137 100644
--- a/hw/isa/vt82c686.c
+++ b/hw/isa/vt82c686.c
@@ -715,13 +715,14 @@ static void via_isa_realize(PCIDevice *d, Error 
**errp)

    ViaISAState *s = VIA_ISA(d);
    DeviceState *dev = DEVICE(d);
    PCIBus *pci_bus = pci_get_bus(d);
-    qemu_irq *isa_irq;
+    qemu_irq isa_irq;
    ISABus *isa_bus;
    int i;

    qdev_init_gpio_out(dev, &s->cpu_intr, 1);
    qdev_init_gpio_in_named(dev, via_isa_pirq, "pirq", PCI_NUM_PINS);
-    isa_irq = qemu_allocate_irqs(via_isa_request_i8259_irq, s, 1);
+    qdev_init_gpio_in_named(dev, via_isa_request_i8259_irq, "i8259", 1);


The chip has no such pin so this is a QEMU internal connection that I 
think should not be modelled with a named gpio. I think we really need a 
function to init a qemu_irq (for now we only have one that also 
allocates it) so that we can put this in ViaISAState and init it in place. 
I'll make a patch.


Judging from qdev_pass_gpios(), it is valid to have a GPIO line even if 
the QOM device connected with the GPIO line is actually included in 
the same chip package. I didn't use qdev_pass_gpios() because it cannot 
expose a subset of the GPIO array.


Regards,
Akihiko Odaki



Re: [PATCH] virtio: remove virtio_tswap16s() call in vring_packed_event_read()

2024-07-01 Thread Michael S. Tsirkin
On Mon, Jul 01, 2024 at 11:53:34AM +0200, Eugenio Perez Martin wrote:
> On Mon, Jul 1, 2024 at 9:52 AM Stefano Garzarella  wrote:
> >
> > Commit d152cdd6f6 ("virtio: use virtio accessor to access packed event")
> > switched from using address_space_read_cached() to virtio_lduw_phys_cached()
> > to access packed descriptor event.
> >
> > When we used address_space_read_cached(), we needed to call
> > virtio_tswap16s() to handle the endianness of the field, but
> > virtio_lduw_phys_cached() already handles it internally, so we no longer
> > need to call virtio_tswap16s() (as the commit had done for `off_wrap`,
> > but forgot for `flags`).
> >
> > Fixes: d152cdd6f6 ("virtio: use virtio accessor to access packed event")
> > Cc: jasow...@redhat.com
> > Cc: qemu-sta...@nongnu.org
> > Reported-by: Xoykie 
> > Link: 
> > https://lore.kernel.org/qemu-devel/cafu8rb_pjr77zmlsm0unf9xpnxfr_--tjr49f_ex32zbc5o...@mail.gmail.com
> > Signed-off-by: Stefano Garzarella 
> 
> Reviewed-by: Eugenio Pérez 
> 
> I think it would be great to test the patches using a big endian host
> just in case.
> 
> Thanks!

I think we really should move to using sparse tags for endian-ness,
like Linux does, and away from swap calls.


> > ---
> >  hw/virtio/virtio.c | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index 893a072c9d..2e5e67bdb9 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
> >  /* Make sure flags is seen before off_wrap */
> >  smp_rmb();
> >  e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
> > -virtio_tswap16s(vdev, &e->flags);
> >  }
> >
> >  static void vring_packed_off_wrap_write(VirtIODevice *vdev,
> > --
> > 2.45.2
> >




Re: [PATCH v4 00/14] test/tcg: Clang build fixes for arm/aarch64

2024-07-01 Thread Akihiko Odaki

On 2024/07/01 4:00, Richard Henderson wrote:

Supersedes: 20240629-tcg-v3-0-fa57918bd...@daynix.com
("[PATCH v3 0/7] tests/tcg/aarch64: Fix inline assemblies for clang")

On top of Akihiko's patches for aarch64, additional changes are
required for arm, both as a host and as a guest.


For the whole series:
Reviewed-by: Akihiko Odaki 



Re: [PATCH] virtio-iommu: Clear IOMMUDevice when VFIO device is unplugged

2024-07-01 Thread Eric Auger
Hi Cédric,

On 7/1/24 12:14, Cédric Le Goater wrote:
> When a VFIO device is hotplugged in a VM using virtio-iommu, IOMMUPciBus
> and IOMMUDevice cache entries are created in the .get_address_space()
> handler of the machine IOMMU device. However, these entries are never
> destroyed, not even when the VFIO device is detached from the machine.
> This can lead to an assert if the device is reattached again.
>
> When reattached, the .get_address_space() handler reuses an
> IOMMUDevice entry allocated when the VFIO device was first attached.
> virtio_iommu_set_host_iova_ranges() is called later on from the
> .set_iommu_device() handler and fails with an assert on 'probe_done'
> because the device appears to have been already probed when this is
> not the case.
This patch fixes the assert on hotplug/hotunplug/hotplug, which is
definitely needed.

However, the deallocation issue also exists for other devices (for
instance a virtio-net device) where
pbdev[devfn] is also leaked. I see Intel IOMMU uses a hash table
indexed by both the bus and devfn. This would allow us to avoid leaking
devfns on unrealize. I think we need that change both in virtio-iommu
and smmu.

Besides, you could imagine unplugging a virtio-net device and plugging a
vfio device on the same BDF. In such a case, unplugging the virtio-net
device won't deallocate the IOMMUDevice and the first device will be
leaked. In the mid term we would really need something symmetrical to
get_address_space() but it is unclear to me where we should call it.
Michael, do you have any advice?

Thanks

Eric
>
> The IOMMUDevice entry is allocated in pci_device_iommu_address_space()
> called from under vfio_realize(), the VFIO PCI realize handler. Since
> pci_device_unset_iommu_device() is called from vfio_exitfn(), a sub
> function of the PCIDevice unrealize() handler, it seems that the
> .unset_iommu_device() handler is the best place to release resources
> allocated at realize time. Clear the IOMMUDevice cache entry there to
> fix hotplug.
>
> Fixes: 817ef10da23c ("virtio-iommu: Implement [set|unset]_iommu_device()
> callbacks")
> Signed-off-by: Cédric Le Goater 
> ---
>  hw/virtio/virtio-iommu.c | 21 +
>  1 file changed, 21 insertions(+)
>
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index 
> 72011d2d11ebf1da343b5924f5514ccfe6b2580d..57f53f0fa79cb34bfb75f80bcb9301b523b2a6ab
>  100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -467,6 +467,26 @@ static AddressSpace *virtio_iommu_find_add_as(PCIBus 
> *bus, void *opaque,
>  return &sdev->as;
>  }
>  
> +static void virtio_iommu_device_clear(VirtIOIOMMU *s, PCIBus *bus, int devfn)
> +{
> +IOMMUPciBus *sbus = g_hash_table_lookup(s->as_by_busptr, bus);
> +IOMMUDevice *sdev;
> +
> +if (!sbus) {
> +return;
> +}
> +
> +sdev = sbus->pbdev[devfn];
> +if (!sdev) {
> +return;
> +}
> +
> +g_list_free_full(sdev->resv_regions, g_free);
> +sdev->resv_regions = NULL;
I am not sure the above is needed
> +g_free(sdev);
> +sbus->pbdev[devfn] = NULL;
> +}
> +
>  static gboolean hiod_equal(gconstpointer v1, gconstpointer v2)
>  {
>  const struct hiod_key *key1 = v1;
> @@ -708,6 +728,7 @@ virtio_iommu_unset_iommu_device(PCIBus *bus, void 
> *opaque, int devfn)
>  }
>  
>  g_hash_table_remove(viommu->host_iommu_devices, &key);
> +virtio_iommu_device_clear(viommu, bus, devfn);
>  }
>  
>  static const PCIIOMMUOps virtio_iommu_ops = {




Re: [PATCH v16 00/13] Support blob memory and venus on qemu

2024-07-01 Thread Marc-André Lureau
Hi,

All R-b now, it looks good to merge. Thanks for all the effort!

Michael, are you taking it through your tree?

On Sun, Jun 23, 2024 at 7:23 PM Dmitry Osipenko <
dmitry.osipe...@collabora.com> wrote:

> Hello,
>
> This series enables Vulkan Venus context support on virtio-gpu.
>
> All virglrender and almost all Linux kernel prerequisite changes
> needed by Venus are already in upstream. For kernel there is a pending
> KVM patchset that fixes mapping of compound pages needed for DRM drivers
> using TTM [1], otherwise hostmem blob mapping will fail with a KVM error
> from Qemu.
>
> [1]
> https://lore.kernel.org/kvm/20240229025759.1187910-1-steve...@google.com/
>
> You'll need to use recent Mesa version containing patch that removes
> dependency on cross-device feature from Venus that isn't supported by
> Qemu [2].
>
> [2]
> https://gitlab.freedesktop.org/mesa/mesa/-/commit/087e9a96d13155e26987befae78b6ccbb7ae242b
>
> Example Qemu cmdline that enables Venus:
>
>   qemu-system-x86_64 -device virtio-vga-gl,hostmem=4G,blob=true,venus=true
> \
>   -machine q35,accel=kvm,memory-backend=mem1 \
>   -object memory-backend-memfd,id=mem1,size=8G -m 8G
>
>
> Changes from V15 to V16
>
> - Fixed resource_get_info() change made in v15 that was squashed into a
>   wrong patch. Squashed set_scanout_blob patch into the blob-commands
> patch,
>   otherwise we'll need to revert back ordering of blob patches to older v7
>   variant.
>
> - Changed hostmem mapping state to boolean, suggested by Akihiko Odaki.
>
> - Documented Venus in virtio-gpu doc. Amended "Support Venus context" patch
>   with the doc addition. Suggested by Akihiko Odaki.
>
> Changes from V14 to V15
>
> - Dropped hostmem mapping state that got unused in v14, suggested by
>   Akihiko Odaki.
>
> - Moved resource_get_info() from set_scanout_blob() to create_blob(),
>   suggested by Akihiko Odaki.
>
> - Fixed uninitialized variable in create_blob(), spotted by Alex Bennée.
>
> Changes from V13 to V14
>
> - Fixed erroneous fall-through in renderer_state's switch-case that was
>   spotted by Marc-André Lureau.
>
> - Reworked HOSTMEM_MR_FINISH_UNMAPPING handling as was suggested by
>   Akihiko Odaki. Now it shares the same code path with HOSTMEM_MR_MAPPED.
>
> - Made use of g_autofree in virgl_cmd_resource_create_blob() as was
>   suggested by Akihiko Odaki.
>
> - Removed virtio_gpu_virgl_deinit() and moved all deinit code to
>   virtio_gpu_gl_device_unrealize() as was suggested by Marc-André Lureau.
>
> - Replaced HAVE_FEATURE in mseon.build with virglrenderer's VERSION_MAJOR
>   check as was suggested by Marc-André Lureau.
>
> - Added trace event for cmd-suspension as was suggested by Marc-André
> Lureau.
>
> - Added patch to replace in-flight printf's with trace events as was
>   suggested by Marc-André Lureau
>
> Changes from V12 to V13
>
> - Replaced `res->async_unmap_in_progress` flag with a mapping state,
>   moved it to the virtio_gpu_virgl_hostmem_region like was suggested
>   by Akihiko Odaki.
>
> - Renamed blob_unmap function and added back cmd_suspended argument
>   to it. Suggested by Akihiko Odaki.
>
> - Reordered VirtIOGPUGL refactoring patches to minimize code changes
>   like was suggested by Akihiko Odaki.
>
> - Replaced gl->renderer_inited with gl->renderer_state, like was suggested
>   by Alex Bennée.
>
> - Added gl->renderer state resetting to gl_device_unrealize(), for
>   consistency. Suggested by Alex Bennée.
>
> - Added rb's from Alex and Manos.
>
> - Fixed compiling with !HAVE_VIRGL_RESOURCE_BLOB.
>
> Changes from V11 to V12
>
> - Fixed virgl_cmd_resource_create_blob() error handling. Now it doesn't
>   corrupt resource list and releases resource properly on error. Thanks
>   to Akihiko Odaki for spotting the bug.
>
> - Added new patch that handles virtio_gpu_virgl_init() failure gracefully,
>   fixing Qemu crash. Besides fixing the crash, it allows to implement
>   a cleaner virtio_gpu_virgl_deinit().
>
> - virtio_gpu_virgl_deinit() now assumes that previously virgl was
>   initialized successfully when it was inited at all. Suggested by
>   Akihiko Odaki.
>
> - Fixed missed freeing of print_stats timer in virtio_gpu_virgl_deinit()
>
> - Added back blob unmapping or RESOURCE_UNREF that was requested
>   by Akihiko Odaki. Added comment to the code explaining how
>   async unmapping works. Added back `res->async_unmap_in_progress`
>   flag and added comment telling why it's needed.
>
> - Moved cmdq_resume_bh to VirtIOGPUGL and made coding style changes
>   suggested by Akihiko Odaki.
>
> - Added patches that move fence_poll and print_stats timers to VirtIOGPUGL
>   for consistency with cmdq_resume_bh.
>
> Changes from V10 to V11
>
> - Replaced cmd_resume bool in struct ctrl_command with
>   "cmd->finished + !VIRTIO_GPU_FLAG_FENCE" checking as was requested
>   by Akihiko Odaki.
>
> - Reworked virgl_cmd_resource_unmap/unref_blob() to avoid re-adding
>   the 'async_unmap_in_progress' flag that was dropped in v9:
>
> 1. virgl_cmd_res

[PATCH v4 00/19] SMMUv3 nested translation support

2024-07-01 Thread Mostafa Saleh
Currently, QEMU supports emulating either stage-1 or stage-2 SMMUs
but not nested instances.
This patch series adds support for nested translation in SMMUv3,
this is controlled by the property “arm-smmuv3.stage=nested”, and
advertised to guests as (IDR0.S1P == 1 && IDR0.S2P == 1).

Main changes (architecture):

1) CDs are considered IPA and translated with stage-2.
2) TTBx and tables for stage-1 are considered IPA and translated
   with stage-2.
3) Translate the IPA address with stage-2.

TLBs:
==
TLBs are the most tricky part.

1) General design
   A unified (combined) design is used, where entries with ASID=-1 are
   IPAs (cached from the stage-2 config).

   TLBs are also modified to cache 2 permissions, with a new permission
   added: "parent_perm".

   For non-nested configuration, perm == parent_perm and nothing
   changes. This is used to know which stage to use in case there is
   a permission fault from a TLB entry.

2) Caching in TLB
   Stage-1 and stage-2 entries are inserted in the TLB as is.
   For nested translation, both entries are combined into one TLB
   entry. The size (level and granule) is chosen from the smaller entry.
   That means that a stage-1 translation can be cached with the stage-2
   granule in its key; this is taken into account for lookup.

3) TLB Lookup
   TLB lookup already uses ASID in key, so it can distinguish between
   stage-1 and stage-2.
   And as mentioned above, the granule for stage-1 can be different.
   If the stage-1 lookup fails, we try again with the stage-2 granule.

4) TLB invalidation
   - Address invalidation is split, for IOVA(CMD_TLBI_NH_VA
 /CMD_TLBI_NH_VAA) and IPA(CMD_TLBI_S2_IPA) based on ASID value
   - CMD_TLBI_NH_ASID/CMD_TLBI_NH_ALL: Consider VMID if stage-2 is
 supported, and invalidate stage-1 only by VMIDs

As far as I understand, this is compliant with the ARM architecture:
- ARM ARM DDI 0487J.a: RLGSCG, RTVTYQ, RGNJPZ
- ARM IHI 0070F.b: 16.2 Caching

An alternative approach would be to instantiate 2 TLBs, one per each
stage. I haven’t investigated that.

Others
===
- Advertise SMMUv3.2-S2FWB; it is a NOP for QEMU as it doesn’t support
  attributes.

- OAS: A typical setup with nesting is to share CPU stage-2 with the
  SMMU, and according to the user manual, SMMU OAS must match the
  system physical address.

  This was discussed before in
  https://lore.kernel.org/all/20230226220650.1480786-11-smost...@google.com/
  This series doesn’t implement that, but reworks OAS to make it easier
  to configure in the future.

- For nested configuration, IOVA notifier only notifies for stage-1
  invalidations (as far as I understand this is the intended
  behaviour as it notifies for IOVA).

- Stop ignoring VMID for stage-1 if stage-2 is also supported.


Future improvements:
=
1) One small improvement, which I don’t think is worth the extra
   complexity, is that in case of a stage-1 TLB miss for nested
   translation, we can do the stage-1 walk and look up the stage-2
   TLBs, instead of doing the full walk.

Testing

1) IOMMUFD + VFIO
   Kernel: https://lore.kernel.org/all/cover.1683688960.git.nicol...@nvidia.com/
   VMM: 
https://qemu-devel.nongnu.narkive.com/o815DqpI/rfc-v5-0-8-arm-smmuv3-emulation-support

   By assigning
   “virtio-net-pci,netdev=net0,disable-legacy=on,iommu_platform=on,ats=on”
   to a guest VM (on top of a QEMU guest) with VFIO and IOMMUFD.

2) Work in progress prototype I am hacking on for nesting on KVM
   (this is nowhere near complete, and is missing many things, but it
   doesn't require VMs/VFIO) also with virtio-net-pci and git
   cloning a bunch of stuff and also observing traces.
   
https://android-kvm.googlesource.com/linux/+log/refs/heads/smostafa/android15-6.6-smmu-nesting-wip

Overall I tested the following configurations
(S1 = 4k, S2 =4k):
- S1 level = 1 and S2 level = 1
- S1 level = 1 and S2 level = 2
- S1 level = 1 and S2 level = 3
- S1 level = 2 and S2 level = 1
- S1 level = 2 and S2 level = 2
- S1 level = 2 and S2 level = 3
- S1 level = 3 and S2 level = 2
- S1 level = 3 and S2 level = 3
Also did some testing with
(S1 = 16k, S2= 4k)
(S1 = 4K, S2= 16k)


"hw/arm/smmuv3: Split smmuv3_translate()" is better viewed with --color-moved.

The first 3 patches are fixes.

Changes in v4:
v3: 
https://lore.kernel.org/qemu-devel/20240429032403.74910-1-smost...@google.com/
- Collected Eric and Alex Rbs
- Rebased on master
- Dropped RFC tag
- Dropped last 2 patches about oas changes to avoid blocking this series
  and I will post them after as RFC
- Split patch 7, and introduce CACHED_ENTRY_TO_ADDR in a separate patch
- Reorder patch 8 and 9 (combine tlb and tlb lookup)
- Split patch 12, and introduce smmu_iotlb_inv_asid_vmid in a separate patch
- Split patch 14, to have fault changes in a separate patch
- Update commit messages and include Fixes sha
- Minor updates, renames and a lot of comments based on review

Changes in v3
v2: 
https://lore.kernel.org/qemu-devel/20240408140818.3799590-1-smost...@google.com/
- 

[PATCH v4 01/19] hw/arm/smmu-common: Add missing size check for stage-1

2024-07-01 Thread Mostafa Saleh
According to the SMMU architecture specification (ARM IHI 0070 F.b),
in “3.4 Address sizes”:
The address output from the translation causes a stage 1 Address Size
fault if it exceeds the range of the effective IPA size for the given CD.

However, this check was missing.

There is already a similar check for stage-2 against effective PA.

Reviewed-by: Eric Auger 
Signed-off-by: Mostafa Saleh 
---
 hw/arm/smmu-common.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 1ce706bf94..eb2356bc35 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -381,6 +381,16 @@ static int smmu_ptw_64_s1(SMMUTransCfg *cfg,
 goto error;
 }
 
+/*
+ * The address output from the translation causes a stage 1 Address
+ * Size fault if it exceeds the range of the effective IPA size for
+ * the given CD.
+ */
+if (gpa >= (1ULL << cfg->oas)) {
+info->type = SMMU_PTW_ERR_ADDR_SIZE;
+goto error;
+}
+
 tlbe->entry.translated_addr = gpa;
 tlbe->entry.iova = iova & ~mask;
 tlbe->entry.addr_mask = mask;
-- 
2.45.2.803.g4e1b14247a-goog




[PATCH v4 04/19] hw/arm/smmu: Use enum for SMMU stage

2024-07-01 Thread Mostafa Saleh
Currently, the translation stage is represented as an int, where 1 is stage-1
and 2 is stage-2. When nesting is added, using 3 to represent it would be
confusing, so use an enum instead.

While keeping the same values, this is useful for:
 - Doing tricks with bit masks, where BIT(0) is stage-1 and BIT(1) is
   stage-2 and both is nested.
 - Tracing, as stage is printed as int.

Reviewed-by: Eric Auger 
Reviewed-by: Alex Bennée 
Signed-off-by: Mostafa Saleh 
---
 hw/arm/smmu-common.c | 14 +++---
 hw/arm/smmuv3.c  | 15 ---
 include/hw/arm/smmu-common.h | 11 +--
 3 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 8a8c718e6b..8a5858f69f 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -304,7 +304,7 @@ static int smmu_ptw_64_s1(SMMUTransCfg *cfg,
   SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info)
 {
 dma_addr_t baseaddr, indexmask;
-int stage = cfg->stage;
+SMMUStage stage = cfg->stage;
 SMMUTransTableInfo *tt = select_tt(cfg, iova);
 uint8_t level, granule_sz, inputsize, stride;
 
@@ -402,7 +402,7 @@ static int smmu_ptw_64_s1(SMMUTransCfg *cfg,
 info->type = SMMU_PTW_ERR_TRANSLATION;
 
 error:
-info->stage = 1;
+info->stage = SMMU_STAGE_1;
 tlbe->entry.perm = IOMMU_NONE;
 return -EINVAL;
 }
@@ -425,7 +425,7 @@ static int smmu_ptw_64_s2(SMMUTransCfg *cfg,
   dma_addr_t ipa, IOMMUAccessFlags perm,
   SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info)
 {
-const int stage = 2;
+const SMMUStage stage = SMMU_STAGE_2;
 int granule_sz = cfg->s2cfg.granule_sz;
 /* ARM DDI0487I.a: Table D8-7. */
 int inputsize = 64 - cfg->s2cfg.tsz;
@@ -525,7 +525,7 @@ static int smmu_ptw_64_s2(SMMUTransCfg *cfg,
 error_ipa:
 info->addr = ipa;
 error:
-info->stage = 2;
+info->stage = SMMU_STAGE_2;
 tlbe->entry.perm = IOMMU_NONE;
 return -EINVAL;
 }
@@ -544,9 +544,9 @@ error:
 int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags perm,
  SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info)
 {
-if (cfg->stage == 1) {
+if (cfg->stage == SMMU_STAGE_1) {
 return smmu_ptw_64_s1(cfg, iova, perm, tlbe, info);
-} else if (cfg->stage == 2) {
+} else if (cfg->stage == SMMU_STAGE_2) {
 /*
  * If bypassing stage 1(or unimplemented), the input address is passed
  * directly to stage 2 as IPA. If the input address of a transaction
@@ -555,7 +555,7 @@ int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, 
IOMMUAccessFlags perm,
  */
 if (iova >= (1ULL << cfg->oas)) {
 info->type = SMMU_PTW_ERR_ADDR_SIZE;
-info->stage = 1;
+info->stage = SMMU_STAGE_1;
 tlbe->entry.perm = IOMMU_NONE;
 return -EINVAL;
 }
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 1eb5b160d2..dab3ad2db9 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -34,7 +34,8 @@
 #include "smmuv3-internal.h"
 #include "smmu-internal.h"
 
-#define PTW_RECORD_FAULT(cfg)   (((cfg)->stage == 1) ? (cfg)->record_faults : \
+#define PTW_RECORD_FAULT(cfg)   (((cfg)->stage == SMMU_STAGE_1) ? \
+ (cfg)->record_faults : \
  (cfg)->s2cfg.record_faults)
 
 /**
@@ -402,7 +403,7 @@ static bool s2_pgtable_config_valid(uint8_t sl0, uint8_t 
t0sz, uint8_t gran)
 
 static int decode_ste_s2_cfg(SMMUTransCfg *cfg, STE *ste)
 {
-cfg->stage = 2;
+cfg->stage = SMMU_STAGE_2;
 
 if (STE_S2AA64(ste) == 0x0) {
 qemu_log_mask(LOG_UNIMP,
@@ -678,7 +679,7 @@ static int decode_cd(SMMUTransCfg *cfg, CD *cd, 
SMMUEventInfo *event)
 
 /* we support only those at the moment */
 cfg->aa64 = true;
-cfg->stage = 1;
+cfg->stage = SMMU_STAGE_1;
 
 cfg->oas = oas2bits(CD_IPS(cd));
 cfg->oas = MIN(oas2bits(SMMU_IDR5_OAS), cfg->oas);
@@ -762,7 +763,7 @@ static int smmuv3_decode_config(IOMMUMemoryRegion *mr, 
SMMUTransCfg *cfg,
 return ret;
 }
 
-if (cfg->aborted || cfg->bypassed || (cfg->stage == 2)) {
+if (cfg->aborted || cfg->bypassed || (cfg->stage == SMMU_STAGE_2)) {
 return 0;
 }
 
@@ -882,7 +883,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion 
*mr, hwaddr addr,
 goto epilogue;
 }
 
-if (cfg->stage == 1) {
+if (cfg->stage == SMMU_STAGE_1) {
 /* Select stage1 translation table. */
 tt = select_tt(cfg, addr);
 if (!tt) {
@@ -919,7 +920,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion 
*mr, hwaddr addr,
  * nesting is not supported. So it is sufficient to check the
  * translation stage to know the TLB stage for now.
  */
-event.u.f_walk_eabt.s2 = (cfg->stage == 2);
+event.u.f_walk_eabt.s2 = (cfg->stage == SMMU_STAGE_2);
 if (PTW_RECORD_FAULT(cfg)) {
   

[PATCH v4 03/19] hw/arm/smmuv3: Fix encoding of CLASS in events

2024-07-01 Thread Mostafa Saleh
The SMMUv3 spec (ARM IHI 0070 F.b - 7.3 Event records) defines the
class of events faults as:

CLASS: The class of the operation that caused the fault:
- 0b00: CD, CD fetch.
- 0b01: TTD, Stage 1 translation table fetch.
- 0b10: IN, Input address

However, this value was not set and was left as 0, which means CD and not
IN (0b10).
While at it, add an enum for class, as it will be used for nesting.
However, at the moment stage-1 and stage-2 use the same class values.

Fixes: 9bde7f0674 ("hw/arm/smmuv3: Implement translate callback")
Reviewed-by: Eric Auger 
Signed-off-by: Mostafa Saleh 
---
 hw/arm/smmuv3-internal.h | 6 ++
 hw/arm/smmuv3.c  | 6 +-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
index e4dd11e1e6..0f3ecec804 100644
--- a/hw/arm/smmuv3-internal.h
+++ b/hw/arm/smmuv3-internal.h
@@ -32,6 +32,12 @@ typedef enum SMMUTranslationStatus {
 SMMU_TRANS_SUCCESS,
 } SMMUTranslationStatus;
 
+typedef enum SMMUTranslationClass {
+SMMU_CLASS_CD,
+SMMU_CLASS_TT,
+SMMU_CLASS_IN,
+} SMMUTranslationClass;
+
 /* MMIO Registers */
 
 REG32(IDR0,0x0)
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 9dd3ea48e4..1eb5b160d2 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -942,7 +942,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion 
*mr, hwaddr addr,
 event.type = SMMU_EVT_F_WALK_EABT;
 event.u.f_walk_eabt.addr = addr;
 event.u.f_walk_eabt.rnw = flag & 0x1;
-event.u.f_walk_eabt.class = 0x1;
+event.u.f_walk_eabt.class = SMMU_CLASS_TT;
 event.u.f_walk_eabt.addr2 = ptw_info.addr;
 break;
 case SMMU_PTW_ERR_TRANSLATION:
@@ -950,6 +950,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion 
*mr, hwaddr addr,
 event.type = SMMU_EVT_F_TRANSLATION;
 event.u.f_translation.addr = addr;
 event.u.f_translation.addr2 = ptw_info.addr;
+event.u.f_translation.class = SMMU_CLASS_IN;
 event.u.f_translation.rnw = flag & 0x1;
 }
 break;
@@ -958,6 +959,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion 
*mr, hwaddr addr,
 event.type = SMMU_EVT_F_ADDR_SIZE;
 event.u.f_addr_size.addr = addr;
 event.u.f_addr_size.addr2 = ptw_info.addr;
+event.u.f_translation.class = SMMU_CLASS_IN;
 event.u.f_addr_size.rnw = flag & 0x1;
 }
 break;
@@ -966,6 +968,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion 
*mr, hwaddr addr,
 event.type = SMMU_EVT_F_ACCESS;
 event.u.f_access.addr = addr;
 event.u.f_access.addr2 = ptw_info.addr;
+event.u.f_translation.class = SMMU_CLASS_IN;
 event.u.f_access.rnw = flag & 0x1;
 }
 break;
@@ -974,6 +977,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion 
*mr, hwaddr addr,
 event.type = SMMU_EVT_F_PERMISSION;
 event.u.f_permission.addr = addr;
 event.u.f_permission.addr2 = ptw_info.addr;
+event.u.f_translation.class = SMMU_CLASS_IN;
 event.u.f_permission.rnw = flag & 0x1;
 }
 break;
-- 
2.45.2.803.g4e1b14247a-goog




[PATCH v4 19/19] hw/arm/smmu: Refactor SMMU OAS

2024-07-01 Thread Mostafa Saleh
SMMUv3 OAS is currently hardcoded in the code to 44 bits, for nested
configurations that can be a problem, as stage-2 might be shared with
the CPU which might have different PARANGE, and according to SMMU manual
ARM IHI 0070F.b:
6.3.6 SMMU_IDR5, OAS must match the system physical address size.

This patch doesn't change the SMMU OAS, but refactors the code to
make it easier to do that:
- Rely everywhere on IDR5 for reading OAS instead of using the
  SMMU_IDR5_OAS macro, so it is easier just to change IDR5 and
  have it propagate correctly.
- Add additional checks when OAS is greater than 48bits.
- Remove unused functions/macros: pa_range/MAX_PA.

Reviewed-by: Eric Auger 
Signed-off-by: Mostafa Saleh 
---
 hw/arm/smmu-common.c |  7 ---
 hw/arm/smmuv3-internal.h | 13 -
 hw/arm/smmuv3.c  | 35 ---
 3 files changed, 32 insertions(+), 23 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index faba4adc49..2cff80e5dd 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -452,7 +452,8 @@ static int smmu_ptw_64_s1(SMMUTransCfg *cfg,
 inputsize = 64 - tt->tsz;
 level = 4 - (inputsize - 4) / stride;
 indexmask = VMSA_IDXMSK(inputsize, stride, level);
-baseaddr = extract64(tt->ttb, 0, 48);
+
+baseaddr = extract64(tt->ttb, 0, cfg->oas);
 baseaddr &= ~indexmask;
 
 while (level < VMSA_LEVELS) {
@@ -576,8 +577,8 @@ static int smmu_ptw_64_s2(SMMUTransCfg *cfg,
  * Get the ttb from concatenated structure.
  * The offset is the idx * size of each ttb(number of ptes * (sizeof(pte))
  */
-uint64_t baseaddr = extract64(cfg->s2cfg.vttb, 0, 48) + (1 << stride) *
-  idx * sizeof(uint64_t);
+uint64_t baseaddr = extract64(cfg->s2cfg.vttb, 0, cfg->s2cfg.eff_ps) +
+  (1 << stride) * idx * sizeof(uint64_t);
 dma_addr_t indexmask = VMSA_IDXMSK(inputsize, stride, level);
 
 baseaddr &= ~indexmask;
diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
index 0f3ecec804..0ebf2eebcf 100644
--- a/hw/arm/smmuv3-internal.h
+++ b/hw/arm/smmuv3-internal.h
@@ -602,19 +602,6 @@ static inline int oas2bits(int oas_field)
 return -1;
 }
 
-static inline int pa_range(STE *ste)
-{
-int oas_field = MIN(STE_S2PS(ste), SMMU_IDR5_OAS);
-
-if (!STE_S2AA64(ste)) {
-return 40;
-}
-
-return oas2bits(oas_field);
-}
-
-#define MAX_PA(ste) ((1 << pa_range(ste)) - 1)
-
 /* CD fields */
 
 #define CD_VALID(x)   extract32((x)->word[0], 31, 1)
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 88378e83dd..6954b385c7 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -410,10 +410,10 @@ static bool s2t0sz_valid(SMMUTransCfg *cfg)
 }
 
 if (cfg->s2cfg.granule_sz == 16) {
-return (cfg->s2cfg.tsz >= 64 - oas2bits(SMMU_IDR5_OAS));
+return (cfg->s2cfg.tsz >= 64 - cfg->s2cfg.eff_ps);
 }
 
-return (cfg->s2cfg.tsz >= MAX(64 - oas2bits(SMMU_IDR5_OAS), 16));
+return (cfg->s2cfg.tsz >= MAX(64 - cfg->s2cfg.eff_ps, 16));
 }
 
 /*
@@ -434,8 +434,11 @@ static bool s2_pgtable_config_valid(uint8_t sl0, uint8_t 
t0sz, uint8_t gran)
 return nr_concat <= VMSA_MAX_S2_CONCAT;
 }
 
-static int decode_ste_s2_cfg(SMMUTransCfg *cfg, STE *ste)
+static int decode_ste_s2_cfg(SMMUv3State *s, SMMUTransCfg *cfg,
+ STE *ste)
 {
+uint8_t oas = FIELD_EX32(s->idr[5], IDR5, OAS);
+
 if (STE_S2AA64(ste) == 0x0) {
 qemu_log_mask(LOG_UNIMP,
   "SMMUv3 AArch32 tables not supported\n");
@@ -468,7 +471,15 @@ static int decode_ste_s2_cfg(SMMUTransCfg *cfg, STE *ste)
 }
 
 /* For AA64, The effective S2PS size is capped to the OAS. */
-cfg->s2cfg.eff_ps = oas2bits(MIN(STE_S2PS(ste), SMMU_IDR5_OAS));
+cfg->s2cfg.eff_ps = oas2bits(MIN(STE_S2PS(ste), oas));
+/*
+ * For SMMUv3.1 and later, when OAS == IAS == 52, the stage 2 input
+ * range is further limited to 48 bits unless STE.S2TG indicates a
+ * 64KB granule.
+ */
+if (cfg->s2cfg.granule_sz != 16) {
+cfg->s2cfg.eff_ps = MIN(cfg->s2cfg.eff_ps, 48);
+}
 /*
  * It is ILLEGAL for the address in S2TTB to be outside the range
  * described by the effective S2PS value.
@@ -544,6 +555,7 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
   STE *ste, SMMUEventInfo *event)
 {
 uint32_t config;
+uint8_t oas = FIELD_EX32(s->idr[5], IDR5, OAS);
 int ret;
 
 if (!STE_VALID(ste)) {
@@ -587,8 +599,8 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
  * Stage-1 OAS defaults to OAS even if not enabled as it would be used
  * in input address check for stage-2.
  */
-cfg->oas = oas2bits(SMMU_IDR5_OAS);
-ret = decode_ste_s2_cfg(cfg, ste);
+cfg->oas = oas2bits(oas);
+ret = decode_ste_s2_cfg(s, cfg, ste);
 if (ret) {
 goto bad_ste;
 

[PATCH v4 07/19] hw/arm/smmu: Introduce CACHED_ENTRY_TO_ADDR

2024-07-01 Thread Mostafa Saleh
Soon, smmuv3_do_translate() will be used to translate the CD and the
TTBx. Instead of re-writing the same logic to convert the returned
cached entry to an address, add a new macro, CACHED_ENTRY_TO_ADDR.

Signed-off-by: Mostafa Saleh 
---
 hw/arm/smmuv3.c  | 3 +--
 include/hw/arm/smmu-common.h | 3 +++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index cc61708160..229b3c388c 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -950,8 +950,7 @@ epilogue:
 switch (status) {
 case SMMU_TRANS_SUCCESS:
 entry.perm = cached_entry->entry.perm;
-entry.translated_addr = cached_entry->entry.translated_addr +
-(addr & cached_entry->entry.addr_mask);
+entry.translated_addr = CACHED_ENTRY_TO_ADDR(cached_entry, addr);
 entry.addr_mask = cached_entry->entry.addr_mask;
 trace_smmuv3_translate_success(mr->parent_obj.name, sid, addr,
entry.translated_addr, entry.perm,
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index 96eb017e50..09d3b9e734 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -37,6 +37,9 @@
 #define VMSA_IDXMSK(isz, strd, lvl) ((1ULL << \
  VMSA_BIT_LVL(isz, strd, lvl)) - 1)
 
+#define CACHED_ENTRY_TO_ADDR(ent, addr)  ((ent)->entry.translated_addr + \
+ ((addr) & (ent)->entry.addr_mask))
+
 /*
  * Page table walk error types
  */
-- 
2.45.2.803.g4e1b14247a-goog




[PATCH v4 06/19] hw/arm/smmu: Consolidate ASID and VMID types

2024-07-01 Thread Mostafa Saleh
ASID and VMID used to be uint16_t in the translation config; however,
in other contexts they can be int, with -1 used in TLB invalidation
to represent all (don’t care).
When stage-2 was added, asid was set to -1 in stage-2 configs and vmid
to -1 in stage-1 configs. However, that meant they were actually stored
as 65535; this was not an issue as nesting was not supported and no
command/lookup uses both.

With nesting, it’s critical to get this right, as translation must be
tagged correctly with ASID/VMID, and with ASID=-1 meaning stage-2.
Represent ASID/VMID everywhere as int.

Reviewed-by: Eric Auger 
Signed-off-by: Mostafa Saleh 
---
 hw/arm/smmu-common.c | 10 +-
 hw/arm/smmuv3.c  |  4 ++--
 hw/arm/trace-events  | 18 +-
 include/hw/arm/smmu-common.h | 14 +++---
 4 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index d94db6b34f..21982621c0 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -57,7 +57,7 @@ static gboolean smmu_iotlb_key_equal(gconstpointer v1, 
gconstpointer v2)
(k1->vmid == k2->vmid);
 }
 
-SMMUIOTLBKey smmu_get_iotlb_key(uint16_t asid, uint16_t vmid, uint64_t iova,
+SMMUIOTLBKey smmu_get_iotlb_key(int asid, int vmid, uint64_t iova,
 uint8_t tg, uint8_t level)
 {
 SMMUIOTLBKey key = {.asid = asid, .vmid = vmid, .iova = iova,
@@ -130,7 +130,7 @@ void smmu_iotlb_inv_all(SMMUState *s)
 static gboolean smmu_hash_remove_by_asid(gpointer key, gpointer value,
  gpointer user_data)
 {
-uint16_t asid = *(uint16_t *)user_data;
+int asid = *(int *)user_data;
 SMMUIOTLBKey *iotlb_key = (SMMUIOTLBKey *)key;
 
 return SMMU_IOTLB_ASID(*iotlb_key) == asid;
@@ -139,7 +139,7 @@ static gboolean smmu_hash_remove_by_asid(gpointer key, 
gpointer value,
 static gboolean smmu_hash_remove_by_vmid(gpointer key, gpointer value,
  gpointer user_data)
 {
-uint16_t vmid = *(uint16_t *)user_data;
+int vmid = *(int *)user_data;
 SMMUIOTLBKey *iotlb_key = (SMMUIOTLBKey *)key;
 
 return SMMU_IOTLB_VMID(*iotlb_key) == vmid;
@@ -191,13 +191,13 @@ void smmu_iotlb_inv_iova(SMMUState *s, int asid, int 
vmid, dma_addr_t iova,
 &info);
 }
 
-void smmu_iotlb_inv_asid(SMMUState *s, uint16_t asid)
+void smmu_iotlb_inv_asid(SMMUState *s, int asid)
 {
 trace_smmu_iotlb_inv_asid(asid);
 g_hash_table_foreach_remove(s->iotlb, smmu_hash_remove_by_asid, &asid);
 }
 
-void smmu_iotlb_inv_vmid(SMMUState *s, uint16_t vmid)
+void smmu_iotlb_inv_vmid(SMMUState *s, int vmid)
 {
 trace_smmu_iotlb_inv_vmid(vmid);
 g_hash_table_foreach_remove(s->iotlb, smmu_hash_remove_by_vmid, &vmid);
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index f98c157221..cc61708160 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1243,7 +1243,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 }
 case SMMU_CMD_TLBI_NH_ASID:
 {
-uint16_t asid = CMD_ASID(&cmd);
+int asid = CMD_ASID(&cmd);
 
 if (!STAGE1_SUPPORTED(s)) {
 cmd_error = SMMU_CERROR_ILL;
@@ -1276,7 +1276,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 break;
 case SMMU_CMD_TLBI_S12_VMALL:
 {
-uint16_t vmid = CMD_VMID(&cmd);
+int vmid = CMD_VMID(&cmd);
 
 if (!STAGE2_SUPPORTED(s)) {
 cmd_error = SMMU_CERROR_ILL;
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index cc12924a84..09ccd39548 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -11,13 +11,13 @@ smmu_ptw_page_pte(int stage, int level,  uint64_t iova, 
uint64_t baseaddr, uint6
 smmu_ptw_block_pte(int stage, int level, uint64_t baseaddr, uint64_t pteaddr, 
uint64_t pte, uint64_t iova, uint64_t gpa, int bsize_mb) "stage=%d level=%d 
base@=0x%"PRIx64" pte@=0x%"PRIx64" pte=0x%"PRIx64" iova=0x%"PRIx64" block 
address = 0x%"PRIx64" block size = %d MiB"
 smmu_get_pte(uint64_t baseaddr, int index, uint64_t pteaddr, uint64_t pte) 
"baseaddr=0x%"PRIx64" index=0x%x, pteaddr=0x%"PRIx64", pte=0x%"PRIx64
 smmu_iotlb_inv_all(void) "IOTLB invalidate all"
-smmu_iotlb_inv_asid(uint16_t asid) "IOTLB invalidate asid=%d"
-smmu_iotlb_inv_vmid(uint16_t vmid) "IOTLB invalidate vmid=%d"
-smmu_iotlb_inv_iova(uint16_t asid, uint64_t addr) "IOTLB invalidate asid=%d 
addr=0x%"PRIx64
+smmu_iotlb_inv_asid(int asid) "IOTLB invalidate asid=%d"
+smmu_iotlb_inv_vmid(int vmid) "IOTLB invalidate vmid=%d"
+smmu_iotlb_inv_iova(int asid, uint64_t addr) "IOTLB invalidate asid=%d 
addr=0x%"PRIx64
 smmu_inv_notifiers_mr(const char *name) "iommu mr=%s"
-smmu_iotlb_lookup_hit(uint16_t asid, uint16_t vmid, uint64_t addr, uint32_t 
hit, uint32_t miss, uint32_t p) "IOTLB cache HIT asid=%d vmid=%d 
addr=0x%"PRIx64" hit=%d miss=%d hit rate=%d"
-smmu_iotlb_lookup_miss(uint16_t asid, uint16_t vmid, uint64_t

[PATCH v4 12/19] hw/arm/smmu: Support nesting in smmuv3_range_inval()

2024-07-01 Thread Mostafa Saleh
With nesting, we would need to invalidate IPAs without
over-invalidating stage-1 IOVAs. This can be done by
distinguishing IPAs in the TLBs by having ASID=-1.
To achieve that, rework the invalidation for IPAs to have a
separate function, while for IOVA invalidation ASID=-1 means
invalidate for all ASIDs.

Reviewed-by: Eric Auger 
Signed-off-by: Mostafa Saleh 
---
 hw/arm/smmu-common.c | 47 
 hw/arm/smmuv3.c  | 23 --
 hw/arm/trace-events  |  2 +-
 include/hw/arm/smmu-common.h |  3 ++-
 4 files changed, 66 insertions(+), 9 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 71afd486ba..5bf9eadeff 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -195,6 +195,25 @@ static gboolean 
smmu_hash_remove_by_asid_vmid_iova(gpointer key, gpointer value,
((entry->iova & ~info->mask) == info->iova);
 }
 
+static gboolean smmu_hash_remove_by_vmid_ipa(gpointer key, gpointer value,
+ gpointer user_data)
+{
+SMMUTLBEntry *iter = (SMMUTLBEntry *)value;
+IOMMUTLBEntry *entry = &iter->entry;
+SMMUIOTLBPageInvInfo *info = (SMMUIOTLBPageInvInfo *)user_data;
+SMMUIOTLBKey iotlb_key = *(SMMUIOTLBKey *)key;
+
+if (info->asid >= 0) {
+/* This is a stage-1 address. */
+return false;
+}
+if (info->vmid != SMMU_IOTLB_VMID(iotlb_key)) {
+return false;
+}
+return ((info->iova & ~entry->addr_mask) == entry->iova) ||
+   ((entry->iova & ~info->mask) == info->iova);
+}
+
 void smmu_iotlb_inv_iova(SMMUState *s, int asid, int vmid, dma_addr_t iova,
  uint8_t tg, uint64_t num_pages, uint8_t ttl)
 {
@@ -223,6 +242,34 @@ void smmu_iotlb_inv_iova(SMMUState *s, int asid, int vmid, 
dma_addr_t iova,
 &info);
 }
 
+/*
+ * Similar to smmu_iotlb_inv_iova(), but for Stage-2, where ASID is
+ * always -1; in Stage-1 invalidation, ASID = -1 means don't care.
+ */
+void smmu_iotlb_inv_ipa(SMMUState *s, int vmid, dma_addr_t ipa, uint8_t tg,
+uint64_t num_pages, uint8_t ttl)
+{
+uint8_t granule = tg ? tg * 2 + 10 : 12;
+int asid = -1;
+
+    if (ttl && (num_pages == 1)) {
+SMMUIOTLBKey key = smmu_get_iotlb_key(asid, vmid, ipa, tg, ttl);
+
+if (g_hash_table_remove(s->iotlb, &key)) {
+return;
+}
+}
+
+SMMUIOTLBPageInvInfo info = {
+.iova = ipa,
+.vmid = vmid,
+.mask = (num_pages * 1 << granule) - 1};
+
+g_hash_table_foreach_remove(s->iotlb,
+smmu_hash_remove_by_vmid_ipa,
+&info);
+}
+
 void smmu_iotlb_inv_asid(SMMUState *s, int asid)
 {
 trace_smmu_iotlb_inv_asid(asid);
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 86f95c1e40..e5ecd93258 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1136,7 +1136,7 @@ static void smmuv3_inv_notifiers_iova(SMMUState *s, int 
asid, int vmid,
 }
 }
 
-static void smmuv3_range_inval(SMMUState *s, Cmd *cmd)
+static void smmuv3_range_inval(SMMUState *s, Cmd *cmd, SMMUStage stage)
 {
 dma_addr_t end, addr = CMD_ADDR(cmd);
 uint8_t type = CMD_TYPE(cmd);
@@ -1161,9 +1161,13 @@ static void smmuv3_range_inval(SMMUState *s, Cmd *cmd)
 }
 
 if (!tg) {
-trace_smmuv3_range_inval(vmid, asid, addr, tg, 1, ttl, leaf);
+trace_smmuv3_range_inval(vmid, asid, addr, tg, 1, ttl, leaf, stage);
 smmuv3_inv_notifiers_iova(s, asid, vmid, addr, tg, 1);
-smmu_iotlb_inv_iova(s, asid, vmid, addr, tg, 1, ttl);
+if (stage == SMMU_STAGE_1) {
+smmu_iotlb_inv_iova(s, asid, vmid, addr, tg, 1, ttl);
+} else {
+smmu_iotlb_inv_ipa(s, vmid, addr, tg, 1, ttl);
+}
 return;
 }
 
@@ -1179,9 +1183,14 @@ static void smmuv3_range_inval(SMMUState *s, Cmd *cmd)
 uint64_t mask = dma_aligned_pow2_mask(addr, end, 64);
 
 num_pages = (mask + 1) >> granule;
-trace_smmuv3_range_inval(vmid, asid, addr, tg, num_pages, ttl, leaf);
+trace_smmuv3_range_inval(vmid, asid, addr, tg, num_pages,
+ ttl, leaf, stage);
 smmuv3_inv_notifiers_iova(s, asid, vmid, addr, tg, num_pages);
-smmu_iotlb_inv_iova(s, asid, vmid, addr, tg, num_pages, ttl);
+if (stage == SMMU_STAGE_1) {
+smmu_iotlb_inv_iova(s, asid, vmid, addr, tg, num_pages, ttl);
+} else {
+smmu_iotlb_inv_ipa(s, vmid, addr, tg, num_pages, ttl);
+}
 addr += mask + 1;
 }
 }
@@ -1340,7 +1349,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 cmd_error = SMMU_CERROR_ILL;
 break;
 }
-smmuv3_range_inval(bs, &cmd);
+smmuv3_range_inval(bs, &cmd, SMMU_STAGE_1);
 break;
 case SMMU_CMD_TLBI_S12_VMALL:
 {
@@ -1365,7 +1374,7 @@ static int s

[PATCH v4 09/19] hw/arm/smmu-common: Rework TLB lookup for nesting

2024-07-01 Thread Mostafa Saleh
In the next patch, combine_tlb() will be added, which combines 2 TLB
entries into one for nested translations, choosing the granule
and level from the smallest entry.

This means that with nested translation, an entry can be cached with
the granule of stage-2 and not stage-1.

However, currently, the lookup for an IOVA is done with the input-stage
granule, which is stage-1 for a nested configuration, and will not
work with the above logic.
This patch reworks lookup in that case, so it falls back to stage-2
granule if no entry is found using stage-1 granule.

Signed-off-by: Mostafa Saleh 
---
 hw/arm/smmu-common.c | 36 ++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 21982621c0..0840b5cffd 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -66,8 +66,10 @@ SMMUIOTLBKey smmu_get_iotlb_key(int asid, int vmid, uint64_t 
iova,
 return key;
 }
 
-SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg,
-SMMUTransTableInfo *tt, hwaddr iova)
+static SMMUTLBEntry *smmu_iotlb_lookup_all_levels(SMMUState *bs,
+  SMMUTransCfg *cfg,
+  SMMUTransTableInfo *tt,
+  hwaddr iova)
 {
 uint8_t tg = (tt->granule_sz - 10) / 2;
 uint8_t inputsize = 64 - tt->tsz;
@@ -88,6 +90,36 @@ SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg 
*cfg,
 }
 level++;
 }
+return entry;
+}
+
+/**
+ * smmu_iotlb_lookup - Look up for a TLB entry.
+ * @bs: SMMU state which includes the TLB instance
+ * @cfg: Configuration of the translation
+ * @tt: Translation table info (granule and tsz)
+ * @iova: IOVA address to lookup
+ *
+ * returns a valid entry on success, otherwise NULL.
+ * In case of nested translation, tt can be updated to include
+ * the granule of the found entry as it might differ from
+ * the IOVA granule.
+ */
+SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg,
+SMMUTransTableInfo *tt, hwaddr iova)
+{
+SMMUTLBEntry *entry = NULL;
+
+entry = smmu_iotlb_lookup_all_levels(bs, cfg, tt, iova);
+/*
+ * For nested translation also try the s2 granule, as the TLB will insert
+ * it if the size of s2 tlb entry was smaller.
+ */
+if (!entry && (cfg->stage == SMMU_NESTED) &&
+(cfg->s2cfg.granule_sz != tt->granule_sz)) {
+tt->granule_sz = cfg->s2cfg.granule_sz;
+entry = smmu_iotlb_lookup_all_levels(bs, cfg, tt, iova);
+}
 
 if (entry) {
 cfg->iotlb_hits++;
-- 
2.45.2.803.g4e1b14247a-goog




[PATCH v4 02/19] hw/arm/smmu: Fix IPA for stage-2 events

2024-07-01 Thread Mostafa Saleh
For the following events (ARM IHI 0070 F.b - 7.3 Event records):
- F_TRANSLATION
- F_ACCESS
- F_PERMISSION
- F_ADDR_SIZE

If fault occurs at stage 2, S2 == 1 and:
  - If translating an IPA for a transaction (whether by input to
stage 2-only configuration, or after successful stage 1 translation),
CLASS == IN, and IPA is provided.

At the moment only CLASS == IN is used which indicates input
translation.

However, this was not implemented correctly: for stage 2, the code
only sets the S2 bit but not the IPA.

This field has the same bits as FetchAddr in F_WALK_EABT which is
populated correctly, so we don’t change that.
The setting of this field should be done from the walker as the IPA address
wouldn't be known in case of nesting.

For stage 1, the spec says:
  If fault occurs at stage 1, S2 == 0 and:
  CLASS == IN, IPA is UNKNOWN.

So, no need to set it for stage 1, as ptw_info is initialised to zero in
smmuv3_translate().

Fixes: e703f7076a ("hw/arm/smmuv3: Add page table walk for stage-2")
Reviewed-by: Eric Auger 
Signed-off-by: Mostafa Saleh 
---
 hw/arm/smmu-common.c | 10 ++
 hw/arm/smmuv3.c  |  4 
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index eb2356bc35..8a8c718e6b 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -448,7 +448,7 @@ static int smmu_ptw_64_s2(SMMUTransCfg *cfg,
  */
 if (ipa >= (1ULL << inputsize)) {
 info->type = SMMU_PTW_ERR_TRANSLATION;
-goto error;
+goto error_ipa;
 }
 
 while (level < VMSA_LEVELS) {
@@ -494,13 +494,13 @@ static int smmu_ptw_64_s2(SMMUTransCfg *cfg,
  */
 if (!PTE_AF(pte) && !cfg->s2cfg.affd) {
 info->type = SMMU_PTW_ERR_ACCESS;
-goto error;
+goto error_ipa;
 }
 
 s2ap = PTE_AP(pte);
 if (is_permission_fault_s2(s2ap, perm)) {
 info->type = SMMU_PTW_ERR_PERMISSION;
-goto error;
+goto error_ipa;
 }
 
 /*
@@ -509,7 +509,7 @@ static int smmu_ptw_64_s2(SMMUTransCfg *cfg,
  */
 if (gpa >= (1ULL << cfg->s2cfg.eff_ps)) {
 info->type = SMMU_PTW_ERR_ADDR_SIZE;
-goto error;
+goto error_ipa;
 }
 
 tlbe->entry.translated_addr = gpa;
@@ -522,6 +522,8 @@ static int smmu_ptw_64_s2(SMMUTransCfg *cfg,
 }
 info->type = SMMU_PTW_ERR_TRANSLATION;
 
+error_ipa:
+info->addr = ipa;
 error:
 info->stage = 2;
 tlbe->entry.perm = IOMMU_NONE;
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 2d1e0d55ec..9dd3ea48e4 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -949,6 +949,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion 
*mr, hwaddr addr,
 if (PTW_RECORD_FAULT(cfg)) {
 event.type = SMMU_EVT_F_TRANSLATION;
 event.u.f_translation.addr = addr;
+event.u.f_translation.addr2 = ptw_info.addr;
 event.u.f_translation.rnw = flag & 0x1;
 }
 break;
@@ -956,6 +957,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion 
*mr, hwaddr addr,
 if (PTW_RECORD_FAULT(cfg)) {
 event.type = SMMU_EVT_F_ADDR_SIZE;
 event.u.f_addr_size.addr = addr;
+event.u.f_addr_size.addr2 = ptw_info.addr;
 event.u.f_addr_size.rnw = flag & 0x1;
 }
 break;
@@ -963,6 +965,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion 
*mr, hwaddr addr,
 if (PTW_RECORD_FAULT(cfg)) {
 event.type = SMMU_EVT_F_ACCESS;
 event.u.f_access.addr = addr;
+event.u.f_access.addr2 = ptw_info.addr;
 event.u.f_access.rnw = flag & 0x1;
 }
 break;
@@ -970,6 +973,7 @@ static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion 
*mr, hwaddr addr,
 if (PTW_RECORD_FAULT(cfg)) {
 event.type = SMMU_EVT_F_PERMISSION;
 event.u.f_permission.addr = addr;
+event.u.f_permission.addr2 = ptw_info.addr;
 event.u.f_permission.rnw = flag & 0x1;
 }
 break;
-- 
2.45.2.803.g4e1b14247a-goog




[PATCH v4 11/19] hw/arm/smmu-common: Support nested translation

2024-07-01 Thread Mostafa Saleh
When nested translation is requested, do the following:

- Translate stage-1 table address IPA into PA through stage-2.
- Translate stage-1 table walk output (IPA) through stage-2.
- Create a single TLB entry from stage-1 and stage-2 translations
  using logic introduced before.

For stage-1 table translation, the spec (ARM IHI 0070 F.b) says in:
7.3.12 F_WALK_EABT:
Translation of an IPA for Stage 1 descriptor fetch:
S2 == 1 (stage 2), CLASS == T
So, F_WALK_EABT is used, which propagates to CLASS == TT.

smmu_ptw() has a new argument SMMUState which include the TLB as
stage-1 table address can be cached in there.

Also, in smmu_ptw() a separate path is used for nesting to simplify
the code, although some logic could be combined.

Signed-off-by: Mostafa Saleh 
---
 hw/arm/smmu-common.c | 72 +++-
 include/hw/arm/smmu-common.h |  2 +-
 2 files changed, 64 insertions(+), 10 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 24b7d09e2b..71afd486ba 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -318,6 +318,38 @@ SMMUTransTableInfo *select_tt(SMMUTransCfg *cfg, 
dma_addr_t iova)
 return NULL;
 }
 
+/* Translate stage-1 table address using stage-2 page table. */
+static inline int translate_table_addr_ipa(dma_addr_t *table_addr,
+   SMMUTransCfg *cfg,
+   SMMUPTWEventInfo *info,
+   SMMUState *bs)
+{
+dma_addr_t addr = *table_addr;
+SMMUTLBEntry *cached_entry;
+int asid;
+
+/*
+ * The translation table walks performed from TTB0 or TTB1 are always
+ * performed in IPA space if stage 2 translations are enabled.
+ */
+asid = cfg->asid;
+cfg->stage = SMMU_STAGE_2;
+cfg->asid = -1;
+cached_entry = smmu_translate(bs, cfg, addr, IOMMU_RO, info);
+cfg->asid = asid;
+cfg->stage = SMMU_NESTED;
+
+if (cached_entry) {
+*table_addr = CACHED_ENTRY_TO_ADDR(cached_entry, addr);
+return 0;
+}
+
+info->stage = SMMU_STAGE_2;
+info->type = SMMU_PTW_ERR_WALK_EABT;
+info->addr = addr;
+return -EINVAL;
+}
+
 /**
  * smmu_ptw_64_s1 - VMSAv8-64 Walk of the page tables for a given IOVA
  * @cfg: translation config
@@ -333,7 +365,8 @@ SMMUTransTableInfo *select_tt(SMMUTransCfg *cfg, dma_addr_t 
iova)
  */
 static int smmu_ptw_64_s1(SMMUTransCfg *cfg,
   dma_addr_t iova, IOMMUAccessFlags perm,
-  SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info)
+  SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info,
+  SMMUState *bs)
 {
 dma_addr_t baseaddr, indexmask;
 SMMUStage stage = cfg->stage;
@@ -381,6 +414,11 @@ static int smmu_ptw_64_s1(SMMUTransCfg *cfg,
 goto error;
 }
 baseaddr = get_table_pte_address(pte, granule_sz);
+if (cfg->stage == SMMU_NESTED) {
+if (translate_table_addr_ipa(&baseaddr, cfg, info, bs)) {
+goto error;
+}
+}
 level++;
 continue;
 } else if (is_page_pte(pte, level)) {
@@ -568,10 +606,8 @@ error:
 * combine S1 and S2 TLB entries into a single entry.
 * As a result the S1 entry is overridden with combined data.
  */
-static void __attribute__((unused)) combine_tlb(SMMUTLBEntry *tlbe,
-SMMUTLBEntry *tlbe_s2,
-dma_addr_t iova,
-SMMUTransCfg *cfg)
+static void combine_tlb(SMMUTLBEntry *tlbe, SMMUTLBEntry *tlbe_s2,
+dma_addr_t iova, SMMUTransCfg *cfg)
 {
 if (tlbe_s2->entry.addr_mask < tlbe->entry.addr_mask) {
 tlbe->entry.addr_mask = tlbe_s2->entry.addr_mask;
@@ -596,14 +632,19 @@ static void __attribute__((unused)) 
combine_tlb(SMMUTLBEntry *tlbe,
  * @perm: tentative access type
  * @tlbe: returned entry
  * @info: ptw event handle
+ * @bs: smmu state which includes TLB instance
  *
  * return 0 on success
  */
 int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags perm,
- SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info)
+ SMMUTLBEntry *tlbe, SMMUPTWEventInfo *info, SMMUState *bs)
 {
+int ret;
+SMMUTLBEntry tlbe_s2;
+dma_addr_t ipa;
+
 if (cfg->stage == SMMU_STAGE_1) {
-return smmu_ptw_64_s1(cfg, iova, perm, tlbe, info);
+return smmu_ptw_64_s1(cfg, iova, perm, tlbe, info, bs);
 } else if (cfg->stage == SMMU_STAGE_2) {
 /*
  * If bypassing stage 1(or unimplemented), the input address is passed
@@ -621,7 +662,20 @@ int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, 
IOMMUAccessFlags perm,
 return smmu_ptw_64_s2(cfg, iova, perm, tlbe, info);
 }
 
-g_assert_not_reached();
+/* SMMU_NESTED. */
+ret = smmu_ptw_64_s1(c

[PATCH v4 05/19] hw/arm/smmu: Split smmuv3_translate()

2024-07-01 Thread Mostafa Saleh
smmuv3_translate() does everything from STE/CD parsing to TLB lookup
and PTW.

Soon, when nesting is supported, stage-1 data (tt, CD) needs to be
translated using stage-2.

Split smmuv3_translate() to 3 functions:

- smmu_translate(): in smmu-common.c, which does the TLB lookup, PTW,
  TLB insertion, all the functions are already there, this just puts
  them together.
  This also simplifies the code as it consolidates event generation
  in case of TLB lookup permission failure or in TT selection.

- smmuv3_do_translate(): in smmuv3.c, calls smmu_translate() and does
  the event population in case of errors.

- smmuv3_translate(): now calls smmuv3_do_translate() for the
  translation, while the rest is the same.

Also, add the stage in trace_smmuv3_translate_success().

Reviewed-by: Eric Auger 
Signed-off-by: Mostafa Saleh 
---
 hw/arm/smmu-common.c |  59 +++
 hw/arm/smmuv3.c  | 191 +--
 hw/arm/trace-events  |   2 +-
 include/hw/arm/smmu-common.h |   8 ++
 4 files changed, 141 insertions(+), 119 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 8a5858f69f..d94db6b34f 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -566,6 +566,65 @@ int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, 
IOMMUAccessFlags perm,
 g_assert_not_reached();
 }
 
+SMMUTLBEntry *smmu_translate(SMMUState *bs, SMMUTransCfg *cfg, dma_addr_t addr,
+ IOMMUAccessFlags flag, SMMUPTWEventInfo *info)
+{
+uint64_t page_mask, aligned_addr;
+SMMUTLBEntry *cached_entry = NULL;
+SMMUTransTableInfo *tt;
+int status;
+
+/*
+ * Combined attributes used for TLB lookup, as only one stage is supported,
+ * it will hold attributes based on the enabled stage.
+ */
+SMMUTransTableInfo tt_combined;
+
+if (cfg->stage == SMMU_STAGE_1) {
+/* Select stage1 translation table. */
+tt = select_tt(cfg, addr);
+if (!tt) {
+info->type = SMMU_PTW_ERR_TRANSLATION;
+info->stage = SMMU_STAGE_1;
+return NULL;
+}
+tt_combined.granule_sz = tt->granule_sz;
+tt_combined.tsz = tt->tsz;
+
+} else {
+/* Stage2. */
+tt_combined.granule_sz = cfg->s2cfg.granule_sz;
+tt_combined.tsz = cfg->s2cfg.tsz;
+}
+
+/*
+ * TLB lookup looks for granule and input size for a translation stage,
+ * as only one stage is supported right now, choose the right values
+ * from the configuration.
+ */
+page_mask = (1ULL << tt_combined.granule_sz) - 1;
+aligned_addr = addr & ~page_mask;
+
+cached_entry = smmu_iotlb_lookup(bs, cfg, &tt_combined, aligned_addr);
+if (cached_entry) {
+if ((flag & IOMMU_WO) && !(cached_entry->entry.perm & IOMMU_WO)) {
+info->type = SMMU_PTW_ERR_PERMISSION;
+info->stage = cfg->stage;
+return NULL;
+}
+return cached_entry;
+}
+
+cached_entry = g_new0(SMMUTLBEntry, 1);
+status = smmu_ptw(cfg, aligned_addr, flag, cached_entry, info);
+if (status) {
+g_free(cached_entry);
+return NULL;
+}
+smmu_iotlb_insert(bs, cfg, cached_entry);
+return cached_entry;
+}
+
 /**
  * The bus number is used for lookup when SID based invalidation occurs.
  * In that case we lazily populate the SMMUPciBus array from the bus hash
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index dab3ad2db9..f98c157221 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -827,6 +827,75 @@ static void smmuv3_flush_config(SMMUDevice *sdev)
 g_hash_table_remove(bc->configs, sdev);
 }
 
+/* Do translation with TLB lookup. */
+static SMMUTranslationStatus smmuv3_do_translate(SMMUv3State *s, hwaddr addr,
+ SMMUTransCfg *cfg,
+ SMMUEventInfo *event,
+ IOMMUAccessFlags flag,
+ SMMUTLBEntry **out_entry)
+{
+SMMUPTWEventInfo ptw_info = {};
+SMMUState *bs = ARM_SMMU(s);
+SMMUTLBEntry *cached_entry = NULL;
+
+cached_entry = smmu_translate(bs, cfg, addr, flag, &ptw_info);
+if (!cached_entry) {
+/* All faults from PTW has S2 field. */
+event->u.f_walk_eabt.s2 = (ptw_info.stage == SMMU_STAGE_2);
+switch (ptw_info.type) {
+case SMMU_PTW_ERR_WALK_EABT:
+event->type = SMMU_EVT_F_WALK_EABT;
+event->u.f_walk_eabt.addr = addr;
+event->u.f_walk_eabt.rnw = flag & 0x1;
+event->u.f_walk_eabt.class = SMMU_CLASS_TT;
+event->u.f_walk_eabt.addr2 = ptw_info.addr;
+break;
+case SMMU_PTW_ERR_TRANSLATION:
+if (PTW_RECORD_FAULT(cfg)) {
+event->type = SMMU_EVT_F_TRANSLATION;
+event->u.f_translation.addr = addr;
+event->u.f_translation.addr2 = 

[PATCH v4 15/19] hw/arm/smmuv3: Support nested SMMUs in smmuv3_notify_iova()

2024-07-01 Thread Mostafa Saleh
IOMMUTLBEvent only understands IOVA. For stage-1 or stage-2
SMMU instances we consider the input address as the IOVA, but when
nesting is used, we can't mix stage-1 and stage-2 addresses, so for
nesting only stage-1 is considered the IOVA and would be notified.

Signed-off-by: Mostafa Saleh 
---
 hw/arm/smmuv3.c | 28 +++-
 hw/arm/trace-events |  2 +-
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index e9007af3cd..36eb6f514a 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1064,17 +1064,17 @@ epilogue:
  * @iova: iova
  * @tg: translation granule (if communicated through range invalidation)
  * @num_pages: number of @granule sized pages (if tg != 0), otherwise 1
+ * @stage: Which stage(1 or 2) is used
  */
 static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
IOMMUNotifier *n,
int asid, int vmid,
dma_addr_t iova, uint8_t tg,
-   uint64_t num_pages)
+   uint64_t num_pages, int stage)
 {
 SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
 IOMMUTLBEvent event;
 uint8_t granule;
-SMMUv3State *s = sdev->smmu;
 
 if (!tg) {
 SMMUEventInfo eventinfo = {.inval_ste_allowed = true};
@@ -1093,14 +1093,24 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
 return;
 }
 
-if (STAGE1_SUPPORTED(s)) {
+/*
+ * stage is passed from TLB invalidation commands which can be either
+ * stage-1 or stage-2.
+ * However, IOMMUTLBEvent only understands IOVA. For stage-1 or stage-2
+ * SMMU instances we consider the input address as the IOVA, but when
+ * nesting is used, we can't mix stage-1 and stage-2 addresses, so for
+ * nesting only stage-1 is considered the IOVA and would be notified.
+ */
+if (stage == SMMU_STAGE_1) {
 tt = select_tt(cfg, iova);
 if (!tt) {
 return;
 }
 granule = tt->granule_sz;
-} else {
+} else if ((stage == SMMU_STAGE_2) && (cfg->stage != SMMU_NESTED)) {
 granule = cfg->s2cfg.granule_sz;
+} else {
+return;
 }
 
 } else {
@@ -1119,7 +1129,7 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
 /* invalidate an asid/vmid/iova range tuple in all mr's */
 static void smmuv3_inv_notifiers_iova(SMMUState *s, int asid, int vmid,
   dma_addr_t iova, uint8_t tg,
-  uint64_t num_pages)
+  uint64_t num_pages, int stage)
 {
 SMMUDevice *sdev;
 
@@ -1128,10 +1138,10 @@ static void smmuv3_inv_notifiers_iova(SMMUState *s, int 
asid, int vmid,
 IOMMUNotifier *n;
 
 trace_smmuv3_inv_notifiers_iova(mr->parent_obj.name, asid, vmid,
-iova, tg, num_pages);
+iova, tg, num_pages, stage);
 
 IOMMU_NOTIFIER_FOREACH(n, mr) {
-smmuv3_notify_iova(mr, n, asid, vmid, iova, tg, num_pages);
+smmuv3_notify_iova(mr, n, asid, vmid, iova, tg, num_pages, stage);
 }
 }
 }
@@ -1162,7 +1172,7 @@ static void smmuv3_range_inval(SMMUState *s, Cmd *cmd, 
SMMUStage stage)
 
 if (!tg) {
 trace_smmuv3_range_inval(vmid, asid, addr, tg, 1, ttl, leaf, stage);
-smmuv3_inv_notifiers_iova(s, asid, vmid, addr, tg, 1);
+smmuv3_inv_notifiers_iova(s, asid, vmid, addr, tg, 1, stage);
 if (stage == SMMU_STAGE_1) {
 smmu_iotlb_inv_iova(s, asid, vmid, addr, tg, 1, ttl);
 } else {
@@ -1185,7 +1195,7 @@ static void smmuv3_range_inval(SMMUState *s, Cmd *cmd, 
SMMUStage stage)
 num_pages = (mask + 1) >> granule;
 trace_smmuv3_range_inval(vmid, asid, addr, tg, num_pages,
  ttl, leaf, stage);
-smmuv3_inv_notifiers_iova(s, asid, vmid, addr, tg, num_pages);
+smmuv3_inv_notifiers_iova(s, asid, vmid, addr, tg, num_pages, stage);
 if (stage == SMMU_STAGE_1) {
 smmu_iotlb_inv_iova(s, asid, vmid, addr, tg, num_pages, ttl);
 } else {
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index 593cc571da..be6c8f720b 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -55,7 +55,7 @@ smmuv3_cmdq_tlbi_s12_vmid(int vmid) "vmid=%d"
 smmuv3_config_cache_inv(uint32_t sid) "Config cache INV for sid=0x%x"
 smmuv3_notify_flag_add(const char *iommu) "ADD SMMUNotifier node for iommu 
mr=%s"
 smmuv3_notify_flag_del(const char *iommu) "DEL SMMUNotifier node for iommu 
mr=%s"
-smmuv3_inv_notifiers_iova(const char *name, int asid, int vmid, uint64_t iova, 
uint8_t tg, uint64_t num_pages) "iommu mr=%s asid=%d vmid=%d iova=0x%"PRIx64" 
tg=%d num_pages=0x%"PRIx64
+smmuv3_inv_notifiers_iova(const cha

[PATCH v4 13/19] hw/arm/smmu: Introduce smmu_iotlb_inv_asid_vmid

2024-07-01 Thread Mostafa Saleh
Soon, instead of doing TLB invalidation by ASID only, VMID will also
be required.
Add smmu_iotlb_inv_asid_vmid(), which invalidates by both ASID and VMID.

However, at the moment this function is only used in SMMU_CMD_TLBI_NH_ASID
which is a stage-1 command, so passing VMID = -1 keeps the original
behaviour.

Signed-off-by: Mostafa Saleh 
---
 hw/arm/smmu-common.c | 20 +---
 hw/arm/smmuv3.c  |  2 +-
 include/hw/arm/smmu-common.h |  2 +-
 3 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 5bf9eadeff..d0309a95b2 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -159,13 +159,14 @@ void smmu_iotlb_inv_all(SMMUState *s)
 g_hash_table_remove_all(s->iotlb);
 }
 
-static gboolean smmu_hash_remove_by_asid(gpointer key, gpointer value,
- gpointer user_data)
+static gboolean smmu_hash_remove_by_asid_vmid(gpointer key, gpointer value,
+  gpointer user_data)
 {
-int asid = *(int *)user_data;
+SMMUIOTLBPageInvInfo *info = (SMMUIOTLBPageInvInfo *)user_data;
 SMMUIOTLBKey *iotlb_key = (SMMUIOTLBKey *)key;
 
-return SMMU_IOTLB_ASID(*iotlb_key) == asid;
+return (SMMU_IOTLB_ASID(*iotlb_key) == info->asid) &&
+   (SMMU_IOTLB_VMID(*iotlb_key) == info->vmid);
 }
 
 static gboolean smmu_hash_remove_by_vmid(gpointer key, gpointer value,
@@ -270,10 +271,15 @@ void smmu_iotlb_inv_ipa(SMMUState *s, int vmid, 
dma_addr_t ipa, uint8_t tg,
 &info);
 }
 
-void smmu_iotlb_inv_asid(SMMUState *s, int asid)
+void smmu_iotlb_inv_asid_vmid(SMMUState *s, int asid, int vmid)
 {
-trace_smmu_iotlb_inv_asid(asid);
-g_hash_table_foreach_remove(s->iotlb, smmu_hash_remove_by_asid, &asid);
+SMMUIOTLBPageInvInfo info = {
+.asid = asid,
+.vmid = vmid,
+};
+
+trace_smmu_iotlb_inv_asid_vmid(asid, vmid);
+g_hash_table_foreach_remove(s->iotlb, smmu_hash_remove_by_asid_vmid, 
&info);
 }
 
 void smmu_iotlb_inv_vmid(SMMUState *s, int vmid)
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index e5ecd93258..928f125523 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1329,7 +1329,7 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 
 trace_smmuv3_cmdq_tlbi_nh_asid(asid);
 smmu_inv_notifiers_all(&s->smmu_state);
-smmu_iotlb_inv_asid(bs, asid);
+smmu_iotlb_inv_asid_vmid(bs, asid, -1);
 break;
 }
 case SMMU_CMD_TLBI_NH_ALL:
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index de032fdfd1..2bc9a03d47 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -212,7 +212,7 @@ void smmu_iotlb_insert(SMMUState *bs, SMMUTransCfg *cfg, 
SMMUTLBEntry *entry);
 SMMUIOTLBKey smmu_get_iotlb_key(int asid, int vmid, uint64_t iova,
 uint8_t tg, uint8_t level);
 void smmu_iotlb_inv_all(SMMUState *s);
-void smmu_iotlb_inv_asid(SMMUState *s, int asid);
+void smmu_iotlb_inv_asid_vmid(SMMUState *s, int asid, int vmid);
 void smmu_iotlb_inv_vmid(SMMUState *s, int vmid);
 void smmu_iotlb_inv_iova(SMMUState *s, int asid, int vmid, dma_addr_t iova,
  uint8_t tg, uint64_t num_pages, uint8_t ttl);
-- 
2.45.2.803.g4e1b14247a-goog




[PATCH v4 10/19] hw/arm/smmu-common: Add support for nested TLB

2024-07-01 Thread Mostafa Saleh
This patch adds support for nested (combined) TLB entries.
The main function, combine_tlb(), is not used here but in the next
patches; it is introduced first to simplify them.

Main changes:
1) A new field, parent_perm, is added to the SMMUTLBEntry struct. For
   nested TLB entries it holds the stage-2 permission; this can be
   used to determine the origin of a permission fault from a cached
   entry, as caching the “and” of the permissions loses this
   information.

   SMMUPTWEventInfo is used to hold information about PTW faults so
   the event can be populated. The value of stage used to be set
   based on the current stage for TLB permission faults; with
   parent_perm, it is now set based on which permission set is
   missing the required permission.

   When nesting is not enabled, parent_perm has the same value as
   perm, which doesn't change the logic.

2) As a combined TLB implementation is used, the combination logic
   chooses:
   - tg and level from the entry which has the smallest addr_mask.
   - Based on that, the iova to be cached is recalculated.
   - Translated_addr is chosen from stage-2.

Signed-off-by: Mostafa Saleh 
---
 hw/arm/smmu-common.c | 37 
 include/hw/arm/smmu-common.h |  1 +
 2 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 0840b5cffd..24b7d09e2b 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -426,7 +426,8 @@ static int smmu_ptw_64_s1(SMMUTransCfg *cfg,
 tlbe->entry.translated_addr = gpa;
 tlbe->entry.iova = iova & ~mask;
 tlbe->entry.addr_mask = mask;
-tlbe->entry.perm = PTE_AP_TO_PERM(ap);
+tlbe->parent_perm = PTE_AP_TO_PERM(ap);
+tlbe->entry.perm = tlbe->parent_perm;
 tlbe->level = level;
 tlbe->granule = granule_sz;
 return 0;
@@ -547,7 +548,8 @@ static int smmu_ptw_64_s2(SMMUTransCfg *cfg,
 tlbe->entry.translated_addr = gpa;
 tlbe->entry.iova = ipa & ~mask;
 tlbe->entry.addr_mask = mask;
-tlbe->entry.perm = s2ap;
+tlbe->parent_perm = s2ap;
+tlbe->entry.perm = tlbe->parent_perm;
 tlbe->level = level;
 tlbe->granule = granule_sz;
 return 0;
@@ -562,6 +564,30 @@ error:
 return -EINVAL;
 }
 
+/*
+ * combine S1 and S2 TLB entries into a single entry.
+ * As a result the S1 entry is overridden with combined data.
+ */
+static void __attribute__((unused)) combine_tlb(SMMUTLBEntry *tlbe,
+SMMUTLBEntry *tlbe_s2,
+dma_addr_t iova,
+SMMUTransCfg *cfg)
+{
+if (tlbe_s2->entry.addr_mask < tlbe->entry.addr_mask) {
+tlbe->entry.addr_mask = tlbe_s2->entry.addr_mask;
+tlbe->granule = tlbe_s2->granule;
+tlbe->level = tlbe_s2->level;
+}
+
+tlbe->entry.translated_addr = CACHED_ENTRY_TO_ADDR(tlbe_s2,
+tlbe->entry.translated_addr);
+
+tlbe->entry.iova = iova & ~tlbe->entry.addr_mask;
+/* parent_perm has s2 perm while perm keeps s1 perm. */
+tlbe->parent_perm = tlbe_s2->entry.perm;
+return;
+}
+
 /**
  * smmu_ptw - Walk the page tables for an IOVA, according to @cfg
  *
@@ -639,9 +665,12 @@ SMMUTLBEntry *smmu_translate(SMMUState *bs, SMMUTransCfg 
*cfg, dma_addr_t addr,
 
 cached_entry = smmu_iotlb_lookup(bs, cfg, &tt_combined, aligned_addr);
 if (cached_entry) {
-if ((flag & IOMMU_WO) && !(cached_entry->entry.perm & IOMMU_WO)) {
+if ((flag & IOMMU_WO) && !(cached_entry->entry.perm &
+cached_entry->parent_perm & IOMMU_WO)) {
 info->type = SMMU_PTW_ERR_PERMISSION;
-info->stage = cfg->stage;
+info->stage = !(cached_entry->entry.perm & IOMMU_WO) ?
+  SMMU_STAGE_1 :
+  SMMU_STAGE_2;
 return NULL;
 }
 return cached_entry;
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index 09d3b9e734..1db566d451 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -77,6 +77,7 @@ typedef struct SMMUTLBEntry {
 IOMMUTLBEntry entry;
 uint8_t level;
 uint8_t granule;
+IOMMUAccessFlags parent_perm;
 } SMMUTLBEntry;
 
 /* Stage-2 configuration. */
-- 
2.45.2.803.g4e1b14247a-goog




[PATCH v4 16/19] hw/arm/smmuv3: Handle translation faults according to SMMUPTWEventInfo

2024-07-01 Thread Mostafa Saleh
Previously, to check if faults are enabled, it was sufficient to check
the current stage of translation and check the corresponding
record_faults flag.

However, with nesting, it is possible for stage-1 (nested) translation
to trigger a stage-2 fault, so we check SMMUPTWEventInfo as it would
have the correct stage set from the page table walk.

Signed-off-by: Mostafa Saleh 
---
 hw/arm/smmuv3.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 36eb6f514a..6c18dc0acf 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -34,9 +34,10 @@
 #include "smmuv3-internal.h"
 #include "smmu-internal.h"
 
-#define PTW_RECORD_FAULT(cfg)   (((cfg)->stage == SMMU_STAGE_1) ? \
- (cfg)->record_faults : \
- (cfg)->s2cfg.record_faults)
+#define PTW_RECORD_FAULT(ptw_info, cfg) (((ptw_info).stage == SMMU_STAGE_1 && \
+(cfg)->record_faults) || \
+((ptw_info).stage == SMMU_STAGE_2 && \
+(cfg)->s2cfg.record_faults))
 
 /**
  * smmuv3_trigger_irq - pulse @irq if enabled and update
@@ -919,7 +920,7 @@ static SMMUTranslationStatus 
smmuv3_do_translate(SMMUv3State *s, hwaddr addr,
 event->u.f_walk_eabt.addr2 = ptw_info.addr;
 break;
 case SMMU_PTW_ERR_TRANSLATION:
-if (PTW_RECORD_FAULT(cfg)) {
+if (PTW_RECORD_FAULT(ptw_info, cfg)) {
 event->type = SMMU_EVT_F_TRANSLATION;
 event->u.f_translation.addr = addr;
 event->u.f_translation.addr2 = ptw_info.addr;
@@ -928,7 +929,7 @@ static SMMUTranslationStatus 
smmuv3_do_translate(SMMUv3State *s, hwaddr addr,
 }
 break;
 case SMMU_PTW_ERR_ADDR_SIZE:
-if (PTW_RECORD_FAULT(cfg)) {
+if (PTW_RECORD_FAULT(ptw_info, cfg)) {
 event->type = SMMU_EVT_F_ADDR_SIZE;
 event->u.f_addr_size.addr = addr;
 event->u.f_addr_size.addr2 = ptw_info.addr;
@@ -937,7 +938,7 @@ static SMMUTranslationStatus 
smmuv3_do_translate(SMMUv3State *s, hwaddr addr,
 }
 break;
 case SMMU_PTW_ERR_ACCESS:
-if (PTW_RECORD_FAULT(cfg)) {
+if (PTW_RECORD_FAULT(ptw_info, cfg)) {
 event->type = SMMU_EVT_F_ACCESS;
 event->u.f_access.addr = addr;
 event->u.f_access.addr2 = ptw_info.addr;
@@ -946,7 +947,7 @@ static SMMUTranslationStatus 
smmuv3_do_translate(SMMUv3State *s, hwaddr addr,
 }
 break;
 case SMMU_PTW_ERR_PERMISSION:
-if (PTW_RECORD_FAULT(cfg)) {
+if (PTW_RECORD_FAULT(ptw_info, cfg)) {
 event->type = SMMU_EVT_F_PERMISSION;
 event->u.f_permission.addr = addr;
 event->u.f_permission.addr2 = ptw_info.addr;
-- 
2.45.2.803.g4e1b14247a-goog




[PATCH v4 17/19] hw/arm/smmuv3: Support and advertise nesting

2024-07-01 Thread Mostafa Saleh
Everything is in place; consolidate parsing of the STE config and
setting of the translation stage.

Advertise nesting if stage requested is "nested".

Signed-off-by: Mostafa Saleh 
---
 hw/arm/smmuv3.c | 35 ++-
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 6c18dc0acf..807f26f2da 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -261,6 +261,9 @@ static void smmuv3_init_regs(SMMUv3State *s)
 /* Based on sys property, the stages supported in smmu will be 
advertised.*/
 if (s->stage && !strcmp("2", s->stage)) {
 s->idr[0] = FIELD_DP32(s->idr[0], IDR0, S2P, 1);
+} else if (s->stage && !strcmp("nested", s->stage)) {
+s->idr[0] = FIELD_DP32(s->idr[0], IDR0, S1P, 1);
+s->idr[0] = FIELD_DP32(s->idr[0], IDR0, S2P, 1);
 } else {
 s->idr[0] = FIELD_DP32(s->idr[0], IDR0, S1P, 1);
 }
@@ -425,8 +428,6 @@ static bool s2_pgtable_config_valid(uint8_t sl0, uint8_t 
t0sz, uint8_t gran)
 
 static int decode_ste_s2_cfg(SMMUTransCfg *cfg, STE *ste)
 {
-cfg->stage = SMMU_STAGE_2;
-
 if (STE_S2AA64(ste) == 0x0) {
 qemu_log_mask(LOG_UNIMP,
   "SMMUv3 AArch32 tables not supported\n");
@@ -509,6 +510,27 @@ bad_ste:
 return -EINVAL;
 }
 
+static void decode_ste_config(SMMUTransCfg *cfg, uint32_t config)
+{
+
+if (STE_CFG_ABORT(config)) {
+cfg->aborted = true;
+return;
+}
+if (STE_CFG_BYPASS(config)) {
+cfg->bypassed = true;
+return;
+}
+
+if (STE_CFG_S1_ENABLED(config)) {
+cfg->stage = SMMU_STAGE_1;
+}
+
+if (STE_CFG_S2_ENABLED(config)) {
+cfg->stage |= SMMU_STAGE_2;
+}
+}
+
 /* Returns < 0 in case of invalid STE, 0 otherwise */
 static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
   STE *ste, SMMUEventInfo *event)
@@ -525,13 +547,9 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
 
 config = STE_CONFIG(ste);
 
-if (STE_CFG_ABORT(config)) {
-cfg->aborted = true;
-return 0;
-}
+decode_ste_config(cfg, config);
 
-if (STE_CFG_BYPASS(config)) {
-cfg->bypassed = true;
+if (cfg->aborted || cfg->bypassed) {
 return 0;
 }
 
@@ -704,7 +722,6 @@ static int decode_cd(SMMUv3State *s, SMMUTransCfg *cfg,
 
 /* we support only those at the moment */
 cfg->aa64 = true;
-cfg->stage = SMMU_STAGE_1;
 
 cfg->oas = oas2bits(CD_IPS(cd));
 cfg->oas = MIN(oas2bits(SMMU_IDR5_OAS), cfg->oas);
-- 
2.45.2.803.g4e1b14247a-goog




[PATCH v4 18/19] hw/arm/smmuv3: Advertise S2FWB

2024-07-01 Thread Mostafa Saleh
QEMU doesn't support memory attributes, so FWB is a NOP; this
might change in the future if memory attributes become supported.

Signed-off-by: Mostafa Saleh 
---
 hw/arm/smmuv3.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 807f26f2da..88378e83dd 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -287,6 +287,14 @@ static void smmuv3_init_regs(SMMUv3State *s)
 if (FIELD_EX32(s->idr[0], IDR0, S2P)) {
 /* XNX is a stage-2-specific feature */
 s->idr[3] = FIELD_DP32(s->idr[3], IDR3, XNX, 1);
+if (FIELD_EX32(s->idr[0], IDR0, S1P)) {
+/*
+ * QEMU doesn't support memory attributes, so FWB is a NOP; this
+ * might change in the future if memory attributes become
+ * supported.
+ */
+   s->idr[3] = FIELD_DP32(s->idr[3], IDR3, FWB, 1);
+}
 }
 s->idr[3] = FIELD_DP32(s->idr[3], IDR3, RIL, 1);
 s->idr[3] = FIELD_DP32(s->idr[3], IDR3, BBML, 2);
-- 
2.45.2.803.g4e1b14247a-goog




Re: [PATCH] scripts/qcow2-to-stdout.py: Add script to write qcow2 images to stdout

2024-07-01 Thread Manos Pitsidianakis
Hi Berto :)

On Mon, 1 Jul 2024 at 11:56, Alberto Garcia  wrote:
>
> On Wed 12 Jun 2024 02:00:19 PM +03, Manos Pitsidianakis wrote:
>
> Hi, thanks for the review and sorry for taking so long to reply, I was
> on vacation.
>
> >> scripts/qcow2-to-stdout.py | 330 +
> >> 1 file changed, 330 insertions(+)
> >> create mode 100755 scripts/qcow2-to-stdout.py
> >
> > I recommend running the `black` formatter on this script, it makes the
> > code more diff-friendly and uniform. Also it has become the de-facto
> > python style.
>
> Hmmm, I don't like how it reformats some of the lines. However other
> changes do make sense, so I'll apply those.

It's not a project coding style requirement (for now) so it's fine.

>
> > Also, it's more pythonic to name constants in uppercase, like
> > allocated_l2_tables. You can check such lints with pylint
> > scripts/qcow2-to-stdout.py
>
> allocated_l2_tables is not a constant :-?

Eeeh right, correct. `pylint`'s error message said it was a constant,
my bad. It says it is a constant because it is declared as a global
(module-level), `__all__` is not defined with any globals, and
according to PEP-8 non-public globals start with an underscore in the
name.

>
> >>+struct.pack_into('>I', header, 0x70, 0x6803f857)
> >>+struct.pack_into('>I', header, 0x74, len(qcow2_features) * 48)
> >>+cur_offset = 0x78
> >
> > Minor comment: extract magic values/offsets into constant globals with
> > descriptive names, it'd help the code be more readable and easier to
> > maintain if ported in the future to other formats.
>
> Good idea, will do.
>
> >>+for (feature_type, feature_bit, feature_name) in qcow2_features:
> >>+struct.pack_into('>BB46s', header, cur_offset,
> >>+ feature_type, feature_bit, 
> >>feature_name.encode('ascii'))
> >>+cur_offset += 48
> >>+
> >
> >>From here onwards put everything under a main block like so:
>
> Ok.
>
> >>+# Command-line arguments
> >>+parser = argparse.ArgumentParser(description='This program converts a QEMU 
> >>disk image to qcow2 '
> >>+ 'and writes it to the standard output')
> >>+parser.add_argument('input_file', help='name of the input file')
> >
> > Suggestion:
> >
> > -parser.add_argument('input_file', help='name of the input file')
> > +parser.add_argument('input_file', help='name of the input file', 
> > type=pathlib.Path, required=True)
>
> 'required' is not valid in positional arguments,

Sorry did not notice it's a positional!

> and I'm not sure what
> benefits using pathlib brings in this case.

implicit type requirement, argument value validations, path normalization etc.
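A minimal sketch of what I mean (standalone, not the script under review):
with `type=pathlib.Path`, argparse hands back a Path object, so suffix
checks and path arithmetic work without manual string handling.

```python
import argparse
import pathlib

# Hypothetical argument setup: argparse converts the raw string to a
# pathlib.Path before it ever reaches the program logic.
parser = argparse.ArgumentParser()
parser.add_argument("input_file", type=pathlib.Path)

args = parser.parse_args(["some/dir/../disk.img"])
print(args.input_file.suffix)   # -> .img
```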

>
> >>+parser.add_argument('-v', dest='qcow2_version', metavar='qcow2_version',
> >
> > Maybe -q instead of -v? No strong feelings on this one, it's just that
> > -v is usually version. -q is also usually --quiet so not sure...
>
> Yeah, I thought the same but I didn't want to complicate this too much,
> this is just a helper script.
>
> >>+# If the input file is not in raw format we can use
> >>+# qemu-storage-daemon to make it appear as such
> >>+if input_format != 'raw':
> >>+temp_dir = tempfile.mkdtemp()
> >
> > Consider using the tempfile.TemporaryDirectory as with... context
> > manager so that the temp dir cleanup is performed automatically
>
> I don't think I can do that directly here because the temp dir has to
> live until the very end (qemu-storage-daemon needs it).
>
> >>+pid_file = temp_dir + "/pid"
> >>+raw_file = temp_dir + "/raw"
> >>+open(raw_file, 'wb').close()
> >
> > Consider using a with open(...) open manager for opening the file
>
> How would that be? Like this?
>
> with open(raw_file, 'wb'):
> pass
>
> If so I don't see the benefit, I just need to create an empty file and
> close it immediately.

My only argument here is that it's "more pythonic" which I know is of
little value and consequence :) Feel free to ignore! They were mere
suggestions.
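An aside: a third option that sidesteps the open()/close() vs.
context-manager question entirely is `pathlib.Path.touch()`, which
creates the empty file in a single call (sketch only):

```python
import os
import pathlib
import tempfile

# Create an empty file in one call, no explicit open/close needed.
tmp = tempfile.mkdtemp()
raw_file = os.path.join(tmp, "raw")
pathlib.Path(raw_file).touch()
```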

>
> >>+atexit.register(kill_storage_daemon, pid_file, raw_file, temp_dir)
> >
> > Hm, this too could be a context manager. Seems very C-like to use
> > atexit here.
>
> Yeah it is, but I think that using the context manager would require me
> to split the main function in two, and I'm not sure that it's worth it
> for this case. Other Python scripts in the QEMU repo use atexit already.
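If the split ever becomes acceptable, `contextlib.ExitStack` is a
possible middle ground: context-manager cleanup semantics without
nesting main() into with-blocks. A minimal sketch (not a patch
requirement, names invented):

```python
import contextlib
import os
import pathlib
import tempfile

def main():
    with contextlib.ExitStack() as stack:
        # TemporaryDirectory is cleaned up when the stack unwinds,
        # on any exit path (return, exception, ...).
        temp_dir = stack.enter_context(tempfile.TemporaryDirectory())
        raw_file = os.path.join(temp_dir, "raw")
        pathlib.Path(raw_file).touch()  # empty file for the FUSE export
        # ... long-running work would go here ...
        return os.path.exists(raw_file)
```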
>
> >>+ret = subprocess.run(["qemu-storage-daemon", "--daemonize", 
> >>"--pidfile", pid_file,
> >>+  "--blockdev", 
> >>f"driver=file,node-name=file0,driver=file,filename={input_file},read-only=on",
> >>+  "--blockdev", 
> >>f"driver={input_format},node-name=disk0,file=file0,read-only=on",
> >>+  "--export", 
> >>f"type=fuse,id=export0,node-name=disk0,mountpoint={raw_file},writable=off"])
> >
> > You can add shell=True, check=False arguments to subprocess.run() so
> > that it captures the outputs. (check

[PATCH v4 14/19] hw/arm/smmu: Support nesting in the rest of commands

2024-07-01 Thread Mostafa Saleh
Some commands need rework for nesting, as they used to assume S1
and S2 are mutually exclusive:

- CMD_TLBI_NH_ASID: Consider VMID if stage-2 is supported
- CMD_TLBI_NH_ALL: Consider VMID if stage-2 is supported; otherwise
  invalidate everything. This required a new VMID invalidation
  function for stage-1 only (ASID >= 0).

Also, rework trace events to reflect the new implementation.

Signed-off-by: Mostafa Saleh 
---
 hw/arm/smmu-common.c | 16 
 hw/arm/smmuv3.c  | 28 ++--
 hw/arm/trace-events  |  6 --
 include/hw/arm/smmu-common.h |  1 +
 4 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index d0309a95b2..faba4adc49 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -178,6 +178,16 @@ static gboolean smmu_hash_remove_by_vmid(gpointer key, 
gpointer value,
 return SMMU_IOTLB_VMID(*iotlb_key) == vmid;
 }
 
+static gboolean smmu_hash_remove_by_vmid_s1(gpointer key, gpointer value,
+gpointer user_data)
+{
+int vmid = *(int *)user_data;
+SMMUIOTLBKey *iotlb_key = (SMMUIOTLBKey *)key;
+
+return (SMMU_IOTLB_VMID(*iotlb_key) == vmid) &&
+   (SMMU_IOTLB_ASID(*iotlb_key) >= 0);
+}
+
 static gboolean smmu_hash_remove_by_asid_vmid_iova(gpointer key, gpointer 
value,
   gpointer user_data)
 {
@@ -288,6 +298,12 @@ void smmu_iotlb_inv_vmid(SMMUState *s, int vmid)
 g_hash_table_foreach_remove(s->iotlb, smmu_hash_remove_by_vmid, &vmid);
 }
 
+inline void smmu_iotlb_inv_vmid_s1(SMMUState *s, int vmid)
+{
+trace_smmu_iotlb_inv_vmid_s1(vmid);
+g_hash_table_foreach_remove(s->iotlb, smmu_hash_remove_by_vmid_s1, &vmid);
+}
+
 /* VMSAv8-64 Translation */
 
 /**
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 928f125523..e9007af3cd 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1321,25 +1321,49 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 case SMMU_CMD_TLBI_NH_ASID:
 {
 int asid = CMD_ASID(&cmd);
+int vmid = -1;
 
 if (!STAGE1_SUPPORTED(s)) {
 cmd_error = SMMU_CERROR_ILL;
 break;
 }
 
+/*
+ * VMID is only matched when stage 2 is supported, otherwise set it
+ * to -1 as the value used for stage-1 only VMIDs.
+ */
+if (STAGE2_SUPPORTED(s)) {
+vmid = CMD_VMID(&cmd);
+}
+
 trace_smmuv3_cmdq_tlbi_nh_asid(asid);
 smmu_inv_notifiers_all(&s->smmu_state);
-smmu_iotlb_inv_asid_vmid(bs, asid, -1);
+smmu_iotlb_inv_asid_vmid(bs, asid, vmid);
 break;
 }
 case SMMU_CMD_TLBI_NH_ALL:
+{
+int vmid = -1;
+
 if (!STAGE1_SUPPORTED(s)) {
 cmd_error = SMMU_CERROR_ILL;
 break;
 }
+
+/*
+ * If stage-2 is supported, invalidate for this VMID only, 
otherwise
+ * invalidate the whole thing.
+ */
+if (STAGE2_SUPPORTED(s)) {
+vmid = CMD_VMID(&cmd);
+trace_smmuv3_cmdq_tlbi_nh(vmid);
+smmu_iotlb_inv_vmid_s1(bs, vmid);
+break;
+}
 QEMU_FALLTHROUGH;
+}
 case SMMU_CMD_TLBI_NSNH_ALL:
-trace_smmuv3_cmdq_tlbi_nh();
+trace_smmuv3_cmdq_tlbi_nsnh();
 smmu_inv_notifiers_all(&s->smmu_state);
 smmu_iotlb_inv_all(bs);
 break;
diff --git a/hw/arm/trace-events b/hw/arm/trace-events
index 7d9c1703da..593cc571da 100644
--- a/hw/arm/trace-events
+++ b/hw/arm/trace-events
@@ -11,8 +11,9 @@ smmu_ptw_page_pte(int stage, int level,  uint64_t iova, 
uint64_t baseaddr, uint6
 smmu_ptw_block_pte(int stage, int level, uint64_t baseaddr, uint64_t pteaddr, 
uint64_t pte, uint64_t iova, uint64_t gpa, int bsize_mb) "stage=%d level=%d 
base@=0x%"PRIx64" pte@=0x%"PRIx64" pte=0x%"PRIx64" iova=0x%"PRIx64" block 
address = 0x%"PRIx64" block size = %d MiB"
 smmu_get_pte(uint64_t baseaddr, int index, uint64_t pteaddr, uint64_t pte) 
"baseaddr=0x%"PRIx64" index=0x%x, pteaddr=0x%"PRIx64", pte=0x%"PRIx64
 smmu_iotlb_inv_all(void) "IOTLB invalidate all"
-smmu_iotlb_inv_asid(int asid) "IOTLB invalidate asid=%d"
+smmu_iotlb_inv_asid_vmid(int asid, int vmid) "IOTLB invalidate asid=%d vmid=%d"
 smmu_iotlb_inv_vmid(int vmid) "IOTLB invalidate vmid=%d"
+smmu_iotlb_inv_vmid_s1(int vmid) "IOTLB invalidate vmid=%d"
 smmu_iotlb_inv_iova(int asid, uint64_t addr) "IOTLB invalidate asid=%d 
addr=0x%"PRIx64
 smmu_inv_notifiers_mr(const char *name) "iommu mr=%s"
 smmu_iotlb_lookup_hit(int asid, int vmid, uint64_t addr, uint32_t hit, 
uint32_t miss, uint32_t p) "IOTLB cache HIT asid=%d vmid=%d addr=0x%"PRIx64" 
hit=%d miss=%d hit rate=%d"
@@ -47,7 +48,8 @@ smmuv3_cmdq_cfgi_cd(uint32_t si

[PATCH v4 08/19] hw/arm/smmuv3: Translate CD and TT using stage-2 table

2024-07-01 Thread Mostafa Saleh
According to ARM SMMU architecture specification (ARM IHI 0070 F.b),
In "5.2 Stream Table Entry":
 [51:6] S1ContextPtr
 If Config[1] == 1 (stage 2 enabled), this pointer is an IPA translated by
 stage 2 and the programmed value must be within the range of the IAS.

In "5.4.1 CD notes":
 The translation table walks performed from TTB0 or TTB1 are always performed
 in IPA space if stage 2 translations are enabled.

This patch implements translation of the S1 context descriptor pointer and
TTBx base addresses through the S2 stage (IPA -> PA).

smmuv3_do_translate() is updated to take one extra argument, the
translation class. This is useful to:
 - Decide whether a translation is stage-2 only or should use the STE
   config.
 - Populate the class in case of faults; WALK_EABT is left unchanged,
   as it is always triggered from TT access, so there is no need to use
   the input class.

In case for stage-2 only translation, used in the context of nested
translation, the stage and asid are saved and restored before and
after calling smmu_translate().

Translating CD or TTBx can fail for the following reasons:
1) Large address size: This is described in
   (3.4.3 Address sizes of SMMU-originated accesses)
   - For a CD pointer larger than the IAS, for SMMUv3.1, this can
     trigger either C_BAD_STE or a Translation fault; we implement the
     latter as it requires no extra code.
   - For TTBx, if larger than the effective stage 1 output address size, it
 triggers C_BAD_CD.

2) Faults from PTWs (7.3 Event records)
   - F_ADDR_SIZE: large address size after first level causes stage 2 Address
 Size fault (Also in 3.4.3 Address sizes of SMMU-originated accesses)
   - F_PERMISSION: Same as an address translation. However, when
 CLASS == CD, the access is implicitly Data and a read.
   - F_ACCESS: Same as an address translation.
   - F_TRANSLATION: Same as an address translation.
   - F_WALK_EABT: Same as an address translation.
  These are already implemented in the PTW logic, so no extra handling
  required.

Reviewed-by: Eric Auger 
Signed-off-by: Mostafa Saleh 
---
 hw/arm/smmuv3.c | 91 +++--
 1 file changed, 80 insertions(+), 11 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 229b3c388c..86f95c1e40 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -337,14 +337,35 @@ static int smmu_get_ste(SMMUv3State *s, dma_addr_t addr, 
STE *buf,
 
 }
 
+static SMMUTranslationStatus smmuv3_do_translate(SMMUv3State *s, hwaddr addr,
+ SMMUTransCfg *cfg,
+ SMMUEventInfo *event,
+ IOMMUAccessFlags flag,
+ SMMUTLBEntry **out_entry,
+ SMMUTranslationClass class);
 /* @ssid > 0 not supported yet */
-static int smmu_get_cd(SMMUv3State *s, STE *ste, uint32_t ssid,
-   CD *buf, SMMUEventInfo *event)
+static int smmu_get_cd(SMMUv3State *s, STE *ste, SMMUTransCfg *cfg,
+   uint32_t ssid, CD *buf, SMMUEventInfo *event)
 {
 dma_addr_t addr = STE_CTXPTR(ste);
 int ret, i;
+SMMUTranslationStatus status;
+SMMUTLBEntry *entry;
 
 trace_smmuv3_get_cd(addr);
+
+if (cfg->stage == SMMU_NESTED) {
+status = smmuv3_do_translate(s, addr, cfg, event,
+ IOMMU_RO, &entry, SMMU_CLASS_CD);
+
+/* Same PTW faults are reported but with CLASS = CD. */
+if (status != SMMU_TRANS_SUCCESS) {
+return -EINVAL;
+}
+
+addr = CACHED_ENTRY_TO_ADDR(entry, addr);
+}
+
 /* TODO: guarantee 64-bit single-copy atomicity */
 ret = dma_memory_read(&address_space_memory, addr, buf, sizeof(*buf),
   MEMTXATTRS_UNSPECIFIED);
@@ -659,10 +680,13 @@ static int smmu_find_ste(SMMUv3State *s, uint32_t sid, 
STE *ste,
 return 0;
 }
 
-static int decode_cd(SMMUTransCfg *cfg, CD *cd, SMMUEventInfo *event)
+static int decode_cd(SMMUv3State *s, SMMUTransCfg *cfg,
+ CD *cd, SMMUEventInfo *event)
 {
 int ret = -EINVAL;
 int i;
+SMMUTranslationStatus status;
+SMMUTLBEntry *entry;
 
 if (!CD_VALID(cd) || !CD_AARCH64(cd)) {
 goto bad_cd;
@@ -713,9 +737,26 @@ static int decode_cd(SMMUTransCfg *cfg, CD *cd, 
SMMUEventInfo *event)
 
 tt->tsz = tsz;
 tt->ttb = CD_TTB(cd, i);
+
 if (tt->ttb & ~(MAKE_64BIT_MASK(0, cfg->oas))) {
 goto bad_cd;
 }
+
+/* Translate the TTBx, from IPA to PA if nesting is enabled. */
+if (cfg->stage == SMMU_NESTED) {
+status = smmuv3_do_translate(s, tt->ttb, cfg, event, IOMMU_RO,
+ &entry, SMMU_CLASS_TT);
+/*
+ * Same PTW faults are reported but with CLASS = TT.
+ * If TTBx is larger than the effective stage 1 output addres
+

Re: [PATCH] hw/misc/bcm2835_thermal: Handle invalid address accesses gracefully

2024-07-01 Thread Philippe Mathieu-Daudé

Hi Zheyu,

On 30/6/24 17:14, Zheyu Ma wrote:

This commit handles invalid address accesses gracefully in both read and write
functions. Instead of asserting and aborting, it logs an error message and 
returns
a neutral value for read operations and does nothing for write operations.

Error log:
ERROR:hw/misc/bcm2835_thermal.c:55:bcm2835_thermal_read: code should not be 
reached
Bail out! ERROR:hw/misc/bcm2835_thermal.c:55:bcm2835_thermal_read: code should 
not be reached
Aborted

Reproducer:
cat << EOF | qemu-system-aarch64 -display \
none -machine accel=qtest, -m 512M -machine raspi3b -m 1G -qtest stdio
readw 0x3f212003


Thanks for this very interesting bug report (and reproducer).


EOF

Signed-off-by: Zheyu Ma 
---
  hw/misc/bcm2835_thermal.c | 12 
  1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/hw/misc/bcm2835_thermal.c b/hw/misc/bcm2835_thermal.c
index ee7816b8a5..5c2a429d58 100644
--- a/hw/misc/bcm2835_thermal.c
+++ b/hw/misc/bcm2835_thermal.c
@@ -51,8 +51,10 @@ static uint64_t bcm2835_thermal_read(void *opaque, hwaddr 
addr, unsigned size)
  val = FIELD_DP32(bcm2835_thermal_temp2adc(25), STAT, VALID, true);
  break;
  default:
-/* MemoryRegionOps are aligned, so this can not happen. */
-g_assert_not_reached();


Like Xingtao Yao mentioned, I believe the current code is correct
and shouldn't be reached.

Why is it reached? You might have uncovered a core memory bug.

Likely around access_with_adjusted_size() in system/memory.c.

I'll keep investigating, but so far it reminds me a previous
patch from Andrew, but it isn't the fix:
https://patchwork.ozlabs.org/project/qemu-devel/patch/20170630030058.28943-1-and...@aj.id.au/


+qemu_log_mask(LOG_GUEST_ERROR,
+  "bcm2835_thermal_read: invalid address 0x%"
+  HWADDR_PRIx "\n", addr);
+val = 0;
  }
  return val;
  }





Re: [PATCH v2 10/14] hw: set deprecation info for all versioned machine types

2024-07-01 Thread Philippe Mathieu-Daudé

On 20/6/24 18:57, Daniel P. Berrangé wrote:

This calls the MACHINE_VER_DEPRECATION() macro in the definition of
all machine type classes which support versioning. This ensures
that they will automatically get deprecation info set when they
reach the appropriate point in their lifecycle.

Reviewed-by: Thomas Huth 
Signed-off-by: Daniel P. Berrangé 
---
  hw/arm/virt.c  | 1 +
  hw/m68k/virt.c | 1 +
  hw/ppc/spapr.c | 1 +
  hw/s390x/s390-virtio-ccw.c | 1 +
  include/hw/i386/pc.h   | 1 +
  5 files changed, 5 insertions(+)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 12/14] hw/ppc: remove obsolete manual deprecation reason string of spapr machines

2024-07-01 Thread Philippe Mathieu-Daudé

On 20/6/24 18:57, Daniel P. Berrangé wrote:

The automatic deprecation mechanism introduced in the preceding patches
will mark every spapr machine up to and including 2.12 as deprecated. As
such we can revert the manually added deprecation, which was a subset:

   commit 1392617d35765d5d912625fbb5cab1ffbed8e140
   Author: Cédric Le Goater 
   Date:   Tue Jan 23 16:37:02 2024 +1000

 spapr: Tag pseries-2.1 - 2.11 machines as deprecated

Reviewed-by: Thomas Huth 
Signed-off-by: Daniel P. Berrangé 
---
  hw/ppc/spapr.c | 1 -
  1 file changed, 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 13/14] hw/i386: remove obsolete manual deprecation reason string of i440fx machines

2024-07-01 Thread Philippe Mathieu-Daudé

On 20/6/24 18:57, Daniel P. Berrangé wrote:

The automatic deprecation mechanism introduced in the preceding patches
will mark every i440fx machine up to and including 2.12 as deprecated. As
such we can revert the manually added deprecation introduced in:

   commit 792b4fdd4eb8197bd6eb9e80a1dfaf0cb3b54aeb
   Author: Philippe Mathieu-Daudé 
   Date:   Wed Feb 28 10:34:35 2024 +0100

 hw/i386/pc: Deprecate 2.4 to 2.12 pc-i440fx machines

Reviewed-by: Thomas Huth 
Signed-off-by: Daniel P. Berrangé 
---
  hw/i386/pc_piix.c | 1 -
  1 file changed, 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH RFC V3 00/29] Support of Virtual CPU Hotplug for ARMv8 Arch

2024-07-01 Thread Miguel Luis
Hi Salil,

> On 13 Jun 2024, at 23:36, Salil Mehta  wrote:
> 
> PROLOGUE
> 
> 
> To assist in review and set the right expectations from this RFC, please first
> read the sections *APPENDED AT THE END* of this cover letter:
> 
> 1. Important *DISCLAIMER* [Section (X)]
> 2. Work presented at KVMForum Conference (slides available) [Section (V)F]
> 3. Organization of patches [Section (XI)]
> 4. References [Section (XII)]
> 5. Detailed TODO list of leftover work or work-in-progress [Section (IX)]
> 
> There has been interest shown by other organizations in adapting this series
> for their architecture. Hence, RFC V2 [21] has been split into architecture
> *agnostic* [22] and *specific* patch sets.
> 
> This is an ARM architecture-specific patch set carved out of RFC V2. Please
> check section (XI)B for details of architecture agnostic patches.
> 
> SECTIONS [I - XIII] are as follows:
> 
> (I) Key Changes [details in last section (XIV)]
> ==
> 
> RFC V2 -> RFC V3
> 
> 1. Split into Architecture *agnostic* (V13) [22] and *specific* (RFC V3) 
> patch sets.
> 2. Addressed comments by Gavin Shan (RedHat), Shaoqin Huang (RedHat), 
> Philippe Mathieu-Daudé (Linaro),
>   Jonathan Cameron (Huawei), Zhao Liu (Intel).
> 

I’ve tested this series along with v10 kernel patches from [1] on the following 
items:

Boot.
Hotplug up to maxcpus.
Hot unplug down to the number of boot cpus.
Hotplug vcpus then migrate to a new VM.
Hot unplug down to the number of boot cpus then migrate to a new VM.
Up to 6 successive live migrations.

All of which LGTM.

Please feel free to add,
Tested-by: Miguel Luis 

Regards,
Miguel

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/vcpu-hotplug

> RFC V1 -> RFC V2
> 
> RFC V1: 
> https://lore.kernel.org/qemu-devel/20200613213629.21984-1-salil.me...@huawei.com/
> 
> 1. ACPI MADT Table GIC CPU Interface can now be presented [6] as ACPI
>   *online-capable* or *enabled* to the Guest OS at boot time. This means
>   associated CPUs can have ACPI _STA as *enabled* or *disabled* even after 
> boot.
>   See UEFI ACPI 6.5 Spec, Section 05, Table 5.37 GICC CPU Interface Flags[20].
> 2. SMCC/HVC Hypercall exit handling in userspace/Qemu for PSCI CPU_{ON,OFF}
>   request. This is required to {dis}allow online'ing a vCPU.
> 3. Always presenting unplugged vCPUs in CPUs ACPI AML code as ACPI 
> _STA.PRESENT 
>   to the Guest OS. Toggling ACPI _STA.Enabled to give an effect of the
>   hot{un}plug.
> 4. Live Migration works (some issues are still there).
> 5. TCG/HVF/qtest does not support Hotplug and falls back to default.
> 6. Code for TCG support exists in this release (it is a work-in-progress).
> 7. ACPI _OSC method can now be used by OSPM to negotiate Qemu VM platform
>   hotplug capability (_OSC Query support still pending).
> 8. Misc. Bug fixes.
> 
> (II) Summary
> 
> 
> This patch set introduces virtual CPU hotplug support for the ARMv8 
> architecture
> in QEMU. The idea is to be able to hotplug and hot-unplug vCPUs while the 
> guest VM
> is running, without requiring a reboot. This does *not* make any assumptions 
> about
> the physical CPU hotplug availability within the host system but rather tries 
> to
> solve the problem at the virtualizer/QEMU layer. It introduces ACPI CPU 
> hotplug hooks
> and event handling to interface with the guest kernel, and code to 
> initialize, plug,
> and unplug CPUs. No changes are required within the host kernel/KVM except the
> support of hypercall exit handling in the user-space/Qemu, which has recently
> been added to the kernel. Corresponding guest kernel changes have been
> posted on the mailing list [3] [4] by James Morse.
> 
> (III) Motivation
> 
> 
> This allows scaling the guest VM compute capacity on-demand, which would be
> useful for the following example scenarios:
> 
> 1. Vertical Pod Autoscaling [9][10] in the cloud: Part of the orchestration
>   framework that could adjust resource requests (CPU and Mem requests) for
>   the containers in a pod, based on usage.
> 2. Pay-as-you-grow Business Model: Infrastructure providers could allocate and
>   restrict the total number of compute resources available to the guest VM
>   according to the SLA (Service Level Agreement). VM owners could request more
>   compute to be hot-plugged for some cost.
> 
> For example, Kata Container VM starts with a minimum amount of resources 
> (i.e.,
> hotplug everything approach). Why?
> 
> 1. Allowing faster *boot time* and
> 2. Reduction in *memory footprint*
> 
> Kata Container VM can boot with just 1 vCPU, and then later more vCPUs can be
> hot-plugged as needed.
> 
> (IV) Terminology
> 
> 
> (*) Possible CPUs: Total vCPUs that could ever exist in the VM. This includes
>   any cold-booted CPUs plus any CPUs that could be later
>   hot-plugged.
>   - Qemu parameter (-smp maxcpus=N)
> (*) Pres

[PATCH 1/2] graph-lock: make sure reader_count access is atomic

2024-07-01 Thread Wolfgang Bumiller
There's one case where `reader_count` is accessed non-atomically. This
was likely seen as being "guarded by a mutex" held in that block, but
other access to this does not actually depend on the mutex and already
uses atomics.

Additionally this replaces the pattern of atomic_set(atomic_read() +
1) with qatomic_inc() (and -1 with _dec)

Signed-off-by: Wolfgang Bumiller 
---
 block/graph-lock.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/block/graph-lock.c b/block/graph-lock.c
index c81162b147..32fb29b841 100644
--- a/block/graph-lock.c
+++ b/block/graph-lock.c
@@ -176,8 +176,7 @@ void coroutine_fn bdrv_graph_co_rdlock(void)
 bdrv_graph = qemu_get_current_aio_context()->bdrv_graph;
 
 for (;;) {
-qatomic_set(&bdrv_graph->reader_count,
-bdrv_graph->reader_count + 1);
+qatomic_inc(&bdrv_graph->reader_count);
 /* make sure writer sees reader_count before we check has_writer */
 smp_mb();
 
@@ -226,7 +225,7 @@ void coroutine_fn bdrv_graph_co_rdlock(void)
 }
 
 /* slow path where reader sleeps */
-bdrv_graph->reader_count--;
+qatomic_dec(&bdrv_graph->reader_count);
 aio_wait_kick();
 qemu_co_queue_wait(&reader_queue, &aio_context_list_lock);
 }
@@ -238,8 +237,7 @@ void coroutine_fn bdrv_graph_co_rdunlock(void)
 BdrvGraphRWlock *bdrv_graph;
 bdrv_graph = qemu_get_current_aio_context()->bdrv_graph;
 
-qatomic_store_release(&bdrv_graph->reader_count,
-  bdrv_graph->reader_count - 1);
+qatomic_dec(&bdrv_graph->reader_count);
 /* make sure writer sees reader_count before we check has_writer */
 smp_mb();
 
-- 
2.39.2





[PATCH 0/2] change some odd-looking atomic uses

2024-07-01 Thread Wolfgang Bumiller
I spotted the weird-looking pattern of:
atomic_set(atomic_load()  N)
in a few places and one variable in the graph-lock code which was used with
atomics except for a single case, which also seemed suspicious.

I'm not sure if there are any known compiler-optimizations or ordering
semantics already ensuring that these operations are indeed working correctly
atomically, so I thought I'd point them out and ask about it by sending
patches.

In patch 2 the ordering is changed (see the note in its mail)

Wolfgang Bumiller (2):
  graph-lock: make sure reader_count access is atomic
  atomics: replace fetch-use-store with direct atomic operations

 block/graph-lock.c | 8 +++-
 util/aio-posix.c   | 3 +--
 util/aio-win32.c   | 3 +--
 util/async.c   | 2 +-
 4 files changed, 6 insertions(+), 10 deletions(-)

-- 
2.39.2





[PATCH 2/2] atomics: replace fetch-use-store with direct atomic operations

2024-07-01 Thread Wolfgang Bumiller
Replaces the `atomic_store(atomic_load()  something)` pattern
with its direct atomic function.

Signed-off-by: Wolfgang Bumiller 
---
Note: these previously used RELEASE ordering for the store and `relaxed`
ordering for the reads, while the replacement uses SEQ_CST, as there are no
other wrappers around yet.  Should we add `qatomic_fetch_{sub,and}_release`
variants?

 util/aio-posix.c | 3 +--
 util/aio-win32.c | 3 +--
 util/async.c | 2 +-
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 266c9dd35f..9cf7fed8fc 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -672,8 +672,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
 if (use_notify_me) {
 /* Finish the poll before clearing the flag.  */
-qatomic_store_release(&ctx->notify_me,
- qatomic_read(&ctx->notify_me) - 2);
+qatomic_fetch_sub(&ctx->notify_me, 2);
 }
 
 aio_notify_accept(ctx);
diff --git a/util/aio-win32.c b/util/aio-win32.c
index d144f9391f..ff6d1ebf97 100644
--- a/util/aio-win32.c
+++ b/util/aio-win32.c
@@ -387,8 +387,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
 ret = WaitForMultipleObjects(count, events, FALSE, timeout);
 if (blocking) {
 assert(first);
-qatomic_store_release(&ctx->notify_me,
-  qatomic_read(&ctx->notify_me) - 2);
+qatomic_fetch_sub(&ctx->notify_me, 2);
 aio_notify_accept(ctx);
 }
 
diff --git a/util/async.c b/util/async.c
index 0467890052..d17deeceea 100644
--- a/util/async.c
+++ b/util/async.c
@@ -330,7 +330,7 @@ aio_ctx_check(GSource *source)
 BHListSlice *s;
 
 /* Finish computing the timeout before clearing the flag.  */
-qatomic_store_release(&ctx->notify_me, qatomic_read(&ctx->notify_me) & ~1);
+qatomic_fetch_and(&ctx->notify_me, ~1);
 aio_notify_accept(ctx);
 
 QSLIST_FOREACH_RCU(bh, &ctx->bh_list, next) {
-- 
2.39.2





Re: [PATCH 0/4] Drop ifdef for macOS versions older than 12.0

2024-07-01 Thread Peter Maydell
On Sat, 29 Jun 2024 at 07:26, Akihiko Odaki  wrote:
>
> macOS versions older than 12.0 are no longer supported.
>
> docs/about/build-platforms.rst says:
> > Support for the previous major version will be dropped 2 years after
> > the new major version is released or when the vendor itself drops
> > support, whichever comes first.
>
> macOS 12.0 was released 2021:
> https://www.apple.com/newsroom/2021/10/macos-monterey-is-now-available/


Further, we have already dropped support for macos < 12
in the ui/cocoa.m code in commit 2d27c91e2b72ac.

For the whole series:
Reviewed-by: Peter Maydell 

> Signed-off-by: Akihiko Odaki 

PS: you don't need to put a signed-off-by line on the cover
letter, only in the individual patches.

thanks
-- PMM



Re: [PATCH] tests/qtest: Fix STM32L4x5 SYSCFG irq line 15 state assumption

2024-07-01 Thread Peter Maydell
On Sat, 29 Jun 2024 at 11:46, Inès Varhol  wrote:
>
> The QTest `test_irq_pin_multiplexer` makes the assumption that the
> reset state of irq line 15 is low, which is false since STM32L4x5 GPIO
> was implemented (the reset state of pin GPIOA15 is high because there's
> pull-up and it results in the irq line 15 also being high at reset).
>
> It wasn't triggering an error because `test_interrupt` was mistakenly
> "resetting" the line low.
>
> This commit corrects these two mistakes by :
> - not setting the line low in `test_interrupt`
> - using an irq line in `test_irq_pin_multiplexer` which is low at reset
>
> Signed-off-by: Inès Varhol 
> ---



Applied to target-arm.next, thanks.

-- PMM



Re: [PATCH 0/2] hw/misc: Fix STM32L4x5 EXTI interrupts

2024-07-01 Thread Peter Maydell
On Sat, 29 Jun 2024 at 12:08, Inès Varhol  wrote:
>
> STM32L4x5 EXTI was incorrectly expecting alternating interrupts.
> This patch adds a new field to track IRQ levels to actually
> *detect* edges.
> It also corrects existing QTests which were modifying the IRQ lines'
> levels.
>
> Signed-off-by: Inès Varhol 
>



Applied to target-arm.next, thanks.

-- PMM



Re: [PATCH 2/3] target/arm: Always add pmu property

2024-07-01 Thread Peter Maydell
On Sat, 29 Jun 2024 at 13:51, Akihiko Odaki  wrote:
>
> kvm-steal-time and sve properties are added for KVM even if the
> corresponding features are not available. Always add pmu property too.
>
> Signed-off-by: Akihiko Odaki 
> ---
>  target/arm/cpu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 35fa281f1b98..0da72c12a5bd 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -1770,9 +1770,10 @@ void arm_cpu_post_init(Object *obj)
>
>  if (arm_feature(&cpu->env, ARM_FEATURE_PMU)) {
>  cpu->has_pmu = true;
> -object_property_add_bool(obj, "pmu", arm_get_pmu, arm_set_pmu);
>  }
>
> +object_property_add_bool(obj, "pmu", arm_get_pmu, arm_set_pmu);

This will allow the user to set the ARM_FEATURE_PMU feature
bit on TCG CPUs where that doesn't make sense. If we want to
make the property visible on all CPUs, we need to make it
be an error to set it when it's not valid to set it (probably
by adding some TCG/hvf equivalent to the "raise an error
in arm_set_pmu()" code branch we already have for KVM).

thanks
-- PMM



Re: [PATCH v3 04/15] hw/core/machine: Add igvm-cfg object and processing for IGVM files

2024-07-01 Thread Roy Hopkins
On Fri, 2024-06-28 at 12:23 +0100, Daniel P. Berrangé wrote:
> On Fri, Jun 28, 2024 at 12:09:59PM +0100, Roy Hopkins wrote:
> > On Mon, 2024-06-24 at 15:00 +0100, Daniel P. Berrangé wrote:
> > > On Fri, Jun 21, 2024 at 03:29:07PM +0100, Roy Hopkins wrote:
> > > > An IGVM file contains configuration of guest state that should be
> > > > applied during configuration of the guest, before the guest is started.
> > > > 
> > > > This patch allows the user to add an igvm-cfg object to the machine
> > > > configuration that allows an IGVM file to be configured that will be
> > > > applied to the guest before it is started.
> > > > 
> > > > If an IGVM configuration is provided then the IGVM file is processed at
> > > > the end of the board initialization, before the state transition to
> > > > PHASE_MACHINE_INITIALIZED.
> > > > 
> > > > Signed-off-by: Roy Hopkins 
> > > > ---
> > > >  include/hw/boards.h |  2 ++
> > > >  hw/core/machine.c   | 20 
> > > >  qemu-options.hx | 25 +
> > > >  3 files changed, 47 insertions(+)
> 
> snip
> 
> > > This adds igvm-cfg for all machines, regardless of architecture target.
> > > 
> > > Are igvm files fully cross-platform portable, or should we just put
> > > this into the TYPE_X86_MACHINE base class to limit it ?
> > > 
> > > It at least reports errors if I try to load an IGVM file with
> > > qemu-system-aarch64 + virt type
> > > 
> > > $ ./build/qemu-system-aarch64 -object igvm-cfg,file=../buildigvm/ovmf-
> > > sev.igvm,id=igvm -machine virt,igvm-cfg=igvm
> > > qemu-system-aarch64: IGVM file does not describe a compatible supported
> > > platform
> > > 
> > > so that's good.
> > 
> > The IGVM specification is designed to support non X86 platforms, hence its
> > inclusion for all machines. Support for non-X86 is likely to result in
> > changes
> > to the specification though that will impact the library we depend on.
> > 
> > There would obviously need to be some further implementation to support non-
> > X86
> > machines in QEMU, in the same way that further implementation is required to
> > support other X86 confidential computing platforms such as TDX.
> > 
> > So, this poses the question: should we move it to TYPE_X86_MACHINE as the
> > current supported platforms are all on X86? Or should we leave it where it
> > is
> > with a view to adding non X86 platform support with less impact later? I'd
> > appreciate your views on this.
> 
> The pro of putting it in the base machine class is that we don't need to
> repeat the property creation in each machine as we broaden support to other
> arches, I presume aarch64 is probably most likely future candidate.
> 
> The downside of putting it in the base machine class is that it limits
> mgmt app ability to probe QEMU for support. ie it wouldn't be possible
> to probe whether individual machines support it or not, as we broaden
> QEMU support.
> 
> Then again, we will already face that limited ability to probe even on
> x86, as we won't be able to query whether IGVM is SNP only, or has been
> extended to TDX too.
> 
> With my libvirt hat on, probing is still probably the more important
> factor, so would push towards putting the property just to individual
> machine classes that definitely have been wired up to be able to use
> it.
> 
> With regards,
> Daniel

Ok, thanks. I'll move the instance to 'X86MachineState' and the call to 
'igvm->process()' into 'pc_q35_init()' and 'pc_init1()'.

Regards,
Roy




Re: [PATCH 0/4] Drop ifdef for macOS versions older than 12.0

2024-07-01 Thread Akihiko Odaki

On 2024/07/01 20:43, Peter Maydell wrote:

On Sat, 29 Jun 2024 at 07:26, Akihiko Odaki  wrote:


macOS versions older than 12.0 are no longer supported.

docs/about/build-platforms.rst says:

Support for the previous major version will be dropped 2 years after
the new major version is released or when the vendor itself drops
support, whichever comes first.


macOS 12.0 was released 2021:
https://www.apple.com/newsroom/2021/10/macos-monterey-is-now-available/



Further, we have already dropped support for macos < 12
in the ui/cocoa.m code in commit 2d27c91e2b72ac.

For the whole series:
Reviewed-by: Peter Maydell 


Signed-off-by: Akihiko Odaki 


PS: you don't need to put a signed-off-by line on the cover
letter, only in the individual patches.


I have been using b4 (https://b4.docs.kernel.org/en/latest/) these days 
and it automatically appends Signed-off-by to the cover letter.


Regards,
Akihiko Odaki



Re: [PATCH v4 01/14] tests/tcg/minilib: Constify digits in print_num

2024-07-01 Thread Philippe Mathieu-Daudé

On 30/6/24 21:00, Richard Henderson wrote:

This avoids a memcpy to the stack when compiled with clang.
Since we don't enable optimization, nor provide memcpy,
this results in an undefined symbol error at link time.

Signed-off-by: Richard Henderson 
---
  tests/tcg/minilib/printf.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé 





Re: [PATCH 2/3] target/arm: Always add pmu property

2024-07-01 Thread Akihiko Odaki

On 2024/07/01 20:54, Peter Maydell wrote:

On Sat, 29 Jun 2024 at 13:51, Akihiko Odaki  wrote:


kvm-steal-time and sve properties are added for KVM even if the
corresponding features are not available. Always add pmu property too.

Signed-off-by: Akihiko Odaki 
---
  target/arm/cpu.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 35fa281f1b98..0da72c12a5bd 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1770,9 +1770,10 @@ void arm_cpu_post_init(Object *obj)

  if (arm_feature(&cpu->env, ARM_FEATURE_PMU)) {
  cpu->has_pmu = true;
-object_property_add_bool(obj, "pmu", arm_get_pmu, arm_set_pmu);
  }

+object_property_add_bool(obj, "pmu", arm_get_pmu, arm_set_pmu);


This will allow the user to set the ARM_FEATURE_PMU feature
bit on TCG CPUs where that doesn't make sense. If we want to
make the property visible on all CPUs, we need to make it
be an error to set it when it's not valid to set it (probably
by adding some TCG/hvf equivalent to the "raise an error
in arm_set_pmu()" code branch we already have for KVM).


Doesn't TCG support PMU though?
Certainly hvf needs some care on the other hand.

Regards,
Akihiko Odaki



Re: [PATCH] hw/char/pl011: ensure UARTIBRD register is 16-bit

2024-07-01 Thread Peter Maydell
On Sun, 30 Jun 2024 at 18:00, Zheyu Ma  wrote:
>
> The PL011 TRM says that "The 16-bit integer is written to the Integer Baud 
> Rate
> Register, UARTIBRD". Updated the handling of the UARTIBRD register to ensure
> only 16-bit values are written to it.

Thanks for this patch.

I think we could improve the patch commit message:
Have the subject state the problem we're fixing rather than
the solution:
 hw/char/pl011: Avoid division-by-zero in pl011_get_baudrate()

and then have the commit message say why this happens
and why we make the change we do:

 In pl011_get_baudrate(), when we calculate the baudrate
 we can accidentally divide by zero. This happens because
 although (as the specification requires) we treat UARTIBRD = 0
 as invalid, we aren't correctly limiting UARTIBRD and UARTFBRD
>  values to the 16-bit and 6-bit ranges the hardware allows,
 and so some non-zero values of UARTIBRD can result in
 a zero divisor.

 Enforce the correct register field widths on guest writes
 and on inbound migration to avoid the division by zero.

(I mention here a few things which I'm about to comment on below.)

> ASAN log:
> ==2973125==ERROR: AddressSanitizer: FPE on unknown address 0x55f72629b348 (pc 
> 0x55f72629b348 bp 0x7fffa24d0e00 sp 0x7fffa24d0d60 T0)
> #0 0x55f72629b348 in pl011_get_baudrate hw/char/pl011.c:255:17
> #1 0x55f726298d94 in pl011_trace_baudrate_change hw/char/pl011.c:260:33
> #2 0x55f726296fc8 in pl011_write hw/char/pl011.c:378:9
>
> Reproducer:
> cat << EOF | qemu-system-aarch64 -display \
> none -machine accel=qtest, -m 512M -machine realview-pb-a8 -qtest stdio
> writeq 0x1000b024 0xf800
> EOF
>
> Signed-off-by: Zheyu Ma 
> ---
>  hw/char/pl011.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/char/pl011.c b/hw/char/pl011.c
> index 8753b84a84..f962786e2a 100644
> --- a/hw/char/pl011.c
> +++ b/hw/char/pl011.c
> @@ -374,7 +374,7 @@ static void pl011_write(void *opaque, hwaddr offset,
>  s->ilpr = value;
>  break;
>  case 9: /* UARTIBRD */
> -s->ibrd = value;
> +s->ibrd = value & 0xFFFF;

This is necessary but not sufficient:
 * we also need to mask the write to s->fbrd (which is 6 bits);
   otherwise you can arrange a combination of s->ibrd
   and s->fbrd such that the addition in the divisor
   exactly overflows to 0
 * we should mask these also in pl011_post_load(), to prevent
   the division-by-zero in the "malicious inbound migration
   state" case.

Also, this source file (like most in QEMU) uses lower case
in hex numbers, i.e. 0xffff.

>  pl011_trace_baudrate_change(s);
>  break;
>  case 10: /* UARTFBRD */
> --
> 2.34.1

thanks
-- PMM



Re: [PATCH v5 0/3] vhost-user-blk: live resize additional APIs

2024-07-01 Thread Raphael Norwitz
I have no issues with these APIs, but I'm not a QMP expert so others
should review those bits.

For the vhost-user-blk code:

Acked-by: Raphael Norwitz 

On Tue, Jun 25, 2024 at 8:19 AM Vladimir Sementsov-Ogievskiy
 wrote:
>
> v5:
> 03: drop extra check on is is runstate running
>
>
> Vladimir Sementsov-Ogievskiy (3):
>   qdev-monitor: add option to report GenericError from find_device_state
>   vhost-user-blk: split vhost_user_blk_sync_config()
>   qapi: introduce device-sync-config
>
>  hw/block/vhost-user-blk.c | 27 ++--
>  hw/virtio/virtio-pci.c|  9 +++
>  include/hw/qdev-core.h|  3 +++
>  qapi/qdev.json| 24 ++
>  system/qdev-monitor.c | 53 ---
>  5 files changed, 105 insertions(+), 11 deletions(-)
>
> --
> 2.34.1
>



Re: [PATCH] hw/usb: Fix memory leak in musb_reset()

2024-07-01 Thread Peter Maydell
On Sun, 30 Jun 2024 at 17:33, Zheyu Ma  wrote:
>
> The musb_reset function was causing a memory leak by not properly freeing
> the memory associated with USBPacket instances before reinitializing them.
> This commit addresses the memory leak by adding calls to usb_packet_cleanup
> for each USBPacket instance before reinitializing them with usb_packet_init.
>
> Asan log:
>
==2970623==ERROR: LeakSanitizer: detected memory leaks
> Direct leak of 256 byte(s) in 16 object(s) allocated from:
> #0 0x561e20629c3d in malloc 
> llvm/compiler-rt/lib/asan/asan_malloc_linux.cpp:129:3
> #1 0x7fee91885738 in g_malloc 
> (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x5e738)
> #2 0x561e21b4d0e1 in usb_packet_init hw/usb/core.c:531:5
> #3 0x561e21c5016b in musb_reset hw/usb/hcd-musb.c:372:9
> #4 0x561e21c502a9 in musb_init hw/usb/hcd-musb.c:385:5
> #5 0x561e21c893ef in tusb6010_realize hw/usb/tusb6010.c:827:15
> #6 0x561e23443355 in device_set_realized hw/core/qdev.c:510:13
> #7 0x561e2346ac1b in property_set_bool qom/object.c:2354:5
> #8 0x561e23463895 in object_property_set qom/object.c:1463:5
> #9 0x561e23477909 in object_property_set_qobject qom/qom-qobject.c:28:10
> #10 0x561e234645ed in object_property_set_bool qom/object.c:1533:15
> #11 0x561e2343c830 in qdev_realize hw/core/qdev.c:291:12
> #12 0x561e2343c874 in qdev_realize_and_unref hw/core/qdev.c:298:11
> #13 0x561e20ad5091 in sysbus_realize_and_unref hw/core/sysbus.c:261:12
> #14 0x561e22553283 in n8x0_usb_setup hw/arm/nseries.c:800:5
> #15 0x561e2254e99b in n8x0_init hw/arm/nseries.c:1356:5
> #16 0x561e22561170 in n810_init hw/arm/nseries.c:1418:5
>
> Signed-off-by: Zheyu Ma 
> ---
>  hw/usb/hcd-musb.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/hw/usb/hcd-musb.c b/hw/usb/hcd-musb.c
> index 6dca373cb1..0300aeaec6 100644
> --- a/hw/usb/hcd-musb.c
> +++ b/hw/usb/hcd-musb.c
> @@ -368,6 +368,8 @@ void musb_reset(MUSBState *s)
>  s->ep[i].maxp[1] = 0x40;
>  s->ep[i].musb = s;
>  s->ep[i].epnum = i;
> +usb_packet_cleanup(&s->ep[i].packey[0].p);
> +usb_packet_cleanup(&s->ep[i].packey[1].p);
>  usb_packet_init(&s->ep[i].packey[0].p);
>  usb_packet_init(&s->ep[i].packey[1].p);
>  }

I don't think it's valid to call usb_packet_cleanup() on
a USBPacket that's never been inited, which is what will
happen on the first reset with this patch.

Looking at how hcd-ehci.c handles its s->ipacket (which has
the same "allocated at device creation and reused for the
whole lifetime of the device" semantics) I think what we
want is:
 * at device init, call usb_packet_init()
 * at device reset, do nothing
 * at device finalize, call usb_packet_cleanup()

(There is currently no hcd-musb function for finalize
because the only uses of it are for devices which continue
to exist for the whole lifetime of the simulation. So we
can ignore the last part of that.)

PS: the tusb6010 and hcd-musb are both used only by deprecated
board types which are scheduled to be deleted by the end
of the year (basically after we get the 9.1 release out,
and before 9.2). You might prefer to target your fuzzing
efforts to board types which aren't deprecated.

thanks
-- PMM



[PATCH v2] hw/nvme: add cross namespace copy support

2024-07-01 Thread Arun Kumar
Extend copy command to copy user data across different namespaces via
support for specifying a namespace for each source range

Signed-off-by: Arun Kumar 
---

Notes:
v1->v2: updated commit message

 hw/nvme/ctrl.c   | 355 ---
 include/block/nvme.h |  37 +++--
 2 files changed, 289 insertions(+), 103 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 127c3d2383..c181caae38 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -2696,6 +2696,7 @@ typedef struct NvmeCopyAIOCB {
 BlockAIOCB common;
 BlockAIOCB *aiocb;
 NvmeRequest *req;
+NvmeCtrl *n;
 int ret;
 
 void *ranges;
@@ -2714,6 +2715,8 @@ typedef struct NvmeCopyAIOCB {
 uint64_t slba;
 
 NvmeZone *zone;
+NvmeNamespace *sns;
+uint32_t tcl;
 } NvmeCopyAIOCB;
 
 static void nvme_copy_cancel(BlockAIOCB *aiocb)
@@ -2760,13 +2763,19 @@ static void nvme_copy_done(NvmeCopyAIOCB *iocb)
 
 static void nvme_do_copy(NvmeCopyAIOCB *iocb);
 
-static void nvme_copy_source_range_parse_format0(void *ranges, int idx,
- uint64_t *slba, uint32_t *nlb,
- uint16_t *apptag,
- uint16_t *appmask,
- uint64_t *reftag)
+static void nvme_copy_source_range_parse_format0_2(void *ranges,
+   int idx, uint64_t *slba,
+   uint32_t *nlb,
+   uint32_t *snsid,
+   uint16_t *apptag,
+   uint16_t *appmask,
+   uint64_t *reftag)
 {
-NvmeCopySourceRangeFormat0 *_ranges = ranges;
+NvmeCopySourceRangeFormat0_2 *_ranges = ranges;
+
+if (snsid) {
+*snsid = le32_to_cpu(_ranges[idx].sparams);
+}
 
 if (slba) {
 *slba = le64_to_cpu(_ranges[idx].slba);
@@ -2789,13 +2798,19 @@ static void nvme_copy_source_range_parse_format0(void 
*ranges, int idx,
 }
 }
 
-static void nvme_copy_source_range_parse_format1(void *ranges, int idx,
- uint64_t *slba, uint32_t *nlb,
- uint16_t *apptag,
- uint16_t *appmask,
- uint64_t *reftag)
+static void nvme_copy_source_range_parse_format1_3(void *ranges, int idx,
+   uint64_t *slba,
+   uint32_t *nlb,
+   uint32_t *snsid,
+   uint16_t *apptag,
+   uint16_t *appmask,
+   uint64_t *reftag)
 {
-NvmeCopySourceRangeFormat1 *_ranges = ranges;
+NvmeCopySourceRangeFormat1_3 *_ranges = ranges;
+
+if (snsid) {
+*snsid = le32_to_cpu(_ranges[idx].sparams);
+}
 
 if (slba) {
 *slba = le64_to_cpu(_ranges[idx].slba);
@@ -2827,18 +2842,20 @@ static void nvme_copy_source_range_parse_format1(void 
*ranges, int idx,
 
 static void nvme_copy_source_range_parse(void *ranges, int idx, uint8_t format,
  uint64_t *slba, uint32_t *nlb,
- uint16_t *apptag, uint16_t *appmask,
- uint64_t *reftag)
+ uint32_t *snsid, uint16_t *apptag,
+ uint16_t *appmask, uint64_t *reftag)
 {
 switch (format) {
 case NVME_COPY_FORMAT_0:
-nvme_copy_source_range_parse_format0(ranges, idx, slba, nlb, apptag,
- appmask, reftag);
+case NVME_COPY_FORMAT_2:
+nvme_copy_source_range_parse_format0_2(ranges, idx, slba, nlb, snsid,
+   apptag, appmask, reftag);
 break;
 
 case NVME_COPY_FORMAT_1:
-nvme_copy_source_range_parse_format1(ranges, idx, slba, nlb, apptag,
- appmask, reftag);
+case NVME_COPY_FORMAT_3:
+nvme_copy_source_range_parse_format1_3(ranges, idx, slba, nlb, snsid,
+   apptag, appmask, reftag);
 break;
 
 default:
@@ -2854,10 +2871,10 @@ static inline uint16_t 
nvme_check_copy_mcl(NvmeNamespace *ns,
 for (int idx = 0; idx < nr; idx++) {
 uint32_t nlb;
 nvme_copy_source_range_parse(iocb->ranges, idx, iocb->format, NULL,
- &nlb, NULL, NULL, NULL);
+ &nlb, NULL, NULL, NUL

Re: [PATCH 1/2] hw: Move declaration of IRQState to header and add init function

2024-07-01 Thread Peter Maydell
On Sat, 29 Jun 2024 at 21:01, BALATON Zoltan  wrote:
>
> To allow embedding a qemu_irq in a struct move its definition to the
> header and add a function to init it in place without allocating it.
>
> Signed-off-by: BALATON Zoltan 

Yes, I think this makes sense, and I'm not quite sure
why we don't support it already (probably just historical
inertia -- the qemu_irq APIs were written before QOM
and when the whole design was oriented around an
"allocate and return a pointer" style rather than
allowing for embedding structs).

-- PMM



Re: [PATCH] virtio-iommu: Clear IOMMUDevice when VFIO device is unplugged

2024-07-01 Thread Eric Auger
Hi Cédric,

On 7/1/24 12:14, Cédric Le Goater wrote:
> When a VFIO device is hotplugged in a VM using virtio-iommu, IOMMUPciBus
> and IOMMUDevice cache entries are created in the .get_address_space()
> handler of the machine IOMMU device. However, these entries are never
> destroyed, not even when the VFIO device is detached from the machine.
> This can lead to an assert if the device is reattached again.
>
> When reattached, the .get_address_space() handler reuses an
> IOMMUDevice entry allocated when the VFIO device was first attached.
> virtio_iommu_set_host_iova_ranges() is called later on from the
> .set_iommu_device() handler and fails with an assert on 'probe_done'
> because the device appears to have been already probed when this is
> not the case.
>
> The IOMMUDevice entry is allocated in pci_device_iommu_address_space()
> called from under vfio_realize(), the VFIO PCI realize handler. Since
> pci_device_unset_iommu_device() is called from vfio_exitfn(), a sub
> function of the PCIDevice unrealize() handler, it seems that the
> .unset_iommu_device() handler is the best place to release resources
> allocated at realize time. Clear the IOMMUDevice cache entry there to
> fix hotplug.
>
> Fixes: 817ef10da23c ("virtio-iommu: Implement set|unset]_iommu_device() 
> callbacks")
> Signed-off-by: Cédric Le Goater 
> ---
>  hw/virtio/virtio-iommu.c | 21 +
>  1 file changed, 21 insertions(+)
>
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index 
> 72011d2d11ebf1da343b5924f5514ccfe6b2580d..57f53f0fa79cb34bfb75f80bcb9301b523b2a6ab
>  100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -467,6 +467,26 @@ static AddressSpace *virtio_iommu_find_add_as(PCIBus 
> *bus, void *opaque,
>  return &sdev->as;
>  }
>  
> +static void virtio_iommu_device_clear(VirtIOIOMMU *s, PCIBus *bus, int devfn)
> +{
> +IOMMUPciBus *sbus = g_hash_table_lookup(s->as_by_busptr, bus);
> +IOMMUDevice *sdev;
> +
> +if (!sbus) {
> +return;
> +}
> +
> +sdev = sbus->pbdev[devfn];
> +if (!sdev) {
> +return;
> +}
> +
> +g_list_free_full(sdev->resv_regions, g_free);
> +sdev->resv_regions = NULL;
Besides my previous general comments, I think this is a reasonable fix
to get in while implementing something better
Feel free to add my

Reviewed-by: Eric Auger 
Tested-by: Eric Auger 


Thanks

Eric

> +g_free(sdev);
> +sbus->pbdev[devfn] = NULL;
> +}
> +
>  static gboolean hiod_equal(gconstpointer v1, gconstpointer v2)
>  {
>  const struct hiod_key *key1 = v1;
> @@ -708,6 +728,7 @@ virtio_iommu_unset_iommu_device(PCIBus *bus, void 
> *opaque, int devfn)
>  }
>  
>  g_hash_table_remove(viommu->host_iommu_devices, &key);
> +virtio_iommu_device_clear(viommu, bus, devfn);
>  }
>  
>  static const PCIIOMMUOps virtio_iommu_ops = {




Re: [PATCH v2 03/14] hw/s390x: convert 'ccw' machine definitions to use new macros

2024-07-01 Thread Philippe Mathieu-Daudé

On 20/6/24 18:57, Daniel P. Berrangé wrote:

This changes the DEFINE_CCW_MACHINE macro to use the common
helpers for constructing versioned symbol names and strings,
bringing greater consistency across targets.

The added benefit is that it avoids the need to repeat the
version number twice in two different formats in the calls
to DEFINE_CCW_MACHINE.

A DEFINE_CCW_MACHINE_AS_LATEST helper is added so that it
is not required to pass 'false' for every single historical
machine type.

Reviewed-by: Thomas Huth 
Signed-off-by: Daniel P. Berrangé 
---
  hw/s390x/s390-virtio-ccw.c | 96 +-
  1 file changed, 53 insertions(+), 43 deletions(-)


Nice!

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 05/14] hw/m68k: convert 'virt' machine definitions to use new macros

2024-07-01 Thread Philippe Mathieu-Daudé

On 20/6/24 18:57, Daniel P. Berrangé wrote:

This changes the DEFINE_VIRT_MACHINE macro to use the common
helpers for constructing versioned symbol names and strings,
bringing greater consistency across targets.

A DEFINE_VIRT_MACHINE_AS_LATEST helper is added so that it
is not required to pass 'false' for every single historical
machine type.

Reviewed-by: Thomas Huth 
Signed-off-by: Daniel P. Berrangé 
---
  hw/m68k/virt.c | 51 --
  1 file changed, 29 insertions(+), 22 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 02/14] hw/arm: convert 'virt' machine definitions to use new macros

2024-07-01 Thread Philippe Mathieu-Daudé

On 20/6/24 18:57, Daniel P. Berrangé wrote:

This changes the DEFINE_VIRT_MACHINE macro to use the common
helpers for constructing versioned symbol names and strings,
bringing greater consistency across targets.

Reviewed-by: Thomas Huth 
Signed-off-by: Daniel P. Berrangé 
---
  hw/arm/virt.c | 28 +++-
  1 file changed, 15 insertions(+), 13 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 06/14] hw/i386: convert 'i440fx' machine definitions to use new macros

2024-07-01 Thread Philippe Mathieu-Daudé

On 20/6/24 18:57, Daniel P. Berrangé wrote:

This changes the DEFINE_I440FX_MACHINE macro to use the common
helpers for constructing versioned symbol names and strings,
bringing greater consistency across targets.

The added benefit is that it avoids the need to repeat the
version number thrice in three different formats in the calls
to DEFINE_I440FX_MACHINE.

Signed-off-by: Daniel P. Berrangé 
---
  hw/i386/pc_piix.c| 219 +++
  include/hw/i386/pc.h |  26 +
  2 files changed, 122 insertions(+), 123 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH 2/2] hw/isa/vt82c686.c: Embed i8259 irq in device state instead of allocating

2024-07-01 Thread Peter Maydell
On Sat, 29 Jun 2024 at 21:01, BALATON Zoltan  wrote:
>
> To avoid a warning about unfreed qemu_irq embed the i8259 irq in the
> device state instead of allocating it.
>
> Signed-off-by: BALATON Zoltan 
> ---
>  hw/isa/vt82c686.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
> index 8582ac0322..834051abeb 100644
> --- a/hw/isa/vt82c686.c
> +++ b/hw/isa/vt82c686.c
> @@ -592,6 +592,8 @@ OBJECT_DECLARE_SIMPLE_TYPE(ViaISAState, VIA_ISA)
>
>  struct ViaISAState {
>  PCIDevice dev;
> +
> +IRQState i8259_irq;
>  qemu_irq cpu_intr;
>  qemu_irq *isa_irqs_in;
>  uint16_t irq_state[ISA_NUM_IRQS];
> @@ -715,13 +717,12 @@ static void via_isa_realize(PCIDevice *d, Error **errp)
>  ViaISAState *s = VIA_ISA(d);
>  DeviceState *dev = DEVICE(d);
>  PCIBus *pci_bus = pci_get_bus(d);
> -qemu_irq *isa_irq;
>  ISABus *isa_bus;
>  int i;
>
>  qdev_init_gpio_out(dev, &s->cpu_intr, 1);
>  qdev_init_gpio_in_named(dev, via_isa_pirq, "pirq", PCI_NUM_PINS);
> -isa_irq = qemu_allocate_irqs(via_isa_request_i8259_irq, s, 1);
> +qemu_init_irq(&s->i8259_irq, via_isa_request_i8259_irq, s, 0);
>  isa_bus = isa_bus_new(dev, pci_address_space(d), pci_address_space_io(d),
>errp);

So if I understand correctly, this IRQ line isn't visible
from outside this chip, we're just trying to wire together
two internal components of the chip? If so, I agree that
this seems a better way than creating a named GPIO that
we then have to document as a "not really an external
connection, don't try to use this" line. (We've done that
before I think in other devices, and it works but it's
a bit odd-looking.)

That said, I do notice that the via_isa_request_i8259_irq()
function doesn't do anything except pass the level onto
another qemu_irq, so I think the theoretical ideal would be
if we could arrange to plumb things directly through rather
than needing this extra qemu_irq and function. There's
probably a reason (order of device creation/connection?)
that doesn't work though.

-- PMM



Re: [PATCH v2 07/14] hw/i386: convert 'q35' machine definitions to use new macros

2024-07-01 Thread Philippe Mathieu-Daudé

On 20/6/24 18:57, Daniel P. Berrangé wrote:

This changes the DEFINE_Q35_MACHINE macro to use the common
helpers for constructing versioned symbol names and strings,
bringing greater consistency across targets.

The added benefit is that it avoids the need to repeat the
version number thrice in three different formats in the calls
to DEFINE_Q35_MACHINE.

Due to the odd-ball '4.0.1' machine type version, this
commit introduces a DEFINE_Q35_BUGFIX helper, to allow
defining of "bugfix" machine types which have a three
digit version.

Signed-off-by: Daniel P. Berrangé 
---
  hw/i386/pc_q35.c | 215 ---
  1 file changed, 90 insertions(+), 125 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH] scsi: Don't ignore most usb-storage properties

2024-07-01 Thread Fiona Ebner
Hi,

we got a user report about bootindex for an 'usb-storage' device not
working anymore [0] and I reproduced it and bisected it to this patch.

Am 31.01.24 um 14:06 schrieb Kevin Wolf:
> @@ -399,11 +397,10 @@ SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, 
> BlockBackend *blk,
>  object_property_add_child(OBJECT(bus), name, OBJECT(dev));
>  g_free(name);
>  
> +s = SCSI_DEVICE(dev);
> +s->conf = *conf;
> +
>  qdev_prop_set_uint32(dev, "scsi-id", unit);
> -if (bootindex >= 0) {
> -object_property_set_int(OBJECT(dev), "bootindex", bootindex,
> -&error_abort);
> -}

The fact that this is not called anymore means that the 'set' method for
the property is also not called. Here, that method is
device_set_bootindex() (as configured by scsi_dev_instance_init() ->
device_add_bootindex_property()). Therefore, the device is never
registered via add_boot_device_path() meaning that the bootindex
property does not have the desired effect anymore.

Is it necessary to keep the object_property_set_{bool,int} and
qdev_prop_set_enum calls around for these potential side effects? Would
it even be necessary to introduce new similar calls for the newly
supported properties? Or is there an easy alternative to
s->conf = *conf;
that does trigger the side effects?

>  if (object_property_find(OBJECT(dev), "removable")) {
>  qdev_prop_set_bit(dev, "removable", removable);
>  }
> @@ -414,19 +411,12 @@ SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, 
> BlockBackend *blk,
>  object_unparent(OBJECT(dev));
>  return NULL;
>  }
> -if (!object_property_set_bool(OBJECT(dev), "share-rw", share_rw, errp)) {
> -object_unparent(OBJECT(dev));
> -return NULL;
> -}
> -
> -qdev_prop_set_enum(dev, "rerror", rerror);
> -qdev_prop_set_enum(dev, "werror", werror);
>  
>  if (!qdev_realize_and_unref(dev, &bus->qbus, errp)) {
>  object_unparent(OBJECT(dev));
>  return NULL;
>  }
> -return SCSI_DEVICE(dev);
> +return s;
>  }
>  
>  void scsi_bus_legacy_handle_cmdline(SCSIBus *bus)
[0]: https://forum.proxmox.com/threads/149772/post-679433

Best Regards,
Fiona




Re: [PATCH v8 2/8] xen: mapcache: Unmap first entries in buckets

2024-07-01 Thread Anthony PERARD
Hi all,

Following this commit, a test which installs Debian in a guest with OVMF
as firmware started to fail. QEMU exits with an error when GRUB is
running on the freshly installed Debian (I don't know if GRUB is
starting Linux or not).
The error is:
Bad ram offset 

Some logs:
http://logs.test-lab.xenproject.org/osstest/logs/186611/test-amd64-amd64-xl-qemuu-ovmf-amd64/info.html

Any idea? Something is trying to do something with the address "-1" when
it shouldn't?

Cheers,

Anthony

On Wed, May 29, 2024 at 04:07:33PM +0200, Edgar E. Iglesias wrote:
> From: "Edgar E. Iglesias" 
> 
> When invalidating memory ranges, if we happen to hit the first
> entry in a bucket we were never unmapping it. This was harmless
> for foreign mappings but now that we're looking to reuse the
> mapcache for transient grant mappings, we must unmap entries
> when invalidated.
> 
> Signed-off-by: Edgar E. Iglesias 
> Reviewed-by: Stefano Stabellini 
> ---
>  hw/xen/xen-mapcache.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
> index bc860f4373..ec95445696 100644
> --- a/hw/xen/xen-mapcache.c
> +++ b/hw/xen/xen-mapcache.c
> @@ -491,18 +491,23 @@ static void 
> xen_invalidate_map_cache_entry_unlocked(MapCache *mc,
>  return;
>  }
>  entry->lock--;
> -if (entry->lock > 0 || pentry == NULL) {
> +if (entry->lock > 0) {
>  return;
>  }
>  
> -pentry->next = entry->next;
>  ram_block_notify_remove(entry->vaddr_base, entry->size, entry->size);
>  if (munmap(entry->vaddr_base, entry->size) != 0) {
>  perror("unmap fails");
>  exit(-1);
>  }
> +
>  g_free(entry->valid_mapping);
> -g_free(entry);
> +if (pentry) {
> +pentry->next = entry->next;
> +g_free(entry);
> +} else {
> +memset(entry, 0, sizeof *entry);
> +}
>  }
>  
>  typedef struct XenMapCacheData {
> -- 
> 2.40.1
> 
> 
-- 

Anthony Perard | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech


