Re: [PULL 00/10] Block layer patches

2022-02-02 Thread Peter Maydell
On Tue, 1 Feb 2022 at 15:21, Kevin Wolf  wrote:
>
> The following changes since commit 804b30d25f8d70dc2dea951883ea92235274a50c:
>
>   Merge remote-tracking branch 'remotes/legoater/tags/pull-ppc-20220130' into 
> staging (2022-01-31 11:10:08 +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/kmwolf/qemu.git tags/for-upstream
>
> for you to fetch changes up to fc176116cdea816ceb8dd969080b2b95f58edbc0:
>
>   block/rbd: workaround for ceph issue #53784 (2022-02-01 15:16:32 +0100)
>
> 
> Block layer patches
>
> - rbd: fix handling of holes in .bdrv_co_block_status
> - Fix potential crash in bdrv_set_backing_hd()
> - vhost-user-blk export: Fix shutdown with requests in flight
> - FUSE export: Fix build failure on FreeBSD
> - Documentation improvements
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.0
for any user-visible changes.

-- PMM



Re: [PATCH 1/2] migration/rdma: Increase the backlog from 5 to 128

2022-02-02 Thread Dr. David Alan Gilbert
* Pankaj Gupta (pankaj.gu...@ionos.com) wrote:
> > > > >  migration/rdma.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/migration/rdma.c b/migration/rdma.c
> > > > > index c7c7a384875b..2e223170d06d 100644
> > > > > --- a/migration/rdma.c
> > > > > +++ b/migration/rdma.c
> > > > > @@ -4238,7 +4238,7 @@ void rdma_start_incoming_migration(const char 
> > > > > *host_port, Error **errp)
> > > > >
> > > > >  trace_rdma_start_incoming_migration_after_dest_init();
> > > > >
> > > > > -ret = rdma_listen(rdma->listen_id, 5);
> > > > > +ret = rdma_listen(rdma->listen_id, 128);
> > > >
> > > > 128 backlog seems too much to me. Any reason for choosing this number.
> > > > Any rationale to choose this number?
> > > >
> > > 128 is the default value of SOMAXCONN, I can use that if it is preferred.
> >
> > AFAICS backlog is only applicable with RDMA iWARP CM mode. Maybe we
> > can increase it to 128.
> 
> Or maybe we first increase it to 20 or 32 or so, to avoid memory
> overhead if we are not using that many connections at the same time.

Can you explain why you're requiring more than 1?  Is this with multifd
patches?

Dave

> > Maybe you can also share any testing data for multiple concurrent live
> > migrations using RDMA, please.
> >
> > Thanks,
> > Pankaj
> >
> > Thanks,
> > Pankaj
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH v5 03/18] pci: isolated address space for PCI bus

2022-02-02 Thread Stefan Hajnoczi
On Tue, Feb 01, 2022 at 10:34:32PM -0700, Alex Williamson wrote:
> On Wed, 2 Feb 2022 01:13:22 +
> Jag Raman  wrote:
> 
> > > On Feb 1, 2022, at 5:47 PM, Alex Williamson  
> > > wrote:
> > > 
> > > On Tue, 1 Feb 2022 21:24:08 +
> > > Jag Raman  wrote:
> > >   
> > >>> On Feb 1, 2022, at 10:24 AM, Alex Williamson 
> > >>>  wrote:
> > >>> 
> > >>> On Tue, 1 Feb 2022 09:30:35 +
> > >>> Stefan Hajnoczi  wrote:
> > >>>   
> >  On Mon, Jan 31, 2022 at 09:16:23AM -0700, Alex Williamson wrote:
> > > On Fri, 28 Jan 2022 09:18:08 +
> > > Stefan Hajnoczi  wrote:
> > >   
> > >> On Thu, Jan 27, 2022 at 02:22:53PM -0700, Alex Williamson wrote: 
> > >>  
> > >>> If the goal here is to restrict DMA between devices, ie. 
> > >>> peer-to-peer
> > >>> (p2p), why are we trying to re-invent what an IOMMU already does?   
> > >>>  
> > >> 
> > >> The issue Dave raised is that vfio-user servers run in separate
> > >> processes from QEMU with shared memory access to RAM but no direct
> > >> access to non-RAM MemoryRegions. The virtiofs DAX Window BAR is one
> > >> example of a non-RAM MemoryRegion that can be the source/target of 
> > >> DMA
> > >> requests.
> > >> 
> > >> I don't think IOMMUs solve this problem but luckily the vfio-user
> > >> protocol already has messages that vfio-user servers can use as a
> > >> fallback when DMA cannot be completed through the shared memory RAM
> > >> accesses.
> > >>   
> > >>> In
> > >>> fact, it seems like an IOMMU does this better in providing an IOVA
> > >>> address space per BDF.  Is the dynamic mapping overhead too much?  
> > >>> What
> > >>> physical hardware properties or specifications could we leverage to
> > >>> restrict p2p mappings to a device?  Should it be governed by machine
> > >>> type to provide consistency between devices?  Should each "isolated"
> > >>> bus be in a separate root complex?  Thanks,
> > >> 
> > >> There is a separate issue in this patch series regarding isolating 
> > >> the
> > >> address space where BAR accesses are made (i.e. the global
> > >> address_space_memory/io). When one process hosts multiple vfio-user
> > >> server instances (e.g. a software-defined network switch with 
> > >> multiple
> > >> ethernet devices) then each instance needs isolated memory and io 
> > >> address
> > >> spaces so that vfio-user clients don't cause collisions when they map
> > >> BARs to the same address.
> > >> 
> > >> I think the separate root complex idea is a good solution. This
> > >> patch series takes a different approach by adding the concept of
> > >> isolated address spaces into hw/pci/.  
> > > 
> > > This all still seems pretty sketchy, BARs cannot overlap within the
> > > same vCPU address space, perhaps with the exception of when they're
> > > being sized, but DMA should be disabled during sizing.
> > > 
> > > Devices within the same VM context with identical BARs would need to
> > > operate in different address spaces.  For example a translation offset
> > > in the vCPU address space would allow unique addressing to the 
> > > devices,
> > > perhaps using the translation offset bits to address a root complex 
> > > and
> > > masking those bits for downstream transactions.
> > > 
> > > In general, the device simply operates in an address space, ie. an
> > > IOVA.  When a mapping is made within that address space, we perform a
> > > translation as necessary to generate a guest physical address.  The
> > > IOVA itself is only meaningful within the context of the address 
> > > space,
> > > there is no requirement or expectation for it to be globally unique.
> > > 
> > > If the vfio-user server is making some sort of requirement that IOVAs
> > > are unique across all devices, that seems very, very wrong.  Thanks,  
> > > 
> >  
> >  Yes, BARs and IOVAs don't need to be unique across all devices.
> >  
> >  The issue is that there can be as many guest physical address spaces as
> >  there are vfio-user clients connected, so per-client isolated address
> >  spaces are required. This patch series has a solution to that problem
> >  with the new pci_isol_as_mem/io() API.
> > >>> 
> > >>> Sorry, this still doesn't follow for me.  A server that hosts multiple
> > >>> devices across many VMs (I'm not sure if you're referring to the device
> > >>> or the VM as a client) needs to deal with different address spaces per
> > >>> device.  The server needs to be able to uniquely identify every DMA,
> > >>> which must be part of the interface protocol.  But I don't see how that
> > >>> imposes a requirement of an isolated address space.  If we want the
> > >>> device isolated because we don't trust the server, that's whe

Re: [PATCH v4 00/12] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-02-02 Thread Steven Price
Hi Jun,

On 02/02/2022 02:28, Nakajima, Jun wrote:
> 
>> On Jan 28, 2022, at 8:47 AM, Steven Price  wrote:
>>
>> On 18/01/2022 13:21, Chao Peng wrote:
>>> This is the v4 of this series, which tries to implement the fd-based KVM
>>> guest private memory. The patches are based on latest kvm/queue branch
>>> commit:
>>>
>>>  fea31d169094 KVM: x86/pmu: Fix available_event_types check for
>>>   REF_CPU_CYCLES event
>>>
>>> Introduction
>>> 
>>> In general this patch series introduces a fd-based memslot which provides
>>> guest memory through a memory file descriptor fd[offset,size] instead of
>>> hva/size. The fd can be created from a supported memory filesystem
>>> like tmpfs/hugetlbfs etc., which we refer to as the memory backing store.
>>> KVM and the memory backing store exchange callbacks when such a memslot
>>> gets created. At runtime KVM will call into callbacks provided by the
>>> backing store to get the pfn with the fd+offset. The memory backing store
>>> will also call into KVM callbacks when userspace fallocates or punches
>>> holes on the fd, to notify KVM to map/unmap secondary MMU page tables.
>>>
>>> Compared to the existing hva-based memslot, this new type of memslot allows
>>> guest memory to be unmapped from host userspace like QEMU and even the
>>> kernel itself, therefore reducing the attack surface and preventing bugs.
>>>
>>> Based on this fd-based memslot, we can build guest private memory that
>>> is going to be used in confidential computing environments such as Intel
>>> TDX and AMD SEV. When supported, the memory backing store can provide
>>> more enforcement on the fd and KVM can use a single memslot to hold both
>>> the private and shared part of the guest memory. 
>>
>> This looks like it will be useful for Arm's Confidential Compute
>> Architecture (CCA) too - in particular we need a way of ensuring that
>> user space cannot 'trick' the kernel into accessing memory which has
>> been delegated to a realm (i.e. protected guest), and a memfd seems like
>> a good match.
> 
> Good to hear that it will be useful for ARM’s CCA as well.
> 
>>
>> Some comments below.
>>
>>> mm extension
>>> -
>>> Introduces new F_SEAL_INACCESSIBLE for shmem and new MFD_INACCESSIBLE
>>> flag for memfd_create(), the file created with these flags cannot read(),
>>> write() or mmap() etc via normal MMU operations. The file content can
>>> only be used with the newly introduced memfile_notifier extension.
>>
>> For Arm CCA we are expecting to seed the realm with an initial memory
>> contents (e.g. kernel and initrd) which will then be measured before
>> execution starts. The 'obvious' way of doing this with a memfd would be
>> to populate parts of the memfd then seal it with F_SEAL_INACCESSIBLE.
> 
> As far as I understand, we have the same problem with TDX, where a guest TD 
> (Trust Domain) starts in private memory. We seed the private memory typically 
> with a guest firmware, and the initial image (plaintext) is copied to 
> somewhere in QEMU memory (from disk, for example) for that purpose; this 
> location is not associated with the target GPA.
> 
> Upon a (new) ioctl from QEMU, KVM requests the TDX Module to copy the pages 
> to private memory (by encrypting) specifying the target GPA, using a TDX 
> interface function (TDH.MEM.PAGE.ADD). The actual pages for the private 
> memory is allocated by the callbacks provided by the backing store during the 
> “copy” operation.
> 
> We extended the existing KVM_MEMORY_ENCRYPT_OP (ioctl) for the above. 

Ok, so if I understand correctly QEMU would do something along the lines of:

1. Use memfd_create(...MFD_INACCESSIBLE) to allocate private memory for
the guest.

2. ftruncate/fallocate the memfd to back the appropriate areas of the memfd.

3. Create a memslot in KVM pointing to the memfd

4. Load the 'guest firmware' (kernel/initrd or similar) into VMM memory

5. Use the KVM_MEMORY_ENCRYPT_OP to request the 'guest firmware' be
copied into the private memory. The ioctl would temporarily pin the
pages and ask the TDX module to copy (& encrypt) the data into the
private memory, unpinning after the copy.

6. QEMU can then free the unencrypted copy of the guest firmware.

>>
>> However as things stand it's not possible to set the INACCESSIBLE seal
>> after creating a memfd (F_ALL_SEALS hasn't been updated to include it).
>>
>> One potential workaround would be for arm64 to provide a custom KVM
>> ioctl to effectively memcpy() into the guest's protected memory which
>> would only be accessible before the guest has started. The drawback is
>> that it requires two copies of the data during guest setup.
> 
> So, the guest pages are not encrypted in the realm?

The pages are likely to be encrypted, but architecturally it doesn't
matter - the hardware prevents the 'Normal World' accessing the pages
when they are assigned to the realm. Encryption is only necessary to
protect against hardware attacks (e.g. bus snooping).

> I think you could do the same thing, i.e. KVM cop

Re: [PATCH v5 03/18] pci: isolated address space for PCI bus

2022-02-02 Thread Peter Maydell
On Tue, 1 Feb 2022 at 23:51, Alex Williamson  wrote:
>
> On Tue, 1 Feb 2022 21:24:08 +
> Jag Raman  wrote:
> > The PCIBus data structure already has address_space_mem and
> > address_space_io to contain the BAR regions of devices attached
> > to it. I understand that these two PCIBus members form the
> > PCI address space.
>
> These are the CPU address spaces.  When there's no IOMMU, the PCI bus is
> identity mapped to the CPU address space.  When there is an IOMMU, the
> device address space is determined by the granularity of the IOMMU and
> may be entirely separate from address_space_mem.

Note that those fields in PCIBus are just whatever MemoryRegions
the pci controller model passed in to the call to pci_root_bus_init()
or equivalent. They may or may not be specifically the CPU's view
of anything. (For instance on the versatilepb board, the PCI controller
is visible to the CPU via several MMIO "windows" at known addresses,
which let the CPU access into the PCI address space at a programmable
offset. We model that by creating a couple of container MRs which
we pass to pci_root_bus_init() to be the PCI memory and IO spaces,
and then using alias MRs to provide the view into those at the
guest-programmed offset. The CPU sees those windows, and doesn't
have direct access to the whole PCIBus::address_space_mem.)
I guess you could say they're the PCI controller's view of the PCI
address space ?

We have a tendency to be a bit sloppy with use of AddressSpaces
within QEMU where it happens that the view of the world that a
DMA-capable device has matches that of the CPU, but conceptually
they can definitely be different, especially in the non-x86 world.
(Linux also confuses matters here by preferring to program a 1:1
mapping even if the hardware is more flexible and can do other things.
The model of the h/w in QEMU should support the other cases too, not
just 1:1.)

> I/O port space is always the identity mapped CPU address space unless
> sparse translations are used to create multiple I/O port spaces (not
> implemented).  I/O port space is only accessed by the CPU, there are no
> device initiated I/O port transactions, so the address space relative
> to the device is irrelevant.

Does the PCI spec actually forbid any master except the CPU from
issuing I/O port transactions, or is it just that in practice nobody
makes a PCI device that does weird stuff like that ?

thanks
-- PMM



Re: [PATCH 1/2] migration/rdma: Increase the backlog from 5 to 128

2022-02-02 Thread Jinpu Wang
On Wed, Feb 2, 2022 at 10:20 AM Dr. David Alan Gilbert
 wrote:
>
> * Pankaj Gupta (pankaj.gu...@ionos.com) wrote:
> > > > > >  migration/rdma.c | 2 +-
> > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/migration/rdma.c b/migration/rdma.c
> > > > > > index c7c7a384875b..2e223170d06d 100644
> > > > > > --- a/migration/rdma.c
> > > > > > +++ b/migration/rdma.c
> > > > > > @@ -4238,7 +4238,7 @@ void rdma_start_incoming_migration(const char 
> > > > > > *host_port, Error **errp)
> > > > > >
> > > > > >  trace_rdma_start_incoming_migration_after_dest_init();
> > > > > >
> > > > > > -ret = rdma_listen(rdma->listen_id, 5);
> > > > > > +ret = rdma_listen(rdma->listen_id, 128);
> > > > >
> > > > > 128 backlog seems too much to me. Any reason for choosing this number.
> > > > > Any rationale to choose this number?
> > > > >
> > > > 128 is the default value of SOMAXCONN, I can use that if it is 
> > > > preferred.
> > >
> > > AFAICS backlog is only applicable with RDMA iWARP CM mode. Maybe we
> > > can increase it to 128.
> >
> > Or maybe we first increase it to 20 or 32 or so, to avoid memory
> > overhead if we are not using that many connections at the same time.
>
> Can you explain why you're requiring more than 1?  Is this with multifd
> patches?

no, I'm not using multifd patches; just from code reading, I feel 5 is
too small for the backlog setting.

As Pankaj rightly mentioned, in RDMA the backlog only takes effect with
the iWARP CM; it does nothing for InfiniBand and RoCE.

Please ignore this patch; we can revisit this when we introduce
multifd with RDMA.

Thanks!
Jinpu Wang
>
> Dave
>
> > > Maybe you can also share any testing data for multiple concurrent live
> > > migrations using RDMA, please.
> > >
> > > Thanks,
> > > Pankaj
> > >
> > > Thanks,
> > > Pankaj
> >
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>



Re: [PATCH v5 03/18] pci: isolated address space for PCI bus

2022-02-02 Thread Michael S. Tsirkin
On Wed, Feb 02, 2022 at 09:30:42AM +, Peter Maydell wrote:
> > I/O port space is always the identity mapped CPU address space unless
> > sparse translations are used to create multiple I/O port spaces (not
> > implemented).  I/O port space is only accessed by the CPU, there are no
> > device initiated I/O port transactions, so the address space relative
> > to the device is irrelevant.
> 
> Does the PCI spec actually forbid any master except the CPU from
> issuing I/O port transactions, or is it just that in practice nobody
> makes a PCI device that does weird stuff like that ?
> 
> thanks
> -- PMM

Hmm, the only thing vaguely related in the spec that I know of is this:

PCI Express supports I/O Space for compatibility with legacy devices 
which require their use.
Future revisions of this specification may deprecate the use of I/O 
Space.

Alex, what did you refer to?

-- 
MST




Re: [PATCH 2/2] migration/rdma: set the REUSEADDR option for destination

2022-02-02 Thread Dr. David Alan Gilbert
* Jack Wang (jinpu.w...@ionos.com) wrote:
> This allows the address to be reused, to avoid rdma_bind_addr erroring
> out.

In what case do you get the error - after a failed migrate and then a
retry?

Dave

> Signed-off-by: Jack Wang 
> ---
>  migration/rdma.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 2e223170d06d..b498ef013c77 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2705,6 +2705,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error 
> **errp)
>  char ip[40] = "unknown";
>  struct rdma_addrinfo *res, *e;
>  char port_str[16];
> +int reuse = 1;
>  
>  for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
>  rdma->wr_data[idx].control_len = 0;
> @@ -2740,6 +2741,12 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, 
> Error **errp)
>  goto err_dest_init_bind_addr;
>  }
>  
> +ret = rdma_set_option(listen_id, RDMA_OPTION_ID, 
> RDMA_OPTION_ID_REUSEADDR,
> +   &reuse, sizeof reuse);
> +if (ret) {
> +ERROR(errp, "Error: could not set REUSEADDR option");
> +goto err_dest_init_bind_addr;
> +}
>  for (e = res; e != NULL; e = e->ai_next) {
>  inet_ntop(e->ai_family,
>  &((struct sockaddr_in *) e->ai_dst_addr)->sin_addr, ip, sizeof 
> ip);
> -- 
> 2.25.1
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




arm: singlestep bug

2022-02-02 Thread Andrew Jones
Hello TCG developers,

We have new debug test cases in kvm-unit-tests thanks to Ricardo Koller.
The singlestep test cases are failing with TCG. Enabling TCG debug outputs
the error

  TCG hflags mismatch (current:(0x04a1,0x4000) 
rebuilt:(0x04a3,0x4000)

I noticed that the test passed on an older QEMU, so I bisected it and
found commit e979972a6a17 ("target/arm: Rely on hflags correct in
cpu_get_tb_cpu_state"), which unfortunately doesn't tell us anything
that the above error message didn't say already (apparently we can't
currently depend on hflags being correct wrt singlestep at this point).

Thanks,
drew




[PATCH v2] hw/rx: rx-gdbsim DTB load address aligned of 16byte.

2022-02-02 Thread Yoshinori Sato
The Linux kernel requires an aligned DTB address, but the alignment was
missing in the dtb load function. Fix it so the DTB is loaded to the
correct address.

v2 changes:
Use the ROUND_DOWN macro.

Signed-off-by: Yoshinori Sato 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/rx/rx-gdbsim.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/rx/rx-gdbsim.c b/hw/rx/rx-gdbsim.c
index 75d1fec6ca..2356af83a0 100644
--- a/hw/rx/rx-gdbsim.c
+++ b/hw/rx/rx-gdbsim.c
@@ -142,7 +142,7 @@ static void rx_gdbsim_init(MachineState *machine)
 exit(1);
 }
 /* DTB is located at the end of SDRAM space. */
-dtb_offset = machine->ram_size - dtb_size;
+dtb_offset = ROUND_DOWN(machine->ram_size - dtb_size, 16);
 rom_add_blob_fixed("dtb", dtb, dtb_size,
SDRAM_BASE + dtb_offset);
 /* Set dtb address to R1 */
-- 
2.30.2




Re: [PATCH 2/2] migration/rdma: set the REUSEADDR option for destination

2022-02-02 Thread Jinpu Wang
On Wed, Feb 2, 2022 at 11:15 AM Dr. David Alan Gilbert
 wrote:
>
> * Jack Wang (jinpu.w...@ionos.com) wrote:
> > This allows the address to be reused, to avoid rdma_bind_addr erroring
> > out.
>
> In what case do you get the error - after a failed migrate and then a
> retry?

Yes. What I saw is that in case of error, the mgmt daemon picks one
migration port:
incoming rdma:[::]:8089: RDMA ERROR: Error: could not rdma_bind_addr

Then it tries another, -incoming rdma:[::]:8103; sometimes that worked,
sometimes it needed another try with other port numbers.

with this patch, I don't see the error anymore.
>
> Dave
Thanks!
>
> > Signed-off-by: Jack Wang 
> > ---
> >  migration/rdma.c | 7 +++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/migration/rdma.c b/migration/rdma.c
> > index 2e223170d06d..b498ef013c77 100644
> > --- a/migration/rdma.c
> > +++ b/migration/rdma.c
> > @@ -2705,6 +2705,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, 
> > Error **errp)
> >  char ip[40] = "unknown";
> >  struct rdma_addrinfo *res, *e;
> >  char port_str[16];
> > +int reuse = 1;
> >
> >  for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
> >  rdma->wr_data[idx].control_len = 0;
> > @@ -2740,6 +2741,12 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, 
> > Error **errp)
> >  goto err_dest_init_bind_addr;
> >  }
> >
> > +ret = rdma_set_option(listen_id, RDMA_OPTION_ID, 
> > RDMA_OPTION_ID_REUSEADDR,
> > +   &reuse, sizeof reuse);
> > +if (ret) {
> > +ERROR(errp, "Error: could not set REUSEADDR option");
> > +goto err_dest_init_bind_addr;
> > +}
> >  for (e = res; e != NULL; e = e->ai_next) {
> >  inet_ntop(e->ai_family,
> >  &((struct sockaddr_in *) e->ai_dst_addr)->sin_addr, ip, sizeof 
> > ip);
> > --
> > 2.25.1
> >
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>



[PATCH] hw/arm/smmuv3: Fix device reset

2022-02-02 Thread Eric Auger
We currently miss a bunch of register resets in the device reset
function. This sometimes prevents the guest from rebooting after
a system_reset (with virtio-blk-pci). For instance, we may get
the following errors:

invalid STE
smmuv3-iommu-memory-region-0-0 translation failed for 
iova=0x13a9d2000(SMMU_EVT_C_BAD_STE)
Invalid read at addr 0x13A9D2000, size 2, region '(null)', reason: rejected
invalid STE
smmuv3-iommu-memory-region-0-0 translation failed for 
iova=0x13a9d2000(SMMU_EVT_C_BAD_STE)
Invalid write at addr 0x13A9D2000, size 2, region '(null)', reason: rejected
invalid STE

Signed-off-by: Eric Auger 
Fixes: 10a83cb988 ("hw/arm/smmuv3: Skeleton")
---
 hw/arm/smmuv3.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 3b43368be0f..674623aabea 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -278,6 +278,12 @@ static void smmuv3_init_regs(SMMUv3State *s)
 s->features = 0;
 s->sid_split = 0;
 s->aidr = 0x1;
+s->cr[0] = 0;
+s->cr0ack = 0;
+s->irq_ctrl = 0;
+s->gerror = 0;
+s->gerrorn = 0;
+s->statusr = 0;
 }
 
 static int smmu_get_ste(SMMUv3State *s, dma_addr_t addr, STE *buf,
-- 
2.26.3




Re: [PATCH v2 3/3] migration: Perform vmsd structure check during tests

2022-02-02 Thread Dr. David Alan Gilbert
* Juan Quintela (quint...@redhat.com) wrote:
> "Dr. David Alan Gilbert"  wrote:
> > * Juan Quintela (quint...@redhat.com) wrote:
> >> "Dr. David Alan Gilbert (git)"  wrote:
> >> > From: "Dr. David Alan Gilbert" 
> >> >
> >> > Perform a check on vmsd structures during test runs in the hope
> >> > of catching any missing terminators and other simple screwups.
> >> >
> >> > Signed-off-by: Dr. David Alan Gilbert 
> >> 
> >> Reviewed-by: Juan Quintela 
> >> 
> >> queued.
> >
> > Careful; I think that'll break with slirp until libslirp gets updated
> > first.
> 
> As expected, it broke it.
> 
> I resent the PULL request without those two patches.
> 
> Now that we are here, how is it that make check didn't catch this?

Because in my local world I did the changes to libslirp; I wanted to
make sure qemu people were happy with the changes before proposing them
to libslirp.

Which I've just done:

https://gitlab.freedesktop.org/slirp/libslirp/-/merge_requests/112

Dave

> Later, Juan.
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PULL 18/20] block/nbd: drop connection_co

2022-02-02 Thread Fabian Ebner
Am 27.09.21 um 23:55 schrieb Eric Blake:
> From: Vladimir Sementsov-Ogievskiy 
> 
> OK, that's a big rewrite of the logic.
> 
> Pre-patch we have an always running coroutine - connection_co. It does
> reply receiving and reconnecting. And it leads to a lot of difficult
> and unobvious code around drained sections and context switch. We also
> abuse bs->in_flight counter which is increased for connection_co and
> temporary decreased in points where we want to allow drained section to
> begin. One of these place is in another file: in nbd_read_eof() in
> nbd/client.c.
> 
> We also cancel reconnect and requests waiting for reconnect on drained
> begin which is not correct. And this patch fixes that.
> 
> Let's finally drop this always running coroutine and go another way:
> do both reconnect and receiving in request coroutines.
>

Hi,

while updating our stack to 6.2, one of our live-migration tests stopped
working (backtrace is below) and bisecting led me to this patch.

The VM has a single qcow2 disk (converting to raw doesn't make a
difference) and the issue only appears when using iothread (for both
virtio-scsi-pci and virtio-block-pci).

Reverting 1af7737871fb3b66036f5e520acb0a98fc2605f7 (which lives on top)
and 4ddb5d2fde6f22b2cf65f314107e890a7ca14fcf (the commit corresponding
to this patch) in v6.2.0 makes the migration work again.

Backtrace:

Thread 1 (Thread 0x7f9d93458fc0 (LWP 56711) "kvm"):
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x7f9d9d6bc537 in __GI_abort () at abort.c:79
#2  0x7f9d9d6bc40f in __assert_fail_base (fmt=0x7f9d9d825128
"%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5579153763f8
"qemu_get_current_aio_context() == qemu_coroutine_get_aio_context(co)",
file=0x5579153764f9 "../io/channel.c", line=483, function=) at assert.c:92
#3  0x7f9d9d6cb662 in __GI___assert_fail
(assertion=assertion@entry=0x5579153763f8
"qemu_get_current_aio_context() == qemu_coroutine_get_aio_context(co)",
file=file@entry=0x5579153764f9 "../io/channel.c", line=line@entry=483,
function=function@entry=0x557915376570 <__PRETTY_FUNCTION__.2>
"qio_channel_restart_read") at assert.c:101
#4  0x5579150c351c in qio_channel_restart_read (opaque=) at ../io/channel.c:483
#5  qio_channel_restart_read (opaque=) at ../io/channel.c:477
#6  0x55791520182a in aio_dispatch_handler
(ctx=ctx@entry=0x557916908c60, node=0x7f9d8400f800) at
../util/aio-posix.c:329
#7  0x557915201f62 in aio_dispatch_handlers (ctx=0x557916908c60) at
../util/aio-posix.c:372
#8  aio_dispatch (ctx=0x557916908c60) at ../util/aio-posix.c:382
#9  0x5579151ea74e in aio_ctx_dispatch (source=,
callback=, user_data=) at ../util/async.c:311
#10 0x7f9d9e647e6b in g_main_context_dispatch () from
/lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x557915203030 in glib_pollfds_poll () at ../util/main-loop.c:232
#12 os_host_main_loop_wait (timeout=992816) at ../util/main-loop.c:255
#13 main_loop_wait (nonblocking=nonblocking@entry=0) at
../util/main-loop.c:531
#14 0x5579150539c1 in qemu_main_loop () at ../softmmu/runstate.c:726
#15 0x557914ce8ebe in main (argc=, argv=, envp=) at ../softmmu/main.c:50





[RFC PATCH] arm: force flag recalculation when messing with DAIF

2022-02-02 Thread Alex Bennée
The recently introduced debug tests in kvm-unit-tests exposed an error
in our handling of singlestep caused by stale hflags. This is caught by
--enable-debug-tcg when running the tests.

Signed-off-by: Alex Bennée 
Cc: Richard Henderson 
Cc: Andrew Jones 
---
 target/arm/helper-a64.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index d6a6fd73d9..7cf953b1e6 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -83,12 +83,14 @@ void HELPER(msr_i_daifset)(CPUARMState *env, uint32_t imm)
 {
 daif_check(env, 0x1e, imm, GETPC());
 env->daif |= (imm << 6) & PSTATE_DAIF;
+arm_rebuild_hflags(env);
 }
 
 void HELPER(msr_i_daifclear)(CPUARMState *env, uint32_t imm)
 {
 daif_check(env, 0x1f, imm, GETPC());
 env->daif &= ~((imm << 6) & PSTATE_DAIF);
+arm_rebuild_hflags(env);
 }
 
 /* Convert a softfloat float_relation_ (as returned by
-- 
2.30.2




Re: arm: singlestep bug

2022-02-02 Thread Andrew Jones
On Wed, Feb 02, 2022 at 11:16:46AM +, Alex Bennée wrote:
...
> Side note:
> 
>   ad5fb8830150071487025b3594a7b1bf218d12d8 is the first bad commit
>   commit ad5fb8830150071487025b3594a7b1bf218d12d8
>   Author: Zixuan Wang 
>   Date:   Mon Oct 4 13:49:19 2021 -0700
> 
> breaks the running on kvm-unit-test for me, I needed to patch:
> 
> --8<---cut here---start->8---
> modified   run_tests.sh
> @@ -31,7 +31,8 @@ specify the appropriate qemu binary for ARCH-run.
>  EOF
>  }
>  
> -RUNTIME_arch_run="./$TEST_SUBDIR/run"
> +RUNTIME_arch_run="./$TEST_DIR/run"
> +#RUNTIME_arch_run="./$TEST_SUBDIR/run"
>  source scripts/runtime.bash
>  
>  # require enhanced getopt
> --8<---cut here---end--->8---
>

You need to rerun ./configure to get a new config.mak file.

Thanks,
drew




Re: arm: singlestep bug

2022-02-02 Thread Alex Bennée


Andrew Jones  writes:

> Hello TCG developers,
>
> We have new debug test cases in kvm-unit-tests thanks to Ricardo
> Koller.

Yay tests ;-)

> The singlestep test cases are failing with TCG. Enabling TCG debug outputs
> the error
>
>   TCG hflags mismatch (current:(0x04a1,0x4000) 
> rebuilt:(0x04a3,0x4000)

This shows that:

  FIELD(TBFLAG_ANY, SS_ACTIVE, 1, 1)

should be set but wasn't cached.

> I noticed that the test passed on an older QEMU, so I bisected it and
> found commit e979972a6a17 ("target/arm: Rely on hflags correct in
> cpu_get_tb_cpu_state"), which unfortunately doesn't tell us anything
> that the above error message didn't say already (apparently we can't
> currently depend on hflags being correct wrt singlestep at this
> point).

Fortunately this is intended - an --enable-debug-tcg build always recalculates
the hflags (an expensive operation) and makes it pretty easy to spot where
we failed to call arm_rebuild_hflags(). You can do this with the normal
debug tools or my new favourite tool (for short programs) using the
execlog plugin.

  0, 0x40080a24, 0x52840020, "movz w0, #0x2001"
  0, 0x40080a28, 0x2a01, "orr w0, w0, w1"
  0, 0x40080a2c, 0xd5100240, "msr mdscr_el1, x0"
  0, 0x40080a30, 0xd5033fdf, "isb "
  0, 0x40080a34, 0x350001f4, "cbnz w20, #0x40080a70"
  TCG hflags mismatch (current:(0x04a1,0x4000) 
rebuilt:(0x04a3,0x4000)

This is a touch weird though because any msr write does trigger a
rebuild of the flags. See handle_sys():

if (!isread && !(ri->type & ARM_CP_SUPPRESS_TB_END)) {
/*
 * A write to any coprocessor register that ends a TB
 * must rebuild the hflags for the next TB.
 */
TCGv_i32 tcg_el = tcg_const_i32(s->current_el);
gen_helper_rebuild_hflags_a64(cpu_env, tcg_el);
tcg_temp_free_i32(tcg_el);
/*
 * We default to ending the TB on a coprocessor register write,
 * but allow this to be suppressed by the register definition
 * (usually only necessary to work around guest bugs).
 */
s->base.is_jmp = DISAS_UPDATE_EXIT;
}

And indeed in rr I can see it working through the tail end of
helper_rebuild_hflags_a64(), but it seems arm_singlestep_active() returns
false at this point. This ultimately fails at
aa64_generate_debug_exceptions():

/*
 * Same EL to same EL debug exceptions need MDSCR_KDE enabled
 * while not masking the (D)ebug bit in DAIF.
 */
debug_el = arm_debug_target_el(env);

if (cur_el == debug_el) {
return extract32(env->cp15.mdscr_el1, 13, 1)
&& !(env->daif & PSTATE_D);
}

And if I look at the objdump it is indeed the instruction we never
completed:

 a34:   350001f4cbnzw20, a70 
 a38:   d50348ffmsr daifclr, #0x8

So if I force the flag generation on manipulating daif:

--8<---cut here---start->8---
modified   target/arm/helper-a64.c
@@ -83,12 +83,14 @@ void HELPER(msr_i_daifset)(CPUARMState *env, uint32_t imm)
 {
 daif_check(env, 0x1e, imm, GETPC());
 env->daif |= (imm << 6) & PSTATE_DAIF;
+arm_rebuild_hflags(env);
 }
 
 void HELPER(msr_i_daifclear)(CPUARMState *env, uint32_t imm)
 {
 daif_check(env, 0x1f, imm, GETPC());
 env->daif &= ~((imm << 6) & PSTATE_DAIF);
+arm_rebuild_hflags(env);
 }
 
--8<---cut here---end--->8---

  I now get a working test:

  env QEMU=$HOME/lsrc/qemu.git/builds/all.debug/qemu-system-aarch64 
./run_tests.sh -g debug
  PASS debug-bp (6 tests)
  PASS debug-bp-migration (7 tests)
  PASS debug-wp (8 tests)
  PASS debug-wp-migration (9 tests)
  PASS debug-sstep (1 tests)
  PASS debug-sstep-migration (1 tests)

(I was momentarily confused when debug-sstep failed, but that was because
I'd forgotten to point to my build; the system 5.2 qemu is broken in this
regard).

I'll spin up a proper patch.

Side note:

  ad5fb8830150071487025b3594a7b1bf218d12d8 is the first bad commit
  commit ad5fb8830150071487025b3594a7b1bf218d12d8
  Author: Zixuan Wang 
  Date:   Mon Oct 4 13:49:19 2021 -0700

breaks running kvm-unit-tests for me, I needed to patch:

--8<---cut here---start->8---
modified   run_tests.sh
@@ -31,7 +31,8 @@ specify the appropriate qemu binary for ARCH-run.
 EOF
 }
 
-RUNTIME_arch_run="./$TEST_SUBDIR/run"
+RUNTIME_arch_run="./$TEST_DIR/run"
+#RUNTIME_arch_run="./$TEST_SUBDIR/run"
 source scripts/runtime.bash
 
 # require enhanced getopt
--8<---cut here---end--->8---


-- 
Alex Bennée



Re: [RFC PATCH] arm: force flag recalculation when messing with DAIF

2022-02-02 Thread Andrew Jones
On Wed, Feb 02, 2022 at 12:23:53PM +, Alex Bennée wrote:
> The recently introduced debug tests in kvm-unit-tests exposed an error
> in our handling of singlestep caused by stale hflags. This is caught by
> --enable-debug-tcg when running the tests.
> 
> Signed-off-by: Alex Bennée 
> Cc: Richard Henderson 
> Cc: Andrew Jones 

s/Cc: Andrew/Reported-by: Andrew/

and now also

Tested-by: Andrew Jones 

Thanks,
drew

> ---
>  target/arm/helper-a64.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
> index d6a6fd73d9..7cf953b1e6 100644
> --- a/target/arm/helper-a64.c
> +++ b/target/arm/helper-a64.c
> @@ -83,12 +83,14 @@ void HELPER(msr_i_daifset)(CPUARMState *env, uint32_t imm)
>  {
>  daif_check(env, 0x1e, imm, GETPC());
>  env->daif |= (imm << 6) & PSTATE_DAIF;
> +arm_rebuild_hflags(env);
>  }
>  
>  void HELPER(msr_i_daifclear)(CPUARMState *env, uint32_t imm)
>  {
>  daif_check(env, 0x1f, imm, GETPC());
>  env->daif &= ~((imm << 6) & PSTATE_DAIF);
> +arm_rebuild_hflags(env);
>  }
>  
>  /* Convert a softfloat float_relation_ (as returned by
> -- 
> 2.30.2
> 




Re: [PATCH v2] hw/rx: rx-gdbsim DTB load address aligned of 16byte.

2022-02-02 Thread Philippe Mathieu-Daudé via

On 2/2/22 11:30, Yoshinori Sato wrote:

The Linux kernel requires an aligned DTB address,
but the alignment was missing in the dtb load function.
Fixed to load to the correct address.

v2 changes.
Use ROUND_DOWN macro.

Signed-off-by: Yoshinori Sato 
Reviewed-by: Philippe Mathieu-Daudé 
---
  hw/rx/rx-gdbsim.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/rx/rx-gdbsim.c b/hw/rx/rx-gdbsim.c
index 75d1fec6ca..2356af83a0 100644
--- a/hw/rx/rx-gdbsim.c
+++ b/hw/rx/rx-gdbsim.c
@@ -142,7 +142,7 @@ static void rx_gdbsim_init(MachineState *machine)
  exit(1);
  }
  /* DTB is located at the end of SDRAM space. */
-dtb_offset = machine->ram_size - dtb_size;
+dtb_offset = ROUND_DOWN(machine->ram_size - dtb_size, 16 - 1);


Why did you add '-1'?



Re: [PATCH v2 3/3] migration: Perform vmsd structure check during tests

2022-02-02 Thread Juan Quintela
"Dr. David Alan Gilbert"  wrote:
> * Juan Quintela (quint...@redhat.com) wrote:
>> "Dr. David Alan Gilbert"  wrote:
>> > * Juan Quintela (quint...@redhat.com) wrote:
>> >> "Dr. David Alan Gilbert (git)"  wrote:
>> >> > From: "Dr. David Alan Gilbert" 
>> >> >
>> >> > Perform a check on vmsd structures during test runs in the hope
>> >> > of catching any missing terminators and other simple screwups.
>> >> >
>> >> > Signed-off-by: Dr. David Alan Gilbert 
>> >> 
>> >> Reviewed-by: Juan Quintela 
>> >> 
>> >> queued.
>> >
>> > Careful; I think that'll break with slirp until libslirp gets updated
>> > first.
>> 
>> As expected, it broke it.
>> 
>> I resent the PULL request without those two patches.
>> 
>> While we are here, how is it that make check didn't catch this?
>
> Because in my local world I did the changes to libslirp; I wanted to
> make sure qemu people were happy with the changes before proposing them
> to libslirp.
>
> Which I've just done:
>
> https://gitlab.freedesktop.org/slirp/libslirp/-/merge_requests/112

I mean make check.

It worked for me on my PULL request.  I would have assumed that it
checks slirp.

Later, Juan.




Re: [PATCH v2 3/4] virtio-iommu: Support bypass domain

2022-02-02 Thread Eric Auger
Hi Dave,

On 1/31/22 2:07 PM, Dr. David Alan Gilbert wrote:
> * Eric Auger (eric.au...@redhat.com) wrote:
>> Hi Jean,
>>
>> On 1/27/22 3:29 PM, Jean-Philippe Brucker wrote:
>>> The driver can create a bypass domain by passing the
>>> VIRTIO_IOMMU_ATTACH_F_BYPASS flag on the ATTACH request. Bypass domains
>>> perform slightly better than domains with identity mappings since they
>>> skip translation.
>>>
>>> Signed-off-by: Jean-Philippe Brucker 
>>> ---
>>>  hw/virtio/virtio-iommu.c | 32 ++--
>>>  1 file changed, 30 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>>> index ec02029bb6..a112428c65 100644
>>> --- a/hw/virtio/virtio-iommu.c
>>> +++ b/hw/virtio/virtio-iommu.c
>>> @@ -43,6 +43,7 @@
>>>  
>>>  typedef struct VirtIOIOMMUDomain {
>>>  uint32_t id;
>>> +bool bypass;
>> I am afraid this will break the migration if you don't change
>> vmstate_domain.
>>
>> See static const VMStateDescription vmstate_domain.
>> Also you need to migrate the new bypass field.
>>
>> Logically we should handle this with a vmstate subsection I think to
>> handle migration of older devices. However I doubt the device has been
>> used in production environment supporting migration so my guess is we
>> may skip that burden and just add the missing field. Adding Juan, Dave &
>> Peter for advices.
> I'm not sure about users of this; if no one has used it then yeh; you
> could bump up the version_id to make it a bit clearer.

Thank you for your input. Yes to me it sounds OK to only bump the
version_id while adding the new field.

Eric
>
> Dave
>
>> Thanks
>>
>> Eric
>>
>>>  GTree *mappings;
>>>  QLIST_HEAD(, VirtIOIOMMUEndpoint) endpoint_list;
>>>  } VirtIOIOMMUDomain;
>>> @@ -258,12 +259,16 @@ static void virtio_iommu_put_endpoint(gpointer data)
>>>  }
>>>  
>>>  static VirtIOIOMMUDomain *virtio_iommu_get_domain(VirtIOIOMMU *s,
>>> -  uint32_t domain_id)
>>> +  uint32_t domain_id,
>>> +  bool bypass)
>>>  {
>>>  VirtIOIOMMUDomain *domain;
>>>  
>>>  domain = g_tree_lookup(s->domains, GUINT_TO_POINTER(domain_id));
>>>  if (domain) {
>>> +if (domain->bypass != bypass) {
>>> +return NULL;
>>> +}
>>>  return domain;
>>>  }
>>>  domain = g_malloc0(sizeof(*domain));
>>> @@ -271,6 +276,7 @@ static VirtIOIOMMUDomain 
>>> *virtio_iommu_get_domain(VirtIOIOMMU *s,
>>>  domain->mappings = g_tree_new_full((GCompareDataFunc)interval_cmp,
>>> NULL, (GDestroyNotify)g_free,
>>> (GDestroyNotify)g_free);
>>> +domain->bypass = bypass;
>>>  g_tree_insert(s->domains, GUINT_TO_POINTER(domain_id), domain);
>>>  QLIST_INIT(&domain->endpoint_list);
>>>  trace_virtio_iommu_get_domain(domain_id);
>>> @@ -334,11 +340,16 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
>>>  {
>>>  uint32_t domain_id = le32_to_cpu(req->domain);
>>>  uint32_t ep_id = le32_to_cpu(req->endpoint);
>>> +uint32_t flags = le32_to_cpu(req->flags);
>>>  VirtIOIOMMUDomain *domain;
>>>  VirtIOIOMMUEndpoint *ep;
>>>  
>>>  trace_virtio_iommu_attach(domain_id, ep_id);
>>>  
>>> +if (flags & ~VIRTIO_IOMMU_ATTACH_F_BYPASS) {
>>> +return VIRTIO_IOMMU_S_INVAL;
>>> +}
>>> +
>>>  ep = virtio_iommu_get_endpoint(s, ep_id);
>>>  if (!ep) {
>>>  return VIRTIO_IOMMU_S_NOENT;
>>> @@ -356,7 +367,12 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
>>>  }
>>>  }
>>>  
>>> -domain = virtio_iommu_get_domain(s, domain_id);
>>> +domain = virtio_iommu_get_domain(s, domain_id,
>>> + flags & VIRTIO_IOMMU_ATTACH_F_BYPASS);
>>> +if (!domain) {
>>> +/* Incompatible bypass flag */
>>> +return VIRTIO_IOMMU_S_INVAL;
>>> +}
>>>  QLIST_INSERT_HEAD(&domain->endpoint_list, ep, next);
>>>  
>>>  ep->domain = domain;
>>> @@ -419,6 +435,10 @@ static int virtio_iommu_map(VirtIOIOMMU *s,
>>>  return VIRTIO_IOMMU_S_NOENT;
>>>  }
>>>  
>>> +if (domain->bypass) {
>>> +return VIRTIO_IOMMU_S_INVAL;
>>> +}
>>> +
>>>  interval = g_malloc0(sizeof(*interval));
>>>  
>>>  interval->low = virt_start;
>>> @@ -464,6 +484,11 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
>>>  if (!domain) {
>>>  return VIRTIO_IOMMU_S_NOENT;
>>>  }
>>> +
>>> +if (domain->bypass) {
>>> +return VIRTIO_IOMMU_S_INVAL;
>>> +}
>>> +
>>>  interval.low = virt_start;
>>>  interval.high = virt_end;
>>>  
>>> @@ -780,6 +805,9 @@ static IOMMUTLBEntry 
>>> virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>>>  entry.perm = flag;
>>>  }
>>>  goto unlock;
>>> +} else if (ep->domain->bypass) {
>>> +entry.perm = flag;
>>> 

Re: [PATCH v3 1/1] virtio: fix the condition for iommu_platform not supported

2022-02-02 Thread Halil Pasic
On Wed, 2 Feb 2022 02:06:12 -0500
"Michael S. Tsirkin"  wrote:

[..]
> > In my opinion not forcing the guest to negotiate IOMMU_PLATFORM when  
> > ->get_dma_as() is not set is at least unfortunate. Please observe, that  
> > virtio-pci is not affected by this omission because for virtio-pci
> > devices ->get_dma_as != NULL always holds. And what is the deal for
> > devices that don't implement get_dma_as() (and don't need address
> > translation)? If iommu_platform=on is justified (no user error) then
> > the device does not have access to the entire guest memory. Which
> > means it more than likely needs cooperation from the guest (driver).
> > So detecting that the guest does not support IOMMU_PLATFORM and failing
> > gracefully via virtio_validate_features() is preferable to carrying on
> > in good faith and failing in ugly ways when the host attempts to access
> > guest memory to which it does not have access. If we assume user
> > error, that is, that the host can access at least all the memory it needs
> > to access to make that device work, then it is probably still a
> > good idea to fail the device and thus help the user correct his
> > error.
> > 
> > IMHO the best course of action is
> > diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
> > index 34f5a0a664..1d0eb16d1c 100644
> > --- a/hw/virtio/virtio-bus.c
> > +++ b/hw/virtio/virtio-bus.c
> > @@ -80,7 +80,6 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error 
> > **errp)
> >  
> >  vdev_has_iommu = virtio_host_has_feature(vdev, 
> > VIRTIO_F_IOMMU_PLATFORM);
> >  if (klass->get_dma_as != NULL && has_iommu) {
> > -virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
> >  vdev->dma_as = klass->get_dma_as(qbus->parent);
> >  if (!vdev_has_iommu && vdev->dma_as != &address_space_memory) {
> >  error_setg(errp,
> > @@ -89,6 +88,7 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error 
> > **errp)
> >  } else {
> >  vdev->dma_as = &address_space_memory;
> >  }
> > +virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
> >  }
> > 
> > which would be a separate patch, as this is a separate issue. Jason,
> > Michael, Connie, what do you think?  
> 
> Do you mean just force VIRTIO_F_IOMMU_PLATFORM for everyone?
> Or am I misreading the patch?

Yes. Where force means: prevent the driver from setting FEATURES_OK
if it cleared VIRTIO_F_IOMMU_PLATFORM. I really don't see the case
where the device offering but the driver not accepting
VIRTIO_F_IOMMU_PLATFORM is good and useful.

Regards,
Halil


> 
> 
> > Regards,
> > Halil  
> 
> 




Re: [PATCH v3 1/1] virtio: fix the condition for iommu_platform not supported

2022-02-02 Thread Daniel Henrique Barboza




On 2/1/22 22:15, Halil Pasic wrote:

On Tue, 1 Feb 2022 16:31:22 -0300
Daniel Henrique Barboza  wrote:


On 2/1/22 15:33, Halil Pasic wrote:

On Tue, 1 Feb 2022 12:36:25 -0300
Daniel Henrique Barboza  wrote:
   

+vdev_has_iommu = virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
if (klass->get_dma_as != NULL && has_iommu) {
virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
vdev->dma_as = klass->get_dma_as(qbus->parent);
+if (!vdev_has_iommu && vdev->dma_as != &address_space_memory) {
+error_setg(errp,
+   "iommu_platform=true is not supported by the device");
+}


  

} else {
vdev->dma_as = &address_space_memory;
}



I struggled to understand what this 'else' clause was doing and I assumed that 
it was
wrong. Searching through the ML I learned that this 'else' clause is intended 
to handle
legacy virtio devices that doesn't support the DMA API (introduced in 
8607f5c3072caeebb)
and thus shouldn't set  VIRTIO_F_IOMMU_PLATFORM.


My suggestion, if a v4 is required for any other reason, is to add a small 
comment in this
'else' clause explaining that this is the legacy virtio devices condition and 
those devices
don't set F_IOMMU_PLATFORM. This would make the code easier to read for a 
virtio casual like
myself.


I do not agree that this is about legacy virtio. In my understanding
virtio-ccw simply does not need translation because CCW devices use
guest physical addresses as per architecture. It may be considered
legacy stuff from a PCI perspective, but I don't think it is legacy
in general.



I wasn't talking about virtio-ccw. I was talking about this piece of code:


  if (klass->get_dma_as != NULL && has_iommu) {
  virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
  vdev->dma_as = klass->get_dma_as(qbus->parent);
  } else {
  vdev->dma_as = &address_space_memory;
  }


I suggested something like this:



  if (klass->get_dma_as != NULL && has_iommu) {
  virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
  vdev->dma_as = klass->get_dma_as(qbus->parent);
  } else {
  /*
   * We don't force VIRTIO_F_IOMMU_PLATFORM for legacy devices, i.e.
   * devices that don't implement klass->get_dma_as, regardless of
   * 'has_iommu' setting.
   */
  vdev->dma_as = &address_space_memory;
  }


At least from my reading of commits 8607f5c3072 and 2943b53f682 this seems to be
the case. I spent some time thinking that this IF/ELSE was wrong because I 
wasn't
aware of this history.


With virtio-ccw we take the else branch because we don't implement
->get_dma_as(). I don't consider all the virtio-ccw to be legacy.

IMHO there are two ways to think about this:
a) The commit that introduced this needs a fix which implements
get_dma_as() for virtio-ccw in a way that it simply returns
address_space_memory.
b) The presence of ->get_dma_as() is not indicative of "legacy".

BTW in virtiospeak "legacy" has a special meaning: pre-1.0 virtio. Do you
mean that legacy? And if I read the virtio-pci code correctly
->get_dma_as is set for legacy, transitional and modern devices alike.



Oh ok. I'm not well versed in virtiospeak. My "legacy" comment was a poor 
choice of
word for the situation.

We can ignore the "legacy" bit. My idea/suggestion is to put a comment at that 
point
explaining the logic behind into not forcing VIRTIO_F_IOMMU_PLATFORM in devices 
that
doesn't implement ->get_dma_as().

I am assuming that this is an intended design that was introduced by 2943b53f682
("virtio: force VIRTIO_F_IOMMU_PLATFORM"), meaning that the implementation of 
the
->get_dma_as is being used as a parameter to force the feature in the device. 
And with
this code:


if (klass->get_dma_as != NULL && has_iommu) {
virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
vdev->dma_as = klass->get_dma_as(qbus->parent);
} else {
vdev->dma_as = &address_space_memory;
}

It is possible that we have 2 vdev devices where ->dma_as = 
&address_space_memory, but one
of them is sitting in a bus where "klass->get_dma_as(qbus->parent) = 
&address_space_memory",
and this device will have VIRTIO_F_IOMMU_PLATFORM forced onto it and the former 
won't.


If this is not an intended design I can only speculate how to fix it. Forcing 
VIRTIO_F_IOMMU_PLATFORM
in all devices, based only on has_iommu, can break stuff. Setting 
VIRTIO_F_IOMMU_PLATFORM only
if "vdev->dma_as != &address_space_memory" make some sense but I am fairly 
certain it will
break stuff the other way. Or perhaps the fix is something else entirely.






IMHO the important thing to figure out is what impact that
virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
in the first branch (of the if-else) has. IMHO if one examines the
commits 8607f5c307 ("virtio: convert

Re: [PATCH v3 0/2] tests/9pfs: Fix leak and add some more g_auto* annotations

2022-02-02 Thread Christian Schoenebeck
On Dienstag, 1. Februar 2022 16:15:06 CET Greg Kurz wrote:
> This is the continuation of:
> 
> https://lore.kernel.org/qemu-devel/2022020137.732325b4@bahia/T/#t
> 
> v3: - fix leak in its own patch
> 
> Greg Kurz (2):
>   tests/9pfs: Fix leak of local_test_path
>   tests/9pfs: Use g_autofree and g_autoptr where possible
> 
>  tests/qtest/libqos/virtio-9p.c | 20 +++-
>  1 file changed, 11 insertions(+), 9 deletions(-)
> 
> -- 
> 2.34.1
> 

Queued on 9p.next:
https://github.com/cschoenebeck/qemu/commits/9p.next

Thanks!

Best regards,
Christian Schoenebeck





Re: [PATCH 10/20] tcg/i386: Implement avx512 immediate sari shift

2022-02-02 Thread Alex Bennée


Richard Henderson  writes:

> AVX512 has VPSRAQ with immediate operand, in the same form as
> with AVX, but requires EVEX encoding and W1.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée



Re: [PATCH 09/20] tcg/i386: Implement avx512 scalar shift

2022-02-02 Thread Alex Bennée


Richard Henderson  writes:

> AVX512VL has VPSRAQ.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée



Re: [PATCH v2 3/3] migration: Perform vmsd structure check during tests

2022-02-02 Thread Peter Maydell
On Wed, 2 Feb 2022 at 11:32, Dr. David Alan Gilbert  wrote:
> Because in my local world I did the changes to libslirp; I wanted to
> make sure qemu people were happy with the changes before proposing them
> to libslirp.
>
> Which I've just done:
>
> https://gitlab.freedesktop.org/slirp/libslirp/-/merge_requests/112

Does QEMU's own vmstate handling code see the libslirp vmstate
structures? Looking at the code it seems to me like QEMU's
migration code only interacts with slirp via the
slirp_state_save() and slirp_state_load() functions.
Internally those work with some use of a vmstate structure,
but the code that iterates over field arrays in those is
all inside slirp itself (in src/slirp/vmstate.c if you're
looking at the in-tree copy).

So maybe I'm missing something but I'm not sure there really
is a dependency on the libslirp change here...

-- PMM



Re: [PULL 18/20] block/nbd: drop connection_co

2022-02-02 Thread Eric Blake
On Wed, Feb 02, 2022 at 12:49:36PM +0100, Fabian Ebner wrote:
> Am 27.09.21 um 23:55 schrieb Eric Blake:
> > From: Vladimir Sementsov-Ogievskiy 
> > 
> > OK, that's a big rewrite of the logic.
> > 
> > Pre-patch we have an always running coroutine - connection_co. It does
> > reply receiving and reconnecting. And it leads to a lot of difficult
> > and unobvious code around drained sections and context switch. We also
> > abuse bs->in_flight counter which is increased for connection_co and
> > temporary decreased in points where we want to allow drained section to
> > begin. One of these place is in another file: in nbd_read_eof() in
> > nbd/client.c.
> > 
> > We also cancel reconnect and requests waiting for reconnect on drained
> > begin which is not correct. And this patch fixes that.
> > 
> > Let's finally drop this always running coroutine and go another way:
> > do both reconnect and receiving in request coroutines.
> >
> 
> Hi,
> 
> while updating our stack to 6.2, one of our live-migration tests stopped
> working (backtrace is below) and bisecting led me to this patch.
> 
> The VM has a single qcow2 disk (converting to raw doesn't make a
> difference) and the issue only appears when using iothread (for both
> virtio-scsi-pci and virtio-block-pci).
> 
> Reverting 1af7737871fb3b66036f5e520acb0a98fc2605f7 (which lives on top)
> and 4ddb5d2fde6f22b2cf65f314107e890a7ca14fcf (the commit corresponding
> to this patch) in v6.2.0 makes the migration work again.
> 
> Backtrace:
> 
> Thread 1 (Thread 0x7f9d93458fc0 (LWP 56711) "kvm"):
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x7f9d9d6bc537 in __GI_abort () at abort.c:79
> #2  0x7f9d9d6bc40f in __assert_fail_base (fmt=0x7f9d9d825128
> "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5579153763f8
> "qemu_get_current_aio_context() == qemu_coroutine_get_aio_context(co)",
> file=0x5579153764f9 "../io/channel.c", line=483, function=<optimized out>) at assert.c:92

Given that this assertion is about which aio context is set, I wonder
if the conversation at
https://lists.gnu.org/archive/html/qemu-devel/2022-02/msg00096.html is
relevant; if so, Vladimir may already be working on the patch.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




[PATCH v5 00/43] CXL 2.0 emulation Support

2022-02-02 Thread Jonathan Cameron via
Changes since v4:
https://lore.kernel.org/linux-cxl/20220124171705.10432-1-jonathan.came...@huawei.com/

Note documentation patch that Alex requested to follow.
I don't want to delay getting this out as Alex mentioned possibly
having time to continue reviewing in latter part of this week.

Issues identified by CI / Alex Bennée
- Stubs added for hw/cxl/cxl-host and hw/acpi/cxl plus related meson
  changes to use them as necessary.
- Drop uid from cxl-test (result of last minute change in v4 that was not
  carried through to the test)
- Fix naming clash with field name ERROR which on some arches is defined
  and results in the string being replaced with 0 in some of the
  register field related defines.  Call it ERR instead.
- Fix type issue around mr->size by using 64 bit accessor functions.
- Add a new patch to exclude pxb-cxl from device-crash-test in similar
  fashion to pxb.

CI tests now passing with exception of checkpatch which has what
I think is a false positive and build-oss-fuzz which keeps timing out.
https://gitlab.com/jic23/qemu/-/pipelines/460109208
There were a few tweaks to patch descriptions after I pushed that
out (I missed a few RB from Alex).

Other changes (mostly from Alex's review)
- Change component register handling to now report UNIMP and return 0
  for 8 byte registers as we currently don't implement any of them.
  Note that this means we need a kernel fix:
  
https://lore.kernel.org/linux-cxl/20220201153437.2873-1-jonathan.came...@huawei.com/
- Drop majority of the macros used in defining mailbox handlers in
  favour of written out code.
- Use REG64 where appropriate. This was introduced whilst this set
  has been under development so I missed it.
- Clarify some register access options wrt to CXL 2.0 Errata F4.
- Change timestamp to qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL)
- Use typed enums to enforce types of function arguments.
- Default to cxl being off in machine_class_init() removing
  need to set it to off in machines where there is no support as yet.
- Add Alex's RB where given.

Looking in particular for:
* Review of the PCI interactions
* x86 and ARM machine interactions (particularly the memory maps)
* Review of the interleaving approach - is the basic idea
  acceptable?
* Review of the command line interface.
* CXL related review welcome but much of that got reviewed
  in earlier versions and hasn't changed substantially.

Big TODOs:

* Interleave boundary issues. I haven't yet solved this but didn't
  want to further delay the review of the rest of the series.

* Volatile memory devices (easy but it's more code so left for now).
* Switch support. Linux kernel support is under review currently,
  so there is now something to test against.
* Hotplug?  May not need much but it's not tested yet!
* More tests and tighter verification that values written to hardware
  are actually valid - stuff that real hardware would check.
* Testing, testing and more testing.  I have been running a basic
  set of ARM and x86 tests on this, but there is always room for
  more tests and greater automation.
* CFMWS flags as requested by Ben.

Why do we want QEMU emulation of CXL?

As Ben stated in V3, QEMU support has been critical to getting OS
software written given lack of availability of hardware supporting the
latest CXL features (coupled with very high demand for support being
ready in a timely fashion). What has become clear since Ben's v3
is that the situation is an ongoing one. Whilst we can't talk about
them yet, CXL 3.0 features and OS support have been prototyped on
top of this support and a lot of the ongoing kernel work is being
tested against these patches. The kernel CXL mocking code allows
some forms of testing, but QEMU provides a more versatile and
extensible platform.

Other features on the qemu-list that build on these include PCI-DOE
/CDAT support from the Avery Design team further showing how this
code is useful. Whilst not directly related this is also the test
platform for work on PCI IDE/CMA + related DMTF SPDM as CXL both
utilizes and extends those technologies and is likely to be an early
adopter.
Refs:
CMA Kernel: 
https://lore.kernel.org/all/20210804161839.3492053-1-jonathan.came...@huawei.com/
CMA Qemu: 
https://lore.kernel.org/qemu-devel/1624665723-5169-1-git-send-email-cbr...@avery-design.com/
DOE Qemu: 
https://lore.kernel.org/qemu-devel/162332-15662-1-git-send-email-cbr...@avery-design.com/

As can be seen there is non-trivial interaction with other areas of
Qemu, particularly PCI and keeping this set up to date is proving
a burden we'd rather do without :)

Ben mentioned a few other good reasons in v3:
https://lore.kernel.org/qemu-devel/20210202005948.241655-1-ben.widaw...@intel.com/

The evolution of this series perhaps leaves it in a less than
entirely obvious order and that may get tidied up in future postings.
I'm also open to this being considered in bite sized chunks.  What
we have here is about what you need for it to be useful for testing
currently ke

[PATCH v5 01/43] hw/pci/cxl: Add a CXL component type (interface)

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

A CXL component is a hardware entity that implements CXL component
registers from the CXL 2.0 spec (8.2.3). Currently these represent 3
general types.
1. Host Bridge
2. Ports (root, upstream, downstream)
3. Devices (memory, other)

A CXL component can be conceptually thought of as a PCIe device with
extra functionality when enumerated and enabled. For this reason, CXL
does here, and will continue to add on to existing PCI code paths.

Host bridges will typically need to be handled specially and so they can
implement this newly introduced interface or not. All other components
should implement this interface. Implementing this interface allows the
core PCI code to treat these devices as special where appropriate.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/pci/pci.c | 10 ++
 include/hw/pci/pci.h |  8 
 2 files changed, 18 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 5d30f9ca60..474ea98c1d 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -201,6 +201,11 @@ static const TypeInfo pci_bus_info = {
 .class_init = pci_bus_class_init,
 };
 
+static const TypeInfo cxl_interface_info = {
+.name  = INTERFACE_CXL_DEVICE,
+.parent= TYPE_INTERFACE,
+};
+
 static const TypeInfo pcie_interface_info = {
 .name  = INTERFACE_PCIE_DEVICE,
 .parent= TYPE_INTERFACE,
@@ -2128,6 +2133,10 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
 }
 
+if (object_class_dynamic_cast(klass, INTERFACE_CXL_DEVICE)) {
+pci_dev->cap_present |= QEMU_PCIE_CAP_CXL;
+}
+
 pci_dev = do_pci_register_device(pci_dev,
  object_get_typename(OBJECT(qdev)),
  pci_dev->devfn, errp);
@@ -2884,6 +2893,7 @@ static void pci_register_types(void)
 type_register_static(&pci_bus_info);
 type_register_static(&pcie_bus_info);
 type_register_static(&conventional_pci_interface_info);
+type_register_static(&cxl_interface_info);
 type_register_static(&pcie_interface_info);
 type_register_static(&pci_device_type_info);
 }
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 023abc0f79..908896ebe8 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -195,6 +195,8 @@ enum {
 QEMU_PCIE_LNKSTA_DLLLA = (1 << QEMU_PCIE_LNKSTA_DLLLA_BITNR),
 #define QEMU_PCIE_EXTCAP_INIT_BITNR 9
 QEMU_PCIE_EXTCAP_INIT = (1 << QEMU_PCIE_EXTCAP_INIT_BITNR),
+#define QEMU_PCIE_CXL_BITNR 10
+QEMU_PCIE_CAP_CXL = (1 << QEMU_PCIE_CXL_BITNR),
 };
 
 #define TYPE_PCI_DEVICE "pci-device"
@@ -202,6 +204,12 @@ typedef struct PCIDeviceClass PCIDeviceClass;
 DECLARE_OBJ_CHECKERS(PCIDevice, PCIDeviceClass,
  PCI_DEVICE, TYPE_PCI_DEVICE)
 
+/*
+ * Implemented by devices that can be plugged on CXL buses. In the spec, this is
+ * actually a "CXL Component", but we name it device to match the PCI naming.
+ */
+#define INTERFACE_CXL_DEVICE "cxl-device"
+
 /* Implemented by devices that can be plugged on PCI Express buses */
 #define INTERFACE_PCIE_DEVICE "pci-express-device"
 
-- 
2.32.0




[PATCH v5 04/43] hw/cxl/device: Introduce a CXL device (8.2.8)

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

A CXL device is a type of CXL component. Conceptually, a CXL device
would be a leaf node in a CXL topology. From an emulation perspective,
CXL devices are the most complex and so the actual implementation is
reserved for discrete commits.

This new device type is specifically catered towards the eventual
implementation of a Type3 CXL.mem device, 8.2.8.5 in the CXL 2.0
specification.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
v5:
 Include the impacts of the published CXL 2.0 Errata F4 which clarified
 access permissions.
 - Documentation updates.
 - The 48 bit registers are gone.
 
 include/hw/cxl/cxl.h|   1 +
 include/hw/cxl/cxl_device.h | 165 
 2 files changed, 166 insertions(+)
 create mode 100644 include/hw/cxl/cxl_device.h

diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
index 8c738c7a2b..b9d1ac3fad 100644
--- a/include/hw/cxl/cxl.h
+++ b/include/hw/cxl/cxl.h
@@ -12,5 +12,6 @@
 
 #include "cxl_pci.h"
 #include "cxl_component.h"
+#include "cxl_device.h"
 
 #endif
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
new file mode 100644
index 00..b2416e45bf
--- /dev/null
+++ b/include/hw/cxl/cxl_device.h
@@ -0,0 +1,165 @@
+/*
+ * QEMU CXL Devices
+ *
+ * Copyright (c) 2020 Intel
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ */
+
+#ifndef CXL_DEVICE_H
+#define CXL_DEVICE_H
+
+#include "hw/register.h"
+
+/*
+ * The following is how a CXL device's MMIO space is laid out. The only
+ * requirement from the spec is that the capabilities array and the capability
+ * headers start at offset 0 and are contiguously packed. The headers themselves
+ * provide offsets to the register fields. For this emulation, registers will
+ * start at offset 0x80 (m == 0x80). No secondary mailbox is implemented which
+ * means that n = m + sizeof(mailbox registers) + sizeof(device registers).
+ *
+ * This is roughly described in 8.2.8 Figure 138 of the CXL 2.0 spec.
+ *
+ *   +-+
+ *   | |
+ *   |Memory Device Registers  |
+ *   | |
+ * n + PAYLOAD_SIZE_MAX  ---
+ *  ^| |
+ *  || |
+ *  || |
+ *  || |
+ *  || |
+ *  || Mailbox Payload |
+ *  || |
+ *  || |
+ *  || |
+ *  |---
+ *  ||   Mailbox Registers |
+ *  || |
+ *  n---
+ *  ^| |
+ *  ||Device Registers |
+ *  || |
+ *  m-->
+ *  ^|  Memory Device Capability Header|
+ *  |---
+ *  || Mailbox Capability Header   |
+ *  |-- 
+ *  || Device Capability Header|
+ *  |---
+ *  || |
+ *  || |
+ *  ||  Device Cap Array[0..n] |
+ *  || |
+ *  || |
+ *   | |
+ *  0+-+
+ *
+ */
+
+#define CXL_DEVICE_CAP_HDR1_OFFSET 0x10 /* Figure 138 */
+#define CXL_DEVICE_CAP_REG_SIZE 0x10 /* 8.2.8.2 */
+#define CXL_DEVICE_CAPS_MAX 4 /* 8.2.8.2.1 + 8.2.8.5 */
+
+#define CXL_DEVICE_REGISTERS_OFFSET 0x80 /* Read comment above */
+#define CXL_DEVICE_REGISTERS_LENGTH 0x8 /* 8.2.8.3.1 */
+
+#define CXL_MAILBOX_REGISTERS_OFFSET \
+(CXL_DEVICE_REGISTERS_OFFSET + CXL_DEVICE_REGISTERS_LENGTH)
+#define CXL_MAILBOX_REGISTERS_SIZE 0x20 /* 8.2.8.4, Figure 139 */
+#define CXL_MAILBOX_PAYLOAD_SHIFT 11
+#define CXL_MAILBOX_MAX_PAYLOAD_SIZE (1 << CXL_MAILBOX_PAYLOAD_SHIFT)
+#define CXL_MAILBOX_REGISTERS_LENGTH \
+(CXL_MAILBOX_REGISTERS_SIZE + CXL_MAILBOX_MAX_PAYLOAD_SIZE)
+
+typedef struct cxl_devi

[PATCH v5 03/43] MAINTAINERS: Add entry for Compute Express Link Emulation

2022-02-02 Thread Jonathan Cameron via
From: Jonathan Cameron 

The CXL emulation will be jointly maintained by Ben Widawsky
and Jonathan Cameron.  Broken out as a separate patch
to improve visibility.

Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index b43344fa98..930f04c6c2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2524,6 +2524,13 @@ F: qapi/block*.json
 F: qapi/transaction.json
 T: git https://repo.or.cz/qemu/armbru.git block-next
 
+Compute Express Link
+M: Ben Widawsky 
+M: Jonathan Cameron 
+S: Supported
+F: hw/cxl/
+F: include/hw/cxl/
+
 Dirty Bitmaps
 M: Eric Blake 
 M: Vladimir Sementsov-Ogievskiy 
-- 
2.32.0




Re: [PATCH 12/20] tcg/i386: Implement avx512 variable rotate

2022-02-02 Thread Alex Bennée


Richard Henderson  writes:

> AVX512VL has VPROLVQ and VPRORVQ.
>
> Signed-off-by: Richard Henderson 

I could make the same comment as on the previous patch about the goto
gen_simd stuff. Anyway:

Reviewed-by: Alex Bennée 

-- 
Alex Bennée



Re: [PATCH 11/20] tcg/i386: Implement avx512 immediate rotate

2022-02-02 Thread Alex Bennée


Richard Henderson  writes:

> AVX512VL has VPROLD and VPROLQ, layered onto the same
> opcode as PSHIFTD, but requires EVEX encoding and W.
>
> Signed-off-by: Richard Henderson 
> ---
>  tcg/i386/tcg-target.h |  2 +-
>  tcg/i386/tcg-target.c.inc | 15 +--
>  2 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
> index 12d098ad6c..38c09fd66c 100644
> --- a/tcg/i386/tcg-target.h
> +++ b/tcg/i386/tcg-target.h
> @@ -195,7 +195,7 @@ extern bool have_movbe;
>  #define TCG_TARGET_HAS_not_vec  0
>  #define TCG_TARGET_HAS_neg_vec  0
>  #define TCG_TARGET_HAS_abs_vec  1
> -#define TCG_TARGET_HAS_roti_vec 0
> +#define TCG_TARGET_HAS_roti_vec have_avx512vl
>  #define TCG_TARGET_HAS_rots_vec 0
>  #define TCG_TARGET_HAS_rotv_vec 0
>  #define TCG_TARGET_HAS_shi_vec  1
> diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
> index c4e6f2e5ea..5ab7c4c0fa 100644
> --- a/tcg/i386/tcg-target.c.inc
> +++ b/tcg/i386/tcg-target.c.inc
> @@ -361,7 +361,7 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
>  #define OPC_PSHUFLW (0x70 | P_EXT | P_SIMDF2)
>  #define OPC_PSHUFHW (0x70 | P_EXT | P_SIMDF3)
>  #define OPC_PSHIFTW_Ib  (0x71 | P_EXT | P_DATA16) /* /2 /6 /4 */
> -#define OPC_PSHIFTD_Ib  (0x72 | P_EXT | P_DATA16) /* /2 /6 /4 */
> +#define OPC_PSHIFTD_Ib  (0x72 | P_EXT | P_DATA16) /* /1 /2 /6 /4 */
>  #define OPC_PSHIFTQ_Ib  (0x73 | P_EXT | P_DATA16) /* /2 /6 /4 */
>  #define OPC_PSLLW   (0xf1 | P_EXT | P_DATA16)
>  #define OPC_PSLLD   (0xf2 | P_EXT | P_DATA16)
> @@ -2906,6 +2906,14 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
>  insn |= P_VEXW | P_EVEX;
>  }
>  sub = 4;
> +goto gen_shift;
> +case INDEX_op_rotli_vec:
> +insn = OPC_PSHIFTD_Ib | P_EVEX;  /* VPROL[DQ] */
> +if (vece == MO_64) {
> +insn |= P_VEXW;
> +}
> +sub = 1;
> +goto gen_shift;

This could just be a /* fall-through */ although given the large number
of gotos the switch statement is gathering I'm not sure it makes too
much difference.

Is there any reason why gen_shift couldn't be pushed into a helper
function so we just had:

static void tcg_out_vec_shift(s, vece, insn, sub, a0, a1, a2) {
tcg_debug_assert(vece != MO_8);
if (type == TCG_TYPE_V256) {
insn |= P_VEXL;
}
tcg_out_vex_modrm(s, insn, sub, a0, a1);
tcg_out8(s, a2);
}

...

case INDEX_op_rotli_vec:
insn = OPC_PSHIFTD_Ib | P_EVEX;  /* VPROL[DQ] */
if (vece == MO_64) {
insn |= P_VEXW;
}
tcg_out_vec_shift(s, vece, insn, 1, a0, a1, a2);
break;

Surely the compiler would inline if needed (and even if it didn't, is
the code generation that critical that we care about a few cycles)?


-- 
Alex Bennée



[PATCH v5 02/43] hw/cxl/component: Introduce CXL components (8.1.x, 8.2.5)

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

A CXL 2.0 component is any entity in the CXL topology. All components
have an analogous function in PCIe. Except for the CXL host bridge, all
have a PCIe config space that is accessible via the common PCIe
mechanisms. CXL components are enumerated via DVSEC fields in the
extended PCIe header space. CXL components will minimally implement some
subset of CXL.mem and CXL.cache registers defined in 8.2.5 of the CXL
2.0 specification. Two headers and a utility library are introduced to
support the minimum functionality needed to enumerate components.

The cxl_pci header manages bits associated with PCI, specifically the
DVSEC and related fields. The cxl_component.h variant has data
structures and APIs that are useful for drivers implementing any of the
CXL 2.0 components. The library takes care of making use of the DVSEC
bits and the CXL.[mem|cache] registers. Per spec, the registers are
little endian.

None of the mechanisms required to enumerate a CXL capable hostbridge
are introduced at this point.

Note that the CXL.mem and CXL.cache registers used are always 4B wide.
It's possible in the future that this constraint will not hold.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
v5:
Alex pointed out the odd handling of 8 byte accesses.
That led to discovering a kernel bug around access to
the Cap Array Header, for which a fix is now on the linux-...@vger.kernel.org
list.

We don't currently implement any of the 8 byte registers, so for
now this logs UNIMP and reads 0, writes ignored.

hw/Kconfig |   1 +
 hw/cxl/Kconfig |   3 +
 hw/cxl/cxl-component-utils.c   | 219 +
 hw/cxl/meson.build |   4 +
 hw/meson.build |   1 +
 include/hw/cxl/cxl.h   |  16 +++
 include/hw/cxl/cxl_component.h | 196 +
 include/hw/cxl/cxl_pci.h   | 138 +
 8 files changed, 578 insertions(+)
 create mode 100644 hw/cxl/Kconfig
 create mode 100644 hw/cxl/cxl-component-utils.c
 create mode 100644 hw/cxl/meson.build
 create mode 100644 include/hw/cxl/cxl.h
 create mode 100644 include/hw/cxl/cxl_component.h
 create mode 100644 include/hw/cxl/cxl_pci.h

diff --git a/hw/Kconfig b/hw/Kconfig
index ad20cce0a9..50e0952889 100644
--- a/hw/Kconfig
+++ b/hw/Kconfig
@@ -6,6 +6,7 @@ source audio/Kconfig
 source block/Kconfig
 source char/Kconfig
 source core/Kconfig
+source cxl/Kconfig
 source display/Kconfig
 source dma/Kconfig
 source gpio/Kconfig
diff --git a/hw/cxl/Kconfig b/hw/cxl/Kconfig
new file mode 100644
index 00..8e67519b16
--- /dev/null
+++ b/hw/cxl/Kconfig
@@ -0,0 +1,3 @@
+config CXL
+bool
+default y if PCI_EXPRESS
diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
new file mode 100644
index 00..07297b3bbe
--- /dev/null
+++ b/hw/cxl/cxl-component-utils.c
@@ -0,0 +1,219 @@
+/*
+ * CXL Utility library for components
+ *
+ * Copyright(C) 2020 Intel Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/pci/pci.h"
+#include "hw/cxl/cxl.h"
+
+static uint64_t cxl_cache_mem_read_reg(void *opaque, hwaddr offset,
+   unsigned size)
+{
+CXLComponentState *cxl_cstate = opaque;
+ComponentRegisters *cregs = &cxl_cstate->crb;
+
+if (size == 8) {
+qemu_log_mask(LOG_UNIMP,
+  "CXL 8 byte cache mem registers not implemented\n");
+return 0;
+}
+
+if (cregs->special_ops && cregs->special_ops->read) {
+return cregs->special_ops->read(cxl_cstate, offset, size);
+} else {
+return cregs->cache_mem_registers[offset / 4];
+}
+}
+
+static void cxl_cache_mem_write_reg(void *opaque, hwaddr offset, uint64_t value,
+unsigned size)
+{
+CXLComponentState *cxl_cstate = opaque;
+ComponentRegisters *cregs = &cxl_cstate->crb;
+
+if (size == 8) {
+qemu_log_mask(LOG_UNIMP,
+  "CXL 8 byte cache mem registers not implemented\n");
+return;
+}
+if (cregs->special_ops && cregs->special_ops->write) {
+cregs->special_ops->write(cxl_cstate, offset, value, size);
+} else {
+cregs->cache_mem_registers[offset / 4] = value;
+}
+}
+
+/*
+ * 8.2.3
+ *   The access restrictions specified in Section 8.2.2 also apply to CXL 2.0
+ *   Component Registers.
+ *
+ * 8.2.2
+ *   • A 32 bit register shall be accessed as a 4 Bytes quantity. Partial
+ *   reads are not permitted.
+ *   • A 64 bit register shall be accessed as a 8 Bytes quantity. Partial
+ *   reads are not permitted.
+ *
+ * As of the spec defined today, only 4 byte registers exist.
+ */
+static const MemoryRegionOps cache_mem_ops = {
+.read = cxl_cache_mem_read_reg,
+.write = cxl_cache_mem

[PATCH v5 18/43] hw/cxl/component: Implement host bridge MMIO (8.2.5, table 142)

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

CXL host bridges themselves may have MMIO. Since host bridges don't have
a BAR they are treated as special for MMIO.  This patch includes
i386/pc support.

Signed-off-by: Ben Widawsky 
Co-developed-by: Jonathan Cameron 
Signed-off-by: Jonathan Cameron 
---
 hw/i386/acpi-build.c| 26 +++---
 hw/i386/pc.c| 27 ++-
 hw/pci-bridge/pci_expander_bridge.c | 53 -
 include/hw/cxl/cxl.h|  4 +++
 4 files changed, 104 insertions(+), 6 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 09940f6e84..1e1e9b9d38 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -28,6 +28,7 @@
 #include "qemu/bitmap.h"
 #include "qemu/error-report.h"
 #include "hw/pci/pci.h"
+#include "hw/cxl/cxl.h"
 #include "hw/core/cpu.h"
 #include "target/i386/cpu.h"
 #include "hw/misc/pvpanic.h"
@@ -1398,7 +1399,7 @@ static void build_smb0(Aml *table, I2CBus *smbus, int devnr, int func)
 aml_append(table, scope);
 }
 
-typedef enum { PCI, PCIE } PCIBusType;
+typedef enum { PCI, PCIE, CXL } PCIBusType;
 static void init_pci_acpi(Aml *dev, int uid, PCIBusType type,
   bool native_pcie_hp)
 {
@@ -1562,22 +1563,30 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 QLIST_FOREACH(bus, &bus->child, sibling) {
 uint8_t bus_num = pci_bus_num(bus);
 uint8_t numa_node = pci_bus_numa_node(bus);
+int32_t uid = bus_num; /* TODO: Explicit uid */
+int type;
 
 /* look only for expander root buses */
 if (!pci_bus_is_root(bus)) {
 continue;
 }
 
+type = pci_bus_is_cxl(bus) ? CXL :
+ pci_bus_is_express(bus) ? PCIE : PCI;
+
 if (bus_num < root_bus_limit) {
 root_bus_limit = bus_num - 1;
 }
 
 scope = aml_scope("\\_SB");
-dev = aml_device("PC%.02X", bus_num);
+if (type == CXL) {
+dev = aml_device("CL%.02X", uid);
+} else {
+dev = aml_device("PC%.02X", bus_num);
+}
 aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
 
-init_pci_acpi(dev, bus_num,
-  pci_bus_is_express(bus) ? PCIE : PCI, true);
+init_pci_acpi(dev, uid, type, true);
 
 if (numa_node != NUMA_NODE_UNASSIGNED) {
 aml_append(dev, aml_name_decl("_PXM", aml_int(numa_node)));
@@ -1589,6 +1598,15 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 aml_append(dev, aml_name_decl("_CRS", crs));
 aml_append(scope, dev);
 aml_append(dsdt, scope);
+
+/* Handle the ranges for the PXB expanders */
+if (type == CXL) {
+MemoryRegion *mr = &machine->cxl_devices_state->host_mr;
+uint64_t base = mr->addr;
+
+crs_range_insert(crs_range_set.mem_ranges, base,
+ base + memory_region_size(mr) - 1);
+}
 }
 }
 
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index b6800a511a..7a18dce529 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -75,6 +75,7 @@
 #include "acpi-build.h"
 #include "hw/mem/pc-dimm.h"
 #include "hw/mem/nvdimm.h"
+#include "hw/cxl/cxl.h"
 #include "qapi/error.h"
 #include "qapi/qapi-visit-common.h"
 #include "qapi/qapi-visit-machine.h"
@@ -815,6 +816,7 @@ void pc_memory_init(PCMachineState *pcms,
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 X86MachineState *x86ms = X86_MACHINE(pcms);
+hwaddr cxl_base;
 
 assert(machine->ram_size == x86ms->below_4g_mem_size +
 x86ms->above_4g_mem_size);
@@ -904,6 +906,26 @@ void pc_memory_init(PCMachineState *pcms,
 &machine->device_memory->mr);
 }
 
+if (machine->cxl_devices_state->is_enabled) {
+MemoryRegion *mr = &machine->cxl_devices_state->host_mr;
+hwaddr cxl_size = MiB;
+
+if (pcmc->has_reserved_memory && machine->device_memory->base) {
+cxl_base = machine->device_memory->base;
+if (!pcmc->broken_reserved_end) {
+cxl_base += memory_region_size(&machine->device_memory->mr);
+}
+} else if (pcms->sgx_epc.size != 0) {
+cxl_base = sgx_epc_above_4g_end(&pcms->sgx_epc);
+} else {
+cxl_base = 0x1ULL + x86ms->above_4g_mem_size;
+}
+
+e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
+memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size);
+memory_region_add_subregion(system_memory, cxl_base, mr);
+}
+
 /* Initialize PC system firmware */
 pc_system_firmware_init(pcms, rom_memory);
 
@@ -964,7 +986,10 @@ uint64_t pc_pci_hole64_start(voi

[PATCH v5 05/43] hw/cxl/device: Implement the CAP array (8.2.8.1-2)

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

This implements all device MMIO up to the first capability. That
includes the CXL Device Capabilities Array Register, as well as all of
the CXL Device Capability Header Registers. The latter are filled in as
they are implemented in the following patches.

Endianness and alignment are managed by the softmmu memory core.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/cxl/cxl-device-utils.c   | 109 
 hw/cxl/meson.build  |   1 +
 include/hw/cxl/cxl_device.h |  31 +-
 3 files changed, 140 insertions(+), 1 deletion(-)
 create mode 100644 hw/cxl/cxl-device-utils.c

diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
new file mode 100644
index 00..0895b9d78b
--- /dev/null
+++ b/hw/cxl/cxl-device-utils.c
@@ -0,0 +1,109 @@
+/*
+ * CXL Utility library for devices
+ *
+ * Copyright(C) 2020 Intel Corporation.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/cxl/cxl.h"
+
+/*
+ * Device registers have no restrictions per the spec, and so fall back to the
+ * default memory mapped register rules in 8.2:
+ *   Software shall use CXL.io Memory Read and Write to access memory mapped
+ *   register defined in this section. Unless otherwise specified, software
+ *   shall restrict the accesses width based on the following:
+ *   • A 32 bit register shall be accessed as a 1 Byte, 2 Bytes or 4 Bytes
+ * quantity.
+ *   • A 64 bit register shall be accessed as a 1 Byte, 2 Bytes, 4 Bytes or 8
+ * Bytes
+ *   • The address shall be a multiple of the access width, e.g. when
+ * accessing a register as a 4 Byte quantity, the address shall be
+ * multiple of 4.
+ *   • The accesses shall map to contiguous bytes. If these rules are not
+ * followed, the behavior is undefined
+ */
+
+static uint64_t caps_reg_read(void *opaque, hwaddr offset, unsigned size)
+{
+CXLDeviceState *cxl_dstate = opaque;
+
+if (size == 4) {
+return cxl_dstate->caps_reg_state32[offset / 4];
+} else {
+return cxl_dstate->caps_reg_state64[offset / 8];
+}
+}
+
+static uint64_t dev_reg_read(void *opaque, hwaddr offset, unsigned size)
+{
+return 0;
+}
+
+static const MemoryRegionOps dev_ops = {
+.read = dev_reg_read,
+.write = NULL, /* status register is read only */
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 1,
+.max_access_size = 8,
+},
+};
+
+static const MemoryRegionOps caps_ops = {
+.read = caps_reg_read,
+.write = NULL, /* caps registers are read only */
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 4,
+.max_access_size = 8,
+},
+};
+
+void cxl_device_register_block_init(Object *obj, CXLDeviceState *cxl_dstate)
+{
+/* This will be a BAR, so needs to be rounded up to pow2 for PCI spec */
+memory_region_init(&cxl_dstate->device_registers, obj, "device-registers",
+   pow2ceil(CXL_MMIO_SIZE));
+
+memory_region_init_io(&cxl_dstate->caps, obj, &caps_ops, cxl_dstate,
+  "cap-array", CXL_CAPS_SIZE);
+memory_region_init_io(&cxl_dstate->device, obj, &dev_ops, cxl_dstate,
+  "device-status", CXL_DEVICE_REGISTERS_LENGTH);
+
+memory_region_add_subregion(&cxl_dstate->device_registers, 0,
+&cxl_dstate->caps);
+memory_region_add_subregion(&cxl_dstate->device_registers,
+CXL_DEVICE_REGISTERS_OFFSET,
+&cxl_dstate->device);
+}
+
+static void device_reg_init_common(CXLDeviceState *cxl_dstate) { }
+
+void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)
+{
+uint64_t *cap_hdrs = cxl_dstate->caps_reg_state64;
+const int cap_count = 1;
+
+/* CXL Device Capabilities Array Register */
+ARRAY_FIELD_DP64(cap_hdrs, CXL_DEV_CAP_ARRAY, CAP_ID, 0);
+ARRAY_FIELD_DP64(cap_hdrs, CXL_DEV_CAP_ARRAY, CAP_VERSION, 1);
+ARRAY_FIELD_DP64(cap_hdrs, CXL_DEV_CAP_ARRAY, CAP_COUNT, cap_count);
+
+cxl_device_cap_init(cxl_dstate, DEVICE, 1);
+device_reg_init_common(cxl_dstate);
+}
diff --git a/hw/cxl/meson.build b/hw/cxl/meson.build
index 3231b5de1e..dd7c6f8e5a 100644
--- a/hw/cxl/meson.build
+++ b/hw/cxl/meson.build
@@ -1,4 +1,5 @@
 softmmu_ss.add(when: 'CONFIG_CXL',
if_true: files(
'cxl-component-utils.c',
+   'cxl-device-utils.c',
))
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index b2416e45bf..1ac0dcd97e 

[PATCH v5 11/43] hw/pxb: Use a type for realizing expanders

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

This opens up the possibility for more types of expanders (other than
PCI and PCIe). We'll need this to create a CXL expander.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/pci-bridge/pci_expander_bridge.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/hw/pci-bridge/pci_expander_bridge.c b/hw/pci-bridge/pci_expander_bridge.c
index de932286b5..d4514227a8 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -24,6 +24,8 @@
 #include "hw/boards.h"
 #include "qom/object.h"
 
+enum BusType { PCI, PCIE };
+
 #define TYPE_PXB_BUS "pxb-bus"
 typedef struct PXBBus PXBBus;
 DECLARE_INSTANCE_CHECKER(PXBBus, PXB_BUS,
@@ -221,7 +223,8 @@ static gint pxb_compare(gconstpointer a, gconstpointer b)
0;
 }
 
-static void pxb_dev_realize_common(PCIDevice *dev, bool pcie, Error **errp)
+static void pxb_dev_realize_common(PCIDevice *dev, enum BusType type,
+   Error **errp)
 {
 PXBDev *pxb = convert_to_pxb(dev);
 DeviceState *ds, *bds = NULL;
@@ -246,7 +249,7 @@ static void pxb_dev_realize_common(PCIDevice *dev, bool pcie, Error **errp)
 }
 
 ds = qdev_new(TYPE_PXB_HOST);
-if (pcie) {
+if (type == PCIE) {
 bus = pci_root_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_PCIE_BUS);
 } else {
bus = pci_root_bus_new(ds, "pxb-internal", NULL, NULL, 0, TYPE_PXB_BUS);
@@ -295,7 +298,7 @@ static void pxb_dev_realize(PCIDevice *dev, Error **errp)
 return;
 }
 
-pxb_dev_realize_common(dev, false, errp);
+pxb_dev_realize_common(dev, PCI, errp);
 }
 
 static void pxb_dev_exitfn(PCIDevice *pci_dev)
@@ -348,7 +351,7 @@ static void pxb_pcie_dev_realize(PCIDevice *dev, Error **errp)
 return;
 }
 
-pxb_dev_realize_common(dev, true, errp);
+pxb_dev_realize_common(dev, PCIE, errp);
 }
 
 static void pxb_pcie_dev_class_init(ObjectClass *klass, void *data)
-- 
2.32.0




[PATCH v5 07/43] hw/cxl/device: Add memory device utilities

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

Memory devices implement extra capabilities on top of CXL devices. This
adds support for that.

A large part of memory devices is the mailbox/command interface. All of
the mailbox handling is done in the mailbox-utils library. Longer term,
new CXL devices that are being emulated may want to handle commands
differently, and therefore would need a mechanism to opt in/out of the
specific generic handlers. As such, this is considered sufficient for
now, but may need more depth in the future.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/cxl/cxl-device-utils.c   | 38 -
 include/hw/cxl/cxl_device.h | 22 ++---
 2 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
index 39011468ef..14336d846d 100644
--- a/hw/cxl/cxl-device-utils.c
+++ b/hw/cxl/cxl-device-utils.c
@@ -137,6 +137,31 @@ static void mailbox_reg_write(void *opaque, hwaddr offset, uint64_t value,
 cxl_process_mailbox(cxl_dstate);
 }
 
+static uint64_t mdev_reg_read(void *opaque, hwaddr offset, unsigned size)
+{
+uint64_t retval = 0;
+
+retval = FIELD_DP64(retval, CXL_MEM_DEV_STS, MEDIA_STATUS, 1);
+retval = FIELD_DP64(retval, CXL_MEM_DEV_STS, MBOX_READY, 1);
+
+return retval;
+}
+
+static const MemoryRegionOps mdev_ops = {
+.read = mdev_reg_read,
+.write = NULL, /* memory device register is read only */
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 8,
+.max_access_size = 8,
+},
+};
+
 static const MemoryRegionOps mailbox_ops = {
 .read = mailbox_reg_read,
 .write = mailbox_reg_write,
@@ -194,6 +219,9 @@ void cxl_device_register_block_init(Object *obj, CXLDeviceState *cxl_dstate)
   "device-status", CXL_DEVICE_REGISTERS_LENGTH);
 memory_region_init_io(&cxl_dstate->mailbox, obj, &mailbox_ops, cxl_dstate,
   "mailbox", CXL_MAILBOX_REGISTERS_LENGTH);
+memory_region_init_io(&cxl_dstate->memory_device, obj, &mdev_ops,
+  cxl_dstate, "memory device caps",
+  CXL_MEMORY_DEVICE_REGISTERS_LENGTH);
 
 memory_region_add_subregion(&cxl_dstate->device_registers, 0,
 &cxl_dstate->caps);
@@ -203,6 +231,9 @@ void cxl_device_register_block_init(Object *obj, CXLDeviceState *cxl_dstate)
 memory_region_add_subregion(&cxl_dstate->device_registers,
 CXL_MAILBOX_REGISTERS_OFFSET,
 &cxl_dstate->mailbox);
+memory_region_add_subregion(&cxl_dstate->device_registers,
+CXL_MEMORY_DEVICE_REGISTERS_OFFSET,
+&cxl_dstate->memory_device);
 }
 
 static void device_reg_init_common(CXLDeviceState *cxl_dstate) { }
@@ -215,10 +246,12 @@ static void mailbox_reg_init_common(CXLDeviceState *cxl_dstate)
 cxl_dstate->payload_size = CXL_MAILBOX_MAX_PAYLOAD_SIZE;
 }
 
+static void memdev_reg_init_common(CXLDeviceState *cxl_dstate) { }
+
 void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)
 {
 uint64_t *cap_hdrs = cxl_dstate->caps_reg_state64;
-const int cap_count = 2;
+const int cap_count = 3;
 
 /* CXL Device Capabilities Array Register */
 ARRAY_FIELD_DP64(cap_hdrs, CXL_DEV_CAP_ARRAY, CAP_ID, 0);
@@ -231,5 +264,8 @@ void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)
 cxl_device_cap_init(cxl_dstate, MAILBOX, 2);
 mailbox_reg_init_common(cxl_dstate);
 
+cxl_device_cap_init(cxl_dstate, MEMORY_DEVICE, 0x4000);
+memdev_reg_init_common(cxl_dstate);
+
 assert(cxl_initialize_mailbox(cxl_dstate) == 0);
 }
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 49dcca7e44..7fd8d0f616 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -77,15 +77,21 @@
 #define CXL_MAILBOX_REGISTERS_LENGTH \
 (CXL_MAILBOX_REGISTERS_SIZE + CXL_MAILBOX_MAX_PAYLOAD_SIZE)
 
-#define CXL_MMIO_SIZE   \
-(CXL_DEVICE_CAP_REG_SIZE + CXL_DEVICE_REGISTERS_LENGTH +\
- CXL_MAILBOX_REGISTERS_LENGTH)
+
+#define CXL_MEMORY_DEVICE_REGISTERS_OFFSET \
+(CXL_MAILBOX_REGISTERS_OFFSET + CXL_MAILBOX_REGISTERS_LENGTH)
+#define CXL_MEMORY_DEVICE_REGISTERS_LENGTH 0x8
+
+#define CXL_MMIO_SIZE   \
+(CXL_DEVICE_CAP_REG_SIZE + CXL_DEVICE_REGISTERS_LENGTH +\
+ CXL_MAILBOX_REGISTERS_LENGTH + CXL_MEMORY_DEVICE_REGISTERS_LENGTH)
 
 typedef struct cxl_device_state {
 MemoryRegion device_registers;
 
 /* mmio for device capabilities array - 8.2.8.2 */
 MemoryRegion device;
+MemoryRegion memory_device;
 struct {
 MemoryRegion caps;

[PATCH v5 24/43] acpi/cxl: Create the CEDT (9.14.1)

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

The CXL Early Discovery Table is defined in the CXL 2.0 specification as
a way for the OS to get CXL specific information from the system
firmware.

CXL 2.0 specification adds an _HID, ACPI0016, for CXL capable host
bridges, with a _CID of PNP0A08 (PCIe host bridge). CXL aware software
is able to use this to initiate the proper _OSC method, and get the _UID
which is referenced by the CEDT. Therefore the existence of an ACPI0016
device allows a CXL aware driver to perform the necessary actions. For a
CXL capable OS, this works. For a CXL unaware OS, this works.

CEDT awareness requires more. The motivation for ACPI0017 is to provide
the possibility of having a Linux CXL module that can work on a legacy
Linux kernel. Linux core PCI/ACPI which won't be built as a module,
will see the _CID of PNP0A08 and bind a driver to it. If we later loaded
a driver for ACPI0016, Linux won't be able to bind it to the hardware
because it has already bound the PNP0A08 driver. The ACPI0017 device is
an opportunity to have an object to bind a driver to; that driver can
walk the CXL topology and do everything that we would have preferred to
do with ACPI0016.

There is another motivation for an ACPI0017 device which isn't
implemented here. An operating system needs an attach point for a
non-volatile region provider that understands cross-hostbridge
interleaving. Since QEMU emulation doesn't support interleaving yet,
this is more important on the OS side, for now.

As of the CXL 2.0 spec, only one sub structure is defined, the CXL Host
Bridge Structure (CHBS), which is primarily useful for telling the OS
exactly where the MMIO for the host bridge is.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---

v5: Part of this patch moved earlier to fix a reset issue.

 hw/acpi/cxl.c   | 68 +
 hw/i386/acpi-build.c| 27 
 hw/pci-bridge/pci_expander_bridge.c | 18 
 include/hw/acpi/cxl.h   |  5 +++
 include/hw/pci/pci_bridge.h | 20 +
 5 files changed, 120 insertions(+), 18 deletions(-)

diff --git a/hw/acpi/cxl.c b/hw/acpi/cxl.c
index 7124d5a1a3..442f836a3e 100644
--- a/hw/acpi/cxl.c
+++ b/hw/acpi/cxl.c
@@ -18,7 +18,11 @@
  */
 
 #include "qemu/osdep.h"
+#include "hw/sysbus.h"
+#include "hw/pci/pci_bridge.h"
+#include "hw/pci/pci_host.h"
 #include "hw/cxl/cxl.h"
+#include "hw/mem/memory-device.h"
 #include "hw/acpi/acpi.h"
 #include "hw/acpi/aml-build.h"
 #include "hw/acpi/bios-linker-loader.h"
@@ -26,6 +30,70 @@
 #include "qapi/error.h"
 #include "qemu/uuid.h"
 
+static void cedt_build_chbs(GArray *table_data, PXBDev *cxl)
+{
+SysBusDevice *sbd = SYS_BUS_DEVICE(cxl->cxl.cxl_host_bridge);
+struct MemoryRegion *mr = sbd->mmio[0].memory;
+
+/* Type */
+build_append_int_noprefix(table_data, 0, 1);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 1);
+
+/* Record Length */
+build_append_int_noprefix(table_data, 32, 2);
+
+/* UID - currently equal to bus number */
+build_append_int_noprefix(table_data, cxl->bus_nr, 4);
+
+/* Version */
+build_append_int_noprefix(table_data, 1, 4);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 4);
+
+/* Base - subregion within a container that is in PA space */
+build_append_int_noprefix(table_data, mr->container->addr + mr->addr, 8);
+
+/* Length */
+build_append_int_noprefix(table_data, memory_region_size(mr), 8);
+}
+
+static int cxl_foreach_pxb_hb(Object *obj, void *opaque)
+{
+Aml *cedt = opaque;
+
+if (object_dynamic_cast(obj, TYPE_PXB_CXL_DEVICE)) {
+cedt_build_chbs(cedt->buf, PXB_CXL_DEV(obj));
+}
+
+return 0;
+}
+
+void cxl_build_cedt(MachineState *ms, GArray *table_offsets, GArray *table_data,
+BIOSLinker *linker, const char *oem_id,
+const char *oem_table_id)
+{
+Aml *cedt;
+AcpiTable table = { .sig = "CEDT", .rev = 1, .oem_id = oem_id,
+.oem_table_id = oem_table_id };
+
+acpi_add_table(table_offsets, table_data);
+acpi_table_begin(&table, table_data);
+cedt = init_aml_allocator();
+
+/* reserve space for CEDT header */
+
+object_child_foreach_recursive(object_get_root(), cxl_foreach_pxb_hb, cedt);
+
+/* copy AML table into ACPI tables blob and patch header there */
+g_array_append_vals(table_data, cedt->buf->data, cedt->buf->len);
+free_aml_allocator();
+
+acpi_table_end(linker, &table);
+}
+
 static Aml *__build_cxl_osc_method(void)
 {
Aml *method, *if_uuid, *else_uuid, *if_arg1_not_1, *if_cxl, *if_caps_masked;
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index cec7465267..0479bf 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -76,6 +76,8 @@
 #include "hw/acpi/hmat.h"
 #include "hw/acpi/viot.h"
 
+#include "hw/acpi/cxl.h"
+
 /* These are used to size the ACPI tables for -M pc-i440fx-1.7 and
 

[PATCH v5 06/43] hw/cxl/device: Implement basic mailbox (8.2.8.4)

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

This is the beginning of implementing mailbox support for CXL 2.0
devices. The implementation recognizes when the doorbell is rung,
handles the command/payload, clears the doorbell while returning error
codes and data.

Generally the mailbox mechanism is designed to permit communication
between the host OS and the firmware running on the device. For our
purposes, we emulate both the firmware, implemented primarily in
cxl-mailbox-utils.c, and the hardware.

No commands are implemented yet.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
v5: Responses to Alex Bennée review.

  - Fix the invalid write case Alex noted and return early.
  - Drop the RCU_READ_LOCK as it was pointless and I don't think
we need to lock at all until we introduce other write paths
(second mailbox or background commands).
  - Missing static on cel_uuid
  - Documentation of where cel_uuid value comes from (the CXL spec)
  - Drop a check that can't fail and hence get rid of a confusing
LOG_UNIMP.
  - Move some small code rearrangement back to earlier patch.
  - Reorder the mailbox handler code and update the docs, as first
part of removing many of the macros from this code.
  - Upper case remaining defines + drop the define_mailbox_handler_const()
as it is never used.

 hw/cxl/cxl-device-utils.c   | 128 ++-
 hw/cxl/cxl-mailbox-utils.c  | 171 
 hw/cxl/meson.build  |   1 +
 include/hw/cxl/cxl.h|   3 +
 include/hw/cxl/cxl_device.h |  19 +++-
 5 files changed, 320 insertions(+), 2 deletions(-)
 create mode 100644 hw/cxl/cxl-mailbox-utils.c

diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
index 0895b9d78b..39011468ef 100644
--- a/hw/cxl/cxl-device-utils.c
+++ b/hw/cxl/cxl-device-utils.c
@@ -44,6 +44,114 @@ static uint64_t dev_reg_read(void *opaque, hwaddr offset, 
unsigned size)
 return 0;
 }
 
+static uint64_t mailbox_reg_read(void *opaque, hwaddr offset, unsigned size)
+{
+CXLDeviceState *cxl_dstate = opaque;
+
+switch (size) {
+case 1:
+return cxl_dstate->mbox_reg_state[offset];
+case 2:
+return cxl_dstate->mbox_reg_state16[offset / 2];
+case 4:
+return cxl_dstate->mbox_reg_state32[offset / 4];
+case 8:
+return cxl_dstate->mbox_reg_state64[offset / 8];
+default:
+g_assert_not_reached();
+}
+}
+
+static void mailbox_mem_writel(uint32_t *reg_state, hwaddr offset,
+   uint64_t value)
+{
+switch (offset) {
+case A_CXL_DEV_MAILBOX_CTRL:
+/* fallthrough */
+case A_CXL_DEV_MAILBOX_CAP:
+/* RO register */
+break;
+default:
+qemu_log_mask(LOG_UNIMP,
+  "%s Unexpected 32-bit access to 0x%" PRIx64 " (WI)\n",
+  __func__, offset);
+return;
+}
+
+reg_state[offset / 4] = value;
+}
+
+static void mailbox_mem_writeq(uint64_t *reg_state, hwaddr offset,
+   uint64_t value)
+{
+switch (offset) {
+case A_CXL_DEV_MAILBOX_CMD:
+break;
+case A_CXL_DEV_BG_CMD_STS:
+/* BG not supported */
+/* fallthrough */
+case A_CXL_DEV_MAILBOX_STS:
+/* Read only register, will get updated by the state machine */
+return;
+default:
+qemu_log_mask(LOG_UNIMP,
+  "%s Unexpected 64-bit access to 0x%" PRIx64 " (WI)\n",
+  __func__, offset);
+return;
+}
+
+
+reg_state[offset / 8] = value;
+}
+
+static void mailbox_reg_write(void *opaque, hwaddr offset, uint64_t value,
+  unsigned size)
+{
+CXLDeviceState *cxl_dstate = opaque;
+
+if (offset >= A_CXL_DEV_CMD_PAYLOAD) {
+memcpy(cxl_dstate->mbox_reg_state + offset, &value, size);
+return;
+}
+
+/*
+ * Lock is needed to prevent concurrent writes as well as to
+ * prevent writes coming in while the firmware is processing.
+ * Until background commands or the second mailbox are implemented
+ * memory access is synchronized at a higher level (per memory region).
+ */
+
+switch (size) {
+case 4:
+mailbox_mem_writel(cxl_dstate->mbox_reg_state32, offset, value);
+break;
+case 8:
+mailbox_mem_writeq(cxl_dstate->mbox_reg_state64, offset, value);
+break;
+default:
+g_assert_not_reached();
+}
+
+if (ARRAY_FIELD_EX32(cxl_dstate->mbox_reg_state32, CXL_DEV_MAILBOX_CTRL,
+ DOORBELL))
+cxl_process_mailbox(cxl_dstate);
+}
+
+static const MemoryRegionOps mailbox_ops = {
+.read = mailbox_reg_read,
+.write = mailbox_reg_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = false,
+},
+.impl = {
+.min_access_size = 1,
+.max_access_size = 8,
+},
+}

[PATCH v5 12/43] hw/pci/cxl: Create a CXL bus type

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

The easiest way to differentiate a CXL bus and a PCIe bus is using a
flag. A CXL bus is, in hardware, backward compatible with PCIe, and
therefore the code tries fairly hard to keep the two in sync.

The other way to implement this would be to try to cast the bus to the
correct type. Using a flag is less code, and is useful for debugging,
since one can simply look at the flags.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/pci-bridge/pci_expander_bridge.c | 9 -
 include/hw/pci/pci_bus.h| 7 +++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/pci-bridge/pci_expander_bridge.c 
b/hw/pci-bridge/pci_expander_bridge.c
index d4514227a8..a6caa1e7b5 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -24,7 +24,7 @@
 #include "hw/boards.h"
 #include "qom/object.h"
 
-enum BusType { PCI, PCIE };
+enum BusType { PCI, PCIE, CXL };
 
 #define TYPE_PXB_BUS "pxb-bus"
 typedef struct PXBBus PXBBus;
@@ -35,6 +35,10 @@ DECLARE_INSTANCE_CHECKER(PXBBus, PXB_BUS,
 DECLARE_INSTANCE_CHECKER(PXBBus, PXB_PCIE_BUS,
  TYPE_PXB_PCIE_BUS)
 
+#define TYPE_PXB_CXL_BUS "pxb-cxl-bus"
+DECLARE_INSTANCE_CHECKER(PXBBus, PXB_CXL_BUS,
+ TYPE_PXB_CXL_BUS)
+
 struct PXBBus {
 /*< private >*/
 PCIBus parent_obj;
@@ -251,6 +255,9 @@ static void pxb_dev_realize_common(PCIDevice *dev, enum 
BusType type,
 ds = qdev_new(TYPE_PXB_HOST);
 if (type == PCIE) {
 bus = pci_root_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_PCIE_BUS);
+} else if (type == CXL) {
+bus = pci_root_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_CXL_BUS);
+bus->flags |= PCI_BUS_CXL;
 } else {
 bus = pci_root_bus_new(ds, "pxb-internal", NULL, NULL, 0, 
TYPE_PXB_BUS);
 bds = qdev_new("pci-bridge");
diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
index 347440d42c..eb94e7e85c 100644
--- a/include/hw/pci/pci_bus.h
+++ b/include/hw/pci/pci_bus.h
@@ -24,6 +24,8 @@ enum PCIBusFlags {
 PCI_BUS_IS_ROOT = 0x0001,
 /* PCIe extended configuration space is accessible on this bus */
 PCI_BUS_EXTENDED_CONFIG_SPACE   = 0x0002,
+/* This is a CXL Type BUS */
+PCI_BUS_CXL = 0x0004,
 };
 
 struct PCIBus {
@@ -53,6 +55,11 @@ struct PCIBus {
 Notifier machine_done;
 };
 
+static inline bool pci_bus_is_cxl(PCIBus *bus)
+{
+return !!(bus->flags & PCI_BUS_CXL);
+}
+
 static inline bool pci_bus_is_root(PCIBus *bus)
 {
 return !!(bus->flags & PCI_BUS_IS_ROOT);
-- 
2.32.0




[PATCH v5 26/43] hw/cxl/device: Plumb real Label Storage Area (LSA) sizing

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

This should introduce no change. Subsequent work will make use of this
new class member.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
 hw/cxl/cxl-mailbox-utils.c  |  3 +++
 hw/mem/cxl_type3.c  | 24 +---
 include/hw/cxl/cxl_device.h | 29 +
 3 files changed, 41 insertions(+), 15 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index d022711b2a..ccf9c3d794 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -278,6 +278,8 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd 
*cmd,
 } __attribute__((packed)) *id;
 _Static_assert(sizeof(*id) == 0x43, "Bad identify size");
 
+CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
+CXLType3Class *cvc = CXL_TYPE3_DEV_GET_CLASS(ct3d);
 uint64_t size = cxl_dstate->pmem_size;
 
 if (!QEMU_IS_ALIGNED(size, 256 << 20)) {
@@ -292,6 +294,7 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd 
*cmd,
 
 id->total_capacity = size / (256 << 20);
 id->persistent_capacity = size / (256 << 20);
+id->lsa_size = cvc->get_lsa_size(ct3d);
 
 *len = sizeof(*id);
 return CXL_MBOX_SUCCESS;
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index da091157f2..b16262d3cc 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -13,21 +13,6 @@
 #include "sysemu/hostmem.h"
 #include "hw/cxl/cxl.h"
 
-typedef struct cxl_type3_dev {
-/* Private */
-PCIDevice parent_obj;
-
-/* Properties */
-uint64_t size;
-HostMemoryBackend *hostmem;
-
-/* State */
-CXLComponentState cxl_cstate;
-CXLDeviceState cxl_dstate;
-} CXLType3Dev;
-
-#define CT3(obj) OBJECT_CHECK(CXLType3Dev, (obj), TYPE_CXL_TYPE3_DEV)
-
 static void build_dvsecs(CXLType3Dev *ct3d)
 {
 CXLComponentState *cxl_cstate = &ct3d->cxl_cstate;
@@ -186,10 +171,16 @@ static Property ct3_props[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+static uint64_t get_lsa_size(CXLType3Dev *ct3d)
+{
+return 0;
+}
+
 static void ct3_class_init(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
 PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);
+CXLType3Class *cvc = CXL_TYPE3_DEV_CLASS(oc);
 
 pc->realize = ct3_realize;
 pc->class_id = PCI_CLASS_STORAGE_EXPRESS;
@@ -201,11 +192,14 @@ static void ct3_class_init(ObjectClass *oc, void *data)
 dc->desc = "CXL PMEM Device (Type 3)";
 dc->reset = ct3d_reset;
 device_class_set_props(dc, ct3_props);
+
+cvc->get_lsa_size = get_lsa_size;
 }
 
 static const TypeInfo ct3d_info = {
 .name = TYPE_CXL_TYPE3_DEV,
 .parent = TYPE_PCI_DEVICE,
+.class_size = sizeof(struct CXLType3Class),
 .class_init = ct3_class_init,
 .instance_size = sizeof(CXLType3Dev),
 .instance_finalize = ct3_finalize,
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 8102d2a813..ebb391153a 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -230,4 +230,33 @@ REG64(CXL_MEM_DEV_STS, 0)
 FIELD(CXL_MEM_DEV_STS, MBOX_READY, 4, 1)
 FIELD(CXL_MEM_DEV_STS, RESET_NEEDED, 5, 3)
 
+typedef struct cxl_type3_dev {
+/* Private */
+PCIDevice parent_obj;
+
+/* Properties */
+uint64_t size;
+HostMemoryBackend *hostmem;
+HostMemoryBackend *lsa;
+
+/* State */
+CXLComponentState cxl_cstate;
+CXLDeviceState cxl_dstate;
+} CXLType3Dev;
+
+#ifndef TYPE_CXL_TYPE3_DEV
+#define TYPE_CXL_TYPE3_DEV "cxl-type3"
+#endif
+
+#define CT3(obj) OBJECT_CHECK(CXLType3Dev, (obj), TYPE_CXL_TYPE3_DEV)
+OBJECT_DECLARE_TYPE(CXLType3Device, CXLType3Class, CXL_TYPE3_DEV)
+
+struct CXLType3Class {
+/* Private */
+PCIDeviceClass parent_class;
+
+/* public */
+uint64_t (*get_lsa_size)(CXLType3Dev *ct3d);
+};
+
 #endif
-- 
2.32.0




[PATCH v5 14/43] tests/acpi: allow DSDT.viot table changes.

2022-02-02 Thread Jonathan Cameron via
From: Jonathan Cameron 

The next patch unifies some of the PCI host bridge DSDT
generation code and results in some minor changes to this file.

Signed-off-by: Jonathan Cameron 
---
v5: No change, but Alex suggested we combine this and next
two patches.  I'd like feedback from the bios tables test maintainer
on this question.

 tests/qtest/bios-tables-test-allowed-diff.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..08a8095432 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,2 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/q35/DSDT.viot",
-- 
2.32.0




[PATCH v5 13/43] hw/pxb: Allow creation of a CXL PXB (host bridge)

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

This works like adding a typical pxb device, except the name is
'pxb-cxl' instead of 'pxb-pcie'. An example command line would be as
follows:
  -device pxb-cxl,id=cxl.0,bus="pcie.0",bus_nr=1

A CXL PXB is backward compatible with PCIe. What this means in practice
is that an operating system that is unaware of CXL should still be able
to enumerate this topology as if it were PCIe.

One can create multiple CXL PXB host bridges, but a host bridge can only
be connected to the main root bus. Host bridges cannot appear elsewhere
in the topology.

Note that as of this patch, the ACPI tables needed for the host bridge
(specifically, an ACPI object in _SB named ACPI0016 and the CEDT) aren't
created. So while this patch internally creates it, it cannot be
properly used by an operating system or other system software.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan.Cameron 
---
v5: All in response to Alex's review (thanks!)
 - Moved pxb_dev_reset() to cxl realize function instead of doing it
   in the common code called from that function.
 - Fixed pxb_dev_reset() not being called in other paths due to it
   being registered in the wrong class_init. Note it was also broken
   so pulled a reference from the PXB_CXL_DEV to the host bridge
   back from patch 24 as we now need it here.
 
 hw/pci-bridge/pci_expander_bridge.c | 95 -
 hw/pci/pci.c|  7 +++
 include/hw/pci/pci.h|  6 ++
 3 files changed, 106 insertions(+), 2 deletions(-)

diff --git a/hw/pci-bridge/pci_expander_bridge.c 
b/hw/pci-bridge/pci_expander_bridge.c
index a6caa1e7b5..c7a28c7b2e 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -17,6 +17,7 @@
 #include "hw/pci/pci_host.h"
 #include "hw/qdev-properties.h"
 #include "hw/pci/pci_bridge.h"
+#include "hw/cxl/cxl.h"
 #include "qemu/range.h"
 #include "qemu/error-report.h"
 #include "qemu/module.h"
@@ -56,6 +57,17 @@ DECLARE_INSTANCE_CHECKER(PXBDev, PXB_DEV,
 DECLARE_INSTANCE_CHECKER(PXBDev, PXB_PCIE_DEV,
  TYPE_PXB_PCIE_DEVICE)
 
+#define TYPE_PXB_CXL_DEVICE "pxb-cxl"
+DECLARE_INSTANCE_CHECKER(PXBDev, PXB_CXL_DEV,
+ TYPE_PXB_CXL_DEVICE)
+
+typedef struct CXLHost {
+PCIHostState parent_obj;
+
+CXLComponentState cxl_cstate;
+} CXLHost;
+
+
 struct PXBDev {
 /*< private >*/
 PCIDevice parent_obj;
@@ -64,10 +76,18 @@ struct PXBDev {
 uint8_t bus_nr;
 uint16_t numa_node;
 bool bypass_iommu;
+struct cxl_dev {
+CXLHost *cxl_host_bridge;
+} cxl;
 };
 
 static PXBDev *convert_to_pxb(PCIDevice *dev)
 {
+/* A CXL PXB's parent bus is PCIe, so the normal check won't work */
+if (object_dynamic_cast(OBJECT(dev), TYPE_PXB_CXL_DEVICE)) {
+return PXB_CXL_DEV(dev);
+}
+
 return pci_bus_is_express(pci_get_bus(dev))
 ? PXB_PCIE_DEV(dev) : PXB_DEV(dev);
 }
@@ -76,6 +96,9 @@ static GList *pxb_dev_list;
 
 #define TYPE_PXB_HOST "pxb-host"
 
+#define TYPE_PXB_CXL_HOST "pxb-cxl-host"
+#define PXB_CXL_HOST(obj) OBJECT_CHECK(CXLHost, (obj), TYPE_PXB_CXL_HOST)
+
 static int pxb_bus_num(PCIBus *bus)
 {
 PXBDev *pxb = convert_to_pxb(bus->parent_dev);
@@ -112,11 +135,20 @@ static const TypeInfo pxb_pcie_bus_info = {
 .class_init= pxb_bus_class_init,
 };
 
+static const TypeInfo pxb_cxl_bus_info = {
+.name  = TYPE_PXB_CXL_BUS,
+.parent= TYPE_CXL_BUS,
+.instance_size = sizeof(PXBBus),
+.class_init= pxb_bus_class_init,
+};
+
 static const char *pxb_host_root_bus_path(PCIHostState *host_bridge,
   PCIBus *rootbus)
 {
-PXBBus *bus = pci_bus_is_express(rootbus) ?
-  PXB_PCIE_BUS(rootbus) : PXB_BUS(rootbus);
+PXBBus *bus = pci_bus_is_cxl(rootbus) ?
+  PXB_CXL_BUS(rootbus) :
+  pci_bus_is_express(rootbus) ? PXB_PCIE_BUS(rootbus) :
+PXB_BUS(rootbus);
 
 snprintf(bus->bus_path, 8, ":%02x", pxb_bus_num(rootbus));
 return bus->bus_path;
@@ -218,6 +250,16 @@ static int pxb_map_irq_fn(PCIDevice *pci_dev, int pin)
 return pin - PCI_SLOT(pxb->devfn);
 }
 
+static void pxb_dev_reset(DeviceState *dev)
+{
+CXLHost *cxl = PXB_CXL_DEV(dev)->cxl.cxl_host_bridge;
+CXLComponentState *cxl_cstate = &cxl->cxl_cstate;
+uint32_t *reg_state = cxl_cstate->crb.cache_mem_registers;
+
+cxl_component_register_init_common(reg_state, CXL2_ROOT_PORT);
+ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, TARGET_COUNT, 8);
+}
+
 static gint pxb_compare(gconstpointer a, gconstpointer b)
 {
 const PXBDev *pxb_a = a, *pxb_b = b;
@@ -258,6 +300,7 @@ static void pxb_dev_realize_common(PCIDevice *dev, enum 
BusType type,
 } else if (type == CXL) {
 bus = pci_root_bus_new(ds, dev_name, NULL, NULL, 0, TYPE_PXB_CXL_BUS);
 bus->flags |= PCI_BUS_CXL;
+PXB_CXL_

[PATCH v5 08/43] hw/cxl/device: Add cheap EVENTS implementation (8.2.9.1)

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

Using the previously implemented stubbed helpers, it is now possible to
easily add the missing, required commands to the implementation.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
v5: Follow through on upper casing defines in patch 6.

 hw/cxl/cxl-mailbox-utils.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index d497ec50a6..8aa1b1e525 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -38,6 +38,14 @@
  *  a register interface that already deals with it.
  */
 
+enum {
+EVENTS  = 0x01,
+#define GET_RECORDS   0x0
+#define CLEAR_RECORDS   0x1
+#define GET_INTERRUPT_POLICY   0x2
+#define SET_INTERRUPT_POLICY   0x3
+};
+
 /* 8.2.8.4.5.1 Command Return Codes */
 typedef enum {
 CXL_MBOX_SUCCESS = 0x0,
@@ -93,9 +101,26 @@ struct cxl_cmd {
 return CXL_MBOX_SUCCESS;  \
 }
 
+DEFINE_MAILBOX_HANDLER_ZEROED(events_get_records, 0x20);
+DEFINE_MAILBOX_HANDLER_NOP(events_clear_records);
+DEFINE_MAILBOX_HANDLER_ZEROED(events_get_interrupt_policy, 4);
+DEFINE_MAILBOX_HANDLER_NOP(events_set_interrupt_policy);
+
 static QemuUUID cel_uuid;
 
-static struct cxl_cmd cxl_cmd_set[256][256] = {};
+#define IMMEDIATE_CONFIG_CHANGE (1 << 1)
+#define IMMEDIATE_LOG_CHANGE (1 << 4)
+
+static struct cxl_cmd cxl_cmd_set[256][256] = {
+[EVENTS][GET_RECORDS] = { "EVENTS_GET_RECORDS",
+cmd_events_get_records, 1, 0 },
+[EVENTS][CLEAR_RECORDS] = { "EVENTS_CLEAR_RECORDS",
+cmd_events_clear_records, ~0, IMMEDIATE_LOG_CHANGE },
+[EVENTS][GET_INTERRUPT_POLICY] = { "EVENTS_GET_INTERRUPT_POLICY",
+cmd_events_get_interrupt_policy, 0, 0 },
+[EVENTS][SET_INTERRUPT_POLICY] = { "EVENTS_SET_INTERRUPT_POLICY",
+cmd_events_set_interrupt_policy, 4, IMMEDIATE_CONFIG_CHANGE },
+};
 
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
 {
-- 
2.32.0




[PATCH v5 15/43] acpi/pci: Consolidate host bridge setup

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

This cleanup will make it easier to add support for CXL to the mix.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
v5: Make the PCI bus type a typed enum.

 hw/i386/acpi-build.c | 39 ++-
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index ce823e8fcb..09940f6e84 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1398,6 +1398,24 @@ static void build_smb0(Aml *table, I2CBus *smbus, int 
devnr, int func)
 aml_append(table, scope);
 }
 
+typedef enum { PCI, PCIE } PCIBusType;
+static void init_pci_acpi(Aml *dev, int uid, PCIBusType type,
+  bool native_pcie_hp)
+{
+if (type == PCI) {
+aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
+aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
+aml_append(dev, aml_name_decl("_UID", aml_int(uid)));
+} else {
+aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
+aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
+aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
+aml_append(dev, aml_name_decl("_UID", aml_int(uid)));
+/* Expander bridges do not have ACPI PCI Hot-plug enabled */
+aml_append(dev, build_q35_osc_method(native_pcie_hp));
+}
+}
+
 static void
 build_dsdt(GArray *table_data, BIOSLinker *linker,
AcpiPmInfo *pm, AcpiMiscInfo *misc,
@@ -1429,9 +1447,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 if (misc->is_piix4) {
 sb_scope = aml_scope("_SB");
 dev = aml_device("PCI0");
-aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
-aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
-aml_append(dev, aml_name_decl("_UID", aml_int(pcmc->pci_root_uid)));
+init_pci_acpi(dev, pcmc->pci_root_uid, PCI, false);
 aml_append(sb_scope, dev);
 aml_append(dsdt, sb_scope);
 
@@ -1447,11 +1463,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 } else {
 sb_scope = aml_scope("_SB");
 dev = aml_device("PCI0");
-aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
-aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
-aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
-aml_append(dev, aml_name_decl("_UID", aml_int(pcmc->pci_root_uid)));
-aml_append(dev, build_q35_osc_method(!pm->pcihp_bridge_en));
+init_pci_acpi(dev, pcmc->pci_root_uid, PCIE, !pm->pcihp_bridge_en);
 aml_append(sb_scope, dev);
 if (mcfg_valid) {
 aml_append(sb_scope, build_q35_dram_controller(&mcfg));
@@ -1562,17 +1574,10 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 
 scope = aml_scope("\\_SB");
 dev = aml_device("PC%.02X", bus_num);
-aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
 aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
-if (pci_bus_is_express(bus)) {
-aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
-aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
 
-/* Expander bridges do not have ACPI PCI Hot-plug enabled */
-aml_append(dev, build_q35_osc_method(true));
-} else {
-aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
-}
+init_pci_acpi(dev, bus_num,
+  pci_bus_is_express(bus) ? PCIE : PCI, true);
 
 if (numa_node != NUMA_NODE_UNASSIGNED) {
 aml_append(dev, aml_name_decl("_PXM", aml_int(numa_node)));
-- 
2.32.0




[PATCH v5 25/43] hw/cxl/device: Add some trivial commands

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

For this emulation, GET_FW_INFO and GET_PARTITION_INFO are equivalent
to information already returned by the IDENTIFY command. Add them
anyway for a more robust implementation.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
v5: Follow through on rework of how mailbox handlers are done.

 hw/cxl/cxl-mailbox-utils.c | 69 +-
 1 file changed, 68 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 808faec114..d022711b2a 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -44,6 +44,8 @@ enum {
 #define CLEAR_RECORDS   0x1
 #define GET_INTERRUPT_POLICY   0x2
 #define SET_INTERRUPT_POLICY   0x3
+FIRMWARE_UPDATE = 0x02,
+#define GET_INFO  0x0
 TIMESTAMP   = 0x03,
 #define GET   0x0
 #define SET   0x1
@@ -52,6 +54,8 @@ enum {
 #define GET_LOG   0x1
 IDENTIFY= 0x40,
 #define MEMORY_DEVICE 0x0
+CCLS= 0x41,
+#define GET_PARTITION_INFO 0x0
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -114,6 +118,39 @@ DEFINE_MAILBOX_HANDLER_NOP(events_clear_records);
 DEFINE_MAILBOX_HANDLER_ZEROED(events_get_interrupt_policy, 4);
 DEFINE_MAILBOX_HANDLER_NOP(events_set_interrupt_policy);
 
+/* 8.2.9.2.1 */
+static ret_code cmd_firmware_update_get_info(struct cxl_cmd *cmd,
+ CXLDeviceState *cxl_dstate,
+ uint16_t *len)
+{
+struct {
+uint8_t slots_supported;
+uint8_t slot_info;
+uint8_t caps;
+uint8_t rsvd[0xd];
+char fw_rev1[0x10];
+char fw_rev2[0x10];
+char fw_rev3[0x10];
+char fw_rev4[0x10];
+} __attribute__((packed)) *fw_info;
+_Static_assert(sizeof(*fw_info) == 0x50, "Bad firmware info size");
+
+if (cxl_dstate->pmem_size < (256 << 20)) {
+return CXL_MBOX_INTERNAL_ERROR;
+}
+
+fw_info = (void *)cmd->payload;
+memset(fw_info, 0, sizeof(*fw_info));
+
+fw_info->slots_supported = 2;
+fw_info->slot_info = BIT(0) | BIT(3);
+fw_info->caps = 0;
+snprintf(fw_info->fw_rev1, 0x10, "BWFW VERSION %02d", 0);
+
+*len = sizeof(*fw_info);
+return CXL_MBOX_SUCCESS;
+}
+
 /* 8.2.9.3.1 */
 static ret_code cmd_timestamp_get(struct cxl_cmd *cmd,
   CXLDeviceState *cxl_dstate,
@@ -260,6 +297,33 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd 
*cmd,
 return CXL_MBOX_SUCCESS;
 }
 
+static ret_code cmd_ccls_get_partition_info(struct cxl_cmd *cmd,
+   CXLDeviceState *cxl_dstate,
+   uint16_t *len)
+{
+struct {
+uint64_t active_vmem;
+uint64_t active_pmem;
+uint64_t next_vmem;
+uint64_t next_pmem;
+} __attribute__((packed)) *part_info = (void *)cmd->payload;
+_Static_assert(sizeof(*part_info) == 0x20, "Bad get partition info size");
+uint64_t size = cxl_dstate->pmem_size;
+
+if (!QEMU_IS_ALIGNED(size, 256 << 20)) {
+return CXL_MBOX_INTERNAL_ERROR;
+}
+
+/* PMEM only */
+part_info->active_vmem = 0;
+part_info->next_vmem = 0;
+part_info->active_pmem = size / (256 << 20);
+part_info->next_pmem = part_info->active_pmem;
+
+*len = sizeof(*part_info);
+return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
 #define IMMEDIATE_LOG_CHANGE (1 << 4)
@@ -273,15 +337,18 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 cmd_events_get_interrupt_policy, 0, 0 },
 [EVENTS][SET_INTERRUPT_POLICY] = { "EVENTS_SET_INTERRUPT_POLICY",
 cmd_events_set_interrupt_policy, 4, IMMEDIATE_CONFIG_CHANGE },
+[FIRMWARE_UPDATE][GET_INFO] = { "FIRMWARE_UPDATE_GET_INFO",
+cmd_firmware_update_get_info, 0, 0 },
 [TIMESTAMP][GET] = { "TIMESTAMP_GET", cmd_timestamp_get, 0, 0 },
 [TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set, 8, 
IMMEDIATE_POLICY_CHANGE },
 [LOGS][GET_SUPPORTED] = { "LOGS_GET_SUPPORTED", cmd_logs_get_supported, 0, 
0 },
 [LOGS][GET_LOG] = { "LOGS_GET_LOG", cmd_logs_get_log, 0x18, 0 },
 [IDENTIFY][MEMORY_DEVICE] = { "IDENTIFY_MEMORY_DEVICE",
 cmd_identify_memory_device, 0, 0 },
+[CCLS][GET_PARTITION_INFO] = { "CCLS_GET_PARTITION_INFO",
+cmd_ccls_get_partition_info, 0, 0 },
 };
 
-
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
 {
 uint16_t ret = CXL_MBOX_SUCCESS;
-- 
2.32.0




[PATCH v5 09/43] hw/cxl/device: Timestamp implementation (8.2.9.3)

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

Errata F4 to CXL 2.0 clarified the meaning of the timestamp as the
sum of the value set with the Timestamp Set command and the number
of nanoseconds since it was last set.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
 v5: Responses to Alex's review.
 - Change to using the qemu_clock_get_ns()
 - Follow through of new approach to mailbox handlers from patch 5.

 hw/cxl/cxl-mailbox-utils.c  | 44 +
 include/hw/cxl/cxl_device.h |  6 +
 2 files changed, 50 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 8aa1b1e525..258285ab03 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -44,6 +44,9 @@ enum {
 #define CLEAR_RECORDS   0x1
 #define GET_INTERRUPT_POLICY   0x2
 #define SET_INTERRUPT_POLICY   0x3
+TIMESTAMP   = 0x03,
+#define GET   0x0
+#define SET   0x1
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -106,9 +109,48 @@ DEFINE_MAILBOX_HANDLER_NOP(events_clear_records);
 DEFINE_MAILBOX_HANDLER_ZEROED(events_get_interrupt_policy, 4);
 DEFINE_MAILBOX_HANDLER_NOP(events_set_interrupt_policy);
 
+/* 8.2.9.3.1 */
+static ret_code cmd_timestamp_get(struct cxl_cmd *cmd,
+  CXLDeviceState *cxl_dstate,
+  uint16_t *len)
+{
+uint64_t time, delta;
+
+if (!cxl_dstate->timestamp.set) {
+*(uint64_t *)cmd->payload = 0;
+goto done;
+}
+
+/* First find the delta from the last time the host set the time. */
+time = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+delta = time - cxl_dstate->timestamp.last_set;
+
+/* Then adjust the actual time */
+stq_le_p(cmd->payload, cxl_dstate->timestamp.host_set + delta);
+
+done:
+*len = 8;
+return CXL_MBOX_SUCCESS;
+}
+
+/* 8.2.9.3.2 */
+static ret_code cmd_timestamp_set(struct cxl_cmd *cmd,
+  CXLDeviceState *cxl_dstate,
+  uint16_t *len)
+{
+cxl_dstate->timestamp.set = true;
+cxl_dstate->timestamp.last_set = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+
+cxl_dstate->timestamp.host_set = le64_to_cpu(*(uint64_t *)cmd->payload);
+
+*len = 0;
+return CXL_MBOX_SUCCESS;
+}
+
 static QemuUUID cel_uuid;
 
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
+#define IMMEDIATE_POLICY_CHANGE (1 << 3)
 #define IMMEDIATE_LOG_CHANGE (1 << 4)
 
 static struct cxl_cmd cxl_cmd_set[256][256] = {
@@ -120,6 +162,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 cmd_events_get_interrupt_policy, 0, 0 },
 [EVENTS][SET_INTERRUPT_POLICY] = { "EVENTS_SET_INTERRUPT_POLICY",
 cmd_events_set_interrupt_policy, 4, IMMEDIATE_CONFIG_CHANGE },
+[TIMESTAMP][GET] = { "TIMESTAMP_GET", cmd_timestamp_get, 0, 0 },
+[TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set, 8, 
IMMEDIATE_POLICY_CHANGE },
 };
 
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 7fd8d0f616..8102d2a813 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -117,6 +117,12 @@ typedef struct cxl_device_state {
 size_t cel_size;
 };
 
+struct {
+bool set;
+uint64_t last_set;
+uint64_t host_set;
+} timestamp;
+
 /* memory region for persistent memory, HDM */
 uint64_t pmem_size;
 } CXLDeviceState;
-- 
2.32.0




Re: [PATCH 13/20] tcg/i386: Support avx512vbmi2 vector shift-double instructions

2022-02-02 Thread Alex Bennée


Richard Henderson  writes:

> We will use VPSHLD, VPSHLDV and VPSHRDV for 16-bit rotates.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée



[PATCH v5 16/43] tests/acpi: Add update DSDT.viot

2022-02-02 Thread Jonathan Cameron via
From: Jonathan Cameron 

The consolidation of DSDT AML generation for PCI host bridges
led to some minor ordering changes and the addition of _ADR
with a default of 0 for those cases that didn't already have it.
Only the DSDT.viot test is affected.

Changes all similar to:

Scope (\_SB)
 {
   Device (PC30)
   {
-Name (_UID, 0x30)  // _UID: Unique ID
 Name (_BBN, 0x30)  // _BBN: BIOS Bus Number
 Name (_HID, EisaId ("PNP0A08") /* PCI Express Bus */)  // _HID: 
Hardware ID
 Name (_CID, EisaId ("PNP0A03") /* PCI Bus */)  // _CID: Compatible ID
+Name (_ADR, Zero)  // _ADR: Address
+Name (_UID, 0x30)  // _UID: Unique ID
 Method (_OSC, 4, NotSerialized)  // _OSC: Operating System Capabilities

Signed-off-by: Jonathan Cameron 
---
 tests/data/acpi/q35/DSDT.viot   | Bin 9398 -> 9416 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |   1 -
 2 files changed, 1 deletion(-)

diff --git a/tests/data/acpi/q35/DSDT.viot b/tests/data/acpi/q35/DSDT.viot
index 
1c3b4da5cbe81ecab5e1ef50d383b561c5e0f55f..207ac5b9ae4c3a4bc0094c2242d1a1b08771b784
 100644
GIT binary patch
delta 139
zcmdnydBT&+CDWlVjy%CeC%7
z+^Kj^(SX5#0jQdxl0g7Ptr1kM!sPw((lEse3<_8k8$uNeOjb|?Dc;

[PATCH v5 10/43] hw/cxl/device: Add log commands (8.2.9.4) + CEL

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

The CXL specification provides for the ability to obtain logs from the
device. Logs are either spec-defined, like the "Command Effects Log"
(CEL), or vendor specific. UUIDs are defined for all log types.

The CEL is a mechanism to provide information to the host about which
commands are supported. It is useful both to determine which of the
spec's optional commands are supported and to provide the list of
vendor-specific commands that might be used. The CEL is already created
as part of mailbox initialization, but here it is now exported to hosts
that use these log commands.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
v5: Results of Alex's review.
 - Follow through on v5 removal of mailbox handler related macros.
   It was this patch where Alex highlighted the need to make that
   change.
   
 hw/cxl/cxl-mailbox-utils.c | 69 ++
 1 file changed, 69 insertions(+)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 258285ab03..16bb998735 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -47,6 +47,9 @@ enum {
 TIMESTAMP   = 0x03,
 #define GET   0x0
 #define SET   0x1
+LOGS= 0x04,
+#define GET_SUPPORTED 0x0
+#define GET_LOG   0x1
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -149,6 +152,70 @@ static ret_code cmd_timestamp_set(struct cxl_cmd *cmd,
 
 static QemuUUID cel_uuid;
 
+/* 8.2.9.4.1 */
+static ret_code cmd_logs_get_supported(struct cxl_cmd *cmd,
+   CXLDeviceState *cxl_dstate,
+   uint16_t *len)
+{
+struct {
+uint16_t entries;
+uint8_t rsvd[6];
+struct {
+QemuUUID uuid;
+uint32_t size;
+} log_entries[1];
+} __attribute__((packed)) *supported_logs = (void *)cmd->payload;
+_Static_assert(sizeof(*supported_logs) == 0x1c, "Bad supported log size");
+
+supported_logs->entries = 1;
+supported_logs->log_entries[0].uuid = cel_uuid;
+supported_logs->log_entries[0].size = 4 * cxl_dstate->cel_size;
+
+*len = sizeof(*supported_logs);
+return CXL_MBOX_SUCCESS;
+}
+
+/* 8.2.9.4.2 */
+static ret_code cmd_logs_get_log(struct cxl_cmd *cmd,
+ CXLDeviceState *cxl_dstate,
+ uint16_t *len)
+{
+struct {
+QemuUUID uuid;
+uint32_t offset;
+uint32_t length;
+} __attribute__((packed, __aligned__(16))) *get_log = (void *)cmd->payload;
+
+/*
+ * 8.2.9.4.2
+ *   The device shall return Invalid Parameter if the Offset or Length
+ *   fields attempt to access beyond the size of the log as reported by Get
+ *   Supported Logs.
+ *
+ * XXX: Spec is wrong, "Invalid Parameter" isn't a thing.
+ * XXX: Spec doesn't address an incorrect UUID.
+ *
+ * The CEL buffer is large enough to fit all commands in the emulation, so
+ * the only possible failure would be if the mailbox itself isn't big
+ * enough.
+ */
+if (get_log->offset + get_log->length > cxl_dstate->payload_size) {
+return CXL_MBOX_INVALID_INPUT;
+}
+
+if (!qemu_uuid_is_equal(&get_log->uuid, &cel_uuid)) {
+return CXL_MBOX_UNSUPPORTED;
+}
+
+/* Store off everything to local variables so we can wipe out the payload */
+*len = get_log->length;
+
+memmove(cmd->payload, cxl_dstate->cel_log + get_log->offset,
+   get_log->length);
+
+return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
 #define IMMEDIATE_LOG_CHANGE (1 << 4)
@@ -164,6 +231,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 cmd_events_set_interrupt_policy, 4, IMMEDIATE_CONFIG_CHANGE },
 [TIMESTAMP][GET] = { "TIMESTAMP_GET", cmd_timestamp_get, 0, 0 },
[TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set, 8, IMMEDIATE_POLICY_CHANGE },
+[LOGS][GET_SUPPORTED] = { "LOGS_GET_SUPPORTED", cmd_logs_get_supported, 0, 0 },
+[LOGS][GET_LOG] = { "LOGS_GET_LOG", cmd_logs_get_log, 0x18, 0 },
 };
 
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
-- 
2.32.0




[PATCH v5 28/43] hw/cxl/component: Add utils for interleave parameter encoding/decoding

2022-02-02 Thread Jonathan Cameron via
From: Jonathan Cameron 

Both registers and the CFMWS entries in the CEDT use simple encodings
for the number of interleave ways and the interleave granularity.
Introduce simple conversion functions to/from the unencoded
number / size.  So far the iw decode has not been needed, so it is
not implemented.

Signed-off-by: Jonathan Cameron 
---
 hw/cxl/cxl-component-utils.c   | 34 ++
 include/hw/cxl/cxl_component.h |  8 
 2 files changed, 42 insertions(+)

diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
index 07297b3bbe..795dbc7561 100644
--- a/hw/cxl/cxl-component-utils.c
+++ b/hw/cxl/cxl-component-utils.c
@@ -9,6 +9,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/log.h"
+#include "qapi/error.h"
 #include "hw/pci/pci.h"
 #include "hw/cxl/cxl.h"
 
@@ -217,3 +218,36 @@ void cxl_component_create_dvsec(CXLComponentState *cxl, uint16_t length,
 range_init_nofail(&cxl->dvsecs[type], cxl->dvsec_offset, length);
 cxl->dvsec_offset += length;
 }
+
+uint8_t cxl_interleave_ways_enc(int iw, Error **errp)
+{
+switch (iw) {
+case 1: return 0x0;
+case 2: return 0x1;
+case 4: return 0x2;
+case 8: return 0x3;
+case 16: return 0x4;
+case 3: return 0x8;
+case 6: return 0x9;
+case 12: return 0xa;
+default:
+error_setg(errp, "Interleave ways: %d not supported", iw);
+return 0;
+}
+}
+
+uint8_t cxl_interleave_granularity_enc(uint64_t gran, Error **errp)
+{
+switch (gran) {
+case 256: return 0;
+case 512: return 1;
+case 1024: return 2;
+case 2048: return 3;
+case 4096: return 4;
+case 8192: return 5;
+case 16384: return 6;
+default:
+error_setg(errp, "Interleave granularity: %" PRIu64 " invalid", gran);
+return 0;
+}
+}
diff --git a/include/hw/cxl/cxl_component.h b/include/hw/cxl/cxl_component.h
index 33aeab9b99..42cd140f75 100644
--- a/include/hw/cxl/cxl_component.h
+++ b/include/hw/cxl/cxl_component.h
@@ -193,4 +193,12 @@ void cxl_component_register_init_common(uint32_t *reg_state,
 void cxl_component_create_dvsec(CXLComponentState *cxl_cstate, uint16_t length,
 uint16_t type, uint8_t rev, uint8_t *body);
 
+uint8_t cxl_interleave_ways_enc(int iw, Error **errp);
+uint8_t cxl_interleave_granularity_enc(uint64_t gran, Error **errp);
+
+static inline hwaddr cxl_decode_ig(int ig)
+{
+return 1 << (ig + 8);
+}
+
 #endif
-- 
2.32.0




[PATCH v5 37/43] hw/arm/virt: Basic CXL enablement on pci_expander_bridge instances pxb-cxl

2022-02-02 Thread Jonathan Cameron via
Code based on i386/pc enablement.
The memory layout places space for 16 host bridge register regions after
the GIC_REDIST2 in the extended memmap.
The CFMWs are placed above the extended memmap.

Signed-off-by: Jonathan Cameron 
Signed-off-by: Ben Widawsky 
---
 hw/arm/virt-acpi-build.c | 30 ++
 hw/arm/virt.c| 40 +++-
 include/hw/arm/virt.h|  1 +
 3 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 449fab0080..865709156a 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -39,6 +39,7 @@
 #include "hw/acpi/aml-build.h"
 #include "hw/acpi/utils.h"
 #include "hw/acpi/pci.h"
+#include "hw/acpi/cxl.h"
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/acpi/generic_event_device.h"
 #include "hw/acpi/tpm.h"
@@ -157,10 +158,29 @@ static void acpi_dsdt_add_virtio(Aml *scope,
 }
 }
 
+/* Uses local definition of AcpiBuildState so can't easily be common code */
+static void build_acpi0017(Aml *table)
+{
+Aml *dev, *scope, *method;
+
+scope =  aml_scope("_SB");
+dev = aml_device("CXLM");
+aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0017")));
+
+method = aml_method("_STA", 0, AML_NOTSERIALIZED);
+aml_append(method, aml_return(aml_int(0x01)));
+aml_append(dev, method);
+
+aml_append(scope, dev);
+aml_append(table, scope);
+}
+
 static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
   uint32_t irq, VirtMachineState *vms)
 {
 int ecam_id = VIRT_ECAM_ID(vms->highmem_ecam);
+bool cxl_present = false;
+PCIBus *bus = vms->bus;
 struct GPEXConfig cfg = {
 .mmio32 = memmap[VIRT_PCIE_MMIO],
 .pio= memmap[VIRT_PCIE_PIO],
@@ -174,6 +194,14 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
 }
 
 acpi_dsdt_add_gpex(scope, &cfg);
+QLIST_FOREACH(bus, &vms->bus->child, sibling) {
+if (pci_bus_is_cxl(bus)) {
+cxl_present = true;
+}
+}
+if (cxl_present) {
+build_acpi0017(scope);
+}
 }
 
 static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
@@ -991,6 +1019,8 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
vms->oem_table_id);
 }
 }
+cxl_build_cedt(ms, table_offsets, tables_blob, tables->linker,
+   vms->oem_id, vms->oem_table_id);
 
 if (ms->nvdimms_state->is_enabled) {
 nvdimm_build_acpi(table_offsets, tables_blob, tables->linker,
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 2b6cc7aa9e..b59e470ae4 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -78,6 +78,7 @@
 #include "hw/virtio/virtio-mem-pci.h"
 #include "hw/virtio/virtio-iommu.h"
 #include "hw/char/pl011.h"
+#include "hw/cxl/cxl.h"
 #include "qemu/guest-random.h"
 
 #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
@@ -178,6 +179,7 @@ static const MemMapEntry base_memmap[] = {
 static MemMapEntry extended_memmap[] = {
 /* Additional 64 MB redist region (can contain up to 512 redistributors) */
 [VIRT_HIGH_GIC_REDIST2] =   { 0x0, 64 * MiB },
+[VIRT_CXL_HOST] =   { 0x0, 64 * KiB * 16 }, /* 16 UID */
 [VIRT_HIGH_PCIE_ECAM] = { 0x0, 256 * MiB },
 /* Second PCIe window */
 [VIRT_HIGH_PCIE_MMIO] = { 0x0, 512 * GiB },
@@ -1508,6 +1510,17 @@ static void create_pcie(VirtMachineState *vms)
 }
 }
 
+static void create_cxl_host_reg_region(VirtMachineState *vms)
+{
+MemoryRegion *sysmem = get_system_memory();
+MachineState *ms = MACHINE(vms);
+MemoryRegion *mr = &ms->cxl_devices_state->host_mr;
+
+memory_region_init(mr, OBJECT(ms), "cxl_host_reg",
+   vms->memmap[VIRT_CXL_HOST].size);
+memory_region_add_subregion(sysmem, vms->memmap[VIRT_CXL_HOST].base, mr);
+}
+
 static void create_platform_bus(VirtMachineState *vms)
 {
 DeviceState *dev;
@@ -1670,7 +1683,7 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
 static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
 {
 MachineState *ms = MACHINE(vms);
-hwaddr base, device_memory_base, device_memory_size, memtop;
+hwaddr base, device_memory_base, device_memory_size, memtop, cxl_fmw_base;
 int i;
 
 vms->memmap = extended_memmap;
@@ -1762,6 +1775,20 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
 memory_region_init(&ms->device_memory->mr, OBJECT(vms),
"device-memory", device_memory_size);
 }
+
+if (ms->cxl_devices_state->fixed_windows) {
+GList *it;
+
+cxl_fmw_base = ROUND_UP(base, 256 * MiB);
+for (it = ms->cxl_devices_state->fixed_windows; it; it = it->next) {
+CXLFixedWindow *fw = it->data;
+
+fw->base = cxl_fmw_base;
+memory_region_init_io(&fw->mr, OBJECT(vms), &cfmws_ops, fw,
+ 

[PATCH v5 29/43] hw/cxl/host: Add support for CXL Fixed Memory Windows.

2022-02-02 Thread Jonathan Cameron via
From: Jonathan Cameron 

The concept of these is introduced in [1] in terms of the
description of the CEDT ACPI table. The principle is more general.
Unlike the routing once traffic hits the CXL root bridges, the host
system memory address routing is implementation defined and
effectively static once observable by standard / generic system
software. Each CXL Fixed Memory Window (CFMW) is a region of PA
space which has fixed, system-dependent routing configured so that
accesses can be routed to the CXL devices below a set of target
root bridges. The accesses may be interleaved across multiple
root bridges.

For QEMU we could have fully specified these regions in terms
of a base PA + size, but as the absolute address does not matter
it is simpler to let individual platforms place the memory regions.

Examples:
-cxl-fixed-memory-window targets=cxl.0,size=128G
-cxl-fixed-memory-window targets=cxl.1,size=128G
-cxl-fixed-memory-window targets=cxl.0,targets=cxl.1,size=256G,interleave-granularity=2k

Specifies
* 2x 128G regions not interleaved across root bridges, one for each of
  the root bridges with ids cxl.0 and cxl.1
* 256G region interleaved across root bridges with ids cxl.0 and cxl.1
with a 2k interleave granularity.

When system software enumerates the devices below a given root bridge
it can then decide which CFMW to use. If non-interleaved operation is
desired (or all that is possible) it can use the appropriate CFMW for
the root bridge in question.  If there are suitable devices to
interleave across the two root bridges then it may use the 3rd CFMW.

A number of other designs were considered but the following constraints
made it hard to adapt existing QEMU approaches to this particular problem.
1) The size must be known before a specific architecture / board brings
   up its PA memory map.  We need to set up an appropriate region.
2) Using links to the host bridges provides a clean command line interface
   but these links cannot be established until command line devices have
   been added.

Hence the two-step process used here: first establish the size,
interleave-ways and granularity, and cache the ids of the host bridges;
then, once they are available, find the actual host bridges so they can
be used later to support interleave decoding.

[1] CXL 2.0 ECN: CEDT CFMWS & QTG DSM (computeexpresslink.org / specifications)

Signed-off-by: Jonathan Cameron 
---
v5:
Build fix as suggested by Alex to move this from specific_ss to softmmu_ss.

 hw/cxl/cxl-host-stubs.c |  22 +++
 hw/cxl/cxl-host.c   | 138 
 hw/cxl/meson.build  |   6 ++
 include/hw/cxl/cxl.h|  20 ++
 qapi/machine.json   |  15 +
 qemu-options.hx |  37 +++
 softmmu/vl.c|  11 
 7 files changed, 249 insertions(+)
 create mode 100644 hw/cxl/cxl-host-stubs.c
 create mode 100644 hw/cxl/cxl-host.c

diff --git a/hw/cxl/cxl-host-stubs.c b/hw/cxl/cxl-host-stubs.c
new file mode 100644
index 00..f942dda41b
--- /dev/null
+++ b/hw/cxl/cxl-host-stubs.c
@@ -0,0 +1,22 @@
+/*
+ * CXL host parameter parsing routine stubs
+ *
+ * Copyright (c) 2022 Huawei
+ */
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/option.h"
+#include "hw/cxl/cxl.h"
+
+QemuOptsList qemu_cxl_fixed_window_opts = {
+.name = "cxl-fixed-memory-window",
+.implied_opt_name = "type",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_cxl_fixed_window_opts.head),
+.desc = { { 0 } }
+};
+
+void parse_cxl_fixed_memory_window_opts(MachineState *ms) {};
+
+void cxl_fixed_memory_window_link_targets(Error **errp) {};
+
+const MemoryRegionOps cfmws_ops;
diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c
new file mode 100644
index 00..9f303e6d8e
--- /dev/null
+++ b/hw/cxl/cxl-host.c
@@ -0,0 +1,138 @@
+/*
+ * CXL host parameter parsing routines
+ *
+ * Copyright (c) 2022 Huawei
+ * Modeled loosely on the NUMA options handling in hw/core/numa.c
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qemu/bitmap.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "sysemu/qtest.h"
+#include "hw/boards.h"
+
+#include "qapi/opts-visitor.h"
+#include "qapi/qapi-visit-machine.h"
+#include "qemu/option.h"
+#include "hw/cxl/cxl.h"
+#include "hw/pci/pci_bus.h"
+#include "hw/pci/pci_bridge.h"
+#include "hw/pci/pci_host.h"
+#include "hw/pci/pcie_port.h"
+
+QemuOptsList qemu_cxl_fixed_window_opts = {
+.name = "cxl-fixed-memory-window",
+.implied_opt_name = "type",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_cxl_fixed_window_opts.head),
+.desc = { { 0 } }
+};
+
+static void set_cxl_fixed_memory_window_options(MachineState *ms,
+CXLFixedMemoryWindowOptions *object,
+Error **errp)
+{
+CXLFixedWindow *fw = g_malloc0(sizeof(*fw));
+strList *target;
+int i;
+
+for (target = object->targets; target; target = target->next) {
+fw->num_targets++;
+}
+
+fw->enc_int_

[PATCH v5 38/43] RFC: softmmu/memory: Add ops to memory_region_ram_init_from_file

2022-02-02 Thread Jonathan Cameron via
From: Jonathan Cameron 

In order to implement memory interleaving we need a means to proxy
the calls. Adding mem_ops allows such proxying.

Note: this should have no impact on use cases not using _dispatch_read/write.
For now, only file-backed hostmem is considered, to seek feedback on
the approach before considering other hostmem backends.

Signed-off-by: Jonathan Cameron 
---
 softmmu/memory.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index 678dc62f06..d537091c63 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1606,6 +1606,15 @@ void memory_region_init_ram_from_file(MemoryRegion *mr,
 Error *err = NULL;
 memory_region_init(mr, owner, name, size);
 mr->ram = true;
+
+/*
+ * ops used only when directly accessing via
+ * - memory_region_dispatch_read()
+ * - memory_region_dispatch_write()
+ */
+mr->ops = &ram_device_mem_ops;
+mr->opaque = mr;
+
 mr->readonly = readonly;
 mr->terminates = true;
 mr->destructor = memory_region_destructor_ram;
-- 
2.32.0




[PATCH v5 21/43] hw/cxl/device: Implement MMIO HDM decoding (8.2.5.12)

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

A device's volatile and persistent memory are known as Host-managed
Device Memory (HDM) regions. The mechanism by which the device is
programmed to claim
the addresses associated with those regions is through dedicated logic
known as the HDM decoder. In order to allow the OS to properly program
the HDMs, the HDM decoders must be modeled.

There are two ways the HDM decoders can be implemented, the legacy
mechanism is through the PCIe DVSEC programming from CXL 1.1 (8.1.3.8),
and MMIO is found in 8.2.5.12 of the spec. For now, 8.1.3.8 is not
implemented.

Much of CXL device logic is implemented in cxl-utils. The HDM decoder
however is implemented directly by the device implementation.
Whilst the implementation currently does no validity checks on the
encoder set up, future work will add sanity checking specific to
the type of cxl component.

Signed-off-by: Ben Widawsky 
Co-developed-by: Jonathan Cameron 
Signed-off-by: Jonathan Cameron 
---
 hw/mem/cxl_type3.c | 54 ++
 1 file changed, 54 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index c4021d2434..da091157f2 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -61,6 +61,56 @@ static void build_dvsecs(CXLType3Dev *ct3d)
REG_LOC_DVSEC_REVID, dvsec);
 }
 
+static void hdm_decoder_commit(CXLType3Dev *ct3d, int which)
+{
+ComponentRegisters *cregs = &ct3d->cxl_cstate.crb;
+uint32_t *cache_mem = cregs->cache_mem_registers;
+
+assert(which == 0);
+
+/* TODO: Sanity checks that the decoder is possible */
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, COMMIT, 0);
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, ERR, 0);
+
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, COMMITTED, 1);
+}
+
+static void ct3d_reg_write(void *opaque, hwaddr offset, uint64_t value,
+   unsigned size)
+{
+CXLComponentState *cxl_cstate = opaque;
+ComponentRegisters *cregs = &cxl_cstate->crb;
+CXLType3Dev *ct3d = container_of(cxl_cstate, CXLType3Dev, cxl_cstate);
+uint32_t *cache_mem = cregs->cache_mem_registers;
+bool should_commit = false;
+int which_hdm = -1;
+
+assert(size == 4);
+
+switch (offset) {
+case A_CXL_HDM_DECODER0_CTRL:
+should_commit = FIELD_EX32(value, CXL_HDM_DECODER0_CTRL, COMMIT);
+which_hdm = 0;
+break;
+default:
+break;
+}
+
+stl_le_p((uint8_t *)cache_mem + offset, value);
+if (should_commit) {
+hdm_decoder_commit(ct3d, which_hdm);
+}
+}
+
+static void ct3_finalize(Object *obj)
+{
+CXLType3Dev *ct3d = CT3(obj);
+CXLComponentState *cxl_cstate = &ct3d->cxl_cstate;
+ComponentRegisters *regs = &cxl_cstate->crb;
+
+g_free((void *)regs->special_ops);
+}
+
 static void cxl_setup_memory(CXLType3Dev *ct3d, Error **errp)
 {
 MemoryRegion *mr;
@@ -103,6 +153,9 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
 ct3d->cxl_cstate.pdev = pci_dev;
 build_dvsecs(ct3d);
 
+regs->special_ops = g_new0(MemoryRegionOps, 1);
+regs->special_ops->write = ct3d_reg_write;
+
 cxl_component_register_block_init(OBJECT(pci_dev), cxl_cstate,
   TYPE_CXL_TYPE3_DEV);
 
@@ -155,6 +208,7 @@ static const TypeInfo ct3d_info = {
 .parent = TYPE_PCI_DEVICE,
 .class_init = ct3_class_init,
 .instance_size = sizeof(CXLType3Dev),
+.instance_finalize = ct3_finalize,
 .interfaces = (InterfaceInfo[]) {
 { INTERFACE_CXL_DEVICE },
 { INTERFACE_PCIE_DEVICE },
-- 
2.32.0




[PATCH v5 17/43] cxl: Machine level control on whether CXL support is enabled

2022-02-02 Thread Jonathan Cameron via
From: Jonathan Cameron 

There are going to be some potential overheads to CXL enablement,
for example the host bridge region reserved in memory maps.
Add a machine level control so that CXL is disabled by default.

Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
v5: From Alex review.
* Set default to false in machine_class_init to avoid
  having to do it in all the boards.

 hw/core/machine.c| 28 
 hw/i386/pc.c |  1 +
 include/hw/boards.h  |  2 ++
 include/hw/cxl/cxl.h |  4 
 4 files changed, 35 insertions(+)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index d856485cb4..6ff5dba64e 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -31,6 +31,7 @@
 #include "sysemu/qtest.h"
 #include "hw/pci/pci.h"
 #include "hw/mem/nvdimm.h"
+#include "hw/cxl/cxl.h"
 #include "migration/global_state.h"
 #include "migration/vmstate.h"
 #include "exec/confidential-guest-support.h"
@@ -545,6 +546,20 @@ static void machine_set_nvdimm_persistence(Object *obj, const char *value,
 nvdimms_state->persistence_string = g_strdup(value);
 }
 
+static bool machine_get_cxl(Object *obj, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+return ms->cxl_devices_state->is_enabled;
+}
+
+static void machine_set_cxl(Object *obj, bool value, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+ms->cxl_devices_state->is_enabled = value;
+}
+
 void machine_class_allow_dynamic_sysbus_dev(MachineClass *mc, const char *type)
 {
 QAPI_LIST_PREPEND(mc->allowed_dynamic_sysbus_devices, g_strdup(type));
@@ -777,6 +792,8 @@ static void machine_class_init(ObjectClass *oc, void *data)
 mc->default_ram_size = 128 * MiB;
 mc->rom_file_has_mr = true;
 
+/* Few machines support CXL, so default to off */
+mc->cxl_supported = false;
 /* numa node memory size aligned on 8MB by default.
  * On Linux, each node's border has to be 8MB aligned
  */
@@ -922,6 +939,16 @@ static void machine_initfn(Object *obj)
 "Valid values are cpu, mem-ctrl");
 }
 
+if (mc->cxl_supported) {
+Object *obj = OBJECT(ms);
+
+ms->cxl_devices_state = g_new0(CXLState, 1);
+object_property_add_bool(obj, "cxl", machine_get_cxl, machine_set_cxl);
+object_property_set_description(obj, "cxl",
+"Set on/off to enable/disable "
+"CXL instantiation");
+}
+
 if (mc->cpu_index_to_instance_props && mc->get_default_cpu_node_id) {
 ms->numa_state = g_new0(NumaState, 1);
 object_property_add_bool(obj, "hmat",
@@ -956,6 +983,7 @@ static void machine_finalize(Object *obj)
 g_free(ms->device_memory);
 g_free(ms->nvdimms_state);
 g_free(ms->numa_state);
+g_free(ms->cxl_devices_state);
 }
 
 bool machine_usb(MachineState *machine)
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index c8696ac01e..b6800a511a 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1739,6 +1739,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
 mc->default_cpu_type = TARGET_DEFAULT_CPU_TYPE;
 mc->nvdimm_supported = true;
 mc->smp_props.dies_supported = true;
+mc->cxl_supported = true;
 mc->default_ram_id = "pc.ram";
 
 object_class_property_add(oc, PC_MACHINE_MAX_RAM_BELOW_4G, "size",
diff --git a/include/hw/boards.h b/include/hw/boards.h
index c92ac8815c..680718dafc 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -269,6 +269,7 @@ struct MachineClass {
 bool ignore_boot_device_suffixes;
 bool smbus_no_migration_support;
 bool nvdimm_supported;
+bool cxl_supported;
 bool numa_mem_supported;
 bool auto_enable_numa;
 SMPCompatProps smp_props;
@@ -360,6 +361,7 @@ struct MachineState {
 CPUArchIdList *possible_cpus;
 CpuTopology smp;
 struct NVDIMMState *nvdimms_state;
+struct CXLState *cxl_devices_state;
 struct NumaState *numa_state;
 };
 
diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
index 554ad93b6b..31af92fd5e 100644
--- a/include/hw/cxl/cxl.h
+++ b/include/hw/cxl/cxl.h
@@ -17,4 +17,8 @@
 #define CXL_COMPONENT_REG_BAR_IDX 0
 #define CXL_DEVICE_REG_BAR_IDX 2
 
+typedef struct CXLState {
+bool is_enabled;
+} CXLState;
+
 #endif
-- 
2.32.0




[PATCH v5 30/43] acpi/cxl: Introduce CFMWS structures in CEDT

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

The CEDT CXL Fixed Memory Window Structures (CFMWS)
define regions of the host physical address map which
(via implementation-defined means) are configured such that they have
a particular interleave setup across one or more CXL Host Bridges.

Reported-by: Alison Schofield 
Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
 hw/acpi/cxl.c | 59 +++
 1 file changed, 59 insertions(+)

diff --git a/hw/acpi/cxl.c b/hw/acpi/cxl.c
index 442f836a3e..50efc7f690 100644
--- a/hw/acpi/cxl.c
+++ b/hw/acpi/cxl.c
@@ -60,6 +60,64 @@ static void cedt_build_chbs(GArray *table_data, PXBDev *cxl)
 build_append_int_noprefix(table_data, memory_region_size(mr), 8);
 }
 
+/*
+ * CFMWS entries in CXL 2.0 ECN: CEDT CFMWS & QTG _DSM.
+ * Interleave ways encoding in CXL 2.0 ECN: 3, 6, 12 and 16-way memory
+ * interleaving.
+ */
+static void cedt_build_cfmws(GArray *table_data, MachineState *ms)
+{
+CXLState *cxls = ms->cxl_devices_state;
+GList *it;
+
+for (it = cxls->fixed_windows; it; it = it->next) {
+CXLFixedWindow *fw = it->data;
+int i;
+
+/* Type */
+build_append_int_noprefix(table_data, 1, 1);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 1);
+
+/* Record Length */
+build_append_int_noprefix(table_data, 36 + 4 * fw->num_targets, 2);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 4);
+
+/* Base HPA */
+build_append_int_noprefix(table_data, fw->mr.addr, 8);
+
+/* Window Size */
+build_append_int_noprefix(table_data, fw->size, 8);
+
+/* Host Bridge Interleave Ways */
+build_append_int_noprefix(table_data, fw->enc_int_ways, 1);
+
+/* Host Bridge Interleave Arithmetic */
+build_append_int_noprefix(table_data, 0, 1);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 2);
+
+/* Host Bridge Interleave Granularity */
+build_append_int_noprefix(table_data, fw->enc_int_gran, 4);
+
+/* Window Restrictions */
+build_append_int_noprefix(table_data, 0x0f, 2); /* No restrictions */
+
+/* QTG ID */
+build_append_int_noprefix(table_data, 0, 2);
+
+/* Host Bridge List (list of UIDs - currently bus_nr) */
+for (i = 0; i < fw->num_targets; i++) {
+g_assert(fw->target_hbs[i]);
+build_append_int_noprefix(table_data, fw->target_hbs[i]->bus_nr, 4);
+}
+}
+}
+
 static int cxl_foreach_pxb_hb(Object *obj, void *opaque)
 {
 Aml *cedt = opaque;
@@ -86,6 +144,7 @@ void cxl_build_cedt(MachineState *ms, GArray *table_offsets, GArray *table_data,
 /* reserve space for CEDT header */
 
object_child_foreach_recursive(object_get_root(), cxl_foreach_pxb_hb, cedt);
+cedt_build_cfmws(cedt->buf, ms);
 
 /* copy AML table into ACPI tables blob and patch header there */
 g_array_append_vals(table_data, cedt->buf->data, cedt->buf->len);
-- 
2.32.0




[PATCH v5 22/43] acpi/cxl: Add _OSC implementation (9.14.2)

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

CXL 2.0 specification adds 2 new dwords to the existing _OSC definition
from PCIe. The new dwords are accessed with a new uuid. This
implementation supports what is in the specification.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
v5: Fix for issue seen in patch 31.
 - Introduce stubs as the gpex pxb code is compiled on mips machines.
 
 hw/acpi/Kconfig   |   5 ++
 hw/acpi/cxl-stub.c|  12 +
 hw/acpi/cxl.c | 104 ++
 hw/acpi/meson.build   |   4 +-
 hw/i386/acpi-build.c  |  14 +-
 include/hw/acpi/cxl.h |  23 ++
 6 files changed, 160 insertions(+), 2 deletions(-)
 create mode 100644 hw/acpi/cxl-stub.c
 create mode 100644 hw/acpi/cxl.c
 create mode 100644 include/hw/acpi/cxl.h

diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index 622b0b50b7..76cafca652 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -5,6 +5,7 @@ config ACPI_X86
 bool
 select ACPI
 select ACPI_NVDIMM
+select ACPI_CXL
 select ACPI_CPU_HOTPLUG
 select ACPI_MEMORY_HOTPLUG
 select ACPI_HMAT
@@ -60,3 +61,7 @@ config ACPI_HW_REDUCED
 select ACPI
 select ACPI_MEMORY_HOTPLUG
 select ACPI_NVDIMM
+
+config ACPI_CXL
+bool
+depends on ACPI
diff --git a/hw/acpi/cxl-stub.c b/hw/acpi/cxl-stub.c
new file mode 100644
index 00..15bc21076b
--- /dev/null
+++ b/hw/acpi/cxl-stub.c
@@ -0,0 +1,12 @@
+
+/*
+ * Stubs for ACPI platforms that don't support CXL
+ */
+#include "qemu/osdep.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/acpi/cxl.h"
+
+void build_cxl_osc_method(Aml *dev)
+{
+g_assert_not_reached();
+}
diff --git a/hw/acpi/cxl.c b/hw/acpi/cxl.c
new file mode 100644
index 00..7124d5a1a3
--- /dev/null
+++ b/hw/acpi/cxl.c
@@ -0,0 +1,104 @@
+/*
+ * CXL ACPI Implementation
+ *
+ * Copyright(C) 2020 Intel Corporation.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include "qemu/osdep.h"
+#include "hw/cxl/cxl.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/acpi/bios-linker-loader.h"
+#include "hw/acpi/cxl.h"
+#include "qapi/error.h"
+#include "qemu/uuid.h"
+
+static Aml *__build_cxl_osc_method(void)
+{
+Aml *method, *if_uuid, *else_uuid, *if_arg1_not_1, *if_cxl, *if_caps_masked;
+Aml *a_ctrl = aml_local(0);
+Aml *a_cdw1 = aml_name("CDW1");
+
+method = aml_method("_OSC", 4, AML_NOTSERIALIZED);
+aml_append(method, aml_create_dword_field(aml_arg(3), aml_int(0), "CDW1"));
+
+/* 9.14.2.1.4 */
+if_uuid = aml_if(
+aml_lor(aml_equal(aml_arg(0),
+  aml_touuid("33DB4D5B-1FF7-401C-9657-7441C03DD766")),
+aml_equal(aml_arg(0),
+  aml_touuid("68F2D50B-C469-4D8A-BD3D-941A103FD3FC"))));
+aml_append(if_uuid, aml_create_dword_field(aml_arg(3), aml_int(4), "CDW2"));
+aml_append(if_uuid, aml_create_dword_field(aml_arg(3), aml_int(8), "CDW3"));
+
+aml_append(if_uuid, aml_store(aml_name("CDW3"), a_ctrl));
+
+/* This is all the same as what's used for PCIe */
+aml_append(if_uuid,
+   aml_and(aml_name("CTRL"), aml_int(0x1F), aml_name("CTRL")));
+
+if_arg1_not_1 = aml_if(aml_lnot(aml_equal(aml_arg(1), aml_int(0x1))));
+/* Unknown revision */
+aml_append(if_arg1_not_1, aml_or(a_cdw1, aml_int(0x08), a_cdw1));
+aml_append(if_uuid, if_arg1_not_1);
+
+if_caps_masked = aml_if(aml_lnot(aml_equal(aml_name("CDW3"), a_ctrl)));
+/* Capability bits were masked */
+aml_append(if_caps_masked, aml_or(a_cdw1, aml_int(0x10), a_cdw1));
+aml_append(if_uuid, if_caps_masked);
+
+aml_append(if_uuid, aml_store(aml_name("CDW2"), aml_name("SUPP")));
+aml_append(if_uuid, aml_store(aml_name("CDW3"), aml_name("CTRL")));
+
+if_cxl = aml_if(aml_equal(
+aml_arg(0), aml_touuid("68F2D50B-C469-4D8A-BD3D-941A103FD3FC")));
+/* CXL support field */
+aml_append(if_cxl, aml_create_dword_field(aml_arg(3), aml_int(12), "CDW4"));
+/* CXL capabilities */
+aml_append(if_cxl, aml_create_dword_field(aml_arg(3), aml_int(16), "CDW5"));
+aml_append(if_cxl, aml_store(aml_name("CDW4"), aml_name("SUPC")));
+aml_append(if_cxl, aml_store(aml_name("CDW5"), aml_name("CTRC")));
+
+/* CXL 2.0 Port/Device Register access */
+aml_append(if_cxl,
+   aml_or(aml_name("CDW5"), 

[PATCH v5 43/43] scripts/device-crash-test: Add exception for pxb-cxl

2022-02-02 Thread Jonathan Cameron via
The CXL expander bridge has several requirements, but the one that
is checked first is that it is attached to a PCI Express bus,
not a PCI one, so document that.

Signed-off-by: Jonathan Cameron 
---
v5:
 New patch - should probably be pushed down to introduction of pxb-cxl.
 Will do that in v6
 
 scripts/device-crash-test | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/device-crash-test b/scripts/device-crash-test
index 7fbd99158b..52bd3d8f71 100755
--- a/scripts/device-crash-test
+++ b/scripts/device-crash-test
@@ -93,6 +93,7 @@ ERROR_RULE_LIST = [
{'device':'pci-bridge', 'expected':True},  # Bridge chassis not specified. Each bridge is required to be assigned a unique chassis id > 0.
{'device':'pci-bridge-seat', 'expected':True}, # Bridge chassis not specified. Each bridge is required to be assigned a unique chassis id > 0.
{'device':'pxb', 'expected':True}, # Bridge chassis not specified. Each bridge is required to be assigned a unique chassis id > 0.
+{'device':'pxb-cxl', 'expected':True}, # pxb-cxl devices cannot reside on a PCI bus.
{'device':'scsi-block', 'expected':True},  # drive property not set
{'device':'scsi-generic', 'expected':True},# drive property not set
{'device':'scsi-hd', 'expected':True}, # drive property not set
-- 
2.32.0




Re: [PULL 18/20] block/nbd: drop connection_co

2022-02-02 Thread Hanna Reitz

On 02.02.22 14:53, Eric Blake wrote:

On Wed, Feb 02, 2022 at 12:49:36PM +0100, Fabian Ebner wrote:

Am 27.09.21 um 23:55 schrieb Eric Blake:

From: Vladimir Sementsov-Ogievskiy 

OK, that's a big rewrite of the logic.

Pre-patch we have an always running coroutine - connection_co. It does
reply receiving and reconnecting. And it leads to a lot of difficult
and unobvious code around drained sections and context switches. We also
abuse the bs->in_flight counter, which is increased for connection_co and
temporarily decreased at points where we want to allow a drained section to
begin. One of these places is in another file: in nbd_read_eof() in
nbd/client.c.

We also cancel reconnect and requests waiting for reconnect on drained
begin which is not correct. And this patch fixes that.

Let's finally drop this always running coroutine and go another way:
do both reconnect and receiving in request coroutines.


Hi,

while updating our stack to 6.2, one of our live-migration tests stopped
working (backtrace is below) and bisecting led me to this patch.

The VM has a single qcow2 disk (converting to raw doesn't make a
difference) and the issue only appears when using iothread (for both
virtio-scsi-pci and virtio-block-pci).

Reverting 1af7737871fb3b66036f5e520acb0a98fc2605f7 (which lives on top)
and 4ddb5d2fde6f22b2cf65f314107e890a7ca14fcf (the commit corresponding
to this patch) in v6.2.0 makes the migration work again.

Backtrace:

Thread 1 (Thread 0x7f9d93458fc0 (LWP 56711) "kvm"):
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x7f9d9d6bc537 in __GI_abort () at abort.c:79
#2  0x7f9d9d6bc40f in __assert_fail_base (fmt=0x7f9d9d825128
"%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5579153763f8
"qemu_get_current_aio_context() == qemu_coroutine_get_aio_context(co)",
file=0x5579153764f9 "../io/channel.c", line=483, function=) at assert.c:92

Given that this assertion is about which aio context is set, I wonder
if the conversation at
https://lists.gnu.org/archive/html/qemu-devel/2022-02/msg00096.html is
relevant; if so, Vladimir may already be working on the patch.


It should be exactly that patch:

https://lists.gnu.org/archive/html/qemu-devel/2022-01/msg06222.html

(From the discussion it appears that for v1 I need to ensure the 
reconnection timer is deleted immediately once reconnecting succeeds, 
and then that should be good to move out of the RFC state.)


Basically, I expect qemu to crash every time that you try to use an NBD 
block device in an I/O thread (unless you don’t do any I/O), for example 
this is the simplest reproducer I know of:


$ qemu-nbd --fork -k /tmp/nbd.sock -f raw null-co://

$ qemu-system-x86_64 \
    -object iothread,id=iothr0 \
    -device virtio-scsi,id=vscsi,iothread=iothr0 \
    -blockdev '{
    "driver": "nbd",
    "node-name": "nbd",
    "server": {
    "type": "unix",
    "path": "/tmp/nbd.sock"
    } }' \
    -device scsi-hd,bus=vscsi.0,drive=nbd
qemu-system-x86_64: ../qemu-6.2.0/io/channel.c:483: 
qio_channel_restart_read: Assertion `qemu_get_current_aio_context() == 
qemu_coroutine_get_aio_context(co)' failed.
qemu-nbd: Disconnect client, due to: Unable to read from socket: 
Connection reset by peer
[1]    108747 abort (core dumped)  qemu-system-x86_64 -object 
iothread,id=iothr0 -device  -blockdev  -device





[PATCH v5 19/43] hw/cxl/rp: Add a root port

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

This adds just enough of a root port implementation to be able to
enumerate root ports (creating the required DVSEC entries). What's not
here yet is the MMIO, nor the ability to write some of the DVSEC entries.

A root port can be added on the QEMU command line by attaching it to a
specific CXL host bridge. For example:
  -device cxl-rp,id=rp0,bus="cxl.0",addr=0.0,chassis=4

As with the host bridge patch, the ACPI tables aren't generated at this
point, so system software cannot use the root port yet.

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
 hw/pci-bridge/Kconfig  |   5 +
 hw/pci-bridge/cxl_root_port.c  | 231 +
 hw/pci-bridge/meson.build  |   1 +
 hw/pci-bridge/pcie_root_port.c |   6 +-
 hw/pci/pci.c   |   4 +-
 5 files changed, 245 insertions(+), 2 deletions(-)
 create mode 100644 hw/pci-bridge/cxl_root_port.c

diff --git a/hw/pci-bridge/Kconfig b/hw/pci-bridge/Kconfig
index f8df4315ba..02614f49aa 100644
--- a/hw/pci-bridge/Kconfig
+++ b/hw/pci-bridge/Kconfig
@@ -27,3 +27,8 @@ config DEC_PCI
 
 config SIMBA
 bool
+
+config CXL
+bool
+default y if PCI_EXPRESS && PXB
+depends on PCI_EXPRESS && MSI_NONBROKEN && PXB
diff --git a/hw/pci-bridge/cxl_root_port.c b/hw/pci-bridge/cxl_root_port.c
new file mode 100644
index 00..dd714db836
--- /dev/null
+++ b/hw/pci-bridge/cxl_root_port.c
@@ -0,0 +1,231 @@
+/*
+ * CXL 2.0 Root Port Implementation
+ *
+ * Copyright(C) 2020 Intel Corporation.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/range.h"
+#include "hw/pci/pci_bridge.h"
+#include "hw/pci/pcie_port.h"
+#include "hw/qdev-properties.h"
+#include "hw/sysbus.h"
+#include "qapi/error.h"
+#include "hw/cxl/cxl.h"
+
+#define CXL_ROOT_PORT_DID 0x7075
+
+/* Copied from the gen root port which we derive */
+#define GEN_PCIE_ROOT_PORT_AER_OFFSET 0x100
+#define GEN_PCIE_ROOT_PORT_ACS_OFFSET \
+(GEN_PCIE_ROOT_PORT_AER_OFFSET + PCI_ERR_SIZEOF)
+#define CXL_ROOT_PORT_DVSEC_OFFSET \
+(GEN_PCIE_ROOT_PORT_ACS_OFFSET + PCI_ACS_SIZEOF)
+
+typedef struct CXLRootPort {
+/*< private >*/
+PCIESlot parent_obj;
+
+CXLComponentState cxl_cstate;
+PCIResReserve res_reserve;
+} CXLRootPort;
+
+#define TYPE_CXL_ROOT_PORT "cxl-rp"
+DECLARE_INSTANCE_CHECKER(CXLRootPort, CXL_ROOT_PORT, TYPE_CXL_ROOT_PORT)
+
+static void latch_registers(CXLRootPort *crp)
+{
+uint32_t *reg_state = crp->cxl_cstate.crb.cache_mem_registers;
+
+cxl_component_register_init_common(reg_state, CXL2_ROOT_PORT);
+}
+
+static void build_dvsecs(CXLComponentState *cxl)
+{
+uint8_t *dvsec;
+
+dvsec = (uint8_t *)&(struct cxl_dvsec_port_extensions){ 0 };
+cxl_component_create_dvsec(cxl, EXTENSIONS_PORT_DVSEC_LENGTH,
+   EXTENSIONS_PORT_DVSEC,
+   EXTENSIONS_PORT_DVSEC_REVID, dvsec);
+
+dvsec = (uint8_t *)&(struct cxl_dvsec_port_gpf){
+.rsvd= 0,
+.phase1_ctrl = 1, /* 1μs timeout */
+.phase2_ctrl = 1, /* 1μs timeout */
+};
+cxl_component_create_dvsec(cxl, GPF_PORT_DVSEC_LENGTH, GPF_PORT_DVSEC,
+   GPF_PORT_DVSEC_REVID, dvsec);
+
+dvsec = (uint8_t *)&(struct cxl_dvsec_port_flexbus){
+.cap = 0x26, /* IO, Mem, non-MLD */
+.ctrl= 0,
+.status  = 0x26, /* same */
+.rcvd_mod_ts_data_phase1 = 0xef, /* WTF? */
+};
+cxl_component_create_dvsec(cxl, PCIE_FLEXBUS_PORT_DVSEC_LENGTH_2_0,
+   PCIE_FLEXBUS_PORT_DVSEC,
+   PCIE_FLEXBUS_PORT_DVSEC_REVID_2_0, dvsec);
+
+dvsec = (uint8_t *)&(struct cxl_dvsec_register_locator){
+.rsvd = 0,
+.reg0_base_lo = RBI_COMPONENT_REG | CXL_COMPONENT_REG_BAR_IDX,
+.reg0_base_hi = 0,
+};
+cxl_component_create_dvsec(cxl, REG_LOC_DVSEC_LENGTH, REG_LOC_DVSEC,
+   REG_LOC_DVSEC_REVID, dvsec);
+}
+
+static void cxl_rp_realize(DeviceState *dev, Error **errp)
+{
+PCIDevice *pci_dev = PCI_DEVICE(dev);
+PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(dev);
+CXLRootPort *crp   = CXL_ROOT_PORT(dev);
+CXLComponentState *cxl_cstate = &crp->cxl_cstate;
+C

[PATCH v5 33/43] CXL/cxl_component: Add cxl_get_hb_cstate()

2022-02-02 Thread Jonathan Cameron via
From: Jonathan Cameron 

Add an accessor to get hold of the CXL state for a CXL host bridge
without exposing the internals of the implementation.

Signed-off-by: Jonathan Cameron 
---
 hw/pci-bridge/pci_expander_bridge.c | 7 +++
 include/hw/cxl/cxl_component.h  | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/hw/pci-bridge/pci_expander_bridge.c 
b/hw/pci-bridge/pci_expander_bridge.c
index 9a2710c067..d53efb09a3 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -81,6 +81,13 @@ static GList *pxb_dev_list;
 #define TYPE_PXB_CXL_HOST "pxb-cxl-host"
 #define PXB_CXL_HOST(obj) OBJECT_CHECK(CXLHost, (obj), TYPE_PXB_CXL_HOST)
 
+CXLComponentState *cxl_get_hb_cstate(PCIHostState *hb)
+{
+CXLHost *host = PXB_CXL_HOST(hb);
+
+return &host->cxl_cstate;
+}
+
 static int pxb_bus_num(PCIBus *bus)
 {
 PXBDev *pxb = convert_to_pxb(bus->parent_dev);
diff --git a/include/hw/cxl/cxl_component.h b/include/hw/cxl/cxl_component.h
index 42cd140f75..29d7268275 100644
--- a/include/hw/cxl/cxl_component.h
+++ b/include/hw/cxl/cxl_component.h
@@ -201,4 +201,6 @@ static inline hwaddr cxl_decode_ig(int ig)
 return 1 << (ig + 8);
 }
 
+CXLComponentState *cxl_get_hb_cstate(PCIHostState *hb);
+
 #endif
-- 
2.32.0




Re: [PATCH v1 12/22] plugins: stxp test case from Aaron (!upstream)

2022-02-02 Thread Aaron Lindsay via
On Feb 01 15:29, Alex Bennée wrote:
> 
> Aaron Lindsay  writes:
> 
> > On Jan 24 20:15, Alex Bennée wrote:
> >> Signed-off-by: Alex Bennée 
> >> Cc: Aaron Lindsay 
> >> Message-ID: 
> >> 
> >> ---
> >> [AJB] this was for testing, I think you can show the same stuff with
> >> the much more complete execlog now.
> >
> > Is it true that execlog can also reproduce the duplicate loads which are
> > still an outstanding issue?
> 
> Are we still seeing duplicate loads? I thought that had been fixed.

I have not explicitly tested for the duplicate loads on atomics lately
(though I have seen some transient behavior related to atomics that I
have struggled to reliably reproduce, but I believe that's a different
issue). I hadn't seen a subsequent fix come through after the initial
fix for stores and assumed it was still an issue. Sorry for my
assumption, particularly if I just missed it.

-Aaron

> >> ---
> >>  contrib/plugins/stxp-plugin.c | 50 +++
> >>  tests/tcg/aarch64/stxp.c  | 28 +
> >>  contrib/plugins/Makefile  |  1 +
> >>  tests/tcg/aarch64/Makefile.target |  3 ++
> >>  4 files changed, 82 insertions(+)
> >>  create mode 100644 contrib/plugins/stxp-plugin.c
> >>  create mode 100644 tests/tcg/aarch64/stxp.c
> >> 
> >> diff --git a/contrib/plugins/stxp-plugin.c b/contrib/plugins/stxp-plugin.c
> >> new file mode 100644
> >> index 00..432cf8c1ed
> >> --- /dev/null
> >> +++ b/contrib/plugins/stxp-plugin.c
> >> @@ -0,0 +1,50 @@
> >> +#include 
> >> +#include 
> >> +#include 
> >> +
> >> +QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
> >> +
> >> +void qemu_logf(const char *str, ...)
> >> +{
> >> +char message[1024];
> >> +va_list args;
> >> +va_start(args, str);
> >> +vsnprintf(message, 1023, str, args);
> >> +
> >> +qemu_plugin_outs(message);
> >> +
> >> +va_end(args);
> >> +}
> >> +
> >> +void before_insn_cb(unsigned int cpu_index, void *udata)
> >> +{
> >> +uint64_t pc = (uint64_t)udata;
> >> +qemu_logf("Executing PC: 0x%" PRIx64 "\n", pc);
> >> +}
> >> +
> >> +static void mem_cb(unsigned int cpu_index, qemu_plugin_meminfo_t meminfo, 
> >> uint64_t va, void *udata)
> >> +{
> >> +uint64_t pc = (uint64_t)udata;
> >> +qemu_logf("PC 0x%" PRIx64 " accessed memory at 0x%" PRIx64 "\n", pc, 
> >> va);
> >> +}
> >> +
> >> +static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
> >> +{
> >> +size_t n = qemu_plugin_tb_n_insns(tb);
> >> +
> >> +for (size_t i = 0; i < n; i++) {
> >> +struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
> >> +uint64_t pc = qemu_plugin_insn_vaddr(insn);
> >> +
> >> +qemu_plugin_register_vcpu_insn_exec_cb(insn, before_insn_cb, 
> >> QEMU_PLUGIN_CB_R_REGS, (void *)pc);
> >> +qemu_plugin_register_vcpu_mem_cb(insn, mem_cb, 
> >> QEMU_PLUGIN_CB_NO_REGS, QEMU_PLUGIN_MEM_RW, (void*)pc);
> >> +}
> >> +}
> >> +
> >> +QEMU_PLUGIN_EXPORT
> >> +int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
> >> +int argc, char **argv)
> >> +{
> >> +qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
> >> +return 0;
> >> +}
> >> diff --git a/tests/tcg/aarch64/stxp.c b/tests/tcg/aarch64/stxp.c
> >> new file mode 100644
> >> index 00..fb8ef6a46d
> >> --- /dev/null
> >> +++ b/tests/tcg/aarch64/stxp.c
> >> @@ -0,0 +1,28 @@
> >> +
> >> +
> >> +void stxp_issue_demo(void *arr)
> >> +{
> >> +asm(".align 8\n\t"
> >> +"mov x0, %[in]\n\t"
> >> +"mov x18, 0x1000\n\t"
> >> +"mov x2, 0x0\n\t"
> >> +"mov x3, 0x0\n\t"
> >> +"loop:\n\t"
> >> +"prfm  pstl1strm, [x0]\n\t"
> >> +"ldxp  x16, x17, [x0]\n\t"
> >> +"stxp  w16, x2, x3, [x0]\n\t"
> >> +"\n\t"
> >> +"subs x18, x18, 1\n\t"
> >> +"beq done\n\t"
> >> +"b loop\n\t"
> >> +"done:\n\t"
> >> +: /* none out */
> >> +: [in] "r" (arr) /* in */
> >> +: "x0", "x2", "x3", "x16", "x17", "x18"); /* clobbers */
> >> +}
> >> +
> >> +int main()
> >> +{
> >> +char arr[16];
> >> +stxp_issue_demo(&arr);
> >> +}
> >> diff --git a/contrib/plugins/Makefile b/contrib/plugins/Makefile
> >> index 54ac5ccd9f..576ed5875a 100644
> >> --- a/contrib/plugins/Makefile
> >> +++ b/contrib/plugins/Makefile
> >> @@ -20,6 +20,7 @@ NAMES += howvec
> >>  NAMES += lockstep
> >>  NAMES += hwprofile
> >>  NAMES += cache
> >> +NAMES += stxp-plugin
> >>  
> >>  SONAMES := $(addsuffix .so,$(addprefix lib,$(NAMES)))
> >>  
> >> diff --git a/tests/tcg/aarch64/Makefile.target 
> >> b/tests/tcg/aarch64/Makefile.target
> >> index 1d967901bd..54b2e90d00 100644
> >> --- a/tests/tcg/aarch64/Makefile.target
> >> +++ b/tests/tcg/aarch64/Makefile.target
> >> @@ -72,4 +72,7 @@ endif
> >>  
> >>  endif
> >>  
> >> +# Load/Store exclusive test
> >> +AARCH64_TESTS += stxp
> >> +
> >>  TESTS += $(A

[PATCH v5 20/43] hw/cxl/device: Add a memory device (8.2.8.5)

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

A CXL memory device (AKA Type 3) is a CXL component that contains some
combination of volatile and persistent memory. It also implements the
previously defined mailbox interface as well as the memory device
firmware interface.

Although the memory device is configured like a normal PCIe device, the
memory traffic is on an entirely separate bus conceptually (using the
same physical wires as PCIe, but different protocol).

Once the CXL topology is fully configured and the address decoders are
committed, the guest physical address for the memory device is part of
a larger window which is owned by the platform.  These windows are
created later in this series.

The following example will create a 256M device in a 512M window:
-object "memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M"
-device "cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0"

Note: Dropped PCDIMM info interfaces for now.  They can be added if
appropriate at a later date.
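
As an editorial check of the mailbox payload added below, the 0x43-byte
Identify Memory Device layout (and its 256 MiB capacity units) can be
mirrored with Python's struct module. The field order follows the packed
struct in cmd_identify_memory_device(); the format string and function
name here are this note's own, not part of the patch:

```python
import struct

# Packed layout of the Identify Memory Device output payload:
# fw_revision[16], 4 x u64 capacities, 4 x u16 event log sizes,
# u32 lsa_size, u8[3] poison_list_max_mer, u16 inject_poison_limit,
# u8 poison_caps, u8 qos_telemetry_caps.
IDENTIFY_FMT = "<16s4Q4HI3sH2B"
assert struct.calcsize(IDENTIFY_FMT) == 0x43  # matches the _Static_assert

CXL_CAPACITY_MULTIPLIER = 256 << 20  # capacities reported in 256 MiB units

def identify_payload(pmem_size: int) -> bytes:
    """Build a PMEM-only payload the way the handler does."""
    assert pmem_size % CXL_CAPACITY_MULTIPLIER == 0
    units = pmem_size // CXL_CAPACITY_MULTIPLIER
    return struct.pack(IDENTIFY_FMT, b"BWFW VERSION 00",
                       units, 0, units, 0,   # total/volatile/persistent/align
                       0, 0, 0, 0, 0, b"\0\0\0", 0, 0, 0)
```

This also explains why the handler rejects sizes not aligned to 256 MiB:
the capacity fields cannot express a fractional unit.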

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
 hw/cxl/cxl-mailbox-utils.c |  47 ++
 hw/mem/Kconfig |   5 ++
 hw/mem/cxl_type3.c | 170 +
 hw/mem/meson.build |   1 +
 include/hw/cxl/cxl.h   |   1 +
 include/hw/cxl/cxl_pci.h   |  22 +
 include/hw/pci/pci_ids.h   |   1 +
 7 files changed, 247 insertions(+)
 create mode 100644 hw/mem/cxl_type3.c

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 16bb998735..808faec114 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -50,6 +50,8 @@ enum {
 LOGS= 0x04,
 #define GET_SUPPORTED 0x0
 #define GET_LOG   0x1
+IDENTIFY= 0x40,
+#define MEMORY_DEVICE 0x0
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -216,6 +218,48 @@ static ret_code cmd_logs_get_log(struct cxl_cmd *cmd,
 return CXL_MBOX_SUCCESS;
 }
 
+/* 8.2.9.5.1.1 */
+static ret_code cmd_identify_memory_device(struct cxl_cmd *cmd,
+   CXLDeviceState *cxl_dstate,
+   uint16_t *len)
+{
+struct {
+char fw_revision[0x10];
+uint64_t total_capacity;
+uint64_t volatile_capacity;
+uint64_t persistent_capacity;
+uint64_t partition_align;
+uint16_t info_event_log_size;
+uint16_t warning_event_log_size;
+uint16_t failure_event_log_size;
+uint16_t fatal_event_log_size;
+uint32_t lsa_size;
+uint8_t poison_list_max_mer[3];
+uint16_t inject_poison_limit;
+uint8_t poison_caps;
+uint8_t qos_telemetry_caps;
+} __attribute__((packed)) *id;
+_Static_assert(sizeof(*id) == 0x43, "Bad identify size");
+
+uint64_t size = cxl_dstate->pmem_size;
+
+if (!QEMU_IS_ALIGNED(size, 256 << 20)) {
+return CXL_MBOX_INTERNAL_ERROR;
+}
+
+id = (void *)cmd->payload;
+memset(id, 0, sizeof(*id));
+
+/* PMEM only */
+snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
+
+id->total_capacity = size / (256 << 20);
+id->persistent_capacity = size / (256 << 20);
+
+*len = sizeof(*id);
+return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
 #define IMMEDIATE_LOG_CHANGE (1 << 4)
@@ -233,8 +277,11 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 [TIMESTAMP][SET] = { "TIMESTAMP_SET", cmd_timestamp_set, 8, 
IMMEDIATE_POLICY_CHANGE },
 [LOGS][GET_SUPPORTED] = { "LOGS_GET_SUPPORTED", cmd_logs_get_supported, 0, 
0 },
 [LOGS][GET_LOG] = { "LOGS_GET_LOG", cmd_logs_get_log, 0x18, 0 },
+[IDENTIFY][MEMORY_DEVICE] = { "IDENTIFY_MEMORY_DEVICE",
+cmd_identify_memory_device, 0, 0 },
 };
 
+
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
 {
 uint16_t ret = CXL_MBOX_SUCCESS;
diff --git a/hw/mem/Kconfig b/hw/mem/Kconfig
index 03dbb3c7df..73c5ae8ad9 100644
--- a/hw/mem/Kconfig
+++ b/hw/mem/Kconfig
@@ -11,3 +11,8 @@ config NVDIMM
 
 config SPARSE_MEM
 bool
+
+config CXL_MEM_DEVICE
+bool
+default y if CXL
+select MEM_DEVICE
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
new file mode 100644
index 00..c4021d2434
--- /dev/null
+++ b/hw/mem/cxl_type3.c
@@ -0,0 +1,170 @@
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qemu/error-report.h"
+#include "hw/mem/memory-device.h"
+#include "hw/mem/pc-dimm.h"
+#include "hw/pci/pci.h"
+#include "hw/qdev-properties.h"
+#include "qapi/error.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "qemu/range.h"
+#include "qemu/rcu.h"
+#include "sysemu/hostmem.h"
+#include "hw/cxl/cxl.h"
+
+typedef struct cxl_type3_dev {
+/* Private */
+PCIDevice parent_obj;
+
+/* Properties */
+uint64_t size;
+HostMemoryBackend *hostmem;
+
+/* State */
+CXLComponentState cxl_cstate;
+CXLDeviceState cxl_dstate;
+} CXLType3Dev;
+
+#define CT3(obj) OBJECT_CHECK(CXLType3Dev, (o

[PATCH v5 23/43] tests/acpi: allow CEDT table addition

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

Following patches will add a new ACPI table, the
CXL Early Discovery Table (CEDT).

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
 tests/data/acpi/pc/CEDT | 0
 tests/data/acpi/q35/CEDT| 0
 tests/qtest/bios-tables-test-allowed-diff.h | 2 ++
 3 files changed, 2 insertions(+)
 create mode 100644 tests/data/acpi/pc/CEDT
 create mode 100644 tests/data/acpi/q35/CEDT

diff --git a/tests/data/acpi/pc/CEDT b/tests/data/acpi/pc/CEDT
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/data/acpi/q35/CEDT b/tests/data/acpi/q35/CEDT
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..9b07f1e1ff 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,3 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/pc/CEDT",
+"tests/data/acpi/q35/CEDT",
-- 
2.32.0




[PATCH v5 35/43] cxl/cxl-host: Add memops for CFMWS region.

2022-02-02 Thread Jonathan Cameron via
From: Jonathan Cameron 

These memops perform interleave decoding, walking down the CXL
topology from the CFMWS-described host interleave decoder, via the
CXL host bridge HDM decoders and the CXL root ports, and finally
calling the CXL type 3 specific read and write functions.

Note that, whilst functional, the current implementation does
not support:
* switches
* multiple HDM decoders at a given level.
* unaligned accesses across the interleave boundaries
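
As an editorial aside, the target-selection arithmetic used below in
cxl_hdm_find_target() and cxl_cfmws_find_device() can be sketched in a
few lines. cxl_decode_ig() matches the helper in cxl_component.h; the
function interleave_target() is an illustrative name of this note's own:

```python
def cxl_decode_ig(ig_enc: int) -> int:
    """Interleave granularity in bytes; encoding 0 means 256 B granules."""
    return 1 << (ig_enc + 8)

def interleave_target(addr: int, ig_enc: int, iw_enc: int) -> int:
    """Which of the 2**iw_enc interleave targets a host address selects."""
    granule = cxl_decode_ig(ig_enc)
    ways = 1 << iw_enc
    return (addr // granule) % ways
```

With ig_enc=0 (256-byte granules) and iw_enc=1 (two-way interleave),
addresses 0x000-0x0ff decode to target 0 and 0x100-0x1ff to target 1.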

Signed-off-by: Jonathan Cameron 
---
v5: No changes, debugging solution to unaligned access across
  interleave boundaries continues.
  
 hw/cxl/cxl-host.c| 125 +++
 include/hw/cxl/cxl.h |   2 +
 2 files changed, 127 insertions(+)

diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c
index 9f303e6d8e..d9cad188a8 100644
--- a/hw/cxl/cxl-host.c
+++ b/hw/cxl/cxl-host.c
@@ -136,3 +136,128 @@ void cxl_fixed_memory_window_link_targets(Error **errp)
 }
 }
 }
+
+/* TODO: support, multiple hdm decoders */
+static bool cxl_hdm_find_target(uint32_t *cache_mem, hwaddr addr,
+uint8_t *target)
+{
+uint32_t ctrl;
+uint32_t ig_enc;
+uint32_t iw_enc;
+uint32_t target_reg;
+uint32_t target_idx;
+
+ctrl = cache_mem[R_CXL_HDM_DECODER0_CTRL];
+if (!FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED)) {
+return false;
+}
+
+ig_enc = FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, IG);
+iw_enc = FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, IW);
+target_idx = (addr / cxl_decode_ig(ig_enc)) % (1 << iw_enc);
+
+if (target_idx > 4) {
+target_reg = cache_mem[R_CXL_HDM_DECODER0_TARGET_LIST_LO];
+target_reg >>= target_idx * 8;
+} else {
+target_reg = cache_mem[R_CXL_HDM_DECODER0_TARGET_LIST_LO];
+target_reg >>= (target_idx - 4) * 8;
+}
+*target = target_reg & 0xff;
+
+return true;
+}
+
+static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr)
+{
+CXLComponentState *hb_cstate;
+PCIHostState *hb;
+int rb_index;
+uint32_t *cache_mem;
+uint8_t target;
+bool target_found;
+PCIDevice *rp, *d;
+
+/* Address is relative to memory region. Convert to HPA */
+addr += fw->base;
+
+rb_index = (addr / cxl_decode_ig(fw->enc_int_gran)) % fw->num_targets;
+hb = PCI_HOST_BRIDGE(fw->target_hbs[rb_index]->cxl.cxl_host_bridge);
+if (!hb || !hb->bus || !pci_bus_is_cxl(hb->bus)) {
+return NULL;
+}
+
+hb_cstate = cxl_get_hb_cstate(hb);
+if (!hb_cstate) {
+return NULL;
+}
+
+cache_mem = hb_cstate->crb.cache_mem_registers;
+
+target_found = cxl_hdm_find_target(cache_mem, addr, &target);
+if (!target_found) {
+return NULL;
+}
+
+rp = pcie_find_port_by_pn(hb->bus, target);
+if (!rp) {
+return NULL;
+}
+
+d = pci_bridge_get_sec_bus(PCI_BRIDGE(rp))->devices[0];
+
+if (!d || !object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3_DEV)) {
+return NULL;
+}
+
+return d;
+}
+
+static MemTxResult cxl_read_cfmws(void *opaque, hwaddr addr, uint64_t *data,
+  unsigned size, MemTxAttrs attrs)
+{
+CXLFixedWindow *fw = opaque;
+PCIDevice *d;
+
+d = cxl_cfmws_find_device(fw, addr);
+if (d == NULL) {
+*data = 0;
+/* Reads to invalid address return poison */
+return MEMTX_ERROR;
+}
+
+return cxl_type3_read(d, addr + fw->base, data, size, attrs);
+}
+
+static MemTxResult cxl_write_cfmws(void *opaque, hwaddr addr,
+   uint64_t data, unsigned size,
+   MemTxAttrs attrs)
+{
+CXLFixedWindow *fw = opaque;
+PCIDevice *d;
+
+d = cxl_cfmws_find_device(fw, addr);
+if (d == NULL) {
+/* Writes to invalid address are silent */
+return MEMTX_OK;
+}
+
+return cxl_type3_write(d, addr + fw->base, data, size, attrs);
+}
+
+const MemoryRegionOps cfmws_ops = {
+.read_with_attrs = cxl_read_cfmws,
+.write_with_attrs = cxl_write_cfmws,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = true,
+},
+.impl = {
+.min_access_size = 1,
+.max_access_size = 8,
+.unaligned = true,
+},
+};
+
diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
index 1b72c0b7b7..260d602ec9 100644
--- a/include/hw/cxl/cxl.h
+++ b/include/hw/cxl/cxl.h
@@ -46,4 +46,6 @@ extern QemuOptsList qemu_cxl_fixed_window_opts;
 void parse_cxl_fixed_memory_window_opts(MachineState *ms);
 void cxl_fixed_memory_window_link_targets(Error **errp);
 
+extern const MemoryRegionOps cfmws_ops;
+
 #endif
-- 
2.32.0




[PATCH v5 27/43] hw/cxl/device: Implement get/set Label Storage Area (LSA)

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

Implement get and set handlers for the Label Storage Area (LSA),
which holds data describing the persistent memory configuration,
so that the device is seen in the same configuration after reboot.
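
An illustrative model of the bounds checking the two mailbox commands
below perform before delegating to the type 3 device's get_lsa/set_lsa
callbacks (function names here are this note's own, not the patch's):

```python
def lsa_read(lsa: bytearray, offset: int, length: int) -> bytes:
    """GET_LSA: reject reads that run past the end of the LSA."""
    if offset + length > len(lsa):
        raise ValueError("CXL_MBOX_INVALID_INPUT")
    return bytes(lsa[offset:offset + length])

def lsa_write(lsa: bytearray, offset: int, data: bytes) -> None:
    """SET_LSA: reject writes that run past the end of the LSA."""
    if offset + len(data) > len(lsa):
        raise ValueError("CXL_MBOX_INVALID_INPUT")
    lsa[offset:offset + len(data)] = data
```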

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---
v5:
 Fix wrong bit for IMMEDIATE_DATA_CHANGE

 hw/cxl/cxl-mailbox-utils.c  | 57 +
 hw/mem/cxl_type3.c  | 56 +++-
 include/hw/cxl/cxl_device.h |  5 
 3 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index ccf9c3d794..f4a309ddbf 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -56,6 +56,8 @@ enum {
 #define MEMORY_DEVICE 0x0
 CCLS= 0x41,
 #define GET_PARTITION_INFO 0x0
+#define GET_LSA   0x2
+#define SET_LSA   0x3
 };
 
 /* 8.2.8.4.5.1 Command Return Codes */
@@ -327,7 +329,59 @@ static ret_code cmd_ccls_get_partition_info(struct cxl_cmd 
*cmd,
 return CXL_MBOX_SUCCESS;
 }
 
+static ret_code cmd_ccls_get_lsa(struct cxl_cmd *cmd,
+ CXLDeviceState *cxl_dstate,
+ uint16_t *len)
+{
+struct {
+uint32_t offset;
+uint32_t length;
+} __attribute__((packed, __aligned__(8))) *get_lsa;
+CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
+CXLType3Class *cvc = CXL_TYPE3_DEV_GET_CLASS(ct3d);
+uint32_t offset, length;
+
+get_lsa = (void *)cmd->payload;
+offset = get_lsa->offset;
+length = get_lsa->length;
+
+*len = 0;
+if (offset + length > cvc->get_lsa_size(ct3d)) {
+return CXL_MBOX_INVALID_INPUT;
+}
+
+*len = cvc->get_lsa(ct3d, get_lsa, length, offset);
+return CXL_MBOX_SUCCESS;
+}
+
+static ret_code cmd_ccls_set_lsa(struct cxl_cmd *cmd,
+ CXLDeviceState *cxl_dstate,
+ uint16_t *len)
+{
+struct {
+uint32_t offset;
+uint32_t rsvd;
+} __attribute__((packed, __aligned__(8))) *set_lsa = (void *)cmd->payload;
+CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
+CXLType3Class *cvc = CXL_TYPE3_DEV_GET_CLASS(ct3d);
+uint16_t plen = *len;
+
+*len = 0;
+if (!plen) {
+return CXL_MBOX_SUCCESS;
+}
+
+if (set_lsa->offset + plen > cvc->get_lsa_size(ct3d) + sizeof(*set_lsa)) {
+return CXL_MBOX_INVALID_INPUT;
+}
+
+cvc->set_lsa(ct3d, (void *)set_lsa + sizeof(*set_lsa),
+ plen - sizeof(*set_lsa), set_lsa->offset);
+return CXL_MBOX_SUCCESS;
+}
+
 #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
+#define IMMEDIATE_DATA_CHANGE (1 << 2)
 #define IMMEDIATE_POLICY_CHANGE (1 << 3)
 #define IMMEDIATE_LOG_CHANGE (1 << 4)
 
@@ -350,6 +404,9 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
 cmd_identify_memory_device, 0, 0 },
 [CCLS][GET_PARTITION_INFO] = { "CCLS_GET_PARTITION_INFO",
 cmd_ccls_get_partition_info, 0, 0 },
+[CCLS][GET_LSA] = { "CCLS_GET_LSA", cmd_ccls_get_lsa, 0, 0 },
+[CCLS][SET_LSA] = { "CCLS_SET_LSA", cmd_ccls_set_lsa,
+~0, IMMEDIATE_CONFIG_CHANGE | IMMEDIATE_DATA_CHANGE },
 };
 
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index b16262d3cc..b1ba4bf0de 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -8,6 +8,7 @@
 #include "qapi/error.h"
 #include "qemu/log.h"
 #include "qemu/module.h"
+#include "qemu/pmem.h"
 #include "qemu/range.h"
 #include "qemu/rcu.h"
 #include "sysemu/hostmem.h"
@@ -114,6 +115,11 @@ static void cxl_setup_memory(CXLType3Dev *ct3d, Error 
**errp)
 memory_region_set_enabled(mr, true);
 host_memory_backend_set_mapped(ct3d->hostmem, true);
 ct3d->cxl_dstate.pmem_size = ct3d->hostmem->size;
+
+if (!ct3d->lsa) {
+error_setg(errp, "lsa property must be set");
+return;
+}
 }
 
 
@@ -168,12 +174,58 @@ static Property ct3_props[] = {
 DEFINE_PROP_SIZE("size", CXLType3Dev, size, -1),
 DEFINE_PROP_LINK("memdev", CXLType3Dev, hostmem, TYPE_MEMORY_BACKEND,
  HostMemoryBackend *),
+DEFINE_PROP_LINK("lsa", CXLType3Dev, lsa, TYPE_MEMORY_BACKEND,
+ HostMemoryBackend *),
 DEFINE_PROP_END_OF_LIST(),
 };
 
 static uint64_t get_lsa_size(CXLType3Dev *ct3d)
 {
-return 0;
+MemoryRegion *mr;
+
+mr = host_memory_backend_get_memory(ct3d->lsa);
+return memory_region_size(mr);
+}
+
+static void validate_lsa_access(MemoryRegion *mr, uint64_t size,
+uint64_t offset)
+{
+assert(offset + size <= memory_region_size(mr));
+assert(offset + size > offset);
+}
+
+static uint64_t get_lsa(CXLType3Dev *ct3d, void *buf, uint64_t size,
+uint64_t offset)
+{
+MemoryRegion *mr;
+void *lsa;
+
+mr = host_memory_backend_get_memory(ct3d-

[PATCH v5 34/43] mem/cxl_type3: Add read and write functions for associated hostmem.

2022-02-02 Thread Jonathan Cameron via
From: Jonathan Cameron 

Once a read or write reaches a CXL type 3 device, the HDM decoders
on the device are used to establish the Device Physical Address (DPA)
which should be accessed.  These functions perform the required maths
and then directly access the hostmem->mr to fulfil the actual
operation.  Note that failed writes are silent, but failed reads
return poison.  Note this is based loosely on:

https://lore.kernel.org/qemu-devel/20200817161853.593247-6-f4...@amsat.org/
[RFC PATCH 0/9] hw/misc: Add support for interleaved memory accesses

Only lightly tested so far.  More complex test cases yet to be written.
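
The HPA-to-DPA mask arithmetic implemented below in cxl_type3_dpa() can
be modelled as follows; this is an illustrative sketch only, with
make_64bit_mask() standing in for QEMU's MAKE_64BIT_MASK macro:

```python
def make_64bit_mask(shift: int, length: int) -> int:
    """Python equivalent of QEMU's MAKE_64BIT_MASK(shift, length)."""
    return ((1 << length) - 1) << shift

def hpa_offset_to_dpa(hpa_offset: int, ig: int, iw: int) -> int:
    """Remove the iw interleave-way bits that sit just above the
    (8 + ig)-bit intra-granule offset, producing a device-relative
    address."""
    low = make_64bit_mask(0, 8 + ig) & hpa_offset
    high = (make_64bit_mask(8 + ig + iw, 64 - 8 - ig - iw) & hpa_offset) >> iw
    return low | high
```

With iw=0 the mapping is the identity; with ig=0 and iw=1 (two-way
interleave, 256-byte granules) the host offsets 0x200-0x2ff map to the
device offsets 0x100-0x1ff, since every other granule lives on the
other device.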

Signed-off-by: Jonathan Cameron 
---
 hw/mem/cxl_type3.c  | 81 +
 include/hw/cxl/cxl_device.h |  5 +++
 2 files changed, 86 insertions(+)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index b1ba4bf0de..064e8c942c 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -161,6 +161,87 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
  &ct3d->cxl_dstate.device_registers);
 }
 
+/* TODO: Support multiple HDM decoders and DPA skip */
+static bool cxl_type3_dpa(CXLType3Dev *ct3d, hwaddr host_addr, uint64_t *dpa)
+{
+uint32_t *cache_mem = ct3d->cxl_cstate.crb.cache_mem_registers;
+uint64_t decoder_base, decoder_size, hpa_offset;
+uint32_t hdm0_ctrl;
+int ig, iw;
+
+decoder_base = (((uint64_t)cache_mem[R_CXL_HDM_DECODER0_BASE_HI] << 32) |
+cache_mem[R_CXL_HDM_DECODER0_BASE_LO]);
+if ((uint64_t)host_addr < decoder_base) {
+return false;
+}
+
+hpa_offset = (uint64_t)host_addr - decoder_base;
+
+decoder_size = ((uint64_t)cache_mem[R_CXL_HDM_DECODER0_SIZE_HI] << 32) |
+cache_mem[R_CXL_HDM_DECODER0_SIZE_LO];
+if (hpa_offset >= decoder_size) {
+return false;
+}
+
+hdm0_ctrl = cache_mem[R_CXL_HDM_DECODER0_CTRL];
+iw = FIELD_EX32(hdm0_ctrl, CXL_HDM_DECODER0_CTRL, IW);
+ig = FIELD_EX32(hdm0_ctrl, CXL_HDM_DECODER0_CTRL, IG);
+
+*dpa = (MAKE_64BIT_MASK(0, 8 + ig) & hpa_offset) |
+((MAKE_64BIT_MASK(8 + ig + iw, 64 - 8 - ig - iw) & hpa_offset) >> iw);
+
+return true;
+}
+
+MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
+   unsigned size, MemTxAttrs attrs)
+{
+CXLType3Dev *ct3d = CT3(d);
+uint64_t dpa_offset;
+MemoryRegion *mr;
+
+/* TODO support volatile region */
+mr = host_memory_backend_get_memory(ct3d->hostmem);
+if (!mr) {
+return MEMTX_ERROR;
+}
+
+if (!cxl_type3_dpa(ct3d, host_addr, &dpa_offset)) {
+return MEMTX_ERROR;
+}
+
+if (dpa_offset > int128_get64(mr->size)) {
+return MEMTX_ERROR;
+}
+
+return memory_region_dispatch_read(mr, dpa_offset, data,
+   size_memop(size), attrs);
+}
+
+MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
+unsigned size, MemTxAttrs attrs)
+{
+CXLType3Dev *ct3d = CT3(d);
+uint64_t dpa_offset;
+MemoryRegion *mr;
+
+mr = host_memory_backend_get_memory(ct3d->hostmem);
+if (!mr) {
+return MEMTX_OK;
+}
+
+if (!cxl_type3_dpa(ct3d, host_addr, &dpa_offset)) {
+return MEMTX_OK;
+}
+
+if (dpa_offset > int128_get64(mr->size)) {
+return MEMTX_OK;
+}
+
+return memory_region_dispatch_write(mr, dpa_offset, data,
+size_memop(size), attrs);
+}
+
 static void ct3d_reset(DeviceState *dev)
 {
 CXLType3Dev *ct3d = CT3(dev);
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 43908f161b..83da5d4e8f 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -264,4 +264,9 @@ struct CXLType3Class {
 uint64_t offset);
 };
 
+MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
+   unsigned size, MemTxAttrs attrs);
+MemTxResult cxl_type3_write(PCIDevice *d, hwaddr host_addr, uint64_t data,
+unsigned size, MemTxAttrs attrs);
+
 #endif
-- 
2.32.0




[PATCH v5 39/43] hw/cxl/component Add a dumb HDM decoder handler

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

Add a trivial handler for now to cover the root bridge, where we
could do some error checking in the future.
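
The commit handshake the handler implements (software writes COMMIT;
the device clears COMMIT and ERR and latches COMMITTED) can be
sketched as follows, with illustrative bit positions rather than the
real CXL_HDM_DECODER0_CTRL register layout:

```python
# Illustrative bit positions only; the actual field offsets come from
# the CXL 2.0 HDM decoder control register definition.
COMMIT    = 1 << 0
COMMITTED = 1 << 1
ERR       = 1 << 2

def hdm_ctrl_write(value: int) -> int:
    """Model of dumb_hdm_handler(): a write with COMMIT set always
    succeeds, so COMMIT and ERR are cleared and COMMITTED is set."""
    if value & COMMIT:
        value &= ~(COMMIT | ERR)
        value |= COMMITTED
    return value
```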

Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
---

 hw/cxl/cxl-component-utils.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
index 795dbc7561..c5124708b6 100644
--- a/hw/cxl/cxl-component-utils.c
+++ b/hw/cxl/cxl-component-utils.c
@@ -32,6 +32,31 @@ static uint64_t cxl_cache_mem_read_reg(void *opaque, hwaddr 
offset,
 }
 }
 
+static void dumb_hdm_handler(CXLComponentState *cxl_cstate, hwaddr offset,
+ uint32_t value)
+{
+ComponentRegisters *cregs = &cxl_cstate->crb;
+uint32_t *cache_mem = cregs->cache_mem_registers;
+bool should_commit = false;
+
+switch (offset) {
+case A_CXL_HDM_DECODER0_CTRL:
+should_commit = FIELD_EX32(value, CXL_HDM_DECODER0_CTRL, COMMIT);
+break;
+default:
+break;
+}
+
+memory_region_transaction_begin();
+stl_le_p((uint8_t *)cache_mem + offset, value);
+if (should_commit) {
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, COMMIT, 0);
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, ERR, 0);
+ARRAY_FIELD_DP32(cache_mem, CXL_HDM_DECODER0_CTRL, COMMITTED, 1);
+}
+memory_region_transaction_commit();
+}
+
 static void cxl_cache_mem_write_reg(void *opaque, hwaddr offset, uint64_t 
value,
 unsigned size)
 {
@@ -45,6 +70,12 @@ static void cxl_cache_mem_write_reg(void *opaque, hwaddr 
offset, uint64_t value,
 }
 if (cregs->special_ops && cregs->special_ops->write) {
 cregs->special_ops->write(cxl_cstate, offset, value, size);
+return;
+}
+
+if (offset >= A_CXL_HDM_DECODER_CAPABILITY &&
+offset <= A_CXL_HDM_DECODER0_TARGET_LIST_HI) {
+dumb_hdm_handler(cxl_cstate, offset, value);
 } else {
 cregs->cache_mem_registers[offset / 4] = value;
 }
-- 
2.32.0




[PATCH v5 31/43] hw/pci-host/gpex-acpi: Add support for dsdt construction for pxb-cxl

2022-02-02 Thread Jonathan Cameron via
This adds code to instantiate the slightly extended ACPI root port
description in DSDT as per the CXL 2.0 specification.

Basically a cut and paste job from the i386/pc code.

Signed-off-by: Jonathan Cameron 
Signed-off-by: Ben Widawsky 
---
v5:
 No change to this patch, but build issue seen here was fixed at
 introduction of build_cxl_osc_method() in patch 22.

 hw/arm/Kconfig  |  1 +
 hw/pci-host/gpex-acpi.c | 22 +++---
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 2e0049196d..3df419fa6d 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -29,6 +29,7 @@ config ARM_VIRT
 select ACPI_APEI
 select ACPI_VIOT
 select VIRTIO_MEM_SUPPORTED
+select ACPI_CXL
 
 config CHEETAH
 bool
diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
index e7e162a00a..fb60aa517f 100644
--- a/hw/pci-host/gpex-acpi.c
+++ b/hw/pci-host/gpex-acpi.c
@@ -5,6 +5,7 @@
 #include "hw/pci/pci_bus.h"
 #include "hw/pci/pci_bridge.h"
 #include "hw/pci/pcie_host.h"
+#include "hw/acpi/cxl.h"
 
 static void acpi_dsdt_add_pci_route_table(Aml *dev, uint32_t irq)
 {
@@ -139,6 +140,7 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
 QLIST_FOREACH(bus, &bus->child, sibling) {
 uint8_t bus_num = pci_bus_num(bus);
 uint8_t numa_node = pci_bus_numa_node(bus);
+bool is_cxl;
 
 if (!pci_bus_is_root(bus)) {
 continue;
@@ -153,9 +155,19 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
 nr_pcie_buses = bus_num;
 }
 
+is_cxl = pci_bus_is_cxl(bus);
+
 dev = aml_device("PC%.02X", bus_num);
-aml_append(dev, aml_name_decl("_HID", aml_string("PNP0A08")));
-aml_append(dev, aml_name_decl("_CID", aml_string("PNP0A03")));
+if (is_cxl) {
+struct Aml *pkg = aml_package(2);
+aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0016")));
+aml_append(pkg, aml_eisaid("PNP0A08"));
+aml_append(pkg, aml_eisaid("PNP0A03"));
+aml_append(dev, aml_name_decl("_CID", pkg));
+} else {
+aml_append(dev, aml_name_decl("_HID", aml_string("PNP0A08")));
+aml_append(dev, aml_name_decl("_CID", aml_string("PNP0A03")));
+}
 aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
 aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
 aml_append(dev, aml_name_decl("_STR", aml_unicode("pxb Device")));
@@ -175,7 +187,11 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
 cfg->pio.base, 0, 0, 0);
 aml_append(dev, aml_name_decl("_CRS", crs));
 
-acpi_dsdt_add_pci_osc(dev);
+if (is_cxl) {
+build_cxl_osc_method(dev);
+} else {
+acpi_dsdt_add_pci_osc(dev);
+}
 
 aml_append(scope, dev);
 }
-- 
2.32.0




[PATCH v5 41/43] qtest/acpi: Add reference CEDT tables.

2022-02-02 Thread Jonathan Cameron via
From: Jonathan Cameron 

More sophisticated tests will come later, but for now deal
with the NULL case.

Signed-off-by: Jonathan Cameron 
---
 tests/data/acpi/pc/CEDT | Bin 0 -> 36 bytes
 tests/data/acpi/q35/CEDT| Bin 0 -> 36 bytes
 tests/data/acpi/virt/CEDT   | Bin 0 -> 36 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |   3 ---
 4 files changed, 3 deletions(-)

diff --git a/tests/data/acpi/pc/CEDT b/tests/data/acpi/pc/CEDT
index 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..b44db4ce1db980d783ad568a03c17c2915d111b0
 100644
GIT binary patch
literal 36
jcmZ>EbqP^nU|?VjaPoKd2v%^42yj*a0!E-1hz+6veU1hJ

literal 0
HcmV?d1

diff --git a/tests/data/acpi/q35/CEDT b/tests/data/acpi/q35/CEDT
index 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..b44db4ce1db980d783ad568a03c17c2915d111b0
 100644
GIT binary patch
literal 36
jcmZ>EbqP^nU|?VjaPoKd2v%^42yj*a0!E-1hz+6veU1hJ

literal 0
HcmV?d1

diff --git a/tests/data/acpi/virt/CEDT b/tests/data/acpi/virt/CEDT
index 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..b44db4ce1db980d783ad568a03c17c2915d111b0
 100644
GIT binary patch
literal 36
jcmZ>EbqP^nU|?VjaPoKd2v%^42yj*a0!E-1hz+6veU1hJ

literal 0
HcmV?d1

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index c7726cad80..dfb8523c8b 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,4 +1 @@
 /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/pc/CEDT",
-"tests/data/acpi/q35/CEDT",
-"tests/data/acpi/virt/CEDT",
-- 
2.32.0




[PATCH v1] an547: Correct typo that swaps ahb and apb peripherals

2022-02-02 Thread Jimmy Brisson
It turns out that this typo manifests as being unable to configure
the ethernet access permissions, as the IoTKitPPC looks
these up by name.

With this fix, eth is configurable.

Signed-off-by: Jimmy Brisson 
---
 hw/arm/mps2-tz.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index f40e854dec..3c6456762a 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -1030,7 +1030,7 @@ static void mps2tz_common_init(MachineState *machine)
 };
 
 const PPCInfo an547_ppcs[] = { {
-.name = "apb_ppcexp0",
+.name = "ahb_ppcexp0",
 .ports = {
 { "ssram-mpc", make_mpc, &mms->mpc[0], 0x5700, 0x1000 },
 { "qspi-mpc", make_mpc, &mms->mpc[1], 0x57001000, 0x1000 },
@@ -1072,7 +1072,7 @@ static void mps2tz_common_init(MachineState *machine)
 { "rtc", make_rtc, &mms->rtc, 0x4930b000, 0x1000 },
 },
 }, {
-.name = "ahb_ppcexp0",
+.name = "apb_ppcexp0",
 .ports = {
 { "gpio0", make_unimp_dev, &mms->gpio[0], 0x4110, 0x1000 },
 { "gpio1", make_unimp_dev, &mms->gpio[1], 0x41101000, 0x1000 },
-- 
2.33.1




[PATCH v5 42/43] qtest/cxl: Add very basic sanity tests

2022-02-02 Thread Jonathan Cameron via
From: Ben Widawsky 

Simple 'does it boot' tests with up to
2x PXB host bridges, each with 2x CXL RPs and each of those with
a Type 3 memory device.  A single CFMWS to interleave across the
two HBs and ultimately the 4 devices.

More complete tests may be possible but CXL interleave setup
is complex so a lot of steps will be needed.

Signed-off-by: Ben Widawsky 
Co-developed-by: Jonathan Cameron 
Signed-off-by: Jonathan Cameron 
---
 tests/qtest/cxl-test.c  | 151 
 tests/qtest/meson.build |   4 ++
 2 files changed, 155 insertions(+)
 create mode 100644 tests/qtest/cxl-test.c

diff --git a/tests/qtest/cxl-test.c b/tests/qtest/cxl-test.c
new file mode 100644
index 00..a50c0c6de4
--- /dev/null
+++ b/tests/qtest/cxl-test.c
@@ -0,0 +1,151 @@
+/*
+ * QTest testcase for CXL
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest-single.h"
+
+#define QEMU_PXB_CMD "-machine q35,cxl=on " \
+ "-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 "  \
+ "-cxl-fixed-memory-window targets=cxl.0,size=4G "
+
+#define QEMU_2PXB_CMD "-machine q35,cxl=on " \
+  "-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 "  \
+  "-device pxb-cxl,id=cxl.1,bus=pcie.0,bus_nr=53 " \
+  "-cxl-fixed-memory-window 
targets=cxl.0,targets=cxl.1,size=4G "
+
+#define QEMU_RP "-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 "
+
+/* Dual ports on first pxb */
+#define QEMU_2RP "-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 " \
+ "-device cxl-rp,id=rp1,bus=cxl.0,chassis=0,slot=1 "
+
+/* Dual ports on each of the pxb instances */
+#define QEMU_4RP "-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 " \
+ "-device cxl-rp,id=rp1,bus=cxl.0,chassis=0,slot=1 " \
+ "-device cxl-rp,id=rp2,bus=cxl.1,chassis=0,slot=2 " \
+ "-device cxl-rp,id=rp3,bus=cxl.1,chassis=0,slot=3 "
+
+#define QEMU_T3D "-object 
memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M " \
+ "-device 
cxl-type3,bus=rp0,memdev=cxl-mem0,id=cxl-pmem0,size=256M "
+
+#define QEMU_2T3D "-object 
memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M "\
+  "-device 
cxl-type3,bus=rp0,memdev=cxl-mem0,id=cxl-pmem0,size=256M " \
+  "-object 
memory-backend-file,id=cxl-mem1,mem-path=%s,size=256M "\
+  "-device 
cxl-type3,bus=rp1,memdev=cxl-mem1,id=cxl-pmem1,size=256M "
+
+#define QEMU_4T3D "-object 
memory-backend-file,id=cxl-mem0,mem-path=%s,size=256M "\
+  "-device 
cxl-type3,bus=rp0,memdev=cxl-mem0,id=cxl-pmem0,size=256M " \
+  "-object 
memory-backend-file,id=cxl-mem1,mem-path=%s,size=256M "\
+  "-device 
cxl-type3,bus=rp1,memdev=cxl-mem1,id=cxl-pmem1,size=256M " \
+  "-object 
memory-backend-file,id=cxl-mem2,mem-path=%s,size=256M "\
+  "-device 
cxl-type3,bus=rp2,memdev=cxl-mem2,id=cxl-pmem2,size=256M " \
+  "-object 
memory-backend-file,id=cxl-mem3,mem-path=%s,size=256M "\
+  "-device 
cxl-type3,bus=rp3,memdev=cxl-mem3,id=cxl-pmem3,size=256M "
+
+static void cxl_basic_hb(void)
+{
+qtest_start("-machine q35,cxl=on");
+qtest_end();
+}
+
+static void cxl_basic_pxb(void)
+{
+qtest_start("-machine q35,cxl=on -device pxb-cxl,bus=pcie.0");
+qtest_end();
+}
+
+static void cxl_pxb_with_window(void)
+{
+qtest_start(QEMU_PXB_CMD);
+qtest_end();
+}
+
+static void cxl_2pxb_with_window(void)
+{
+qtest_start(QEMU_2PXB_CMD);
+qtest_end();
+}
+
+static void cxl_root_port(void)
+{
+qtest_start(QEMU_PXB_CMD QEMU_RP);
+qtest_end();
+}
+
+static void cxl_2root_port(void)
+{
+qtest_start(QEMU_PXB_CMD QEMU_2RP);
+qtest_end();
+}
+
+static void cxl_t3d(void)
+{
+GString *cmdline;
+char template[] = "/tmp/cxl-test-XX";
+const char *tmpfs;
+
+tmpfs = mkdtemp(template);
+
+cmdline = g_string_new(NULL);
+g_string_printf(cmdline, QEMU_PXB_CMD QEMU_RP QEMU_T3D, tmpfs);
+
+qtest_start(cmdline->str);
+qtest_end();
+
+g_string_free(cmdline, TRUE);
+}
+
+static void cxl_1pxb_2rp_2t3d(void)
+{
+GString *cmdline;
+char template[] = "/tmp/cxl-test-XX";
+const char *tmpfs;
+
+tmpfs = mkdtemp(template);
+
+cmdline = g_string_new(NULL);
+g_string_printf(cmdline, QEMU_PXB_CMD QEMU_2RP QEMU_2T3D, tmpfs, tmpfs);
+
+qtest_start(cmdline->str);
+qtest_end();
+
+g_string_free(cmdline, TRUE);
+}
+
+static void cxl_2pxb_4rp_4t3d(void)
+{
+GString *cmdline;
+char template[] = "/tmp/cxl-test-XX";
+const char *tmpfs;
+
+tmpfs = mkdtemp(template);
+
+cmdline = g_string_new(NULL);
+g_string_printf(cmdline, QEMU_2PXB_CMD QEMU_4RP QEMU_4T3D,
+tmpfs, tmpfs, t

[PATCH v5 32/43] pci/pcie_port: Add pci_find_port_by_pn()

2022-02-02 Thread Jonathan Cameron via
From: Jonathan Cameron 

Simple function to search a PCIBus to find a port by
its port number.

CXL interleave decoding uses the port number as a target
so it is necessary to locate the port when doing interleave
decoding.

Signed-off-by: Jonathan Cameron 
---
 hw/pci/pcie_port.c | 25 +
 include/hw/pci/pcie_port.h |  2 ++
 2 files changed, 27 insertions(+)

diff --git a/hw/pci/pcie_port.c b/hw/pci/pcie_port.c
index e95c1e5519..687e4e763a 100644
--- a/hw/pci/pcie_port.c
+++ b/hw/pci/pcie_port.c
@@ -136,6 +136,31 @@ static void pcie_port_class_init(ObjectClass *oc, void 
*data)
 device_class_set_props(dc, pcie_port_props);
 }
 
+PCIDevice *pcie_find_port_by_pn(PCIBus *bus, uint8_t pn)
+{
+int devfn;
+
+for (devfn = 0; devfn < ARRAY_SIZE(bus->devices); devfn++) {
+PCIDevice *d = bus->devices[devfn];
+PCIEPort *port;
+
+if (!d || !pci_is_express(d) || !d->exp.exp_cap) {
+continue;
+}
+
+if (!object_dynamic_cast(OBJECT(d), TYPE_PCIE_PORT)) {
+continue;
+}
+
+port = PCIE_PORT(d);
+if (port->port == pn) {
+return d;
+}
+}
+
+return NULL;
+}
+
 static const TypeInfo pcie_port_type_info = {
 .name = TYPE_PCIE_PORT,
 .parent = TYPE_PCI_BRIDGE,
diff --git a/include/hw/pci/pcie_port.h b/include/hw/pci/pcie_port.h
index e25b289ce8..7b8193061a 100644
--- a/include/hw/pci/pcie_port.h
+++ b/include/hw/pci/pcie_port.h
@@ -39,6 +39,8 @@ struct PCIEPort {
 
 void pcie_port_init_reg(PCIDevice *d);
 
+PCIDevice *pcie_find_port_by_pn(PCIBus *bus, uint8_t pn);
+
 #define TYPE_PCIE_SLOT "pcie-slot"
 OBJECT_DECLARE_SIMPLE_TYPE(PCIESlot, PCIE_SLOT)
 
-- 
2.32.0




[PATCH v5 36/43] arm/virt: Allow virt/CEDT creation

2022-02-02 Thread Jonathan Cameron via
From: Jonathan Cameron 

Allow for the creation of the CEDT ACPI table without qtest
failures due to an unknown ACPI table.

Signed-off-by: Jonathan Cameron 
---
 tests/data/acpi/virt/CEDT   | 0
 tests/qtest/bios-tables-test-allowed-diff.h | 1 +
 2 files changed, 1 insertion(+)
 create mode 100644 tests/data/acpi/virt/CEDT

diff --git a/tests/data/acpi/virt/CEDT b/tests/data/acpi/virt/CEDT
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index 9b07f1e1ff..c7726cad80 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,3 +1,4 @@
 /* List of comma-separated changed AML files to ignore */
 "tests/data/acpi/pc/CEDT",
 "tests/data/acpi/q35/CEDT",
+"tests/data/acpi/virt/CEDT",
-- 
2.32.0




[PATCH] qemu-options: fix incorrect description for '-drive index='

2022-02-02 Thread Laurent Vivier
qemu-options.hx contains grammar that a native English-speaking
person would never use.

Replace "This option defines where is connected the drive" by
"This option defines where the drive is connected".

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/853
Signed-off-by: Laurent Vivier 
---
 qemu-options.hx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index ba3ae6a42aa3..094a6c1d7c28 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1377,7 +1377,7 @@ SRST
 the bus number and the unit id.
 
 ``index=index``
-This option defines where is connected the drive by using an
+This option defines where the drive is connected by using an
 index in the list of available connectors of a given interface
 type.
 
-- 
2.34.1




Re: [PATCH 10/12] block.c: add subtree_drains where needed

2022-02-02 Thread Emanuele Giuseppe Esposito



On 01/02/2022 15:47, Vladimir Sementsov-Ogievskiy wrote:
> 18.01.2022 19:27, Emanuele Giuseppe Esposito wrote:
>> Protect bdrv_replace_child_noperm, as it modifies the
>> graph by adding/removing elements to .children and .parents
>> list of a bs. Use the newly introduced
>> bdrv_subtree_drained_{begin/end}_unlocked drains to achieve
>> that and be free from the aiocontext lock.
>>
>> One important criteria to keep in mind is that if the caller of
>> bdrv_replace_child_noperm creates a transaction, we need to make sure
>> that the
>> whole transaction is under the same drain block. This is imperative,
>> as having
>> multiple drains also in the .abort() class of functions causes
>> discrepancies
>> in the drained counters (as nodes are put back into the original
>> positions),
>> making it really hard to return all to zero and leaving the code very
>> buggy.
>> See https://patchew.org/QEMU/20211213104014.69858-1-eespo...@redhat.com/
>> for more explanations.
>>
>> Unfortunately we still need to have bdrv_subtree_drained_begin/end
>> in bdrv_detach_child() releasing and then holding the AioContext
>> lock, since it later invokes bdrv_try_set_aio_context() that is
>> not safe yet. Once all is cleaned up, we can also remove the
>> acquire/release locks in job_unref, artificially added because of this.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito 
>> ---
>>   block.c | 50 --
>>   1 file changed, 44 insertions(+), 6 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index fcc44a49a0..6196c95aae 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -3114,8 +3114,22 @@ static void bdrv_detach_child(BdrvChild **childp)
>>   BlockDriverState *old_bs = (*childp)->bs;
>>     assert(qemu_in_main_thread());
>> +    if (old_bs) {
>> +    /*
>> + * TODO: this is called by job_unref with lock held, because
>> + * afterwards it calls bdrv_try_set_aio_context.
>> + * Once all of this is fixed, take care of removing
>> + * the aiocontext lock and make this function _unlocked.
>> + */
>> +    bdrv_subtree_drained_begin(old_bs);
>> +    }
>> +
>>   bdrv_replace_child_noperm(childp, NULL, true);
>>   +    if (old_bs) {
>> +    bdrv_subtree_drained_end(old_bs);
>> +    }
>> +
>>   if (old_bs) {
>>   /*
>>    * Update permissions for old node. We're just taking a
>> parent away, so
>> @@ -3154,6 +3168,7 @@ BdrvChild
>> *bdrv_root_attach_child(BlockDriverState *child_bs,
>>   Transaction *tran = tran_new();
>>     assert(qemu_in_main_thread());
>> +    bdrv_subtree_drained_begin_unlocked(child_bs);
>>     ret = bdrv_attach_child_common(child_bs, child_name, child_class,
>>  child_role, perm, shared_perm,
>> opaque,
>> @@ -3168,6 +3183,7 @@ out:
>>   tran_finalize(tran, ret);
>>   /* child is unset on failure by bdrv_attach_child_common_abort() */
>>   assert((ret < 0) == !child);
>> +    bdrv_subtree_drained_end_unlocked(child_bs);
>>     bdrv_unref(child_bs);
>>   return child;
>> @@ -3197,6 +3213,9 @@ BdrvChild *bdrv_attach_child(BlockDriverState
>> *parent_bs,
>>     assert(qemu_in_main_thread());
>>   +    bdrv_subtree_drained_begin_unlocked(parent_bs);
>> +    bdrv_subtree_drained_begin_unlocked(child_bs);
>> +
>>   ret = bdrv_attach_child_noperm(parent_bs, child_bs, child_name,
>> child_class,
>>  child_role, &child, tran, errp);
>>   if (ret < 0) {
>> @@ -3211,6 +3230,9 @@ BdrvChild *bdrv_attach_child(BlockDriverState
>> *parent_bs,
>>   out:
>>   tran_finalize(tran, ret);
>>   /* child is unset on failure by bdrv_attach_child_common_abort() */
>> +    bdrv_subtree_drained_end_unlocked(child_bs);
>> +    bdrv_subtree_drained_end_unlocked(parent_bs);
>> +
>>   assert((ret < 0) == !child);
>>     bdrv_unref(child_bs);
>> @@ -3456,6 +3478,11 @@ int bdrv_set_backing_hd(BlockDriverState *bs,
>> BlockDriverState *backing_hd,
>>     assert(qemu_in_main_thread());
>>   +    bdrv_subtree_drained_begin_unlocked(bs);
>> +    if (backing_hd) {
>> +    bdrv_subtree_drained_begin_unlocked(backing_hd);
>> +    }
>> +
>>   ret = bdrv_set_backing_noperm(bs, backing_hd, tran, errp);
>>   if (ret < 0) {
>>   goto out;
>> @@ -3464,6 +3491,10 @@ int bdrv_set_backing_hd(BlockDriverState *bs,
>> BlockDriverState *backing_hd,
>>   ret = bdrv_refresh_perms(bs, errp);
>>   out:
>>   tran_finalize(tran, ret);
>> +    if (backing_hd) {
>> +    bdrv_subtree_drained_end_unlocked(backing_hd);
>> +    }
>> +    bdrv_subtree_drained_end_unlocked(bs);
>>     return ret;
>>   }
>> @@ -5266,7 +5297,8 @@ static int
>> bdrv_replace_node_common(BlockDriverState *from,
>>     assert(qemu_get_current_aio_context() == qemu_get_aio_context());
>>   assert(bdrv_get_aio_context(from) == bdrv_get_aio_context(to));
>> -    bdrv_drained_begin(from);
>> +    bdrv_

[PATCH v5 40/43] i386/pc: Enable CXL fixed memory windows

2022-02-02 Thread Jonathan Cameron via
From: Jonathan Cameron 

Add the CFMW memory regions to the memory map and adjust the
PCI window to avoid overlapping the same memory.

Signed-off-by: Jonathan Cameron 
---
 hw/i386/pc.c | 31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 7a18dce529..5ece806d2b 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -816,7 +816,7 @@ void pc_memory_init(PCMachineState *pcms,
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 X86MachineState *x86ms = X86_MACHINE(pcms);
-hwaddr cxl_base;
+hwaddr cxl_base, cxl_resv_end = 0;
 
 assert(machine->ram_size == x86ms->below_4g_mem_size +
 x86ms->above_4g_mem_size);
@@ -924,6 +924,24 @@ void pc_memory_init(PCMachineState *pcms,
 e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
 memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size);
 memory_region_add_subregion(system_memory, cxl_base, mr);
+cxl_resv_end = cxl_base + cxl_size;
+if (machine->cxl_devices_state->fixed_windows) {
+hwaddr cxl_fmw_base;
+GList *it;
+
+cxl_fmw_base = ROUND_UP(cxl_base + cxl_size, 256 * MiB);
+for (it = machine->cxl_devices_state->fixed_windows; it; it = 
it->next) {
+CXLFixedWindow *fw = it->data;
+
+fw->base = cxl_fmw_base;
+memory_region_init_io(&fw->mr, OBJECT(machine), &cfmws_ops, fw,
+  "cxl-fixed-memory-region", fw->size);
+memory_region_add_subregion(system_memory, fw->base, &fw->mr);
+e820_add_entry(fw->base, fw->size, E820_RESERVED);
+cxl_fmw_base += fw->size;
+cxl_resv_end = cxl_fmw_base;
+}
+}
 }
 
 /* Initialize PC system firmware */
@@ -953,6 +971,10 @@ void pc_memory_init(PCMachineState *pcms,
 if (!pcmc->broken_reserved_end) {
 res_mem_end += memory_region_size(&machine->device_memory->mr);
 }
+
+if (machine->cxl_devices_state->is_enabled) {
+res_mem_end = cxl_resv_end;
+}
 *val = cpu_to_le64(ROUND_UP(res_mem_end, 1 * GiB));
 fw_cfg_add_file(fw_cfg, "etc/reserved-memory-end", val, sizeof(*val));
 }
@@ -989,6 +1011,13 @@ uint64_t pc_pci_hole64_start(void)
 if (ms->cxl_devices_state->host_mr.addr) {
 hole64_start = ms->cxl_devices_state->host_mr.addr +
 memory_region_size(&ms->cxl_devices_state->host_mr);
+if (ms->cxl_devices_state->fixed_windows) {
+GList *it;
+for (it = ms->cxl_devices_state->fixed_windows; it; it = it->next) 
{
+CXLFixedWindow *fw = it->data;
+hole64_start = fw->mr.addr + memory_region_size(&fw->mr);
+}
+}
 } else if (pcmc->has_reserved_memory && ms->device_memory->base) {
 hole64_start = ms->device_memory->base;
 if (!pcmc->broken_reserved_end) {
-- 
2.32.0




Re: [PATCH v5 03/18] pci: isolated address space for PCI bus

2022-02-02 Thread Alex Williamson
On Wed, 2 Feb 2022 05:06:49 -0500
"Michael S. Tsirkin"  wrote:

> On Wed, Feb 02, 2022 at 09:30:42AM +, Peter Maydell wrote:
> > > I/O port space is always the identity mapped CPU address space unless
> > > sparse translations are used to create multiple I/O port spaces (not
> > > implemented).  I/O port space is only accessed by the CPU, there are no
> > > device initiated I/O port transactions, so the address space relative
> > > to the device is irrelevant.  
> > 
> > Does the PCI spec actually forbid any master except the CPU from
> > issuing I/O port transactions, or is it just that in practice nobody
> > makes a PCI device that does weird stuff like that ?
> > 
> > thanks
> > -- PMM  
> 
> Hmm, the only thing vaguely related in the spec that I know of is this:
> 
>   PCI Express supports I/O Space for compatibility with legacy devices 
> which require their use.
>   Future revisions of this specification may deprecate the use of I/O 
> Space.
> 
> Alex, what did you refer to?

My evidence is largely by omission, but that might be that in practice
it's not used rather than explicitly forbidden.  I note that the bus
master enable bit specifies:

Bus Master Enable - Controls the ability of a Function to issue
Memory and I/O Read/Write Requests, and the ability of
a Port to forward Memory and I/O Read/Write Requests in
the Upstream direction.

That would suggest it's possible, but for PCI device assignment, I'm
not aware of any means through which we could support this.  There is
no support in the IOMMU core for mapping I/O port space, nor could we
trap such device initiated transactions to emulate them.  I can't spot
any mention of I/O port space in the VT-d spec, however the AMD-Vi spec
does include a field in the device table:

controlIoCtl: port I/O control. Specifies whether
device-initiated port I/O space transactions are blocked,
forwarded, or translated.

00b=Device-initiated port I/O is not allowed. The IOMMU target
aborts the transaction if a port I/O space transaction is
received. Translation requests are target aborted.

01b=Device-initiated port I/O space transactions are allowed.
The IOMMU must pass port I/O accesses untranslated. Translation
requests are target aborted.

10b=Transactions in the port I/O space address range are
translated by the IOMMU page tables as memory transactions.

11b=Reserved.

I don't see this field among the macros used by the Linux driver in
configuring these device entries, so I assume it's left to the default
value, ie. zero, blocking device initiated I/O port transactions.
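The three policies quoted above can be modeled as a tiny standalone
sketch. The enum values mirror the 00b/01b/10b encodings from the
quoted spec text, but nothing below is the real device-table entry
layout or a QEMU/Linux API; it is only an illustration of the
dispatch:

```c
#include <assert.h>

/*
 * Illustrative model of the AMD-Vi IoCtl policy quoted above.
 * The enum values follow the quoted 00b/01b/10b encodings, but this
 * is a standalone sketch, not the real device-table entry layout.
 */
enum io_ctl {
    IOCTL_DENY  = 0, /* 00b: device-initiated port I/O is not allowed */
    IOCTL_PASS  = 1, /* 01b: pass port I/O accesses untranslated */
    IOCTL_XLATE = 2, /* 10b: translate as a memory transaction */
};

enum io_action { ACT_TARGET_ABORT, ACT_UNTRANSLATED, ACT_TRANSLATED };

/* Decide what happens to a device-initiated port I/O transaction. */
static enum io_action handle_port_io(enum io_ctl ctl)
{
    switch (ctl) {
    case IOCTL_PASS:
        return ACT_UNTRANSLATED;
    case IOCTL_XLATE:
        return ACT_TRANSLATED;
    case IOCTL_DENY:
    default: /* 11b is reserved; treat it as blocked */
        return ACT_TARGET_ABORT;
    }
}
```

With the field left at its default of zero, every device-initiated
port I/O access takes the target-abort path.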

So yes, I suppose device initiated I/O port transactions are possible,
but we have no support or reason to support them, so I'm going to go
ahead and continue believing any I/O port address space from the device
perspective is largely irrelevant ;)  Thanks,

Alex




Re: [PATCH v1 1/1] target/i386: Mask xstate_bv based on the cpu enabled features

2022-02-02 Thread David Edmondson
On Tuesday, 2022-02-01 at 16:09:57 -03, Leonardo Brás wrote:

> Hello David, thanks for this feedback!
>
> On Mon, 2022-01-31 at 12:53 +, David Edmondson wrote:
>> On Saturday, 2022-01-29 at 06:46:45 -03, Leonardo Bras wrote:
>> 
>> > The following steps describe a migration bug:
>> > 1 - Bring up a VM with -cpu EPYC on a host with EPYC-Milan cpu
>> > 2 - Migrate to a host with EPYC-Naples cpu
>> > 
>> > The guest kernel crashes shortly after the migration.
>> > 
>> > The crash happens due to a fault caused by XRSTOR:
>> > A set bit in XSTATE_BV is not set in XCR0.
>> > The faulting bit is FEATURE_PKRU (enabled in Milan, but not in
>> > Naples)
>> 
>> I'm trying to understand how this happens.
>> 
>> If we boot on EPYC-Milan with "-cpu EPYC", the PKRU feature should
>> not
>> be exposed to the VM (it is not available in the EPYC CPU).
>> 
>> Given this, how would bit 0x200 (representing PKRU) end up set in
>> xstate_bv?
>
> During my debug, I noticed this bit gets set before the kernel even
> starts. 
>
> It's possible Seabios and/or IPXE are somehow setting 0x200 using the
> xrstor command. I am not sure if qemu is able to stop this in KVM mode.

I don't believe that this should be possible.

If the CPU is set to EPYC in QEMU then .features[FEAT_7_0_ECX] does not
include CPUID_7_0_ECX_PKU, which in turn means that when
x86_cpu_enable_xsave_components() generates FEAT_XSAVE_COMP_LO it should
not set XSTATE_PKRU_BIT.

Given that, KVM's vcpu->arch.guest_supported_xcr0 will not include
XSTATE_PKRU_BIT, and __kvm_set_xcr() should not allow that bit to be
set when it intercepts the guest xsetbv instruction.
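A miniature sketch of that chain of checks, assuming only that PKRU
is XSAVE state component 9 (everything else below is a simplified
stand-in for illustration, not QEMU's or KVM's actual code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified stand-ins for the checks discussed in this thread.
 * Only the PKRU bit number (9) is taken from the XSAVE definition;
 * the function names and shapes are illustrative, not real APIs.
 */
#define XSTATE_PKRU_BIT   9
#define XSTATE_PKRU_MASK  (1ULL << XSTATE_PKRU_BIT)

/* KVM-style guard: the guest may only set XCR0 bits it was offered. */
static bool set_xcr0(uint64_t guest_supported_xcr0, uint64_t new_xcr0,
                     uint64_t *xcr0)
{
    if (new_xcr0 & ~guest_supported_xcr0) {
        return false; /* __kvm_set_xcr would reject this xsetbv */
    }
    *xcr0 = new_xcr0;
    return true;
}

/* XRSTOR faults if xstate_bv claims state that XCR0 does not enable. */
static bool xrstor_would_fault(uint64_t xstate_bv, uint64_t xcr0)
{
    return (xstate_bv & ~xcr0) != 0;
}
```

So with PKRU absent from guest_supported_xcr0 the guest can never
enable it, and an xstate_bv with the PKRU bit set but XCR0 clear is
exactly the faulting XRSTOR described in the bug report.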

dme.
-- 
Please forgive me if I act a little strange, for I know not what I do.



Re: [PATCH v3 1/1] virtio: fix the condition for iommu_platform not supported

2022-02-02 Thread Daniel Henrique Barboza




On 2/2/22 13:23, Halil Pasic wrote:

On Wed, 2 Feb 2022 10:24:51 -0300
Daniel Henrique Barboza  wrote:


On 2/1/22 22:15, Halil Pasic wrote:

On Tue, 1 Feb 2022 16:31:22 -0300
Daniel Henrique Barboza  wrote:
   

On 2/1/22 15:33, Halil Pasic wrote:

On Tue, 1 Feb 2022 12:36:25 -0300
Daniel Henrique Barboza  wrote:
  

+vdev_has_iommu = virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
 if (klass->get_dma_as != NULL && has_iommu) {
 virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
 vdev->dma_as = klass->get_dma_as(qbus->parent);
+if (!vdev_has_iommu && vdev->dma_as != &address_space_memory) {
+error_setg(errp,
+   "iommu_platform=true is not supported by the device");
+}


 

 } else {
 vdev->dma_as = &address_space_memory;
 }



I struggled to understand what this 'else' clause was doing and I assumed that 
it was
wrong. Searching through the ML I learned that this 'else' clause is intended 
to handle
legacy virtio devices that doesn't support the DMA API (introduced in 
8607f5c3072caeebb)
and thus shouldn't set  VIRTIO_F_IOMMU_PLATFORM.


My suggestion, if a v4 is required for any other reason, is to add a small 
comment in this
'else' clause explaining that this is the legacy virtio devices condition and 
those devices
don't set F_IOMMU_PLATFORM. This would make the code easier to read for a 
virtio casual like
myself.


I do not agree that this is about legacy virtio. In my understanding
virtio-ccw simply does not need translation because CCW devices use
guest physical addresses as per architecture. It may be considered
legacy stuff from a PCI perspective, but I don't think it is legacy
in general.



I wasn't talking about virtio-ccw. I was talking about this piece of code:


   if (klass->get_dma_as != NULL && has_iommu) {
   virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
   vdev->dma_as = klass->get_dma_as(qbus->parent);
   } else {
   vdev->dma_as = &address_space_memory;
   }


I suggested something like this:



   if (klass->get_dma_as != NULL && has_iommu) {
   virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
   vdev->dma_as = klass->get_dma_as(qbus->parent);
   } else {
   /*
* We don't force VIRTIO_F_IOMMU_PLATFORM for legacy devices, i.e.
* devices that don't implement klass->get_dma_as, regardless of
* 'has_iommu' setting.
*/
   vdev->dma_as = &address_space_memory;
   }


At least from my reading of commits 8607f5c3072 and 2943b53f682 this seems to be
the case. I spent some time thinking that this IF/ELSE was wrong because I 
wasn't
aware of this history.


With virtio-ccw we take the else branch because we don't implement
->get_dma_as(). I don't consider all the virtio-ccw to be legacy.

IMHO there are two ways to think about this:
a) The commit that introduced this needs a fix which implements
get_dma_as() for virtio-ccw in a way that it simply returns
address_space_memory.
b) The presence of ->get_dma_as() is not indicative of "legacy".

BTW in virtospeak "legacy" has a special meaning: pre-1.0 virtio. Do you
mean that legacy? And if I read the virtio-pci code correctly
->get_dma_as is set for legacy, transitional and modern devices alike.



Oh ok. I'm not well versed into virtiospeak. My "legacy" comment was a poor 
choice of
word for the situation.

We can ignore the "legacy" bit. My idea/suggestion is to put a comment at that 
point
explaining the logic behind into not forcing VIRTIO_F_IOMMU_PLATFORM in devices 
that
doesn't implement ->get_dma_as().

I am assuming that this is an intended design that was introduced by 2943b53f682
("virtio: force VIRTIO_F_IOMMU_PLATFORM"), meaning that the implementation of 
the
->get_dma_as is being used as a parameter to force the feature in the device. 
And with
this code:


  if (klass->get_dma_as != NULL && has_iommu) {
  virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
  vdev->dma_as = klass->get_dma_as(qbus->parent);
  } else {
  vdev->dma_as = &address_space_memory;
  }

It is possible that we have 2 vdev devices where ->dma_as = 
&address_space_memory, but one
of them is sitting in a bus where "klass->get_dma_as(qbus->parent) = 
&address_space_memory",
and this device will have VIRTIO_F_IOMMU_PLATFORM forced onto it and the former 
won't.
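
A toy model of the quoted if/else makes that asymmetry concrete (all
names here are stand-ins for illustration, not the real QEMU
structures):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the quoted if/else. The struct and fields are
 * stand-ins for the QEMU bus class, not real API.
 */
struct toy_bus {
    bool has_get_dma_as;   /* klass->get_dma_as != NULL */
    bool dma_as_is_memory; /* get_dma_as() would yield address_space_memory */
};

/* Whether VIRTIO_F_IOMMU_PLATFORM ends up forced on the device. */
static bool feature_forced(const struct toy_bus *bus, bool has_iommu)
{
    return bus->has_get_dma_as && has_iommu;
}
```

Both toy devices can end up with dma_as equal to
address_space_memory, yet only the one whose bus implements
get_dma_as has the feature forced on it.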


If this is not an intended design I can only speculate how to fix it. Forcing 
VIRTIO_F_IOMMU_PLATFORM
in all devices, based only on has_iommu, can break stuff. Setting 
VIRTIO_F_IOMMU_PLATFORM only
if "vdev->dma_as != &address_space_memory" make some sense but I am fairly 
certain it will
break stuff the other way. Or perhaps the fix is something else entirely.






IMHO the important thing to figure out is what impact that
virtio_add_feature(&v

Re: [PATCH v3 04/11] 9p: darwin: Handle struct dirent differences

2022-02-02 Thread Will Cohen
Does the version proposed in v3 address the V9fsFidState issues? In 9p.c
for v2 to v3, we propose

-return telldir(fidp->fs.dir.stream);
+return v9fs_co_telldir(pdu, fidp);

and in codir.c from v2 to v3 we propose
-saved_dir_pos = telldir(fidp->fs.dir.stream);
+saved_dir_pos = s->ops->telldir(&s->ctx, &fidp->fs);

This removes the direct access to fidp->, and we hope this should be
sufficient to avoid the concurrency
and undefined behaviors you noted in the v2 review.


On Thu, Jan 27, 2022 at 7:56 PM Will Cohen  wrote:

> From: Keno Fischer 
>
> On darwin d_seekoff exists, but is optional and does not seem to
> be commonly used by file systems. Use `telldir` instead to obtain
> the seek offset.
>
> Signed-off-by: Keno Fischer 
> [Michael Roitzsch: - Rebase for NixOS]
> Signed-off-by: Michael Roitzsch 
> [Will Cohen: - Adjust to pass testing]
> Signed-off-by: Will Cohen 
> Signed-off-by: Fabian Franz 
> ---
>  hw/9pfs/9p-synth.c |  2 ++
>  hw/9pfs/9p.c   | 33 +++--
>  hw/9pfs/codir.c|  4 
>  3 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/hw/9pfs/9p-synth.c b/hw/9pfs/9p-synth.c
> index 4a4a776d06..09b9c25288 100644
> --- a/hw/9pfs/9p-synth.c
> +++ b/hw/9pfs/9p-synth.c
> @@ -222,7 +222,9 @@ static void synth_direntry(V9fsSynthNode *node,
>  {
>  strcpy(entry->d_name, node->name);
>  entry->d_ino = node->attr->inode;
> +#ifndef CONFIG_DARWIN
>  entry->d_off = off + 1;
> +#endif
>  }
>
>  static struct dirent *synth_get_dentry(V9fsSynthNode *dir,
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index 1563d7b7c6..7851f85f8f 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -2218,6 +2218,25 @@ static int v9fs_xattr_read(V9fsState *s, V9fsPDU
> *pdu, V9fsFidState *fidp,
>  return offset;
>  }
>
> +/**
> + * Get the seek offset of a dirent. If not available from the structure
> itself,
> + * obtain it by calling telldir.
> + */
> +static int v9fs_dent_telldir(V9fsPDU *pdu, V9fsFidState *fidp,
> + struct dirent *dent)
> +{
> +#ifdef CONFIG_DARWIN
> +/*
> + * Darwin has d_seekoff, which appears to function similarly to d_off.
> + * However, it does not appear to be supported on all file systems,
> + * so use telldir for correctness.
> + */
> +return v9fs_co_telldir(pdu, fidp);
> +#else
> +return dent->d_off;
> +#endif
> +}
> +
>  static int coroutine_fn v9fs_do_readdir_with_stat(V9fsPDU *pdu,
>V9fsFidState *fidp,
>uint32_t max_count)
> @@ -2281,7 +2300,11 @@ static int coroutine_fn
> v9fs_do_readdir_with_stat(V9fsPDU *pdu,
>  count += len;
>  v9fs_stat_free(&v9stat);
>  v9fs_path_free(&path);
> -saved_dir_pos = dent->d_off;
> +saved_dir_pos = v9fs_dent_telldir(pdu, fidp, dent);
> +if (saved_dir_pos < 0) {
> +err = saved_dir_pos;
> +break;
> +}
>  }
>
>  v9fs_readdir_unlock(&fidp->fs.dir);
> @@ -2420,6 +2443,7 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU
> *pdu, V9fsFidState *fidp,
>  V9fsString name;
>  int len, err = 0;
>  int32_t count = 0;
> +off_t off;
>  struct dirent *dent;
>  struct stat *st;
>  struct V9fsDirEnt *entries = NULL;
> @@ -2480,12 +2504,17 @@ static int coroutine_fn v9fs_do_readdir(V9fsPDU
> *pdu, V9fsFidState *fidp,
>  qid.version = 0;
>  }
>
> +off = v9fs_dent_telldir(pdu, fidp, dent);
> +if (off < 0) {
> +err = off;
> +break;
> +}
>  v9fs_string_init(&name);
>  v9fs_string_sprintf(&name, "%s", dent->d_name);
>
>  /* 11 = 7 + 4 (7 = start offset, 4 = space for storing count) */
>  len = pdu_marshal(pdu, 11 + count, "Qqbs",
> -  &qid, dent->d_off,
> +  &qid, off,
>dent->d_type, &name);
>
>  v9fs_string_free(&name);
> diff --git a/hw/9pfs/codir.c b/hw/9pfs/codir.c
> index 032cce04c4..c1b5694f3f 100644
> --- a/hw/9pfs/codir.c
> +++ b/hw/9pfs/codir.c
> @@ -167,7 +167,11 @@ static int do_readdir_many(V9fsPDU *pdu, V9fsFidState
> *fidp,
>  }
>
>  size += len;
> +#ifdef CONFIG_DARWIN
> +saved_dir_pos = s->ops->telldir(&s->ctx, &fidp->fs);
> +#else
>  saved_dir_pos = dent->d_off;
> +#endif
>  }
>
>  /* restore (last) saved position */
> --
> 2.34.1
>
>


Re: [PATCH] hw/i2c: flatten pca954x mux device

2022-02-02 Thread Philippe Mathieu-Daudé via

On 1/2/22 21:54, Patrick Venture wrote:



On Tue, Feb 1, 2022 at 11:02 AM Philippe Mathieu-Daudé wrote:


On 1/2/22 17:30, Patrick Venture wrote:
 > Previously this device created N subdevices which each owned an
i2c bus.
 > Now this device simply owns the N i2c busses directly.
 >
 > Tested: Verified devices behind mux are still accessible via qmp
and i2c
 > from within an arm32 SoC.
 >
 > Reviewed-by: Hao Wu <wuhao...@google.com>
 > Signed-off-by: Patrick Venture <vent...@google.com>
 > ---
 >   hw/i2c/i2c_mux_pca954x.c | 75
++--
 >   1 file changed, 11 insertions(+), 64 deletions(-)

 >   static void pca954x_init(Object *obj)
 >   {
 >       Pca954xState *s = PCA954X(obj);
 >       Pca954xClass *c = PCA954X_GET_CLASS(obj);
 >       int i;
 >
 > -    /* Only initialize the children we expect. */
 > +    /* SMBus modules. Cannot fail. */
 >       for (i = 0; i < c->nchans; i++) {
 > -        object_initialize_child(obj, "channel[*]", &s->channel[i],
 > -                                TYPE_PCA954X_CHANNEL);
 > +        /* start all channels as disabled. */
 > +        s->enabled[i] = false;
 > +        s->bus[i] = i2c_init_bus(DEVICE(s), "channel[*]");

This is not a QOM property, so you need to initialize manually:


that was my suspicion but this is the output I'm seeing:

{'execute': 'qom-list', 'arguments': { 'path': 
'/machine/soc/smbus[0]/i2c-bus/child[0]' }}


{"return": [
{"name": "type", "type": "string"},
{"name": "parent_bus", "type": "link"},
{"name": "realized", "type": "bool"},
{"name": "hotplugged", "type": "bool"},
{"name": "hotpluggable", "type": "bool"},
{"name": "address", "type": "uint8"},
{"name": "channel[3]", "type": "child"},
{"name": "channel[0]", "type": "child"},
{"name": "channel[1]", "type": "child"},
{"name": "channel[2]", "type": "child"}
]}

It seems to be naming them via the order they're created.

Is this not behaving how you expect?


On the monitor:

(qemu) info qtree
bus: main-system-bus
  type System
  ...
  dev: npcm7xx-smbus, id ""
gpio-out "sysbus-irq" 1
mmio f008d000/1000
bus: i2c-bus
  type i2c-bus
  dev: pca9548, id ""
address = 119 (0x77)
bus: channel[*]
  type i2c-bus
bus: channel[*]
  type i2c-bus
bus: channel[*]
  type i2c-bus
  dev: tmp105, id ""
gpio-out "" 1
address = 73 (0x49)
bus: channel[*]
  type i2c-bus
  dev: tmp105, id ""
gpio-out "" 1
address = 72 (0x48)
bus: channel[*]
  type i2c-bus
  dev: tmp105, id ""
gpio-out "" 1
address = 73 (0x49)
bus: channel[*]
  type i2c-bus
  dev: tmp105, id ""
gpio-out "" 1
address = 72 (0x48)
bus: channel[*]
  type i2c-bus
bus: channel[*]
  type i2c-bus


-- >8 --
diff --git a/hw/i2c/i2c_mux_pca954x.c b/hw/i2c/i2c_mux_pca954x.c
index f9ce633b3a..a9517b612a 100644
--- a/hw/i2c/i2c_mux_pca954x.c
+++ b/hw/i2c/i2c_mux_pca954x.c
@@ -189,9 +189,11 @@ static void pca954x_init(Object *obj)

       /* SMBus modules. Cannot fail. */
       for (i = 0; i < c->nchans; i++) {
+        g_autofree gchar *bus_name = g_strdup_printf("i2c.%d", i);
+
           /* start all channels as disabled. */
           s->enabled[i] = false;
-        s->bus[i] = i2c_init_bus(DEVICE(s), "channel[*]");
+        s->bus[i] = i2c_init_bus(DEVICE(s), bus_name);
       }
   }

---

(look at HMP 'info qtree' output).


With this snippet:

(qemu) info qtree
bus: main-system-bus
  type System
  ...
  dev: npcm7xx-smbus, id ""
gpio-out "sysbus-irq" 1
mmio f008d000/1000
bus: i2c-bus
  type i2c-bus
  dev: pca9548, id ""
address = 119 (0x77)
bus: i2c.7
  type i2c-bus
bus: i2c.6
  type i2c-bus
bus: i2c.5
  type i2c-bus
  dev: tmp105, id ""
gpio-out "" 1
address = 73 (0x49)
bus: i2c.4
  type i2c-bus
  dev: tmp105, id ""
gpio-out "" 1
address = 72 (0x48)
bus: i2c.3
  type i2c-bus
  dev: tmp105, id ""
gpio-out "" 1
address = 73 (0x49)
bus: i2c.2
  type i2c-bus
  dev: tmp105, id ""
gpio-out "" 1
address = 72 (0x48)
bus: i2c.1
  type i2c-bus
bus: i2c.0
  type i2c-bus

Regards,

Phil.
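The qtree output above suggests the "[*]" wildcard is expanded for QOM child properties but taken literally when used as a bus name, so each bus needs an explicitly generated unique name. A minimal sketch of the naming scheme from the fix, using plain snprintf instead of g_strdup_printf (`bus_name` is an invented helper for illustration):

```c
#include <stdio.h>

/* Generate a unique, deterministic bus name per channel, mirroring
 * g_strdup_printf("i2c.%d", i) from the proposed fix. Returns a pointer
 * to a static buffer (fine for a single-threaded sketch). */
static const char *bus_name(int channel)
{
    static char buf[16];
    snprintf(buf, sizeof(buf), "i2c.%d", channel);
    return buf;
}
```

Each call yields "i2c.0", "i2c.1", ..., so 'info qtree' and qdev-monitor path lookups see distinct bus names rather than four literal "channel[*]" entries.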



Re: [PATCH] hw/i2c: flatten pca954x mux device

2022-02-02 Thread Patrick Venture
On Wed, Feb 2, 2022 at 8:34 AM Patrick Venture  wrote:

>
>
> On Tue, Feb 1, 2022 at 12:54 PM Patrick Venture 
> wrote:
>
>>
>>
>> On Tue, Feb 1, 2022 at 11:02 AM Philippe Mathieu-Daudé 
>> wrote:
>>
>>> On 1/2/22 17:30, Patrick Venture wrote:
>>> > Previously this device created N subdevices which each owned an i2c
>>> bus.
>>> > Now this device simply owns the N i2c busses directly.
>>> >
>>> > Tested: Verified devices behind mux are still accessible via qmp and
>>> i2c
>>> > from within an arm32 SoC.
>>> >
>>> > Reviewed-by: Hao Wu 
>>> > Signed-off-by: Patrick Venture 
>>> > ---
>>> >   hw/i2c/i2c_mux_pca954x.c | 75
>>> ++--
>>> >   1 file changed, 11 insertions(+), 64 deletions(-)
>>>
>>> >   static void pca954x_init(Object *obj)
>>> >   {
>>> >   Pca954xState *s = PCA954X(obj);
>>> >   Pca954xClass *c = PCA954X_GET_CLASS(obj);
>>> >   int i;
>>> >
>>> > -/* Only initialize the children we expect. */
>>> > +/* SMBus modules. Cannot fail. */
>>> >   for (i = 0; i < c->nchans; i++) {
>>> > -object_initialize_child(obj, "channel[*]", &s->channel[i],
>>> > -TYPE_PCA954X_CHANNEL);
>>> > +/* start all channels as disabled. */
>>> > +s->enabled[i] = false;
>>> > +s->bus[i] = i2c_init_bus(DEVICE(s), "channel[*]");
>>>
>>> This is not a QOM property, so you need to initialize manually:
>>>
>>
>> that was my suspicion but this is the output I'm seeing:
>>
>> {'execute': 'qom-list', 'arguments': { 'path':
>> '/machine/soc/smbus[0]/i2c-bus/child[0]' }}
>>
>> {"return": [
>> {"name": "type", "type": "string"},
>> {"name": "parent_bus", "type": "link"},
>> {"name": "realized", "type": "bool"},
>> {"name": "hotplugged", "type": "bool"},
>> {"name": "hotpluggable", "type": "bool"},
>> {"name": "address", "type": "uint8"},
>> {"name": "channel[3]", "type": "child"},
>> {"name": "channel[0]", "type": "child"},
>> {"name": "channel[1]", "type": "child"},
>> {"name": "channel[2]", "type": "child"}
>> ]}
>>
>> It seems to be naming them via the order they're created.
>>
>> Is this not behaving how you expect?
>>
>
> Philippe,
>
> I0202 08:29:45.380384  6641 stream.go:31] qemu: child buses at "pca9546":
> "channel[*]", "channel[*]", "channel[*]", "channel[*]"
>
> Ok, so that's interesting.  In one system (using qom-list) it's correct,
> but then when using it to do path assignment (qdev-monitor), it fails...
>
> I'm not as fond of the name i2c-bus.%d, since they're referred to as
> channels in the datasheet.  If I do the manual name creation, can I keep
> the name channel or should I pivot over?
>
> Thanks
>
>
>>
>>>
>>> -- >8 --
>>> diff --git a/hw/i2c/i2c_mux_pca954x.c b/hw/i2c/i2c_mux_pca954x.c
>>> index f9ce633b3a..a9517b612a 100644
>>> --- a/hw/i2c/i2c_mux_pca954x.c
>>> +++ b/hw/i2c/i2c_mux_pca954x.c
>>> @@ -189,9 +189,11 @@ static void pca954x_init(Object *obj)
>>>
>>>   /* SMBus modules. Cannot fail. */
>>>   for (i = 0; i < c->nchans; i++) {
>>> +g_autofree gchar *bus_name = g_strdup_printf("i2c.%d", i);
>>> +
>>>   /* start all channels as disabled. */
>>>   s->enabled[i] = false;
>>> -s->bus[i] = i2c_init_bus(DEVICE(s), "channel[*]");
>>> +s->bus[i] = i2c_init_bus(DEVICE(s), bus_name);
>>>   }
>>>   }
>>>
>>> ---
>>>
>>> (look at HMP 'info qtree' output).
>>>
>>> >   }
>>> >   }
>>>
>>> With the change:
>>> Reviewed-by: Philippe Mathieu-Daudé 
>>> Tested-by: Philippe Mathieu-Daudé 
>>>
>>
Just saw your reply, and found a bunch of other non-spam in my spam
folder.  I sent the message to the anti-spam team, hopefully that'll
resolve this for myself and presumably others.

I definitely see the same result with the qdev-monitor, but was really
surprised that the qom-list worked.  I'll explicitly set the name, and
i2c.%d is fine.  The detail that they're channels is not really important
to the end user presumably.

I'll have v2 out shortly.

thanks,
Patrick


[PATCH v2] hw/i2c: flatten pca954x mux device

2022-02-02 Thread Patrick Venture
Previously this device created N subdevices which each owned an i2c bus.
Now this device simply owns the N i2c busses directly.

Tested: Verified devices behind mux are still accessible via qmp and i2c
from within an arm32 SoC.

Reviewed-by: Hao Wu 
Signed-off-by: Patrick Venture 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
---
v2: explicitly create an incrementing name for the i2c busses (channels).
---
 hw/i2c/i2c_mux_pca954x.c | 77 +++-
 1 file changed, 13 insertions(+), 64 deletions(-)

diff --git a/hw/i2c/i2c_mux_pca954x.c b/hw/i2c/i2c_mux_pca954x.c
index 847c59921c..a9517b612a 100644
--- a/hw/i2c/i2c_mux_pca954x.c
+++ b/hw/i2c/i2c_mux_pca954x.c
@@ -30,24 +30,6 @@
 #define PCA9548_CHANNEL_COUNT 8
 #define PCA9546_CHANNEL_COUNT 4
 
-/*
- * struct Pca954xChannel - The i2c mux device will have N of these states
- * that own the i2c channel bus.
- * @bus: The owned channel bus.
- * @enabled: Is this channel active?
- */
-typedef struct Pca954xChannel {
-SysBusDevice parent;
-
-I2CBus   *bus;
-
-bool enabled;
-} Pca954xChannel;
-
-#define TYPE_PCA954X_CHANNEL "pca954x-channel"
-#define PCA954X_CHANNEL(obj) \
-OBJECT_CHECK(Pca954xChannel, (obj), TYPE_PCA954X_CHANNEL)
-
 /*
  * struct Pca954xState - The pca954x state object.
  * @control: The value written to the mux control.
@@ -59,8 +41,8 @@ typedef struct Pca954xState {
 
 uint8_t control;
 
-/* The channel i2c buses. */
-Pca954xChannel channel[PCA9548_CHANNEL_COUNT];
+bool enabled[PCA9548_CHANNEL_COUNT];
+I2CBus *bus[PCA9548_CHANNEL_COUNT];
 } Pca954xState;
 
 /*
@@ -98,11 +80,11 @@ static bool pca954x_match(I2CSlave *candidate, uint8_t 
address,
 }
 
 for (i = 0; i < mc->nchans; i++) {
-if (!mux->channel[i].enabled) {
+if (!mux->enabled[i]) {
 continue;
 }
 
-if (i2c_scan_bus(mux->channel[i].bus, address, broadcast,
+if (i2c_scan_bus(mux->bus[i], address, broadcast,
  current_devs)) {
 if (!broadcast) {
 return true;
@@ -125,9 +107,9 @@ static void pca954x_enable_channel(Pca954xState *s, uint8_t 
enable_mask)
  */
 for (i = 0; i < mc->nchans; i++) {
 if (enable_mask & (1 << i)) {
-s->channel[i].enabled = true;
+s->enabled[i] = true;
 } else {
-s->channel[i].enabled = false;
+s->enabled[i] = false;
 }
 }
 }
@@ -184,23 +166,7 @@ I2CBus *pca954x_i2c_get_bus(I2CSlave *mux, uint8_t channel)
 Pca954xState *pca954x = PCA954X(mux);
 
 g_assert(channel < pc->nchans);
-return I2C_BUS(qdev_get_child_bus(DEVICE(&pca954x->channel[channel]),
-  "i2c-bus"));
-}
-
-static void pca954x_channel_init(Object *obj)
-{
-Pca954xChannel *s = PCA954X_CHANNEL(obj);
-s->bus = i2c_init_bus(DEVICE(s), "i2c-bus");
-
-/* Start all channels as disabled. */
-s->enabled = false;
-}
-
-static void pca954x_channel_class_init(ObjectClass *klass, void *data)
-{
-DeviceClass *dc = DEVICE_CLASS(klass);
-dc->desc = "Pca954x Channel";
+return pca954x->bus[channel];
 }
 
 static void pca9546_class_init(ObjectClass *klass, void *data)
@@ -215,28 +181,19 @@ static void pca9548_class_init(ObjectClass *klass, void 
*data)
 s->nchans = PCA9548_CHANNEL_COUNT;
 }
 
-static void pca954x_realize(DeviceState *dev, Error **errp)
-{
-Pca954xState *s = PCA954X(dev);
-Pca954xClass *c = PCA954X_GET_CLASS(s);
-int i;
-
-/* SMBus modules. Cannot fail. */
-for (i = 0; i < c->nchans; i++) {
-sysbus_realize(SYS_BUS_DEVICE(&s->channel[i]), &error_abort);
-}
-}
-
 static void pca954x_init(Object *obj)
 {
 Pca954xState *s = PCA954X(obj);
 Pca954xClass *c = PCA954X_GET_CLASS(obj);
 int i;
 
-/* Only initialize the children we expect. */
+/* SMBus modules. Cannot fail. */
 for (i = 0; i < c->nchans; i++) {
-object_initialize_child(obj, "channel[*]", &s->channel[i],
-TYPE_PCA954X_CHANNEL);
+g_autofree gchar *bus_name = g_strdup_printf("i2c.%d", i);
+
+/* start all channels as disabled. */
+s->enabled[i] = false;
+s->bus[i] = i2c_init_bus(DEVICE(s), bus_name);
 }
 }
 
@@ -252,7 +209,6 @@ static void pca954x_class_init(ObjectClass *klass, void 
*data)
 rc->phases.enter = pca954x_enter_reset;
 
 dc->desc = "Pca954x i2c-mux";
-dc->realize = pca954x_realize;
 
 k->write_data = pca954x_write_data;
 k->receive_byte = pca954x_read_byte;
@@ -278,13 +234,6 @@ static const TypeInfo pca954x_info[] = {
 .parent= TYPE_PCA954X,
 .class_init= pca9548_class_init,
 },
-{
-.name = TYPE_PCA954X_CHANNEL,
-.parent = TYPE_SYS_BUS_DEVICE,
-.class_init = pca954x_channel_class_init,
-.instance_size = sizeof(Pca954xChannel),
-.instance_init = 

Re: [PATCH v2] 9pfs: Fix segfault in do_readdir_many caused by struct dirent overread

2022-02-02 Thread Christian Schoenebeck
On Freitag, 28. Januar 2022 23:33:26 CET Vitaly Chikunov wrote:
> `struct dirent' returned from readdir(3) could be shorter than
> `sizeof(struct dirent)', thus memcpy of sizeof length will overread
> into unallocated page causing SIGSEGV. Example stack trace:
> 
>  #0  0x559ebeed v9fs_co_readdir_many (/usr/bin/qemu-system-x86_64 +
> 0x497eed) #1  0x559ec2e9 v9fs_readdir (/usr/bin/qemu-system-x86_64
> + 0x4982e9) #2  0x55eb7983 coroutine_trampoline
> (/usr/bin/qemu-system-x86_64 + 0x963983) #3  0x773e0be0 n/a (n/a +
> 0x0)
> 
> While fixing, provide a helper for any future `struct dirent' cloning.
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/841
> Cc: qemu-sta...@nongnu.org
> Co-authored-by: Christian Schoenebeck 
> Signed-off-by: Vitaly Chikunov 
> ---
> Tested on x86-64 Linux.

I was too optimistic. Looks like this needs more work. With this patch applied
the 9p test cases [1] are crashing now:

$ gdb --args tests/qtest/qos-test -m slow
...
# Start of flush tests
ok 50 
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/success
ok 51 
/x86_64/pc/i440FX-pcihost/pci-bus-pc/pci-bus/virtio-9p-pci/virtio-9p/virtio-9p-tests/synth/flush/ignored
# End of flush tests
# Start of readdir tests
Broken pipe

Thread 1 "qos-test" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x77b7d537 in __GI_abort () at abort.c:79
#2  0x555ba495 in qtest_client_socket_recv_line (s=0x557663c0) at 
../tests/qtest/libqtest.c:503
#3  0x555ba5b3 in qtest_rsp_args (s=0x557663c0, expected_args=2) at 
../tests/qtest/libqtest.c:523
#4  0x555bbdb4 in qtest_clock_rsp (s=0x557663c0) at 
../tests/qtest/libqtest.c:970
#5  0x555bbe55 in qtest_clock_step (s=0x557663c0, step=100) at 
../tests/qtest/libqtest.c:985
#6  0x555cdc21 in qvirtio_wait_used_elem (qts=0x557663c0, 
d=0x55779b48, vq=0x557b0480, desc_idx=8, len=0x0, timeout_us=1000)
at ../tests/qtest/libqos/virtio.c:220
#7  0x555ae79f in v9fs_req_wait_for_reply (req=0x557899a0, len=0x0) 
at ../tests/qtest/virtio-9p-test.c:278
#8  0x555b03bf in fs_readdir (obj=0x55779bb0, data=0x0, 
t_alloc=0x557448b8) at ../tests/qtest/virtio-9p-test.c:851
#9  0x555990c4 in run_one_test (arg=0x557ac600) at 
../tests/qtest/qos-test.c:182
#10 0x77f02b9e in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x77f0299b in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#12 0x77f0299b in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#13 0x77f0299b in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#14 0x77f0299b in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#15 0x77f0299b in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#16 0x77f0299b in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#17 0x77f0299b in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#18 0x77f0299b in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#19 0x77f0299b in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#20 0x77f0299b in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#21 0x77f0308a in g_test_run_suite () from 
/lib/x86_64-linux-gnu/libglib-2.0.so.0
#22 0x77f030a1 in g_test_run () from 
/lib/x86_64-linux-gnu/libglib-2.0.so.0
#23 0x555995a3 in main (argc=1, argv=0x7fffe508, 
envp=0x7fffe528) at ../tests/qtest/qos-test.c:338
(gdb)

[1] https://wiki.qemu.org/Documentation/9p#Test_Cases

Best regards,
Christian Schoenebeck
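For context on the fix being discussed: the overread can be avoided by never copying more than the bytes the kernel actually filled in. The following is a hedged sketch of that idea, not the patch under review; it assumes d_name is the final member of struct dirent (true on glibc/Linux), and `dirent_dup`/`demo` are invented names.

```c
#include <dirent.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Clone a dirent without reading past the (possibly short) record returned
 * by readdir(3). Only the bytes up to and including d_name's terminator are
 * known to be valid; the allocation is padded to sizeof(struct dirent) so
 * consumers may still access any fixed field of the copy. */
static struct dirent *dirent_dup(const struct dirent *dent)
{
    size_t valid = offsetof(struct dirent, d_name) + strlen(dent->d_name) + 1;
    size_t alloc = valid < sizeof(struct dirent) ? sizeof(struct dirent) : valid;
    struct dirent *copy = calloc(1, alloc);

    if (copy) {
        memcpy(copy, dent, valid);   /* copy only bytes known to be present */
    }
    return copy;
}

/* demo: clone a stack dirent and check the fields survived; returns 1 on
 * success, 0 on failure. */
static int demo(void)
{
    struct dirent src = { 0 };
    src.d_ino = 42;
    strcpy(src.d_name, "hello");

    struct dirent *copy = dirent_dup(&src);
    int ok = copy && copy->d_ino == 42 && strcmp(copy->d_name, "hello") == 0;
    free(copy);
    return ok;
}
```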






Re: [PATCH] hw/i2c: flatten pca954x mux device

2022-02-02 Thread Patrick Venture
On Tue, Feb 1, 2022 at 12:54 PM Patrick Venture  wrote:

>
>
> On Tue, Feb 1, 2022 at 11:02 AM Philippe Mathieu-Daudé 
> wrote:
>
>> On 1/2/22 17:30, Patrick Venture wrote:
>> > Previously this device created N subdevices which each owned an i2c bus.
>> > Now this device simply owns the N i2c busses directly.
>> >
>> > Tested: Verified devices behind mux are still accessible via qmp and i2c
>> > from within an arm32 SoC.
>> >
>> > Reviewed-by: Hao Wu 
>> > Signed-off-by: Patrick Venture 
>> > ---
>> >   hw/i2c/i2c_mux_pca954x.c | 75 ++--
>> >   1 file changed, 11 insertions(+), 64 deletions(-)
>>
>> >   static void pca954x_init(Object *obj)
>> >   {
>> >   Pca954xState *s = PCA954X(obj);
>> >   Pca954xClass *c = PCA954X_GET_CLASS(obj);
>> >   int i;
>> >
>> > -/* Only initialize the children we expect. */
>> > +/* SMBus modules. Cannot fail. */
>> >   for (i = 0; i < c->nchans; i++) {
>> > -object_initialize_child(obj, "channel[*]", &s->channel[i],
>> > -TYPE_PCA954X_CHANNEL);
>> > +/* start all channels as disabled. */
>> > +s->enabled[i] = false;
>> > +s->bus[i] = i2c_init_bus(DEVICE(s), "channel[*]");
>>
>> This is not a QOM property, so you need to initialize manually:
>>
>
> that was my suspicion but this is the output I'm seeing:
>
> {'execute': 'qom-list', 'arguments': { 'path':
> '/machine/soc/smbus[0]/i2c-bus/child[0]' }}
>
> {"return": [
> {"name": "type", "type": "string"},
> {"name": "parent_bus", "type": "link"},
> {"name": "realized", "type": "bool"},
> {"name": "hotplugged", "type": "bool"},
> {"name": "hotpluggable", "type": "bool"},
> {"name": "address", "type": "uint8"},
> {"name": "channel[3]", "type": "child"},
> {"name": "channel[0]", "type": "child"},
> {"name": "channel[1]", "type": "child"},
> {"name": "channel[2]", "type": "child"}
> ]}
>
> It seems to be naming them via the order they're created.
>
> Is this not behaving how you expect?
>

Philippe,

I0202 08:29:45.380384  6641 stream.go:31] qemu: child buses at "pca9546":
"channel[*]", "channel[*]", "channel[*]", "channel[*]"

Ok, so that's interesting.  In one system (using qom-list) it's correct,
but then when using it to do path assignment (qdev-monitor), it fails...

I'm not as fond of the name i2c-bus.%d, since they're referred to as
channels in the datasheet.  If I do the manual name creation, can I keep
the name channel or should I pivot over?

Thanks


>
>>
>> -- >8 --
>> diff --git a/hw/i2c/i2c_mux_pca954x.c b/hw/i2c/i2c_mux_pca954x.c
>> index f9ce633b3a..a9517b612a 100644
>> --- a/hw/i2c/i2c_mux_pca954x.c
>> +++ b/hw/i2c/i2c_mux_pca954x.c
>> @@ -189,9 +189,11 @@ static void pca954x_init(Object *obj)
>>
>>   /* SMBus modules. Cannot fail. */
>>   for (i = 0; i < c->nchans; i++) {
>> +g_autofree gchar *bus_name = g_strdup_printf("i2c.%d", i);
>> +
>>   /* start all channels as disabled. */
>>   s->enabled[i] = false;
>> -s->bus[i] = i2c_init_bus(DEVICE(s), "channel[*]");
>> +s->bus[i] = i2c_init_bus(DEVICE(s), bus_name);
>>   }
>>   }
>>
>> ---
>>
>> (look at HMP 'info qtree' output).
>>
>> >   }
>> >   }
>>
>> With the change:
>> Reviewed-by: Philippe Mathieu-Daudé 
>> Tested-by: Philippe Mathieu-Daudé 
>>
>


Re: [PATCH v3 1/1] virtio: fix the condition for iommu_platform not supported

2022-02-02 Thread Halil Pasic
On Wed, 2 Feb 2022 10:24:51 -0300
Daniel Henrique Barboza  wrote:

> On 2/1/22 22:15, Halil Pasic wrote:
> > On Tue, 1 Feb 2022 16:31:22 -0300
> > Daniel Henrique Barboza  wrote:
> >   
> >> On 2/1/22 15:33, Halil Pasic wrote:  
> >>> On Tue, 1 Feb 2022 12:36:25 -0300
> >>> Daniel Henrique Barboza  wrote:
> >>>  
> > +vdev_has_iommu = virtio_host_has_feature(vdev, 
> > VIRTIO_F_IOMMU_PLATFORM);
> > if (klass->get_dma_as != NULL && has_iommu) {
> > virtio_add_feature(&vdev->host_features, 
> > VIRTIO_F_IOMMU_PLATFORM);
> > vdev->dma_as = klass->get_dma_as(qbus->parent);
> > +if (!vdev_has_iommu && vdev->dma_as != &address_space_memory) {
> > +error_setg(errp,
> > +   "iommu_platform=true is not supported by the 
> > device");
> > +}  
> 
>  
> > } else {
> > vdev->dma_as = &address_space_memory;
> > }  
> 
> 
>  I struggled to understand what this 'else' clause was doing and I 
>  assumed that it was
>  wrong. Searching through the ML I learned that this 'else' clause is 
>  intended to handle
>  legacy virtio devices that doesn't support the DMA API (introduced in 
>  8607f5c3072caeebb)
>  and thus shouldn't set  VIRTIO_F_IOMMU_PLATFORM.
> 
> 
>  My suggestion, if a v4 is required for any other reason, is to add a 
>  small comment in this
>  'else' clause explaining that this is the legacy virtio devices 
>  condition and those devices
>  don't set F_IOMMU_PLATFORM. This would make the code easier to read for 
>  a virtio casual like
>  myself.  
> >>>
> >>> I do not agree that this is about legacy virtio. In my understanding
> >>> virtio-ccw simply does not need translation because CCW devices use
> >>> guest physical addresses as per architecture. It may be considered
> >>> legacy stuff form PCI perspective, but I don't think it is legacy
> >>> in general.  
> >>
> >>
> >> I wasn't talking about virtio-ccw. I was talking about this piece of code:
> >>
> >>
> >>   if (klass->get_dma_as != NULL && has_iommu) {
> >>   virtio_add_feature(&vdev->host_features, 
> >> VIRTIO_F_IOMMU_PLATFORM);
> >>   vdev->dma_as = klass->get_dma_as(qbus->parent);
> >>   } else {
> >>   vdev->dma_as = &address_space_memory;
> >>   }
> >>
> >>
> >> I suggested something like this:
> >>
> >>
> >>
> >>   if (klass->get_dma_as != NULL && has_iommu) {
> >>   virtio_add_feature(&vdev->host_features, 
> >> VIRTIO_F_IOMMU_PLATFORM);
> >>   vdev->dma_as = klass->get_dma_as(qbus->parent);
> >>   } else {
> >>   /*
> >>* We don't force VIRTIO_F_IOMMU_PLATFORM for legacy devices, 
> >> i.e.
> >>* devices that don't implement klass->get_dma_as, regardless of
> >>* 'has_iommu' setting.
> >>*/
> >>   vdev->dma_as = &address_space_memory;
> >>   }
> >>
> >>
> >> At least from my reading of commits 8607f5c3072 and 2943b53f682 this seems 
> >> to be
> >> the case. I spent some time thinking that this IF/ELSE was wrong because I 
> >> wasn't
> >> aware of this history.  
> > 
> > With virtio-ccw we take the else branch because we don't implement  
> > ->get_dma_as(). I don't consider all the virtio-ccw to be legacy.  
> > 
> > IMHO there are two ways to think about this:
> > a) The commit that introduced this needs a fix which implemets
> > get_dma_as() for virtio-ccw in a way that it simply returns
> > address_space_memory.
> > b) The presence of ->get_dma_as() is not indicative of "legacy".
> > 
> > BTW in virtospeak "legacy" has a special meaning: pre-1.0 virtio. Do you
> > mean that legacy. And if I read the virtio-pci code correctly  
> > ->get_dma_as is set for legacy, transitional and modern devices alike.  
> 
> 
> Oh ok. I'm not well versed into virtiospeak. My "legacy" comment was a poor 
> choice of
> word for the situation.
> 
> We can ignore the "legacy" bit. My idea/suggestion is to put a comment at 
> that point
> explaining the logic behind into not forcing VIRTIO_F_IOMMU_PLATFORM in 
> devices that
> doesn't implement ->get_dma_as().
> 
> I am assuming that this is an intended design that was introduced by 
> 2943b53f682
> ("virtio: force VIRTIO_F_IOMMU_PLATFORM"), meaning that the implementation of 
> the
> ->get_dma_as is being used as a parameter to force the feature in the device. 
> And with  
> this code:
> 
> 
>  if (klass->get_dma_as != NULL && has_iommu) {
>  virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
>  vdev->dma_as = klass->get_dma_as(qbus->parent);
>  } else {
>  vdev->dma_as = &address_space_memory;
>  }
> 
> It is possible that we have 2 vdev devices where ->dma_as = 
> &address_space_memory, but one
> of them is sitting in a bus where "klass->get_dma_as(qbus->parent) = 
> &addre

Re: [PATCH v5 03/18] pci: isolated address space for PCI bus

2022-02-02 Thread Michael S. Tsirkin
On Wed, Feb 02, 2022 at 08:49:33AM -0700, Alex Williamson wrote:
> > Alex, what did you refer to?
> 
> My evidence is largely by omission, but that might be that in practice
> it's not used rather than explicitly forbidden.  I note that the bus
> master enable bit specifies:
> 
>   Bus Master Enable - Controls the ability of a Function to issue
>   Memory and I/O Read/Write Requests, and the ability of
>   a Port to forward Memory and I/O Read/Write Requests in
>   the Upstream direction.
> 
> That would suggest it's possible, but for PCI device assignment, I'm
> not aware of any means through which we could support this.  There is
> no support in the IOMMU core for mapping I/O port space, nor could we
> trap such device initiated transactions to emulate them.  I can't spot
> any mention of I/O port space in the VT-d spec, however the AMD-Vi spec
> does include a field in the device table:
> 
>   controlIoCtl: port I/O control. Specifies whether
>   device-initiated port I/O space transactions are blocked,
>   forwarded, or translated.
> 
>   00b=Device-initiated port I/O is not allowed. The IOMMU target
>   aborts the transaction if a port I/O space transaction is
>   received. Translation requests are target aborted.
>   
>   01b=Device-initiated port I/O space transactions are allowed.
>   The IOMMU must pass port I/O accesses untranslated. Translation
>   requests are target aborted.
>   
>   10b=Transactions in the port I/O space address range are
>   translated by the IOMMU page tables as memory transactions.
> 
>   11b=Reserved.
> 
> I don't see this field among the macros used by the Linux driver in
> configuring these device entries, so I assume it's left to the default
> value, ie. zero, blocking device initiated I/O port transactions.
> 
> So yes, I suppose device initiated I/O port transactions are possible,
> but we have no support or reason to support them, so I'm going to go
> ahead and continue believing any I/O port address space from the device
> perspective is largely irrelevant ;)  Thanks,
> 
> Alex

Right, it would seem devices can initiate I/O space transactions but IOMMUs
don't support virtualizing them and so neither does VFIO.


-- 
MST
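For illustration only, the 2-bit IoCtl encoding quoted from the AMD-Vi device table above can be decoded as follows. The field's bit position within the device table entry is not modeled here, and all names are invented for the sketch:

```c
/* Decode the 2-bit IoCtl field from an AMD-Vi device table entry, per the
 * encoding quoted in the discussion above. */
typedef enum {
    IOCTL_PORT_IO_BLOCKED    = 0, /* 00b: device-initiated port I/O aborted  */
    IOCTL_PORT_IO_FORWARDED  = 1, /* 01b: passed through untranslated        */
    IOCTL_PORT_IO_TRANSLATED = 2, /* 10b: translated via IOMMU page tables   */
    IOCTL_RESERVED           = 3  /* 11b: reserved                           */
} IoCtlMode;

static IoCtlMode decode_ioctl(unsigned dte_bits)
{
    return (IoCtlMode)(dte_bits & 0x3);
}
```

As noted above, Linux leaves the field at its default of zero, i.e. IOCTL_PORT_IO_BLOCKED, so device-initiated port I/O is target-aborted.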




Re: [PATCH v6 1/3] nvdimm: Add realize, unrealize callbacks to NVDIMMDevice class

2022-02-02 Thread Daniel Henrique Barboza




On 2/1/22 18:57, Shivaprasad G Bhat wrote:

A new subclass inheriting NVDIMMDevice is going to be introduced in
subsequent patches. The new subclass uses the realize and unrealize
callbacks. Add them on NVDIMMClass to appropriately call them as part
of plug-unplug.

Signed-off-by: Shivaprasad G Bhat 
---


Acked-by: Daniel Henrique Barboza 


  hw/mem/nvdimm.c  |   16 
  hw/mem/pc-dimm.c |5 +
  include/hw/mem/nvdimm.h  |2 ++
  include/hw/mem/pc-dimm.h |1 +
  4 files changed, 24 insertions(+)

diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 7397b67156..59959d5563 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -181,10 +181,25 @@ static MemoryRegion 
*nvdimm_md_get_memory_region(MemoryDeviceState *md,
  static void nvdimm_realize(PCDIMMDevice *dimm, Error **errp)
  {
  NVDIMMDevice *nvdimm = NVDIMM(dimm);
+NVDIMMClass *ndc = NVDIMM_GET_CLASS(nvdimm);
  
  if (!nvdimm->nvdimm_mr) {

  nvdimm_prepare_memory_region(nvdimm, errp);
  }
+
+if (ndc->realize) {
+ndc->realize(nvdimm, errp);
+}
+}
+
+static void nvdimm_unrealize(PCDIMMDevice *dimm)
+{
+NVDIMMDevice *nvdimm = NVDIMM(dimm);
+NVDIMMClass *ndc = NVDIMM_GET_CLASS(nvdimm);
+
+if (ndc->unrealize) {
+ndc->unrealize(nvdimm);
+}
  }
  
  /*

@@ -240,6 +255,7 @@ static void nvdimm_class_init(ObjectClass *oc, void *data)
  DeviceClass *dc = DEVICE_CLASS(oc);
  
  ddc->realize = nvdimm_realize;

+ddc->unrealize = nvdimm_unrealize;
  mdc->get_memory_region = nvdimm_md_get_memory_region;
  device_class_set_props(dc, nvdimm_properties);
  
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c

index 48b913aba6..03bd0dd60e 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -216,6 +216,11 @@ static void pc_dimm_realize(DeviceState *dev, Error **errp)
  static void pc_dimm_unrealize(DeviceState *dev)
  {
  PCDIMMDevice *dimm = PC_DIMM(dev);
+PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
+
+if (ddc->unrealize) {
+ddc->unrealize(dimm);
+}
  
  host_memory_backend_set_mapped(dimm->hostmem, false);

  }
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
index bcf62f825c..cf8f59be44 100644
--- a/include/hw/mem/nvdimm.h
+++ b/include/hw/mem/nvdimm.h
@@ -103,6 +103,8 @@ struct NVDIMMClass {
  /* write @size bytes from @buf to NVDIMM label data at @offset. */
  void (*write_label_data)(NVDIMMDevice *nvdimm, const void *buf,
   uint64_t size, uint64_t offset);
+void (*realize)(NVDIMMDevice *nvdimm, Error **errp);
+void (*unrealize)(NVDIMMDevice *nvdimm);
  };
  
  #define NVDIMM_DSM_MEM_FILE "etc/acpi/nvdimm-mem"

diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h
index 1473e6db62..322bebe555 100644
--- a/include/hw/mem/pc-dimm.h
+++ b/include/hw/mem/pc-dimm.h
@@ -63,6 +63,7 @@ struct PCDIMMDeviceClass {
  
  /* public */

  void (*realize)(PCDIMMDevice *dimm, Error **errp);
+void (*unrealize)(PCDIMMDevice *dimm);
  };
  
  void pc_dimm_pre_plug(PCDIMMDevice *dimm, MachineState *machine,







Re: [PATCH v3 1/1] virtio: fix the condition for iommu_platform not supported

2022-02-02 Thread Michael S. Tsirkin
On Wed, Feb 02, 2022 at 05:23:53PM +0100, Halil Pasic wrote:
> On Wed, 2 Feb 2022 10:24:51 -0300
> Daniel Henrique Barboza  wrote:
> 
> > On 2/1/22 22:15, Halil Pasic wrote:
> > > On Tue, 1 Feb 2022 16:31:22 -0300
> > > Daniel Henrique Barboza  wrote:
> > >   
> > >> On 2/1/22 15:33, Halil Pasic wrote:  
> > >>> On Tue, 1 Feb 2022 12:36:25 -0300
> > >>> Daniel Henrique Barboza  wrote:
> > >>>  
> > > +vdev_has_iommu = virtio_host_has_feature(vdev, 
> > > VIRTIO_F_IOMMU_PLATFORM);
> > > if (klass->get_dma_as != NULL && has_iommu) {
> > > virtio_add_feature(&vdev->host_features, 
> > > VIRTIO_F_IOMMU_PLATFORM);
> > > vdev->dma_as = klass->get_dma_as(qbus->parent);
> > > +if (!vdev_has_iommu && vdev->dma_as != 
> > > &address_space_memory) {
> > > +error_setg(errp,
> > > +   "iommu_platform=true is not supported by the 
> > > device");
> > > +}  
> > 
> >  
> > > } else {
> > > vdev->dma_as = &address_space_memory;
> > > }  
> > 
> > 
> >  I struggled to understand what this 'else' clause was doing and I 
> >  assumed that it was
> >  wrong. Searching through the ML I learned that this 'else' clause is 
> >  intended to handle
> >  legacy virtio devices that doesn't support the DMA API (introduced in 
> >  8607f5c3072caeebb)
> >  and thus shouldn't set  VIRTIO_F_IOMMU_PLATFORM.
> > 
> > 
> >  My suggestion, if a v4 is required for any other reason, is to add a 
> >  small comment in this
> >  'else' clause explaining that this is the legacy virtio devices 
> >  condition and those devices
> >  don't set F_IOMMU_PLATFORM. This would make the code easier to read 
> >  for a virtio casual like
> >  myself.  
> > >>>
> > >>> I do not agree that this is about legacy virtio. In my understanding
> > >>> virtio-ccw simply does not need translation because CCW devices use
> > >>> guest physical addresses as per architecture. It may be considered
> > >>> legacy stuff form PCI perspective, but I don't think it is legacy
> > >>> in general.  
> > >>
> > >>
> > >> I wasn't talking about virtio-ccw. I was talking about this piece of 
> > >> code:
> > >>
> > >>
> > >>   if (klass->get_dma_as != NULL && has_iommu) {
> > >>   virtio_add_feature(&vdev->host_features, 
> > >> VIRTIO_F_IOMMU_PLATFORM);
> > >>   vdev->dma_as = klass->get_dma_as(qbus->parent);
> > >>   } else {
> > >>   vdev->dma_as = &address_space_memory;
> > >>   }
> > >>
> > >>
> > >> I suggested something like this:
> > >>
> > >>
> > >>
> > >>   if (klass->get_dma_as != NULL && has_iommu) {
> > >>   virtio_add_feature(&vdev->host_features, 
> > >> VIRTIO_F_IOMMU_PLATFORM);
> > >>   vdev->dma_as = klass->get_dma_as(qbus->parent);
> > >>   } else {
> > >>   /*
> > >>* We don't force VIRTIO_F_IOMMU_PLATFORM for legacy devices, 
> > >> i.e.
> > >>* devices that don't implement klass->get_dma_as, regardless 
> > >> of
> > >>* 'has_iommu' setting.
> > >>*/
> > >>   vdev->dma_as = &address_space_memory;
> > >>   }
> > >>
> > >>
> > >> At least from my reading of commits 8607f5c3072 and 2943b53f682 this 
> > >> seems to be
> > >> the case. I spent some time thinking that this IF/ELSE was wrong because 
> > >> I wasn't
> > >> aware of this history.  
> > > 
> > > With virtio-ccw we take the else branch because we don't implement  
> > > ->get_dma_as(). I don't consider all the virtio-ccw to be legacy.  
> > > 
> > > IMHO there are two ways to think about this:
> > > a) The commit that introduced this needs a fix which implemets
> > > get_dma_as() for virtio-ccw in a way that it simply returns
> > > address_space_memory.
> > > b) The presence of ->get_dma_as() is not indicative of "legacy".
> > > 
> > > BTW in virtospeak "legacy" has a special meaning: pre-1.0 virtio. Do you
> > > mean that legacy. And if I read the virtio-pci code correctly  
> > > ->get_dma_as is set for legacy, transitional and modern devices alike.  
> > 
> > 
> > Oh ok. I'm not well versed into virtiospeak. My "legacy" comment was a poor 
> > choice of
> > word for the situation.
> > 
> > We can ignore the "legacy" bit. My idea/suggestion is to put a comment at 
> > that point
> > explaining the logic behind into not forcing VIRTIO_F_IOMMU_PLATFORM in 
> > devices that
> > doesn't implement ->get_dma_as().
> > 
> > I am assuming that this is an intended design that was introduced by 
> > 2943b53f682
> > ("virtio: force VIRTIO_F_IOMMU_PLATFORM"), meaning that the implementation 
> > of the
> > ->get_dma_as is being used as a parameter to force the feature in the 
> > device. And with  
> > this code:
> > 
> > 
> >  if (klass->get_dma_as != NULL && has_iommu) {
> >  virtio_add_feature(&vdev->host_features, 

Re: [PATCH v5 03/18] pci: isolated address space for PCI bus

2022-02-02 Thread Alex Williamson
On Wed, 2 Feb 2022 09:30:42 +
Peter Maydell  wrote:

> On Tue, 1 Feb 2022 at 23:51, Alex Williamson  
> wrote:
> >
> > On Tue, 1 Feb 2022 21:24:08 +
> > Jag Raman  wrote:  
> > > The PCIBus data structure already has address_space_mem and
> > > address_space_io to contain the BAR regions of devices attached
> > > to it. I understand that these two PCIBus members form the
> > > PCI address space.  
> >
> > These are the CPU address spaces.  When there's no IOMMU, the PCI bus is
> > identity mapped to the CPU address space.  When there is an IOMMU, the
> > device address space is determined by the granularity of the IOMMU and
> > may be entirely separate from address_space_mem.  
> 
> Note that those fields in PCIBus are just whatever MemoryRegions
> the pci controller model passed in to the call to pci_root_bus_init()
> or equivalent. They may or may not be specifically the CPU's view
> of anything. (For instance on the versatilepb board, the PCI controller
> is visible to the CPU via several MMIO "windows" at known addresses,
> which let the CPU access into the PCI address space at a programmable
> offset. We model that by creating a couple of container MRs which
> we pass to pci_root_bus_init() to be the PCI memory and IO spaces,
> and then using alias MRs to provide the view into those at the
> guest-programmed offset. The CPU sees those windows, and doesn't
> have direct access to the whole PCIBus::address_space_mem.)
> I guess you could say they're the PCI controller's view of the PCI
> address space ?

Sure, that's fair.

> We have a tendency to be a bit sloppy with use of AddressSpaces
> within QEMU where it happens that the view of the world that a
> DMA-capable device matches that of the CPU, but conceptually
> they can definitely be different, especially in the non-x86 world.
> (Linux also confuses matters here by preferring to program a 1:1
> mapping even if the hardware is more flexible and can do other things.
> The model of the h/w in QEMU should support the other cases too, not
> just 1:1.)

Right, this is why I prefer to look at the device address space as
simply an IOVA.  The IOVA might be a direct physical address or
coincidental identity mapped physical address via an IOMMU, but none of
that should be the concern of the device.
 
> > I/O port space is always the identity mapped CPU address space unless
> > sparse translations are used to create multiple I/O port spaces (not
> > implemented).  I/O port space is only accessed by the CPU, there are no
> > device initiated I/O port transactions, so the address space relative
> > to the device is irrelevant.  
> 
> Does the PCI spec actually forbid any master except the CPU from
> issuing I/O port transactions, or is it just that in practice nobody
> makes a PCI device that does weird stuff like that ?

As realized in reply to MST, more the latter.  Not used, no point to
enabling, no means to enable depending on the physical IOMMU
implementation.  Thanks,

Alex




Re: [PATCH v6 21/33] block: move BQL logic of bdrv_co_invalidate_cache in bdrv_activate

2022-02-02 Thread Paolo Bonzini

On 1/27/22 12:03, Kevin Wolf wrote:

+int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp)
+{
+Error *local_err = NULL;
+
+if (bs->drv->bdrv_co_invalidate_cache) {
+bs->drv->bdrv_co_invalidate_cache(bs, &local_err);
+if (local_err) {
+bs->open_flags |= BDRV_O_INACTIVE;


This doesn't feel like the right place. The flag is cleared by the
caller, so it should also be set again on failure by the caller and not
by this function.

What bdrv_co_invalidate_cache() could do is assert that BDRV_O_INACTIVE
is cleared when it's called.


Do you think this would be handled more easily in its own series?

In general, the work in this series is more incremental than its size 
suggests.  Perhaps it should be flushed out in smaller pieces.


Paolo



Re: [PATCH] hw/i2c: flatten pca954x mux device

2022-02-02 Thread Philippe Mathieu-Daudé via

On 2/2/22 17:40, Patrick Venture wrote:


Philippe,

I0202 08:29:45.380384  6641 stream.go:31] qemu: child buses at "pca9546": "channel[*]", "channel[*]", "channel[*]", "channel[*]"

Ok, so that's interesting.  In one system (using qom-list) it's
correct, but then when using it to do path assignment
(qdev-monitor), it fails...

I'm not as fond of the name i2c-bus.%d, since they're referred to as
channels in the datasheet.  If I do the manual name creation, can I
keep the name channel or should I pivot over?

Thanks


-- >8 --
diff --git a/hw/i2c/i2c_mux_pca954x.c b/hw/i2c/i2c_mux_pca954x.c
index f9ce633b3a..a9517b612a 100644
--- a/hw/i2c/i2c_mux_pca954x.c
+++ b/hw/i2c/i2c_mux_pca954x.c
@@ -189,9 +189,11 @@ static void pca954x_init(Object *obj)

       /* SMBus modules. Cannot fail. */
       for (i = 0; i < c->nchans; i++) {
+        g_autofree gchar *bus_name = g_strdup_printf("i2c.%d", i);
+
           /* start all channels as disabled. */
           s->enabled[i] = false;
-        s->bus[i] = i2c_init_bus(DEVICE(s), "channel[*]");
+        s->bus[i] = i2c_init_bus(DEVICE(s), bus_name);
       }
   }

---

(look at HMP 'info qtree' output).

 >       }
 >   }

With the change:
Reviewed-by: Philippe Mathieu-Daudé <f4...@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4...@amsat.org>


Just saw your reply, and found a bunch of other non-spam in my spam 
folder.  I sent the message to the anti-spam team, hopefully that'll 
resolve this for myself and presumably others.


Thanks. I suppose the problem is the amsat.org domain.

I definitely see the same result with the qdev-monitor, but was really 
surprised that the qom-list worked.  I'll explicitly set the name, and 
i2c.%d is fine.  The detail that they're channels is not really 
important to the end user presumably.


I agree it is better to follow datasheets, so I am fine if you
change to "channel". What would that look like? "channel.0"?
FYI qdev busses are described in docs/qdev-device-use.txt.

We should be able to plug a device using some command line
such "-device i2c_test_dev,bus=channel.0,addr=0x55".
I wonder how to select the base PCA9548 ...

Maybe we need to pass the PCA ID to pca954x_init(), so we can
name "channel.2.0" for the 1st channel on the 2nd PCA?

Regards,

Phil.



Re: [PATCH 10/12] block.c: add subtree_drains where needed

2022-02-02 Thread Paolo Bonzini

On 2/2/22 16:37, Emanuele Giuseppe Esposito wrote:

So we have disk B with backing file C, and new disk A that wants to have
backing file C.

I think I understand what you mean, so in theory the operation would be
- create new child
- add child to A->children list
- add child to C->parents list

So in theory we need to:
* drain A (without subtree), because it can't happen that child nodes of
   A have in-flight requests that look at A's status (children list), right?
   In other words, if A has another child node X, can a request in X inspect
   A->children?
* drain C, as parents can inspect C's status (like B). Same assumption
   here: C->children[x]->bs cannot have requests inspecting C's ->parents
   list?


In that case (i.e. if parents have to be drained, but children need not) 
bdrv_drained_begin_unlocked would be enough, right?


That would mean that ->children is I/O state but ->parents is global 
state.  I think it's quite a bit more complicated to analyze and to 
understand.


Paolo



Re: [PATCH v3 04/11] 9p: darwin: Handle struct dirent differences

2022-02-02 Thread Christian Schoenebeck
On Mittwoch, 2. Februar 2022 16:07:09 CET Will Cohen wrote:
> Does the version proposed in v3 address the V9fsFidState issues? In 9p.c
> for v2 to v3, we propose
> 
> -return telldir(fidp->fs.dir.stream);
> +return v9fs_co_telldir(pdu, fidp);
> 
> and in codir.c from v2 to v3 we propose
> -saved_dir_pos = telldir(fidp->fs.dir.stream);
> +saved_dir_pos = s->ops->telldir(&s->ctx, &fidp->fs);
> 
> This removes the direct access to fidp->, and we hope this should be
> sufficient to avoid the concurrency
> and undefined behaviors you noted in the v2 review.

I am not sure why you think that you are no longer accessing fidp; you still
do, just in a slightly different way.

Let me propose a different solution: on macOS there is 'd_seekoff' in struct 
dirent. As already discussed, that dirent field is apparently unused (zero) by 
macOS. So what about filling this dirent field (early, on driver level, not on 
server/controller level [9p.c]) with telldir() on macOS? Then you would later 
have the same info that other systems provide in the dirent field 'd_off'.

Then you can add an inline helper function or a macro to deal with macOS vs. 
RoW, e.g.:

inline
off_t qemu_dirent_off(struct dirent *dent)
{
#ifdef CONFIG_DARWIN
return dent->d_seekoff;
#else
return dent->d_off;
#endif
}

And in 9p.c at all locations where dent->d_off is currently accessed, you 
would just use that helper instead.

Best regards,
Christian Schoenebeck





[PULL 5/6] hw/display/artist: Mouse cursor fixes for HP-UX

2022-02-02 Thread Helge Deller
This patch fixes the behaviour and positioning of the X11 mouse cursor on HP-UX.

The current code failed to subtract the offset of the CURSOR_CTRL register from
the current mouse cursor position. The HP-UX graphics driver stores in this
register the offset of the mouse graphics relative to the current cursor
position. Without this adjustment the mouse behaves strangely at the screen
borders.

Additionally, depending on the HP-UX version, the mouse cursor position
in the cursor_pos register reports different values. To accommodate this,
track the current min and max reported values and auto-adjust at runtime.

With this fix the mouse now behaves as expected on HP-UX 10 and 11.

Signed-off-by: Helge Deller 
Cc: qemu-sta...@nongnu.org
Signed-off-by: Helge Deller 
---
 hw/display/artist.c | 42 ++
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/hw/display/artist.c b/hw/display/artist.c
index 442bdbc130..8a9fa482d0 100644
--- a/hw/display/artist.c
+++ b/hw/display/artist.c
@@ -80,6 +80,7 @@ struct ARTISTState {
 uint32_t line_pattern_skip;

 uint32_t cursor_pos;
+uint32_t cursor_cntrl;

 uint32_t cursor_height;
 uint32_t cursor_width;
@@ -301,19 +302,42 @@ static void artist_get_cursor_pos(ARTISTState *s, int *x, 
int *y)
 {
 /*
  * Don't know whether these magic offset values are configurable via
- * some register. They are the same for all resolutions, so don't
- * bother about it.
+ * some register. They seem to be the same for all resolutions.
+ * The cursor values provided in the registers are:
+ * X-value: -295 (for HP-UX 11) and 338 (for HP-UX 10.20) up to 2265
+ * Y-value: 1146 down to 0
+ * The emulated Artist graphic is like a CRX graphic, and as such
+ * it's usually fixed at 1280x1024 pixels.
+ * Because of the maximum Y-value of 1146 you can not choose a higher
+ * vertical resolution on HP-UX (unless you disable the mouse).
  */

-*y = 0x47a - artist_get_y(s->cursor_pos);
-*x = ((artist_get_x(s->cursor_pos) - 338) / 2);
+static int offset = 338;
+int lx;
+
+/* ignore if uninitialized */
+if (s->cursor_pos == 0) {
+*x = *y = 0;
+return;
+}
+
+lx = artist_get_x(s->cursor_pos);
+if (lx < offset)
+offset = lx;
+*x = (lx - offset) / 2;
+
+*y = 1146 - artist_get_y(s->cursor_pos);
+
+/* subtract cursor offset from cursor control register */
+*x -= (s->cursor_cntrl & 0xf0) >> 4;
+*y -= (s->cursor_cntrl & 0x0f);

 if (*x > s->width) {
-*x = 0;
+*x = s->width;
 }

 if (*y > s->height) {
-*y = 0;
+*y = s->height;
 }
 }

@@ -1027,6 +1051,7 @@ static void artist_reg_write(void *opaque, hwaddr addr, 
uint64_t val,
 break;

 case CURSOR_CTRL:
+combine_write_reg(addr, val, size, &s->cursor_cntrl);
 break;

 case IMAGE_BITMAP_OP:
@@ -1331,8 +1356,8 @@ static int vmstate_artist_post_load(void *opaque, int 
version_id)

 static const VMStateDescription vmstate_artist = {
 .name = "artist",
-.version_id = 1,
-.minimum_version_id = 1,
+.version_id = 2,
+.minimum_version_id = 2,
 .post_load = vmstate_artist_post_load,
 .fields = (VMStateField[]) {
 VMSTATE_UINT16(height, ARTISTState),
@@ -1352,6 +1377,7 @@ static const VMStateDescription vmstate_artist = {
 VMSTATE_UINT32(line_end, ARTISTState),
 VMSTATE_UINT32(line_xy, ARTISTState),
 VMSTATE_UINT32(cursor_pos, ARTISTState),
+VMSTATE_UINT32(cursor_cntrl, ARTISTState),
 VMSTATE_UINT32(cursor_height, ARTISTState),
 VMSTATE_UINT32(cursor_width, ARTISTState),
 VMSTATE_UINT32(plane_mask, ARTISTState),
--
2.34.1




[PULL 6/6] hw/display/artist: Fix draw_line() artefacts

2022-02-02 Thread Helge Deller
From: Sven Schnelle 

The draw_line() function left artefacts on the screen because it was using the
x/y variables, which had already been incremented in the loop. Fix it by using
the unmodified x1/x2 variables instead.

Signed-off-by: Sven Schnelle 
Signed-off-by: Helge Deller 
Cc: qemu-sta...@nongnu.org
Signed-off-by: Helge Deller 
---
 hw/display/artist.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/display/artist.c b/hw/display/artist.c
index 8a9fa482d0..1d877998b9 100644
--- a/hw/display/artist.c
+++ b/hw/display/artist.c
@@ -553,10 +553,11 @@ static void draw_line(ARTISTState *s,
 }
 x++;
 } while (x <= x2 && (max_pix == -1 || --max_pix > 0));
+
 if (c1)
-artist_invalidate_lines(buf, x, dy+1);
+artist_invalidate_lines(buf, x1, x2 - x1);
 else
-artist_invalidate_lines(buf, y, dx+1);
+artist_invalidate_lines(buf, y1 > y2 ? y2 : y1, x2 - x1);
 }

 static void draw_line_pattern_start(ARTISTState *s)
--
2.34.1



