On Fri, 18 Oct 2024 12:12:52 -0400 Gregory Price <gou...@gourry.net> wrote:
> From: Svetly Todorov <svetly.todo...@memverge.com> > > Using a shared-memory bytemap, validates that DC adds, releases, > and reclamations happen on extents belonging to the appropriate > host. > > The MHSLD device inherits from the CXL_TYPE3 class and adds the following > configuration options: As in previous the -- syntax is somewhat confusing as that's not how the parameters are applied. > --mhd-head=<u32> > --mhd-state_file=<str> > --mhd-init=<bool> > > --mhd-head specifies the head ID of the host on the given device. > > --mhd-state_file is the name of the shared-memory-backed file used > to store the MHD state. > > --mhd-init indicates whether this QEMU instance should initialize > the state_file; if so, the instance will create the file if it does > not exist, ftruncate it to the appropriate size, and initialize its > header. It is assumed that the --mhd-init instance is run and allowed > to completely finish configuration before any other guests access the > shared state. > > The shared state file only needs to be intialized once. Even if a guest > dies without clearing the ownership bits associated with its head-ID, > future guests with that ID will clear those bits in cxl_mhsld_realize(), > regardless of whether mhd_init is true or false. That sounds like a race condition if not all hosts are brought up before the first add. > > The following command line options create an MHSLD with 4GB of > backing memory, whose state is tracked in /dev/shm/mhd_metadata. > --mhd-init=true tells this instance to initialize the state as > described above. > > ./qemu-system_x86-64 \ > [... other options ...] \ > -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \ > -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,port=0,slot=0 \ > -object memory-backend-ram,id=mem0,size=4G \ > -device > cxl-mhsld,bus=rp0,num-dc-regions=1,volatile-dc-memdev=mem0,id=cxl-mem0,sn=66667,mhd-head=0,mhd-state_file=mhd_metadata,mhd-init=true > \ > -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G \ > -qmp unix:/tmp/qmp-sock-1,server,nowait > > Once this guest completes setup, other guests looking to access the > device can be booted with the same configuration options, but with > --mhd-head != 0, > --mhd-init=false, > and a different QMP socket. > > Signed-off-by: Gregory Price <gou...@gourry.net> > Signed-off-by: Svetly Todorov <svetly.todo...@memverge.com> A few trivial things inline. In general the scheme looks workable but I'm not sure the contraints at setup time etc are suitable for an upstream solution. Certainly a useful tool to have for kernel development though so I'll try and find time in next few days to apply this on my gitlab tree. Longer term I think we need a more complex external program or a main / proxy type arrangement so that ordering requirements can be enforce and we can have richer info. Having to chat to each qmp interface independently works fine is also a bit more complex than I think we would eventually want. Having a solution in place though will make it much easier to move towards an eventual upstreamable solution. This is a great place to start from! Jonathan > +static bool cxl_mhsld_release_extent(PCIDevice *d, uint64_t dpa, uint64_t > len) > +{ > + cxl_mhsld_state_clear(CXL_MHSLD(d), dpa / MHSLD_BLOCK_SZ, > + len / MHSLD_BLOCK_SZ); Trivial but align after ( > + return true; > +} > diff --git a/hw/cxl/mhsld/mhsld.h b/hw/cxl/mhsld/mhsld.h > new file mode 100644 > index 0000000000..e7ead1f0d2 > --- /dev/null > +++ b/hw/cxl/mhsld/mhsld.h > @@ -0,0 +1,75 @@ > +/* > + * SPDX-License-Identifier: GPL-2.0-or-later > + * > + * Copyright (c) 2024 MemVerge Inc. > + * > + */ > + > +#ifndef CXL_MHSLD_H > +#define CXL_MHSLD_H > +#include <stdint.h> > +#include "hw/cxl/cxl.h" > +#include "hw/cxl/cxl_mailbox.h" > +#include "hw/cxl/cxl_device.h" > +#include "qemu/units.h" > + > +#define MHSLD_BLOCK_SZ (2 * MiB) > + > +/* > + * We limit the number of heads to prevent the shared state > + * region from becoming a major memory hog. We need 512MB of > + * memory space to track 8-host ownership of 4GB of memory in > + * blocks of 2MB. This can change if the block size is increased. I'm lost what makes up that size? > + */ > +#define MHSLD_HEADS (8) > + > +/* > + * The shared state cannot have 2 variable sized regions > + * so we have to max out the ldmap. > + */ > +typedef struct MHSLDSharedState { > + uint8_t nr_heads; > + uint8_t nr_lds; > + uint8_t ldmap[MHSLD_HEADS]; > + uint64_t nr_blocks; > + uint8_t blocks[]; > +} MHSLDSharedState; > + > +struct CXLMHSLDState { > + CXLType3Dev ct3d; > + bool mhd_init; > + char *mhd_state_file; > + int mhd_state_fd; > + size_t mhd_state_size; > + uint32_t mhd_head; > + MHSLDSharedState *mhd_state; > +}; > + > +struct CXLMHSLDClass { > + CXLType3Class parent_class; > +}; > + > +enum { > + MHSLD_MHD = 0x55, > + #define GET_MHD_INFO 0x0 > +}; > + > +/* > + * MHD Get Info Command > + * Returns information the LD's associated with this head > + */ > +typedef struct MHDGetInfoInput { > + uint8_t start_ld; > + uint8_t ldmap_len; > +} QEMU_PACKED MHDGetInfoInput; > + > +typedef struct MHDGetInfoOutput { > + uint8_t nr_lds; > + uint8_t nr_heads; > + uint16_t resv1; > + uint8_t start_ld; > + uint8_t ldmap_len; > + uint16_t resv2; > + uint8_t ldmap[]; > +} QEMU_PACKED MHDGetInfoOutput; > +#endif