On Wed, Mar 12, 2025 at 03:33:12PM -0400, Gregory Price wrote:
> On Wed, Mar 12, 2025 at 06:05:43PM +0000, Jonathan Cameron wrote:
> >
> > Longer term I remain a little unconvinced by whether this is the best
> > approach, because I also want a single management path (so fake CCI
> > etc.) and that may need to be exposed to one of the hosts for test
> > purposes. In the current approach commands are issued to each host
> > directly to surface memory.
> >
>
> Let's say we implement this
>
>    -----------                -----------
>   |  Host 1   |              |  Host 2   |
>   |     |     |              |           |
>   |     v     |     Add      |           |
>   |    CCI    |   ------>    |  Evt Log  |
>    -----------                -----------
>                      ^
>                 What mechanism
>                 do you use here?
>
> And how does it not just replicate QMP logic?
>
> Not arguing against it, I just see what amounts to more code than
> required to test the functionality. QMP fits the bill, so split the CCI
> interface for single-host management testing from the MHSLD interface.
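(For concreteness, the per-host QMP flow under discussion boils down to
sending something like the following to each guest's QMP socket. The
command comes from Fan Ni's DCD series; the device id below is made up,
and the exact argument names may differ between QEMU versions, so check
qapi/cxl.json in your tree.)

  { "execute": "cxl-add-dynamic-capacity",
    "arguments": { "path": "/machine/peripheral/cxl-dcd0",
                   "host-id": 0,
                   "selection-policy": "prescriptive",
                   "region": 0,
                   "extents": [ { "offset": 0, "len": 134217728 } ] } }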
We have recently discussed the approach internally. Our idea is to do
something similar to what you have done with the MHSLD emulation: use a
shmem device to share information (a mailbox?) between the two devices.

>
> Why not leave the 1-node DCD with the inbound CCI interface for testing
> and leave the QMP interface for development of a reference fabric
> manager, outside the scope of another host?

For this two-host setup, I can see benefits: the two hosts can run
different kernels. That is, the host serving as the FM only needs to
support, for example, out-of-band communication with the hardware (MCTP
over i2c), and does not need to evolve along with whatever we want to
test on the target host (which boots a kernel with the features we care
about). That is very important, at least for test purposes: MCTP-over-i2c
support for x86 is not upstreamed yet, and we do not want to rebase it
whenever the kernel is updated.

More specifically, say we deploy a libcxlmi-based test framework on the
FM host; then we can test whatever features we need (DCD etc.) on the
target host. Again, the FM host does not need DCD kernel support.
Compared to the QMP interface, libcxlmi already supports a lot of
commands, and more are being added, so it should be much more convenient
than implementing them over QMP.

Fan

>
> TL;DR: :[ distributed systems are hard to test
>
> > > 2. If not fully supported yet, are there any available development
> > > branches or patches that implement this functionality?
> > >
> > > 3. Are there any guidelines or considerations for configuring and
> > > testing CXL memory pooling in QEMU?
> >
> > There is some information in that patch series cover letter.
> >
>
> The attached series implements an MHSLD, but implementing the pooling
> mechanism (i.e. fabric manager logic) is left to the imagination of the
> reader. You will want to look at Fan Ni's DCD patch set to understand
> the QMP Add/Remove logic for DCD capacity. This patch set just enables
> you to manage 2+ QEMU guests sharing a DCD state in shared memory.
>
> So you'll have to send DCD commands to each individual guest's QEMU via
> QMP, but the underlying logic manages the shared state via locks to
> emulate real MHSLD behavior.
>
>       QMP|---> Host 1 --------v
> [FM]-----|                [Shared State]
>       QMP|---> Host 2 --------^
>
> This differs from a real DCD in that a real DCD is a single endpoint
> for management, rather than N endpoints (1 per VM).
>
>                   |---> Host 1
> [FM] ---> [DCD] --|
>                   |---> Host 2
>
> However, this is an implementation detail on the FM side, so I chose to
> do it this way to simplify the QEMU MHSLD implementation. There are far
> fewer interactions this way, with the downside that having one of the
> hosts manage the shared state isn't possible via the current emulation.
>
> It could probably be done, but I'm not sure what value it has, since
> the FM implementation difference is a matter of a small amount of
> Python.
>
> It's been a while since I played with this patch set, and unfortunately
> I no longer have a reference pooling manager available to me. But I'm
> happy to provide some guidance where I can.
>
> ~Gregory
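As a rough sketch of the libcxlmi-on-the-FM-host idea above: the MCTP
network id / endpoint id (1, 8) are made up, only the basic Identify
command is shown as a stand-in for the FM-API DCD commands a real test
would issue, and the command/struct names should be checked against the
libcxlmi headers for the version you build against.

/*
 * Probe the device's out-of-band CCI from the FM host with libcxlmi
 * over MCTP/i2c.  Sketch only; values and commands as noted above.
 */
#include <stdio.h>
#include <syslog.h>
#include <libcxlmi.h>

int main(void)
{
        struct cxlmi_ctx *ctx;
        struct cxlmi_endpoint *ep;
        struct cxlmi_cmd_infostat_identify id = { 0 };
        int rc = 1;

        /* log to stderr, syslog-style level */
        ctx = cxlmi_new_ctx(stderr, LOG_INFO);
        if (!ctx)
                return 1;

        /* open the device's CCI as an MCTP endpoint (net 1, eid 8 assumed) */
        ep = cxlmi_open_mctp(ctx, 1, 8);
        if (!ep) {
                fprintf(stderr, "failed to open MCTP endpoint\n");
                goto free_ctx;
        }

        /* Information and Status: Identify, issued fully out-of-band */
        rc = cxlmi_cmd_infostat_identify(ep, NULL, &id);
        if (rc)
                fprintf(stderr, "identify failed: %d\n", rc);
        else
                printf("identify OK; FM-API / DCD commands can follow\n");

        cxlmi_close(ep);
free_ctx:
        cxlmi_free_ctx(ctx);
        return rc ? 1 : 0;
}

From the same endpoint handle, the FM-API dynamic-capacity commands that
libcxlmi has grown can then drive the DCD on the target host, with no
DCD support needed in the FM host's kernel.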