On Thu, Sep 30, 2021 at 12:33:30PM +0200, David Hildenbrand (da...@redhat.com) wrote:
> On 30.09.21 11:40, david.dai wrote:
> > On Wed, Sep 29, 2021 at 11:30:53AM +0200, David Hildenbrand
> > (da...@redhat.com) wrote:
> > >
> > > On 27.09.21 14:28, david.dai wrote:
> > > > On Mon, Sep 27, 2021 at 11:07:43AM +0200, David Hildenbrand
> > > > (da...@redhat.com) wrote:
> > > > >
> > > > > On 27.09.21 10:27, Stefan Hajnoczi wrote:
> > > > > > On Sun, Sep 26, 2021 at 10:16:14AM +0800, David Dai wrote:
> > > > > > > Add a virtual PCI device to QEMU. The PCI device is used to
> > > > > > > dynamically attach memory to a VM, so a driver in the guest can
> > > > > > > request host memory on the fly without help from virtualization
> > > > > > > management software such as libvirt. The attached memory is
> > > > >
> > > > > We do have virtio-mem to dynamically attach memory to a VM. It could
> > > > > be extended by a mechanism for the VM to request more/less memory;
> > > > > that's already a planned feature. But yeah, virtio-mem memory is
> > > > > exposed as ordinary system RAM, not via a BAR to be managed mostly
> > > > > by user space.
> > >
> > > There is a virtio-pmem spec proposal to expose the memory region via a
> > > PCI BAR. We could do something similar for virtio-mem, however, we would
> > > have to wire that new model up differently in QEMU (it's no longer a
> > > "memory device" like a DIMM then).
> > >
> > > > I wish virtio-mem could solve our problem, but it is a dynamic
> > > > allocation mechanism for system RAM in virtualization. In
> > > > heterogeneous computing environments, the attached memory usually
> > > > comes from a computing device and should be managed separately; we
> > > > don't want the Linux MM to control it.
> > >
> > > If that heterogeneous memory has a dedicated node (which usually is the
> > > case IIRC), and you let it be managed by the Linux kernel (dax/kmem),
> > > you can bind the memory backend of virtio-mem to that special NUMA node.
> > > So all memory managed by that virtio-mem device would come from that
> > > heterogeneous memory.
> > >
> > Yes, CXL type 2/3 devices expose memory to the host as a dedicated node;
> > the node is marked as soft-reserved memory, and dax/kmem can take over the
> > node to create a dax device. This dax device can be regarded as the memory
> > backend of virtio-mem.
> >
> > I'm not sure whether a dax device can be opened by multiple VMs or host
> > applications.
>
> virtio-mem currently relies on having a single sparse memory region (anon
> mmap, mmapped file, mmapped huge pages, mmapped shmem) per VM. Although we
> can share memory with other processes, sharing with other VMs is not
> intended. Instead of actually mmapping parts dynamically (which can be
> quite expensive), virtio-mem relies on punching holes into the backend and
> dynamically allocating memory/file blocks/... on access.
>
> So the easy way to make it work is:
>
> a) Exposing the CXL memory to the buddy via dax/kmem, resulting in device
> memory getting managed by the buddy on a separate NUMA node.
>

The Linux kernel buddy system? How do we guarantee that other applications
don't allocate memory from it?

> b) (optional) allocate huge pages on that separate NUMA node.
>
> c) Use an ordinary memory-backend-ram or memory-backend-memfd (for huge
> pages), *binding* the memory backend to that special NUMA node.
> "-object memory-backend-ram or memory-backend-memfd, id=mem0, size=768G"

How to bind the backend memory to a NUMA node?
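
If I understand steps a) to c) correctly, the flow on our side would be
roughly the following (untested; node1, dax0.0 and the sizes are just
placeholders for our setup, not something taken from your mail):

  # expose the device memory to the buddy on its own NUMA node (dax/kmem)
  daxctl reconfigure-device dax0.0 --mode=system-ram

  # (optional) reserve 2M huge pages on that node, e.g. 768G worth
  echo 393216 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages

  # bind the virtio-mem backend to that node via host-nodes/policy
  qemu-system-x86_64 ... \
    -m 4G,maxmem=772G \
    -object memory-backend-memfd,id=mem0,size=768G,hugetlb=on,hugetlbsize=2M,host-nodes=1,policy=bind \
    -device virtio-mem-pci,id=vmem0,memdev=mem0,node=0,requested-size=0 \
    ...

  # and later grow/shrink the device, e.g. via the monitor
  (qemu) qom-set /machine/peripheral/vmem0 requested-size 16G

Is that the kind of binding you mean, i.e. host-nodes=/policy=bind on the
memory backend object?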

> This will dynamically allocate memory from that special NUMA node,
> resulting in the virtio-mem device completely being backed by that device
> memory, being able to dynamically resize the memory allocation.
>
> Exposing an actual devdax to the virtio-mem device, shared by multiple VMs,
> isn't really what we want and won't work without major design changes.
> Also, I'm not so sure it's a very clean design: exposing memory belonging
> to other VMs to unrelated QEMU processes. This sounds like a serious
> security hole: if you managed to escalate to the QEMU process from inside
> the VM, you can access unrelated VM memory quite happily. You want an
> abstraction in-between that makes sure each VM/QEMU process only sees
> private memory: for example, the buddy via dax/kmem.

Hi David,

Thanks for your suggestion, and sorry for the delayed reply due to my long
vacation.

How does the current virtio-mem dynamically attach memory to the guest, via
page fault?

Thanks,
David

> --
> Thanks,
>
> David / dhildenb