On Wed, Dec 11, 2013 at 03:45:29PM +0100, Paolo Bonzini wrote:
> On 11/12/2013 15:20, Michael S. Tsirkin wrote:
> > > It means that it's necessary to expose that the 3-4GB physical memory
> > > region in QEMU belongs to the same node (that is, the guest must be
> > > aware that 3-3.75GB and the tail of RAM are on the same node).
> > >
> > > So the problem Paolo mentions is fixable.
>
> I'm not sure if it is fixable. You need a 2M mountpoint to bind the 3G-4G
> range correctly, a 1G mountpoint for everything else, and QEMU only allows
> specifying one path.
>
> Without Marcelo's patch there is a workaround, if you know the size of the
> 4G hole: configure the first two nodes with unequal sizes. For example
>
>    -m 8192 \
>    -object memory-ram,id=ram-node0,size=3840M,hostnode=0 -numa node,memdev=ram-node0 \
>    -object memory-ram,id=ram-node1,size=4352M,hostnode=1 -numa node,memdev=ram-node1
>
>      RAM address     Host virtual address low bits     Guest physical addresses
>      0M-3840M        0                                 0M-3840M
>      3840M-8192M     0                                 4096M-8448M
>
> Then you'll waste 1GB of RAM (you'll use 9 hugepages instead of 8), but
> everything will be aligned. Or you just make your guest 7680M and not waste
> the memory.
>
> But with Marcelo's patch, ram-node1 will be split in two. QEMU will try
> to realign the second part of ram-node1, but the result is that the second
> part is misaligned and only the first 256M (the tail of guest physical
> memory) stays aligned:
>
>      RAM address     Host virtual address low bits     Guest physical addresses
>      0M-3840M        0                                 0M-3840M
>      4096M-8192M     256M                              4096M-8192M
>      3840M-4096M     0                                 8192M-8448M
>
> So you still waste memory, _and_ get incorrect alignment.

You are adding a new aspect to the problem (that host memory is created as
separate devices).

The $subject alignment code aligns on top of a single host-virtually-contiguous
address range. As mentioned in the earlier thread with Igor, this is
fixable as long as two memory devices map to a single host-virtually-contiguous
range.

But if you prefer aligning the guest hole to 1GB, I don't have a problem
with that.

For the NUMA problem, as long as it's possible to specify multiple physical
address ranges for a single node (which is true), the problem you raise is
fixable.

All that is necessary is to expose to the guest which non-physically-contiguous
1GB ranges ("representing" 1GB pages in the host) reside in which node.

> > Okay so
> > Marcelo - do you ack this patch for 2.0?
> > Paolo - do you re-ack this patch for 2.0?
>
> I very much prefer Gerd's approach. 2GB low memory for q35 is a bit wasteful,
> but we have some time to fix that before release.
>
> Paolo

OK.
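
For reference, the arithmetic behind the two quoted tables can be sketched
as below. This is only an illustration under assumptions: 1G hugepages, each
memory backend starting at a 1G-aligned host virtual address, and a chunk
counting as aligned when its host virtual address and guest physical address
agree modulo 1G. The helper names are made up for this sketch and are not
QEMU code.

```python
MiB = 1 << 20
GiB = 1 << 30


def hugepages_needed(backend_sizes, page=GiB):
    """Hugepages consumed when every backend is rounded up to whole pages."""
    return sum(-(-size // page) for size in backend_sizes)  # integer ceil


def is_aligned(host_low_bits, guest_start, page=GiB):
    """True if host virtual and guest physical addresses agree modulo `page`.

    Assumption of this sketch: that is the condition under which a chunk
    can be backed by hugepages of size `page`.
    """
    return (host_low_bits - guest_start) % page == 0


# Sizes of ram-node0 and ram-node1 from the quoted command line.
backends = [3840 * MiB, 4352 * MiB]

# (host virtual address low bits, guest physical start) per chunk,
# copied from the two quoted tables.
workaround = [(0, 0), (0, 4096 * MiB)]
with_patch = [(0, 0), (256 * MiB, 4096 * MiB), (0, 8192 * MiB)]

workaround_ok = [is_aligned(h, g) for h, g in workaround]
with_patch_ok = [is_aligned(h, g) for h, g in with_patch]

if __name__ == "__main__":
    print("hugepages used:", hugepages_needed(backends))
    print("workaround aligned:", workaround_ok)
    print("with patch aligned:", with_patch_ok)
```

Under these assumptions the two backends round up to 4 + 5 hugepages, which
is the 9-instead-of-8 figure in the quoted text, and only the middle chunk
of the second table fails the alignment check.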