As you know, QEMU currently cannot direct its memory allocation, which may cause a performance regression from cross-node memory access in the guest. Worse, if PCI passthrough is used, the direct-attached device uses DMA transfers between the device and the QEMU process, so all pages of the guest will be pinned by get_user_pages().

KVM_ASSIGN_PCI_DEVICE ioctl
  kvm_vm_ioctl_assign_device()
    => kvm_assign_device()
      => kvm_iommu_map_memslots()
        => kvm_iommu_map_pages()
          => kvm_pin_pages()

So, with a direct-attached device, the page count of every guest page is
incremented by one and page migration will not work. AutoNUMA won't work
either.

So, we should set the memory allocation policy of the guest nodes before
the pages are actually mapped.

With this patch set, we are able to set the memory policy of the guest
nodes as follows:

 -numa node,nodeid=0,cpus=0, \
 -numa mem,size=1024M,policy=membind,host-nodes=0-1 \
 -numa node,nodeid=1,cpus=1 \
 -numa mem,size=1024M,policy=interleave,host-nodes=1

The supported format is
"policy={default|membind|interleave|preferred},relative=true,host-nodes=N-N".

This series also adds the "set-mem-policy" QMP and HMP commands to set the
memory policy.

Patch 11/12 adds a QMP command "query-numa" to show NUMA info through
this API.

Patch 12/12 converts the "info numa" monitor command to use the QMP
command "query-numa".

V1->V2:
    change to use QemuOpts in numa options (Paolo)
    handle Error in mpol parser (Paolo)
    change qmp command format to mem-policy=membind,mem-hostnode=0-1 like (Paolo)
V2->V3:
    also handle Error in cpus parser (5/10)
    split out common parser from cpus and hostnode parser (Bandan 6/10)
V3->V4:
    rebase to request for comments
V4->V5:
    use OptsVisitor and split -numa option (Paolo)
     - s/set-mpol/set-mem-policy (Andreas)
     - s/mem-policy/policy
     - s/mem-hostnode/host-nodes
    fix hmp command process after error (Luiz)
    add qmp command query-numa and convert info numa to it (Luiz)
V5->V6:
    remove tabs in json file (Laszlo, Paolo)
    add back "-numa node,mem=xxx" as legacy (Paolo)
    change cpus and host-nodes to array (Laszlo, Eric)
    change "nodeid" to "uint16"
    add NumaMemPolicy enum type (Eric)
    rebased on Laszlo's "OptsVisitor: support / flatten integer ranges
    for repeating options" patch set, thanks for Laszlo's help
V6->V7:
    change UInt16 to uint16 (Laszlo)
    fix a typo in adding qmp command set-mem-policy
V7->V8:
    rebase to current master with Laszlo's V2 of OptsVisitor patch set
    fix an error that added a blank line
V8->V9:
    rebase to current master
    check if total numa memory size is equal to ram_size (Paolo)
    add comments to the OptsVisitor stuff in qapi-schema.json (Eric, Laszlo)
    replace the use of numa_num_configured_nodes() (Andrew)
    avoid abusing the fact i==nodeid (Andrew)

Wanlong Gao (12):
  NUMA: add NumaOptions, NumaNodeOptions and NumaMemOptions
  NUMA: split -numa option
  NUMA: check if the total numa memory size is equal to ram_size
  NUMA: move numa related code to numa.c
  NUMA: Add numa_info structure to contain numa nodes info
  NUMA: Add Linux libnuma detection
  NUMA: parse guest numa nodes memory policy
  NUMA: set guest numa nodes memory policy
  NUMA: add qmp command set-mem-policy to set memory policy for NUMA node
  NUMA: add hmp command set-mem-policy
  NUMA: add qmp command query-numa
  NUMA: convert hmp command info_numa to use qmp command query_numa

 Makefile.target         |   2 +-
 configure               |  32 ++++
 cpus.c                  |  14 --
 hmp-commands.hx         |  16 ++
 hmp.c                   | 119 +++++++++++++
 hmp.h                   |   2 +
 hw/i386/pc.c            |   4 +-
 include/sysemu/cpus.h   |   1 -
 include/sysemu/sysemu.h |  16 +-
 monitor.c               |  21 +--
 numa.c                  | 455 ++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json        | 131 ++++++++++++++
 qemu-options.hx         |   6 +-
 qmp-commands.hx         |  90 ++++++++++
 vl.c                    | 160 ++---------------
 15 files changed, 885 insertions(+), 184 deletions(-)
 create mode 100644 numa.c

-- 
1.8.4.rc4
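
For readers unfamiliar with the "host-nodes=N-N" notation used in the example command line, here is a minimal sketch of how a value such as "0-1" or "1" could be turned into a host node bitmask. It is not the parser from this series (which goes through OptsVisitor); the function name sketch_parse_host_nodes and the 64-node limit are assumptions made only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical limit for this sketch: nodes 0..63 fit in one 64-bit mask. */
#define SKETCH_MAX_NODES 64

/*
 * Parse a host-nodes value such as "1" or "0-1" into a bitmask in which
 * bit N set means host node N is part of the policy.
 * Returns 0 on success, -1 on malformed input or an out-of-range node.
 * On failure *mask is left untouched.
 */
static int sketch_parse_host_nodes(const char *spec, uint64_t *mask)
{
    char *end;
    unsigned long first, last, n;
    uint64_t result = 0;

    assert(spec != NULL && mask != NULL);

    /* Require a leading digit; strtoul() alone would accept "+1" or " 1". */
    if (*spec < '0' || *spec > '9') {
        return -1;
    }
    errno = 0;
    first = strtoul(spec, &end, 10);
    if (errno != 0) {
        return -1;
    }

    last = first;
    if (*end == '-') {
        const char *p = end + 1;

        if (*p < '0' || *p > '9') {
            return -1;
        }
        errno = 0;
        last = strtoul(p, &end, 10);
        if (errno != 0) {
            return -1;
        }
    }

    /* Reject trailing characters, reversed ranges and nodes beyond the mask. */
    if (*end != '\0' || first > last || last >= SKETCH_MAX_NODES) {
        return -1;
    }

    for (n = first; n <= last; n++) {
        result |= UINT64_C(1) << n;
    }
    *mask = result;
    return 0;
}
```

With this sketch, "0-1" yields a mask with bits 0 and 1 set, while "1" yields a mask with only bit 1 set. A mask of this kind is the general shape of what memory policy interfaces such as mbind(2) take as their node set.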