----- Original Message -----
> On 08/28/2013 09:44 PM, Paolo Bonzini wrote:
> > On 26/08/2013 10:43, Andrew Jones wrote:
> >>
> >> ----- Original Message -----
> >>>> On 08/26/2013 03:46 PM, Andrew Jones wrote:
> >>>>>>>>>> Is this patch still necessary? I thought that dropping the
> >>>>>>>>>>>>>> numa_num_configured_nodes() calls from patch 8/12 got rid
> >>>>>>>>>>>>>> of the need for this library. Maybe I missed other uses?
> >>>>>>>>>>
> >>>>>>>>>> Yes, in 08/12 we also use mbind(),
> >>>>>> You don't need a whole library for mbind(), it's a syscall. See
> >>>>>> syscall(2).
> >>>>>>
> >>>>>>>>>> and in 09/12 we use max_numa_node().
> >>>>>> Really? I didn't see it there. And anyway, that goes back to our
> >>>>>> discussion
> >>>>>> about setting qemu's MAX_NODES to whatever we think qemu should
> >>>>>> support,
> >>>>>> and then just checking that we don't blow that limit whenever reading
> >>>>>> host node info, i.e.
> >>>>>>
> >>>>>> maxnode = 0;
> >>>>>> while (host_nodes[maxnode] && maxnode < MAX_NODES)
> >>>>>> node_read(&info[maxnode++]);
> >>>>>>
> >>>>>> type of a thing.
> >>>>>>
> >>>>>> And, if there's a place you really need to know the current online
> >>>>>> number
> >>>>>> of host nodes, then, like I said earlier, you should just go to sysfs
> >>>>>> yourself. libnuma:numa_max_node() returns an int that it only
> >>>>>> initializes
> >>>>>> at library load time, so it's not going to adapt to
> >>>>>> onlining/offlining.
> >>>>
> >>>> OK, thank you.
> >>>> Then I should define MPOL_* macros in QEMU and use mbind(2) syscall
> >>>> directly,
> >>>> right?
> >> Hmm, yeah, that's too bad that numaif.h is part of libnuma, and not a more
> >> general lib. Whether or not we want to redefine those symbols within
> >> qemu, in order to avoid the dependency on installing numactl-devel, isn't
> >> something I can answer. That's a better question for Anthony. Anthony?
> >> Paolo,
> >> any opinions?
> >> Maybe we should pick up uapi/linux/mempolicy.h with the
> >> linux-header synch script?
> >>
> >
> > I think using libnuma is fine. In principle this could be used on other
> > OSes than Linux, I think?
>
> But seems that mbind(2) is Linux-specific syscall, right?
>
You would need to avoid calling mbind directly, i.e. use libnuma for all
NUMA-related calls. Then, if libnuma were to support more OSes, qemu
would automatically (wrt NUMA) as well. Your mbind() with libnuma would
look like this

  numa_set_bind_policy(strict)
  numa_tonodemask_memory(addr, size, nodemask)

The problem is that numa_set_bind_policy only takes a bool, and thus only
allows two of the four possible policies

  MPOL_BIND       strict == 1
  MPOL_PREFERRED  strict == 0

So, due to libnuma's policy setting limitations, and the fact that it
doesn't currently support more OSes than Linux, I prefer your current
series version that drops libnuma. If qemu needs to support NUMA on
another OS, then we can cross that bridge when we get there.

drew
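
For illustration only, here is a minimal sketch of the "call mbind(2) directly" approach discussed above, assuming a Linux host. The HOST_MPOL_* names and the host_mbind() and host_mode_from_strict() helpers are hypothetical and are not taken from the patch series; the numeric values are the ones in the kernel's uapi/linux/mempolicy.h. The sketch also shows why a boolean "strict" flag can only reach two of the four modes.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical local names; values mirror uapi/linux/mempolicy.h. */
enum {
    HOST_MPOL_DEFAULT    = 0,
    HOST_MPOL_PREFERRED  = 1,
    HOST_MPOL_BIND       = 2,
    HOST_MPOL_INTERLEAVE = 3,
};

/*
 * The only two modes reachable through a boolean "strict" flag, as with
 * numa_set_bind_policy(): DEFAULT and INTERLEAVE cannot be expressed.
 */
static inline int host_mode_from_strict(int strict)
{
    return strict ? HOST_MPOL_BIND : HOST_MPOL_PREFERRED;
}

/*
 * Hypothetical wrapper: invoke mbind(2) through syscall(2) so that no
 * link-time dependency on libnuma is needed. Returns 0 on success or
 * -1 with errno set, like the raw syscall.
 */
static inline long host_mbind(void *addr, unsigned long len, int mode,
                              const unsigned long *nodemask,
                              unsigned long maxnode, unsigned int flags)
{
#ifdef SYS_mbind
    return syscall(SYS_mbind, addr, len, mode, nodemask, maxnode, flags);
#else
    errno = ENOSYS; /* this host has no mbind(2) */
    return -1;
#endif
}
```

A caller wanting interleaving would pass HOST_MPOL_INTERLEAVE to host_mbind() directly, which is the case the libnuma pair above cannot cover.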