ovs-numa reads the CPUs and NUMA nodes configuration from sysfs. It is only used to decide how to set the affinity of pmd threads.
This new code just needs get_mempolicy() from libnuma: it is used to get the NUMA node of a virtual address and it is simply a wrapper to a system call. Want do you think? Should we implement that in OVS? If we implement the system call wrapper in ovs-numa we could avoid depending on libnuma-dev for the build, but (I guess) we would still need to link with -lnuma, as it is a dependency of the statically linked DPDK library Thanks On 10/03/2016 14:38, "Ben Pfaff" <b...@ovn.org> wrote: >How does this numa library relate to lib/ovs-numa.[ch]? > >On Thu, Mar 10, 2016 at 01:22:42AM +0000, Daniele Di Proietto wrote: >> Thanks for the patch, I'll put this in the use case list for >> my series if I need to resend it! >> >> It would be nice to get the numa socket information without >> linking OVS with libnuma, maybe using some DPDK api. From >> a quick look I didn't find any way, but maybe you know a >> better way. >> >> Some preliminary comments inline >> >> On 04/03/2016 02:08, "dev on behalf of Ciara Loftus" >> <dev-boun...@openvswitch.org on behalf of ciara.lof...@intel.com> wrote: >> >> >This commit allows for vHost memory from QEMU, DPDK and OVS, as well >> >as the servicing PMD, to all come from the same socket. >> > >> >DPDK v2.2 introduces a new configuration option: >> >CONFIG_RTE_LIBRTE_VHOST_NUMA. If enabled, DPDK detects the socket >> >from which a vhost device's memory has been allocated by QEMU, and >> >accordingly reallocates device memory managed by DPDK to that same >> >socket. >> > >> >OVS by default sets the socket id of a vhost port to that of the >> >master lcore. This commit introduces the ability to update the >> >socket id of the port if it is detected (during VM boot) that the >> >port memory is not on the default NUMA node. If this is the case, the >> >mempool of the port is also changed to the new node, and a PMD >> >thread currently servicing the port will no longer, in favour of a >> >thread from the new node (if enabled in the CPU mask). >> > >> >Signed-off-by: Ciara Loftus <ciara.lof...@intel.com> >> >--- >> > INSTALL.DPDK.md | 6 +++++- >> > acinclude.m4 | 2 +- >> > lib/netdev-dpdk.c | 25 +++++++++++++++++++++++-- >> > 3 files changed, 29 insertions(+), 4 deletions(-) >> > >> >diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md >> >index dca79bd..82e6908 100644 >> >--- a/INSTALL.DPDK.md >> >+++ b/INSTALL.DPDK.md >> >@@ -33,6 +33,10 @@ on Debian/Ubuntu) >> > >> > `CONFIG_RTE_BUILD_COMBINE_LIBS=y` >> > >> >+ Enable NUMA-aware vHost by modifying the following in the same >>file: >> >+ >> >+ `CONFIG_RTE_LIBRTE_VHOST_NUMA=y` >> >+ >> >> I guess we should also update install_dpdk() in ./travis/build.sh to do >> this if it's required >> >> > Then run `make install` to build and install the library. >> > For default install without IVSHMEM: >> > >> >@@ -383,7 +387,7 @@ Performance Tuning: >> > >> > It is good practice to ensure that threads that are in the datapath >>are >> > pinned to cores in the same NUMA area. e.g. pmd threads and QEMU >>vCPUs >> >- responsible for forwarding. >> >+ responsible for forwarding. This is now default behavior for vHost >> >ports. >> > >> > 9. Rx Mergeable buffers >> > >> >diff --git a/acinclude.m4 b/acinclude.m4 >> >index 11c7787..432bdbd 100644 >> >--- a/acinclude.m4 >> >+++ b/acinclude.m4 >> >@@ -199,7 +199,7 @@ AC_DEFUN([OVS_CHECK_DPDK], [ >> > found=false >> > save_LIBS=$LIBS >> > for extras in "" "-ldl"; do >> >- LIBS="$DPDK_LIB $extras $save_LIBS $DPDK_EXTRA_LIB" >> >+ LIBS="$DPDK_LIB $extras $save_LIBS $DPDK_EXTRA_LIB -lnuma" >> >> I guess we should also list libnuma-dev in .travis.yml and something >> similar >> in rhel/openvswitch-fedora.spec >> >> > AC_LINK_IFELSE( >> > [AC_LANG_PROGRAM([#include <rte_config.h> >> > #include <rte_eal.h>], >> >diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c >> >index 17b8d51..4e1ce53 100644 >> >--- a/lib/netdev-dpdk.c >> >+++ b/lib/netdev-dpdk.c >> >@@ -29,6 +29,7 @@ >> > #include <stdio.h> >> > #include <sys/types.h> >> > #include <sys/stat.h> >> >+#include <numaif.h> >> > >> > #include "dirs.h" >> > #include "dp-packet.h" >> >@@ -1878,6 +1879,8 @@ new_device(struct virtio_net *dev) >> > { >> > struct netdev_dpdk *netdev; >> > bool exists = false; >> >+ int newnode = 0; >> >+ long err = 0; >> > >> > ovs_mutex_lock(&dpdk_mutex); >> > /* Add device to the vhost port with the same name as that passed >> >down. */ >> >@@ -1891,6 +1894,24 @@ new_device(struct virtio_net *dev) >> > } >> > ovsrcu_set(&netdev->virtio_dev, dev); >> > exists = true; >> >+ >> >+ /* Get NUMA information */ >> >+ err = get_mempolicy(&newnode, NULL, 0, dev, MPOL_F_NODE | >> >MPOL_F_ADDR); >> >+ if (err) { >> >+ VLOG_INFO("Error getting NUMA info for vHost Device >> >'%s'", >> >+ dev->ifname); >> >+ newnode = netdev->socket_id; >> >+ } else if (newnode != netdev->socket_id) { >> >+ netdev->socket_id = newnode; >> >+ /* Change mempool to new NUMA Node */ >> >+ dpdk_mp_put(netdev->dpdk_mp); >> >+ netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, >> >netdev->mtu); >> >+ /* Request netdev reconfiguration. The port may now be >> >+ * serviced by a PMD on the new node if enabled in the >> >cpu >> >+ * mask */ >> >+ netdev_request_reconfigure(&netdev->up); >> >> I think here I would prefer: >> >> 1) Remembering the configuration change request >> netdev->requested_socket_id = newnode >> 2) Calling netdev_request_reconfigure() >> 3) In the netdev_dpdk_vhost_user_reconfigure() method: >> netdev->socket_id = netdev_requested_socket_id >> >> Otherwise the datapath might be confused, because it assumes that the >> socket_id doesn't change while the device is polled by the pmd threads. >> >> It's safe to change almost everything inside the >> netdev_dpdk_vhost_user_reconfigure(), because the device will not be >> polled by the datapath when the function is called. >> >> The same applies for the mempool: the actual change should be done >> in netdev_dpdk_vhost_user_reconfigure(), because there might still be >> threads using it. >> >> >+ } >> >+ >> > dev->flags |= VIRTIO_DEV_RUNNING; >> > /* Disable notifications. */ >> > set_irq_status(dev); >> >@@ -1907,8 +1928,8 @@ new_device(struct virtio_net *dev) >> > return -1; >> > } >> > >> >- VLOG_INFO("vHost Device '%s' %"PRIu64" has been added", >>dev->ifname, >> >- dev->device_fh); >> >+ VLOG_INFO("vHost Device '%s' %"PRIu64" has been added on socket >>%i", >> >+ dev->ifname, dev->device_fh, newnode); >> > return 0; >> > } >> > >> >-- >> >2.4.3 >> > >> >_______________________________________________ >> >dev mailing list >> >dev@openvswitch.org >> >http://openvswitch.org/mailman/listinfo/dev >> >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> http://openvswitch.org/mailman/listinfo/dev _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev