How does this numa library relate to lib/ovs-numa.[ch]?
On Thu, Mar 10, 2016 at 01:22:42AM +0000, Daniele Di Proietto wrote: > Thanks for the patch, I'll put this in the use case list for > my series if I need to resend it! > > It would be nice to get the numa socket information without > linking OVS with libnuma, maybe using some DPDK api. From > a quick look I didn't find any way, but maybe you know a > better way. > > Some preliminary comments inline > > On 04/03/2016 02:08, "dev on behalf of Ciara Loftus" > <dev-boun...@openvswitch.org on behalf of ciara.lof...@intel.com> wrote: > > >This commit allows for vHost memory from QEMU, DPDK and OVS, as well > >as the servicing PMD, to all come from the same socket. > > > >DPDK v2.2 introduces a new configuration option: > >CONFIG_RTE_LIBRTE_VHOST_NUMA. If enabled, DPDK detects the socket > >from which a vhost device's memory has been allocated by QEMU, and > >accordingly reallocates device memory managed by DPDK to that same > >socket. > > > >OVS by default sets the socket id of a vhost port to that of the > >master lcore. This commit introduces the ability to update the > >socket id of the port if it is detected (during VM boot) that the > >port memory is not on the default NUMA node. If this is the case, the > >mempool of the port is also changed to the new node, and a PMD > >thread currently servicing the port will no longer, in favour of a > >thread from the new node (if enabled in the CPU mask). > > > >Signed-off-by: Ciara Loftus <ciara.lof...@intel.com> > >--- > > INSTALL.DPDK.md | 6 +++++- > > acinclude.m4 | 2 +- > > lib/netdev-dpdk.c | 25 +++++++++++++++++++++++-- > > 3 files changed, 29 insertions(+), 4 deletions(-) > > > >diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md > >index dca79bd..82e6908 100644 > >--- a/INSTALL.DPDK.md > >+++ b/INSTALL.DPDK.md > >@@ -33,6 +33,10 @@ on Debian/Ubuntu) > > > > `CONFIG_RTE_BUILD_COMBINE_LIBS=y` > > > >+ Enable NUMA-aware vHost by modifying the following in the same file: > >+ > >+ `CONFIG_RTE_LIBRTE_VHOST_NUMA=y` > >+ > > I guess we should also update install_dpdk() in ./travis/build.sh to do > this if it's required > > > Then run `make install` to build and install the library. > > For default install without IVSHMEM: > > > >@@ -383,7 +387,7 @@ Performance Tuning: > > > > It is good practice to ensure that threads that are in the datapath are > > pinned to cores in the same NUMA area. e.g. pmd threads and QEMU vCPUs > >- responsible for forwarding. > >+ responsible for forwarding. This is now default behavior for vHost > >ports. > > > > 9. Rx Mergeable buffers > > > >diff --git a/acinclude.m4 b/acinclude.m4 > >index 11c7787..432bdbd 100644 > >--- a/acinclude.m4 > >+++ b/acinclude.m4 > >@@ -199,7 +199,7 @@ AC_DEFUN([OVS_CHECK_DPDK], [ > > found=false > > save_LIBS=$LIBS > > for extras in "" "-ldl"; do > >- LIBS="$DPDK_LIB $extras $save_LIBS $DPDK_EXTRA_LIB" > >+ LIBS="$DPDK_LIB $extras $save_LIBS $DPDK_EXTRA_LIB -lnuma" > > I guess we should also list libnuma-dev in .travis.yml and something > similar > in rhel/openvswitch-fedora.spec > > > AC_LINK_IFELSE( > > [AC_LANG_PROGRAM([#include <rte_config.h> > > #include <rte_eal.h>], > >diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > >index 17b8d51..4e1ce53 100644 > >--- a/lib/netdev-dpdk.c > >+++ b/lib/netdev-dpdk.c > >@@ -29,6 +29,7 @@ > > #include <stdio.h> > > #include <sys/types.h> > > #include <sys/stat.h> > >+#include <numaif.h> > > > > #include "dirs.h" > > #include "dp-packet.h" > >@@ -1878,6 +1879,8 @@ new_device(struct virtio_net *dev) > > { > > struct netdev_dpdk *netdev; > > bool exists = false; > >+ int newnode = 0; > >+ long err = 0; > > > > ovs_mutex_lock(&dpdk_mutex); > > /* Add device to the vhost port with the same name as that passed > >down. */ > >@@ -1891,6 +1894,24 @@ new_device(struct virtio_net *dev) > > } > > ovsrcu_set(&netdev->virtio_dev, dev); > > exists = true; > >+ > >+ /* Get NUMA information */ > >+ err = get_mempolicy(&newnode, NULL, 0, dev, MPOL_F_NODE | > >MPOL_F_ADDR); > >+ if (err) { > >+ VLOG_INFO("Error getting NUMA info for vHost Device > >'%s'", > >+ dev->ifname); > >+ newnode = netdev->socket_id; > >+ } else if (newnode != netdev->socket_id) { > >+ netdev->socket_id = newnode; > >+ /* Change mempool to new NUMA Node */ > >+ dpdk_mp_put(netdev->dpdk_mp); > >+ netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, > >netdev->mtu); > >+ /* Request netdev reconfiguration. The port may now be > >+ * serviced by a PMD on the new node if enabled in the > >cpu > >+ * mask */ > >+ netdev_request_reconfigure(&netdev->up); > > I think here I would prefer: > > 1) Remembering the configuration change request > netdev->requested_socket_id = newnode > 2) Calling netdev_request_reconfigure() > 3) In the netdev_dpdk_vhost_user_reconfigure() method: > netdev->socket_id = netdev_requested_socket_id > > Otherwise the datapath might be confused, because it assumes that the > socket_id doesn't change while the device is polled by the pmd threads. > > It's safe to change almost everything inside the > netdev_dpdk_vhost_user_reconfigure(), because the device will not be > polled by the datapath when the function is called. > > The same applies for the mempool: the actual change should be done > in netdev_dpdk_vhost_user_reconfigure(), because there might still be > threads using it. > > >+ } > >+ > > dev->flags |= VIRTIO_DEV_RUNNING; > > /* Disable notifications. */ > > set_irq_status(dev); > >@@ -1907,8 +1928,8 @@ new_device(struct virtio_net *dev) > > return -1; > > } > > > >- VLOG_INFO("vHost Device '%s' %"PRIu64" has been added", dev->ifname, > >- dev->device_fh); > >+ VLOG_INFO("vHost Device '%s' %"PRIu64" has been added on socket %i", > >+ dev->ifname, dev->device_fh, newnode); > > return 0; > > } > > > >-- > >2.4.3 > > > >_______________________________________________ > >dev mailing list > >dev@openvswitch.org > >http://openvswitch.org/mailman/listinfo/dev > > _______________________________________________ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev