Hi Aaron, I ran a few tests today...some more comments on the back of this,
> -----Original Message----- > From: Aaron Conole [mailto:acon...@redhat.com] > Sent: Monday, December 21, 2015 7:24 PM > To: Traynor, Kevin > Cc: dev@openvswitch.org; Flavio Leitner > Subject: Re: [PATCH 2/5] netdev-dpdk: Convert initialization from cmdline to > db > > "Traynor, Kevin" <kevin.tray...@intel.com> writes: > > Hi Aaron, > > > >> -----Original Message----- > >> From: Aaron Conole [mailto:acon...@redhat.com] > >> Sent: Friday, December 18, 2015 6:28 PM > >> To: dev@openvswitch.org > >> Cc: Flavio Leitner; Traynor, Kevin > >> Subject: [PATCH 2/5] netdev-dpdk: Convert initialization from cmdline to > db > >> > >> Existing DPDK integration is provided by use of command line options which > >> must be split out and passed to librte in a special manner. However, this > >> forces any configuration to be passed by way of a special DPDK flag, and > >> interferes with ovs+dpdk packaging solutions. > >> > >> This commit delays dpdk initialization until after the OVS database > >> connection > >> is established, and then initializes librte. It pulls all of the config > data > >> from the OVS database, and assembles a new argv/argc pair to be passed > along. > > > > FYI - There is some whitespace warnings showing up when applying the > > patchset. > > D'oh! Okay, I'll make sure to fix that when I resubmit v2. > > >> > >> Signed-off-by: Aaron Conole <acon...@redhat.com> > >> --- > >> INSTALL.DPDK.md | 60 ++++++++++++----- > >> lib/netdev-dpdk.c | 172 +++++++++++++++++++++++++++++++++---------- > --- > >> -- > >> lib/netdev-dpdk.h | 19 ++++-- > >> vswitchd/bridge.c | 3 + > >> vswitchd/ovs-vswitchd.c | 25 ++----- > >> 5 files changed, 181 insertions(+), 98 deletions(-) > >> > >> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md > >> index 96b686c..b9d92d0 100644 > >> --- a/INSTALL.DPDK.md > >> +++ b/INSTALL.DPDK.md > >> @@ -143,22 +143,48 @@ Using the DPDK with ovs-vswitchd: > >> > >> 5. Start vswitchd: > >> > >> - DPDK configuration arguments can be passed to vswitchd via `--dpdk` > >> - argument. This needs to be first argument passed to vswitchd process. > >> - dpdk arg -c is ignored by ovs-dpdk, but it is a required parameter > >> - for dpdk initialization. > >> + DPDK configuration arguments can be passed to vswitchd via > Open_vSwitch > >> + other_config database. The recognized configuration options are > listed. > >> > >> + * dpdk > >> + This is a bolean configuration option. A value of 'true' signals > > > > typo boolean I don't see the value of having a dpdk=true/false config option at present. It means that now a user will have to configure the build for dpdk and also set this in the database, which is more work than previous. If I build with dpdk and don't set this, I'm seeing "ovs-vswitchd: virtual memory exhausted". It might be a good idea to have a config like this in the context of a unified build where it is the only option needed to enable dpdk but that's for another day. The current build dependent dpdk_init()'s should be enough to cover with/without dpdk. > > > >> + Open_vSwitch to initialize the DPDK EAL at startup. A set of nominal > >> + defaults are provided so that simply enabling this option will be > >> sufficient > >> + to configure DPDK enabled ports. > >> + > >> + * dpdk_lcore_masknetdebv > >> + This sets the core mask affinity of non-PMD threads that are spawned > by > >> the > >> + EAL. It will not impact the affinities of the bridge, or other Open > >> vSwitch > > > > I think we need to explain a bit more about the behavior of this param and > the > > default wrt num of cores and which ones are default. > > Okay. I think you're right, because it seems like there's confusion all > over the place about the DPDK lcore threads relationship with OVS threads. > > >> + userspace threads. > >> + > >> + * dpdk_mem_channels > >> + This sets the number of memory spread channels in the CPU to be used > by > >> + DPDK. It is purely an optimization flag. > >> + > >> + * dpdk_hugepage_dir > >> + Directory where hugetlbfs is mounted > >> + > >> + * cuse_dev_name > >> + Option to set the vhost_cuse character device name. > >> + > >> + * vhost_sock_dir > >> + Option to set the path to the vhost_user unix socket files. socket-mem config is missing from list above. Also, these need to be added to vswitch.xml > >> + > >> + NOTE: Changing any of these options requires restarting the ovs- > vswitchd > >> + application. > >> + > >> ``` > >> export DB_SOCK=/usr/local/var/run/openvswitch/db.sock > >> - ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach > >> + ovs-vsctl set Open_vSwitch . other_config:dpdk=true --no-wait needs to be used for ovs-vsctl as we're going to set these before vswitchd runs > >> + ovs-vswitchd unix:$DB_SOCK --pidfile --detach > >> ``` > > > > To be consistent with the rest of the guide, we should leave in the command > to > > start vswitchd, even if there is nothing dpdk specific about it anymore. It > will > > save someone having to go find it. > > I did leave it in here (just cut the --dpdk -c ... -- from it). But, > I'll make sure to clean up any examples I find this way. oops...you're right, I guess I'm so used to seeing the longer dpdk version that I missed it :) > > >> > >> If allocated more than one GB hugepage (as for IVSHMEM), set amount > and > >> use NUMA node 0 memory: > >> > >> ``` > >> - ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \ > >> - -- unix:$DB_SOCK --pidfile --detach > >> + ovs-vsctl set Open_vSwitch . other_config:dpdk_socket_mem="1024,0" > >> + ovs-vswitchd unix:$DB_SOCK --pidfile --detach > >> ``` > >> > >> 6. Add bridge & ports > >> @@ -521,11 +547,12 @@ have arbitrary names. > >> `/usr/local/var/run/openvswitch/vhost-user-1`, which you must > provide > >> to your VM on the QEMU command line. More instructions on this can > be > >> found in the next section "DPDK vhost-user VM configuration" > >> - Note: If you wish for the vhost-user sockets to be created in a > >> - directory other than `/usr/local/var/run/openvswitch`, you may > specify > >> - another location on the ovs-vswitchd command line like so: > >> + > >> + - If you wish for the vhost-user sockets to be created in a directory > >> other > >> + than `/usr/local/var/run/openvswitch`, you may specify another > location > >> + in the ovsdb like: > >> > >> - `./vswitchd/ovs-vswitchd --dpdk -vhost_sock_dir /my-dir -c 0x1 ...` > >> + `./vswitchd/ovs-vsctl set Open_vSwitch . > >> other_config:vhost_sock_dir=path` > >> > >> DPDK vhost-user VM configuration: > >> --------------------------------- > >> @@ -638,14 +665,13 @@ DPDK vhost-cuse VM configuration: > >> > >> 1. This step is only needed if using an alternative character device. > >> > >> - The new character device filename must be specified on the vswitchd > >> - commandline: > >> + The new character device filename must be specified in the ovsdb: > >> > >> - `./vswitchd/ovs-vswitchd --dpdk --cuse_dev_name my-vhost-net -c > 0x1 > >> ...` > >> + `./utilities/ovs-vsctl set Open_vSwitch . \ > >> + other_config:cuse_dev_name=my-vhost-net` > >> > >> - Note that the `--cuse_dev_name` argument and associated string must be > >> the first > >> - arguments after `--dpdk` and come before the EAL arguments. In the > >> example > >> - above, the character device to be used will be `/dev/my-vhost-net`. > >> + In the example above, the character device to be used will be > >> + `/dev/my-vhost-net`. > >> > >> 2. This step is only needed if reusing the standard character device. It > >> will > >> conflict with the kernel vhost character device so the user must first > >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > >> index 9c302f0..2a81058 100644 > >> --- a/lib/netdev-dpdk.c > >> +++ b/lib/netdev-dpdk.c > >> @@ -29,6 +29,7 @@ > >> #include <stdio.h> > >> #include <sys/types.h> > >> #include <sys/stat.h> > >> +#include <getopt.h> > >> > >> #include "dirs.h" > >> #include "dp-packet.h" > >> @@ -49,6 +50,8 @@ > >> #include "timeval.h" > >> #include "unixctl.h" > >> #include "openvswitch/vlog.h" > >> +#include "smap.h" > >> +#include "vswitch-idl.h" > >> > >> #include "rte_config.h" > >> #include "rte_mbuf.h" > >> @@ -2107,10 +2110,14 @@ unlock_dpdk: > >> > >> static int > >> process_vhost_flags(char *flag, char *default_val, int size, > >> - char **argv, char **new_val) > >> + const struct ovsrec_open_vswitch *ovs_cfg, > >> + char **new_val) > >> { > >> + const char *val; > >> int changed = 0; > >> > >> + val = smap_get(&ovs_cfg->other_config, flag); > >> + > >> /* Depending on which version of vhost is in use, process the vhost- > >> specific > >> * flag if it is provided on the vswitchd command line, otherwise > resort > >> to > >> * a default value. > >> @@ -2120,9 +2127,9 @@ process_vhost_flags(char *flag, char *default_val, > int > >> size, > >> * For vhost-cuse: Process "-cuse_dev_name" to set the custom name of > >> the > >> * vhost-cuse character device. > >> */ > >> - if (!strcmp(argv[1], flag) && (strlen(argv[2]) <= size)) { > >> + if (val && (strlen(val) <= size)) { > >> changed = 1; > >> - *new_val = strdup(argv[2]); > >> + *new_val = strdup(val); > >> VLOG_INFO("User-provided %s in use: %s", flag, *new_val); > >> } else { > >> VLOG_INFO("No %s provided - defaulting to %s", flag, > default_val); > >> @@ -2132,68 +2139,117 @@ process_vhost_flags(char *flag, char > *default_val, > >> int size, > >> return changed; > >> } > >> > >> -int > >> -dpdk_init(int argc, char **argv) > >> +static char ** > >> +grow_argv(char ***argv, size_t cur_siz, size_t grow_by) > >> +{ > >> + char **new_argv = realloc(*argv, sizeof(char *) * (cur_siz + > grow_by)); > >> + return new_argv; > >> +} > >> + > >> +static int > >> +get_dpdk_args(const struct ovsrec_open_vswitch *ovs_cfg, char ***argv) > >> +{ > >> + struct dpdk_options_map { > >> + const char *ovs_configuration; > >> + const char *dpdk_option; > >> + bool default_enabled; > >> + const char *default_value; > >> + } opts[] = { > >> + {"dpdk_lcore_mask", "-c", true, "0x1"}, > >> + {"dpdk_mem_channels", "-n", true, "4"}, > >> + {"dpdk_alloc_mem", "-m", false, NULL}, > >> + {"dpdk_socket_mem", "--socket-mem", true, "1024,0"}, > >> + {"dpdk_hugepage_dir", "--huge-dir", false, NULL}, > >> + }; > >> + int i, ret = 1; > >> + > >> + for(i = 0; i < (sizeof(opts) / sizeof(opts[0])); ++i) { > >> + const char *lookup = smap_get(&ovs_cfg->other_config, > >> + opts[i].ovs_configuration); > >> + if(!lookup && opts[i].default_enabled) > >> + lookup = opts[i].default_value; > >> + > >> + if(lookup) { > >> + char **newargv = grow_argv(argv, ret, 2); > >> + > >> + if (!newargv) { > >> + ovs_abort(0, "grow_argv() failed to allocate memory."); > >> + } > >> + > >> + *argv = newargv; > >> + (*argv)[ret++] = strdup(opts[i].dpdk_option); > >> + (*argv)[ret++] = strdup(lookup); > >> + } > >> + } > >> + > >> + return ret; > >> +} > >> + > >> +static char **dpdk_argv; > >> +static int dpdk_argc; > >> + > >> +static void > >> +deferred_argv_release(void) > >> { > >> int result; > >> - int base = 0; > >> - char *pragram_name = argv[0]; > >> - int err; > >> - int isset; > >> - cpu_set_t cpuset; > >> + for(result = 0; result < dpdk_argc; ++result) > >> + free(dpdk_argv[result]); > > > > coding stds - curly braces > > > >> > >> - if (argc < 2 || strcmp(argv[1], "--dpdk")) > >> - return 0; > >> + free(dpdk_argv); > >> +} > >> > >> - /* Remove the --dpdk argument from arg list.*/ > >> - argc--; > >> - argv++; > >> +static void > >> +__dpdk_init(const struct ovsrec_open_vswitch *ovs_cfg) > >> +{ > >> + char **argv = NULL; > >> + int result; > >> + int argc; > >> + int err; > >> + cpu_set_t cpuset; > >> > >> - /* Reject --user option */ > >> - int i; > >> - for (i = 0; i < argc; i++) { > >> - if (!strcmp(argv[i], "--user")) { > >> - VLOG_ERR("Can not mix --dpdk and --user options, aborting."); > >> - } > >> + if(!smap_get_bool(&ovs_cfg->other_config, "dpdk", false)) { > >> + return; > >> } > >> > >> + VLOG_INFO("DPDK Enabled, initializing"); > >> + > >> #ifdef VHOST_CUSE > >> - if (process_vhost_flags("-cuse_dev_name", strdup("vhost-net"), > >> - PATH_MAX, argv, &cuse_dev_name)) { > >> + if (process_vhost_flags("cuse_dev_name", strdup("vhost-net"), > >> + PATH_MAX, ovs_cfg, &cuse_dev_name)) { > >> #else > >> - if (process_vhost_flags("-vhost_sock_dir", strdup(ovs_rundir()), > >> - NAME_MAX, argv, &vhost_sock_dir)) { > >> + if (process_vhost_flags("vhost_sock_dir", strdup(ovs_rundir()), > >> + NAME_MAX, ovs_cfg, &vhost_sock_dir)) { > >> struct stat s; > >> - int err; > >> > >> err = stat(vhost_sock_dir, &s); > >> if (err) { > >> - VLOG_ERR("vHostUser socket DIR '%s' does not exist.", > >> - vhost_sock_dir); > >> - return err; > >> + ovs_abort(0, "vhost-user sock directory '%s' does not > exist.", > >> + vhost_sock_dir); > >> } > >> #endif > >> - /* Remove the vhost flag configuration parameters from the > argument > >> - * list, so that the correct elements are passed to the DPDK > >> - * initialization function > >> - */ > >> - argc -= 2; > >> - argv += 2; /* Increment by two to bypass the vhost flag > arguments > >> */ > >> - base = 2; > >> } > >> > >> /* Get the main thread affinity */ > >> CPU_ZERO(&cpuset); > >> err = pthread_getaffinity_np(pthread_self(), sizeof(cpu_set_t), > >> &cpuset); > >> if (err) { > >> - VLOG_ERR("Thread getaffinity error %d.", err); > >> - return err; > >> + ovs_abort(0, "Thread getaffinity error %d.", err); > >> } > >> > >> - /* Keep the program name argument as this is needed for call to > >> - * rte_eal_init() > >> - */ > >> - argv[0] = pragram_name; > >> + argv = grow_argv(&argv, 0, 1); > >> + if (!argv) { > >> + ovs_abort(0, "Unable to allocate an initial argv."); > >> + } > >> + argv[0] = strdup("ovs"); /* TODO use prctl to get process name */ > >> + argc = get_dpdk_args(ovs_cfg, &argv); > >> + > >> + argv = grow_argv(&argv, argc, argc+1); > >> + if (!argv) { > >> + ovs_abort(0, "Unable to make final argv allocation."); > >> + } > >> + argv[argc] = 0; > >> + > >> + optind = 1; > >> > >> /* Make sure things are initialized ... */ > >> result = rte_eal_init(argc, argv); > >> @@ -2204,21 +2260,33 @@ dpdk_init(int argc, char **argv) > >> /* Set the main thread affinity back to pre rte_eal_init() value */ > >> err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), > >> &cpuset); > >> if (err) { > >> - VLOG_ERR("Thread setaffinity error %d", err); > >> - return err; > >> + ovs_abort(0, "Thread getaffinity error %d.", err); > >> } > >> - > >> + > >> + dpdk_argv = argv; > >> + dpdk_argc = argc; > >> + > >> + atexit(deferred_argv_release); > >> + > >> rte_memzone_dump(stdout); > >> rte_eal_init_ret = 0; > >> > >> - if (argc > result) { > >> - argv[result] = argv[0]; > >> - } > >> - > >> /* We are called from the main thread here */ > >> RTE_PER_LCORE(_lcore_id) = NON_PMD_CORE_ID; > >> +} > >> + > >> +void > >> +dpdk_init(const struct ovsrec_open_vswitch *ovs_cfg) > >> +{ > >> + static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER; > >> + static bool initialized = false; /* optimization to bail out of the > >> + ovsthread_once_start lock early > */ > >> > >> - return result + 1 + base; > >> + if (!initialized && ovs_cfg && ovsthread_once_start(&once)) { > > > > If you are just worried about grabbing the mutex, then we don't need the > static > > initialized - we could move once_done() and after that once->done should do > the > > same thing. > > I think you're right. I didn't know if once->done was considered > 'public'-ly safe. > > >> + initialized = true; > >> + __dpdk_init(ovs_cfg); > >> + ovsthread_once_done(&once); > >> + } > >> } > >> > >> static const struct netdev_class dpdk_class = > >> @@ -2282,10 +2350,6 @@ netdev_dpdk_register(void) > >> { > >> static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER; > >> > >> - if (rte_eal_init_ret) { > >> - return; > >> - } > >> - > > > > Why did you remove this? > > Because the order of initialization changed, and drivers end up > registered before the bridge starts to run (where DPDK gets > initialized). Without this hunk, it will be difficult to add any DPDK > ports to your bridge. :) ok. I thought this might lead to some rte_* fns being called that needed rte_eal_init() first but I haven't been able to make this happen. > > >> if (ovsthread_once_start(&once)) { > >> dpdk_common_init(); > >> netdev_register_provider(&dpdk_class); > >> diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h > >> index 646d3e2..6c1100a 100644 > >> --- a/lib/netdev-dpdk.h > >> +++ b/lib/netdev-dpdk.h > >> @@ -22,7 +22,9 @@ struct dp_packet; > >> > >> #define NON_PMD_CORE_ID LCORE_ID_ANY > >> > >> -int dpdk_init(int argc, char **argv); > >> +struct ovsrec_open_vswitch; > >> + > >> +void dpdk_init(const struct ovsrec_open_vswitch *ovs_cfg); > >> void netdev_dpdk_register(void); > >> void free_dpdk_buf(struct dp_packet *); > >> int pmd_thread_setaffinity_cpu(unsigned cpu); > >> @@ -30,15 +32,20 @@ int pmd_thread_setaffinity_cpu(unsigned cpu); > >> #else > >> > >> #define NON_PMD_CORE_ID UINT32_MAX > >> - > >> #include "util.h" > >> > >> -static inline int > >> -dpdk_init(int argc, char **argv) > >> +static inline void > >> +dpdk_init(const struct ovsrec_open_vswitch *ovs_cfg) > >> { > >> - if (argc >= 2 && !strcmp(argv[1], "--dpdk")) { > >> - ovs_fatal(0, "DPDK support not built into this copy of Open > >> vSwitch."); > >> + static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER; > >> + > >> + if (ovs_cfg && ovsthread_once_start(&once)) { > >> + if(smap_get_bool(&ovs_cfg->other_config, "dpdk", false)) { > >> + ovs_fatal(0, "DPDK not supported in this copy of Open > >> vSwitch."); > >> + } > >> + ovsthread_once_done(&once); > >> } > >> + As per other comment regarding removing the dpdk=true/false config. If we remove the config option then we could just leave this as a dummy function. > >> return 0; > >> } > >> > >> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c > >> index f8afe55..22b1891 100644 > >> --- a/vswitchd/bridge.c > >> +++ b/vswitchd/bridge.c > >> @@ -68,6 +68,7 @@ > >> #include "sflow_api.h" > >> #include "vlan-bitmap.h" > >> #include "packets.h" > >> +#include "lib/netdev-dpdk.h" > >> > >> VLOG_DEFINE_THIS_MODULE(bridge); > >> > >> @@ -2922,6 +2923,8 @@ bridge_run(void) > >> } > >> cfg = ovsrec_open_vswitch_first(idl); > >> > >> + dpdk_init(cfg); > >> + > >> /* Initialize the ofproto library. This only needs to run once, but > >> * it must be done after the configuration is set. If the > >> * initialization has already occurred, bridge_init_ofproto() > >> diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c > >> index e78ecda..c448107 100644 > >> --- a/vswitchd/ovs-vswitchd.c > >> +++ b/vswitchd/ovs-vswitchd.c > >> @@ -48,7 +48,6 @@ > >> #include "openvswitch/vconn.h" > >> #include "openvswitch/vlog.h" > >> #include "lib/vswitch-idl.h" > >> -#include "lib/netdev-dpdk.h" > >> > >> VLOG_DEFINE_THIS_MODULE(vswitchd); > >> > >> @@ -71,13 +70,6 @@ main(int argc, char *argv[]) > >> int retval; > >> > >> set_program_name(argv[0]); > >> - retval = dpdk_init(argc,argv); > >> - if (retval < 0) { > >> - return retval; > >> - } > >> - > >> - argc -= retval; > >> - argv += retval; > >> > >> ovs_cmdl_proctitle_init(argc, argv); > >> service_start(&argc, &argv); > >> @@ -166,7 +158,7 @@ parse_options(int argc, char *argv[], char > >> **unixctl_pathp) > >> {"bootstrap-ca-cert", required_argument, NULL, > >> OPT_BOOTSTRAP_CA_CERT}, > >> {"enable-dummy", optional_argument, NULL, OPT_ENABLE_DUMMY}, > >> {"disable-system", no_argument, NULL, OPT_DISABLE_SYSTEM}, > >> - {"dpdk", required_argument, NULL, OPT_DPDK}, > >> + {"dpdk", optional_argument, NULL, OPT_DPDK}, > >> {NULL, 0, NULL, 0}, > >> }; > >> char *short_options = > >> ovs_cmdl_long_options_to_short_options(long_options); > >> @@ -219,7 +211,7 @@ parse_options(int argc, char *argv[], char > >> **unixctl_pathp) > >> exit(EXIT_FAILURE); > >> > >> case OPT_DPDK: > >> - ovs_fatal(0, "--dpdk must be given at beginning of command > >> line."); > >> + ovs_fatal(0, "Using --dpdk to configure DPDK is not > >> supported."); > >> break; > >> > >> default: > >> @@ -256,17 +248,8 @@ usage(void) > >> daemon_usage(); > >> vlog_usage(); > >> printf("\nDPDK options:\n" > >> - " --dpdk [VHOST] [DPDK] Initialize DPDK datapath.\n" > >> - " where DPDK are options for initializing DPDK lib and VHOST > >> is\n" > >> -#ifdef VHOST_CUSE > >> - " option to override default character device name used > for\n" > >> - " for use with userspace vHost\n" > >> - " -cuse_dev_name NAME\n" > >> -#else > >> - " option to override default directory where vhost-user\n" > >> - " sockets are created.\n" > >> - " -vhost_sock_dir DIR\n" > >> -#endif > >> + "Configuration of DPDK via command-line is removed from > this\n" > >> + "version of Open vSwitch. DPDK is configured through ovsdb.\n" > > > > cool - I think this message will be beneficial for anyone changing > > over to this code. > > I struggle with the right words to say "hey don't use this interface > anymore" - thanks for the affirmation. > > >> ); > >> printf("\nOther options:\n" > >> " --unixctl=SOCKET override default control socket > >> name\n" > >> -- > >> 2.6.1.133.gf5b6079 _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev