Hi. I sent an email the other day about difficulties initializing DPDK
with a particular NIC. In short, I got bizarre errors when I added a
dpdk port to a bridge using this card (Mellanox CX3Pro) with OVS
2.5+DPDK-16.04 (OVS 2.5 plus some commits on branch-2.5, and with a
patch for the DPDK 16.04 constants), even though EAL initialization
appeared to complete normally.

I found that not daemonizing the vswitchd process avoided the issue, so
I wrote a patch to vswitchd/ovs-vswitchd.c that initializes the EAL
after daemonization instead of before, and that fixed it. In reply to my
email I was asked to try 2.5.90 as well. I did, and the issue went away.
The commit that fixed it is bab6940, which, among other things, changed
how DPDK is initialized: it is now initialized during bridge_run, which
runs after the vswitchd process has daemonized. That commit can't be
backported to 2.5 because it is also the commit that makes DPDK
initialize itself from the OVS database.

I then tried to find out exactly what was causing the problem. An
obvious explanation was that rte_eal_init creates threads which are
lost when the vswitchd process is daemonized afterwards. So I set
breakpoints on pthread_create and fork, ran ovs-vswitchd linked against
DPDK 16.04, and found that there are in fact several calls to
pthread_create:

    RTE_LCORE_FOREACH_SLAVE(i) {
        /*
         * create communication pipes between master thread
         * and children
         */
        if (pipe(lcore_config[i].pipe_master2slave) < 0)
            rte_panic("Cannot create pipe\n");
        if (pipe(lcore_config[i].pipe_slave2master) < 0)
            rte_panic("Cannot create pipe\n");

        lcore_config[i].state = WAIT;

        /* create a thread for each lcore */
        ret = pthread_create(&lcore_config[i].thread_id, NULL,
                             eal_thread_loop, NULL);
        if (ret != 0)
            rte_panic("Cannot create thread\n");

        /* Set thread_name for aid in debugging. */
        snprintf(thread_name, RTE_MAX_THREAD_NAME_LEN,
                 "lcore-slave-%d", i);
        ret = rte_thread_setname(lcore_config[i].thread_id,
                                 thread_name);
        if (ret != 0)
            RTE_LOG(ERR, EAL,
                    "Cannot set name for lcore thread\n");
    }

This happens during rte_eal_init, which is called before daemonization
(if --daemonize is passed). That is true for both DPDK 2.2 and DPDK
16.04. To the best of my (incomplete) knowledge of how pthreads and
Unix processes work, all of these threads die when the parent process
exits as part of daemonization, because fork() only carries the calling
thread over into the child. Even so, this same software, without any
changes, did work with the Niantic NIC.

Could someone explain whether this is correct and, if so, how it works,
or whether it needs fixing? If it does need fixing, is there a chance we
can get a patch into branch-2.5 that changes the way DPDK is
initialized? bab6940 can't be used because it also changes the way DPDK
gets its parameters.
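
To make the suspected behaviour concrete, here is a minimal sketch in
plain C. It is not DPDK or OVS code: worker_loop is only a stand-in for
eal_thread_loop, and worker_runs_in_child, worker_is_running and
sleep_ms are hypothetical names invented for this illustration. It
assumes a POSIX system, where fork() copies only the calling thread
into the child. Unlike a real daemonize, the parent here waits for the
child so that the result can be reported.

```c
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static atomic_long ticks;   /* bumped by the worker while it is alive */
static atomic_int stop;     /* set to 1 to ask the worker to exit */

/* Sleep using poll(), which is safe to call in a child after fork(). */
static void sleep_ms(int ms)
{
    poll(NULL, 0, ms);
}

/* Hypothetical stand-in for eal_thread_loop: counts until told to stop. */
static void *worker_loop(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop)) {
        atomic_fetch_add(&ticks, 1);
        sleep_ms(1);
    }
    return NULL;
}

/* Returns 1 if ticks advanced during a 100 ms window, 0 otherwise. */
static int worker_is_running(void)
{
    long before = atomic_load(&ticks);

    sleep_ms(100);
    return atomic_load(&ticks) != before;
}

/*
 * Fork once and report whether a worker thread is running in the child.
 * init_before_fork != 0: create the worker first, then fork (the
 *                        ordering suspected of causing the problem).
 * init_before_fork == 0: fork first, then create the worker in the
 *                        child (the ordering after the fix).
 * Returns 1 if the worker runs in the child, 0 if not, -1 on error.
 */
int worker_runs_in_child(int init_before_fork)
{
    pthread_t tid;
    pid_t pid;
    int status = 0;
    int started = 0;

    atomic_store(&stop, 0);
    if (init_before_fork) {
        if (pthread_create(&tid, NULL, worker_loop, NULL) != 0)
            return -1;
        started = 1;
    }

    pid = fork();
    if (pid == 0) {
        /* Child: only the thread that called fork() exists here. */
        if (!init_before_fork
            && pthread_create(&tid, NULL, worker_loop, NULL) != 0)
            _exit(2);
        _exit(worker_is_running() ? 1 : 0);
    }

    /* Parent: stop its own worker (if any), then collect the child. */
    if (started) {
        atomic_store(&stop, 1);
        pthread_join(tid, NULL);
    }
    if (pid < 0 || waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    status = WEXITSTATUS(status);
    return status > 1 ? -1 : status;
}
```

Under these assumptions, worker_runs_in_child(1) should report that the
child has no running worker, and worker_runs_in_child(0) should report
that it does. This only illustrates the fork/thread interaction; it
does not explain why the Niantic NIC worked with the same ordering.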
Thanks,
John