We got a crash of ovs-vswitchd.
I started OVS following the instructions in INSTALL.DPDK.md, using the OVS
master code.
In my environment there are four CPU cores; "real_n_txq" of the dpdk port is 1
and "txq_needs_locking" is "true".

ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost

This works. I then brought up a guest and ran some tests; ovs-vswitchd hit a
segmentation fault when I ran "iperf3" clients on the guest and on the host at
the same time:

guest: iperf client sends tcp packets -> dpdkvhost0 -> OVS -> dpdk0 -> outside
host:  iperf client sends tcp packets -> br0 -> OVS -> dpdk0 -> outside

The segmentation fault looks like this:
Program received signal SIGSEGV, Segmentation fault.
eth_em_xmit_pkts (tx_queue=0x7f623d0f9880, tx_pkts=0x7f623d0ebee8, nb_pkts=65535)
    at dpdk-2.0.0/lib/librte_pmd_e1000/em_rxtx.c:436
436             ol_flags = tx_pkt->ol_flags;
(gdb) bt
#0  eth_em_xmit_pkts (tx_queue=0x7f623d0f9880, tx_pkts=0x7f623d0ebee8, nb_pkts=65535)
    at dpdk-2.0.0/lib/librte_pmd_e1000/em_rxtx.c:436
#1  0x0000000000625d3d in rte_eth_tx_burst
    at dpdk-2.0.0/x86_64-native-linuxapp-gcc/include/rte_ethdev.h:2572
#2  dpdk_queue_flush__ (dev=dev@entry=0x7f623d0f9940, qid=qid@entry=0)
    at lib/netdev-dpdk.c:808
#3  0x0000000000627324 in dpdk_queue_pkts (cnt=1, pkts=0x7fff5da5a8f0, qid=0, dev=0x7f623d0f9940)
    at lib/netdev-dpdk.c:1003
#4  dpdk_do_tx_copy (netdev=netdev@entry=0x7f623d0f9940, qid=qid@entry=0, pkts=pkts@entry=0x7fff5da5ae80, cnt=cnt@entry=1)
    at lib/netdev-dpdk.c:1073
#5  0x0000000000627e96 in netdev_dpdk_send__ (may_steal=<optimized out>, cnt=1, pkts=0x7fff5da5ae80, qid=0, dev=0x7f623d0f9940)
    at lib/netdev-dpdk.c:1116
#6  netdev_dpdk_eth_send (netdev=0x7f623d0f9940, qid=<optimized out>, pkts=0x7fff5da5ae80, cnt=1, may_steal=<optimized out>)
    at lib/netdev-dpdk.c:1169
...

The guest traffic and the br0 traffic are processed by separate threads: a pmd
thread and the main thread. The br0 packets are transmitted on the dpdk port
through "netdev_dpdk_send__", which takes the tx queue lock with
"rte_spinlock_lock", while the pmd thread flushes the transmit direction of the
same dpdk port with "dpdk_queue_flush" without taking that lock. The crash
seems to be caused by this.
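To make the race easier to see without the tree at hand, here is a condensed
sketch of the two code paths that touch the same tx queue. It is paraphrased
from the functions named in the backtrace, not verbatim lib/netdev-dpdk.c, and
field names such as tx_q and tx_lock follow that file:

/* Condensed sketch, not verbatim lib/netdev-dpdk.c.  With real_n_txq == 1
 * and txq_needs_locking == true, every sender is mapped onto tx queue 0. */

/* Path 1: the (non-pmd) main thread transmitting br0 traffic on dpdk0. */
static void
netdev_dpdk_send__(struct netdev_dpdk *dev, int qid,
                   struct dp_packet **pkts, int cnt, bool may_steal)
{
    if (OVS_UNLIKELY(dev->txq_needs_locking)) {
        qid = qid % dev->real_n_txq;                 /* -> queue 0 */
        rte_spinlock_lock(&dev->tx_q[qid].tx_lock);  /* senders serialize here */
    }

    dpdk_queue_pkts(dev, qid, pkts, cnt);            /* enqueue, may flush */

    if (OVS_UNLIKELY(dev->txq_needs_locking)) {
        rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);
    }
}

/* Path 2: the pmd thread's receive loop on the same device. */
static int
netdev_dpdk_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet **packets,
                     int *c)
{
    struct netdev_dpdk *dev = netdev_dpdk_cast(rxq_->netdev);

    if (rxq_->queue_id == rte_lcore_id()) {
        /* Flushes tx queue 0 WITHOUT taking tx_q[0].tx_lock, so it can run
         * concurrently with the locked enqueue/flush above and corrupt the
         * queue state handed to rte_eth_tx_burst(). */
        dpdk_queue_flush(dev, rxq_->queue_id);
    }

    /* ... rte_eth_rx_burst() and packet conversion elided ... */
    return 0;
}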
We fixed it with this patch and the segmentation fault did not happen again.
The modification does not affect phy-phy performance when the number of tx
queues equals the number of CPUs; we tested it on a server with an Intel Xeon
E5-2630 and an 82599 10GbE NIC. I expect it also does not affect performance in
the opposite case (not equal), because "flush_tx" of the shared txq is "true"
and the queue is flushed every time in "dpdk_queue_pkts".

We don't know for sure whether there is a better modification; any help would
be greatly appreciated.
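For reference, the reason the flush in netdev_dpdk_rxq_recv() is redundant for
a shared tx queue is that the enqueue path already flushes on every call when
the queue's flush_tx flag is set. A condensed sketch of that logic follows
(again paraphrased, not the exact code; the per-core flush_tx policy is
elided):

/* Condensed sketch, not verbatim lib/netdev-dpdk.c. */

static void
netdev_dpdk_alloc_txq(struct netdev_dpdk *dev, unsigned int n_txqs)
{
    unsigned int i;

    for (i = 0; i < n_txqs; i++) {
        if (dev->txq_needs_locking) {
            /* Queue is shared among CPUs: flush on every enqueue so no
             * packets are left behind for someone else to flush. */
            dev->tx_q[i].flush_tx = true;
        } else {
            /* One queue per core; the real code picks flush_tx per queue
             * (details elided here). */
            dev->tx_q[i].flush_tx = false;
        }
        rte_spinlock_init(&dev->tx_q[i].tx_lock);
    }
}

static void
dpdk_queue_pkts(struct netdev_dpdk *dev, int qid,
                struct rte_mbuf **pkts, int cnt)
{
    struct dpdk_tx_queue *txq = &dev->tx_q[qid];

    /* ... append pkts to txq->burst_pkts and bump txq->count, then: */
    if (txq->count == MAX_TX_QUEUE_LEN || txq->flush_tx) {
        dpdk_queue_flush__(dev, qid);   /* rte_eth_tx_burst() happens here */
    }
}

So with txq_needs_locking the shared queue is emptied inside the locked send
path itself, and skipping the unlocked flush in the receive path should not
leave packets stranded.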
------------------------------------------------------------------
From: Gray, Mark D <mark.d.g...@intel.com>
Sent: Friday, June 5, 2015 22:53
To: 钢锁0310 <l...@dtdream.com>, d...@openvswitch.com <d...@openvswitch.com>
Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Do not flush tx queue which is shared among CPUs since it is always flushed

> 
> When tx queue is shared among CPUS,the pkts always be flush in
> 'netdev_dpdk_eth_send'
> So it is unnecessarily for flushing in netdev_dpdk_rxq_recv Otherwise tx will
> be accessed without locking

Are you seeing a specific bug or is this just to account for a device with
less queues than pmds?

> 
> Signed-off-by: Wei li <l...@dtdream.com>
> ---
>  lib/netdev-dpdk.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 63243d8..25e3a73 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -892,8 +892,11 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet **packets,
>      int nb_rx;
> 
>      /* There is only one tx queue for this core.  Do not flush other
> -     * queueus. */
> -    if (rxq_->queue_id == rte_lcore_id()) {
> +     * queueus.

s/queueus/queues

> +     * Do not flush tx queue which is shared among CPUs
> +     * since it is always flushed */
> +    if (rxq_->queue_id == rte_lcore_id() &&
> +        OVS_LIKELY(!dev->txq_needs_locking)) {
>          dpdk_queue_flush(dev, rxq_->queue_id);

Do you see any drop in performance in a simple phy-phy case before
and after this patch?

>      }
> 
> --
> 1.9.5.msysgit.1
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
