Mechthild Buescher <mechthild.buesc...@ericsson.com> writes:
> Hi Aaron,
>
> Sorry for being unclear regarding the VM: I meant the DPDK usage
> inside the VM. So, the fault happens when using the VM. Inside the VM
> I can either bind the interfaces to DPDK or to Linux - in both cases,
> the fault occurs.
>
> And I haven't applied any patch. I used the latest available version
> from the master branch - I don't know whether any patch is upstreamed
> to the master branch.
Okay - then I wonder what that vhost-sock-permissions command line is about.
Can you confirm that 21 tx queues is not intended (21 tx queues shows up in
your configuration output)? Also, please send the CPU information
(cat /proc/cpuinfo on the host).

> Thanks in advance for your help,
>
> BR/Mechthild
>
> -----Original Message-----
> From: Aaron Conole [mailto:acon...@redhat.com]
> Sent: Monday, July 11, 2016 7:22 PM
> To: Mechthild Buescher
> Cc: Stokes, Ian; b...@openvswitch.org
> Subject: Re: [ovs-discuss] bug in ovs-vswitchd?!
>
> Mechthild Buescher <mechthild.buesc...@ericsson.com> writes:
>
>> Hi Ian,
>>
>> Thanks for the fast reply! I also did some further investigation and
>> could see that ovs-vswitchd usually stays alive while receiving
>> packets but crashes when sending packets.
>>
>> Regarding your questions:
>> 1. We are running 1 VM with 2 vhost ports (in the simplified setup; in
>> the complete setup we use 1 VM & 5 vhost ports).
>>
>> 2. We are using libvirt to start the VM, which is configured to use qemu:
>> /usr/bin/qemu-system-x86_64 -name guest=ubuntu11_try,debug-threads=on
>> -S -machine pc-i440fx-wily,accel=kvm,usb=off -cpu host -m 8192
>> -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1
>> -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/mnt/huge_1G/libvirt/qemu,share=yes,size=8589934592
>> -numa node,nodeid=0,cpus=0-3,memdev=ram-node0
>> -uuid 8a2ad7a3-9da1-4c69-a2ff-c7a680d9bc4a -no-user-config -nodefaults
>> -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-300-ubuntu11_try/monitor.sock,server,nowait
>> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
>> -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
>> -drive file=/root/perf/vms/ubuntu11.qcow2,format=qcow2,if=none,id=drive-virtio-disk0
>> -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>> -netdev tap,fd=21,id=hostnet0
>> -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:3c:92:47,bus=pci.0,addr=0x3
>> -netdev tap,fd=23,id=hostnet1
>> -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:3c:a3:47,bus=pci.0,addr=0x4
>> -chardev socket,id=charnet2,path=/var/run/openvswitch/vhost111
>> -netdev type=vhost-user,id=hostnet2,chardev=charnet2
>> -device virtio-net-pci,netdev=hostnet2,id=net2,mac=52:54:00:a0:11:02,bus=pci.0,addr=0x5
>> -chardev socket,id=charnet3,path=/var/run/openvswitch/vhost112
>> -netdev type=vhost-user,id=hostnet3,chardev=charnet3
>> -device virtio-net-pci,netdev=hostnet3,id=net3,mac=52:54:00:a0:11:03,bus=pci.0,addr=0x6
>> -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
>> -vnc 127.0.0.1:0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2
>> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
>>
>> 3. In the VM we have both kinds of interface bindings, virtio-pci and
>> igb_uio. The crash of ovs-vswitchd can be observed with both types of
>> interfaces (the VM itself stays alive).
>>
>> 4. ovs-vswitchd is started as follows and is configured to use a vxlan tunnel:
>> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
>> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x1
>> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
>> ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=6
>> ovs-vsctl --no-wait set Interface dpdk0 options:n_rxq=2
>> ovs-vsctl --no-wait set Interface vhost111 options:n_rxq=1
>> ovs-vsctl --no-wait set Interface vhost112 options:n_rxq=1
>> ovs-vsctl --no-wait set Open_vSwitch . other_config:vhost-sock-permissions=766
>
> Have you, perchance, applied some extra patches? This was proposed,
> but not accepted, as a possible workaround for a permissions issue
> with ovs dpdk.
>
>> ovs-vswitchd --pidfile=$DB_PID --detach --monitor --log-file=$LOG_FILE
>> -vfile:dbg --no-chdir -vconsole:emer --mlockall unix:$DB_SOCK
>>
>> ovs-vsctl add-port br-int vhost111 -- set Interface vhost111 type=dpdkvhostuser ofport_request=11
>> ovs-vsctl add-port br-int vhost112 -- set Interface vhost112 type=dpdkvhostuser ofport_request=12
>> ovs-vsctl add-br br-int -- set bridge br-int datapath_type=netdev
>> ovs-vsctl set Bridge br-int other_config:datapath-id=0000f2b811144f41
>> ovs-vsctl set Bridge br-int protocols=OpenFlow13
>> ovs-vsctl add-port br-int vxlan0 -- set interface vxlan0 type=vxlan options:remote_ip=10.1.2.2 options:key=flow ofport_request=100
>>
>> 5. The ovs log is attached - it contains the log from start to crash
>> (with debug information). The crash was provoked by setting up a
>> virtio-pci interface in the VM, so no DPDK is used in the VM for this
>> scenario.
>>
>> 6. The DPDK versions are:
>> Host: dpdk 16.04, latest commit b3b9719f18ee83773c6ed7adda300c5ac63c37e9
>> VM: (not used in this scenario) dpdk 2.2.0
>
> For confirmation, this happens whether or not you use a VM? I just
> want to make sure. It's usually best to pair DPDK versions whenever
> possible.
>
>> BR/Mechthild
>>
>> -----Original Message-----
>> From: Stokes, Ian [mailto:ian.sto...@intel.com]
>> Sent: Thursday, July 07, 2016 1:57 PM
>> To: Mechthild Buescher; b...@openvswitch.org
>> Subject: RE: bug in ovs-vswitchd?!
>>
>> Hi Mechthild,
>>
>> I've tried to reproduce this issue on my setup (Fedora 22, kernel
>> 4.1.8) but have not been able to.
>>
>> A few questions to help the investigation:
>>
>> 1. Are you running 1 or 2 VMs in the setup (i.e. 1 VM with 2 vhost-user
>> ports, or 2 VMs with 1 vhost-user port each)?
>> 2. What are the parameters being used to launch the VM/s attached to
>> the vhost-user ports?
>> 3. Inside the VM, are the interfaces bound to igb_uio (i.e. using a
>> DPDK app inside the guest), or are the interfaces being used as kernel
>> devices inside the VM?
>> 4. What parameters are you launching OVS with?
>> 5. Can you provide an OVS log?
>> 6. Can you confirm the DPDK version you are using in the host/VM (if
>> being used in the VM)?
>>
>> Thanks
>> Ian
>>
>>> From: discuss [mailto:discuss-boun...@openvswitch.org] On Behalf Of
>>> Mechthild Buescher
>>> Sent: Wednesday, July 06, 2016 1:54 PM
>>> To: b...@openvswitch.org
>>> Subject: [ovs-discuss] bug in ovs-vswitchd?!
>>>
>>> Hi all,
>>>
>>> we are using OVS with DPDK interfaces and vhost-user interfaces and
>>> want to try the VMs with different multiqueue settings. When we
>>> specify 2 cores and 2 queues for a dpdk interface but only one queue
>>> for the vhost interfaces, ovs-vswitchd crashes at start of the VM
>>> (or at the latest when traffic is sent).
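(Just to spell out what the masks quoted above work out to - this is only my
reading of the configuration, but it matches the pmd-rxq-show output further
down in this mail:

    dpdk-lcore-mask=0x1   ->  binary 001  ->  non-PMD DPDK threads pinned to core 0
    pmd-cpu-mask=6        ->  binary 110  ->  two PMD threads, on cores 1 and 2
    dpdk0 n_rxq=2         ->  dpdk0 rx queues 0 and 1, one polled by each PMD
    vhost111/112 n_rxq=1  ->  one rx queue each, one landing on each PMD

So the crashing case is two PMD threads with two dpdk0 rx queues, while every
vhost-user port has only a single queue.)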
>>>
>>> The version of OVS is 2.5.90 (master branch, latest commit
>>> 7a15be69b00fe8f66a3f3929434b39676f325a7a).
>>> It has been built and is running on: Linux version 3.13.0-87-generic
>>> (buildd@lgw01-25) (gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3) )
>>> #133-Ubuntu SMP Tue May 24 18:32:09 UTC 2016
>>>
>>> The configuration is:
>>> ovs-vsctl show
>>> 0e191ed4-040b-458c-bad8-feb6f7c90e3a
>>>     Bridge br-prv
>>>         Port br-prv
>>>             Interface br-prv
>>>                 type: internal
>>>         Port "dpdk0"
>>>             Interface "dpdk0"
>>>                 type: dpdk
>>>                 options: {n_rxq="2"}
>>>     Bridge br-int
>>>         Port br-int
>>>             Interface br-int
>>>                 type: internal
>>>         Port "vhost112"
>>>             Interface "vhost112"
>>>                 type: dpdkvhostuser
>>>                 options: {n_rxq="1"}
>>>         Port "vhost111"
>>>             Interface "vhost111"
>>>                 type: dpdkvhostuser
>>>                 options: {n_rxq="1"}
>>>         Port "vxlan0"
>>>             Interface "vxlan0"
>>>                 type: vxlan
>>>                 options: {key=flow, remote_ip="10.1.2.2"}
>>>
>>> ovs-appctl dpif-netdev/pmd-rxq-show
>>> pmd thread numa_id 0 core_id 1:
>>>     port: vhost112  queue-id: 0
>>>     port: dpdk0     queue-id: 1
>>> pmd thread numa_id 0 core_id 2:
>>>     port: dpdk0     queue-id: 0
>>>     port: vhost111  queue-id: 0
>>>
>>> ovs-appctl dpif/show
>>> br-int:
>>>     br-int 65534/6: (tap)
>>>     vhost111 11/3: (dpdkvhostuser: configured_rx_queues=1,
>>>         configured_tx_queues=1, requested_rx_queues=1, requested_tx_queues=21)
>>>     vhost112 12/5: (dpdkvhostuser: configured_rx_queues=1,
>>>         configured_tx_queues=1, requested_rx_queues=1, requested_tx_queues=21)
>>>     vxlan0 100/4: (vxlan: key=flow, remote_ip=10.1.2.2)
>>> br-prv:
>>>     br-prv 65534/1: (tap)
>>>     dpdk0 1/2: (dpdk: configured_rx_queues=2, configured_tx_queues=21,
>>>         requested_rx_queues=2, requested_tx_queues=21)
>
> I'm a little concerned about the numbers reported here. 21 tx queues is a
> bit much, I think. I haven't tried reproducing this yet, but can you confirm
> this is desired?
>
>>>
>>> (gdb) bt
>>> #0  0x00000000005356e4 in ixgbe_xmit_pkts_vec ()
>>> #1  0x00000000006df384 in rte_eth_tx_burst (nb_pkts=<optimized out>,
>>>     tx_pkts=<optimized out>, queue_id=1, port_id=<optimized out>)
>>>     at /opt/dpdk-16.04/x86_64-native-linuxapp-gcc//include/rte_ethdev.h:2791
>>> #2  dpdk_queue_flush__ (qid=<optimized out>, dev=<optimized out>) at lib/netdev-dpdk.c:1099
>>> #3  dpdk_queue_flush (qid=<optimized out>, dev=<optimized out>) at lib/netdev-dpdk.c:1133
>>> #4  netdev_dpdk_rxq_recv (rxq=0x7fbe127ad4c0, packets=0x7fc26761e408,
>>>     c=0x7fc26761e400) at lib/netdev-dpdk.c:1312
>>> #5  0x000000000061be98 in netdev_rxq_recv (rx=<optimized out>,
>>>     batch=batch@entry=0x7fc26761e400) at lib/netdev.c:628
>>> #6  0x00000000005f17bb in dp_netdev_process_rxq_port (pmd=pmd@entry=0x29ea810,
>>>     rxq=<optimized out>, port=<optimized out>, port=<optimized out>)
>>>     at lib/dpif-netdev.c:2619
>>> #7  0x00000000005f1b27 in pmd_thread_main (f_=0x29ea810) at lib/dpif-netdev.c:2864
>>> #8  0x000000000067dde4 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:342
>>> #9  0x00007fc26b90e184 in start_thread (arg=0x7fc26761f700) at pthread_create.c:312
>>> #10 0x00007fc26af2237d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>>>
>>> This is the minimal configuration which leads to the fault. Our
>>> complete configuration contains more vhost-user interfaces than above.
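To make the backtrace a bit easier to follow: frames #4..#1 are the receive
loop flushing a buffered tx queue on dpdk0 via rte_eth_tx_burst() with
queue_id=1, and the fault happens inside ixgbe's vector transmit routine.
Below is a rough sketch of the shape of that flush path - the struct and
function names are invented for illustration, this is not the actual OVS
source - but it shows why an out-of-range tx queue id is fatal: in a normal
(non-debug) DPDK 16.04 build, rte_eth_tx_burst() does not validate queue_id,
so a qid the port was never configured with hands the driver an
uninitialized queue structure.

    /* Illustrative sketch only (names invented); mirrors the shape of the
     * flush path in frames #3..#1 of the backtrace.  Written against the
     * DPDK 16.04 API (port_id is a uint8_t there). */
    #include <stdint.h>
    #include <rte_mbuf.h>
    #include <rte_ethdev.h>

    #define SKETCH_MAX_BURST 32

    struct sketch_tx_queue {
        struct rte_mbuf *pkts[SKETCH_MAX_BURST];  /* packets buffered for tx */
        uint16_t count;                           /* how many are buffered */
    };

    static void
    sketch_queue_flush(uint8_t port_id, uint16_t qid, struct sketch_tx_queue *txq)
    {
        if (txq->count == 0) {
            return;
        }
        /* Nothing here (and nothing inside rte_eth_tx_burst in a non-debug
         * build) checks qid against the number of tx queues the port was
         * actually set up with, so an out-of-range qid reaches the ixgbe
         * vector tx function with an invalid queue - which is frame #0. */
        rte_eth_tx_burst(port_id, qid, txq->pkts, txq->count);
        /* A real implementation would also handle partially-sent bursts. */
        txq->count = 0;
    }

Whether that is the actual root cause here I can't say without digging
further, but it would explain why only particular core/queue combinations
trigger the crash.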
>>> We observed that only the combination of 2 cores/queues for the dpdk
>>> interface and 1 queue for the vhost-user interfaces results in an
>>> ovs-vswitchd crash. In detail:
>>> dpdk0: 1 core/queue   & all vhost ports: 1 queue  => successful
>>> dpdk0: 2 cores/queues & all vhost ports: 1 queue  => crash
>>> dpdk0: 2 cores/queues & all vhost ports: 2 queues => successful
>>> dpdk0: 4 cores/queues & all vhost ports: 1 queue  => successful
>>> dpdk0: 4 cores/queues & all vhost ports: 2 queues => successful
>>> dpdk0: 4 cores/queues & all vhost ports: 4 queues => successful
>>>
>>> Do you have any suggestions?
>
> Can you please also supply the cpu (model number) that you're using?
>
> Thanks,
> Aaron
>
>>> Best regards,
>>>
>>> Mechthild Buescher

_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss