2016-07-13 2:54 GMT-07:00 Mechthild Buescher <mechthild.buesc...@ericsson.com>:
> Hi Daniele,
>
> You are right – after pulling the latest master branch, ovs-vswitchd
> doesn't crash any more. Whether it's
> b59cc14e032da370021794bfd1bdd3d67e88a9a3 ("netdev-dpdk: Use instant sending
> instead of queueing of packets.") or one of the other queue-related patches,
> I can't say. There were a bunch of updates since last week ;-)

Thanks for confirming this. I think we're going to need to backport that
commit to branch-2.5.

> A quick repetition of some of our performance measurements shows that the
> performance is slightly better for multi-queue than for single queue, and a
> little bit higher than with the older ovs version. But that still needs to
> be verified.
>
> The configured_tx_queues for dpdk ports is still 21. I saw another patch
> which introduces n_txq, which I haven't tried yet but will do soon.

That n_txq option is only used for testing; the number of transmission queues
is determined by the datapath, or by qemu in the case of vhost.

> Thank you very much!
>
> BR/Mechthild
>
> *From:* Daniele Di Proietto [mailto:diproiet...@ovn.org]
> *Sent:* Wednesday, July 13, 2016 3:40 AM
> *To:* Mechthild Buescher
> *Cc:* Aaron Conole; b...@openvswitch.org
> *Subject:* Re: [ovs-discuss] bug in ovs-vswitchd?!
>
> Hi Mechthild,
>
> Have you tried with the latest master?
>
> I suspect this might have already been fixed by b59cc14e032d ("netdev-dpdk:
> Use instant sending instead of queueing of packets."). In this case I need
> to backport that on branch-2.5 as well.
>
> Thanks,
>
> Daniele
>
> 2016-07-12 17:23 GMT-07:00 Mechthild Buescher <mechthild.buesc...@ericsson.com>:
>
> Hi Aaron,
>
> I checked whether vhost-sock-permissions is needed to run our tests and
> can confirm that it's not needed. The removal of vhost-sock-permissions,
> however, does not make a difference for 2 cores and 1 queue - ovs-vswitchd
> is still crashing. Regarding the solution for the socket access: it was
> necessary to update some libvirt-related apparmor files to get it running
> (e.g. in the dedicated /etc/apparmor.d/libvirt/libvirt-<uuid>.files). As
> far as I remember, for older libvirt versions it was also necessary to
> update /etc/qemu.conf. But the updates very much depend on which libvirt
> version is used and where the sockets are stored, so I cannot give a
> general solution to the socket-access problem as there might be different
> reasons. Of course qemu must be the client of the socket if ovs is the
> server - I guess you know this ;-)
>
> I failed to configure fewer than 21 tx queues; neither
>     ovs-vsctl --no-wait set Open_vSwitch . other_config:n-dpdk-txqs=2
> nor
>     ovs-vsctl --no-wait set Interface dpdk0 options:n_txq=2
> changes the value of
>     dpdk0 1/2: (dpdk: configured_rx_queues=2, configured_tx_queues=21,
>     requested_rx_queues=2, requested_tx_queues=21)
>
> But since the number of tx queues is 21 for all our configurations, I
> wonder why it leads to a crash of ovs-vswitchd only if the number of cores
> is 2 and the number of queues is 1.
>
> Best regards,
>
> Mechthild
>
> -----Original Message-----
> From: Aaron Conole [mailto:acon...@redhat.com]
> Sent: Tuesday, July 12, 2016 4:46 PM
> To: Mechthild Buescher
> Cc: Stokes, Ian; b...@openvswitch.org
> Subject: Re: [ovs-discuss] bug in ovs-vswitchd?!
>
> Mechthild Buescher <mechthild.buesc...@ericsson.com> writes:
>
>> Hi Aaron,
>>
>> I think that the vhost-sock-permissions is not needed - I will check
>> whether it makes a difference.
>> It's a left-over from an earlier
>> configuration where we had the problem that qemu (started by libvirt)
>> wasn't able to access the socket. This problem has been solved.
>
> Can you share the solution? It's unrelated, but this is something I'm
> trying to solve at the moment.
>
>> The tx queues are not configured by us, so I don't know where this
>> value comes from. Maybe it's the default value?! So, it's not
>> intended.
>
> Okay, that could likely be the problem. Please try setting the TX queues
> first, and if that resolves the crash there is either a setup-script or
> possibly a code error.
>
>> cat /proc/cpuinfo
>> processor       : 0
>> vendor_id       : GenuineIntel
>> cpu family      : 6
>> model           : 62
>> model name      : Intel(R) Xeon(R) CPU E5-2658 v2 @ 2.40GHz
>
> Thanks for this, it helps to locate which vectorization code dpdk will use.
>
>> stepping        : 4
>> microcode       : 0x40c
>> cpu MHz         : 1200.000
>> cache size      : 25600 KB
>> physical id     : 0
>> siblings        : 20
>> core id         : 0
>> cpu cores       : 10
>> apicid          : 0
>> initial apicid  : 0
>> fpu             : yes
>> fpu_exception   : yes
>> cpuid level     : 13
>> wp              : yes
>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
>> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
>> pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
>> xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor
>> ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2
>> x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida
>> arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
>> fsgsbase smep erms
>> bogomips        : 4799.93
>> clflush size    : 64
>> cache_alignment : 64
>> address sizes   : 46 bits physical, 48 bits virtual
>> power management:
>>
>> Thanks again,
>>
>> BR/Mechthild
>>
>> -----Original Message-----
>> From: Aaron Conole [mailto:acon...@redhat.com]
>> Sent: Monday, July 11, 2016 11:06 PM
>> To: Mechthild Buescher
>> Cc: Stokes, Ian; b...@openvswitch.org
>> Subject: Re: [ovs-discuss] bug in ovs-vswitchd?!
>>
>> Mechthild Buescher <mechthild.buesc...@ericsson.com> writes:
>>
>>> Hi Aaron,
>>>
>>> Sorry for being unclear regarding the VM: I meant the DPDK usage
>>> inside the VM. So, the fault happens when using the VM. Inside the VM
>>> I can either bind the interfaces to DPDK or to linux - in both cases
>>> the fault occurs.
>>>
>>> And I haven't applied any patch. I used the latest available version
>>> from the master branch - I don't know whether any patch has been
>>> upstreamed to the master branch.
>>
>> Okay - I wonder what the vhost-sock-permissions command line is all
>> about, then?
>>
>> Can you confirm that 21 tx queues is not intended, then (21 tx queues
>> is showing in your configuration output)?
>>
>> Also, please send the cpu information (cat /proc/cpuinfo on the host).
>>
>>> Thanks in advance for your help,
>>>
>>> BR/Mechthild
>>>
>>> -----Original Message-----
>>> From: Aaron Conole [mailto:acon...@redhat.com]
>>> Sent: Monday, July 11, 2016 7:22 PM
>>> To: Mechthild Buescher
>>> Cc: Stokes, Ian; b...@openvswitch.org
>>> Subject: Re: [ovs-discuss] bug in ovs-vswitchd?!
>>>
>>> Mechthild Buescher <mechthild.buesc...@ericsson.com> writes:
>>>
>>>> Hi Ian,
>>>>
>>>> Thanks for the fast reply! I also did some further investigations
>>>> where I could see that ovs-vswitchd usually stays alive when
>>>> receiving packets but crashes when sending packets.
>>>>
>>>> Regarding your questions:
>>>> 1. We are running 1 VM with 2 vhost ports (in the simplified setup;
>>>> in the complete setup we use 1 VM & 5 vhost ports).
>>>>
>>>> 2. We are using libvirt to start the VM, which is configured to use qemu:
>>>> /usr/bin/qemu-system-x86_64 -name guest=ubuntu11_try,debug-threads=on
>>>> -S -machine pc-i440fx-wily,accel=kvm,usb=off -cpu host -m 8192
>>>> -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1
>>>> -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/mnt/huge_1G/libvirt/qemu,share=yes,size=8589934592
>>>> -numa node,nodeid=0,cpus=0-3,memdev=ram-node0
>>>> -uuid 8a2ad7a3-9da1-4c69-a2ff-c7a680d9bc4a -no-user-config -nodefaults
>>>> -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-300-ubuntu11_try/monitor.sock,server,nowait
>>>> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
>>>> -no-shutdown -boot strict=on
>>>> -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
>>>> -drive file=/root/perf/vms/ubuntu11.qcow2,format=qcow2,if=none,id=drive-virtio-disk0
>>>> -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>>>> -netdev tap,fd=21,id=hostnet0
>>>> -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:3c:92:47,bus=pci.0,addr=0x3
>>>> -netdev tap,fd=23,id=hostnet1
>>>> -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:3c:a3:47,bus=pci.0,addr=0x4
>>>> -chardev socket,id=charnet2,path=/var/run/openvswitch/vhost111
>>>> -netdev type=vhost-user,id=hostnet2,chardev=charnet2
>>>> -device virtio-net-pci,netdev=hostnet2,id=net2,mac=52:54:00:a0:11:02,bus=pci.0,addr=0x5
>>>> -chardev socket,id=charnet3,path=/var/run/openvswitch/vhost112
>>>> -netdev type=vhost-user,id=hostnet3,chardev=charnet3
>>>> -device virtio-net-pci,netdev=hostnet3,id=net3,mac=52:54:00:a0:11:03,bus=pci.0,addr=0x6
>>>> -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
>>>> -vnc 127.0.0.1:0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2
>>>> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
>>>>
>>>> 3. In the VM we have both kinds of interface bindings, virtio-pci and
>>>> igb_uio. For both types of interfaces, the crash of ovs-vswitchd can
>>>> be observed (the VM is still alive).
>>>>
>>>> 4. ovs-vswitchd is started as follows and is configured to use a
>>>> vxlan tunnel:
>>>> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
>>>> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x1
>>>> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
>>>> ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=6
>>>> ovs-vsctl --no-wait set Interface dpdk0 options:n_rxq=2
>>>> ovs-vsctl --no-wait set Interface vhost111 options:n_rxq=1
>>>> ovs-vsctl --no-wait set Interface vhost112 options:n_rxq=1
>>>> ovs-vsctl --no-wait set Open_vSwitch . other_config:vhost-sock-permissions=766
>>>
>>> Have you, perchance, applied some extra patches? This was proposed,
>>> but not accepted, as a possible workaround for a permissions issue
>>> with ovs dpdk.
>>>>
>>>> ovs-vswitchd --pidfile=$DB_PID --detach --monitor --log-file=$LOG_FILE
>>>> -vfile:dbg --no-chdir -vconsole:emer --mlockall unix:$DB_SOCK
>>>>
>>>> ovs-vsctl add-port br-int vhost111 -- set Interface vhost111
>>>> type=dpdkvhostuser ofport_request=11
>>>> ovs-vsctl add-port br-int vhost112 -- set Interface vhost112
>>>> type=dpdkvhostuser ofport_request=12
>>>> ovs-vsctl add-br br-int -- set bridge br-int datapath_type=netdev
>>>> ovs-vsctl set Bridge br-int other_config:datapath-id=0000f2b811144f41
>>>> ovs-vsctl set Bridge br-int protocols=OpenFlow13
>>>> ovs-vsctl add-port br-int vxlan0 -- set interface vxlan0 type=vxlan
>>>> options:remote_ip=10.1.2.2 options:key=flow ofport_request=100
>>>>
>>>> 5. The ovs log is attached - it contains the log from start to crash
>>>> (with debug information). The crash has been provoked by setting up
>>>> a virtio-pci interface in the VM, so no DPDK is used in the VM for
>>>> this scenario.
>>>>
>>>> 6. The DPDK versions are:
>>>> Host: dpdk 16.04, latest commit b3b9719f18ee83773c6ed7adda300c5ac63c37e9
>>>> VM: (not used in this scenario) dpdk 2.2.0
>>>
>>> For confirmation, this happens whether or not you use a VM? I just
>>> want to make sure. It's usually best to pair dpdk versions whenever
>>> possible.
>>>
>>>> BR/Mechthild
>>>>
>>>> -----Original Message-----
>>>> From: Stokes, Ian [mailto:ian.sto...@intel.com]
>>>> Sent: Thursday, July 07, 2016 1:57 PM
>>>> To: Mechthild Buescher; b...@openvswitch.org
>>>> Subject: RE: bug in ovs-vswitchd?!
>>>>
>>>> Hi Mechthild,
>>>>
>>>> I've tried to reproduce this issue on my setup (Fedora 22, kernel
>>>> 4.1.8) but have not been able to reproduce it.
>>>>
>>>> A few questions to help the investigation:
>>>>
>>>> 1. Are you running 1 or 2 VMs in the setup (i.e. 1 VM with 2 vhost
>>>> user ports or 2 VMs with 1 vhost user port each)?
>>>> 2. What are the parameters being used to launch the VM/s attached to
>>>> the vhost user ports?
>>>> 3. Inside the VM, are the interfaces bound to igb_uio (i.e. using a
>>>> dpdk app inside the guest) or are the interfaces being used as
>>>> kernel devices inside the VM?
>>>> 4. What parameters are you launching OVS with?
>>>> 5. Can you provide an ovs log?
>>>> 6. Can you confirm the DPDK version you are using in the host/VM (if
>>>> being used in the VM)?
>>>>
>>>> Thanks
>>>> Ian
>>>>
>>>>> From: discuss [mailto:discuss-boun...@openvswitch.org] On Behalf Of
>>>>> Mechthild Buescher
>>>>> Sent: Wednesday, July 06, 2016 1:54 PM
>>>>> To: b...@openvswitch.org
>>>>> Subject: [ovs-discuss] bug in ovs-vswitchd?!
>>>>>
>>>>> Hi all,
>>>>>
>>>>> we are using ovs with dpdk interfaces and vhostuser interfaces and
>>>>> want to try the VMs with different multi-queue settings. When we
>>>>> specify 2 cores and 2 queues for a dpdk interface but only one queue
>>>>> for the vhost interfaces, ovs-vswitchd crashes at start of the VM
>>>>> (or at the latest when traffic is sent).
>>>>>
>>>>> The version of ovs is 2.5.90 (master branch, latest commit
>>>>> 7a15be69b00fe8f66a3f3929434b39676f325a7a).
>>>>> It has been built and is running on: Linux version 3.13.0-87-generic
>>>>> (buildd@lgw01-25) (gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3))
>>>>> #133-Ubuntu SMP Tue May 24 18:32:09 UTC 2016
>>>>>
>>>>> The configuration is:
>>>>> ovs-vsctl show
>>>>> 0e191ed4-040b-458c-bad8-feb6f7c90e3a
>>>>>     Bridge br-prv
>>>>>         Port br-prv
>>>>>             Interface br-prv
>>>>>                 type: internal
>>>>>         Port "dpdk0"
>>>>>             Interface "dpdk0"
>>>>>                 type: dpdk
>>>>>                 options: {n_rxq="2"}
>>>>>     Bridge br-int
>>>>>         Port br-int
>>>>>             Interface br-int
>>>>>                 type: internal
>>>>>         Port "vhost112"
>>>>>             Interface "vhost112"
>>>>>                 type: dpdkvhostuser
>>>>>                 options: {n_rxq="1"}
>>>>>         Port "vhost111"
>>>>>             Interface "vhost111"
>>>>>                 type: dpdkvhostuser
>>>>>                 options: {n_rxq="1"}
>>>>>         Port "vxlan0"
>>>>>             Interface "vxlan0"
>>>>>                 type: vxlan
>>>>>                 options: {key=flow, remote_ip="10.1.2.2"}
>>>>>
>>>>> ovs-appctl dpif-netdev/pmd-rxq-show
>>>>> pmd thread numa_id 0 core_id 1:
>>>>>     port: vhost112 queue-id: 0
>>>>>     port: dpdk0 queue-id: 1
>>>>> pmd thread numa_id 0 core_id 2:
>>>>>     port: dpdk0 queue-id: 0
>>>>>     port: vhost111 queue-id: 0
>>>>>
>>>>> ovs-appctl dpif/show
>>>>> br-int:
>>>>>     br-int 65534/6: (tap)
>>>>>     vhost111 11/3: (dpdkvhostuser: configured_rx_queues=1,
>>>>>         configured_tx_queues=1, requested_rx_queues=1, requested_tx_queues=21)
>>>>>     vhost112 12/5: (dpdkvhostuser: configured_rx_queues=1,
>>>>>         configured_tx_queues=1, requested_rx_queues=1, requested_tx_queues=21)
>>>>>     vxlan0 100/4: (vxlan: key=flow, remote_ip=10.1.2.2)
>>>>> br-prv:
>>>>>     br-prv 65534/1: (tap)
>>>>>     dpdk0 1/2: (dpdk: configured_rx_queues=2, configured_tx_queues=21,
>>>>>         requested_rx_queues=2, requested_tx_queues=21)
>>>
>>> I'm a little concerned about the numbers reported here. 21 tx queues
>>> is a bit much, I think. I haven't tried reproducing this yet, but
>>> can you confirm this is desired?
>>>>>
>>>>> (gdb) bt
>>>>> #0  0x00000000005356e4 in ixgbe_xmit_pkts_vec ()
>>>>> #1  0x00000000006df384 in rte_eth_tx_burst (nb_pkts=<optimized out>,
>>>>>     tx_pkts=<optimized out>, queue_id=1, port_id=<optimized out>)
>>>>>     at /opt/dpdk-16.04/x86_64-native-linuxapp-gcc//include/rte_ethdev.h:2791
>>>>> #2  dpdk_queue_flush__ (qid=<optimized out>, dev=<optimized out>)
>>>>>     at lib/netdev-dpdk.c:1099
>>>>> #3  dpdk_queue_flush (qid=<optimized out>, dev=<optimized out>)
>>>>>     at lib/netdev-dpdk.c:1133
>>>>> #4  netdev_dpdk_rxq_recv (rxq=0x7fbe127ad4c0, packets=0x7fc26761e408,
>>>>>     c=0x7fc26761e400) at lib/netdev-dpdk.c:1312
>>>>> #5  0x000000000061be98 in netdev_rxq_recv (rx=<optimized out>,
>>>>>     batch=batch@entry=0x7fc26761e400) at lib/netdev.c:628
>>>>> #6  0x00000000005f17bb in dp_netdev_process_rxq_port (pmd=pmd@entry=0x29ea810,
>>>>>     rxq=<optimized out>, port=<optimized out>, port=<optimized out>)
>>>>>     at lib/dpif-netdev.c:2619
>>>>> #7  0x00000000005f1b27 in pmd_thread_main (f_=0x29ea810) at lib/dpif-netdev.c:2864
>>>>> #8  0x000000000067dde4 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:342
>>>>> #9  0x00007fc26b90e184 in start_thread (arg=0x7fc26761f700) at pthread_create.c:312
>>>>> #10 0x00007fc26af2237d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>>>>>
>>>>> This is the minimal configuration which leads to the fault. Our
>>>>> complete configuration contains more vhostuser interfaces than
>>>>> above. We observed that only the combination of 2 cores/queues for
>>>>> the dpdk interface and 1 queue for the vhostuser interfaces results
>>>>> in an ovs-vswitchd crash, in detail:
>>>>> Dpdk0: 1 core/queue   & all vhost ports: 1 queue  => successful
>>>>> Dpdk0: 2 cores/queues & all vhost ports: 1 queue  => crash
>>>>> Dpdk0: 2 cores/queues & all vhost ports: 2 queues => successful
>>>>> Dpdk0: 4 cores/queues & all vhost ports: 1 queue  => successful
>>>>> Dpdk0: 4 cores/queues & all vhost ports: 2 queues => successful
>>>>> Dpdk0: 4 cores/queues & all vhost ports: 4 queues => successful
>>>>>
>>>>> Do you have any suggestions?
>>>
>>> Can you please also supply the cpu (model number) that you're using?
>>>
>>> Thanks,
>>> Aaron
>>>
>>>>> Best regards,
>>>>>
>>>>> Mechthild Buescher
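A side note on the vhost queue counts discussed above: per Daniele's remark,
the number of tx queues OVS ends up using on a dpdkvhostuser port follows
what qemu/the guest negotiates, which is why the vhost ports show
configured_tx_queues=1 while the requested value is 21. As a rough,
illustrative sketch only (not taken from this setup; queues=, mq=on and
vectors= are standard qemu virtio-net options, with vectors normally
2*queues+2), giving a vhost-user port two queue pairs would mean extending
the qemu arguments quoted earlier roughly like this:

    # illustrative only - two queue pairs on the vhost111 port
    -chardev socket,id=charnet2,path=/var/run/openvswitch/vhost111
    -netdev type=vhost-user,id=hostnet2,chardev=charnet2,queues=2
    -device virtio-net-pci,netdev=hostnet2,id=net2,mac=52:54:00:a0:11:02,bus=pci.0,addr=0x5,mq=on,vectors=6

The guest also needs the extra queues enabled (e.g. ethtool -L <iface>
combined 2 when the kernel virtio driver is used). Whether this relates to
the crash matrix above (2 dpdk queues + 1 vhost queue => crash) is a
separate question.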
_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss
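On the socket-access workaround Mechthild describes (updating the per-domain
libvirt apparmor file and, for older libvirt versions, qemu.conf): the lines
below are a purely hypothetical sketch of the kind of entries involved - the
exact file names, paths and settings depend on the libvirt version and on
where the vhost-user sockets are created:

    # per-domain apparmor file, e.g. /etc/apparmor.d/libvirt/libvirt-<uuid>.files
    # (hypothetical entry: let qemu open the vhost-user sockets created by ovs)
      "/var/run/openvswitch/vhost*" rw,

    # qemu.conf (older libvirt, hypothetical): run qemu as a user/group
    # that is allowed to reach the sockets
      user = "root"
      group = "root"

As noted in the thread, qemu is the client of these sockets when
ovs-vswitchd is the server, so whichever user qemu runs as needs read/write
access to them.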