> On 21 Apr 2017, at 04:10, Steven Luong (sluong) <slu...@cisco.com> wrote:
>
> Eric,
>
> How do you configure the startup.conf with multiple worker threads? Did you change both corelist-workers and workers? For example, this is how I configure 2 worker threads using cores 2 and 14.
>
> corelist-workers 2,14
> workers 2
>
> Any chance you can start vpp with gdb to get a backtrace to see where it went belly up?
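One way to capture that backtrace (just a sketch, assuming the stock package paths /usr/bin/vpp and /etc/vpp/startup.conf that show up in the logs further down) is to stop the service and run vpp in the foreground under gdb:

    sudo systemctl stop vpp
    sudo gdb --args /usr/bin/vpp -c /etc/vpp/startup.conf
    (gdb) run
    ...reproduce the crash with iperf/ab...
    (gdb) thread apply all bt

Since the crash only shows up once worker threads are enabled, "thread apply all bt" is safer than a plain "bt" from whichever thread gdb happens to stop in.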
Those two options are mutually exclusive; either corelist-workers or workers should be used, not both…
>
> Steven
>
> On 4/20/17, 5:32 PM, "Ernst, Eric" <eric.er...@intel.com> wrote:
>
> Makes sense, thanks Steven.
>
> One more round of questions -- I expected the numbers I got between the two VMs (~2 Gbps) given that I had just a single core running for VPP. I went ahead and amended my startup.conf to use 2, and then 4, worker threads, all within the same socket.
>
> After booting the VMs and testing basic connectivity (ping!), I began running either ab against nginx, or just iperf, between the VMs. In either case VPP crashes in short order. Does this ring a bell? I am still ramping up on VPP and understand I am likely making some wrong assumptions. Guidance?
>
> With two workers:
> Apr 20 17:17:03 eernstworkstation systemd[1]: dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device: Job dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device/start timed out.
> Apr 20 17:17:03 eernstworkstation systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device.
> Apr 20 17:17:03 eernstworkstation systemd[1]: Dependency failed for /dev/disk/by-uuid/def55f66-6b20-47c6-a02f-bdaf324ed3b7.
> Apr 20 17:17:03 eernstworkstation systemd[1]: dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.swap: Job dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.swap/start failed with result 'dependenc
> Apr 20 17:17:03 eernstworkstation systemd[1]: dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device: Job dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device/start failed with result 'timeo
> Apr 20 17:17:06 eernstworkstation vpp[38637]: /usr/bin/vpp[38637]: received signal SIGSEGV, PC 0x7f0d02b5b49c, faulting address 0x7f1cc12f5770
> Apr 20 17:17:06 eernstworkstation /usr/bin/vpp[38637]: received signal SIGSEGV, PC 0x7f0d02b5b49c, faulting address 0x7f1cc12f5770
> Apr 20 17:17:06 eernstworkstation systemd[1]: vpp.service: Main process exited, code=killed, status=6/ABRT
> Apr 20 17:17:06 eernstworkstation systemd[1]: vpp.service: Unit entered failed state.
> Apr 20 17:17:06 eernstworkstation systemd[1]: vpp.service: Failed with result 'signal'.
> Apr 20 17:17:06 eernstworkstation systemd[1]: vpp.service: Service hold-off time over, scheduling restart.
>
> Apr 20 17:17:06 eernstworkstation systemd[1]: Stopped vector packet processing engine.
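Coming back to the worker-thread configuration discussed above: both options live in the cpu stanza of /etc/vpp/startup.conf. A minimal sketch using only corelist-workers (the core numbers here are placeholders, not a recommendation):

    cpu {
      main-core 1
      corelist-workers 2,14
    }

or, to let VPP pick the worker cores itself:

    cpu {
      workers 2
    }

The workers form just asks for N worker threads and lets VPP place them; corelist-workers pins the workers to specific cores.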
>
> -----Original Message-----
> From: Steven Luong (sluong) [mailto:slu...@cisco.com]
> Sent: Thursday, April 20, 2017 4:33 PM
> To: Ernst, Eric <eric.er...@intel.com>; Billy McFall <bmcf...@redhat.com>
> Cc: Damjan Marion (damarion) <damar...@cisco.com>; vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] Connectivity issue when using vhost-user on 17.04?
>
> Eric,
>
> In my testing, I noticed my numbers are 2 to 3X better when coalesce is disabled. I am using Ivy Bridge, so it looks like the mileage varies a lot; with Sandy Bridge it is 40X better.
>
> What is coalesce? When the driver places descriptors into the vring, it may request an interrupt, or no interrupt, after the device is done processing the descriptors. If the driver wants an interrupt, the device may send it immediately when coalesce is not enabled. If it is enabled, the device delays posting the interrupt until enough descriptors have been received to meet the coalesce number. This is an attempt to reduce the number of interrupts generated toward the driver. My guess is that when coalesce is enabled, the application, iperf3 in this case, is not shooting packets as fast as it can until it receives the interrupt for the packets already sent, so the total bandwidth number looks bad. By disabling coalesce, the application shoots a lot more packets in the same interval, at the expense of more interrupts being generated in the VM.
>
> I don't know why coalesce is enabled by default; this was done before I was born. Damjan or others may chime in on this, and on the answer to 2) as well. "show errors" is all I know.
>
> Steven
>
> On 4/20/17, 3:54 PM, "Ernst, Eric" <eric.er...@intel.com> wrote:
>
> Steven,
>
> Thanks for the help. As before, the setup is described at https://gist.github.com/egernst/5982ae6f0590cd83330faafacc3fd545 (updated since I am no longer using the evil feature-mask).
>
> I'm going to need to read up on what the coalesce-frames setting is doing ....
>
> Without that set, you can find my iperf3 output appended. No retransmissions in the output, and no errors observed on the VPP side (that is, nothing notable in systemctl status vpp).
>
> When I set coalesce-frames 0 I see *major* improvements -- getting in the ballpark of what I would expect for a single thread: about 2 Gbps. Phew, a major relief. A couple of things:
> 1) Can you tell me more about what this is doing, and why it isn't the default?
> 2) Is there a straightforward way to monitor the VPP setup (particular counters) to identify where the issue is?
>
> Thanks again!
>
> Cheers,
> Eric
>
> -------
> *Server*:
> # iperf3 -s
> -----------------------------------------------------------
> Server listening on 5201
> -----------------------------------------------------------
> Accepted connection from 192.168.0.2, port 41058
> [ 5] local 192.168.0.1 port 5201 connected to 192.168.0.2 port 41060
> [ ID] Interval           Transfer     Bandwidth
> [ 5]   0.00-1.00   sec  12.8 MBytes   107 Mbits/sec
> [ 5]   1.00-2.00   sec  7.93 MBytes  66.5 Mbits/sec
> [ 5]   2.00-3.00   sec  7.94 MBytes  66.6 Mbits/sec
> [ 5]   3.00-4.00   sec  5.37 MBytes  45.0 Mbits/sec
> [ 5]   4.00-5.00   sec  5.29 MBytes  44.4 Mbits/sec
> [ 5]   5.00-6.00   sec  4.28 MBytes  35.9 Mbits/sec
> [ 5]   6.00-7.00   sec  4.14 MBytes  34.8 Mbits/sec
> [ 5]   7.00-8.00   sec  4.14 MBytes  34.7 Mbits/sec
> [ 5]   8.00-9.00   sec  4.14 MBytes  34.8 Mbits/sec
> [ 5]   9.00-10.00  sec  4.14 MBytes  34.7 Mbits/sec
> [ 5]  10.00-10.03  sec   133 KBytes  34.9 Mbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bandwidth
> [ 5]   0.00-10.03  sec  0.00 Bytes   0.00 bits/sec   sender
> [ 5]   0.00-10.03  sec  60.3 MBytes  50.4 Mbits/sec  receiver
> -----------------------------------------------------------
> Server listening on 5201
> -----------------------------------------------------------
>
> *Client*:
> # iperf3 -c 192.168.0.1
> Connecting to host 192.168.0.1, port 5201
> [ 4] local 192.168.0.2 port 41060 connected to 192.168.0.1 port 5201
> [ ID] Interval           Transfer     Bandwidth      Retr  Cwnd
> [ 4]   0.00-1.00   sec  13.8 MBytes   116 Mbits/sec    0   8.48 KBytes
> [ 4]   1.00-2.00   sec  8.05 MBytes  67.5 Mbits/sec    0   8.48 KBytes
> [ 4]   2.00-3.00   sec  7.74 MBytes  64.9 Mbits/sec    0   8.48 KBytes
> [ 4]   3.00-4.00   sec  5.28 MBytes  44.3 Mbits/sec    0   5.66 KBytes
> [ 4]   4.00-5.00   sec  5.28 MBytes  44.3 Mbits/sec    0   5.66 KBytes
> [ 4]   5.00-6.00   sec  4.35 MBytes  36.5 Mbits/sec    0   5.66 KBytes
> [ 4]   6.00-7.00   sec  4.04 MBytes  33.9 Mbits/sec    0   5.66 KBytes
> [ 4]   7.00-8.00   sec  4.35 MBytes  36.5 Mbits/sec    0   5.66 KBytes
> [ 4]   8.00-9.00   sec  4.04 MBytes  33.9 Mbits/sec    0   5.66 KBytes
> [ 4]   9.00-10.00  sec  4.04 MBytes  33.9 Mbits/sec    0   5.66 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bandwidth      Retr
> [ 4]   0.00-10.00  sec  61.0 MBytes  51.2 Mbits/sec    0   sender
> [ 4]   0.00-10.00  sec  60.3 MBytes  50.6 Mbits/sec        receiver
>
> iperf Done.
> -----
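On the counter question in 2) above: the usual starting point is the debug CLI via vppctl. A sketch of the kind of loop that narrows things down (vppctl is assumed to be on the path; interface names will differ per setup):

    sudo vppctl clear errors
    sudo vppctl clear interfaces
    # run the iperf test, then:
    sudo vppctl show errors
    sudo vppctl show interface
    sudo vppctl show vhost-user

"show errors" gives per-node error/drop counters and "show interface" gives per-interface rx/tx and drop counts, which is usually enough to see which side is losing packets.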
>
> -----Original Message-----
> From: Steven Luong (sluong) [mailto:slu...@cisco.com]
> Sent: Thursday, April 20, 2017 3:05 PM
> To: Ernst, Eric <eric.er...@intel.com>; Billy McFall <bmcf...@redhat.com>
> Cc: Damjan Marion (damarion) <damar...@cisco.com>; vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] Connectivity issue when using vhost-user on 17.04?
>
> Eric,
>
> As a first step, please share the output of iperf3 to see how many retransmissions you have for the run. From VPP, please collect "show errors" to see if vhost drops anything. As an additional data point for comparison, please also try disabling vhost coalesce to see if you get better results, by adding the following configuration to /etc/vpp/startup.conf:
>
> vhost-user {
>   coalesce-frames 0
> }
>
> Steven
>
> On 4/20/17, 2:19 PM, "vpp-dev-boun...@lists.fd.io on behalf of Ernst, Eric" <vpp-dev-boun...@lists.fd.io on behalf of eric.er...@intel.com> wrote:
>
> Thanks Billy - it was through some examples that I had found that I ended up grabbing that. I reinstalled 17.04 and can verify connectivity after removing the evil feature-mask.
>
> Thanks for the quick feedback, Damjan. If we could only go back in time!
>
> Now if I could just figure out why I'm getting capped bandwidth (via iperf) of ~45 Mbps between two VMs on the same socket on a Sandy Bridge Xeon, I will be really happy! If anyone has suggestions on debug methods for this, it'd be appreciated. I see a huge difference when switching to OVS vhost-user, keeping all else the same.
>
> --Eric
>
>
> On Thu, Apr 20, 2017 at 04:29:23PM -0400, Billy McFall wrote:
>> The vHost examples on the Wiki used a feature-mask of 0xFF. I think that is how it got propagated. In 16.09, when I did the CLI documentation for vHost, I expanded what the bits meant and used feature-mask 0x40400000 as the example. I will gladly add a comment indicating that the recommended use is to leave it blank, if this was intended to be debug-only.
>>
>> https://docs.fd.io/vpp/17.07/clicmd_src_vnet_devices_virtio.html
>>
>> Billy
>>
>> On Thu, Apr 20, 2017 at 4:17 PM, Damjan Marion (damarion) <damar...@cisco.com> wrote:
>>
>>> Eric,
>>>
>>> A long time ago (I think 3+ years), when I wrote the original vhost-user driver in vpp, I added the feature-mask knob to the CLI, which messes with the feature bitmap, purely for debugging reasons.
>>>
>>> And I have regretted it many times…
>>>
>>> Somebody dug it out and documented it somewhere, for reasons unknown to me. Now it spreads like a virus and I cannot stop it :)
>>>
>>> So please don’t use it, it is evil….
>>>
>>> Thanks,
>>>
>>> Damjan
>>>
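In concrete terms, creating the vhost-user interface without any feature-mask looks roughly like the following. This is only a sketch: the socket path, interface name and bridge-domain id are placeholders, and the gist linked earlier in the thread (plus the clicmd docs above) has the authoritative commands.

    create vhost-user socket /tmp/vhost-user-1.sock server
    set interface state VirtualEthernet0/0/0 up
    set interface l2 bridge VirtualEthernet0/0/0 1

In other words, simply omit the feature-mask argument rather than trying to pick a "good" value for it.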
>>>> On 20 Apr 2017, at 20:49, Ernst, Eric <eric.er...@intel.com> wrote:
>>>>
>>>> All,
>>>>
>>>> After updating the startup.conf to not reference DPDK, per the direction in the release notification thread, I was able to start up vpp and create interfaces.
>>>>
>>>> Now that I'm testing, I noticed that I can no longer ping between VMs which make use of vhost-user interfaces and are connected via an l2 bridge domain (or l2 xconnect). I double-checked, then reverted back to 17.01, where I could again verify connectivity between the guests.
>>>>
>>>> Anyone else seeing this, or was there a change in how this should be set up? For reference, I have my (simple) setup described in a gist at [1].
>>>>
>>>> Thanks,
>>>> eric
>>>>
>>>> [1] - https://gist.github.com/egernst/5982ae6f0590cd83330faafacc3fd545
>>
>> --
>> *Billy McFall*
>> SDN Group
>> Office of Technology
>> *Red Hat*

_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev