Eric,

How do you configure the startup.conf with multiple worker threads? Did you 
change both corelist-workers and workers? For example, this is how I configure 
2 worker threads using core 2 and 14.

        corelist-workers 2,14
        workers 2

Any chance you can start vpp with gdb to get the backtrace to see where it went 
belly up?

Steven

On 4/20/17, 5:32 PM, "Ernst, Eric" <eric.er...@intel.com> wrote:

    Makes sense, thanks Steven.
    
    One more round of questions -- I expected the numbers I got between the
    two VMs (~2 Gbps) given that I had just a single core running for VPP.  I
    went ahead and amended my startup.conf to make use of 2, and then 4,
    worker threads, all within the same socket.
    
    After booting the VMs and testing basic connectivity (ping!), I either
    run ab and nginx, or just iperf between the VMs.  In either case, VPP
    crashes within a short time.  Does this ring a bell?  I am still ramping
    up on VPP and understand I am likely making some assumptions that are
    wrong.  Guidance?
    
    With two workers:
    Apr 20 17:17:03 eernstworkstation systemd[1]: dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device: Job dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device/start timed out.
    Apr 20 17:17:03 eernstworkstation systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device.
    Apr 20 17:17:03 eernstworkstation systemd[1]: Dependency failed for /dev/disk/by-uuid/def55f66-6b20-47c6-a02f-bdaf324ed3b7.
    Apr 20 17:17:03 eernstworkstation systemd[1]: dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.swap: Job dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.swap/start failed with result 'dependenc
    Apr 20 17:17:03 eernstworkstation systemd[1]: dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device: Job dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device/start failed with result 'timeo
    Apr 20 17:17:06 eernstworkstation vpp[38637]: /usr/bin/vpp[38637]: received signal SIGSEGV, PC 0x7f0d02b5b49c, faulting address 0x7f1cc12f5770
    Apr 20 17:17:06 eernstworkstation /usr/bin/vpp[38637]: received signal SIGSEGV, PC 0x7f0d02b5b49c, faulting address 0x7f1cc12f5770
    Apr 20 17:17:06 eernstworkstation systemd[1]: vpp.service: Main process exited, code=killed, status=6/ABRT
    Apr 20 17:17:06 eernstworkstation systemd[1]: vpp.service: Unit entered failed state.
    Apr 20 17:17:06 eernstworkstation systemd[1]: vpp.service: Failed with result 'signal'.
    Apr 20 17:17:06 eernstworkstation systemd[1]: vpp.service: Service hold-off time over, scheduling restart.
    
    Apr 20 17:17:06 eernstworkstation systemd[1]: Stopped vector packet processing engine.
    
    
    
    -----Original Message-----
    From: Steven Luong (sluong) [mailto:slu...@cisco.com] 
    Sent: Thursday, April 20, 2017 4:33 PM
    To: Ernst, Eric <eric.er...@intel.com>; Billy McFall <bmcf...@redhat.com>
    Cc: Damjan Marion (damarion) <damar...@cisco.com>; vpp-dev@lists.fd.io
    Subject: Re: [vpp-dev] Connectivity issue when using vhost-user on 17.04?
    
    Eric,
    
    In my testing, throughput is 2 to 3X better when coalesce is disabled.  I
    am using Ivy Bridge, so it looks like the mileage varies a lot: on your
    Sandy Bridge it is about 40X better.
    
    What is coalesce?
    When the driver places descriptors into the vring, it can request either
    an interrupt or no interrupt once the device has finished processing
    those descriptors. If the driver wants an interrupt and coalesce is not
    enabled, the device may send it immediately. If coalesce is enabled, the
    device delays posting the interrupt until enough descriptors have been
    received to meet the coalesce number. This is an attempt to reduce the
    number of interrupts generated to the driver. My guess is that when
    coalesce is enabled, the application (iperf3 in this case) does not send
    packets as fast as it can, because it waits for the interrupt for the
    packets already sent. Thus the total bandwidth number looks bad. With
    coalesce disabled, the application sends a lot more packets in the same
    interval, at the expense of more interrupts being generated in the VM.
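
To make the trade-off concrete, here is a toy model of the behaviour described above. It is only a sketch, not VPP's actual vhost-user code: the function name, the batch sizes and the threshold of 32 are made up for illustration, and it ignores any timer a real device might use to flush a pending interrupt.

```python
def count_interrupts(frames_per_batch, coalesce_frames):
    """Count interrupts posted for a sequence of completed batches.

    frames_per_batch: frames the device finishes in each batch (hypothetical).
    coalesce_frames:  0 means no coalescing (interrupt after every batch);
                      N > 0 means hold the interrupt until N frames are pending.
    Returns (interrupts_posted, frames_still_waiting_for_an_interrupt).
    """
    interrupts = 0
    pending = 0
    for frames in frames_per_batch:
        pending += frames
        if coalesce_frames == 0 or pending >= coalesce_frames:
            interrupts += 1
            pending = 0
    return interrupts, pending


batches = [4] * 16  # 64 frames arriving in 16 small batches (made-up workload)
print(count_interrupts(batches, 0))       # coalescing disabled
print(count_interrupts(batches, 32))      # hypothetical threshold of 32 frames
print(count_interrupts(batches[:3], 32))  # a sender that stalls after 12 frames
```

In the last call no interrupt is posted at all and 12 frames are left waiting, which is the situation I am guessing at: a sender that waits for completions before sending more never reaches the threshold on its own.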
    
    I don’t know why coalesce is enabled by default. This was done before I
    was born. Damjan or others may chime in on this, and on the answer to 2)
    as well. "show errors" is all I know.
    
    Steven
    
    On 4/20/17, 3:54 PM, "Ernst, Eric" <eric.er...@intel.com> wrote:
    
        Steven,
        
        Thanks for the help.  As before, my setup is described at
        https://gist.github.com/egernst/5982ae6f0590cd83330faafacc3fd545
        (updated, since I am no longer using the evil feature mask).
        
        I'm going to need to read up on what the coalesce-frames setting is
        doing...
        
        Without that set, you can find my output from iperf3 appended.  There
        are no retransmissions in the output, and no errors observed on the
        VPP side (that is, nothing notable in systemctl status vpp).
        
        When I set coalesce-frames I see *major* improvements -- getting in
        the ballpark of what I would expect for a single thread; about
        2 Gbps.  Phew, a major relief.  A couple of things:
        1) Can you tell me more about what this is doing, and why this isn't
           set by default?
        2) Is there a straightforward way to monitor the VPP setup
           (particular counters) to identify where the issue is?
        
        Thanks again!
        
        Cheers,
        Eric
        
        -------
        *Server*:
        # iperf3 -s
        -----------------------------------------------------------
        Server listening on 5201
        -----------------------------------------------------------
        Accepted connection from 192.168.0.2, port 41058
        [  5] local 192.168.0.1 port 5201 connected to 192.168.0.2 port 41060
        [ ID] Interval           Transfer     Bandwidth
        [  5]   0.00-1.00   sec  12.8 MBytes   107 Mbits/sec
        [  5]   1.00-2.00   sec  7.93 MBytes  66.5 Mbits/sec
        [  5]   2.00-3.00   sec  7.94 MBytes  66.6 Mbits/sec
        [  5]   3.00-4.00   sec  5.37 MBytes  45.0 Mbits/sec
        [  5]   4.00-5.00   sec  5.29 MBytes  44.4 Mbits/sec
        [  5]   5.00-6.00   sec  4.28 MBytes  35.9 Mbits/sec
        [  5]   6.00-7.00   sec  4.14 MBytes  34.8 Mbits/sec
        [  5]   7.00-8.00   sec  4.14 MBytes  34.7 Mbits/sec
        [  5]   8.00-9.00   sec  4.14 MBytes  34.8 Mbits/sec
        [  5]   9.00-10.00  sec  4.14 MBytes  34.7 Mbits/sec
        [  5]  10.00-10.03  sec   133 KBytes  34.9 Mbits/sec
        - - - - - - - - - - - - - - - - - - - - - - - - -
        [ ID] Interval           Transfer     Bandwidth
        [  5]   0.00-10.03  sec  0.00 Bytes  0.00 bits/sec                  sender
        [  5]   0.00-10.03  sec  60.3 MBytes  50.4 Mbits/sec                  receiver
        -----------------------------------------------------------
        Server listening on 5201
        -----------------------------------------------------------
        
        *Client*:
        # iperf3 -c 192.168.0.1
        Connecting to host 192.168.0.1, port 5201
        [  4] local 192.168.0.2 port 41060 connected to 192.168.0.1 port 5201
        [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
        [  4]   0.00-1.00   sec  13.8 MBytes   116 Mbits/sec    0   8.48 KBytes
        [  4]   1.00-2.00   sec  8.05 MBytes  67.5 Mbits/sec    0   8.48 KBytes
        [  4]   2.00-3.00   sec  7.74 MBytes  64.9 Mbits/sec    0   8.48 KBytes
        [  4]   3.00-4.00   sec  5.28 MBytes  44.3 Mbits/sec    0   5.66 KBytes
        [  4]   4.00-5.00   sec  5.28 MBytes  44.3 Mbits/sec    0   5.66 KBytes
        [  4]   5.00-6.00   sec  4.35 MBytes  36.5 Mbits/sec    0   5.66 KBytes
        [  4]   6.00-7.00   sec  4.04 MBytes  33.9 Mbits/sec    0   5.66 KBytes
        [  4]   7.00-8.00   sec  4.35 MBytes  36.5 Mbits/sec    0   5.66 KBytes
        [  4]   8.00-9.00   sec  4.04 MBytes  33.9 Mbits/sec    0   5.66 KBytes
        [  4]   9.00-10.00  sec  4.04 MBytes  33.9 Mbits/sec    0   5.66 KBytes
        - - - - - - - - - - - - - - - - - - - - - - - - -
        [ ID] Interval           Transfer     Bandwidth       Retr
        [  4]   0.00-10.00  sec  61.0 MBytes  51.2 Mbits/sec    0             sender
        [  4]   0.00-10.00  sec  60.3 MBytes  50.6 Mbits/sec                  receiver
        
        iperf Done.
        -----
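
As a quick sanity check on the numbers above, the Transfer and Bandwidth columns can be reconciled with a few lines of Python. This is a sketch that assumes iperf3 reports MBytes as 2**20 bytes and Mbits as 10**6 bits; the helper name is made up for illustration.

```python
def mbits_per_sec(mbytes, seconds):
    """Convert a transfer in MBytes over an interval into Mbits/sec.

    Assumes MBytes means 2**20 bytes and Mbits means 10**6 bits.
    """
    bits = mbytes * 2**20 * 8
    return bits / seconds / 1e6


# Server-side receiver total: 60.3 MBytes in 10.03 seconds.
print(round(mbits_per_sec(60.3, 10.03), 1))
# A steady-state one-second interval: 4.14 MBytes in 1.00 second.
print(round(mbits_per_sec(4.14, 1.00), 1))
# Ratio of the ~2 Gbps seen with coalescing disabled to the total above.
print(round(2000 / mbits_per_sec(60.3, 10.03)))
```

Under that assumption the totals line up with the reported 50.4 and 34.7 Mbits/sec, and the ~2 Gbps figure works out to roughly 40 times this run.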
        
        
        
        
        
        
        
        
        -----Original Message-----
        From: Steven Luong (sluong) [mailto:slu...@cisco.com] 
        Sent: Thursday, April 20, 2017 3:05 PM
        To: Ernst, Eric <eric.er...@intel.com>; Billy McFall <bmcf...@redhat.com>
        Cc: Damjan Marion (damarion) <damar...@cisco.com>; vpp-dev@lists.fd.io
        Subject: Re: [vpp-dev] Connectivity issue when using vhost-user on 17.04?
        
        Eric,
        
        As a first step, please share the output of iperf3 so we can see how
        many retransmissions you have for the run. From VPP, please collect
        "show errors" to see if vhost drops anything. As an additional data
        point for comparison, please also try disabling vhost coalesce, to
        see if you get a better result, by adding the following configuration
        to /etc/vpp/startup.conf:
        
        vhost-user {
          coalesce-frames 0
        }
        
        Steven
        
        On 4/20/17, 2:19 PM, "vpp-dev-boun...@lists.fd.io on behalf of Ernst, 
Eric" <vpp-dev-boun...@lists.fd.io on behalf of eric.er...@intel.com> wrote:
        
            Thanks Billy - it was through some examples I had found that I
            ended up grabbing that.  I reinstalled 17.04 and can verify
            connectivity when removing the evil feature-mask.
            
            Thanks for the quick feedback, Damjan.  If we could only go back
            in time!
            
            Now if I could just figure out why I'm getting capped bandwidth
            (via iperf) of ~45 Mbps between two VMs on the same socket on a
            Sandy Bridge Xeon, I will be really happy!  If anyone has
            suggestions on debug methods for this, it'd be appreciated.  I
            see a huge difference when switching to OVS vhost-user, keeping
            all else the same.
            
            --Eric
            
            
            On Thu, Apr 20, 2017 at 04:29:23PM -0400, Billy McFall wrote:
            > The vHost examples on the Wiki used the feature-mask of 0xFF. I
            > think that is how it got propagated. In 16.09, when I did the CLI
            > documentation for vHost, I expanded what the bits meant and used
            > feature-mask 0x40400000 as the example. I will gladly add an
            > additional comment indicating that the recommended use is to
            > leave it blank, if this was intended for debugging.
            > 
            > https://docs.fd.io/vpp/17.07/clicmd_src_vnet_devices_virtio.html
            > 
            > Billy
            > 
            > On Thu, Apr 20, 2017 at 4:17 PM, Damjan Marion (damarion) <
            > damar...@cisco.com> wrote:
            > 
            > >
            > > Eric,
            > >
            > > A long time ago (I think 3+ years), when I wrote the original
            > > vhost-user driver in VPP, I added the feature-mask knob to the
            > > CLI, which messes with the feature bitmap, purely for debugging
            > > reasons.
            > >
            > > And I have regretted it many times…
            > >
            > > Somebody dug it out and documented it somewhere, for reasons
            > > unknown to me. Now it spreads like a virus and I cannot stop it :)
            > >
            > > So please don’t use it, it is evil….
            > >
            > > Thanks,
            > >
            > > Damjan
            > >
            > > > On 20 Apr 2017, at 20:49, Ernst, Eric <eric.er...@intel.com> wrote:
            > > >
            > > > All,
            > > >
            > > > After updating the startup.conf to not reference DPDK, per
            > > > direction in the release notification thread, I was able to
            > > > start up vpp and create interfaces.
            > > >
            > > > Now that I'm testing, I noticed that I can no longer ping
            > > > between VM hosts which make use of vhost-user interfaces and
            > > > are connected via an l2 bridge domain (nor l2 xconnect).  I
            > > > double checked, then reverted back to 17.01, where I could
            > > > again verify connectivity between the guests.
            > > >
            > > > Anyone else seeing this, or was there a change in how this
            > > > should be set up?  For reference, I have my (simple) setup
            > > > described in a gist at [1].
            > > >
            > > > Thanks,
            > > > eric
            > > >
            > > >
            > > > [1] - https://gist.github.com/egernst/5982ae6f0590cd83330faafacc3fd545
            > > > _______________________________________________
            > > > vpp-dev mailing list
            > > > vpp-dev@lists.fd.io
            > > > https://lists.fd.io/mailman/listinfo/vpp-dev
            > >
            > 
            > 
            > 
            > 
            > -- 
            > *Billy McFall*
            > SDN Group
            > Office of Technology
            > *Red Hat*
        
        
    
    

