> On 21 Apr 2017, at 04:10, Steven Luong (sluong) <slu...@cisco.com> wrote:
>
> Eric,
>
> How do you configure the startup.conf with multiple worker threads? Did you change both corelist-workers and workers? For example, this is how I configure 2 worker threads using cores 2 and 14.
>
> corelist-workers 2,14
> workers 2
>
> Any chance you can start vpp with gdb to get a backtrace to see where it went belly up?
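One way to capture that backtrace (just a sketch, assuming the stock package paths /usr/bin/vpp and /etc/vpp/startup.conf that show up in the logs further down) is to stop the service and run vpp in the foreground under gdb:

    sudo systemctl stop vpp
    sudo gdb --args /usr/bin/vpp -c /etc/vpp/startup.conf
    (gdb) run
    ...reproduce the crash with iperf/ab...
    (gdb) thread apply all bt

Since the crash only shows up once worker threads are enabled, "thread apply all bt" is safer than a plain "bt" from whichever thread gdb happens to stop in.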
Those two options are mutually exclusive; either corelist-workers or workers should be used, not both…
>
> Steven
>
> On 4/20/17, 5:32 PM, "Ernst, Eric" <eric.er...@intel.com> wrote:
>
> Makes sense, thanks Steven.
>
> One more round of questions -- I expected the numbers I got between the two VMs (~2 Gbps) given that I had just a single core running for VPP. I went ahead and amended my startup.conf to use 2, and then 4, worker threads, all within the same socket.
>
> After booting the VMs and testing basic connectivity (ping!), I began running either ab against nginx, or just iperf, between the VMs. In either case VPP crashes in short order. Does this ring a bell? I am still ramping up on VPP and understand I am likely making some wrong assumptions. Guidance?
>
> With two workers:
> Apr 20 17:17:03 eernstworkstation systemd[1]: dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device: Job dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device/start timed out.
> Apr 20 17:17:03 eernstworkstation systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device.
> Apr 20 17:17:03 eernstworkstation systemd[1]: Dependency failed for /dev/disk/by-uuid/def55f66-6b20-47c6-a02f-bdaf324ed3b7.
> Apr 20 17:17:03 eernstworkstation systemd[1]: dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.swap: Job dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.swap/start failed with result 'dependenc
> Apr 20 17:17:03 eernstworkstation systemd[1]: dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device: Job dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device/start failed with result 'timeo
> Apr 20 17:17:06 eernstworkstation vpp[38637]: /usr/bin/vpp[38637]: received signal SIGSEGV, PC 0x7f0d02b5b49c, faulting address 0x7f1cc12f5770
> Apr 20 17:17:06 eernstworkstation /usr/bin/vpp[38637]: received signal SIGSEGV, PC 0x7f0d02b5b49c, faulting address 0x7f1cc12f5770
> Apr 20 17:17:06 eernstworkstation systemd[1]: vpp.service: Main process exited, code=killed, status=6/ABRT
> Apr 20 17:17:06 eernstworkstation systemd[1]: vpp.service: Unit entered failed state.
> Apr 20 17:17:06 eernstworkstation systemd[1]: vpp.service: Failed with result 'signal'.
> Apr 20 17:17:06 eernstworkstation systemd[1]: vpp.service: Service hold-off time over, scheduling restart.
>
> Apr 20 17:17:06 eernstworkstation systemd[1]: Stopped vector packet processing engine.
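Coming back to the worker-thread configuration discussed above: both options live in the cpu stanza of /etc/vpp/startup.conf. A minimal sketch using only corelist-workers (the core numbers here are placeholders, not a recommendation):

    cpu {
      main-core 1
      corelist-workers 2,14
    }

or, to let VPP pick the worker cores itself:

    cpu {
      workers 2
    }

The workers form just asks for N worker threads and lets VPP place them; corelist-workers pins the workers to specific cores.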
>
> -----Original Message-----
> From: Steven Luong (sluong) [mailto:slu...@cisco.com]
> Sent: Thursday, April 20, 2017 4:33 PM
> To: Ernst, Eric <eric.er...@intel.com>; Billy McFall <bmcf...@redhat.com>
> Cc: Damjan Marion (damarion) <damar...@cisco.com>; vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] Connectivity issue when using vhost-user on 17.04?
>
> Eric,
>
> In my testing, I noticed my numbers are 2 to 3X better when coalesce is disabled. I am using Ivy Bridge, so it looks like the mileage varies a lot; with Sandy Bridge it is 40X better.
>
> What is coalesce? When the driver places descriptors into the vring, it may request an interrupt, or no interrupt, after the device is done processing the descriptors. If the driver wants an interrupt, the device may send it immediately when coalesce is not enabled. If it is enabled, the device delays posting the interrupt until enough descriptors have been received to meet the coalesce number. This is an attempt to reduce the number of interrupts generated toward the driver. My guess is that when coalesce is enabled, the application, iperf3 in this case, is not shooting packets as fast as it can until it receives the interrupt for the packets already sent, so the total bandwidth number looks bad. By disabling coalesce, the application shoots a lot more packets in the same interval, at the expense of more interrupts being generated in the VM.
>
> I don't know why coalesce is enabled by default; this was done before I was born. Damjan or others may chime in on this, and on the answer to 2) as well. "show errors" is all I know.
>
> Steven
>
> On 4/20/17, 3:54 PM, "Ernst, Eric" <eric.er...@intel.com> wrote:
>
> Steven,
>
> Thanks for the help. As before, the setup is described at https://gist.github.com/egernst/5982ae6f0590cd83330faafacc3fd545 (updated since I am no longer using the evil feature-mask).
>
> I'm going to need to read up on what the coalesce-frames setting is doing ....
>
> Without that set, you can find my iperf3 output appended. No retransmissions in the output, and no errors observed on the VPP side (that is, nothing notable in systemctl status vpp).
>
> When I set coalesce-frames 0 I see *major* improvements -- getting in the ballpark of what I would expect for a single thread: about 2 Gbps. Phew, a major relief. A couple of things:
> 1) Can you tell me more about what this is doing, and why it isn't the default?
> 2) Is there a straightforward way to monitor the VPP setup (particular counters) to identify where the issue is?
>
> Thanks again!
>
> Cheers,
> Eric
>
> -------
> *Server*:
> # iperf3 -s
> -----------------------------------------------------------
> Server listening on 5201
> -----------------------------------------------------------
> Accepted connection from 192.168.0.2, port 41058
> [ 5] local 192.168.0.1 port 5201 connected to 192.168.0.2 port 41060
> [ ID] Interval           Transfer     Bandwidth
> [ 5]   0.00-1.00   sec  12.8 MBytes   107 Mbits/sec
> [ 5]   1.00-2.00   sec  7.93 MBytes  66.5 Mbits/sec
> [ 5]   2.00-3.00   sec  7.94 MBytes  66.6 Mbits/sec
> [ 5]   3.00-4.00   sec  5.37 MBytes  45.0 Mbits/sec
> [ 5]   4.00-5.00   sec  5.29 MBytes  44.4 Mbits/sec
> [ 5]   5.00-6.00   sec  4.28 MBytes  35.9 Mbits/sec
> [ 5]   6.00-7.00   sec  4.14 MBytes  34.8 Mbits/sec
> [ 5]   7.00-8.00   sec  4.14 MBytes  34.7 Mbits/sec
> [ 5]   8.00-9.00   sec  4.14 MBytes  34.8 Mbits/sec
> [ 5]   9.00-10.00  sec  4.14 MBytes  34.7 Mbits/sec
> [ 5]  10.00-10.03  sec   133 KBytes  34.9 Mbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bandwidth
> [ 5]   0.00-10.03  sec  0.00 Bytes   0.00 bits/sec   sender
> [ 5]   0.00-10.03  sec  60.3 MBytes  50.4 Mbits/sec  receiver
> -----------------------------------------------------------
> Server listening on 5201
> -----------------------------------------------------------
>
> *Client*:
> # iperf3 -c 192.168.0.1
> Connecting to host 192.168.0.1, port 5201
> [ 4] local 192.168.0.2 port 41060 connected to 192.168.0.1 port 5201
> [ ID] Interval           Transfer     Bandwidth      Retr  Cwnd
> [ 4]   0.00-1.00   sec  13.8 MBytes   116 Mbits/sec    0   8.48 KBytes
> [ 4]   1.00-2.00   sec  8.05 MBytes  67.5 Mbits/sec    0   8.48 KBytes
> [ 4]   2.00-3.00   sec  7.74 MBytes  64.9 Mbits/sec    0   8.48 KBytes
> [ 4]   3.00-4.00   sec  5.28 MBytes  44.3 Mbits/sec    0   5.66 KBytes
> [ 4]   4.00-5.00   sec  5.28 MBytes  44.3 Mbits/sec    0   5.66 KBytes
> [ 4]   5.00-6.00   sec  4.35 MBytes  36.5 Mbits/sec    0   5.66 KBytes
> [ 4]   6.00-7.00   sec  4.04 MBytes  33.9 Mbits/sec    0   5.66 KBytes
> [ 4]   7.00-8.00   sec  4.35 MBytes  36.5 Mbits/sec    0   5.66 KBytes
> [ 4]   8.00-9.00   sec  4.04 MBytes  33.9 Mbits/sec    0   5.66 KBytes
> [ 4]   9.00-10.00  sec  4.04 MBytes  33.9 Mbits/sec    0   5.66 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bandwidth      Retr
> [ 4]   0.00-10.00  sec  61.0 MBytes  51.2 Mbits/sec    0   sender
> [ 4]   0.00-10.00  sec  60.3 MBytes  50.6 Mbits/sec        receiver
>
> iperf Done.
> -----
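On the counter question in 2) above: the usual starting point is the debug CLI via vppctl. A sketch of the kind of loop that narrows things down (vppctl is assumed to be on the path; interface names will differ per setup):

    sudo vppctl clear errors
    sudo vppctl clear interfaces
    # run the iperf test, then:
    sudo vppctl show errors
    sudo vppctl show interface
    sudo vppctl show vhost-user

"show errors" gives per-node error/drop counters and "show interface" gives per-interface rx/tx and drop counts, which is usually enough to see which side is losing packets.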
>
> -----Original Message-----
> From: Steven Luong (sluong) [mailto:slu...@cisco.com]
> Sent: Thursday, April 20, 2017 3:05 PM
> To: Ernst, Eric <eric.er...@intel.com>; Billy McFall <bmcf...@redhat.com>
> Cc: Damjan Marion (damarion) <damar...@cisco.com>; vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] Connectivity issue when using vhost-user on 17.04?
>
> Eric,
>
> As a first step, please share the output of iperf3 to see how many retransmissions you have for the run. From VPP, please collect "show errors" to see if vhost drops anything. As an additional data point for comparison, please also try disabling vhost coalesce to see if you get better results, by adding the following configuration to /etc/vpp/startup.conf:
>
> vhost-user {
>   coalesce-frames 0
> }
>
> Steven
>
> On 4/20/17, 2:19 PM, "vpp-dev-boun...@lists.fd.io on behalf of Ernst, Eric" <vpp-dev-boun...@lists.fd.io on behalf of eric.er...@intel.com> wrote:
>
> Thanks Billy - it was through some examples that I had found that I ended up grabbing that. I reinstalled 17.04 and can verify connectivity after removing the evil feature-mask.
>
> Thanks for the quick feedback, Damjan. If we could only go back in time!
>
> Now if I could just figure out why I'm getting capped bandwidth (via iperf) of ~45 Mbps between two VMs on the same socket on a Sandy Bridge Xeon, I will be really happy! If anyone has suggestions on debug methods for this, it'd be appreciated. I see a huge difference when switching to OVS vhost-user, keeping all else the same.
>
> --Eric
>
>
> On Thu, Apr 20, 2017 at 04:29:23PM -0400, Billy McFall wrote:
>> The vHost examples on the Wiki used a feature-mask of 0xFF. I think that is how it got propagated. In 16.09, when I did the CLI documentation for vHost, I expanded what the bits meant and used feature-mask 0x40400000 as the example. I will gladly add a comment indicating that the recommended use is to leave it blank, if this was intended to be debug-only.
>>
>> https://docs.fd.io/vpp/17.07/clicmd_src_vnet_devices_virtio.html
>>
>> Billy
>>
>> On Thu, Apr 20, 2017 at 4:17 PM, Damjan Marion (damarion) <damar...@cisco.com> wrote:
>>
>>> Eric,
>>>
>>> A long time ago (I think 3+ years), when I wrote the original vhost-user driver in vpp, I added the feature-mask knob to the CLI, which messes with the feature bitmap, purely for debugging reasons.
>>>
>>> And I have regretted it many times…
>>>
>>> Somebody dug it out and documented it somewhere, for reasons unknown to me. Now it spreads like a virus and I cannot stop it :)
>>>
>>> So please don’t use it, it is evil….
>>>
>>> Thanks,
>>>
>>> Damjan
>>>
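In concrete terms, creating the vhost-user interface without any feature-mask looks roughly like the following. This is only a sketch: the socket path, interface name and bridge-domain id are placeholders, and the gist linked earlier in the thread (plus the clicmd docs above) has the authoritative commands.

    create vhost-user socket /tmp/vhost-user-1.sock server
    set interface state VirtualEthernet0/0/0 up
    set interface l2 bridge VirtualEthernet0/0/0 1

In other words, simply omit the feature-mask argument rather than trying to pick a "good" value for it.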
>>>> On 20 Apr 2017, at 20:49, Ernst, Eric <eric.er...@intel.com> wrote:
>>>>
>>>> All,
>>>>
>>>> After updating the startup.conf to not reference DPDK, per the direction in the release notification thread, I was able to start up vpp and create interfaces.
>>>>
>>>> Now that I'm testing, I noticed that I can no longer ping between VMs which make use of vhost-user interfaces and are connected via an l2 bridge domain (or l2 xconnect). I double-checked, then reverted back to 17.01, where I could again verify connectivity between the guests.
>>>>
>>>> Anyone else seeing this, or was there a change in how this should be set up? For reference, I have my (simple) setup described in a gist at [1].
>>>>
>>>> Thanks,
>>>> eric
>>>>
>>>> [1] - https://gist.github.com/egernst/5982ae6f0590cd83330faafacc3fd545
>>
>> --
>> *Billy McFall*
>> SDN Group
>> Office of Technology
>> *Red Hat*

_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev