Concerning the CI: I'd be glad to add that test to "make test", but I'm not
sure how to approach it. The test is not really about containers; it's more
about using network namespaces and tools like wrk to create a lot of TCP
connections and "stress test" the VPP host stack (and, as was noted, it
fails not only with the proxy example but also with the http_static
plugin). It's probably doable without any external tooling at all, and even
without network namespaces, using only VPP's own TCP stack, but that is
probably rather hard. Could you suggest some ideas on how it could be
added to "make test"? Should I add a `test_....py` under `tests/` that
creates host interfaces in VPP and uses them via OS networking instead of
the packet generator? As far as I can see, there's something like that in
the srv6-mobile plugin [1].

[1]
https://github.com/travelping/vpp/blob/feature/2005/upf/src/plugins/srv6-mobile/extra/runner.py#L125
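
To make the host-interface idea above a bit more concrete, here is a rough
sketch of the setup such a test might need. It only builds the command lines
(running them needs root, plus VPP and wrk installed) and does not touch the
"make test" framework at all; how the CLI lines would be handed to VPP from a
test case is left open. The namespace name, interface names and addresses are
made up for illustration, and the URL assumes an HTTP server (for example
http_static) listening on the default port in VPP.

```python
import shlex

# Hypothetical names and addresses, chosen only for this sketch.
NS = "vpp-client"          # network namespace that runs the load generator
HOST_IF = "vpp-veth0"      # veth end that VPP attaches to as a host interface
NS_IF = "client-veth0"     # veth end moved into the client namespace
VPP_ADDR = "10.10.1.1"
CLIENT_ADDR = "10.10.1.2"
PREFIX_LEN = 24


def linux_setup_cmds():
    """Linux commands that create the namespace and the veth pair."""
    in_ns = ["ip", "netns", "exec", NS]
    return [
        ["ip", "netns", "add", NS],
        ["ip", "link", "add", HOST_IF, "type", "veth", "peer", "name", NS_IF],
        ["ip", "link", "set", NS_IF, "netns", NS],
        ["ip", "link", "set", HOST_IF, "up"],
        in_ns + ["ip", "addr", "add", f"{CLIENT_ADDR}/{PREFIX_LEN}", "dev", NS_IF],
        in_ns + ["ip", "link", "set", NS_IF, "up"],
    ]


def vpp_cli_cmds():
    """VPP CLI lines that attach VPP to the host side of the veth pair."""
    return [
        f"create host-interface name {HOST_IF}",
        f"set interface state host-{HOST_IF} up",
        f"set interface ip address host-{HOST_IF} {VPP_ADDR}/{PREFIX_LEN}",
    ]


def wrk_cmd(connections=1000, threads=4, duration="30s", path="/"):
    """wrk invocation, run inside the client namespace, targeting VPP."""
    return [
        "ip", "netns", "exec", NS,
        "wrk", "-t", str(threads), "-c", str(connections), "-d", duration,
        f"http://{VPP_ADDR}{path}",
    ]


def linux_teardown_cmds():
    """Deleting the namespace should also remove the veth pair inside it."""
    return [["ip", "netns", "delete", NS]]


if __name__ == "__main__":
    # Print everything instead of executing it, so the sketch is safe to run.
    for cmd in linux_setup_cmds():
        print(shlex.join(cmd))
    for line in vpp_cli_cmds():
        print("vppctl " + line)
    print(shlex.join(wrk_cmd(connections=40000, threads=8)))
    for cmd in linux_teardown_cmds():
        print(shlex.join(cmd))
```

A real test would pass the Linux commands to something like
subprocess.run(cmd, check=True) and would need to skip itself when not
running as root or when wrk is not available.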

On Wed, Jul 22, 2020 at 8:25 PM Florin Coras <fcoras.li...@gmail.com> wrote:

> I missed the point about the CI in my other reply. If we can somehow
> integrate some container based tests into the “make test” infra, I wouldn’t
> mind at all! :-)
>
> Regards,
> Florin
>
> On Jul 22, 2020, at 4:17 AM, Ivan Shvedunov <ivan...@gmail.com> wrote:
>
> Hi,
> sadly the patch apparently didn't work. It should have worked but for some
> reason it didn't ...
>
> On the bright side, I've made a test case [1] using fresh upstream VPP
> code with no UPF that reproduces the issues I mentioned, including both
> the timer and the TCP retransmit ones, along with some other possible
> problems, using the http_static plugin and the proxy example, along with
> nginx (with proxy) and wrk.
>
> It is docker-based, but the main scripts (start.sh and test.sh) can be
> used without Docker, too.
> I've used our own Dockerfiles to build the images, but I'm not sure if
> that makes any difference.
> I've added some log files resulting from the runs that crashed in
> different places. For me, the tests crash on each run, but in different
> places.
>
> The TCP retransmit problem happens with http_static when using the debug
> build. When using the release build, an unrelated crash in timer_remove()
> happens instead.
> The SVM FIFO crash happens when using the proxy. It can happen with both
> release and debug builds.
>
> Please see the repo [1] for details and crash logs.
>
> [1] https://github.com/ivan4th/vpp-tcp-test
>
> P.S. As the tests do expose some problems with VPP host stack and some of
> the VPP plugins/examples, maybe we should consider adding them to the VPP
> CI, too?
>
> On Thu, Jul 16, 2020 at 8:33 PM Florin Coras <fcoras.li...@gmail.com>
> wrote:
>
>> Hi Ivan,
>>
>> Thanks for the detailed report!
>>
>> I assume this is a situation where most of the connections time out and
>> the rate limiting we apply on the pending timer queue delays handling for
>> long enough to be in a situation like the one you described. Here’s a draft
>> patch that starts tracking pending timers [1]. Let me know if it solves the
>> first problem.
>>
>> Regarding the second, it looks like the first chunk in the fifo is not
>> properly initialized or is corrupted. It’s hard to tell what leads to that
>> given that I haven’t seen this sort of issue even with a larger number of
>> connections. You could maybe try calling svm_fifo_is_sane() in the
>> enqueue/dequeue functions, or after the proxy allocates/shares the fifos, to
>> catch the issue as early as possible.
>>
>> Regards,
>> Florin
>>
>> [1] https://gerrit.fd.io/r/c/vpp/+/27952
>>
>> On Jul 16, 2020, at 2:03 AM, ivan...@gmail.com wrote:
>>
>>   Hi,
>>   I'm working on the Travelping UPF project
>> <https://github.com/travelping/vpp>. For a
>> variety of reasons, it's presently maintained as a fork of VPP that's
>> rebased on top of upstream master from time to time, but really it's just a
>> plugin. During a 40K TCP connection test with netem, I found an issue with
>> a TCP timer race (timers firing after tcp_timer_reset() was called for them),
>> which I tried to work around only to stumble into another crash, which I'm
>> presently debugging (an SVM FIFO bug, possibly), but maybe some of you folks
>> have some ideas what it could be.
>>   I've described my findings in this JIRA ticket:
>> https://jira.fd.io/browse/VPP-1923
>>   Although the last upstream commit UPF is presently based on
>> (afc233aa93c3f23b30b756cb4ae2967f968bbbb1)
>> was some time ago, I believe the problems are still relevant as there
>> were no changes in these parts of code in master since that commit.
>> 
>>
>>
>>
>
> --
> Ivan Shvedunov <ivan...@gmail.com>
> ;; My GPG fingerprint is: 2E61 0748 8E12 BB1A 5AB9  F7D0 613E C0F8 0BC5
> 2807
>
>
>

-- 
Ivan Shvedunov <ivan...@gmail.com>
;; My GPG fingerprint is: 2E61 0748 8E12 BB1A 5AB9  F7D0 613E C0F8 0BC5 2807