Some preliminary observations concerning the crashes in the proxy example:
* !rb_tree_is_init(...) assertion failures are likely caused by
multiple active_open_connected_callback() invocations for the same
connection
* f_update_ooo_deq() SIGSEGV crash is possibly caused by late callbacks
for connections that are already gone
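
To make the two hypotheses above concrete, here is a minimal, self-contained sketch of a guard that rejects both duplicate "connected" notifications and late callbacks for connections that are already gone. It is not VPP code: every name in it (demo_pool, demo_handle_t, demo_connect, demo_close, demo_connected_cb) is hypothetical, and it only assumes that connection slots are reused and that callbacks carry a handle taken when the connect was started.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All names here are hypothetical; this is not the VPP session API. */
#define DEMO_POOL_SIZE 4

typedef enum { DEMO_FREE, DEMO_CONNECTING, DEMO_CONNECTED } demo_state_t;

typedef struct
{
  demo_state_t state;
  uint32_t generation; /* bumped every time the slot is freed */
} demo_conn_t;

/* Handle passed to callbacks: slot index plus the generation seen at connect. */
typedef struct
{
  uint32_t index;
  uint32_t generation;
} demo_handle_t;

static demo_conn_t demo_pool[DEMO_POOL_SIZE];

/* Start an active open on a free slot and return the handle for callbacks. */
static demo_handle_t
demo_connect (uint32_t index)
{
  assert (index < DEMO_POOL_SIZE);
  assert (demo_pool[index].state == DEMO_FREE);
  demo_pool[index].state = DEMO_CONNECTING;
  return (demo_handle_t){ index, demo_pool[index].generation };
}

/* Free the slot; every handle handed out before this point becomes stale. */
static void
demo_close (uint32_t index)
{
  assert (index < DEMO_POOL_SIZE);
  demo_pool[index].state = DEMO_FREE;
  demo_pool[index].generation++;
}

/* "Connected" callback. Returns true only for the first notification that
 * refers to a live connection; duplicates and late callbacks are ignored. */
static bool
demo_connected_cb (demo_handle_t h)
{
  if (h.index >= DEMO_POOL_SIZE)
    return false;

  demo_conn_t *c = &demo_pool[h.index];

  if (c->generation != h.generation)
    return false; /* late callback: the slot was freed (and maybe reused) */
  if (c->state != DEMO_CONNECTING)
    return false; /* duplicate callback: setup already ran once */

  c->state = DEMO_CONNECTED;
  /* One-time per-connection setup would go here. */
  return true;
}
```

With a guard like this, a second demo_connected_cb() call for the same handle returns false instead of re-running one-time setup, and a call made after demo_close() returns false instead of touching freed state. Logging those rejected calls in the proxy would be one way to confirm or rule out the two hypotheses.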

On Wed, Jul 22, 2020 at 2:18 PM Ivan Shvedunov via lists.fd.io <ivan4th=
gmail....@lists.fd.io> wrote:

> Hi,
> sadly the patch apparently didn't work. It should have worked but for some
> reason it didn't ...
>
> On the bright side, I've made a test case [1] using fresh upstream VPP
> code with no UPF that reproduces the issues I mentioned, including both
> the timer and the TCP retransmit ones, along with some other possible
> problems. It uses the http_static plugin and the proxy example, together
> with nginx (with proxy) and wrk.
>
> It is docker-based, but the main scripts (start.sh and test.sh) can be
> used without Docker, too.
> I've used our own Dockerfiles to build the images, but I'm not sure if
> that makes any difference.
> I've added some log files resulting from the runs that crashed in
> different places. For me, the tests crash on each run, but in different
> places.
>
> The TCP retransmit problem happens with http_static when using the debug
> build. With the release build, an unrelated crash in timer_remove()
> happens instead.
> The SVM FIFO crash happens when using the proxy. It can happen with both
> release and debug builds.
>
> Please see the repo [1] for details and crash logs.
>
> [1] https://github.com/ivan4th/vpp-tcp-test
>
> P.S. As the tests do expose some problems with VPP host stack and some of
> the VPP plugins/examples, maybe we should consider adding them to the VPP
> CI, too?
>
> On Thu, Jul 16, 2020 at 8:33 PM Florin Coras <fcoras.li...@gmail.com>
> wrote:
>
>> Hi Ivan,
>>
>> Thanks for the detailed report!
>>
>> I assume this is a situation where most of the connections time out and
>> the rate limiting we apply on the pending timer queue delays handling for
>> long enough to be in a situation like the one you described. Here’s a draft
>> patch that starts tracking pending timers [1]. Let me know if it solves the
>> first problem.
>>
>> Regarding the second, it looks like the first chunk in the fifo is not
>> properly initialized/corrupted. It’s hard to tell what leads to that given
>> that I haven’t seen this sort of issue even with a larger number of
>> connections. You could maybe try calling svm_fifo_is_sane() in the
>> enqueue/dequeue functions, or after the proxy allocates/shares the fifos to
>> catch the issue as early as possible.
>>
>> Regards,
>> Florin
>>
>> [1] https://gerrit.fd.io/r/c/vpp/+/27952
>>
>> On Jul 16, 2020, at 2:03 AM, ivan...@gmail.com wrote:
>>
>>   Hi,
>>   I'm working on the Travelping UPF project
>> https://github.com/travelping/vpp
>> For a variety of reasons, it's presently maintained as a fork of VPP that's
>> rebased on top of upstream master from time to time, but really it's just a
>> plugin. During a 40K TCP connection test with netem, I found an issue with
>> a TCP timer race (timers firing after tcp_timer_reset() was called for
>> them), which I tried to work around only to stumble into another crash,
>> which I'm presently debugging (an SVM FIFO bug, possibly), but maybe some
>> of you folks have some ideas what it could be.
>>   I've described my findings in this JIRA ticket:
>> https://jira.fd.io/browse/VPP-1923
>>   Although the last upstream commit UPF is presently based on
>> (afc233aa93c3f23b30b756cb4ae2967f968bbbb1)
>> was some time ago, I believe the problems are still relevant as there
>> were no changes in these parts of code in master since that commit.
>>
>>
>>
>
> --
> Ivan Shvedunov <ivan...@gmail.com>
> ;; My GPG fingerprint is: 2E61 0748 8E12 BB1A 5AB9  F7D0 613E C0F8 0BC5
> 2807
> 
>


-- 
Ivan Shvedunov <ivan...@gmail.com>
;; My GPG fingerprint is: 2E61 0748 8E12 BB1A 5AB9  F7D0 613E C0F8 0BC5 2807