Some preliminary observations concerning the crashes in the proxy example:

* !rb_tree_is_init(...) assertion failures are likely caused by multiple active_open_connected_callback() invocations for the same connection.
* The f_update_ooo_deq() SIGSEGV crash is possibly caused by late callbacks for connections that are already gone.
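If both hypotheses hold, one way to harden the proxy against them is to make the connected callback idempotent and to validate the connection handle before touching any per-connection state. The C sketch below assumes a hypothetical session table in which every handle carries a generation counter. proxy_session_t, session_alloc(), session_free() and on_active_open_connected() are illustrative names only, not the VPP proxy example's code or any VPP API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical proxy session table: NOT the data structures used by
   the VPP proxy example, just an illustration of the guard logic. */
#define MAX_SESSIONS 64

typedef struct
{
  uint32_t generation; /* bumped every time the slot is freed */
  int in_use;          /* slot currently allocated */
  int connected;       /* set once the connected callback has run */
} proxy_session_t;

static proxy_session_t sessions[MAX_SESSIONS];

/* A handle packs the slot index (low 16 bits) and generation (high 16 bits). */
static uint32_t
session_handle (uint32_t index, uint32_t generation)
{
  return ((generation & 0xffff) << 16) | (index & 0xffff);
}

static uint32_t
session_alloc (uint32_t index)
{
  assert (index < MAX_SESSIONS && !sessions[index].in_use);
  sessions[index].in_use = 1;
  sessions[index].connected = 0;
  return session_handle (index, sessions[index].generation);
}

static void
session_free (uint32_t index)
{
  assert (index < MAX_SESSIONS && sessions[index].in_use);
  sessions[index].in_use = 0;
  sessions[index].generation++; /* invalidates all outstanding handles */
}

/* Returns the session only if the handle still refers to a live slot. */
static proxy_session_t *
session_get_if_valid (uint32_t handle)
{
  uint32_t index = handle & 0xffff;
  uint32_t generation = handle >> 16;
  proxy_session_t *s;

  if (index >= MAX_SESSIONS)
    return 0;
  s = &sessions[index];
  if (!s->in_use || (s->generation & 0xffff) != generation)
    return 0;
  return s;
}

/* Hypothetical connected callback. Returns 0 if the connection was set up,
   -1 if the call was ignored as late (session gone) or duplicate. */
static int
on_active_open_connected (uint32_t handle)
{
  proxy_session_t *s = session_get_if_valid (handle);

  if (!s)
    return -1; /* late callback: the connection is already gone */
  if (s->connected)
    return -1; /* duplicate callback for the same connection */
  s->connected = 1;
  /* One-time per-connection setup (e.g. sharing fifos) would go here. */
  return 0;
}
```

With this guard, calling on_active_open_connected() twice with the same handle performs the setup only once, and a call arriving after session_free() is rejected because the slot's generation no longer matches the one encoded in the handle.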
On Wed, Jul 22, 2020 at 2:18 PM Ivan Shvedunov via lists.fd.io <ivan4th= gmail....@lists.fd.io> wrote:
> Hi,
> sadly the patch apparently didn't work. It should have worked, but for some
> reason it didn't...
>
> On the bright side, I've made a test case [1] using fresh upstream VPP
> code with no UPF that reproduces the issues I mentioned, including both
> the timer and the TCP retransmit ones, along with some other possible
> problems, using the http_static plugin and the proxy example together with
> nginx (with proxy) and wrk.
>
> It is Docker-based, but the main scripts (start.sh and test.sh) can be
> used without Docker, too.
> I've used our own Dockerfiles to build the images, but I'm not sure if
> that makes any difference.
> I've added some log files from the runs that crashed in different places.
> For me, the tests crash on every run, but in different places.
>
> The TCP retransmit problem happens with http_static when using the debug
> build. When using the release build, an unrelated crash in timer_remove()
> happens instead.
> The SVM FIFO crash happens when using the proxy. It can happen with both
> release and debug builds.
>
> Please see the repo [1] for details and crash logs.
>
> [1] https://github.com/ivan4th/vpp-tcp-test
>
> P.S. As the tests do expose some problems with the VPP host stack and some
> of the VPP plugins/examples, maybe we should consider adding them to the
> VPP CI, too?
>
> On Thu, Jul 16, 2020 at 8:33 PM Florin Coras <fcoras.li...@gmail.com>
> wrote:
>
>> Hi Ivan,
>>
>> Thanks for the detailed report!
>>
>> I assume this is a situation where most of the connections time out and
>> the rate limiting we apply on the pending timer queue delays handling for
>> long enough to be in a situation like the one you described. Here's a draft
>> patch that starts tracking pending timers [1]. Let me know if it solves the
>> first problem.
>>
>> Regarding the second, it looks like the first chunk in the fifo is not
>> properly initialized or is corrupted. It's hard to tell what leads to that,
>> given that I haven't seen this sort of issue even with a larger number of
>> connections. You could maybe try calling svm_fifo_is_sane() in the
>> enqueue/dequeue functions, or after the proxy allocates/shares the fifos,
>> to catch the issue as early as possible.
>>
>> Regards,
>> Florin
>>
>> [1] https://gerrit.fd.io/r/c/vpp/+/27952
>>
>> On Jul 16, 2020, at 2:03 AM, ivan...@gmail.com wrote:
>>
>> Hi,
>> I'm working on the Travelping UPF project,
>> https://github.com/travelping/vpp. For a variety of reasons, it's
>> presently maintained as a fork of VPP that's rebased on top of upstream
>> master from time to time, but really it's just a plugin. During a 40K TCP
>> connection test with netem, I found an issue with a TCP timer race (timers
>> firing after tcp_timer_reset() was called for them), which I tried to work
>> around, only to stumble into another crash, which I'm presently debugging
>> (an SVM FIFO bug, possibly), but maybe some of you folks have some ideas
>> about what it could be.
>> I've described my findings in this JIRA ticket:
>> https://jira.fd.io/browse/VPP-1923
>> Although the last upstream commit UPF is presently based on
>> (afc233aa93c3f23b30b756cb4ae2967f968bbbb1) was some time ago, I believe
>> the problems are still relevant, as there have been no changes in these
>> parts of the code in master since that commit.

--
Ivan Shvedunov <ivan...@gmail.com>
;; My GPG fingerprint is: 2E61 0748 8E12 BB1A 5AB9 F7D0 613E C0F8 0BC5 2807
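The timer race described in the quoted report (timers firing after tcp_timer_reset() was called for them) is typical when expirations are queued and handled later, as with the rate-limited pending timer queue Florin mentions. A common defense, sketched below in C, is to record a per-timer sequence number when an expiration is queued and to drop it at dispatch time if the timer was reset or re-armed in the meantime. This is not necessarily what the draft patch does; conn_timer_t, conn_timer_set(), conn_timer_reset(), timer_expired() and dispatch_pending() are hypothetical names, not VPP's tcp_timer API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-connection timer state, not VPP's tcp_timer API. */
#define MAX_CONNS   16
#define MAX_PENDING 64

typedef struct
{
  uint32_t seq;    /* incremented on every set and reset */
  int armed;       /* timer currently armed */
  int fired_count; /* expirations that were actually handled */
} conn_timer_t;

typedef struct
{
  uint32_t conn_index; /* index into timers[] rather than a raw pointer */
  uint32_t seq;        /* timer seq captured when the expiration was queued */
} pending_expiry_t;

static conn_timer_t timers[MAX_CONNS];
static pending_expiry_t pending[MAX_PENDING];
static int n_pending;

static void
conn_timer_set (uint32_t ci)
{
  timers[ci].seq++;
  timers[ci].armed = 1;
}

static void
conn_timer_reset (uint32_t ci)
{
  timers[ci].seq++; /* any expiration already queued becomes stale */
  timers[ci].armed = 0;
}

/* Called when the timer wheel reports an expiration. Handling is deferred
   (e.g. rate limited), so the entry may wait in the queue for a while. */
static void
timer_expired (uint32_t ci)
{
  assert (ci < MAX_CONNS && n_pending < MAX_PENDING);
  pending[n_pending].conn_index = ci;
  pending[n_pending].seq = timers[ci].seq;
  n_pending++;
}

/* Drains the queue, skipping expirations for timers that were reset or
   re-armed after the expiration was queued. */
static void
dispatch_pending (void)
{
  for (int i = 0; i < n_pending; i++)
    {
      conn_timer_t *t = &timers[pending[i].conn_index];
      if (!t->armed || t->seq != pending[i].seq)
        continue; /* stale expiration: ignore it */
      t->armed = 0;
      t->fired_count++; /* real code would run the timer handler here */
    }
  n_pending = 0;
}
```

In this sketch, a reset that lands between the expiration and the dispatch simply turns the queued entry into a no-op, so no handler runs for a timer the connection believes it has already cancelled.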