Paul,

This is an excellent catch, thanks!! I will give it a go in test-debug...

--a

> On 28 May 2020, at 16:15, Paul Vinciguerra <pvi...@vinciconsulting.com> wrote:
>
> A few weeks back, I became aware of the following issue with the LISP tests:
>
> /vpp/build-root/install-vpp_debug-native/vpp/bin/vpp[63989]: vnet_lisp_add_del_locator_set:2140: Can't delete a locator that supports a mapping!
> /vpp/build-root/install-vpp_debug-native/vpp/bin/vpp[63989]: received signal SIGSEGV, PC 0xa0a0a00, faulting address 0xa0a0a00
> 14:32:44,290 Child test runner process unresponsive and core-file exists in test temporary directory (last test running was `Test case for basic encapsulation' in `/tmp/vpp-unittest-TestLisp-wtuvdu4q')!
>
> which seems to be triggered by the api trace commands in tearDown in
> framework.py. I see it while running tests in a docker container.
>
> On Thu, May 28, 2020 at 4:51 AM Andrew Yourtchenko <ayour...@gmail.com> wrote:
>> Hi Elias,
>>
>> Yeah it all does point to something like uninitialized data - I ran
>> yesterday the tests on two different machines for a while, apparently
>> without the issues...
>>
>> The CI runtime environment is much more dynamic - it’s an ephemeral docker
>> container that is orchestrated by the nomad and is destroyed after the job
>> is run.
>>
>> Could you push as a separate change the code that reliably gives you the
>> error in the LISP unit test in the CI, and let me know the change# ?
>>
>> I will then test some tooling enhancement ideas I had for a while - to
>> check within the job whether the core exists, and if it does, to load it
>> into gdb and do some scripted processing of it and output the results...
>> (Iterate over the call stack and issue stuff like ‘info locals’, ‘info
>> regs’, etc).
>>
>> I did some experiments with that approach earlier and it seemed like a
>> rather scalable technique for most of the issues, which should also save
>> disk space and the developer time ...
>>
>> --a
>>
>> > On 28 May 2020, at 10:33, Elias Rudberg <elias.rudb...@bahnhof.net> wrote:
>> >
>> > Hi Andrew,
>> >
>> > In my case it failed several times and appeared to be triggered by
>> > seemingly harmless code changes, but it seemed like the problem was
>> > reproducible for a given version of the code. What seemed to matter was
>> > when I changed things related to local variables inside the
>> > set_ipfix_exporter_command_fn() function. The test logs said "Core-file
>> > exists" which I suppose means that vpp crashed. The testing framework
>> > repeats the test several times, saying "3 attempt(s) left", then "2
>> > attempt(s) left" and so on, all those repeated attempts seemed to crash
>> > in the same way.
>> >
>> > It could be something with uninitialized variables, e.g. something that
>> > is assumed to be zero but is never explicitly initialized so it can
>> > work when it happens to be zero but depending on platform and compiler
>> > details there could be some garbage there causing a problem. Then
>> > unrelated code changes like adding variables somewhere making things
>> > end up at slightly different memory locations could make the error come
>> > and go. This is just guessing of course.
>> >
>> > Is it possible to get login access to the machine where the
>> > gerrit/jenkins tests are run, to debug it there where the issue can be
>> > reproduced?
>> >
>> > / Elias
>> >
>> >> On Wed, 2020-05-27 at 19:03 +0200, Andrew 👽 Yourtchenko wrote:
>> >> Yep, so it looks like we have an issue...
>> >>
>> >> https://gerrit.fd.io/r/c/vpp/+/27305 has the same failures, I am
>> >> rerunning it now to see how intermittent it is - as well as testing
>> >> the latest master locally....
>> >>
>> >> --a
>> >>
>> >>> On 27 May 2020, at 18:56, Elias Rudberg <elias.rudb...@bahnhof.net> wrote:
>> >>>
>> >>> Hi Andrew,
>> >>>
>> >>> Yes, it was Basic LISP test. It looked like this in the console.log.gz
>> >>> for vpp-verify-master-ubuntu1804:
>> >>>
>> >>> ==============================================================================
>> >>> TEST RESULTS:
>> >>> Scheduled tests: 1177
>> >>> Executed tests: 1176
>> >>> Passed tests: 1039
>> >>> Skipped tests: 137
>> >>> Not Executed tests: 1
>> >>> Errors: 1
>> >>> FAILURES AND ERRORS IN TESTS:
>> >>> Testcase name: Basic LISP test
>> >>> ERROR: Test case for basic encapsulation [test_lisp.TestLisp.test_lisp_basic_encap]
>> >>> TESTCASES WHERE NO TESTS WERE SUCCESSFULLY EXECUTED:
>> >>> Basic LISP test
>> >>> ==============================================================================
>> >>>
>> >>> / Elias
>> >>>
>> >>> On Wed, 2020-05-27 at 18:42 +0200, Andrew 👽 Yourtchenko wrote:
>> >>>> Basic LISP test - was it the one that was failing for you ?
>> >>>> That particular test intermittently failed a couple of times for me
>> >>>> as well, on a doc-only change, so we have an unrelated issue.
>> >>>> I am running it locally to see what is going on.
>> >>>> --a
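
For reference, the tooling idea quoted above (check within the job whether a core exists and, if so, run scripted gdb commands over it) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the actual CI tooling: the helper names find_core, gdb_argv and report_core are hypothetical, the core* file naming depends on how kernel.core_pattern is set on the host, and "bt full" is used as a stand-in for iterating over the call stack and issuing 'info locals' in each frame.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional

# Commands gdb runs against the core, in order. "bt full" walks every frame of
# the call stack and prints its locals, standing in for a per-frame
# 'info locals'; "info registers" dumps the registers of the selected frame.
GDB_COMMANDS = ["bt full", "info registers"]


def find_core(test_dir: Path) -> Optional[Path]:
    """Return the first regular file named core* directly in test_dir, if any.

    Assumption: cores are written as core* in the test temporary directory.
    """
    if not test_dir.is_dir():
        return None
    cores = sorted(p for p in test_dir.glob("core*") if p.is_file())
    return cores[0] if cores else None


def gdb_argv(vpp_binary: Path, core: Path) -> List[str]:
    """Build a non-interactive gdb command line for one core file."""
    argv = ["gdb", "--batch"]
    for command in GDB_COMMANDS:
        argv += ["-ex", command]
    return argv + [str(vpp_binary), str(core)]


def report_core(vpp_binary: Path, test_dir: Path) -> Optional[str]:
    """Return gdb's combined output for the core in test_dir.

    Returns None when there is no core or gdb is not installed.
    """
    core = find_core(test_dir)
    if core is None or shutil.which("gdb") is None:
        return None
    result = subprocess.run(
        gdb_argv(vpp_binary, core),
        capture_output=True,
        text=True,
        check=False,  # gdb may exit non-zero on a damaged core; keep the output
    )
    return result.stdout + result.stderr
```

With the paths from the log above, a call would be report_core(Path("/vpp/build-root/install-vpp_debug-native/vpp/bin/vpp"), Path("/tmp/vpp-unittest-TestLisp-wtuvdu4q")), and the returned text could be printed into the job console before the container is destroyed.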