Ouch. Attempting to run “make test-debug” resulted in a lot of unrelated sadness... I will do some more testing to ensure it’s not a PEBKAC...
--a

> On 28 May 2020, at 21:25, Andrew Yourtchenko via lists.fd.io
> <ayourtch=gmail....@lists.fd.io> wrote:
>
> Paul,
>
> This is an excellent catch, thanks!! I will give it a go in test-debug...
>
> --a
>
>>> On 28 May 2020, at 16:15, Paul Vinciguerra <pvi...@vinciconsulting.com>
>>> wrote:
>>>
>>
>> A few weeks back, I became aware of the following issue with the LISP tests:
>>
>> /vpp/build-root/install-vpp_debug-native/vpp/bin/vpp[63989]:
>> vnet_lisp_add_del_locator_set:2140: Can't delete a locator that supports a
>> mapping!
>> /vpp/build-root/install-vpp_debug-native/vpp/bin/vpp[63989]: received signal
>> SIGSEGV, PC 0xa0a0a00, faulting address 0xa0a0a00
>> 14:32:44,290 Child test runner process unresponsive and core-file exists in
>> test temporary directory (last test running was `Test case for basic
>> encapsulation' in `/tmp/vpp-unittest-TestLisp-wtuvdu4q')!
>>
>> which seems to be triggered by the api trace commands in tearDown in
>> framework.py. I see it while running tests in a docker container.
>>
>>
>>
>> On Thu, May 28, 2020 at 4:51 AM Andrew Yourtchenko <ayour...@gmail.com>
>> wrote:
>>> Hi Elias,
>>>
>>> Yeah it all does point to something like uninitialized data - I ran
>>> yesterday the tests on two different machines for a while, apparently
>>> without the issues...
>>>
>>> The CI runtime environment is much more dynamic - it’s an ephemeral docker
>>> container that is orchestrated by the nomad and is destroyed after the job
>>> is run.
>>>
>>> Could you push as a separate change the code that reliably gives you the
>>> error in the LISP unit test in the CI, and let me know the change# ?
>>>
>>>
>>> I will then test some tooling enhancement ideas I had for a while - to
>>> check within the job whether the core exists, and if it does, to load it
>>> into gdb and do some scripted processing of it and output the results...
>>> (Iterate over the call stack and issue stuff like ‘info locals’, ‘info
>>> regs’, etc).
>>>
>>> I did some experiments with that approach earlier and it seemed like a
>>> rather scalable technique for most of the issues, which should also save
>>> disk space and the developer time ...
>>>
>>> --a
>>>
>>> > On 28 May 2020, at 10:33, Elias Rudberg <elias.rudb...@bahnhof.net> wrote:
>>> >
>>> > Hi Andrew,
>>> >
>>> > In my case it failed several times and appeared to be triggered by
>>> > seemingly harmless code changes, but it seemed like the problem was
>>> > reproducible for a given version of the code. What seemed to matter was
>>> > when I changed things related to local variables inside the
>>> > set_ipfix_exporter_command_fn() function. The test logs said "Core-file
>>> > exists" which I suppose means that vpp crashed. The testing framework
>>> > repeats the test several times, saying "3 attempt(s) left", then "2
>>> > attempt(s) left" and so on, all those repeated attempts seemed to crash
>>> > in the same way.
>>> >
>>> > It could be something with uninitialized variables, e.g. something that
>>> > is assumed to be zero but is never explicitly initialized so it can
>>> > work when it happens to be zero but depending on platform and compiler
>>> > details there could be some garbage there causing a problem. Then
>>> > unrelated code changes like adding variables somewhere making things
>>> > end up at slightly different memory locations could make the error come
>>> > and go. This is just guessing of course.
>>> >
>>> > Is it possible to get login access to the machine where the
>>> > gerrit/jenkins tests are run, to debug it there where the issue can be
>>> > reproduced?
>>> >
>>> > / Elias
>>> >
>>> >
>>> >> On Wed, 2020-05-27 at 19:03 +0200, Andrew 👽 Yourtchenko wrote:
>>> >> Yep, so it looks like we have an issue...
>>> >>
>>> >> https://gerrit.fd.io/r/c/vpp/+/27305 has the same failures, I am
>>> >> rerunning it now to see how intermittent it is - as well as testing
>>> >> the latest master locally....
>>> >>
>>> >> --a
>>> >>
>>> >>> On 27 May 2020, at 18:56, Elias Rudberg <elias.rudb...@bahnhof.net>
>>> >>> wrote:
>>> >>>
>>> >>> Hi Andrew,
>>> >>>
>>> >>> Yes, it was Basic LISP test. It looked like this in the
>>> >>> console.log.gz
>>> >>> for vpp-verify-master-ubuntu1804:
>>> >>>
>>> >>> ===================================================================
>>> >>> ====
>>> >>> =======
>>> >>> TEST RESULTS:
>>> >>> Scheduled tests: 1177
>>> >>> Executed tests: 1176
>>> >>> Passed tests: 1039
>>> >>> Skipped tests: 137
>>> >>> Not Executed tests: 1
>>> >>> Errors: 1
>>> >>> FAILURES AND ERRORS IN TESTS:
>>> >>> Testcase name: Basic LISP test
>>> >>> ERROR: Test case for basic encapsulation
>>> >>> [test_lisp.TestLisp.test_lisp_basic_encap]
>>> >>> TESTCASES WHERE NO TESTS WERE SUCCESSFULLY EXECUTED:
>>> >>> Basic LISP test
>>> >>> ===================================================================
>>> >>> ====
>>> >>> =======
>>> >>>
>>> >>> / Elias
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Wed, 2020-05-27 at 18:42 +0200, Andrew 👽 Yourtchenko wrote:
>>> >>>> Basic LISP test - was it the one that was failing for you ?
>>> >>>> That particular test intermittently failed a couple of times for
>>> >>>> me
>>> >>>> as well, on a doc-only change, so we have an unrelated issue.
>>> >>>> I am running it locally to see what is going on.
>>> >>>> --a
>>> >
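
For reference, the scripted core processing idea quoted above (check whether a core exists in the test directory, load it into gdb, dump the stack, locals and registers) could look roughly like the sketch below. This is only an illustration, not the actual CI tooling: the function names are made up, the "core*" file name pattern is an assumption about what the crashed process leaves behind, and gdb's "bt full" (backtrace with the locals of every frame) stands in for iterating over the frames and issuing "info locals" by hand.

```python
import subprocess
from pathlib import Path

# Assumption: the crashed process leaves a file whose name starts with "core"
# somewhere under the test's temporary directory.
CORE_GLOB = "core*"

# "bt full" prints every frame together with its local variables;
# "info registers" dumps the registers of the selected frame.
GDB_COMMANDS = ["bt full", "info registers"]


def find_core_files(test_dir):
    """Return core-file candidates under test_dir, sorted by path."""
    return sorted(p for p in Path(test_dir).rglob(CORE_GLOB) if p.is_file())


def gdb_batch_argv(binary, core):
    """Build a non-interactive gdb invocation for one core file."""
    argv = ["gdb", "--batch"]
    for command in GDB_COMMANDS:
        argv += ["-ex", command]
    return argv + [str(binary), str(core)]


def report_cores(test_dir, binary):
    """Run gdb on every core found and return the collected text output."""
    reports = []
    for core in find_core_files(test_dir):
        result = subprocess.run(
            gdb_batch_argv(binary, core),
            capture_output=True,
            text=True,
            check=False,  # a failing gdb run should not abort the whole job
        )
        reports.append(result.stdout + result.stderr)
    return reports
```

A job step could call report_cores() with the test temporary directory and the path to the vpp binary, then print whatever comes back into the console log, so the core itself would not need to be archived.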