Paul,

This is an excellent catch, thanks!! I will give it a go in test-debug...

--a

> On 28 May 2020, at 16:15, Paul Vinciguerra <pvi...@vinciconsulting.com> wrote:
> 
> 
> A few weeks back, I became aware of the following issue with the LISP tests:
> 
> /vpp/build-root/install-vpp_debug-native/vpp/bin/vpp[63989]: vnet_lisp_add_del_locator_set:2140: Can't delete a locator that supports a mapping!
> /vpp/build-root/install-vpp_debug-native/vpp/bin/vpp[63989]: received signal SIGSEGV, PC 0xa0a0a00, faulting address 0xa0a0a00
> 14:32:44,290 Child test runner process unresponsive and core-file exists in test temporary directory (last test running was `Test case for basic encapsulation' in `/tmp/vpp-unittest-TestLisp-wtuvdu4q')!
> 
> which seems to be triggered by the api trace commands in tearDown in 
> framework.py.  I see it while running tests in a docker container.
> 
> 
> 
> On Thu, May 28, 2020 at 4:51 AM Andrew Yourtchenko <ayour...@gmail.com> wrote:
>> Hi Elias,
>> 
>> Yeah, it all does point to something like uninitialized data. Yesterday I 
>> ran the tests on two different machines for a while, apparently without 
>> hitting the issue...
>> 
>> The CI runtime environment is much more dynamic: it’s an ephemeral Docker 
>> container that is orchestrated by Nomad and destroyed after the job 
>> is run.
>> 
>> Could you push as a separate change the code that reliably gives you the 
>> error in the LISP unit test in the CI, and let me know the change# ?
>> 
>> 
>> I will then test some tooling enhancement ideas I have had for a while - to 
>> check within the job whether the core exists, and if it does, to load it 
>> into gdb and do some scripted processing of it and output the results... 
>> (Iterate over the call stack and issue stuff like ‘info locals’, ‘info 
>> regs’, etc).
>> 
>> I did some experiments with that approach earlier and it seemed like a 
>> rather scalable technique for most of the issues, which should also save 
>> disk space and the developer time ...
>> 
>> --a
>> 
>> > On 28 May 2020, at 10:33, Elias Rudberg <elias.rudb...@bahnhof.net> wrote:
>> > 
>> > Hi Andrew,
>> > 
>> > In my case it failed several times and appeared to be triggered by
>> > seemingly harmless code changes, but it seemed like the problem was
>> > reproducible for a given version of the code. What seemed to matter was
>> > when I changed things related to local variables inside the
>> > set_ipfix_exporter_command_fn() function. The test logs said "Core-file 
>> > exists" which I suppose means that vpp crashed. The testing framework
>> > repeats the test several times, saying "3 attempt(s) left", then "2
>> > attempt(s) left" and so on; all those repeated attempts seemed to crash
>> > in the same way.
>> > 
>> > It could be something with uninitialized variables, e.g. something that
>> > is assumed to be zero but is never explicitly initialized so it can
>> > work when it happens to be zero but depending on platform and compiler
>> > details there could be some garbage there causing a problem. Then
>> > unrelated code changes like adding variables somewhere making things
>> > end up at slightly different memory locations could make the error come
>> > and go. This is just guessing of course.
>> > 
>> > Is it possible to get login access to the machine where the
>> > gerrit/jenkins tests are run, to debug it there where the issue can be
>> > reproduced?
>> > 
>> > / Elias
>> > 
>> > 
>> >> On Wed, 2020-05-27 at 19:03 +0200, Andrew 👽 Yourtchenko wrote:
>> >> Yep, so it looks like we have an issue...
>> >> 
>> >> https://gerrit.fd.io/r/c/vpp/+/27305 has the same failures, I am
>> >> rerunning it now to see how intermittent it is - as well as testing
>> >> the latest master locally....
>> >> 
>> >> --a
>> >> 
>> >>> On 27 May 2020, at 18:56, Elias Rudberg <elias.rudb...@bahnhof.net>
>> >>> wrote:
>> >>> 
>> >>> Hi Andrew,
>> >>> 
>> >>> Yes, it was Basic LISP test. It looked like this in the
>> >>> console.log.gz
>> >>> for vpp-verify-master-ubuntu1804:
>> >>> 
>> >>> ==============================================================================
>> >>> TEST RESULTS:
>> >>>    Scheduled tests: 1177
>> >>>     Executed tests: 1176
>> >>>       Passed tests: 1039
>> >>>      Skipped tests: 137
>> >>> Not Executed tests: 1
>> >>>             Errors: 1
>> >>> FAILURES AND ERRORS IN TESTS:
>> >>> Testcase name: Basic LISP test 
>> >>>     ERROR: Test case for basic encapsulation
>> >>> [test_lisp.TestLisp.test_lisp_basic_encap]
>> >>> TESTCASES WHERE NO TESTS WERE SUCCESSFULLY EXECUTED:
>> >>> Basic LISP test 
>> >>> ==============================================================================
>> >>> 
>> >>> / Elias
>> >>> 
>> >>> 
>> >>> 
>> >>> On Wed, 2020-05-27 at 18:42 +0200, Andrew 👽 Yourtchenko wrote:
>> >>>> Basic LISP test - was it the one that was failing for you ?
>> >>>> That particular test intermittently failed a couple of times for
>> >>>> me
>> >>>> as well, on a doc-only change, so we have an unrelated issue.
>> >>>> I am running it locally to see what is going on.
>> >>>> --a
>> 
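
The scripted core-file processing described in the thread (check whether a core exists in the test directory and, if so, load it into gdb and dump the call stack, locals and registers) could be sketched roughly as follows. This is only a minimal illustration, not the tooling that was actually built: the function names, the assumption that core files are named "core*" and sit directly in the test temporary directory, and the exact set of gdb commands are all hypothetical. It relies only on gdb's standard --batch and -ex options; "bt full" prints the local variables of every frame, and "info registers" shows the registers of the innermost frame only.

```python
import glob
import os
import subprocess


def find_core_files(test_dir):
    """Return sorted paths of regular files named 'core*' directly in test_dir.

    The 'core*' naming is an assumption; adjust it to the system's core_pattern.
    """
    pattern = os.path.join(test_dir, "core*")
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


def build_gdb_command(vpp_binary, core_file):
    """Build a non-interactive gdb command line for one core file."""
    return [
        "gdb",
        "--batch",                           # run the commands, then exit
        "-ex", "thread apply all bt full",   # call stacks with locals per frame
        "-ex", "info registers",             # registers of the innermost frame
        vpp_binary,
        core_file,
    ]


def process_cores(test_dir, vpp_binary):
    """Run gdb on every core found; return a dict of core path -> gdb output."""
    results = {}
    for core in find_core_files(test_dir):
        proc = subprocess.run(
            build_gdb_command(vpp_binary, core),
            capture_output=True,
            text=True,
        )
        results[core] = proc.stdout + proc.stderr
    return results
```

Calling process_cores() with a test temporary directory and the path to the vpp binary would return the gdb output for each core found, which a CI job could print to its console log instead of archiving the core file itself.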