Hi Dave,

Thanks a lot for the thorough analysis! I’d also really like to see item 1 fixed as soon as possible.
Cheers,
Florin

From: Dave Wallace <dwallac...@gmail.com>
Date: Friday, August 25, 2017 at 10:56 AM
To: "vpp-dev@lists.fd.io" <vpp-dev@lists.fd.io>
Cc: "Florin Coras (fcoras)" <fco...@cisco.com>, "csit-...@lists.fd.io" <csit-...@lists.fd.io>
Subject: Re: [csit-dev] make test python segfault in ubuntu 16.04

vpp-dev, Florin,

Below is an analysis of all of the failures that this patch encountered before finally passing. None of the failures were related in any way to the code changes in the patch.

In summary, there appear to be a number of different factors involved with these failures.

· Two failures appear to be caused by the run-time environment.
· An intermittent bug appears to exist in `L2BD Multi-instance test 5 - delete 5 BDs'
· The segfault shows lots of threads being run. Are tests being executed in parallel? If so, it would be interesting to serialize the tests to see if that fixes any of these issues.

I'm also seeing a variation in the order in which the "make test" tests are run (or at least in the order of the status reports). My understanding of the 'make test' python infrastructure is insufficient to make an intelligent guess as to whether this has any bearing on any of these failures.

I get more predictable result output when running make test locally on my own server, but the order of test output is different from that in the CI test runs. Locally, the order of tests appears to be the same between different runs of 'make test'. I have also not seen any of these errors on my server, which is running Ubuntu 17.04, although I have not done an endurance test either.

My recommendation based on this analysis is as follows:

1. The L2BD unit test issue should be investigated by the appropriate 'make test' experts.
2. The vpp-verify-master-centos7, vpp-verify-master-ubuntu1604, and vpp-test-debug-master-ubuntu1604 jobs should be run operationally in the Container PoC environment, with the rest of the jjb jobs run in the cloud infra.
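The three kinds of failure summarized above can be tallied mechanically from saved console logs. What follows is only a minimal sketch: the patterns are copied from the failure signatures quoted in this thread, while the category labels and the function names `classify` and `tally` are hypothetical and are not part of Jenkins or the 'make test' framework.

```python
import re
from collections import Counter

# Patterns are taken from the failure signatures quoted in this thread.
# The category labels are illustrative names, not anything Jenkins reports.
SIGNATURES = [
    ("python-segfault",
     re.compile(r"Fatal Python error: Segmentation fault")),
    ("test-runner-timeout",
     re.compile(r"Timeout while waiting for child test runner process")),
    ("jenkins-command-failed",
     re.compile(r"FATAL: command execution failed")),
]

# The timeout message names the test that was running, quoted as `name'.
LAST_TEST = re.compile(r"last test running was `([^']+)'")


def classify(console_text):
    """Return (category, last_test_or_None) for one console log."""
    for name, pattern in SIGNATURES:
        if pattern.search(console_text):
            match = LAST_TEST.search(console_text)
            return name, (match.group(1) if match else None)
    return "unknown", None


def tally(logs):
    """Count categories over a dict mapping build id -> console text."""
    return Counter(classify(text)[0] for text in logs.values())
```

Feeding it the console text of builds 6775, 6774, 6779 and 6781 would show whether the timeouts always name the same L2BD test, which is the pattern the analysis points at.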
Thanks,
-daw-

---- %< ----
[ From https://gerrit.fd.io/r/#/c/8133 ]

=> Container PoC Aug 24 8:36 PM
Patch Set 9: Build Successful
http://jenkins.ejkern.net:8080/job/vpp-docs-verify-master/1515/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-make-test-docs-verify-master/1512/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-centos7/1983/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1301/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-ubuntu1604/2022/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1695/ : SUCCESS

=> fd.io JJB Aug 24 9:19 PM
Patch Set 9: Verified-1 Build Failed
https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6775/ : FAILURE
Logs: https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1604/6775
Failure Signature:
  01:08:59 verify templates on IP6 datapath Fatal Python error: Segmentation fault
Comment: Python bug or resource starvation? Lots of threads running... Possibly due to bad environment/sick minion.
https://jenkins.fd.io/job/vpp-make-test-docs-verify-master/3098/ : SUCCESS
https://jenkins.fd.io/job/vpp-verify-master-centos7/6770/ : SUCCESS
https://jenkins.fd.io/job/vpp-csit-verify-virl-master/6781/ : SUCCESS
https://jenkins.fd.io/job/vpp-docs-verify-master/5370/ : SUCCESS

=> Container PoC Aug 24 10:54 PM
Patch Set 9: Build Successful
http://jenkins.ejkern.net:8080/job/vpp-docs-verify-master/1519/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-make-test-docs-verify-master/1516/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-centos7/1987/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1305/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-ubuntu1604/2027/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1699/ : SUCCESS

=> fd.io JJB Aug 24 11:13 PM
Patch Set 9: Verified-1 Build Failed
https://jenkins.fd.io/job/vpp-verify-master-centos7/6774/ : FAILURE
Logs: https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-centos7/6774
Failure Signature:
  00:23:17.198 CCLD vcl_test_client
  00:24:32.936 FATAL: command execution failed
  00:24:32.937 java.io.IOException
Comment: Bad environment/sick minion? There's no reason for compilation to kill the build.
https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6779/ : FAILURE
Logs: https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1604/6779
Failure Signature:
  03:02:47 ==============================================================================
  03:02:47 collect information on Ethernet, IP4 and IP6 datapath (no timers)
  03:02:47 ==============================================================================
  03:02:47 no timers, one CFLOW packet, 9 Flows inside OK
  03:02:47 no timers, two CFLOW packets (mtu=256), 3 Flows in each OK
  03:02:47 L2 data on IP4 datapath OK
  03:02:47 L2 data on IP6 datapath OK
  03:02:47 L2 data on L2 datapath OK
  03:02:48 L3 data on IP4 datapath OK
  03:02:48 L3 data on IP6 datapath OK
  03:02:48 L3 data on L2 datapath OK
  03:02:48 L4 data on IP4 datapath OK
  03:02:48 L4 data on IP6 datapath OK
  03:02:48 L4 data on L2 datapath OK
  03:02:48 verify templates on IP6 datapath
  03:02:47,401 Timeout while waiting for child test runner process (last test running was `L2BD Multi-instance test 5 - delete 5 BDs' in
Comment: Unknown level of parallelism going on here -- L2BD test status has not been flushed to console. Order of test results is different in later test runs.
https://jenkins.fd.io/job/vpp-make-test-docs-verify-master/3102/ : SUCCESS
https://jenkins.fd.io/job/vpp-csit-verify-virl-master/6785/ : SUCCESS
https://jenkins.fd.io/job/vpp-docs-verify-master/5374/ : SUCCESS

=> Container PoC 3:11 AM
Patch Set 9: Build Failed
http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1307/ : FAILURE
Failure Signature:
  06:51:59 ==============================================================================
  06:51:59 Bidirectional Forwarding Detection (BFD)
  06:51:59 ==============================================================================
  06:51:59 put session admin-up and admin-down SKIP
  06:51:59 configuration change while peer in demand mode SKIP
  06:51:59 verify session goes down after inactivity SKIP
  06:51:59 echo function SKIP
  06:51:59 session goes down if echo function fails SKIP
  06:51:59 echo packets looped back SKIP
  06:51:59 echo function stops if echo source is removed SKIP
  06:51:59 echo function stops if peer sets required min echo rx zero SKIP
  06:51:59 hold BFD session up SKIP
  06:51:59 immediately honor remote required min rx reduction SKIP
  06:51:59 interface with bfd session deleted SKIP
  06:51:59 echo packets with invalid checksum don't keep a session up SKIP
  06:51:59 large remote required min rx interval SKIP
  06:51:59 modify detect multiplier SKIP
  06:51:59 modify session - double required min rx SKIP
  06:51:59 modify session - halve required min rx SKIP
  06:51:59 no periodic frames outside poll sequence if remote demand set SKIP
  06:51:59 test correct response to control frame with poll bit set SKIP
  06:51:59 test poll sequence queueing SKIP
  06:51:59 bring BFD session down SKIP
  06:51:59 bring BFD session up SKIP
  06:51:59 bring BFD session up - first frame looked up by address pair SKIP
  06:51:59 verify slow periodic control frames while session down SKIP
  06:51:59 stale echo packets don't keep a session up SKIP
  06:51:59
  07:03:51,792 Timeout while waiting for child test runner process (last test running was `L2BD Multi-instance test 5 - delete 5 BDs' in `/tmp/vpp-unittest-TestL2bdMultiInst-AG7L1W')!
  07:02:08 Killing possible remaining process IDs: 21754 21764 21766
Comment: Unknown level of parallelism going on here -- L2BD test status has not been flushed to console. Order of test results is different in the test runs on cloud infra. !The failure signature is the same as above!
http://jenkins.ejkern.net:8080/job/vpp-docs-verify-master/1521/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-make-test-docs-verify-master/1518/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-centos7/1989/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-ubuntu1604/2030/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1702/ : SUCCESS

=> fd.io JJB 3:42 AM
Patch Set 9: Verified-1 Build Failed
https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6781/ : FAILURE
Failure Signature:
  07:29:09 ==============================================================================
  07:29:09 collect information on Ethernet, IP4 and IP6 datapath (no timers)
  07:29:09 ==============================================================================
  07:29:09 no timers, one CFLOW packet, 9 Flows inside OK
  07:29:09 no timers, two CFLOW packets (mtu=256), 3 Flows in each OK
  07:29:09 L2 data on IP4 datapath OK
  07:29:09 L2 data on IP6 datapath OK
  07:29:09 L2 data on L2 datapath OK
  07:29:09 L3 data on IP4 datapath OK
  07:29:09 L3 data on IP6 datapath OK
  07:29:09 L3 data on L2 datapath OK
  07:29:09 L4 data on IP4 datapath OK
  07:29:09 L4 data on IP6 datapath OK
  07:29:09 L4 data on L2 datapath OK
  07:29:09 verify templates on IP6 datapath
  07:29:08,087 Timeout while waiting for child test runner process (last test running was `L2BD Multi-instance test 5 - delete 5 BDs' in `/tmp/vpp-unittest-TestL2bdMultiInst-gbzkP4')!
  07:29:09 Killing possible remaining process IDs: 1883 1897 1899
Comment: Unknown level of parallelism going on here -- L2BD test status has not been flushed to console. Order of test results is the same as the previous cloud infra run, but different than the Container PoC. !The failure signature is the same as both of the previous Timeout Failures above!
Logs: https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1604/6781
https://jenkins.fd.io/job/vpp-make-test-docs-verify-master/3104/ : SUCCESS
https://jenkins.fd.io/job/vpp-verify-master-centos7/6776/ : SUCCESS
https://jenkins.fd.io/job/vpp-csit-verify-virl-master/6787/ : SUCCESS
https://jenkins.fd.io/job/vpp-docs-verify-master/5376/ : SUCCESS

=> Container PoC 9:26 AM
Patch Set 9: Build Failed
http://jenkins.ejkern.net:8080/job/vpp-docs-verify-master/1527/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-make-test-docs-verify-master/1524/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-centos7/1997/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1313/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-ubuntu1604/2039/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1715/ : NOT_BUILT
Comment: Only seen on Container PoC. Erroneous Build Failure status. Subsequent Container PoC included only this job and was successful.

=> Container PoC 9:44 AM
Patch Set 9: Build Successful
http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1715/ : SUCCESS

=> fd.io JJB 10:02 AM
Patch Set 9: Verified+1 Build Successful
https://jenkins.fd.io/job/vpp-make-test-docs-verify-master/3110/ : SUCCESS
Logs: https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-make-test-docs-verify-master/3110
https://jenkins.fd.io/job/vpp-verify-master-centos7/6782/ : SUCCESS
Logs: https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-centos7/6782
https://jenkins.fd.io/job/vpp-csit-verify-virl-master/6793/ : SUCCESS
Logs: https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-csit-verify-virl-master/6793
https://jenkins.fd.io/job/vpp-docs-verify-master/5382/ : SUCCESS
Logs: https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-docs-verify-master/5382
https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6787/ : SUCCESS
Logs: https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1604/6787
---- %< ----

On 08/24/2017 10:21 PM, Florin Coras wrote:

Hi,

Build 6775 failed with:

01:07:20 verify templates on IP6 datapath Fatal Python error: Segmentation fault
01:08:59
01:08:59 Thread 0x00007fccdfabf700 <python> (most recent call first):
01:08:59   File "/usr/lib/python2.7/threading.py", line 340 in wait
01:08:59   File "/usr/lib/python2.7/Queue.py", line 168 in get
01:08:59   File "build/bdist.linux-x86_64/egg/vpp_papi.py", line 664 in thread_msg_handler
01:08:59   File "/usr/lib/python2.7/threading.py", line 754 in run
01:08:59   File "/usr/lib/python2.7/threading.py", line 801 in __bootstrap_inner
01:08:59   File "/usr/lib/python2.7/threading.py", line 774 in __bootstrap
01:08:59

More details here [1]. Just my luck?

Thanks,
Florin

[1] https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6775/console

_______________________________________________
csit-dev mailing list
csit-...@lists.fd.io
https://lists.fd.io/mailman/listinfo/csit-dev
_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev
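
The per-thread dump in the build 6775 output ("Thread 0x... (most recent call first):") has the format written by Python's faulthandler module. As a minimal sketch, assuming Python 3 where faulthandler is in the standard library (the CI run used Python 2.7, where it is a separate backport package), the same kind of dump can be produced on demand to see how many threads a test process is running. The helper names `dump_all_threads` and `enable_crash_dumps` are hypothetical and are not part of the 'make test' framework.

```python
import faulthandler
import tempfile


def dump_all_threads():
    """Return faulthandler's traceback dump for every thread as text."""
    # faulthandler writes straight to a file descriptor, so it needs a
    # real file rather than an in-memory buffer such as io.StringIO.
    with tempfile.TemporaryFile(mode="w+") as dump_file:
        faulthandler.dump_traceback(file=dump_file, all_threads=True)
        dump_file.seek(0)
        return dump_file.read()


def enable_crash_dumps(log_file):
    """Ask Python to dump all thread tracebacks to log_file on a fatal
    signal such as SIGSEGV. log_file must stay open for the lifetime of
    the process."""
    faulthandler.enable(file=log_file, all_threads=True)
```

Calling `dump_all_threads()` from inside a running test and counting the "Thread 0x" headers gives a rough measure of the parallelism asked about in the analysis.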