Hi Dave,

Thanks a lot for the thorough analysis! I’d also really like to see item 1 fixed 
as soon as possible.

Cheers,
Florin

From: Dave Wallace <dwallac...@gmail.com>
Date: Friday, August 25, 2017 at 10:56 AM
To: "vpp-dev@lists.fd.io" <vpp-dev@lists.fd.io>
Cc: "Florin Coras (fcoras)" <fco...@cisco.com>, "csit-...@lists.fd.io" 
<csit-...@lists.fd.io>
Subject: Re: [csit-dev] make test python segfault in ubuntu 16.04

vpp-dev, Florin,

Below is an analysis of all of the failures that this patch encountered 
before finally passing. None of the failures were related in any way to the 
code changes in the patch.

In summary, there appear to be a number of different factors involved with 
these failures.
  · Two failures appear to be caused by the run-time environment.
  · An intermittent bug appears to exist in `L2BD Multi-instance test 5 - 
delete 5 BDs'.
  · The segfault shows lots of threads being run.  Are tests being 
executed in parallel?  If so, it would be interesting to serialize the tests to 
see if that fixes any of these issues.
I'm also seeing variation in the order in which the 'make test' tests are run 
(or at least in the order of the status reports).  My understanding of the 
'make test' python infrastructure is insufficient to make an intelligent guess 
as to whether this has any bearing on any of these failures.

I get more predictable result output when running make test locally on my own 
server, but the order of test output is different from that in the CI test 
runs.  Locally, the order of tests appears to be the same between different 
runs of 'make test'.  I have also not seen any of these errors on my server, 
which is running Ubuntu 17.04, although I have not done an endurance test 
either.
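
One way to check the ordering observation is to pull the test names out of two console logs and find where they first diverge. The following is only a minimal sketch: it assumes each status report sits on one line in the raw console log (timestamp, test name, status), the helper names are hypothetical, and the FAIL and ERROR statuses are assumed since only OK and SKIP appear in the excerpts in this thread.

```python
import re
from itertools import zip_longest

# One status line, e.g. "03:02:47  L2 data on IP4 datapath      OK".
# OK and SKIP appear in the quoted logs; FAIL and ERROR are assumptions.
STATUS_LINE = re.compile(
    r"^\s*\d{2}:\d{2}:\d{2}\s+(?P<name>.+?)\s+(?P<status>OK|SKIP|FAIL|ERROR)\s*$"
)


def extract_test_order(console_text):
    """Return test names in the order their status lines were printed."""
    names = []
    for line in console_text.splitlines():
        match = STATUS_LINE.match(line)
        if match:
            names.append(match.group("name"))
    return names


def first_divergence(order_a, order_b):
    """Return (index, name_a, name_b) at the first mismatch, or None."""
    pairs = zip_longest(order_a, order_b)
    for index, (name_a, name_b) in enumerate(pairs):
        if name_a != name_b:
            return index, name_a, name_b
    return None
```

Running `first_divergence(extract_test_order(log_a), extract_test_order(log_b))` on two saved console logs would show whether the runs differ only in reporting order or also in which tests were reported at all.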

My recommendation based on this analysis is as follows:
  1. The L2BD unit test issue should be investigated by the appropriate 
'make test' experts.
  2. The vpp-verify-master-centos7, vpp-verify-master-ubuntu1604, and 
vpp-test-debug-master-ubuntu1604 jobs should be run operationally in the 
Container PoC environment, with the rest of the jjb jobs run in the cloud infra.

Thanks,
-daw-


---- %< ----
[ From https://gerrit.fd.io/r/#/c/8133 ]

=> Container PoC Aug 24 8:36 PM  Patch Set 9:  Build Successful
http://jenkins.ejkern.net:8080/job/vpp-docs-verify-master/1515/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-make-test-docs-verify-master/1512/ : 
SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-centos7/1983/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1301/ : 
SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-ubuntu1604/2022/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1695/ : SUCCESS

=> fd.io JJB  Aug 24 9:19 PM  Patch Set 9:  Verified-1  Build Failed
https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6775/ : FAILURE
Logs: 
https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1604/6775
Failure Signature:
  01:08:59  verify templates on IP6 datapath      Fatal Python error: 
Segmentation fault

Comment:
  Python bug or resource starvation?  Lots of threads running...
  Possibly due to bad environment/sick minion.
https://jenkins.fd.io/job/vpp-make-test-docs-verify-master/3098/ : SUCCESS
https://jenkins.fd.io/job/vpp-verify-master-centos7/6770/ : SUCCESS
https://jenkins.fd.io/job/vpp-csit-verify-virl-master/6781/ : SUCCESS
https://jenkins.fd.io/job/vpp-docs-verify-master/5370/ : SUCCESS
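
The "Fatal Python error: Segmentation fault" line followed by per-thread stacks in build 6775 appears to have the shape of output from Python's faulthandler module. To see how many threads are alive and where each one is blocked without waiting for a crash, the same kind of dump can be requested on demand. This is a minimal sketch only, assuming Python 3 (where faulthandler is in the standard library; the failing job ran Python 2.7). The helper names are hypothetical and are not part of the 'make test' framework.

```python
import faulthandler
import queue
import tempfile
import threading
import time


def dump_all_thread_stacks():
    """Return faulthandler's stack dump for every running Python thread."""
    # faulthandler writes straight to a file descriptor, so use a real file.
    with tempfile.TemporaryFile(mode="w+") as dump_file:
        faulthandler.dump_traceback(file=dump_file, all_threads=True)
        dump_file.seek(0)
        return dump_file.read()


def demo():
    """Block one worker on an empty queue, then dump all thread stacks."""
    inbox = queue.Queue()
    worker = threading.Thread(target=inbox.get, daemon=True)
    worker.start()
    time.sleep(0.1)  # give the worker time to block inside Queue.get()
    dump = dump_all_thread_stacks()
    inbox.put(None)  # unblock the worker so it can exit
    worker.join()
    return dump
```

Counting the "(most recent call first)" headers in the returned text gives the number of Python threads at that moment, which may help tell a genuinely parallel run from a single runner with a few helper threads.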

=> Container PoC  Aug 24 10:54 PM  Patch Set 9:  Build Successful
http://jenkins.ejkern.net:8080/job/vpp-docs-verify-master/1519/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-make-test-docs-verify-master/1516/ : 
SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-centos7/1987/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1305/ : 
SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-ubuntu1604/2027/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1699/ : SUCCESS

=> fd.io JJB  Aug 24 11:13 PM  Patch Set 9:  Verified-1  Build Failed
https://jenkins.fd.io/job/vpp-verify-master-centos7/6774/ : FAILURE
Logs: 
https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-centos7/6774
Failure Signature:
  00:23:17.198 CCLD     vcl_test_client
  00:24:32.936 FATAL: command execution failed
  00:24:32.937 java.io.IOException

Comment:
  Bad environment/sick minion?
  There's no reason for compilation to kill the build.
https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6779/ : FAILURE
Logs: 
https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1604/6779
Failure Signature:
  03:02:47  
==============================================================================
  03:02:47  collect information on Ethernet, IP4 and IP6 datapath (no timers)
  03:02:47  
==============================================================================
  03:02:47  no timers, one CFLOW packet, 9 Flows inside                         
     OK
  03:02:47  no timers, two CFLOW packets (mtu=256), 3 Flows in each             
     OK
  03:02:47  L2 data on IP4 datapath                                             
     OK
  03:02:47  L2 data on IP6 datapath                                             
     OK
  03:02:47  L2 data on L2 datapath                                              
     OK
  03:02:48  L3 data on IP4 datapath                                             
     OK
  03:02:48  L3 data on IP6 datapath                                             
     OK
  03:02:48  L3 data on L2 datapath                                              
     OK
  03:02:48  L4 data on IP4 datapath                                             
     OK
  03:02:48  L4 data on IP6 datapath                                             
     OK
  03:02:48  L4 data on L2 datapath                                              
     OK
  03:02:48  verify templates on IP6 datapath
  03:02:47,401 Timeout while waiting for child test runner process (last test 
running was `L2BD Multi-instance test 5 - delete 5 BDs' in

Comment:
  Unknown level of parallelism going on here -- L2BD test status has not been 
flushed to console.
  Order of test results is different in later test runs.
https://jenkins.fd.io/job/vpp-make-test-docs-verify-master/3102/ : SUCCESS
https://jenkins.fd.io/job/vpp-csit-verify-virl-master/6785/ : SUCCESS
https://jenkins.fd.io/job/vpp-docs-verify-master/5374/ : SUCCESS

=> Container PoC  3:11 AM  Patch Set 9:  Build Failed
http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1307/ : 
FAILURE
Failure Signature:
  06:51:59 
==============================================================================
  06:51:59 Bidirectional Forwarding Detection (BFD)
  06:51:59 
==============================================================================
  06:51:59 put session admin-up and admin-down                                  
    SKIP
  06:51:59 configuration change while peer in demand mode                       
    SKIP
  06:51:59 verify session goes down after inactivity                            
    SKIP
  06:51:59 echo function                                                        
    SKIP
  06:51:59 session goes down if echo function fails                             
    SKIP
  06:51:59 echo packets looped back                                             
    SKIP
  06:51:59 echo function stops if echo source is removed                        
    SKIP
  06:51:59 echo function stops if peer sets required min echo rx zero           
    SKIP
  06:51:59 hold BFD session up                                                  
    SKIP
  06:51:59 immediately honor remote required min rx reduction                   
    SKIP
  06:51:59 interface with bfd session deleted                                   
    SKIP
  06:51:59 echo packets with invalid checksum don't keep a session up           
    SKIP
  06:51:59 large remote required min rx interval                                
    SKIP
  06:51:59 modify detect multiplier                                             
    SKIP
  06:51:59 modify session - double required min rx                              
    SKIP
  06:51:59 modify session - halve required min rx                               
    SKIP
  06:51:59 no periodic frames outside poll sequence if remote demand set        
    SKIP
  06:51:59 test correct response to control frame with poll bit set             
    SKIP
  06:51:59 test poll sequence queueing                                          
    SKIP
  06:51:59 bring BFD session down                                               
    SKIP
  06:51:59 bring BFD session up                                                 
    SKIP
  06:51:59 bring BFD session up - first frame looked up by address pair         
    SKIP
  06:51:59 verify slow periodic control frames while session down               
    SKIP
  06:51:59 stale echo packets don't keep a session up                           
    SKIP
  06:51:59 n07:03:51,792 Timeout while waiting for child test runner process 
(last test running was `L2BD Multi-instance test 5 - delete 5 BDs' in 
`/tmp/vpp-unittest-TestL2bdMultiInst-AG7L1W')!
  07:02:08 Killing possible remaining process IDs:  21754 21764 21766

Comment:
  Unknown level of parallelism going on here -- L2BD test status has not been 
flushed to console.
  Order of test results is different in the test runs on cloud infra.
  !The failure signature is the same as above!
http://jenkins.ejkern.net:8080/job/vpp-docs-verify-master/1521/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-make-test-docs-verify-master/1518/ : 
SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-centos7/1989/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-ubuntu1604/2030/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1702/ : SUCCESS

=> fd.io JJB  3:42 AM  Patch Set 9:  Verified-1  Build Failed
https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6781/ : FAILURE
Failure Signature:
  07:29:09  
==============================================================================
  07:29:09  collect information on Ethernet, IP4 and IP6 datapath (no timers)
  07:29:09  
==============================================================================
  07:29:09  no timers, one CFLOW packet, 9 Flows inside                         
     OK
  07:29:09  no timers, two CFLOW packets (mtu=256), 3 Flows in each             
     OK
  07:29:09  L2 data on IP4 datapath                                             
     OK
  07:29:09  L2 data on IP6 datapath                                             
     OK
  07:29:09  L2 data on L2 datapath                                              
     OK
  07:29:09  L3 data on IP4 datapath                                             
     OK
  07:29:09  L3 data on IP6 datapath                                             
     OK
  07:29:09  L3 data on L2 datapath                                              
     OK
  07:29:09  L4 data on IP4 datapath                                             
     OK
  07:29:09  L4 data on IP6 datapath                                             
     OK
  07:29:09  L4 data on L2 datapath                                              
     OK
  07:29:09  verify templates on IP6 datapath      07:29:08,087 Timeout while 
waiting for child test runner process (last test running was `L2BD 
Multi-instance test 5 - delete 5 BDs' in 
`/tmp/vpp-unittest-TestL2bdMultiInst-gbzkP4')!
  07:29:09  Killing possible remaining process IDs:  1883 1897 1899
Comment:
  Unknown level of parallelism going on here -- L2BD test status has not been 
flushed to console.
  Order of test results is the same as the previous cloud infra run, but 
different than the Container PoC.
  !The failure signature is the same as both of the previous Timeout Failures 
above!
Logs: 
https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1604/6781
https://jenkins.fd.io/job/vpp-make-test-docs-verify-master/3104/ : SUCCESS
https://jenkins.fd.io/job/vpp-verify-master-centos7/6776/ : SUCCESS
https://jenkins.fd.io/job/vpp-csit-verify-virl-master/6787/ : SUCCESS
https://jenkins.fd.io/job/vpp-docs-verify-master/5376/ : SUCCESS
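
Since the timeout failures above share one signature, it can be matched mechanically across console logs. The sketch below is built only from the message text quoted in this thread. The function names are hypothetical, and it assumes the test name contains no backtick. A message that is cut off before the temporary directory (as in the build 6779 excerpt) will not match.

```python
import re
from collections import Counter

# Message format taken from the console excerpts quoted above.
TIMEOUT = re.compile(
    r"Timeout while waiting for child test runner process "
    r"\(last test running was `(?P<test>[^`]+?)' in `(?P<tmpdir>[^']+)'\)"
)


def timeout_signatures(console_text):
    """Return (test name, temp dir) for each runner timeout in the text."""
    # Collapse whitespace so a message wrapped across lines still matches.
    flat = re.sub(r"\s+", " ", console_text)
    return [(m.group("test"), m.group("tmpdir")) for m in TIMEOUT.finditer(flat)]


def count_by_test(signatures):
    """Count how often each test was the last one running at a timeout."""
    return Counter(test for test, _tmpdir in signatures)
```

Feeding the console text of several failed builds through `timeout_signatures` and then `count_by_test` would show whether the same test keeps turning up as the last one running.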

=> Container PoC  9:26 AM  Patch Set 9:  Build Failed
http://jenkins.ejkern.net:8080/job/vpp-docs-verify-master/1527/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-make-test-docs-verify-master/1524/ : 
SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-centos7/1997/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1313/ : 
SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-verify-master-ubuntu1604/2039/ : SUCCESS
http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1715/ : NOT_BUILT
Comment:
  Only seen on Container PoC.
  Erroneous Build Failure status.
  Subsequent Container PoC included only this job and was successful.
=> Container PoC  9:44 AM  Patch Set 9:  Build Successful
http://jenkins.ejkern.net:8080/job/vpp-fake-csit-verify-master/1715/ : SUCCESS

=> fd.io JJB  10:02 AM  Patch Set 9:  Verified+1  Build Successful
https://jenkins.fd.io/job/vpp-make-test-docs-verify-master/3110/ : SUCCESS
Logs: 
https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-make-test-docs-verify-master/3110
https://jenkins.fd.io/job/vpp-verify-master-centos7/6782/ : SUCCESS
Logs: 
https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-centos7/6782
https://jenkins.fd.io/job/vpp-csit-verify-virl-master/6793/ : SUCCESS
Logs: 
https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-csit-verify-virl-master/6793
https://jenkins.fd.io/job/vpp-docs-verify-master/5382/ : SUCCESS
Logs: 
https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-docs-verify-master/5382
https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6787/ : SUCCESS
Logs: 
https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1604/6787

---- %< ----

On 08/24/2017 10:21 PM, Florin Coras wrote:
Hi,

Build 6775 failed with:


01:07:20 verify templates on IP6 datapath      Fatal Python error: Segmentation 
fault

01:08:59

01:08:59 Thread 0x00007fccdfabf700 <python> (most recent call first):

01:08:59   File "/usr/lib/python2.7/threading.py", line 340 in wait

01:08:59   File "/usr/lib/python2.7/Queue.py", line 168 in get

01:08:59   File "build/bdist.linux-x86_64/egg/vpp_papi.py", line 664 in 
thread_msg_handler

01:08:59   File "/usr/lib/python2.7/threading.py", line 754 in run

01:08:59   File "/usr/lib/python2.7/threading.py", line 801 in __bootstrap_inner

01:08:59   File "/usr/lib/python2.7/threading.py", line 774 in __bootstrap

01:08:59

More details here [1]. Just my luck?

Thanks,
Florin

[1] https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/6775/console



