these are NOT with verify… specifically with test-debug, which I added as a separate run at someone’s request.. (sorry, can’t remember who at this moment)
Ed

On Aug 10, 2017, at 1:07 AM, Klement Sekera -X (ksekera - PANTHEON TECHNOLOGIES at Cisco) <ksek...@cisco.com<mailto:ksek...@cisco.com>> wrote:

The 2 minute timeout is the result of my recent change. The framework now forks and runs the test in a child process, and if the child process fails to send a keep-alive (sent when a test case starts), then it's killed. Otherwise there'd be no way to recover from a stuck mutex or deadlock.

Are you running the extended tests or the stock verify?

Quoting Ed Kern (ejk) (2017-08-10 00:08:19)

klement, ok… I’ll think about how to do that without too much trouble in its current state.. in the meantime… blowing out the cpu and memory a bit changed the error……

21:49:42 create 1k of p2p subifs OK
21:49:42 ==============================================================================
21:51:52 21:53:13,610 Timeout while waiting for child test runner process (last test running was `drop rx packet not matching p2p subinterface' in `/tmp/vpp-unittest-P2PEthernetIPV6-GDHSDK')!
21:51:52 Killing possible remaining process IDs: 19954 19962 19964

21:45:05 PPPoE Test Case
21:45:05 ===================================21:48:13,778 Timeout while waiting for child test runner process (last test running was `drop rx packet not matching p2p subinterface' in `/tmp/vpp-unittest-P2PEthernetIPV6-I0REOQ')!
21:47:45 Killing possible remaining process IDs: 20017 20025 20027

20:48:46 PPPoE Test Case
20:48:46 ===================================20:51:34,082 Timeout while waiting for child test runner process (last test running was `drop rx packet not matching p2p subinterface' in `/tmp/vpp-unittest-P2PEthernetIPV6-tQ5sP0')!
20:51:05 Killing possible remaining process IDs: 19919 19927 19929

anything new/different/exciting in here?

Also, with the memory/cpu expansion (by roughly a third), these failures happen on the order of 2/3 minutes as opposed to a 90 leading to timeout failure.
Since the verifies are still happily chugging along I ASSuME that this drop packet check isn’t happening in that suite?

Ed

On Aug 9, 2017, at 1:04 PM, Klement Sekera -X (ksekera - PANTHEON TECHNOLOGIES at Cisco) <[1]ksek...@cisco.com<mailto:ksek...@cisco.com>> wrote:

Ed, it'd help if you could collect log.txt from a failed run so we could peek under the hood... please see my other email in this thread...

Thanks,
Klement

Quoting Ed Kern (ejk) (2017-08-09 20:48:46)

this is not you… or this patch…

the make test-debug has had a 90+% failure rate (read: not 100%) for at least the last 100 builds (as far back as my current logs go, but will probably blow that out a bit now)

you hit the one that is seen most often… on that create 1k of p2p subifs

the other, much less frequent, is

13:40:24 CGNAT TCP session close initiated from outside network OK
13:40:24 =================================================Build timed out (after 120 minutes). Marking the build as failed.

so currently I’m allocating 10000 MHz in cpu and 8G in memory for verify and also for test-debug runs… I’m obviously not getting (as you can see) errors about it running out of memory, but I wonder if that’s possibly what’s happening.. it’s easy enough to blow my allocations out a bit and see if that makes a difference..

If anyone has other ideas to try, I’m happy to give them a shot..

appreciate the heads up

Ed

On Aug 9, 2017, at 12:07 PM, Dave Barach (dbarach) <[1][2]dbar...@cisco.com<mailto:dbar...@cisco.com>> wrote:

Please see [2][3]https://gerrit.fd.io/r/#/c/7927, and [3][4]http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1056/console

The patch in question is highly unlikely to cause this failure...
14:37:11 ==============================================================================
14:37:11 P2P Ethernet tests
14:37:11 ==============================================================================
14:37:11 delete/create p2p subif OK
14:37:11 create 100k of p2p subifs SKIP
14:37:11 create 1k of p2p subifs Build timed out (after 120 minutes). Marking the build as failed.
16:24:49 $ ssh-agent -k
16:24:54 unset SSH_AUTH_SOCK;
16:24:54 unset SSH_AGENT_PID;
16:24:54 echo Agent pid 84 killed;
16:25:07 [ssh-agent] Stopped.
16:25:07 Build was aborted
16:25:09 [WS-CLEANUP] Deleting project workspace...[WS-CLEANUP] done
16:25:11 Finished: FAILURE

Thanks… Dave

References

Visible links
1. [5]mailto:dbar...@cisco.com
2. [6]https://gerrit.fd.io/r/#/c/7927
3. [7]http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1056/console

References

Visible links
1. mailto:ksek...@cisco.com
2. mailto:dbar...@cisco.com
3. https://gerrit.fd.io/r/#/c/7927
4. http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1056/console
5. mailto:dbar...@cisco.com
6. https://gerrit.fd.io/r/#/c/7927
7. http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1056/console
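
For readers following the thread: the keep-alive mechanism Klement describes (fork a child to run the tests, expect a keep-alive whenever a test case starts, kill the child if none arrives in time) can be illustrated roughly as below. This is only a minimal sketch under stated assumptions, not the actual VPP test framework code. The function name `run_with_watchdog`, the `(name, callable)` shape of `test_cases`, and the newline-delimited pipe protocol are all hypothetical, and the sketch is POSIX-only because it relies on `os.fork`.

```python
import os
import select
import signal
import time


def run_with_watchdog(test_cases, timeout):
    """Fork a child that runs test_cases; kill it if it goes quiet.

    test_cases: list of (name, callable) pairs (hypothetical shape).
    timeout: seconds the parent waits for the next keep-alive.
    Returns (finished, last_test), where last_test is the name of the
    last test case that announced itself.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send a keep-alive (the test name) as each test case starts.
        os.close(read_fd)
        try:
            for name, func in test_cases:
                os.write(write_fd, (name + "\n").encode())
                func()
            os.write(write_fd, b"\n")  # empty line marks "all tests done"
        finally:
            os._exit(0)  # never fall back into the parent's code path

    # Parent: wait for keep-alives, restarting the timer on each one.
    os.close(write_fd)
    last_test, finished, buf = None, False, b""
    while not finished:
        ready, _, _ = select.select([read_fd], [], [], timeout)
        if not ready:
            # No keep-alive within the timeout: assume the child is stuck.
            os.kill(pid, signal.SIGKILL)
            break
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break  # child exited without sending the "done" marker
        buf += chunk
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            if line == b"":
                finished = True
                break
            last_test = line.decode()
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return finished, last_test
```

In this sketch, a test case that hangs for longer than `timeout` causes the parent to kill the child and report the hung test as `last_test`, which mirrors the "last test running was ..." message in the console output quoted above.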
_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev