These flush tests should be disabled for now, until Eyal puts in a fix. The problem is one of timing: after a flush, there is a slight delay before the ager runs to scan the FIB and delete the stale MACs. We are adding extra code in the L2 forwarding path to fix this issue. There is an ongoing patch that did part 1 of the fix, and Eyal will add an update to this patch to complete the fix: https://gerrit.fd.io/r/#/c/7136/
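To make the race concrete, here is a minimal sketch of the timing window, assuming a toy model rather than VPP itself: a flush only marks entries stale, and a separate ager pass deletes them later. `ToyFib`, `flush`, `run_ager`, `lookup`, and `wait_until` are hypothetical names for illustration, not VPP or test-framework APIs, and the stale-flag mechanism is an assumption about how the delay arises.

```python
import time


class ToyFib:
    """Toy model (assumption): flush marks entries stale, the ager deletes them."""

    def __init__(self, macs):
        self.entries = {mac: False for mac in macs}  # mac -> stale flag

    def flush(self):
        # Mark every entry stale; nothing is removed yet.
        for mac in self.entries:
            self.entries[mac] = True

    def run_ager(self):
        # The ager pass scans the table and drops the stale entries.
        self.entries = {m: s for m, s in self.entries.items() if not s}

    def lookup(self, mac):
        # Models a forwarding path that ignores the stale flag,
        # which is the window in which a negative test still sees packets.
        return mac in self.entries


def wait_until(predicate, timeout=1.0, interval=0.01):
    """Poll predicate until it is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()


if __name__ == "__main__":
    fib = ToyFib(["00:00:00:00:00:01"])
    fib.flush()
    print(fib.lookup("00:00:00:00:00:01"))  # still forwarded before the ager runs
    fib.run_ager()
    print(wait_until(lambda: not fib.lookup("00:00:00:00:00:01")))
```

In this model a negative test that sends traffic immediately after `flush()` can still see hits until `run_ager()` has run. Polling with something like `wait_until` would only mask that in a test; the fix described above is in the forwarding path.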
Eyal has another patch: https://gerrit.fd.io/r/#/c/7023/ to add VLAN tag rewrite tests and also disable these flush tests. However, it kept failing virl due to an unrelated IPv6 error whose cause we have not yet identified…

Regards,
John

From: vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io] On Behalf Of Ed Kern (ejk)
Sent: Wednesday, June 14, 2017 4:07 PM
To: Klement Sekera -X (ksekera - PANTHEON TECHNOLOGIES at Cisco) <ksek...@cisco.com>; Dave Wallace <dwallac...@gmail.com>
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] shadow build system change adding test-debug job

alright klement/dave im a bit stuck again…

i get about a 60+% failure rate out of test-debug even with higher than normal cpu settings (higher than just what i use for build verify)

always right here

19:43:38 ==============================================================================
19:43:38 ERROR: L2 FIB test 7 - flush bd_id
19:43:38 ------------------------------------------------------------------------------
19:43:38 Traceback (most recent call last):
19:43:38   File "/workspace/vpp-test-debug-master-ubuntu1604/test/test_l2_fib.py", line 508, in test_l2_fib_07
19:43:38     self.run_verify_negat_test(bd_id=1, dst_hosts=flushed)
19:43:38   File "/workspace/vpp-test-debug-master-ubuntu1604/test/test_l2_fib.py", line 418, in run_verify_negat_test
19:43:38     i.get_capture(0, timeout=timeout)
19:43:38   File "/workspace/vpp-test-debug-master-ubuntu1604/test/vpp_pg_interface.py", line 240, in get_capture
19:43:38     (len(capture.res), expected_count, name))
19:43:38 Exception: Captured packets mismatch, captured 9 packets, expected 0 packets on pg0
19:43:38
19:43:38 ==============================================================================
19:43:38 ERROR: L2 FIB test 8 - flush all
19:43:38 ------------------------------------------------------------------------------
19:43:38 Traceback (most recent call last):
19:43:38   File "/workspace/vpp-test-debug-master-ubuntu1604/test/test_l2_fib.py", line 522, in test_l2_fib_08
19:43:38     self.run_verify_negat_test(bd_id=1, dst_hosts=flushed)
19:43:38   File "/workspace/vpp-test-debug-master-ubuntu1604/test/test_l2_fib.py", line 418, in run_verify_negat_test
19:43:38     i.get_capture(0, timeout=timeout)
19:43:38   File "/workspace/vpp-test-debug-master-ubuntu1604/test/vpp_pg_interface.py", line 240, in get_capture
19:43:38     (len(capture.res), expected_count, name))
19:43:38 Exception: Captured packets mismatch, captured 9 packets, expected 0 packets on pg0
19:43:38

when it fails its always the same two tests…always the same exception (captured 9, expected 0)

its so consistent in its ‘death’ but so intermittent in frequency its freaking me out a bit…

any thoughts?

Ed

On May 24, 2017, at 8:42 AM, Klement Sekera -X (ksekera - PANTHEON TECHNOLOGIES at Cisco) <ksek...@cisco.com> wrote:

I know that the functional BFD tests passed so unless there is a bug in the tests, the failures are pretty much timing issues. From my experience the load is the culprit as the BFD tests test interactive sessions, which need to be kept alive. The timings currently are set at 300ms and for most tests two keep-alives can be missed before the session goes down on vpp side and asserts start failing. While this might seem like ample time, especially on loaded systems there is a high chance that at least one test will derp ...

I've also seen derps even on idle systems, where a select() call (used by python in its own sleep() implementation) with timeout of 100ms returns after 1-3 seconds.

Try running the bfd tests only (make test-all TEST=bfd) while no other tasks are running - I think they should pass on your box just fine.

Thanks,
Klement

Quoting Ed Kern (ejk) (2017-05-24 16:27:10)

right now its a VERY intentional mix…but depending on loading I could easily see this coming up if those timings are strict.
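Klement's numbers above can be put into a rough back-of-the-envelope sketch. It assumes the session survives as long as any stall is shorter than the interval times (allowed missed keep-alives + 1); that formula is an assumption drawn from his description, not taken from the VPP BFD code, and `detection_budget`, `measure_sleep_overshoot`, and `survives` are hypothetical helper names.

```python
import time


def detection_budget(interval_s, allowed_missed):
    """Assumed model: longest gap without a keep-alive before the session drops."""
    return interval_s * (allowed_missed + 1)


def measure_sleep_overshoot(requested_s):
    """Return how much longer than requested a sleep actually took, in seconds."""
    start = time.monotonic()
    time.sleep(requested_s)
    return time.monotonic() - start - requested_s


def survives(stall_s, interval_s=0.3, allowed_missed=2):
    """True if a stall of stall_s seconds fits inside the assumed budget."""
    return stall_s < detection_budget(interval_s, allowed_missed)


if __name__ == "__main__":
    print("budget (s):", detection_budget(0.3, 2))
    print("overshoot of a 100 ms sleep (s):", measure_sleep_overshoot(0.1))
    print("survives a 1 s stall:", survives(1.0))
```

With a 300 ms interval and two tolerated misses, the budget under this model is roughly 0.9 s, so a 100 ms sleep that returns after 1-3 seconds is already enough to take a session down.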
To not dodge your question max loading on my slowest node would be 3 concurrent builds on an Xeon™ E3-1240 v3 (4 cores @ 3.4 GHz) yeah yeah stop laughing…..Do you have suggested or even guesstimate minimums in this regard…I could pretty trivially route them towards the larger set that I have right now if you think magic will result :)

Ed

PS thanks though..for whatever reason the type of errors I was getting didn’t naturally steer my mind towards cpu/io binding.

On May 24, 2017, at 12:57 AM, Klement Sekera -X (ksekera - PANTHEON TECHNOLOGIES at Cisco) <ksek...@cisco.com> wrote:

Hi Ed,

how fast are your boxes? And how many cores? The BFD tests struggle to meet the aggressive timings on slower boxes...

Thanks,
Klement

Quoting Ed Kern (ejk) (2017-05-23 20:43:55)

No problem.

If anyone is curious in rubbernecking the accident that is the current test-all (at least for my build system) adding a comment of testall SHOULD trigger and fire it off on my end.

make it all pass and you win a beer (or beverage of your choice)

Ed

On May 23, 2017, at 11:34 AM, Dave Wallace <dwallac...@gmail.com> wrote:

Ed,

Thanks for adding this to the shadow build system. Real data on the cost and effectiveness of this will be most useful.

-daw-

On 5/23/2017 1:30 PM, Ed Kern (ejk) wrote:

In the vpp-dev call a couple hours ago there was a discussion of running test-debug on a regular/default? basis. As a trial I’ve added a new job to the shadow build system:

vpp-test-debug-master-ubuntu1604

Will do a make test-debug, as part of verify set, as an ADDITIONAL job.

I gave a couple passes with test-all but can’t ever get a clean run with test-all (errors in test_bfd and test_ip6). I don’t think this is unusual or unexpected. Ill leave it to someone else to say that ‘all’ throwing failures is a good thing. I’m happy to add another job for/with test-all if someone wants to actually debug those errors.

flames, comments, concerns welcome..

Ed

PS Please note/remember that all these tests are non-voting regardless of success or failure.

_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev