Hi Matt,

Your patch [0] passed verification, Ray +1'd it, and I merged it.

While investigating Naginator retries, I found an unrelated gerrit change [1] with a VRRP test failure [2] that failed the vpp-arm-verify-master-ubuntu1804 job but subsequently passed on both the Naginator retry [3] and the verify of the next patch [4] to that gerrit change.

This failure occurred on March 2, 2020, prior to the recent timekeeping-related changes.

In case you are not aware, I wrote a bash function [5] which runs 'make test' iteratively until it encounters a failure. This function has been helpful in tracking down and fixing intermittent test failures in the quic tests, which were very hard to reproduce outside of 'make test'. In particular, I have seen many more intermittent failures with 'make test' running tests in parallel (make test TEST_JOBS=auto) than when running them serially. Also, the grep (-g) option is most useful for detecting clib_warning() instrumentation of suspected errant conditions in release images.
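
For anyone who only wants the general idea, the loop can be sketched roughly as below. This is not the code in [5]; the name run_until_fail and its arguments are made up for illustration, and it only assumes a command whose exit status signals pass/fail.

```shell
# Hypothetical helper for illustration, NOT the vpp-make-test function in [5].
# Usage: run_until_fail <max_runs> <grep_text or ''> <command> [args...]
# Returns 0 if every run passed, 1 on the first failing run.
run_until_fail() {
    local max_runs grep_text log i
    max_runs=$1
    grep_text=$2
    shift 2
    log=$(mktemp) || return 2
    i=1
    while [ "$i" -le "$max_runs" ]; do
        # Capture stdout and stderr so the output can be searched afterwards.
        if ! "$@" >"$log" 2>&1; then
            echo "run $i: command failed"
            rm -f "$log"
            return 1
        fi
        # A match on the marker text counts as a failure even when the
        # command itself exited 0. -F: fixed string, -e: text may start with '-'.
        if [ -n "$grep_text" ] && grep -qF -e "$grep_text" "$log"; then
            echo "run $i: found '$grep_text' in output"
            rm -f "$log"
            return 1
        fi
        i=$((i + 1))
    done
    rm -f "$log"
    echo "all $max_runs runs passed"
    return 0
}
```

With this sketch, something like run_until_fail 100 'goof-bad-' make test TEST_JOBS=auto would stop at the first run that either fails or logs the marker text.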

Hope this helps,
-daw-

[0] https://gerrit.fd.io/r/c/vpp/+/25834
[1] https://gerrit.fd.io/r/c/vpp/+/25581
[2] https://gerrit.fd.io/r/c/vpp/+/25581#message-cb3ca555_cb3c5e63
https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-arm-verify-master-ubuntu1804/8899/console-timestamp.log.gz
[3] https://gerrit.fd.io/r/c/vpp/+/25581#message-b01de4c2_560ef9c6
[4] https://gerrit.fd.io/r/c/vpp/+/25581#message-d2ecb27d_a9d52cc9
[5] https://git.fd.io/vpp/tree/extras/bash/functions.bash
----- %< -----
Usage: vpp-make-test [-a][-d][-f][-g <text>][-r <retry count>] <testcase> [<retry_count>]
         -a                Run extended tests
         -d                Run vpp debug image (i.e. with ASSERTS)
         -f                Testcase is a feature set (e.g. tcp)
         -g <text>         Text to grep for in log, FAIL on match.
                           Enclose <text> in single quotes when it contains any dashes:
                           e.g.  vpp-make-test -g 'goof-bad-' test_xyz
         -r <retry count>  Retry Count (default = 100 for individual | 1 for feature)
----- %< -----


On 3/12/2020 12:41 PM, Matthew Smith wrote:
Hi Dave,

That sounds fine to me.

Thanks,
-Matt


On Thu, Mar 12, 2020 at 11:32 AM Dave Wallace <dwallac...@gmail.com <mailto:dwallac...@gmail.com>> wrote:

    Matt,

    I will keep an eye on this gerrit and merge it once the verify
    jobs have completed.
    If there are other tests which fail, are you ok if I add them to
    this patch and turn it into a generic 'disable failing tests'
    gerrit change?

    The other possibility is that this is due to the recent disabling
    of the Naginator retry plugin.

    I'm going to investigate if this issue may have been masked by
    Naginator...

    Thanks for your help on keeping the CI operational!
    -daw-

    On 3/12/2020 12:09 PM, Matthew Smith via Lists.Fd.Io wrote:

    Change submitted - https://gerrit.fd.io/r/c/vpp/+/25834.
    Verification jobs are running. Hopefully they won't fail :)

    -Matt


    On Thu, Mar 12, 2020 at 10:22 AM Matthew Smith via Lists.Fd.Io
    <mgsmith=netgate....@lists.fd.io
    <mailto:netgate....@lists.fd.io>> wrote:


        I don't have a solution yet, but one observation has popped
        up quickly....

        In the 2 failed jobs Ray sent links for, one of them had a
        test fail which was not related to VRRP. There is a BFD6 test
        failure for the NAT change https://gerrit.fd.io/r/c/vpp/+/25462:

        https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1804/2678/archives/

        Looking back through a couple of recent failed runs of that
        job, there is also a DHCP6 PD test failure for rdma change
        https://gerrit.fd.io/r/c/vpp/+/25823:

        https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1804/2682/archives/

        The most obvious common thread between BFD6, DHCP6 and VRRP
        to me seems to be that they all maintain state which is
        dependent on timers. There could be a more general issue with
        timing-sensitive tests. I am going to submit a change which
        will temporarily prevent the VRRP tests from running until I
        can figure out a proper solution. Based on the above, other
        tests may need the same treatment.

        -Matt



        On Thu, Mar 12, 2020 at 8:57 AM Matthew Smith
        <mgsm...@netgate.com <mailto:mgsm...@netgate.com>> wrote:

            Hi Ray,

            Thanks for bringing it to my attention. I'll look into it.

            -Matt


            On Thu, Mar 12, 2020 at 8:24 AM Ray Kinsella
            <m...@ashroe.eu <mailto:m...@ashroe.eu>> wrote:

                Is anyone else noticing seemingly spurious failures
                related to the VRRP plugin's unit tests?
                Some examples from unrelated commits:

                Ray K

                nat: timed out session scavenging upgrade
                (https://gerrit.fd.io/r/c/vpp/+/25462)
                https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1804/2678/console.log.gz

                ==============================================================================
                TEST RESULTS:
                     Scheduled tests: 1138
                      Executed tests: 1138
                        Passed tests: 1021
                       Skipped tests: 112
                            Failures: 3
                              Errors: 2
                FAILURES AND ERRORS IN TESTS:
                  Testcase name: IPv4 VRRP Test Case
                    FAILURE: IPv4 Master VR does not reply for VIP w/ accept mode off [test_vrrp.TestVRRP4.test_vrrp4_accept_mode_disabled]
                    FAILURE: IPv4 Master VR preempted by higher priority backup [test_vrrp.TestVRRP4.test_vrrp4_master_preempted]
                  Testcase name: IPv6 VRRP Test Case
                    FAILURE: IPv6 Master VR preempted by higher priority backup [test_vrrp.TestVRRP6.test_vrrp6_master_preempted]
                      ERROR: IPv6 Backup VR preempts lower priority master [test_vrrp.TestVRRP6.test_vrrp6_backup_preempts]
                  Testcase name: Bidirectional Forwarding Detection (BFD) (IPv6)
                      ERROR: echo function [test_bfd.BFD6TestCase.test_echo]
                ==============================================================================

                vlib: startup multi-arch variant configuration
                (https://gerrit.fd.io/r/c/vpp/+/25798)
                https://logs.fd.io/production/vex-yul-rot-jenkins-1/vpp-verify-master-ubuntu1804/2675/console.log.gz

                ==============================================================================
                TEST RESULTS:
                     Scheduled tests: 22
                      Executed tests: 22
                        Passed tests: 21
                            Failures: 1
                FAILURES AND ERRORS IN TESTS:
                  Testcase name: IPv4 VRRP Test Case
                    FAILURE: IPv4 Master VR preempted by higher priority backup [test_vrrp.TestVRRP4.test_vrrp4_master_preempted]
                ==============================================================================







