Jan,

So, just to make sure: this does not appear to be an infra issue at this time, but rather a timeout waiting for a VPP restart. Correct?

Ed

On Fri, Oct 21, 2016 at 8:46 AM, Dave Wallace <dwallac...@gmail.com> wrote:

> Jan,
>
> I see that there is a "sleep 3" in ...csit/resources/libraries/bash/dut_setup.sh after the vpp service restart in order to allow VPP to complete the restart. I'm wondering if VPP sometimes takes longer than that to restart?
>
> Do you recall if there was ever an attempt to find a deterministic test to verify that VPP has completed initialization?
>
> I'm going to do some VPP restart testing on the VIRL testbed using the images built as part of https://jenkins.fd.io/job/vpp-csit-verify-virl-master/1851/. However, if my theory is correct, I'm not comfortable resolving the problem by simply extending the sleep time.
>
> IMHO, the VPP restart/initialization time should be measured and verified to be within a specified period of time. It may be necessary to add instrumentation to VPP to achieve this goal. There should really be a separate set of unit tests (with different hugepage allocations) which pass/fail prior to feature testing. At the very least, we should not be waiting an arbitrary amount of time during test setup, and the actual time to restart/complete initialization should be measured and logged.
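
A minimal sketch of the alternative described above: poll a readiness probe against a deadline and report the measured time, instead of relying on a fixed "sleep 3". Nothing here is taken from CSIT. The names wait_until_ready and vpp_cli_responds are hypothetical, and treating a successful `vppctl show version` as "initialization complete" is an assumption that would have to be validated against VPP itself.

```python
import subprocess
import time
from typing import Callable


def wait_until_ready(probe: Callable[[], bool],
                     timeout_s: float = 30.0,
                     interval_s: float = 0.5,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> float:
    """Poll probe() until it returns True and return the elapsed seconds.

    Raises TimeoutError if the probe has not succeeded within timeout_s.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout_s:
            raise TimeoutError(f"not ready after {timeout_s:.1f}s")
        sleep(interval_s)


def vpp_cli_responds() -> bool:
    """Hypothetical readiness probe (assumption, not the CSIT method):
    VPP counts as ready once 'vppctl show version' exits successfully."""
    try:
        result = subprocess.run(["vppctl", "show", "version"],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                timeout=5)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

In a setup step, something like `elapsed = wait_until_ready(vpp_cli_responds, timeout_s=30)` would give a value that can be logged and checked against a limit, so a slow restart shows up as a measured number rather than as an unexplained timeout later in the test.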
>
> Thanks,
> -daw-
>
> On 10/21/16 10:00 AM, Jan Gelety -X (jgelety - PANTHEON TECHNOLOGIES at Cisco) wrote:
> > Hello Dave,
> >
> > I had a look at the failed CSIT runs under https://gerrit.fd.io/r/#/c/3511/ :
> > https://jenkins.fd.io/job/vpp-csit-verify-virl-master/1851/
> > https://jenkins.fd.io/job/vpp-csit-verify-virl-master/1853/
> >
> > In both cases the timeout exception occurred during the test case or test suite setup phase, after restarting vpp (part of the script dut_setup.sh):
> >
> > [Command_done_exec] 'cat /etc/vpp/startup.conf'
> > [Command_start_exec] 'sudo -S service vpp restart'
> > [Command_outputs]
> > Current contents of stderr buffer:
> >
> > It seems that one of the DUTs stopped communicating and was still not responding about 30 s later, in the teardown of the test. But the DUT/VM did not crash, as the frozen process for the dut_setup.sh script is still there when the connection is working again (the next test case setup phase):
> >
> > 20:54:26.422 FAIL timeout: Timeout exception.
> > Current contents of stdout buffer:
> > [Command_desc] Starting /tmp/openvpp-testing/resources/libraries/bash/dut_setup.sh
> > [Command_start_exec] 'dpkg -l vpp\*'
> > [Command_outputs] Desired=Unknown/Install/Remove/Purge/Hold
> > | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
> > |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> > ||/ Name           Version                      Architecture Description
> > +++-==============-============================-============-=============================================
> > ii  vpp            16.12-rc0~238-g0d1509f~b1851 amd64        Vector Packet Processing--executables
> > ii  vpp-dbg        16.12-rc0~238-g0d1509f~b1851 amd64        Vector Packet Processing--debug symbols
> > ii  vpp-dev        16.12-rc0~238-g0d1509f~b1851 amd64        Vector Packet Processing--development support
> > ii  vpp-dpdk-dev   16.12-rc0~238-g0d1509f~b1851 amd64        Vector Packet Processing--development support
> > ii  vpp-dpdk-dkms  16.12-rc0~238-g0d1509f~b1851 amd64        DPDK 2.1 igb_uio_driver
> > ii  vpp-lib        16.12-rc0~238-g0d1509f~b1851 amd64        Vector Packet Processing--runtime libraries
> > ii  vpp-plugins    16.12-rc0~238-g0d1509f~b1851 amd64        Vector Packet Processing--runtime plugins
> > [Command_done_exec] 'dpkg -l vpp\*'
> > [Command_start_exec] 'ps aux | grep vpp'
> > [Command_outputs]
> > USER       PID %CPU %MEM     VSZ   RSS TTY STAT START TIME COMMAND
> > root     10323 99.3  0.8 2477148 33580 ?   Rsl  14:54 0:13 /usr/bin/vpp -c /etc/vpp/startup.conf
> > root     10461  0.0  0.0   47296  3684 ?   Ss   14:54 0:00 sudo -Sn bash /tmp/openvpp-testing/resources/libraries/bash/dut_setup.sh
> > root     10462  0.0  0.0   11540  2964 ?   S    14:54 0:00 bash /tmp/openvpp-testing/resources/libraries/bash/dut_setup.sh
> > root     10465  0.0  0.0   13232   924 ?   S    14:54 0:00 grep vpp
> > [Command_done_exec] 'ps aux | grep vpp'
> >
> > ...
> > (the next test case setup phase)
> > 20:55:06.951 FAIL timeout: Timeout exception.
> > Current contents of stdout buffer:
> > [Command_desc] Starting /tmp/openvpp-testing/resources/libraries/bash/dut_setup.sh
> > [Command_start_exec] 'dpkg -l vpp\*'
> > [Command_outputs] Desired=Unknown/Install/Remove/Purge/Hold
> > | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
> > |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> > ||/ Name           Version                      Architecture Description
> > +++-==============-============================-============-=============================================
> > ii  vpp            16.12-rc0~238-g0d1509f~b1851 amd64        Vector Packet Processing--executables
> > ii  vpp-dbg        16.12-rc0~238-g0d1509f~b1851 amd64        Vector Packet Processing--debug symbols
> > ii  vpp-dev        16.12-rc0~238-g0d1509f~b1851 amd64        Vector Packet Processing--development support
> > ii  vpp-dpdk-dev   16.12-rc0~238-g0d1509f~b1851 amd64        Vector Packet Processing--development support
> > ii  vpp-dpdk-dkms  16.12-rc0~238-g0d1509f~b1851 amd64        DPDK 2.1 igb_uio_driver
> > ii  vpp-lib        16.12-rc0~238-g0d1509f~b1851 amd64        Vector Packet Processing--runtime libraries
> > ii  vpp-plugins    16.12-rc0~238-g0d1509f~b1851 amd64        Vector Packet Processing--runtime plugins
> > [Command_done_exec] 'dpkg -l vpp\*'
> > [Command_start_exec] 'ps aux | grep vpp'
> > [Command_outputs]
> > USER       PID %CPU %MEM     VSZ   RSS TTY STAT START TIME COMMAND
> > root     10323 25.7  0.8 2477148 35604 ?   Ssl  14:54 0:13 /usr/bin/vpp -c /etc/vpp/startup.conf
> > root     10461  0.0  0.0   47296  3684 ?   Ss   14:54 0:00 sudo -Sn bash /tmp/openvpp-testing/resources/libraries/bash/dut_setup.sh
> > root     10462  0.0  0.0   11540  2964 ?   S    14:54 0:00 bash /tmp/openvpp-testing/resources/libraries/bash/dut_setup.sh
> > root     10467  0.0  0.0   47296  3648 ?   S    14:54 0:00 sudo -S service vpp restart
> > root     10468  0.0  0.0   25176  1332 ?   S    14:54 0:00 systemctl restart vpp.service
> > root     10482  0.0  0.0   47296  3700 ?   Ss   14:54 0:00 sudo -Sn bash /tmp/openvpp-testing/resources/libraries/bash/dut_setup.sh
> > root     10483  0.0  0.0   11540  2960 ?   S    14:54 0:00 bash /tmp/openvpp-testing/resources/libraries/bash/dut_setup.sh
> > root     10486  0.0  0.0   13232  1012 ?   S    14:54 0:00 grep vpp
> > [Command_done_exec] 'ps aux | grep vpp'
> >
> > [Command_start_exec] 'cat /etc/vpp/startup.conf'
> > [Command_outputs]
> >
> > Regards,
> > Jan
> >
> > *From:* vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io] *On Behalf Of* Dave Wallace
> > *Sent:* Friday, October 21, 2016 06:01
> > *To:* Edward Warnicke <hagb...@gmail.com>
> > *Cc:* csit-...@lists.fd.io; vpp-dev@lists.fd.io
> > *Subject:* Re: [vpp-dev] [csit-dev] vpp-csit-verify-virl-master - some more build failures
> >
> > I started to +2 this, because the build failures appear to be fixed. Then decided not to merge it, but managed to hit the submit button anyways. So it has been merged.
> >
> > I'll investigate the random CSIT test failures tomorrow.
> >
> > Thanks,
> > -daw-
> >
> > On 10/20/16 3:31 PM, Edward Warnicke wrote:
> >
> > So the first run of https://gerrit.fd.io/r/#/c/3511/ succeeded, the second failed (see:
> >
> > https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/1833/console ) which looks like test failures unrelated to the build failures we were seeing before.
> >
> > CSIT folks, thoughts?
> >
> > Ed
> >
> > On Thu, Oct 20, 2016 at 11:20 AM, Dave Wallace <dwallac...@gmail.com> wrote:
> >
> > Thanks Ed!
> > -daw-
> >
> > On 10/20/2016 01:01 PM, Edward Warnicke wrote:
> >
> > You mean like this: https://gerrit.fd.io/r/#/c/3511/1/vpp-api/java/Makefile.am :)
> >
> > Ed
> >
> > On Thu, Oct 20, 2016 at 9:50 AM, Dave Wallace <dwallac...@gmail.com> wrote:
> >
> > Klement,
> >
> > Thanks for helping improve the java build infrastructure.
> >
> > I think that you may find that the execution of the build system is very different on the LF cloud infrastructure than on a typical development platform. IIRC, this is not the first time we've been bitten by that particular bug.
> >
> > Here is Marek's patch which fixed this issue earlier: https://gerrit.fd.io/r/#/c/3131
> >
> > Looking at your patch, I think this deletion may be the primary culprit in opening up the race condition:
> >
> > -EXTRA_libjvpp_registry_la_DEPENDENCIES=libjvpp_common.la
> >
> > I think it would be worth pushing a patch with that dependency restored before reverting the entire patch.
> >
> > Thanks,
> > -daw-
> >
> > On 10/20/2016 12:24 PM, Klement Sekera -X (ksekera - PANTHEON TECHNOLOGIES at Cisco) wrote:
> >
> > It's quite possible. I made multiple builds and rebuilds to make sure I didn't break anything, but I apparently did. Feel free to revert the patch and I'll take a look at it again. That java build stuff is a real mess...
> >
> > Klement
> >
> > On Oct 20, 2016 6:22 PM, Edward Warnicke <hagb...@gmail.com> wrote:
> >
> > This change:
> >
> > https://gerrit.fd.io/r/#/c/3486/
> >
> > Was touching that part of the world... *if* it is a race condition, is it possible that this patch got lucky, and the ones that followed did not?
> >
> > Ed
> >
> > On Thu, Oct 20, 2016 at 9:15 AM, Dave Wallace <dwallac...@gmail.com> wrote:
> >
> > Maciek,
> >
> > This looks like a race condition in the build for JVPP. All of the failures contain the following error:
> >
> > 09:32:35 make[5]: Entering directory '/w/workspace/vpp-verify-master-ubuntu1604/build-root/build-vpp-native/vpp-api/java'
> > 09:32:35 CC jvpp-common/libjvpp_common_la-jvpp_common.lo
> > 09:32:35 CC jvpp-core/libjvpp_core_la-jvpp_core.lo
> > 09:32:35 CC jvpp-registry/libjvpp_registry_la-jvpp_registry.lo
> > 09:32:39 CCLD libjvpp_common.la
> > 09:32:39 CCLD libjvpp_registry.la
> > 09:32:40 ar: `u' modifier ignored since `D' is the default (see `U')
> > 09:32:40 /usr/bin/ld: cannot find -ljvpp_common
> > 09:32:40 collect2: error: ld returned 1 exit status
> > 09:32:40 Makefile:551: recipe for target 'libjvpp_registry.la' failed
> >
> > It appears that the output of "CCLD libjvpp_common.la" is an input to "CCLD libjvpp_registry.la", and in the error case it is not found.
> >
> > I had thought that Maros or Marek had provided a fix for this issue a couple of weeks ago. When I get the chance, I'll investigate further.
> >
> > Thanks,
> > -daw-
> >
> > On 10/20/2016 10:31 AM, Maciek Konstantynowicz (mkonstan) wrote:
> >
> > Hi,
> >
> > Does anybody have the time and willingness to look at the jobs below and see why they pass the first time and fail the second time, i.e. are not repeatable?
> >
> > All running on the new ubuntu1604 VMs in VIRL.
> > Related to two vpp patches:
> >
> > https://gerrit.fd.io/r/#/c/3499/
> > https://jenkins.fd.io/job/vpp-csit-verify-virl-master/1838/ : SUCCESS
> > https://jenkins.fd.io/job/vpp-csit-verify-virl-master/1841/ : FAILURE
> >
> > This seems to be VPP build related, as the standalone ubuntu1604 build jobs for this patch are also yielding non-repeatable results:
> > https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/1833/ : FAILURE
> > https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/1836/ : SUCCESS
> >
> > -Maciek
> >
> > from https://jenkins.fd.io/job/vpp-csit-verify-virl-master/1841/consoleFull
> > 10:52:23 make[1]: Entering directory '/w/workspace/vpp-csit-verify-virl-master/build-root/deb'
> > 10:52:23 dh clean --with dkms --with systemd
> > 10:52:24 dh: error: cannot read debian/changelog: No such file or directory
> > 10:52:24 debian/rules:21: recipe for target 'clean' failed
> > 10:52:24 make[1]: *** [clean] Error 2
> > 10:52:24 make[1]: Leaving directory '/w/workspace/vpp-csit-verify-virl-master/build-root/deb'
> > 10:52:24 Makefile:1159: recipe for target 'distclean' failed
> > 10:52:24 make: *** [distclean] Error 2
> >
> > 10:59:49 collect2: error: ld returned 1 exit status
> > 10:59:49 Makefile:551: recipe for target 'libjvpp_registry.la' failed
> > 10:59:49 make[5]: *** [libjvpp_registry.la] Error 1
> > 10:59:49 make[5]: *** Waiting for unfinished jobs....
> > 11:00:00 make[5]: Leaving directory '/w/workspace/vpp-csit-verify-virl-master/build-root/build-vpp-native/vpp-api/java'
> > 11:00:00 Makefile:445: recipe for target 'all' failed
> > 11:00:00 make[4]: *** [all] Error 2
> > 11:00:00 make[4]: Leaving directory '/w/workspace/vpp-csit-verify-virl-master/build-root/build-vpp-native/vpp-api/java'
> > 11:00:00 Makefile:377: recipe for target 'all-recursive' failed
> > 11:00:00 make[3]: *** [all-recursive] Error 1
> > 11:00:00 make[3]: Leaving directory '/w/workspace/vpp-csit-verify-virl-master/build-root/build-vpp-native/vpp-api'
> > 11:00:00 Makefile:699: recipe for target 'vpp-api-build' failed
> > 11:00:00 make[2]: *** [vpp-api-build] Error 2
> > 11:00:00 make[2]: Leaving directory '/w/workspace/vpp-csit-verify-virl-master/build-root'
> > 11:00:00 /w/workspace/vpp-csit-verify-virl-master/build-data/platforms.mk:20: recipe for target 'install-deb' failed
> > 11:00:00 make[1]: *** [install-deb] Error 1
> > 11:00:00 make[1]: Leaving directory '/w/workspace/vpp-csit-verify-virl-master/build-root'
> > 11:00:00 Makefile:264: recipe for target 'pkg-deb' failed
> > 11:00:00 make: *** [pkg-deb] Error 2
> >
> > _______________________________________________
> > csit-dev mailing list
> > csit-...@lists.fd.io
> > https://lists.fd.io/mailman/listinfo/csit-dev
>
> _______________________________________________
> vpp-dev mailing list
> vpp-dev@lists.fd.io
> https://lists.fd.io/mailman/listinfo/vpp-dev
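
For illustration only, the ordering problem suspected in the quoted JVPP discussion can be modelled with a toy dependency graph. This sketch uses Python's graphlib (3.9+) as a stand-in for a parallel scheduler; it is not how make or automake is implemented. The two target names come from the build log, the dependency edge is the one suspected of having been removed, and first_parallel_batch is a hypothetical helper.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def first_parallel_batch(deps: dict[str, set[str]]) -> set[str]:
    """Return the targets a parallel scheduler could start immediately.

    deps maps each target to the set of targets it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    return set(sorter.get_ready())


# With the dependency declared, the registry link must wait for the
# common library to be built.
with_dep = {
    "libjvpp_common.la": set(),
    "libjvpp_registry.la": {"libjvpp_common.la"},
}

# Without it, both links are eligible at the same moment, so whether
# -ljvpp_common is found depends on which one finishes first.
without_dep = {
    "libjvpp_common.la": set(),
    "libjvpp_registry.la": set(),
}

print(first_parallel_batch(with_dep))
print(first_parallel_batch(without_dep))
```

In this model, a build with the edge missing can still succeed whenever the common library happens to be linked first, which would be consistent with the same patch passing on one run and failing on the next.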