These are NOT with verify.

They are specifically with test-debug, which I added as a separate run at
someone's request (sorry, I can't remember whose at the moment).

Ed


On Aug 10, 2017, at 1:07 AM, Klement Sekera -X (ksekera - PANTHEON TECHNOLOGIES 
at Cisco) <ksek...@cisco.com<mailto:ksek...@cisco.com>> wrote:

The 2 minute timeout is the result of my recent change. The framework
now forks and runs the test in a child process, and if the child process
fails to send a keep-alive (sent when a test case starts), then it's
killed. Otherwise there'd be no way to recover from a stuck mutex or a
deadlock.
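
The fork-plus-keep-alive scheme described above can be sketched roughly as follows. This is only an illustration of the general watchdog pattern, not the VPP test framework's actual code: the names `child_runner` and `run_with_watchdog`, the pipe-based keep-alive, and the `None` "finished" sentinel are all assumptions made for the example.

```python
import multiprocessing as mp
import time

# Assumption: a fork-capable platform (Linux), matching "the framework now forks".
_ctx = mp.get_context("fork")


def child_runner(conn, tests):
    """Run tests in the child, sending a keep-alive as each test case starts."""
    for name, test in tests:
        conn.send(name)  # keep-alive: carries the name of the test starting now
        test()
    conn.send(None)  # hypothetical sentinel meaning "all tests finished"
    conn.close()


def run_with_watchdog(tests, timeout):
    """Return (ok, last_test). Kill the child if no keep-alive arrives in time."""
    parent_conn, child_conn = _ctx.Pipe(duplex=False)
    child = _ctx.Process(target=child_runner, args=(child_conn, tests))
    child.start()
    child_conn.close()  # parent keeps only the read end
    last_test = None
    while True:
        if not parent_conn.poll(timeout):
            # No keep-alive within the window: assume the child is stuck.
            child.terminate()
            child.join()
            return False, last_test
        try:
            msg = parent_conn.recv()
        except EOFError:
            # Child exited without sending the sentinel (for example, it crashed).
            child.join()
            return False, last_test
        if msg is None:
            child.join()
            return True, last_test
        last_test = msg
```

With a pattern like this, the parent can report which test was running when the timeout hit, which is the kind of information the "last test running was ..." message would need.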

Are you running the extended tests or the stock verify?

Quoting Ed Kern (ejk) (2017-08-10 00:08:19)
  Klement,
  OK, I'll think about how to do that without too much trouble in its current
  state.
  In the meantime, increasing the CPU and memory a bit changed the error:

21:49:42 create 1k of p2p subifs                                            OK
21:49:42 ==============================================================================
21:51:52 21:53:13,610 Timeout while waiting for child test runner process (last test running was `drop rx packet not matching p2p subinterface' in `/tmp/vpp-unittest-P2PEthernetIPV6-GDHSDK')!
21:51:52 Killing possible remaining process IDs:  19954 19962 19964

21:45:05 PPPoE Test Case
21:45:05 ===================================21:48:13,778 Timeout while waiting for child test runner process (last test running was `drop rx packet not matching p2p subinterface' in `/tmp/vpp-unittest-P2PEthernetIPV6-I0REOQ')!
21:47:45 Killing possible remaining process IDs:  20017 20025 20027

20:48:46 PPPoE Test Case
20:48:46 ===================================20:51:34,082 Timeout while waiting for child test runner process (last test running was `drop rx packet not matching p2p subinterface' in `/tmp/vpp-unittest-P2PEthernetIPV6-tQ5sP0')!
20:51:05 Killing possible remaining process IDs:  19919 19927 19929

  Anything new/different/exciting in here?
  Also, with the memory/CPU expansion (by roughly a third), these failures
  now happen on the order of 2 to 3 minutes, as opposed to 90 leading to a
  timeout failure.
  Since the verifies are still happily chugging along, I ASSuME that this
  drop packet check isn’t happening in that suite?
  Ed

    On Aug 9, 2017, at 1:04 PM, Klement Sekera -X (ksekera - PANTHEON
    TECHNOLOGIES at Cisco) <ksek...@cisco.com<mailto:ksek...@cisco.com>> wrote:
    Ed,

    it'd help if you could collect log.txt from a failed run so we could
    peek under the hood... please see my other email in this thread...

    Thanks,
    Klement

    Quoting Ed Kern (ejk) (2017-08-09 20:48:46)

        This is not you, or this patch.
        The make test-debug job has had a 90+% failure rate (read: not 100%)
        for at least the last 100 builds (as far back as my current logs go,
        but I will probably extend that a bit now).
        You hit the failure that is seen most often, on that "create 1k of
        p2p subifs" test. The other, much less frequent, one is:

      13:40:24 CGNAT TCP session close initiated from outside network       OK
      13:40:24 =================================================Build timed out (after 120 minutes). Marking the build as failed.

        So currently I’m allocating 10000 MHz of CPU and 8G of memory for
        verify and also for test-debug runs.
        I’m obviously not getting (as you can see) errors about it running
        out of memory, but I wonder if that’s possibly what’s happening.
        It’s easy enough to increase my allocations a bit and see if that
        makes a difference.
        If anyone has other ideas to try, I’m happy to give them a shot.
        Appreciate the heads up.
        Ed

          On Aug 9, 2017, at 12:07 PM, Dave Barach (dbarach)
          <dbar...@cisco.com<mailto:dbar...@cisco.com>> wrote:
          Please see https://gerrit.fd.io/r/#/c/7927, and
          http://jenkins.ejkern.net:8080/job/vpp-test-debug-master-ubuntu1604/1056/console

          The patch in question is highly unlikely to cause this failure...


          14:37:11 ==============================================================================
          14:37:11 P2P Ethernet tests
          14:37:11 ==============================================================================
          14:37:11 delete/create p2p subif                                  OK
          14:37:11 create 100k of p2p subifs                                SKIP
          14:37:11 create 1k of p2p subifs                                  Build timed out (after 120 minutes). Marking the build as failed.
          16:24:49 $ ssh-agent -k
          16:24:54 unset SSH_AUTH_SOCK;
          16:24:54 unset SSH_AGENT_PID;
          16:24:54 echo Agent pid 84 killed;
          16:25:07 [ssh-agent] Stopped.
          16:25:07 Build was aborted
          16:25:09 [WS-CLEANUP] Deleting project workspace...[WS-CLEANUP] done
          16:25:11 Finished: FAILURE

          Thanks… Dave



_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev