"dev" <dev-boun...@openvswitch.org> wrote on 07/21/2016 06:32:02 AM:

> From: Lance Richardson <lrich...@redhat.com>
> To: ovs dev <dev@openvswitch.org>
> Date: 07/21/2016 06:32 AM
> Subject: [ovs-dev] ovn test failures
> Sent by: "dev" <dev-boun...@openvswitch.org>
>
> It seems the failure rate for OVN end-to-end tests went up significantly
> when commit 70c7cfef188b5ae9940abd5b7d9fe46b1fa88c8e was merged earlier
> this week.
>
> After this commit, 100 iterations of
> "make check TESTSUITEFLAGS='-j8 -k ovn'"
> gave (number of failures in left-most column):
>       2 2179: ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS FAILED (ovn.at:1312)
>      10 2183: ovn -- 2 HVs, 2 LS, 1 lport/LS, 2 peer LRs FAILED (ovn.at:2416)
>      52 2184: ovn -- 1 HV, 1 LS, 2 lport/LS, 1 LR FAILED (ovn.at:2529)
>      45 2185: ovn -- 1 HV, 2 LSs, 1 lport/LS, 1 LR FAILED (ovn.at:2668)
>      23 2186: ovn -- 2 HVs, 3 LS, 1 lport/LS, 2 peer LRs, static routes FAILED (ovn.at:2819)
>      53 2188: ovn -- 2 HVs, 3 LRs connected via LS, static routes FAILED (ovn.at:3053)
>      32 2189: ovn -- 2 HVs, 2 LRs connected via LS, gateway router FAILED (ovn.at:3237)
>      50 2190: ovn -- icmp_reply: 1 HVs, 2 LSs, 1 lport/LS, 1 LR FAILED (ovn.at:3389)
>
> Immediately prior to this (at commit
> 48ff3e25abe31b761d2d3f3a2fd6ccaa783c79dc),
> the number of failures per 100 iterations was much lower:
>       1 2178: ovn -- 2 HVs, 4 lports/HV, localnet ports FAILED (ovn.at:1020)
>       1 2179: ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS FAILED (ovn.at:1307)
>       1 2179: ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS FAILED (ovn.at:1312)
>       9 2184: ovn -- 1 HV, 1 LS, 2 lport/LS, 1 LR FAILED (ovn.at:2529)
>       7 2186: ovn -- 2 HVs, 3 LS, 1 lport/LS, 2 peer LRs, static routes FAILED (ovn.at:2819)
>       1 2187: ovn -- send gratuitous arp on localnet FAILED (ovn.at:2874)
>      16 2188: ovn -- 2 HVs, 3 LRs connected via LS, static routes FAILED (ovn.at:3053)
>
> Any ideas?
>
> Thanks,
>
>     Lance

As author of that patch, I will admit that those numbers are a
bit disturbing, because they aren't consistent with what I was
seeing while developing and testing the patch series.

What they make me suspect is that the patches don't correctly catch
all state transitions (similar to what you uncovered with commit
f94705d729459d808fd139c8f95d5f1f8d8becc6).

Two things come to mind:
1) Make sure that all of the places where the code needs to request
   full processing of the tables are correctly handled.
2) If a later step in the process finds that an earlier step needs
   to process the database rows fully during the next cycle, use
   poll_immediate_wake() so that the processing happens sooner
   rather than later.
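
A minimal sketch of point 2, offered only as an illustration and not as
the actual ovn-controller code. Everything here is a hypothetical
stand-in: sim_poll_immediate_wake() and sim_poll_block() only mimic the
behaviour of the real poll_immediate_wake() and poll_block(), and
full_process_needed, early_step(), later_step() and run_cycles() are
names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the poll loop: records a request to wake
 * immediately instead of actually touching any file descriptors. */
static bool wake_requested;

static void sim_poll_immediate_wake(void)
{
    wake_requested = true;
}

/* Returns true if the loop would have slept, false if it woke at once.
 * The request is consumed either way, as a one-shot. */
static bool sim_poll_block(void)
{
    bool slept = !wake_requested;
    wake_requested = false;
    return slept;
}

/* Hypothetical flag: set when incremental processing is not safe. */
static bool full_process_needed;

struct cycle_stats {
    int full_runs;          /* cycles where the early step ran fully */
    int incremental_runs;   /* cycles where it ran incrementally */
    int immediate_wakes;    /* cycles that ended without sleeping */
};

/* Early step: honours the flag and clears it once acted upon. */
static void early_step(struct cycle_stats *s)
{
    if (full_process_needed) {
        full_process_needed = false;
        s->full_runs++;
    } else {
        s->incremental_runs++;
    }
}

/* Later step: on a state change the early step could not have seen,
 * request a full run and make sure the next cycle starts right away. */
static void later_step(bool state_changed)
{
    if (state_changed) {
        full_process_needed = true;
        sim_poll_immediate_wake();
    }
}

/* Runs n simulated main-loop cycles; changes[i] says whether the later
 * step detects a state change during cycle i. */
static struct cycle_stats run_cycles(const bool *changes, int n)
{
    struct cycle_stats s = { 0, 0, 0 };

    assert(n >= 0);
    wake_requested = false;
    full_process_needed = false;
    for (int i = 0; i < n; i++) {
        early_step(&s);
        later_step(changes[i]);
        if (!sim_poll_block()) {
            s.immediate_wakes++;
        }
    }
    return s;
}
```

With changes = { false, true, false }, the second cycle sets the flag
and skips the sleep, so the third cycle does the full run without
waiting for unrelated activity to wake the loop.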

Ryan
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
