On 12/04/2015 01:47 PM, Joe Stringer wrote:
> On 3 December 2015 at 23:49, Numan Siddique <nusid...@redhat.com> wrote:
>> On 12/04/2015 12:39 PM, Justin Pettit wrote:
>>>> On Dec 3, 2015, at 10:55 PM, Ben Pfaff <b...@ovn.org> wrote:
>>>>
>>>> On Thu, Dec 03, 2015 at 05:11:49PM -0800, Joe Stringer wrote:
>>>>> Before refactoring the main loop to reuse ovsdb_idl_loop_* functions, we
>>>>> would use a sequence to see if anything changed in NB database to
>>>>> compute and notify the SB database, and vice versa. This logic got
>>>>> dropped with the refactor, causing a testsuite failure in the ovn-sbctl
>>>>> test. Reintroduce the IDL sequence number checking.
>>>>>
>>>>> Fixes: 331e7aefe1c6 ("ovn-northd: Refactor main loop to use
>>>>> ovsdb_idl_loop_* functions")
>>>>> Suggested-by: Numan Siddique <nusid...@redhat.com>
>>>>> Signed-off-by: Joe Stringer <j...@ovn.org>
>>>> Acked-by: Ben Pfaff <b...@ovn.org>
>>> I pushed this myself so we can branch for 2.5 with your Acked-by and
>>> Cascardo's Tested-by.
>> Thanks Justin.
>>
>> I see the "send error: Broken pipe" warn logs [1] in the file -
>> tests/testsuite.dir/1713/ovn-nb/ovsdb-server.log when I run the test case
>> [ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR] at tests/ovn.at (Line no 842) and
>> make it fail at the end by putting
>> "AT_CHECK([echo hi], [1])" at line 1106 before clean up.
> Thanks for the report. Does that mean that we're filtering the broken
> pipe errors out in that test case? I'd expect the test case to fail if
> these logs showed up, without needing someone to modify it.

I don't think that test case is calling "AT_CHECK([check_logs ...])" to
check the logs. Hence it is passing.

>> I tested this with out the refactored ovn-northd main loop code and I still
>> see the same logs in the ovsdb-server.log
>> Not sure what is the cause of this. May be the issue is somewhere else.
> It sounds like this patch just makes the bug less likely (perhaps by
> virtue of sending less transactions).
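
To make the "fewer transactions" point concrete, here is a minimal Python
sketch of what gating on a sequence number does to a main loop. It is only
an illustration under my own assumptions, not the ovn-northd code: FakeIdl,
run_loop and the counters are hypothetical names, and the "recompute" step
stands in for whatever work would be pushed to the SB database.

```python
class FakeIdl:
    """Hypothetical stand-in for a database connection with a change counter."""

    def __init__(self):
        self.seqno = 0

    def change(self):
        # Simulate the database contents changing.
        self.seqno += 1


def run_loop(nb, iterations, changes_at, gate_on_seqno):
    """Run a toy main loop and return how many times it recomputed.

    changes_at: set of iteration indexes at which the NB database changes.
    gate_on_seqno: if True, recompute only when nb.seqno has moved since
    the last recompute; if False, recompute on every iteration.
    """
    recomputes = 0
    last_seen = None  # None, so the very first iteration always recomputes
    for i in range(iterations):
        if i in changes_at:
            nb.change()
        if not gate_on_seqno or nb.seqno != last_seen:
            recomputes += 1  # a real daemon would update the SB database here
            last_seen = nb.seqno
    return recomputes


if __name__ == "__main__":
    print("ungated:", run_loop(FakeIdl(), 10, {3, 7}, gate_on_seqno=False))
    print("gated:  ", run_loop(FakeIdl(), 10, {3, 7}, gate_on_seqno=True))
```

With two NB changes over ten iterations, the ungated loop recomputes on
every pass while the gated loop recomputes only on the first pass and after
each change, so far less work is sent towards the other database.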
>
>> Also with this patch, I can see the issue [2] again. I guess its not a major
>> issue.
> Are you able to tell us a bit more about your setup and why you're
> able to reproduce it?

I have a single node devstack setup running on my guest Fedora 22 libvirt VM
(hosted on my Fedora 23 laptop).
To reproduce the issue, I applied the "ovn: support ARP response for known IPs"
patch from Han Zhou. Please see the attached ovn-northd.c file at line 1149.

These are the steps which I followed to reproduce the issue.

1. Deploy the devstack with networking-ovn.
2. Apply the ARP patch, compile ovn-northd and restart it.
3. When I run the command "sudo ovn-sbctl dump-flows", I don't see any
   logical flows related to ARP.
4. If I run "neutron port-create private" or any other neutron command
   which updates the ovn-nb.db, I see the below ARP logical flows
   reflected when I dump the logical flows.

----
table=3( ls_in_l2_lkup), priority= 150, match=(arp.tpa == 10.0.0.1 && arp.op == 1), action=(eth.dst = eth.src; eth.src = fa:16:3e:d6:20:d7; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = fa:16:3e:d6:20:d7; arp.tpa = arp.spa; arp.spa = 10.0.0.1; outport = inport; inport = ""; /* Allow sending out inport. */ output;)
table=3( ls_in_l2_lkup), priority= 150, match=(arp.tpa == 10.0.0.2 && arp.op == 1), action=(eth.dst = eth.src; eth.src = fa:16:3e:81:21:dc; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = fa:16:3e:81:21:dc; arp.tpa = arp.spa; arp.spa = 10.0.0.2; outport = inport; inport = ""; /* Allow sending out inport. */ output;)
----

5. When I revert the changes and restart ovn-northd, I still see those
   ARP logical flows until the ovn-nb.db is updated.
6. Sometimes the issue is not seen, in which case I revert the ARP code
   in ovn-northd.c and try again until I see the issue.

Please also see this -
http://openvswitch.org/pipermail/discuss/2015-November/019444.html

> I have only been able to reproduce the broken pipe issue on build
> systems like travis where the underlying CPU resource is shared with
> other users, and I suspect they're using nested virtualization which
> can exacerbate certain types of failures. I had been assuming that
> this is related to why it fails.

I am able to reproduce the broken pipe issue on my laptop running Fedora 23
by running "make check TESTSUITEFLAGS='1713'", where 1713 is the test case
number for "[ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR]".

Thanks
Numan
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev