On 12/04/2015 01:47 PM, Joe Stringer wrote:
> On 3 December 2015 at 23:49, Numan Siddique <nusid...@redhat.com> wrote:
>> On 12/04/2015 12:39 PM, Justin Pettit wrote:
>>>> On Dec 3, 2015, at 10:55 PM, Ben Pfaff <b...@ovn.org> wrote:
>>>>
>>>> On Thu, Dec 03, 2015 at 05:11:49PM -0800, Joe Stringer wrote:
>>>>> Before refactoring the main loop to reuse ovsdb_idl_loop_* functions, we
>>>>> would use a sequence to see if anything changed in NB database to
>>>>> compute and notify the SB database, and vice versa. This logic got
>>>>> dropped with the refactor, causing a testsuite failure in the ovn-sbctl
>>>>> test. Reintroduce the IDL sequence number checking.
>>>>>
>>>>> Fixes: 331e7aefe1c6 ("ovn-northd: Refactor main loop to use
>>>>> ovsdb_idl_loop_* functions")
>>>>> Suggested-by: Numan Siddique <nusid...@redhat.com>
>>>>> Signed-off-by: Joe Stringer <j...@ovn.org>
>>>> Acked-by: Ben Pfaff <b...@ovn.org>
>>> I pushed this myself so we can branch for 2.5 with your Acked-by and 
>>> Cascardo's Tested-by.
>> Thanks Justin.
>>
>> I see "send error: Broken pipe" warning logs [1] in the file
>> tests/testsuite.dir/1713/ovn-nb/ovsdb-server.log when I run the test case
>> [ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR] in tests/ovn.at (line 842) and
>> force it to fail at the end by adding
>> "AT_CHECK([echo hi], [1])" at line 1106, before the cleanup.
> Thanks for the report. Does that mean that we're filtering the broken
> pipe errors out in that test case? I'd expect the test case to fail if
> these logs showed up, without needing someone to modify it.
I don't think that test case is calling "AT_CHECK([check_logs ...])" to check 
the logs. Hence it is passing.
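
To illustrate what such a log check would catch, here is a minimal sketch in Python. It assumes the usual OVS log layout "timestamp|seq|module|LEVEL|message" and treats WARN, ERR and EMER as failures. The function name and the sample lines are made up for illustration; they are not taken from the testsuite or from the real log [1].

```python
# Sketch only: scan OVS-style log lines for entries at WARN level or above.
# Assumption: each line looks like "timestamp|seq|module|LEVEL|message".
ALARMING_LEVELS = {"WARN", "ERR", "EMER"}


def alarming_lines(lines):
    """Return the log lines whose level field is WARN, ERR or EMER."""
    hits = []
    for line in lines:
        text = line.rstrip("\n")
        # Split into at most 5 fields so "|" inside the message is preserved.
        fields = text.split("|", 4)
        if len(fields) == 5 and fields[3] in ALARMING_LEVELS:
            hits.append(text)
    return hits


# Made-up sample lines for illustration; not copied from the real log [1].
sample = [
    "2015-12-04T00:00:00.000Z|00001|vlog|INFO|opened log file",
    "2015-12-04T00:00:01.000Z|00002|jsonrpc|WARN|unix: send error: Broken pipe",
]

if __name__ == "__main__":
    for hit in alarming_lines(sample):
        print(hit)
```

A test that ran a check like this over ovsdb-server.log would fail on the broken pipe warning without anyone having to modify the test case.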
>> I tested this without the refactored ovn-northd main loop code and I still
>> see the same logs in ovsdb-server.log.
>> I'm not sure what causes this. Maybe the issue is somewhere else.
> It sounds like this patch just makes the bug less likely (perhaps by
> virtue of sending fewer transactions).
>
>> Also, with this patch I can see issue [2] again. I guess it's not a major
>> issue.
> Are you able to tell us a bit more about your setup and why you're
> able to reproduce it?
I have a single node devstack setup running on my guest Fedora 22 libvirt VM 
(hosted on my Fedora 23 laptop).
To reproduce the issue, I applied the "ovn: support ARP response for known
IPs" patch from Han Zhou.
Please see the attached ovn-northd.c file at line 1149.

These are the steps which I followed to reproduce the issue.
 1. Deploy the devstack with networking-ovn
 2. Apply the ARP patch, compile ovn-northd and restart it.
 3. When I run the command "sudo ovn-sbctl dump-flows", I don't see any logical 
flows related to ARP.
 4. If I run "neutron port-create private" or any other neutron command that 
updates ovn-nb.db, I see the ARP logical flows below when I dump the
logical flows.
 ----
  table=3(   ls_in_l2_lkup), priority=  150, match=(arp.tpa == 10.0.0.1 && arp.op == 1), action=(eth.dst = eth.src; eth.src = fa:16:3e:d6:20:d7; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = fa:16:3e:d6:20:d7; arp.tpa = arp.spa; arp.spa = 10.0.0.1; outport = inport; inport = ""; /* Allow sending out inport. */ output;)
  table=3(   ls_in_l2_lkup), priority=  150, match=(arp.tpa == 10.0.0.2 && arp.op == 1), action=(eth.dst = eth.src; eth.src = fa:16:3e:81:21:dc; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = fa:16:3e:81:21:dc; arp.tpa = arp.spa; arp.spa = 10.0.0.2; outport = inport; inport = ""; /* Allow sending out inport. */ output;)
 ----

 5. When I revert the changes and restart ovn-northd, I still see those ARP 
logical flows until the ovn-nb.db is updated.
 6. Sometimes the issue is not seen, in which case I revert the ARP code in 
ovn-northd.c and try again until I see the issue.
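
For context, the sequence-number check that the patch reintroduces can be modelled as below. This is a toy sketch with hypothetical names (FakeIdl, Northd, run_once), not the ovn-northd code. It only shows that recomputation is skipped while the NB sequence number is unchanged. That would be consistent with steps 3 to 5, but the thread has not established that this is the cause.

```python
class FakeIdl:
    """Hypothetical stand-in for an IDL connection; it only tracks a seqno."""

    def __init__(self):
        self.seqno = 0

    def apply_update(self):
        # Model a database change arriving: the sequence number advances.
        self.seqno += 1


class Northd:
    """Toy main loop that recomputes only when the NB seqno has changed."""

    def __init__(self, nb):
        self.nb = nb
        # Assumption for this sketch: the daemon starts out treating the
        # current seqno as already seen, so nothing is recomputed at startup.
        self.last_nb_seqno = nb.seqno
        self.recomputes = 0

    def run_once(self):
        """Run one loop iteration; return True if a recompute happened."""
        if self.nb.seqno != self.last_nb_seqno:
            self.last_nb_seqno = self.nb.seqno
            self.recomputes += 1  # stands in for rebuilding the SB contents
            return True
        return False


if __name__ == "__main__":
    nb = FakeIdl()
    northd = Northd(nb)
    print(northd.run_once())  # no NB change yet
    nb.apply_update()         # e.g. a neutron command touches ovn-nb.db
    print(northd.run_once())  # change seen, recompute runs
    print(northd.run_once())  # nothing new, recompute skipped
```

In this model a restarted daemon does no work until something bumps the NB sequence number, which is the same shape as the stale flows surviving until ovn-nb.db is updated.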

Please also see this - 
http://openvswitch.org/pipermail/discuss/2015-November/019444.html


> I have only been able to reproduce the broken pipe issue on build
> systems like travis where the underlying CPU resource is shared with
> other users, and I suspect they're using nested virtualization which
> can exacerbate certain types of failures. I had been assuming that
> this is related to why it fails.
I am able to reproduce the broken pipe issue on my laptop running Fedora 23 by
running "make check TESTSUITEFLAGS='1713'", where 1713 is the test case number
for "[ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR]".


Thanks
Numan
