Hi. Are you sure about "reconnecting" switches? As I wrote before, to reproduce the problem I had to use 2 switches/bridges.

$ grep -r rconn ovs-vswitchd.log | grep 6653
2016-04-22T08:48:52.725Z|00022|rconn|INFO|s2<->tcp:127.0.0.1:6653: connecting...
2016-04-22T08:48:52.726Z|00023|rconn|WARN|s2<->tcp:127.0.0.1:6653: connection failed (Connection refused)
2016-04-22T08:48:52.726Z|00024|rconn|INFO|s2<->tcp:127.0.0.1:6653: waiting 1 seconds before reconnect
2016-04-22T08:48:52.726Z|00029|rconn|INFO|s1<->tcp:127.0.0.1:6653: connecting...
2016-04-22T08:48:52.726Z|00030|rconn|WARN|s1<->tcp:127.0.0.1:6653: connection failed (Connection refused)
2016-04-22T08:48:52.726Z|00031|rconn|INFO|s1<->tcp:127.0.0.1:6653: waiting 1 seconds before reconnect
2016-04-22T08:48:52.811Z|00032|rconn|WARN|s2<->tcp:127.0.0.1:6653: connection failed (Connection refused)
2016-04-22T08:48:52.811Z|00033|rconn|WARN|s1<->tcp:127.0.0.1:6653: connection failed (Connection refused)
2016-04-22T08:48:53.317Z|00070|rconn|INFO|s1<->tcp:10.25.2.14:6653: connecting...
2016-04-22T08:48:53.330Z|00075|rconn|INFO|s1<->tcp:10.25.2.14:6653: connected
2016-04-22T08:48:53.449Z|00085|rconn|INFO|s2<->tcp:10.25.2.13:6653: connecting...
2016-04-22T08:48:53.459Z|00090|rconn|INFO|s2<->tcp:10.25.2.13:6653: connected
2016-04-22T08:48:56.690Z|00184|rconn|INFO|s1<->tcp:10.25.2.12:6653: connecting...
2016-04-22T08:48:56.706Z|00189|rconn|INFO|s1<->tcp:10.25.2.12:6653: connected
2016-04-22T08:48:56.854Z|00199|rconn|INFO|s1<->tcp:10.25.2.13:6653: connecting...
2016-04-22T08:48:56.865Z|00204|rconn|INFO|s1<->tcp:10.25.2.13:6653: connected
2016-04-22T08:48:57.039Z|00214|rconn|INFO|s2<->tcp:10.25.2.12:6653: connecting...
2016-04-22T08:48:57.049Z|00219|rconn|INFO|s2<->tcp:10.25.2.12:6653: connected
2016-04-22T08:48:57.184Z|00229|rconn|INFO|s2<->tcp:10.25.2.14:6653: connecting...
2016-04-22T08:48:57.199Z|00234|rconn|INFO|s2<->tcp:10.25.2.14:6653: connected

There are only six "connected" lines, so I believe there was no reconnection: 2 bridges with 3 controllers each.
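
The same count can be checked mechanically. Below is a minimal Python sketch, assuming the rconn line format shown in the grep output above (bridge name, "<->", controller target, then "connected"). The function names count_connections and reconnected_pairs are made up for this illustration and are not part of OVS. A (bridge, controller) pair that appears more than once would point to a reconnection.

```python
import re
from collections import Counter

# Matches rconn lines such as:
#   ...|rconn|INFO|s1<->tcp:10.25.2.14:6653: connected
# "connecting..." lines are deliberately not matched.
CONNECTED = re.compile(
    r"\|rconn\|INFO\|(?P<bridge>[^|<]+)<->(?P<target>\S+): connected\s*$"
)


def count_connections(lines):
    """Count "connected" events per (bridge, controller target) pair."""
    counts = Counter()
    for line in lines:
        match = CONNECTED.search(line)
        if match:
            counts[(match.group("bridge"), match.group("target"))] += 1
    return counts


def reconnected_pairs(counts):
    """Return the pairs that connected more than once, sorted."""
    return sorted(pair for pair, n in counts.items() if n > 1)


# A subset of the log lines quoted in this message.
SAMPLE = """\
2016-04-22T08:48:53.317Z|00070|rconn|INFO|s1<->tcp:10.25.2.14:6653: connecting...
2016-04-22T08:48:53.330Z|00075|rconn|INFO|s1<->tcp:10.25.2.14:6653: connected
2016-04-22T08:48:53.459Z|00090|rconn|INFO|s2<->tcp:10.25.2.13:6653: connected
2016-04-22T08:48:56.706Z|00189|rconn|INFO|s1<->tcp:10.25.2.12:6653: connected
2016-04-22T08:48:56.865Z|00204|rconn|INFO|s1<->tcp:10.25.2.13:6653: connected
2016-04-22T08:48:57.049Z|00219|rconn|INFO|s2<->tcp:10.25.2.12:6653: connected
2016-04-22T08:48:57.199Z|00234|rconn|INFO|s2<->tcp:10.25.2.14:6653: connected
""".splitlines()

if __name__ == "__main__":
    counts = count_connections(SAMPLE)
    for (bridge, target), n in sorted(counts.items()):
        print(bridge, target, n)
    print("reconnected:", reconnected_pairs(counts))
```

Keying on the bridge name as well as the controller address matters here, because the same controller IP shows up once for s1 and once for s2.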

1) Around 08:48:53, 14 became master for s1 and 13 for s2.
2) After 08:48:56, I set up 2 more controllers for both s1 (12, 13) and s2 (12, 14).

When I see
"vconn|DBG|tcp:10.25.2.14:6653: received: OFPT_ROLE_REQUEST (OF1.3)"
how do I know whether it is a request towards s1 or s2?

Peter Gubka

-----Original Message-----
From: Ben Pfaff [mailto:b...@ovn.org]
Sent: Monday, May 02, 2016 11:14 PM
To: Peter Gubka -X (pgubka - PANTHEON TECHNOLOGIES at Cisco) <pgu...@cisco.com>
Cc: b...@openvswitch.org
Subject: Re: [ovs-discuss] controller's role mismatch?

On Fri, Apr 22, 2016 at 09:32:26AM +0000, Peter Gubka -X (pgubka - PANTHEON TECHNOLOGIES at Cisco) wrote:
> Hello,
>
> I had to use 2 switches/bridges to reproduce the problem. Logs in attachments.
>
> Just for the time orientation:
> Enabling 2 masters for 2 switches (controller firstly sent slave
> automatically, and when it finds out that it is the first connection
> from that device, it sends master then)

Thanks for the logs.  Here is my interpretation.

First, 14 makes itself master:

    vconn|DBG|tcp:10.25.2.14:6653: received: OFPT_ROLE_REQUEST (OF1.3)
    vconn|DBG|(xid=0x3): role=master generation_id=1
    vconn|DBG|tcp:10.25.2.14:6653: sent (Success): OFPT_ROLE_REPLY (OF1.3)
    vconn|DBG|(xid=0x3): role=master generation_id=1
    vconn|DBG|tcp:10.25.2.13:6653: received: OFPT_ROLE_REQUEST (OF1.3)
    vconn|DBG|(xid=0x3): role=nochange
    vconn|DBG|tcp:10.25.2.13:6653: sent (Success): OFPT_ROLE_REPLY (OF1.3)
    vconn|DBG|(xid=0x3): role=slave generation_id=0

Then 13 makes itself master:

    vconn|DBG|tcp:10.25.2.13:6653: received: OFPT_ROLE_REQUEST (OF1.3)
    vconn|DBG|(xid=0x4): role=master generation_id=1
    vconn|DBG|tcp:10.25.2.13:6653: sent (Success): OFPT_ROLE_REPLY (OF1.3)
    vconn|DBG|(xid=0x4): role=master generation_id=1
    rconn|INFO|s1<->tcp:10.25.2.12:6653: connected
    vconn|DBG|tcp:10.25.2.12:6653: received: OFPT_ROLE_REQUEST (OF1.3)
    vconn|DBG|(xid=0x0): role=nochange
    vconn|DBG|tcp:10.25.2.12:6653: sent (Success): OFPT_ROLE_REPLY (OF1.3)
    vconn|DBG|(xid=0x0): role=equal generation_id=1
    vconn|DBG|tcp:10.25.2.12:6653: received: OFPT_ROLE_REQUEST (OF1.3)
    vconn|DBG|(xid=0x1): role=slave generation_id=2
    vconn|DBG|tcp:10.25.2.12:6653: sent (Success): OFPT_ROLE_REPLY (OF1.3)
    vconn|DBG|(xid=0x1): role=slave generation_id=2

Then 13 drops the connection and reconnects.
Therefore it's initially "equal" and there's no master:

    rconn|INFO|s1<->tcp:10.25.2.13:6653: connected
    vconn|DBG|tcp:10.25.2.13:6653: received: OFPT_ROLE_REQUEST (OF1.3)
    vconn|DBG|(xid=0x0): role=nochange
    vconn|DBG|tcp:10.25.2.13:6653: sent (Success): OFPT_ROLE_REPLY (OF1.3)
    vconn|DBG|(xid=0x0): role=equal generation_id=2

Then 13 requests that it become a "slave" and there's still no master:

    vconn|DBG|tcp:10.25.2.13:6653: received: OFPT_ROLE_REQUEST (OF1.3)
    vconn|DBG|(xid=0x1): role=slave generation_id=3
    vconn|DBG|tcp:10.25.2.13:6653: sent (Success): OFPT_ROLE_REPLY (OF1.3)
    vconn|DBG|(xid=0x1): role=slave generation_id=3

And no controller ever after in the logs asks to become master, so
there's never any master.
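
To make the reasoning in this interpretation easier to follow, here is a rough Python sketch that replays the role requests against a simplified model of a single switch. It is not OVS code. The RoleTable class and the replay function are hypothetical names, generation_id staleness checks are ignored, and the event list assumes every request belongs to one bridge, which is exactly the reading being questioned in this thread. The rules modeled are the OpenFlow 1.3 ones: a new connection starts as "equal", "nochange" only queries the current role, and a successful "master" request demotes any other master to "slave".

```python
class RoleTable:
    """Simplified per-switch view of controller roles (illustration only)."""

    def __init__(self):
        self.roles = {}

    def connect(self, ctrl):
        # A new (or re-established) connection starts in the "equal" role.
        self.roles[ctrl] = "equal"

    def request(self, ctrl, role):
        # "nochange" only queries the current role.
        if role == "master":
            # Only one master at a time: demote any other master to slave.
            for other, current in self.roles.items():
                if other != ctrl and current == "master":
                    self.roles[other] = "slave"
        if role != "nochange":
            self.roles[ctrl] = role
        return self.roles[ctrl]

    def masters(self):
        return sorted(c for c, r in self.roles.items() if r == "master")


def replay(events):
    """Apply ("connect", ctrl) and ("request", ctrl, role) events in order."""
    table = RoleTable()
    for event in events:
        if event[0] == "connect":
            table.connect(event[1])
        else:
            table.request(event[1], event[2])
    return table


# Event order as read in the interpretation above (single-switch assumption).
EVENTS = [
    ("connect", "14"),
    ("connect", "13"),
    ("request", "13", "slave"),     # assumed: 13's later "nochange" reply says slave
    ("request", "14", "master"),
    ("request", "13", "nochange"),
    ("request", "13", "master"),    # demotes 14
    ("connect", "12"),
    ("request", "12", "nochange"),
    ("request", "12", "slave"),
    ("connect", "13"),              # treated here as a reconnect: back to equal
    ("request", "13", "nochange"),
    ("request", "13", "slave"),
]

if __name__ == "__main__":
    table = replay(EVENTS)
    print(table.roles)
    print("masters:", table.masters())
```

If the second "connected" for 13 instead belongs to a different bridge, the events would need to be split into one RoleTable per bridge, and the outcome could differ.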