One note: in fact, the split of MASTER is not a big issue, because it would only happen if the network is running badly enough that it is already causing packet loss.
The problem is that it should recover from that situation fast enough. Previously, due to the ARP ping from the BACKUP router (which thought it would replace MASTER), the upstream switch would redirect traffic to the original BACKUP router for a while; then, as soon as the network recovered, MASTER would preempt BACKUP once again. But it may take some time for the upstream switch to become aware that the MAC/port/IP mapping has changed. We once tried using different MACs for MASTER and BACKUP, but found that the upstream switch then failed to recognize the MASTER again. We are still using the same MAC for MASTER and BACKUP now, and the upstream switch can handle the situation better.

--Sheng

On Wed, Jun 11, 2014 at 12:48 AM, Daan Hoogland <dhoogl...@schubergphilis.com> wrote:

> H,
>
> We had a little meeting on the state of this feature and the way to go. I
> have no karma for ASFBot meetings so here is my excerpt from the transcript:
>
> Attendance:
> K3KH Karl Harris
> Yasker Sheng Yang
> Spark404 Hugo Trippaers
> echaz Eric Chazas
> LeoSimons Leo Simons
> dahn Daan Hoogland
>
> Others were present in the room but not active in the meeting.
>
> Agenda:
> - Feasibility experiment plans by Schuberg Philis
> - Reusable work by Karl
> - Problems Citrix encountered with the regular redundant router
>   (and how to avoid them)
> - Work division
> - (next meeting needed?)
>
> We tried to follow the agenda but were not very strict on it. I'll
> summarize the outcome per agenda bullet:
>
> Schuberg Philis wants to implement a feasibility redundant router on a
> simulated vpc environment using the operational expertise it has in house.
> The outcome would then be back ported to the device, its agent and the
> management server.
>
> The implementation tactic is to create a json-like configuration
> description and to let the device do its own configuration. The idea is to
> have a single device for normal and vpc routers and to let the redundancy
> be a mere property of it. This should lead to the ultimate objective, which
> is to have a single, relatively simple, maintainable device.
>
> Karl will describe his endeavors in adapting the existing device on list.
>
> Sheng described the QA problems Citrix had with the existing redundant
> capabilities of the VR and assured us that only one real problem persists.
> The failover time of 3 seconds occasionally leads to a split brain, which
> leads to two VRs assuming the role of master. As the management server in
> a busy environment can take up to 30 seconds to detect a failover, this
> can lead to unacceptable outage. One possible solution, to have the
> management server serve as negotiator on such occasions, will be hard to
> implement due to this latency. Notably, both routers use the same MAC
> address on the interface to the load balancer.
>
> The resources available from Citrix are uncertain. Plan and design need to
> be done. It is agreed that we will work in parallel (Schuberg Philis and
> Citrix) but keep in close contact. The amount of resources Sungard has for
> this is not discussed. Karl will keep involved.
>
> We agreed to have a next meeting at 20:00 UTC on June the 17th.
>
> Can someone give me Karma to use ASFBot for this one, please?
>
> \DaanH
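
As a footnote to the shared-MAC point at the top of this message, here is a rough Python sketch of why one MAC for both routers can make recovery simpler. It is a toy model of a generic learning switch, not the behavior of any particular upstream switch or of the VR code. The class `ToySwitch`, the function `port_reached_after_failover`, the addresses, the port numbers, and the idea that a neighbor may miss the ARP announcement on a lossy network are all assumptions made for illustration.

```python
GATEWAY_IP = "10.0.0.1"  # hypothetical gateway address served by the routers


class ToySwitch:
    """Toy L2 switch: remembers the last port each source MAC was seen on."""

    def __init__(self):
        self.mac_to_port = {}

    def learn(self, src_mac, port):
        # A frame from src_mac arrived on this port; overwrite any older entry.
        self.mac_to_port[src_mac] = port

    def forward(self, dst_mac):
        # Port the switch would use for dst_mac (None if never learned).
        return self.mac_to_port.get(dst_mac)


def port_reached_after_failover(shared_mac, neighbor_saw_arp):
    """Return the switch port that gateway traffic reaches after a failover.

    Port 1 is the old MASTER, port 2 is the router taking over (assumed).
    """
    master_mac = "02:00:00:00:00:01"
    backup_mac = master_mac if shared_mac else "02:00:00:00:00:02"

    switch = ToySwitch()
    neighbor_arp = {}  # a neighbor's IP -> MAC cache

    # Steady state: MASTER answers for the gateway IP on port 1.
    switch.learn(master_mac, 1)
    neighbor_arp[GATEWAY_IP] = master_mac

    # Failover: the other router announces itself from port 2.
    # The switch always sees the frame; the neighbor may not (assumption).
    switch.learn(backup_mac, 2)
    if neighbor_saw_arp:
        neighbor_arp[GATEWAY_IP] = backup_mac

    return switch.forward(neighbor_arp[GATEWAY_IP])
```

In this model, `port_reached_after_failover(True, False)` returns port 2: with a shared MAC only the switch's MAC-to-port entry has to move. `port_reached_after_failover(False, False)` returns port 1, the stale router, because with distinct MACs each neighbor's IP-to-MAC cache must also be refreshed.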