On Wed, Sep 28, 2011 at 8:52 AM, Mark Smith <m...@bumptechnologies.com> wrote:
> Hi all,
>
> Here at Bump we currently have our handset traffic routed through a
> single server. For obvious reasons, we want to expand this to
> multiple nodes for redundancy. The load balancer does two tasks:
> TLS termination, then directing traffic to one of our internal
> application servers.
>
> We want to split the single load balancer into an HA cluster. Our
> chosen solution involves creating one public-facing VIP for each
> machine, then floating those VIPs between the load balancer
> machines. Ideally there is one public IP per machine, and we use DNS
> round robin to send traffic to the IPs.
>
> We considered having two nodes and floating a single VIP between them,
> the canonical Heartbeat setup, but would prefer to avoid that because
> we know we're going to run into the situation where our TLS
> termination takes more CPU than we have available on a single node.
> Balancing across N nodes seems the most obvious way to address that.
>
> We have allocated three (3) nodes to our cluster. I want to run our
> design by this group, describe our problems, and see if anybody has
> advice.
>
> * no-quorum-policy set to ignore. Ideally, we would like our cluster
> to continue operating even if we lose the majority of nodes. Even in
> a CPU-limited situation, it would be better to serve slowly than to
> drop 33% or 66% of our traffic on the floor because we lost quorum
> and the floating VIPs weren't migrated to the remaining nodes.
>
> * STONITH disabled. Originally I tried to enable this, but with
> no-quorum-policy set to ignore, it seems to go on killing sprees.

Try no-quorum-policy=freeze instead.
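A minimal sketch, assuming the crm shell that ships with Pacemaker 1.0.x:

    crm configure property no-quorum-policy=freeze

With freeze, a partition that loses quorum keeps running the resources it
already holds but won't start or take over anything new, so the surviving
nodes keep serving traffic without the risk of two partitions bringing up
the same VIP.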
> It has fenced healthy nodes for no reason I could determine:
>
> - "node standby lb1"
>   * resources properly migrate to lb2, lb3
>   * everything looks stable and correct
> - "node online lb1"
>   * resources start migrating back to lb1
>   * lb2 gets fenced! (why? it was healthy)

Did a stop action fail? (See the notes at the end of this mail for a quick
way to check.)

>   * resources migrating off of lb2
>
> I have seen it double-fence, too, with lb1 being the only surviving
> node and lb2 and lb3 being unceremoniously rebooted. I'm not sure
> why. STONITH seems to be suboptimal (heh) in this particular setup.
>
> Anyway -- that means our configuration is very, very simple:
>
> node $id="65c71911-737e-4848-b7d7-897d0ede172a" patron
> node $id="b5f2fd18-acf1-4b25-a571-a0827e07188b" oldfashioned
> node $id="ef11cced-0062-411b-93dd-d03c2b8b198c" nattylight
> primitive cluster-monitor ocf:pacemaker:ClusterMon \
>     params extra_options="--mail-to blah" htmlfile="blah" \
>     meta target-role="Started"
> primitive floating_216 ocf:heartbeat:IPaddr \
>     params ip="173.192.13.216" cidr_netmask="255.255.255.252" nic="eth1" \
>     op monitor interval="60s" timeout="30s" \
>     meta target-role="Started"
> primitive floating_217 ocf:heartbeat:IPaddr \
>     params ip="173.192.13.217" cidr_netmask="255.255.255.252" nic="eth1" \
>     op monitor interval="60s" timeout="30s" \
>     meta target-role="Started"
> primitive floating_218 ocf:heartbeat:IPaddr \
>     params ip="173.192.13.218" cidr_netmask="255.255.255.252" nic="eth1" \
>     op monitor interval="60s" timeout="30s" \
>     meta target-role="Started"
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>     cluster-infrastructure="Heartbeat" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore" \
>     symmetric-cluster="true" \
>     last-lrm-refresh="1317079926"
>
> Am I on the right track with this? Am I missing something obvious?
> Am I misapplying this tool to our problem, and should I go in a
> different direction?
>
> In the real world, I would use ECMP (or something like that) between
> the router and my load balancers. However, I'm living in the world of
> managed server hosting (we're not quite big enough to colo), so I don't
> have that option. :-)
>
> --
> Mark Smith // Operations Lead
> m...@bumptechnologies.com
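On the fencing above: a failed or timed-out stop action is the usual reason a
healthy-looking node gets shot, because fencing is the only way Pacemaker can
be certain the resource is really down before starting it elsewhere. A sketch
of where to look (standard Pacemaker CLI; the log path depends on where ha.cf
sends its output):

    crm_mon -1o                        # one-shot status, including operation history
    grep -i stonith /var/log/ha-log    # or wherever Heartbeat logs on your nodes

A floating_21x_stop_0 entry with a timed-out or error status just before the
fence would be the smoking gun.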
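On the design itself: to give each VIP a "home" node while still letting it
float, one sketch is a location preference per VIP plus default stickiness
(node names are taken from your config; the scores are illustrative):

    crm configure location prefer-216 floating_216 100: patron
    crm configure location prefer-217 floating_217 100: oldfashioned
    crm configure location prefer-218 floating_218 100: nattylight
    crm configure rsc_defaults resource-stickiness=200

With stickiness higher than the preference score, a VIP that has failed over
stays put when its home node comes back online, which also removes the
automatic fail-back that was in progress when lb2 got fenced.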
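The DNS round-robin half is then just multiple A records on one name (the
hostname and TTL here are hypothetical):

    lb.example.com.  300  IN  A  173.192.13.216
    lb.example.com.  300  IN  A  173.192.13.217
    lb.example.com.  300  IN  A  173.192.13.218

Since the cluster keeps all three addresses alive, these records never need
to change during a failover.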