Re: High Availability

guy keren Thu, 15 Apr 2010 00:54:19 -0700

Marc Volovic wrote:

A number of issues:
First - what. You need to replicate (a) links, (b) storage, (c) service machines.
Links are internal and external. Multipath internet connexions. Multipath LAN connexions. Multipath storage links. Redundant network infrastructure (switches, routers, firewalls, IDS/IPS).
Replicate storage. If you use a SAN with dedicated links, multipath both the links and the storage. Use redundant storage hardware and add storage replication. Add auto-promotion, takeover, and (if possible) partition prevention mechanisms. Use STONITH.
Service machines are the easiest to replicate. Simple heartbeat will provide a significant level of failover and/or failback. Here, likewise, use STONITH or other partition prevention mechanisms.
Under-utilize. 70% duty cycle is good.

Expect cost hikes.


and - test test test.....

many people fail to test their "highly-available" setup, and as a result, think they are "highly available" when they are not.

testing should include various types of scenarios that will show you bugs in various tools as well as configuration errors.

examples: you set up multi-path to the storage, but the default I/O timeouts are too large -> this can easily make a multi-path failover take several minutes in some scenarios.
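to get a feel for why large timeouts hurt, here is a rough back-of-the-envelope sketch in Python. it assumes a deliberately simplified model (the first attempt and every retry each wait out the full command timeout before the path is given up on). the function name, the parameters and the numbers are illustrative assumptions, not the defaults of any particular kernel or multi-path tool - measure your own setup.

```python
def worst_case_path_failover_s(cmd_timeout_s, retries, extra_delay_s=0.0):
    """Rough upper bound on how long I/O stalls before a dead path is dropped.

    Simplified model: the first attempt plus every retry each wait out the
    full command timeout, then any extra delay (for example a path checker
    interval) is added on top.
    """
    if cmd_timeout_s < 0 or retries < 0 or extra_delay_s < 0:
        raise ValueError("inputs must be non-negative")
    return cmd_timeout_s * (retries + 1) + extra_delay_s


# Illustrative numbers only (assumptions, not measured defaults).
slow = worst_case_path_failover_s(cmd_timeout_s=30, retries=5)
fast = worst_case_path_failover_s(cmd_timeout_s=5, retries=1)
print(f"large timeouts: {slow:.0f}s, tightened: {fast:.0f}s")
```

under this model a 30 second timeout with 5 retries already stalls I/O for about three minutes, which is the kind of "several minutes" surprise that only shows up when you actually pull a cable.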

you set up heartbeat and think everything is ok - but then you find that it doesn't really notice failure in access to the storage system, and when there's a connectivity problem just to your SAN system from the active node - it doesn't fail over to the passive node.
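one way to close that gap is to have the active node probe the storage itself and treat a failed or hung probe as a reason to fail over. the Python sketch below is only an illustration under assumptions: `storage_is_healthy` and its parameters are hypothetical names, and how you wire the result into your cluster manager depends entirely on the tool you use.

```python
import os
import tempfile
import threading


def storage_is_healthy(mount_point, timeout_s=5.0):
    """Return True if a small write+fsync under mount_point finishes in time."""
    result = {"ok": False}

    def probe():
        try:
            # Create, write and fsync a scratch file so the I/O really
            # reaches the storage instead of stopping at the page cache.
            fd, path = tempfile.mkstemp(prefix=".ha-probe-", dir=mount_point)
            try:
                os.write(fd, b"probe")
                os.fsync(fd)
            finally:
                os.close(fd)
                os.unlink(path)
            result["ok"] = True
        except OSError:
            result["ok"] = False

    # Daemon thread: a probe stuck in hung I/O must not block the caller
    # or keep the process alive.
    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout_s)
    return (not worker.is_alive()) and result["ok"]
```

the timeout matters as much as the write: when the SAN link is down, I/O often hangs rather than failing right away, so a probe that just waits for an error may never report one.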

only with rigorous testing will you find these issues - and usually not the first time you test (because this testing is tedious, and because some problems are not easy to simulate - e.g. try to simulate a hard-disk failure - plus, sometimes there are races - and a given test type will fail only once every few attempts...)
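for the race case, it helps to decide up front how many times to repeat a test. the Python sketch below assumes each run fails independently with the same probability, which is an idealization; `runs_needed` and `repeat_test` are hypothetical helper names, and `test_fn` stands for whatever failover test you already have.

```python
import math


def runs_needed(fail_probability, confidence=0.99):
    """Runs required to see at least one failure with the given confidence.

    Assumes every run fails independently with probability fail_probability,
    so the chance of n clean runs in a row is (1 - fail_probability) ** n.
    """
    if not (0 < fail_probability < 1) or not (0 < confidence < 1):
        raise ValueError("both arguments must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - fail_probability))


def repeat_test(test_fn, attempts):
    """Run test_fn repeatedly; return the 1-based attempt numbers that failed.

    A run counts as failed if test_fn returns a falsy value or raises.
    """
    failures = []
    for attempt in range(1, attempts + 1):
        try:
            passed = bool(test_fn())
        except Exception:
            passed = False
        if not passed:
            failures.append(attempt)
    return failures
```

for example, if a test only fails about once in five attempts, `runs_needed(0.2)` asks for 21 runs before a clean streak tells you much, so a single passing run proves very little.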


--guy

_______________________________________________
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
