[IPsec] HA/LS terminology

Rodney Van Meter Tue, 23 Mar 2010 10:59:52 -0700

I am *NOT* an expert on fault tolerance, but I have studied it a
little (long ago, if not so far away), and I worked on Network
Alchemy's fault tolerant implementation of an IPsec gateway (a decade
ago, and a little farther away).  So, some suggestions on the
terminology for the HA&LS draft.


Terminology:

"High Availability" refers to a gateway or cluster whose expected
downtime is low when measured over e.g. a year.  High availability may
be achieved using fault tolerant techniques such as hardware/software
clustering.  It may also be achieved by e.g. using extremely robust
components and keeping reboot time very low.
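
To make "expected downtime on an annual basis" concrete, here is a
minimal sketch of the usual arithmetic.  The function names and the
MTBF/MTTR formulation are my own illustration, not taken from the
draft; availability is taken to be the long-run fraction of time the
gateway or cluster is in service.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures
    and mean time to repair (both in hours)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be > 0 and mttr_hours >= 0")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR
```

Under this formulation both routes mentioned above show up directly:
robust components raise the MTBF, while a very low reboot time lowers
the MTTR.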

"Fault Tolerant" refers to a gateway or cluster that will maintain
service availability even when a specified set of fault conditions
occurs, such as the loss of one or more cluster members to a hardware
or software fault.

Clusters whose purpose is improving availability may operate using a
"hot standby" model, in which one or more gateways are active and one
or more gateways are held in reserve, to be activated when the failure
of an active member is detected.  Clusters whose purpose is improving
scalability (of performance, number of active connections, etc.) use
a "load sharing" model, in which more than one member is active.
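
The difference between the two models can be sketched as a toy
membership tracker.  Everything here (the Cluster class, the member
names, the rule for picking which standby to promote) is hypothetical
and only meant to illustrate the definitions; it says nothing about how
state is synchronized between members.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model: a cluster is a set of active and standby members."""
    active: set = field(default_factory=set)
    standby: set = field(default_factory=set)

    def fail(self, member: str) -> None:
        """Handle the detected failure of one member."""
        if member in self.standby:
            # A failed standby just leaves the reserve pool.
            self.standby.remove(member)
            return
        if member not in self.active:
            raise KeyError(member)
        self.active.remove(member)
        if self.standby:
            # Hot standby: activate a reserve member in its place.
            # Picking the smallest name is an arbitrary, deterministic choice.
            promoted = min(self.standby)
            self.standby.remove(promoted)
            self.active.add(promoted)

    def in_service(self) -> bool:
        """The cluster offers service while any member is active."""
        return bool(self.active)
```

A hot standby cluster would start as Cluster(active={"gw1"},
standby={"gw2"}); a pure load sharing cluster as Cluster(active={"gw1",
"gw2"}) with no reserve, so a failure simply leaves the surviving
members to carry the load.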

IPsec gateways must be prepared for their peers to lose state, e.g. as
a result of a reboot, resulting in each peer attempting to reconnect.
The latency of that reconnection, and the computational load of
reconnecting a large number of peers, mean that a fast-rebooting
gateway alone is not sufficient to provide high-availability service,
driving the need for fault tolerance in an IPsec gateway
implementation.

If a fault is never visible to peers, the cluster is said to be
"completely transparent".  If some peers must reconnect, or a change
of IP address is visible, the cluster is said to be "partially
transparent".  It is possible to create an implementation with lazy
synchronization or an otherwise incompletely redundant state,
resulting in e.g. a few percent of peers (or a few percent probability
of any given peer) being aware of the fault.
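
The two transparency terms can be written down as a small classifier.
This is only a sketch of the definitions above: the function name and
parameters are hypothetical, and the case in which every peer must
reconnect is not named in the text, so the sketch lumps it in with
"partially transparent" as an assumption.

```python
def classify_transparency(total_peers: int,
                          peers_reconnected: int,
                          address_change_visible: bool = False):
    """Return (label, fraction of peers that had to reconnect)
    for a single fault event."""
    if total_peers <= 0:
        raise ValueError("total_peers must be positive")
    if not 0 <= peers_reconnected <= total_peers:
        raise ValueError("peers_reconnected must be in 0..total_peers")
    fraction = peers_reconnected / total_peers
    if peers_reconnected == 0 and not address_change_visible:
        # No peer noticed anything.
        return "completely transparent", fraction
    # Assumption: any visible effect, up to and including every peer
    # reconnecting, is reported as partial.
    return "partially transparent", fraction
```

With lazy synchronization, a fault that forces 30 of 1000 peers to
reconnect would be classified as partially transparent, with 3% of
peers aware of the fault.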

                --Rod

_______________________________________________
IPsec mailing list
IPsec@ietf.org
https://www.ietf.org/mailman/listinfo/ipsec
