I am a bit sceptical about the draft, as it appears to introduce a new
exchange to solve something that doesn't have to be such a huge problem.
First, the ESP sequence number sync. On failover, the node that remains
online should simply increase the sequence number by a large enough amount;
the anti-replay window can always move forward. Implementations should
also sync sequence numbers periodically within the cluster (e.g. every 2-5
seconds) to keep the values on different nodes close to each other. I see
no reason to sync this information between peers. It will never work
perfectly anyway (there is too much traffic). The more frequently you sync
the sequence numbers within the cluster, the smaller the increment needs to
be (calculate it from the expected packets per second).
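A minimal sketch of that increment calculation follows. The helper names, the safety factor of 2 and the assumption that the failed node sent at most expected_pps packets per second since the last cluster sync are all illustrative, not taken from any draft or implementation. The overflow check assumes anti-replay is enabled, in which case RFC 4303 forbids the sender's counter from cycling.

```python
import math


def failover_seq_increment(expected_pps: float,
                           sync_interval_s: float,
                           safety_factor: float = 2.0) -> int:
    """Hypothetical helper: upper bound on the packets the failed node may
    have sent since the last cluster sync, padded by a safety factor."""
    return math.ceil(expected_pps * sync_interval_s * safety_factor)


def resume_sequence_number(last_synced_seq: int,
                           expected_pps: float,
                           sync_interval_s: float,
                           esn: bool = False) -> int:
    """Hypothetical helper: sequence number the surviving node continues from."""
    # 32-bit counter normally, 64-bit with Extended Sequence Numbers.
    limit = 2**64 - 1 if esn else 2**32 - 1
    new_seq = last_synced_seq + failover_seq_increment(expected_pps, sync_interval_s)
    if new_seq > limit:
        # The counter must not cycle; the SA has to be rekeyed instead.
        raise OverflowError("sequence number space exhausted; rekey the SA")
    return new_seq
```

For example, at 100,000 packets per second with a 2-second sync interval the jump is 400,000 sequence numbers; halving the sync interval halves the jump.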
Second, the message ID problem in failover is a real problem, but isn't the
problem really the size of the message window? If everyone used
SET_WINDOW_SIZE with a large enough value (like 32), on failover we could
do something like next_message_id += window_size / 2, and be happy.
However, the implementation must ensure it never gets more than that
increment ahead of the last value synced to the cluster (which is why a
window size of 1 doesn't work to begin with). Why was the default window
size defined as 1 anyway? Is there a reason why this wouldn't work?
(SET_WINDOW_SIZE specifically allows us to move the window.)
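The proposed Message ID arithmetic could be sketched as below. IkeMessageIdState and the helper names are hypothetical; the sketch only encodes the proposal itself (the standby jumps by window_size / 2, and the active node never gets that far ahead of the last synced value). It does not model the peer's window handling from RFC 7296, which is exactly the open question above.

```python
from dataclasses import dataclass


@dataclass
class IkeMessageIdState:
    """Hypothetical per-IKE-SA state replicated to the cluster."""
    synced_message_id: int  # next request Message ID as of the last sync
    window_size: int        # value negotiated with SET_WINDOW_SIZE (default 1)

    @property
    def increment(self) -> int:
        # Distance the standby jumps on failover: window_size / 2.
        return self.window_size // 2


def may_send_request(state: IkeMessageIdState, message_id: int) -> bool:
    """Active node: only use IDs strictly below synced_message_id + increment,
    so the standby's post-failover ID can never collide with one already sent."""
    return message_id - state.synced_message_id < state.increment


def message_id_after_failover(state: IkeMessageIdState) -> int:
    """Standby node: first request Message ID to use after taking over."""
    if state.increment == 0:
        # With the default window size of 1 there is no room to jump.
        raise ValueError("window size too small for a Message ID jump")
    return state.synced_message_id + state.increment
```

With a window of 32 and a last synced value of 10, the active node may use IDs 10 through 25 before it must sync again, and the standby resumes at 26.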
An exchange that is in progress at the time of failover can be an issue
(rarely), but most such cases can be eliminated with careful
implementation. For example:
- Delete IKE SA or IPsec SA
-> Sync the delete to the cluster before sending the packet to the
network. Nodes don't actually have to delete the SA, just mark it to be
deleted. This applies both to sending and to receiving a delete.
- Rekey
-> I don't see this as a problem. A new CHILD_SA or crash recovery solves
the problem either immediately or relatively quickly.
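The delete ordering could be sketched like this: the "to be deleted" mark is replicated to every cluster node before anything goes on the wire, both when sending a delete and when answering one. Node, replicate_delete_mark and the transmit callback are hypothetical stand-ins for a real cluster sync protocol and IKE stack; a real implementation would also wait for acknowledgements from the other nodes before transmitting.

```python
from typing import Callable, List, Set, Tuple

Message = Tuple[str, int]


class Node:
    """Hypothetical cluster member holding a replicated SA table."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.pending_delete: Set[int] = set()  # SPIs marked, not yet removed


def replicate_delete_mark(spi: int, nodes: List[Node]) -> None:
    # Stand-in for the cluster sync: every node records the mark.
    for node in nodes:
        node.pending_delete.add(spi)


def send_delete(spi: int, nodes: List[Node],
                transmit: Callable[[Message], None]) -> None:
    replicate_delete_mark(spi, nodes)  # 1. sync the mark to the cluster
    transmit(("DELETE", spi))          # 2. only then send to the network


def on_delete_received(spi: int, nodes: List[Node],
                       transmit: Callable[[Message], None]) -> None:
    replicate_delete_mark(spi, nodes)  # same ordering for a received delete
    transmit(("DELETE_RESPONSE", spi))
```

If the active node fails after the sync but before transmitting, the node taking over already knows the SA is going away and can finish the delete itself.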
It's impossible to make this work perfectly (machines can crash at any
point), but the important thing is that your implementation can recover
(support crash recovery, do DPD when oddities occur).
I don't see much need for a new exchange, though a draft that explains the
best ideas for implementing clustering and HA would be nice.
Pekka
_______________________________________________
IPsec mailing list
IPsec@ietf.org
https://www.ietf.org/mailman/listinfo/ipsec