Hi Pekka,
Thanks for the comments on the draft. Please find my comments inline.

Regards,
Kalyani

-----Original Message-----
From: ipsec-boun...@ietf.org [mailto:ipsec-boun...@ietf.org] On Behalf Of Pekka Riikonen
Sent: Friday, September 03, 2010 2:48 PM
To: y...@checkpoint.com
Cc: ipsec@ietf.org; kivi...@iki.fi
Subject: Re: [IPsec] Comments draft-kagarigi-ipsecme-ikev2-windowsync-04

I am a bit sceptical about the draft, as it appears to solve something that doesn't have to be such a huge problem by introducing a new exchange.

First, the ESP sequence number sync. In case of failover, the online node should simply increment the sequence number by a large enough number; the anti-replay window can always move forward. Implementations should also perform periodic sequence number sync in the cluster (e.g. every 2-5 seconds) to keep the numbers close enough across nodes. I see no reason to sync this information between peers. It will never work perfectly anyway (there is too much traffic). The more frequently you sync the sequence numbers in the cluster, the smaller the increment needs to be (calculate the expected pps).

Second, the message ID problem in failover is a real problem, but isn't the problem really the size of the message window? If everyone did SET_WINDOW_SIZE with a large enough number (like 32), in failover we could do something like next_message_id += window_size / 2 and be happy. Though, the implementation must ensure it never sends more than the increment (that's why a window size of 1 doesn't work to begin with). Why was the window size defined to be 1 by default anyway? Is there a reason why this wouldn't work? (SET_WINDOW_SIZE specifically allows us to move the window.)

[KALYANI] We can always use a larger window size and send the new request from the failover device with an incremented message ID.
However, the problem with this approach is that, with windowing, the window never moves unless all the messages within the window range are received. Hence the lost request will never be sent, and eventually the SA will have to be deleted, which can be avoided if this draft is implemented.

Any ongoing exchange at the time of the failover can be an issue (rare), but most can be eliminated with careful implementation. For example, the following:

- Delete IKE SA or IPsec SA -> Sync the delete to the cluster before sending the packet to the network. Nodes don't actually have to delete the SA, just mark it to be deleted. This applies to both sending a delete and receiving a delete.

[KALYANI] If the message to delete the IKE SA is lost, the active and failover devices would be out of sync.

- Rekey -> I don't see this as a problem. New CHILD_SA or crash recovery solves the problem either immediately or relatively quickly.

It's impossible to make this work perfectly (machines can crash at any point), but the important thing is that your implementation can recover (support crash recovery, do DPD when oddities occur).

[KALYANI] This draft proposes the synchronization of message IDs using the IKE SA that is present on the failover and peer devices. If the active member crashes during an IKE SA delete/rekey, the SAs at the peer and the failover device do not match (that is, the old SA is present on the failover device and the new SA is present on the peer). IKE message ID synchronization is not meant to solve such issues.

I don't much see the need for a new exchange, though a draft that explains the best ideas for implementing clustering and HA would be nice.

Pekka
_______________________________________________
IPsec mailing list
IPsec@ietf.org
https://www.ietf.org/mailman/listinfo/ipsec
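
For readers following the thread, the failover arithmetic discussed above (jump the ESP sequence number by the traffic expected since the last cluster sync, and jump the IKE message ID by half the window) might be sketched roughly as follows. This is only an illustration: the function names, the `safety_factor` parameter, and its default of 2.0 are assumptions introduced here, not anything taken from the draft or from either message, and the sketch ignores sequence number wrap-around and extended sequence numbers.

```python
import math


def esp_seq_after_failover(last_synced_seq, expected_pps, sync_interval_s,
                           safety_factor=2.0):
    """Outbound ESP sequence number a failover node could resume from.

    The jump covers the packets the failed node may have sent since the
    last cluster sync. safety_factor is a hypothetical margin on top.
    """
    if expected_pps < 0 or sync_interval_s <= 0 or safety_factor < 1:
        raise ValueError("invalid traffic estimate")
    jump = math.ceil(expected_pps * sync_interval_s * safety_factor)
    return last_synced_seq + jump


def ike_message_id_after_failover(last_synced_message_id, window_size):
    """Next IKE request message ID, using next_message_id += window_size / 2."""
    if window_size < 2:
        # With a window of 1 there is no room to skip ahead.
        raise ValueError("window size of 1 leaves no room to jump")
    return last_synced_message_id + window_size // 2


if __name__ == "__main__":
    # 50,000 pps expected, cluster sync every 5 seconds.
    print(esp_seq_after_failover(1_000_000, 50_000, 5))
    # Window of 32 negotiated via SET_WINDOW_SIZE.
    print(ike_message_id_after_failover(10, 32))
```

A shorter sync interval shrinks the ESP jump proportionally, which is the point made in the quoted message. The sketch does not address the objection raised in the reply, namely that a request lost inside the window can keep the window from moving.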