Re: [IPsec] Comments on draft-pwouters-multi-sa-performance

Paul Wouters Mon, 01 Nov 2021 20:26:59 -0700

On Fri, 29 Oct 2021, Panwei (William) wrote:

Hi William,

Subject: [IPsec] Comments on draft-pwouters-multi-sa-performance

I’ve read the recent version. This is an interesting solution. I think it 
should be adopted. Below are some comments.


Thanks for reading the draft and giving us feedback!

1.

The CPU_QUEUES notification value refers to the number of additional
resource-specific Child SAs that may be installed for this particular
TSi/TSr combination excluding the Fallback Child SA.

Is it necessary to limit the amount of additional SAs at the beginning while 
TS_MAX_QUEUE can be used to reject the request of
creating additional SA at any time? In the virtualization scenario, the new VMs 
can be launched on-demand, in other words, it may be
seen as the number of CPUs isn’t fixed, so maybe limiting the additional SAs at 
the beginning will damage the flexibility.


The limit is really a very high maximum number and not a very low number
exactly matching CPUs. We had that at first, to try to optimize it, but
there were too many race conditions, e.g. with rekeying. So if you have
a peer with 4 CPUs and a peer with 2 CPUs, you might just want to set
the max to 8 or even 12. It is mostly meant to avoid doing
CREATE_CHILD_SAs that are just doomed to fail anyway. So see it
more as a resource cap than a strict physical limitation.

2.

The CPU_QUEUES notification payload is sent in the IKE_AUTH or
CREATE_CHILD_SA Exchange indicating the negotiated Child SA is a
Fallback SA.

Before additional SAs are created, is there any difference between using this 
first/Fallback SA and using other normal SA? I think
there is no difference. So maybe we don’t need to add this notification payload 
when creating the first SA.


The problem right now is that implementations will often discard an
older identical Child SA in favor of the newest one. So one of the key
intentions of the document is for the initiator/responder to clearly
negotiate from the start they are going to be using Child SA's with
identical Traffic Selectors and they want the older ones to stick
around. Also, it is important that there is always 1 fallback Child
SA that can be used on any CPU resource. So we really wanted to mark
that one very clearly. For instance, if it becomes idle, it should
NOT be deleted.
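
To illustrate the "do not delete the idle fallback" point, here is a minimal sketch of an idle-cleanup pass. It is not taken from the draft or from any implementation: the `ChildSA` record, its `is_fallback` flag and the `idle_limit` parameter are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ChildSA:
    spi: int
    is_fallback: bool    # hypothetical flag: True for the SA marked via CPU_QUEUES
    idle_seconds: float  # hypothetical idle timer kept by the implementation


def sas_to_delete(sas, idle_limit):
    """Return idle per-resource SAs; the fallback SA is never selected."""
    return [
        sa for sa in sas
        if sa.idle_seconds > idle_limit and not sa.is_fallback
    ]


sas = [
    ChildSA(spi=0x1001, is_fallback=True, idle_seconds=900.0),
    ChildSA(spi=0x1002, is_fallback=False, idle_seconds=900.0),
    ChildSA(spi=0x1003, is_fallback=False, idle_seconds=5.0),
]
stale = sas_to_delete(sas, idle_limit=300.0)
```

With this input only the SA with SPI 0x1002 is selected for deletion; the fallback SA stays even though it has been idle just as long.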

When the initiator wants
to create an additional SA, it can directly send the request with 
CPU_QUEUE_INFO notification payload.


It would be good to know from the responder if they support this and if
they are willing to do this before doing the CREATE_CHILD_SA. And as I
said above, to ensure both parties agree on which Child SA is the
"always be present" fallback SA to ensure things like adding a new CPU
always results in encrypted packets via the fallback SA.
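
A rough sketch of why an agreed fallback SA matters for a newly added CPU: outbound SA selection can look up a per-CPU SA and fall back when none exists yet. The function name, the dictionary keyed by CPU id and the string placeholders for SAs are assumptions made for the example, not anything the draft defines.

```python
def pick_outbound_sa(cpu_id, per_cpu_sas, fallback_sa):
    """Use the SA installed for this CPU, else the always-present fallback SA."""
    return per_cpu_sas.get(cpu_id, fallback_sa)


# Hypothetical state: per-CPU SAs exist for CPUs 0 and 1 only.
per_cpu_sas = {0: "sa-cpu0", 1: "sa-cpu1"}
fallback_sa = "sa-fallback"

on_known_cpu = pick_outbound_sa(1, per_cpu_sas, fallback_sa)
on_new_cpu = pick_outbound_sa(5, per_cpu_sas, fallback_sa)
```

Traffic on CPU 5, which has no SA of its own yet, is still encrypted because it goes out via the fallback SA.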

There are 3 ways that the
responder may reply: 1) The responder doesn’t support/recognize this 
notification, it will ignore this notification and reply as
usual.


But there is no "as usual" for what happens to the older Child SA. Some
implementations will allow it, some will only allow it if it has its own
IKE SA, and some will just delete the old one. This is the ambiguity we
are trying to address with the draft.

2) It supports this function and is willing to create the additional SA, so it 
will reply with CPU_QUEUE_INFO notification
too. 3) It supports this function, but it isn’t willing to create more 
additional SAs, so it will reply with TS_MAX_QUEUE.
Therefore, it seems that the CPU_QUEUE_INFO and TS_MAX_QUEUE 
notifications are enough, and the draft can be
simplified to use only these 2 notifications.


I hope I explained why we think some clear signal has its use. If you
take your assumptions to the max, one would need no document at all,
as the IKEv2 specification states there can be Child SAs that are
duplicates or with overlapping IP ranges, so in theory, nothing is
needed.

3.

Both peers send
the preferred minimum number of additional Child SAs to install.

First, I think sending the number of additional Child SAs is unnecessary. 
Second, when using “minimum” here my first impression is
that it means 0, so in order to remove ambiguity I suggest just saying “the 
preferred number” (if you think sending the number is
necessary).


The use of minimum is indicating what the peer needs. A peer with 4 CPUs
does not prefer 4, it really prefers as many as the highest number of
CPUs of the two peers - within reason. The preference is really a
combination of what works best for the combination of the two peers.

Note the minimum is not about the minimum number required for
functioning, but the minimum number to get optimum performance.

By indicating the minimum, both sides can pick the highest minimum and
then allow a few more (for race conditions during rekeying).
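
As a small worked example of "pick the highest minimum and then allow a few more", the calculation could look like the sketch below. The function name and the `slack` parameter are hypothetical; the draft does not prescribe how many extra SAs to allow.

```python
def additional_sa_cap(initiator_min, responder_min, slack=2):
    """Highest of the two advertised minimums, plus headroom for rekey races.

    `slack` is an assumed, locally chosen value, not something from the draft.
    """
    return max(initiator_min, responder_min) + slack


# A peer with 4 CPUs talking to a peer with 2 CPUs.
cap_default = additional_sa_cap(4, 2)
cap_generous = additional_sa_cap(4, 2, slack=4)
```

Here `cap_default` is 6 and `cap_generous` is 8, both above the larger CPU count, so a rekey that briefly doubles up an SA does not hit the limit.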

4.

If a CREATE_CHILD_SA exchange request containing both a
CPU_QUEUE_INFO and a CPU_QUEUES notification is received, the
responder MUST ignore the CPU_QUEUE_INFO payload. If a
CREATE_CHILD_SA exchange reply is received with both CPU_QUEUE_INFO
and CPU_QUEUES notifications, the initiator MUST ignore the
notification that it did not send in the request.

I think there is ambiguity here. When the initiator sends the CREATE_CHILD_SA 
exchange request containing both a CPU_QUEUE_INFO and
a CPU_QUEUES notification, and the responder also adds CPU_QUEUE_INFO and 
CPU_QUEUES notifications in the reply, the initiator
doesn’t know how to proceed in this situation. Should the initiator ignore 
the CPU_QUEUE_INFO payload or notify an error to the
responder?


We went back and forth on this a couple of times with the authors. We
really wanted to keep it as simple as possible but also not be too
pedantic. From a protocol point of view, we could say to just return
an error like SYNTAX_ERROR, but that would cause the IKE SA and all
its working Child SAs to also be torn down, and we wanted to avoid
that so bugs in the performance implementation do not result in
complete tunnel failures. Hence our phrasing of "just ignore X"
on both the initiator and responder.
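
The two "ignore" rules quoted above can be sketched as set operations. This is only one reading of the quoted draft text, with notifications modelled as plain strings and with hypothetical function names; it is not how any particular implementation does it.

```python
CPU_QUEUES = "CPU_QUEUES"
CPU_QUEUE_INFO = "CPU_QUEUE_INFO"
PERF_NOTIFIES = {CPU_QUEUES, CPU_QUEUE_INFO}


def responder_accepts(request_notifies):
    """Notifications the responder acts on in a CREATE_CHILD_SA request."""
    present = set(request_notifies) & PERF_NOTIFIES
    if present == PERF_NOTIFIES:
        # Both present: the responder ignores CPU_QUEUE_INFO.
        present.discard(CPU_QUEUE_INFO)
    return present


def initiator_accepts(sent_notifies, reply_notifies):
    """Notifications the initiator acts on in a CREATE_CHILD_SA reply."""
    reply = set(reply_notifies) & PERF_NOTIFIES
    if reply == PERF_NOTIFIES:
        # Both present: ignore whichever one was not sent in the request.
        reply &= set(sent_notifies)
    return reply


both = [CPU_QUEUES, CPU_QUEUE_INFO]
resp_view = responder_accepts(both)
init_view = initiator_accepts([CPU_QUEUE_INFO], both)
double_fault = initiator_accepts(both, both)
```

Neither function raises an error, so the IKE SA and its working Child SAs are left alone. In the case raised in the question, where the initiator itself sent both notifications and gets both back, this sketch filters nothing and `double_fault` still contains both.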

We agree that a broken initiator with a broken responder leads to
something broken. I think specifying how a broken initiator should
respond to a broken responder is taking the Postel Principle a step
too far? :)

Paul

_______________________________________________
IPsec mailing list
IPsec@ietf.org
https://www.ietf.org/mailman/listinfo/ipsec
