Re: [IPsec] Discussion of draft-pwouters-ipsecme-multi-sa-performance

Tero Kivinen Wed, 26 Oct 2022 15:01:15 -0700

[Replying to this email, but commenting about the others also]

Paul Wouters writes:
> On Oct 21, 2022, at 03:37, Steffen Klassert <steffen.klass...@secunet.com> 
> wrote:
> > Another possibility would be to use the same keymat on all
> > percpu SAs
> 
> You cannot do that. You need to ensure unique IVs for AEAD so you
> would need to subdivide the IV space. You would also still reach max
> operations on these SAs on different times AND things like FIPS puts
> an operational max count on the key usage which you can’t do if the
> key is used by multiple different states.
> 
> Using different real child SA’s was needed to ensure the
> cryptographic security properties.


This is really important. The keymat cannot be the same across
CPUs. We could in theory create a new key hierarchy that generates
keys for a sub Child SA for each CPU, but I think that will just
complicate things more, and having real Child SAs for each CPU is the
correct solution.
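
To illustrate the IV problem with shared keymat, here is a minimal
sketch in Python. It assumes (hypothetically) that each CPU builds its
8-byte IV from its own private packet counter, with no coordination
between CPUs. The function name and the packet counts are made up for
the example.

```python
from itertools import count

def ivs_for_cpu(num_packets):
    """IVs produced by one CPU that keeps its own private packet counter."""
    counter = count(1)  # assumption: every CPU starts counting from 1
    return [next(counter).to_bytes(8, "big") for _ in range(num_packets)]

# Hypothetical setup: two CPUs share one key and each sends 3 packets.
cpu0_ivs = ivs_for_cpu(3)
cpu1_ivs = ivs_for_cpu(3)

# Any IV that appears on both CPUs is a (key, IV) pair used twice.
reused = set(cpu0_ivs) & set(cpu1_ivs)
print(len(reused))  # 3: every IV collides
```

With separate Child SAs each CPU has its own key, so identical
counter values on two CPUs no longer mean a reused (key, IV) pair.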

In your discussion you were talking about cases where one device has
hundreds of CPUs and the other has few. The only case where such a
configuration would be useful is when one end has lots of really
low-powered CPUs and the other has a few very fast ones. My
understanding is that this is not really happening. Usually the end
that has more CPUs has CPUs of about the same speed as the end that
has fewer.

There is no point in one end having, for example, 10 fast CPUs
sending traffic over 10 Child SAs when the receiving end only has two
CPUs of about the same speed as the sender's. The receiving end will
not be able to keep up with the incoming traffic, so it will drop
packets because it cannot decrypt them fast enough.

So I think we should concentrate on the cases where the number of
CPUs at each end is in the same ballpark. We can have one host with 2
CPUs that are twice as fast as those of the other host with 4 CPUs,
so creating 4 Child SAs is OK in that case, but I do not think there
are ever cases where we generate more than 2-4 SAs per CPU, i.e., if
one end has 2 CPUs then the practical limit is 8 Child SAs. Any more
than that will not help. Also, a host with hundreds of CPUs will most
likely talk to hundreds of other hosts too, so using 10 CPUs to talk
to one host, 10 to talk to another host, etc. is also a way of
splitting up the work. I.e., that "gateway" would most likely
advertise fewer CPUs than it actually has. The other host with two
CPUs will most likely be the "client" end and only talk to that one
"gateway" (or someone spent way too much money on a device that does
not need to be that big)...
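
The arithmetic behind that limit can be sketched as follows. This is
only an illustration of the rule of thumb above, not anything from the
draft; the function name and the default factor of 4 (the upper end of
the 2-4 range) are assumptions.

```python
def practical_child_sa_limit(local_cpus, peer_cpus, sas_per_cpu=4):
    """Upper bound on useful Child SAs, driven by the smaller end."""
    smaller_end = min(local_cpus, peer_cpus)
    return smaller_end * sas_per_cpu

# A 2-CPU client talking to a gateway with hundreds of CPUs.
print(practical_child_sa_limit(2, 200))  # 8
```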

And I do agree with Valery that there is no point in trying to guess
what kind of broken implementations are out there. We should assume
that implementations follow RFC 7296, and if there are so many broken
implementations that we need to take them into account, then we might
want to update RFC 7296....

Talking about locking and such things is a bit distracting, as you
can do lots of things without locking depending on the data
structures, who writes them, and so on. This goes to such a low level
that I am not sure it is that beneficial to talk about it here. For
example, there are ways of updating the per-CPU SAD without locks,
provided there is only one entity that can update it...
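
One such way is copy-and-swap with a single writer. The sketch below
is only an illustration in Python, with a hypothetical class and
hypothetical method names. It relies on a reference assignment being
published in one step, as in CPython; a kernel implementation would
use its own mechanism (RCU or similar) for the same idea.

```python
class PerCpuSad:
    """Per-CPU SA table with one writer and lock-free readers (sketch)."""

    def __init__(self):
        self._table = {}  # never modified in place once published

    def lookup(self, spi):
        # Readers use whatever table is currently published; no lock.
        return self._table.get(spi)

    def install(self, spi, sa):
        # Assumption: only one entity (e.g. the IKE side) calls this.
        new_table = dict(self._table)  # copy the current table
        new_table[spi] = sa            # change the private copy
        self._table = new_table        # publish with one reference swap

sad = PerCpuSad()
sad.install(0x1001, "child-sa-for-cpu0")
print(sad.lookup(0x1001))  # child-sa-for-cpu0
```

A reader that picked up the old table before the swap keeps a
consistent view of it, which is why no lock is needed on the packet
path.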

We should make sure that all the steady-state processing can be done
efficiently, i.e., without locking etc., but IKE SA creation etc.
happens only every few hours, and trying to optimize its locking
behavior is not that useful in the big picture.

Also, I think it is just better to create all Child SAs at the
beginning, i.e., there is no point in doing that much per-CPU
acquiring etc. I mean, you have some way of distributing outgoing
packets to CPUs before that. If that is round robin, then you will
create all per-CPU SAs very quickly. If it is something else (like
"this TCP stream is locked to this CPU"), then you mostly keep using
only that one CPU (in which case per-CPU acquire will be useful). But
all of this depends so much on the implementation, which we are not
talking about here, that I think it should be left to implementations
to decide.
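
The difference between the two distribution policies can be sketched
like this. It is a toy model with hypothetical names, assuming a
per-CPU acquire is triggered the first time a CPU has a packet to
send.

```python
def cpus_needing_sa(packet_flows, num_cpus, policy):
    """Return the set of CPUs that had at least one packet to send."""
    used = set()
    for index, flow_id in enumerate(packet_flows):
        if policy == "round_robin":
            cpu = index % num_cpus    # spread packets over all CPUs
        else:
            cpu = flow_id % num_cpus  # "per_flow": flow pinned to a CPU
        used.add(cpu)
    return used

packets = [7] * 8  # eight packets of one (hypothetical) flow, id 7
print(len(cpus_needing_sa(packets, 8, "round_robin")))  # 8
print(len(cpus_needing_sa(packets, 8, "per_flow")))     # 1
```

With round robin all 8 per-CPU SAs are needed after 8 packets, so
creating them up front costs nothing extra. With flow pinning only one
CPU ever asks for an SA.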

If we use per-CPU acquiring, then the other end might need to create
Child SAs too. Suppose the end initiating the connection only sent
one packet and created one SA, while the other end would like to have
8 SAs for its 8 CPUs. Only one was created, so would it now create
the 7 missing ones, or wait for the other end to create them, etc.?
-- 
kivi...@iki.fi
