[Replying to this email, but commenting about the others also]

Paul Wouters writes:
> On Oct 21, 2022, at 03:37, Steffen Klassert <steffen.klass...@secunet.com> wrote:
> > Another possibility would be to use the same keymat on all
> > percpu SAs
>
> You cannot do that. You need to ensure unique IVs for AEAD so you
> would need to subdivide the IV space. You would also still reach max
> operations on these SAs on different times AND things like FIPS puts
> an operational max count on the key usage which you can’t do if the
> key is used by multiple different states.
>
> Using different real child SA’s was needed to ensure the
> cryptographic security properties.

This is something that is really important. The keymat on the different CPUs can't be the same. We could in theory create a new key hierarchy that generates keys for sub-Child SAs for each CPU, but I think that would just complicate things more, and having real Child SAs for each CPU is the correct solution.

In your discussion you were talking about cases where one device has hundreds of CPUs and the other has a few. The only case where such a configuration would be useful is when one end has lots of really low-powered CPUs and the other has a few very fast ones. My understanding is that this does not really happen. Usually the end that has more CPUs has CPUs of about the same speed as the end having fewer CPUs. There is no point in one end having, for example, 10 fast CPUs sending traffic over 10 Child SAs when the receiving end only has two CPUs that are about as fast as the other end's CPUs. The receiving end will not be able to keep up with the incoming traffic, so it will drop packets because it can't decrypt them fast enough.

So I think we should concentrate on the cases where the number of CPUs at each end is in the same ballpark. We can have one host with 2 CPUs that are twice as fast as those of the other host having 4 CPUs, so creating 4 Child SAs is OK in that case, but I do not think there are ever cases where we would create more than 2-4 SAs per CPU, i.e., if one end has 2 CPUs then the practical limit is 8 Child SAs. Any more than that will not help.

Also, a host having hundreds of CPUs will most likely talk to hundreds of other hosts too, so using 10 CPUs to talk to one host, 10 to talk to another host, and so on is also a way of splitting up the work. I.e., that "gateway" would most likely advertise having fewer CPUs than it actually has. The other host having two CPUs will most likely be the "client" end and only talk to that one "gateway" (or someone spent way too much money on a device that does not need to be that big)...

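To make the sizing rule of thumb above concrete, here is a minimal sketch. It assumes the cap is "at most 4 Child SAs per CPU of the smaller end", which is the upper end of the 2-4 range mentioned above. The function name `child_sa_count` and the parameter `max_sas_per_cpu` are hypothetical and not taken from any draft or implementation.

```python
def child_sa_count(local_cpus: int, peer_cpus: int, max_sas_per_cpu: int = 4) -> int:
    """Suggest how many per-CPU Child SAs are worth creating.

    Assumption (not from any specification): more SAs than the larger
    end has CPUs are never useful, and more than `max_sas_per_cpu` SAs
    per CPU of the smaller end do not help either.
    """
    if local_cpus < 1 or peer_cpus < 1:
        raise ValueError("CPU counts must be positive")
    smaller = min(local_cpus, peer_cpus)
    larger = max(local_cpus, peer_cpus)
    return min(larger, smaller * max_sas_per_cpu)


# 2 fast CPUs talking to 4 slower ones: 4 Child SAs is fine.
print(child_sa_count(2, 4))    # 4
# 2 CPUs talking to a 100-CPU gateway: capped at 8.
print(child_sa_count(2, 100))  # 8
```

Under this assumption a large gateway gains nothing by advertising all of its CPUs to a small client, which matches the point about it advertising fewer CPUs than it has.
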
And I do agree with Valery that there is no point in trying to guess what kinds of broken implementations are out there. We should assume that implementations follow RFC 7296, and if there are so many broken implementations that we need to take them into account, then we might want to update RFC 7296....

Talking about locking and such things is a bit distracting, as you can do lots of things without locking, depending on the data structures, who writes them, and so on. This goes so low-level that I am not sure it is that beneficial to talk about it here. For example, there are ways of updating the per-CPU SAD without locks provided there is only one entity that can update it... We should make sure that all the stable state processing can be done efficiently, i.e., without locking etc., but IKE SA creation and the like happens only every few hours, and trying to optimize its locking behavior is not that useful in the big picture.

Also, I think it is just better to create all Child SAs at the beginning, i.e., there is no point in doing that much per-CPU acquiring. I mean, you have some way of distributing outgoing packets to CPUs before that, and if that is round robin then you will create all per-CPU SAs very quickly, and if that is something else (like this TCP stream is locked to this CPU), then you mostly keep using only that one CPU (in which case per-CPU acquire would be useful). But all of this depends so much on the implementation, which we are not talking about here, that I think it should be left to implementations to decide.

If we use per-CPU acquiring, then the other end might need to create Child SAs too, in case the one initiating the connection sent only one packet and created one SA, while the other end would like to have 8 SAs for its 8 CPUs but only one was created. Would it now create the 7 missing ones, or wait for the other end to create them, and so on?

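The round-robin versus pinned-flow point above can be illustrated with a toy model. This is only a sketch under simple assumptions: each outgoing packet is handled by exactly one CPU, and a CPU triggers a per-CPU acquire the first time it has a packet to send. The helper name `cpus_needing_sa` and the constants are hypothetical.

```python
from itertools import cycle, islice


def cpus_needing_sa(packet_cpus):
    """Return the CPUs in the order each would first trigger a per-CPU acquire."""
    seen = []
    for cpu in packet_cpus:
        if cpu not in seen:
            seen.append(cpu)
    return seen


NUM_CPUS = 8

# Round robin: packet i is handled by CPU i mod NUM_CPUS.
round_robin = list(islice(cycle(range(NUM_CPUS)), 20))
# Pinned flow: a single TCP stream stays on CPU 3.
pinned = [3] * 20

print(len(cpus_needing_sa(round_robin)))  # 8
print(len(cpus_needing_sa(pinned)))       # 1
```

With round robin every CPU has acquired its SA after the first 8 packets, so on-demand acquiring saves essentially nothing over creating them all up front. With a pinned flow only one SA is ever needed, which is the case where per-CPU acquire pays off.
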
-- 
kivi...@iki.fi
_______________________________________________
IPsec mailing list
IPsec@ietf.org
https://www.ietf.org/mailman/listinfo/ipsec