On Tue, 19 Mar 2024, Roman Danyliw wrote:
I performed an AD review of draft-ietf-ipsecme-multi-sa-performance-05. I have
a mostly editorial feedback below:
** Abstract. Editorial. s/crypto/cryptography/
Fixed.
** Section 1.
Most IPsec implementations are currently limited to using one queue
or CPU resource for a Child SA.
-- (Editorial) What kind of queue? Should it be “network queues”?
We left it undefined on purpose. It is hardware/implementation specific.
It could be a nic with 4 ASICs using 1 ASIC only or a computer with 4 CPUs
which each have a queue. I changed it to ...
-- Why couldn’t implementations be changed to use multiple CPUs?
And answering that:
Most IPsec implementations are currently limited to using one
hardware queue or a single CPU resource for a Child SA. Paralyzing
the packet encryption can be done, but there is a bottleneck of
different parts of the hardware locking or waiting to get their
sequence number assigned for the packet it is enrypting. The
result is that a machine with many such resources is limited to
only using one of these resources per Child SA. This severely
limits the throughput that can be attained. For example, at the
time of writing, an unencrypted link of 10Gbps or more is commonly
reduced to 2-5Gbps when IPsec is used to encrypt the link using
AES-GCM. By using the implementation specified in this document,
aggregate throughput increased from 5Gbps using 1 CPU to 40-60
Gbps using 25-30 CPUs.
** Section 1. Editorial.
An
unencrypted link of 10Gbps or more is commonly reduced to 2-5Gbps
when IPsec is used to encrypt the link using AES-GCM. By using the
implementation specified in this document, aggregate throughput
increased from 5Gbps using 1 CPU to 40-60 Gbps using 25-30 CPUs
-- Will this text age well?
See above, wrapped it in "example at the time of writing".
-- Can these statistics be cited?
It was done by the authors, and only presented at an earlier IETF.
So, not really?
** Section 1. Editorial.
While this could be (partially) mitigated by setting up multiple
narrowed Child SAs, for example using Populate From Packet (PFP) as
specified in IPsec Architecture [RFC4301], this IPsec feature would
cause too many Child SAs (one per netflow)
Is it “netflow” or “network flow”? “netflow” is a logging format.
Changed to network flow.
** Section 1. Editorial.
When an IKEv2 peer is receiving more additional Child SA's
It is “more additional Child SA’s” or “additional Child SA’s”?
Removed "more".
** Section 3.
If this initial Child SA
will be tied to a specific resource, it MAY indicate this by
including an identifier in the Notification Data.
How does one tie the Child SA to the specific resource if the identifier is NOT
included in the Notification data? Wouldn’t it be mandatory to include this
identifier?
Tieing the Child SA to a specific resource is a local policy decision
and not negotiated. The identifier is only there for debugging purposes.
** Section 4. Is this section normative? Why are RFC2119 key words used in an
example?
Why do you say this is an example? It is the Implementation
Considerations section telling you to do or do not some things?
** Section 4. Doesn’t the example in the paragraph starting with “A simple
distribution could be to install one additional Child SA on each CPU” violate
the situation described in Section 2 (i.e., coordination across CPUs)?
No. There is a still central control on negotiating and installing the
Child SAs, but once installed (on a CPU), it can run indepedently of
other CPUs.
** Section 5. The diagrams in Section 5.* show “Next Payload”, a fields flags
and a “Payload length”. Where are those defined?
This was fixed yesterday by adding this line to the start of Section 5:
The Notify Payload format is defined in IKEv2 [RFC7296] section 3.10,
and is copied here for convenience.
** Section 5.1. Editorial. The diagram says “optional resource identifier”.
The description of the fields says “Optional Payload Data”
Fixed.
** Section 5.1
The opaque data SHOULD be a
unique identifier within all the Child SAs with the same TS
payloads and the peer SHOULD only use it for debugging purposes.
-- Why is normative guidance being provided on the contents or semantics of an
“opaque data” blob?
Moved up to Section 2.
-- I don’t understand how to read the “SHOULD” in this text. If not intended
to be a unique identifier, what else should this opaque data be used for?
It should not be used to make protocol or hardware decisions. For
example, one should not send an identier of "2" and use that to place
the Child SA on CPU#2.
-- When and why should this be used for “non-debugging purposes”?
never. The SHOULD has been changed to MUST.
** Section 6.
Peers SHOULD be lenient with the maximum number of Child SAs they
allow for a given TSi/TSr combination to account for corner cases.
What does “lenient” mean here?
"account for corner cases" as explained further done?
Eg one should not use a hard max of 4 when one runs on a 4-CPU system.
** Section 6.
As additional Child SAs consume
little additional overhead, allowing at the very least double its
intended CPUs is RECOMMENDED.
Can this language please be clarified? I don’t understand. “Double” relative
to what baseline? Is this recommending the number Child SAs per CPU?
Changed to: allowing at the very least double the number of available CPUs is
RECOMMENDED.
** Section 6.
An implementation MAY limit the number
of Child SAs only based on its resources - that is only limit these
based on regular denial of service protection.
Why can’t an implementation limit SAs based on any policy?
Changed to:
As additional Child SAs
consume little additional resources, allowing at the very least double
the number of available CPUs is RECOMMENDED. An implementation MAY allow
unlimited additional Child SAs and only limit this number based on its
generic resource protection strategies that are used to require COOKIES
or refuse new IKE or Child SA negotiations. Although having a very large
number (eg hundreds or thousands) of SAs may slow down per-packet SAD
lookup.
** Section 7.
Similar to how an implementation should limit the number of half-open
SAs to limit the impact of a denial of service attack, an
implementation SHOULD limit the maximum number of additional Child
SAs allowed per unique TSi/TSr.
Is it a SHOULD or RECOMMENDED?
According to https://datatracker.ietf.org/doc/html/rfc2119#section-3
that is exactly the same thing? I reworded to use RECOMMENDED.
** Section 7. Editorial.
Using multiple resource specific child SAs makes sense for high
volume IPsec connections on IPsec gateway machines where the
administrator has a reasonable trust relationship with the peer's
administrator and abuse is unlikely and easilly escalated to resolve.
-- What makes a trust relationship “reasonable”? Would it be clear to omit
that word?
-- Typo. s/easilly/easily/
** Section 7. Typo. s/identifer/identifier/?
All fixed.
** Section 9. Typo. The registry names is incorrect (i.e., s/... Notify
Messages/... Notify Messages Types/)
Tero had asked me to use these names :P I had the right ones there.
There is redundency in the naming that makes it all confusing, eg:
eg "Notify Messages Types - Status Types", with the table name
"Notify Message Status Type". I've fixed it up the the official
names.
Paul
_______________________________________________
IPsec mailing list
IPsec@ietf.org
https://www.ietf.org/mailman/listinfo/ipsec