Re: [IPsec] AD Review of draft-ietf-ipsecme-multi-sa-performance-05

Paul Wouters Tue, 19 Mar 2024 21:26:45 -0700

On Tue, 19 Mar 2024, Roman Danyliw wrote:

I performed an AD review of draft-ietf-ipsecme-multi-sa-performance-05.  I have 
a mostly editorial feedback below:


** Abstract.  Editorial.  s/crypto/cryptography/


Fixed.

** Section 1.
  Most IPsec implementations are currently limited to using one queue
  or CPU resource for a Child SA.

-- (Editorial) What kind of queue?  Should it be “network queues”?


We left it undefined on purpose. It is hardware/implementation specific.
It could be a nic with 4 ASICs using 1 ASIC only or a computer with 4 CPUs
which each have a queue. I changed it to ...

-- Why couldn’t implementations be changed to use multiple CPUs?


And answering that:

       Most IPsec implementations are currently limited to using one
       hardware queue or a single CPU resource for a Child SA. Paralyzing
       the packet encryption can be done, but there is a bottleneck of
       different parts of the hardware locking or waiting to get their
       sequence number assigned for the packet it is enrypting. The
       result is that a machine with many such resources is limited to
       only using one of these resources per Child SA. This severely
       limits the throughput that can be attained. For example, at the
       time of writing, an unencrypted link of 10Gbps or more is commonly
       reduced to 2-5Gbps when IPsec is used to encrypt the link using
       AES-GCM. By using the implementation specified in this document,
       aggregate throughput increased from 5Gbps using 1 CPU to 40-60
       Gbps using 25-30 CPUs.

** Section 1.  Editorial.
  An
  unencrypted link of 10Gbps or more is commonly reduced to 2-5Gbps
  when IPsec is used to encrypt the link using AES-GCM.  By using the
  implementation specified in this document, aggregate throughput
  increased from 5Gbps using 1 CPU to 40-60 Gbps using 25-30 CPUs

-- Will this text age well?


See above, wrapped it in "example at the time of writing".

-- Can these statistics be cited?


It was done by the authors, and only presented at an earlier IETF.
So, not really?

** Section 1.  Editorial.
  While this could be (partially) mitigated by setting up multiple
  narrowed Child SAs, for example using Populate From Packet (PFP) as
  specified in IPsec Architecture [RFC4301], this IPsec feature would
  cause too many Child SAs (one per netflow)

Is it “netflow” or “network flow”?  “netflow” is a logging format.


Changed to network flow.

** Section 1.  Editorial.
When an IKEv2 peer is receiving more additional Child SA's

It is “more additional Child SA’s” or “additional Child SA’s”?


Removed "more".

** Section 3.
  If this initial Child SA
  will be tied to a specific resource, it MAY indicate this by
  including an identifier in the Notification Data.

How does one tie the Child SA to the specific resource if the identifier is NOT 
included in the Notification data?  Wouldn’t it be mandatory to include this 
identifier?


Tieing the Child SA to a specific resource is a local policy decision
and not negotiated. The identifier is only there for debugging purposes.

** Section 4.  Is this section normative?  Why are RFC2119 key words used in an 
example?


Why do you say this is an example? It is the Implementation
Considerations section telling you to do or do not some things?

** Section 4.  Doesn’t the example in the paragraph starting with “A simple 
distribution could be to install one additional Child SA on each CPU” violate 
the situation described in Section 2 (i.e., coordination across CPUs)?


No. There is a still central control on negotiating and installing the
Child SAs, but once installed (on a CPU), it can run indepedently of
other CPUs.

** Section 5.  The diagrams in Section 5.* show “Next Payload”, a fields flags 
and a “Payload length”.  Where are those defined?


This was fixed yesterday by adding this line to the start of Section 5:

        The Notify Payload format is defined in IKEv2 [RFC7296] section 3.10, 
and is copied here for convenience.

** Section 5.1.  Editorial. The diagram says “optional resource identifier”.  
The description of the fields says “Optional Payload Data”


Fixed.

** Section 5.1
     The opaque data SHOULD be a
     unique identifier within all the Child SAs with the same TS
     payloads and the peer SHOULD only use it for debugging purposes.

-- Why is normative guidance being provided on the contents or semantics of an 
“opaque data” blob?


Moved up to Section 2.

-- I don’t understand how to read the “SHOULD” in this text.  If not intended 
to be a unique identifier, what else should this opaque data be used for?


It should not be used to make protocol or hardware decisions. For
example, one should not send an identier of "2" and use that to place
the Child SA on CPU#2.

-- When and why should this be used for “non-debugging purposes”?


never. The SHOULD has been changed to MUST.

** Section 6.
  Peers SHOULD be lenient with the maximum number of Child SAs they
  allow for a given TSi/TSr combination to account for corner cases.

What does “lenient” mean here?


"account for corner cases" as explained further done?

Eg one should not use a hard max of 4 when one runs on a 4-CPU system.

** Section 6.
  As additional Child SAs consume
  little additional overhead, allowing at the very least double its
  intended CPUs is RECOMMENDED.

Can this language please be clarified?  I don’t understand.  “Double” relative 
to what baseline?  Is this recommending the number Child SAs per CPU?


Changed to: allowing at the very least double the number of available CPUs is 
RECOMMENDED.

** Section 6.
  An implementation MAY limit the number
  of Child SAs only based on its resources - that is only limit these
  based on regular denial of service protection.

Why can’t an implementation limit SAs based on any policy?


Changed to:

      As additional Child SAs
      consume little additional resources, allowing at the very least double
      the number of available CPUs is RECOMMENDED. An implementation MAY allow
      unlimited additional Child SAs and only limit this number based on its
      generic resource protection strategies that are used to require COOKIES
      or refuse new IKE or Child SA negotiations. Although having a very large
      number (eg hundreds or thousands) of SAs may slow down per-packet SAD 
lookup.

** Section 7.
  Similar to how an implementation should limit the number of half-open
  SAs to limit the impact of a denial of service attack, an
  implementation SHOULD limit the maximum number of additional Child
  SAs allowed per unique TSi/TSr.

Is it a SHOULD or RECOMMENDED?


According to https://datatracker.ietf.org/doc/html/rfc2119#section-3
that is exactly the same thing? I reworded to use RECOMMENDED.

** Section 7.  Editorial.
  Using multiple resource specific child SAs makes sense for high
  volume IPsec connections on IPsec gateway machines where the
  administrator has a reasonable trust relationship with the peer's
  administrator and abuse is unlikely and easilly escalated to resolve.

-- What makes a trust relationship “reasonable”?  Would it be clear to omit 
that word?

-- Typo.  s/easilly/easily/

** Section 7.  Typo. s/identifer/identifier/?


All fixed.

** Section 9.  Typo.  The registry names is incorrect (i.e., s/... Notify 
Messages/... Notify Messages Types/)


Tero had asked me to use these names :P I had the right ones there.

There is redundency in the naming that makes it all confusing, eg:
eg "Notify Messages Types - Status Types", with the table name
"Notify Message Status Type". I've fixed it up the the official
names.

Paul

_______________________________________________
IPsec mailing list
IPsec@ietf.org
https://www.ietf.org/mailman/listinfo/ipsec

Re: [IPsec] AD Review of draft-ietf-ipsecme-multi-sa-performance-05

Reply via email to