Filippo Valsorda writes:
> For what it's worth, implementing P-256 these days is way easier than
> when X25519 was designed. First, Renes, Costello, and Batina published
> complete formulas for it in https://eprint.iacr.org/2015/1060

Actually, Lange and I published complete formulas for P-256 six years
before that, on pages 23-24 of https://cr.yp.to/talks.html#2009.07.17.
The P-256 curve was just one example; the talk title was "Complete
addition laws for all elliptic curves over finite fields".

The bigger picture, however, is that completeness is just one of many
desiderata for ECC implementations. What's amazing about the Montgomery
ladder for Montgomery curves---

   def x25519(x1,n):
     A = 486662
     x2,z2,x3,z3 = 1,0,x1,1
     for i in reversed(range(255)):
       ni = bit(n,i)
       x2,x3 = cswap(x2,x3,ni)
       z2,z3 = cswap(z2,z3,ni)
       x3,z3 = 4*(x2*x3-z2*z3)**2,4*x1*(x2*z3-z2*x3)**2
       x2,z2 = (x2**2-z2**2)**2,4*x2*z2*(x2**2+A*x2*z2+z2**2)
       x3,z3 = x3%p,z3%p
       x2,z2 = x2%p,z2%p
       x2,x3 = cswap(x2,x3,ni)
       z2,z3 = cswap(z2,z3,ni)
     return (x2*pow(z2,p-2,p))%p

---is that it's complete _and_ fast _and_ simpler than any other
approach to ECDH. Part of this streamlining comes from not having any
special cases; part of it comes from the formulas being short; part of
it comes from skipping point-on-curve checks. (The curve is chosen to
be twist-secure, and the formulas work with just the x-coordinate as
input, so _all_ inputs are on secure curves.)

Edwards curves are similarly amazing for signatures---simultaneously
complete and fast and simpler than any other approach to elliptic-curve
signatures. Of course, signatures are more code than ECDH, and there's
correspondingly more to say about how EdDSA is simpler than ECDSA; see
https://blog.cr.yp.to/20140323-ecdsa.html for an introduction.

P-256 ECDH doesn't have code as simple or fast as X25519, and P-256
ECDSA doesn't have code as simple or fast as Ed25519. What's especially
harmful about the NSA/NIST approach to ECC is how large the gaps are
between simple code, fast code, and secure code. Sure, the complete
formulas for P-256 etc. provide further options, but they're more
complicated in other ways---again, it's _not_ the amazing combination
illustrated by the Python code above.

We've seen repeatedly how tensions between streamlined code and secure
code lead to attacks. As https://safecurves.cr.yp.to puts it:

   Most of these attacks would have been ruled out by better choices of
   curves that allow _simple_ implementations to be _secure_
   implementations. This is the primary motivation for SafeCurves. **The
   SafeCurves criteria are designed to ensure ECC security, not just
   ECDLP security.**

The word "most" is important---it's not that the curve choice can
magically solve all problems---but comparative attack examples such as
Minerva (see https://blog.cr.yp.to/20191024-eddsa.html) illustrate how
successful this approach has been at reducing risks.

> although interestingly P-256 still has a red False in the "complete"
> column in version 2017.01.22 of https://safecurves.cr.yp.to

https://safecurves.cr.yp.to/complete.html defines that column:
"SafeCurves requires curves to support simple, fast, _complete_,
constant-time single-coordinate single-scalar multiplication. This
includes the SafeCurves ladder requirement but goes further by requiring
completeness. SafeCurves also requires curves to support simple, fast,
_complete_, constant-time multi-scalar multiplication."

A bait-and-switch game---this code is complete, that code is simple,
etc.; when implementations use the simple code and end up not being
complete, blame the implementor---doesn't meet the definition of this
column. There has to be a single-coordinate single-scalar multiplication
method that's _simultaneously_ simple and fast and so on. Ditto for
multi-scalar multiplication.

The same page also explains why this is important, referring to the
Montgomery/Edwards completeness results from 2006/2007 and then
summarizing why the subsequent completeness results for various other
curves aren't as good:

   Subsequent research has introduced other complete
   scalar-multiplication formulas. However, many of these formulas are
   considerably slower and more complicated than standard incomplete
   scalar-multiplication formulas, creating major conflicts between
   simplicity, efficiency, and security.

We're not living in some fantasy world where implementors are magically
doing what has to be done for security. Implementors are typically
writing the simplest code that they can that passes some tests, and then
modifying it for speed if there are speed complaints (or fears of speed
complaints). Sure, _sometimes_ implementors can be convinced to take
extra steps for security (especially if there are tools enforcing those
steps), but needing more of those steps means more chances of disaster.

Understanding the goal of SafeCurves also makes clear why SafeCurves
defines "fast" and "simple" the way it does:

   "Fast" means that implementations of scalar multiplication for the
   same curve cannot be much faster, and "simple" means that reasonably
   fast implementations of scalar multiplication for the same curve
   cannot be much more concise. At this time there are no examples close
   enough to the edge to warrant quantification of "much".

See https://cr.yp.to/talks.html#2015.01.07 for further examples of how
we can do better than blaming every security problem on implementors.

> Second, we have formally verified field implementation generators such
> as fiat-crypto.

Yes, some useful fragments of ECC software for some environments are
generated by fiat-crypto---but the ECC ecosystem has much more
security-critical code than that.

The state of the art in ECC verification is far beyond fiat-crypto: it's
represented by, e.g., the CURVE25519_X25519_BYTE_SUBROUTINE_CORRECT
theorem from

   
https://github.com/awslabs/s2n-bignum/blob/main/x86/proofs/curve25519_x25519.ml

stating that particular x86 code computes X25519 correctly on all
inputs. In principle we can have software proven correct for _any_
cryptosystem, but in reality this is still a lot of work, and it gets
done fastest for cryptosystems with the simplest software.

> Third, we learned to make key shares always ephemeral which makes
> invalid curve attacks irrelevant.

Focusing on TLS here: I agree that current TLS encourages ECDHE keys to
be used only once. However, there are at least two reasons that this
doesn't justify building an encryption ecosystem where reusing keys
breaks security:

   (1) There's a gap between recommendations and reality. In a world
       where people with a straight face stand up and say things like
       "This hash would cost Google one MILLION dollars, so we should
       skip it!" and "Cryptography is expensive: out of every 2000 CPU
       cycles at Meta, 1 is going to Curve25519!", do you seriously
       imagine that nobody is reusing short-term TLS keys?

   (2) There has been a lot of work aiming at upgrading long-term
       server-identity keys from signature keys to encryption keys.
       The long-term keys are, of course, reused. This upgrade hasn't
       happened yet, but having it happen wouldn't be a surprise.

As AGL put it in the post-quantum context: "CPA vs CCA security is a
subtle and dangerous distinction, and if we're going to invest in a
post-quantum primitive, better it not be fragile."

> If most clients send X25519 and/or X25519+ML-KEM-768 key shares, then
> I am *also *on the hook to carry an optimized and secure X25519
> implementation. That's double the work, double the opportunity for
> mistakes, double the complexity, double the attack surface.

I agree with one part of this. But I disagree with all of the numbers
and, for three reasons, I disagree with the resulting recommendation.

The part I agree with is that choosing X25519+PQ over P-256+PQ is asking
implementors for extra work _if_ the implementors have no other reason
to implement X25519 while they're forced to implement P-256 for current
certificates.

Regarding the volume of extra work, opportunity for mistakes,
complexity, and attack surface, I have three questions:

   (1) Where exactly is the P-256 DH code that's supposed to be as
       simple as X25519 code?

   (2) Where exactly is the P-256 ECDSA code that's supposed to be as
       simple as Ed25519 code?

   (3) Where exactly is the P-256 ECDSA code that's supposed to be as
       simple as X25519 code?

The fact that signatures are more complicated than DH doesn't make the
third question unfair. The message I'm replying to is saying that P-256
is required for a _signature system_, and is then making claims about
the added code volume for "X25519+ML-KEM-768" vs. "P-256+ML-KEM-768", so
the third question is exactly on point. But the first two questions are
useful for other scenarios. I don't want anyone thinking that the
disadvantages of NSA/NIST ECC appear only in limited scenarios.

As for the recommendation to "standardize around P-256+ML-KEM-768 in
TLS", here are my reasons for disagreeing.

First, fundamentally, the rationale is delegating decisions to NIST and
the CA/Browser Forum. That's not what IETF should be doing. Sure, NIST
took until 2023 to standardize Ed25519 and hasn't gotten to X25519 yet,
but that's not a reason for IETF to revert to inferior technology. IETF
should be making its own evaluations of what's best---exactly what added
X25519 and Ed25519 to the IETF portfolio in the first place.

Second, TLS is not living in a vacuum. For applications today that are
using _just_ X25519, adding P-256+PQ is more work than adding X25519+PQ,
and it's easy to imagine this being more work overall than the work
mentioned above. Sure, there could be one hybrid for TLS and a different
hybrid for other protocols, but that split has its own costs. I think
everyone agrees that, all else being equal, having fewer hybrids is
better---but I'm applying this idea to a broader ecosystem.

Third, even if we ignore all the other uses of X25519, my understanding
of the TLS situation today is that X25519 is in basically all TLS stacks
anyway, for reasons that aren't going to disappear any time soon---and,
in particular, that won't be changed by the PQ rollout. So it's not true
that choosing P-256+PQ instead of X25519+PQ will suddenly create simpler
TLS code. As for the long term, I don't see why speculation that X25519
will go away in N years should be given higher weight than speculation
that P-256 will go away in N years.

> marginal performance benefit

Why say "marginal" instead of presenting some actual numbers?

I ran the script below on a 2.245GHz AMD Zen 2 core (with Turbo Boost
disabled; see https://blog.cr.yp.to/20230609-turboboost.html) running
Debian 12 (where the built-in OpenSSL is OpenSSL 3.0.11 from September).
The script downloads and compiles lib25519, and then runs benchmarks
using "openssl speed" with a beta OpenSSL "provider" for lib25519.
Here's the output:

                              op      op/s
 256 bits ecdh (nistp256)   0.0001s  12982.0
 253 bits ecdh (X25519)   0.0000s  24152.0

                               sign    verify    sign/s verify/s
 256 bits ecdsa (nistp256)   0.0000s   0.0001s  27993.0   9834.0
                              sign    verify    sign/s verify/s
 253 bits EdDSA (Ed25519)   0.0000s   0.0001s  69821.4  17070.0

The actual gap is bigger: OpenSSL's built-in code cheats by not
measuring various key-expansion steps, whereas lib25519 never cheats.

---D. J. Bernstein


#!/bin/sh

export LD_LIBRARY_PATH="$HOME/lib"
export LIBRARY_PATH="$HOME/lib"
export CPATH="$HOME/include"
export PATH="$HOME/bin:$PATH"

( wget -m https://randombytes.cr.yp.to/librandombytes-latest-version.txt
  version=$(cat randombytes.cr.yp.to/librandombytes-latest-version.txt)
  wget -m https://randombytes.cr.yp.to/librandombytes-$version.tar.gz
  tar -xzf randombytes.cr.yp.to/librandombytes-$version.tar.gz
  cd librandombytes-$version
  ./configure --prefix=$HOME && make -j install
)
( wget -m https://cpucycles.cr.yp.to/libcpucycles-latest-version.txt
  version=$(cat cpucycles.cr.yp.to/libcpucycles-latest-version.txt)
  wget -m https://cpucycles.cr.yp.to/libcpucycles-$version.tar.gz
  tar -xzf cpucycles.cr.yp.to/libcpucycles-$version.tar.gz
  cd libcpucycles-$version
  ./configure --prefix=$HOME && make -j install
)
( wget -m https://lib25519.cr.yp.to/lib25519-latest-version.txt
  version=$(cat lib25519.cr.yp.to/lib25519-latest-version.txt)
  wget -m https://lib25519.cr.yp.to/lib25519-$version.tar.gz
  tar -xzf lib25519.cr.yp.to/lib25519-$version.tar.gz
  cd lib25519-$version
  ./use-s2n-bignum
  ./configure --prefix=$HOME && make -j install
)
( wget -m https://cr.yp.to/2024/20240314/openssl_x25519_lib25519.c
  wget -m https://cr.yp.to/2024/20240314/openssl_ed25519_lib25519.c
  wget -m https://cr.yp.to/2024/20240314/xtest
  wget -m https://cr.yp.to/2024/20240314/edtest
  cd cr.yp.to/2024/20240314
  sh xtest
  sh edtest
)

_______________________________________________
TLS mailing list -- tls@ietf.org
To unsubscribe send an email to tls-le...@ietf.org

Reply via email to