On Sat, Jun 4, 2016 at 6:15 PM, Stanislav Malyshev <smalys...@gmail.com> wrote:
>
> Hi!
>
> > Let's begin discussing the prospect of adding libsodium as a core extension
> > in PHP 7.1. I've updated the RFC to explain why this would be a good idea
> > and the benefits it offers.
> >
> > https://wiki.php.net/rfc/libsodium
> >
> > If the subsequent discussion goes smoothly, I would like to open voting on
> > June 15.
>
> Some notes based on the docs in https://paragonie.com/book/pecl-libsodium:
> TLDR: I think the idea of having something like this in core is very
> nice, but I think it needs some tweaks in order to be a solution we can
> recommend as core module.
>
> - I would appreciate deeper namespacing. It is rarely that you would use
> all sub-modules (such as encryption, both symmetric and asymmetric,
> password hashing, decryption, random, etc.) in the same piece of code.
> So if I could just "use Sodium\Crypto;" and then use short names it
> would be much nicer than having to spell out the whole story.

This would entail a BC break against all software currently written
using libsodium.
Are you certain that deeper namespacing would be worth that trade-off?

> - Also, I'm not sure why functions like
> crypto_aead_chacha20poly1305_encrypt() exist. Shouldn't algorythm be a
> parameter to encryption function and not function name? Putting
> parameter into function names makes it much harder to be flexible and
> configurable. Or, if there's only one, then why not just call it
> encrypt() and state the fixed algorithm in the docs?

Libsodium offers few primitives. For AEAD, you have:

- ChaCha20-Poly1305
- AES-256-GCM (but only with modern hardware)
- ChaCha20-Poly1305 (IETF version with a bigger nonce)

These are distinct functions in the C API to discourage "Hey, I'll add
another algorithm and users can access it by changing the string
that's passed".

When all is said and done, crypto_aead_encrypt() and
crypto_aead_decrypt() will be the CAESAR winner.
https://competitions.cr.yp.to/caesar.html

> - The function names in general are kind of all over the place. I.e., to
> get a random, you get:
> - for string: _buf
> - for ranged (I assume unsigned?) integer: _uniform
> - for 16-bit unsigned integer: _random16
> If you can notice a pattern here, I can't. Also, I don't see why you
> have a function for 16-bit int but not for 8-bit or 32-bit. Additionally
> weird that _uniform gets int, but has limit different from PHP int.

To be honest, most of these aren't necessary in PHP anymore. They were
mostly useful in PHP 5.x before we had random_bytes() and
random_int().

> - Naming/namespacing of constant-time function should make it clear they
> are there because they are constant time, not just because somebody
> wanted to reimplement bin2hex. It's also unclear why we need memcmp as
> separate function from hash_equals().

Alternatively, if we can make bin2hex() and hex2bin() always
constant-time in PHP, we can eschew exposing these libsodium
functions. (Just make an alias for BC purposes.) I'd actually prefer
that.

> Also, I'm still not convinced bin2hex in PHP even has a timing problem.
> I haven't seen anything but vague generic theoretising in this regard.
> Same with hex2bin (in fact, even more since hex2bin doesn't even have
> index lookups).

These are the facts:

1. Index lookups based on the contents cryptographic secrets is a
recipe for microarchitecture side-channels (e.g. FLUSH+RELOAD).
2. The current implementation of all RFC 4648 encoding functions that
PHP has (which, strangely, doesn't include base32) violates rule 1.

https://cryptocoding.net/index.php/Coding_rules#Avoid_table_look-ups_indexed_by_secret_data

See how Halite stores keys for an example of "Yes, this does get used
for crypto secrets":
https://github.com/paragonie/halite/blob/6c026f7dc6a57ecd6c65e5944057acdefa8c9d67/src/KeyFactory.php#L604-L627

> - What exactly memzero does in PHP? Looks like it accepts argument
> by-ref, which means if you do something like:

It overwrites every byte in a string with NUL characters.

> function foo($secret_key) {
> // do stuff with key
> memzero($secret_key);
> }
>
> and then:
>
> $key = get_key(); foo($key);
>
> then $key still has the key. So I'm not sure how that all memory wiping
> is still working in practice. I mean, it may work with random keys that
> you generate, use once and immediately destroy, but in any other
> scenario it's just wasting time. Even with random keys it's iffy since
> it assumes your RNG state is either not in memory or is protected.

Right, it has to be used with care. But the existence of a
well-written cross-platform memory-zeroing function is almost reason
enough to vote in favor of libsodium.

See also: https://stackoverflow.com/a/29331937/2224584

> E.g. in the example at
> https://paragonie.com/book/pecl-libsodium/read/09-recipes.md#encrypted-cookies
> keys are meticulously erased, but since they are all deterministically
> derived from $this->key (and cookie name which is public) which stays in
> memory, what exactly is the point?

That's a good question:
http://www.daemonology.net/blog/2014-09-06-zeroing-buffers-is-insufficient.html

> - increment() needs clearer explanation what it does (e.g. does it
> accept any string? What exactly does it do to that string?) and also it
> seems to be it would be nicer to just return the value. Or at least have
> this option - mutation in place is not always what you want.

It's used for incrementing a nonce for, e.g. CTR mode. In little-endian.

That's a design issue that should be relatively non-controversial if
you want to raise it:
https://github.com/jedisct1/libsodium-php

> - Why version functions are functions and not constants?
>
> - Names like "secretbox" are weird to me - what exactly is "secretbox" -
> is it encryption? Decryption? hashing? Is it symmetric or asymmetric? It
> would be nicer to use established and more clear terminology. Same with
> other names - e.g. crypto_auth - what it does? Does it check the message
> for authenticity or generates the signature for somebody else to check?
> Turns out the latter, but it's not clear from the name.

See http://nacl.cr.yp.to

> - Same with crypto_box() - actually it differs from crypto_secretbox()
> by... try to guess what. You would never guess. The former is
> asymmetric, the latter is symmetric.

The origins are academic. All you need to know is:

 - crypto_box means public key encryption
 - crypto_secretbox means secret key encryption

> - It's not clear why keypair parameter is string - if it's two keys, how
> it is one string? Also, why you need a keypair to encrypt-decrypt if
> you'd use only one key in each case (public or private)? What you do if
> you only have public key - how would you decrypt a message? Can you
> decrypt a message at all if you don't have secret key - and if so, how?
>
> - crypto_box_keypair_from_secretkey_and_publickey seems to use secret
> and public key from different people, so the result is not what is
> usually called a key pair (set of public+private keys belonging to one
> user) - in fact, I don't know what the result is. Could you explain?

A keypair is used in some APIs (crypto_box_seal_open comes to mind).
All that function does is concatenate the two.

It has nothing to do with key exchange.

> - After looking into crypto_sign_open I understands that it separates
> the message from attached signature and verifies the signature. But it's
> a bit confusing as it doesn't actually open anything.

crypto_sign doesn't do detached signatures. You take a signed message
and a public key and, if the signature is good, you got the unsigned
message.

I personally prefer crypto_sign_detached for that reason.

> - It is unclear why you need different functions to generate encryption
> keypair and signing keypair.

Different algorithms. One is Elliptic Curve Diffie Hellman, the other
is DSA over an Edwards Curve. You can transform keys from one to the
other, but using one in both places without conversion is something I
would encourage great caution.

> - The API never seems to either document or allow any choice of the
> algorithms.

Yes, as mentioned above, that is INTENTIONAL.

Libsodium is an opinionated cryptography library that doesn't give you
a lot of levers to pull -- only secure defaults you can't change,
consisting of carefully-selected cryptography primitives.

> While it can be a boon for simplicity, I do not see how such
> code could be made to interoperate with code in other languages and
> other systems.

That isn't its purpose. Its purpose is to be secure. (It also happens
to also be extremely fast.)

> I mean, I can generate the signature easily, but how
> would I check such signature in, say, Python or Java?

By using Libsodium. There are bindings for most popular programming
languages already.

> The docs give no clue. It's OK to support only one algorithm if it's
> considered superior (though what the users would do if it turns out
> not to be the case?) but at least it should be thoroughly documented.

https://download.libsodium.org/libsodium/content/public-key_cryptography/authenticated_encryption.html#algorithm-details

> - If crypto_pwhash_str highly recommends using provided constants, why
> not make these parameters optional with defaults being recommended
> constants?
>
> - crypto_pwhash_scryptsalsa208sha256 looks awfully specific (and long).
> Is that the only algorithm supported? If yes, do we need to spell it out
> in the name (and not the docs)? If not, why it's not a parameter?

That's the long-form of "scrypt". Personally, I'd say exclude it and
just use crypto_pwhash (Argon2i). No disrespect to Colin Percival.

> - Is there any support for importing/exporting keys in any of the
> established formats? Is the library inter-operable with any of the
> existing key/data interchange formats?

Keys are raw binary strings, unless you use a high level function.

> - Function names like crypto_sign_ed25519_sk_to_curve25519 seem to not
> be very usable to common people. The docs says "Transform crypto_sign
> key into crypto_box key" - maybe the name should reflect it?

These are auxiliary functions, not mainline functions. Most PHP
developers will only be using:

* crypto_box
* crypto_secretbox
* crypto_sign
* crypto_auth
* crypto_generichash
* crypto_pwhash

> Names like crypto_stream_xor have a similar problem - what it does is
> encrypts stream with a key, according to docs. Looks like it uses XOR
> for it, but is it the most important part for the user or the fact it
> encrypts? Maybe it should be crypto_stream_encrypt or something? Also,
> does it really do just XOR? Because XOR is not really the best idea to
> do even if you don't care about authenticity - due to trivial chosen
> plaintext attack. Probably it does more than mere XOR, but then I'm not
> sure what exactly.

This is how a stream cipher works:

1. You take your key and nonce, and generate a long string of random
bytes deterministically from your key and nonce.
2. To encrypt: You XOR every byte of plaintext with a corresponding
byte of the keystream.
3. To decrypt: You XOR every byte of _ciphertext_ with a corresponding
byte of the keystream.

This is unauthenticated encryption; don't use it unless you're
building something with it and authenticate elsewhere.

> - Why one would use crypto_scalarmult? That looks like a random piece of
> low level API without clear indication why it exists.

It's used to derive a shared secret from an X25519 secret key and an
X25519 public key.

> - What is the difference between crypto_stream and randombytes_buf? Both
> seem to produce a random string of given length. I assume crypto_stream
> does more but the docs do not explain.

crypto_stream is deterministic; randombytes_buf is non-deterministic

> - For crypto_kx, it is not clear what exactly this function does and why
> the same parameter is passed twice in different places. I mean, if you
> know the details of how DH key exchange works, it may make sense, but if
> we want to make it simple for the common user, it needs some better
> explanations and maybe better parameters.

This is basically a multi-step crypto_scalarmult in one complete pacakge.

> - Names like CRYPTO_PWHASH_SCRYPTSALSA208SHA256_OPSLIMIT_INTERACTIVE
> look very painful. Yes, people have IDEs and copy-paste, but imaging to
> have to type it just once over ssh connection in vim, where copy-paste
> is not working. Can't we have something more concise and put the rest in
> the docs?

That's a point worth considering. (Alternatively: Eschew scrypt.)

> This came out longer than I expected (hopefully somebody made this far?)
> and more disorganised, so I provide a quick summary:
> - Need better naming, both namespaces and functions
> - Need better docs explaining what the functions do and why

The docs are currently my responsibility; I apologize for not putting
more time into them.

> The stated goal is "You shouldn't need a Ph.D in Applied Cryptography to
> build a secure web application." I fully agree with this goal. I however
> feel that current implementation, while making admirable progress
> towards this goal, still needs some work to actually achieve it.
>
> --
> Stas Malyshev
> smalys...@gmail.com

I hope I've addressed your important questions adequately.

Scott Arciszewski
Chief Development Officer
Paragon Initiative Enterprises <https://paragonie.com>

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to