Hi
On 7/28/24 08:42, Rowan Tommins [IMSoP] wrote:
>> Why a SHA2 algorithm? Why not a SHA3 one? How about standalone functions for
>> both, and then when SHA4 comes along (as it inevitably will) another standalone
>> function for one of its variants?
> You tell me. As I have repeatedly said, I don't actually know anything about these
> algorithms. SHA-256 is the only one on the list which I've heard of, and I'm aware it's
> newer than SHA-1. I don't know why SHA-512 isn't "better", I don't know why
> nobody talks about SHA-3, and I don't know if one of the others in the list is absolutely
> amazing and should be everyone's default forever.
> As far as I can see, nobody, in this whole discussion, has actually stepped up
> and explained what users should be using, once we have taught them that MD5 and
> SHA-1 are bad.
Let me attempt to give an explanation. As of today, users should use, in
order of priority:
1. The hash function they need for interoperability: If a service
provides a SHA-1 checksum, then there is no choice and SHA-1 needs to be
used.
2. The hash function their security team requests them to use.
3. A function from the SHA-2 family, with SHA-256 being a good default
choice, because that's the secure default choice across the industry.
See also: https://news.ycombinator.com/item?id=14469614 and specifically
https://news.ycombinator.com/item?id=14469730 ("there are hash
cryptographers who think SHA-2 may never be broken").
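As a minimal sketch of how (1) and (3) look in code: the payload and the
"published" checksum below are made up for illustration and are not taken
from any real service.

```php
<?php
// Hypothetical payload; in practice this would be the content received
// from the service.
$payload = "example release archive contents";

// (1) Interoperability: the (hypothetical) service publishes a SHA-1
// checksum, so SHA-1 is what has to be computed. The published value is
// simulated here by hashing the same payload.
$publishedSha1 = hash('sha1', $payload);
$matches = hash_equals($publishedSha1, hash('sha1', $payload));

// (3) No external constraint: SHA-256 as the default choice.
$digest = hash('sha256', $payload);

var_dump($matches);        // bool(true)
var_dump(strlen($digest)); // int(64), i.e. 32 bytes as hex
```

Item (2) has no code of its own: it only changes which algorithm name is
passed to hash().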
To expand on (3):
- SHA-256 and SHA-224 are literally the same, except for the initial
values and the fact that SHA-224 returns fewer bits.
- SHA-512, SHA-384, SHA-512/224 and SHA-512/256 are literally the same,
except for the initial values and the fact that the latter 3 return
fewer bits.
- The main structure of SHA-512 and SHA-256 is the same; SHA-512 just
uses 64-bit operations and larger chunks. Wikipedia explains this in
detail: https://en.wikipedia.org/wiki/SHA-2#Pseudocode
- SHA-512 and its variants are typically faster than SHA-256 and its
variants on 64-bit CPUs, because SHA-256 is restricted to 32-bit
operations. But: See below.
- The truncated variants are immune to so-called length-extension
attacks, but using an HMAC protects against those as well and thus is the
recommended usage.
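To make the list above concrete, here is a small sketch that prints the
hex digest length of each SHA-2 variant and authenticates a message with
HMAC-SHA-256. The message and the key are placeholders chosen for
illustration; a real key would be generated randomly and stored securely.

```php
<?php
$message = 'amount=100&to=alice';
// Hypothetical key, for illustration only.
$key = 'not-a-real-secret';

// Hex digest length per algorithm (two hex characters per output byte).
$hexLengths = [];
foreach (['sha224', 'sha256', 'sha384', 'sha512', 'sha512/224', 'sha512/256'] as $algo) {
    $hexLengths[$algo] = strlen(hash($algo, $message));
}
print_r($hexLengths);

// Keyed authentication: use HMAC instead of hash($key . $message), which
// would be exposed to length extension with the untruncated variants.
$tag = hash_hmac('sha256', $message, $key);

// Verification recomputes the tag and compares in constant time.
$valid    = hash_equals($tag, hash_hmac('sha256', $message, $key));
$tampered = hash_equals($tag, hash_hmac('sha256', $message . '&to=mallory', $key));

var_dump($valid);    // bool(true)
var_dump($tampered); // bool(false)
```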
As for the speed difference, I've created a (pending) PR that speeds up
SHA-256 by 2x to 5x (depending on the input length) by leveraging
the SHA-NI instruction set when available. When it's not available, the
SSE2 implementation improves the speed by 1.3x:
https://github.com/php/php-src/pull/15152
(Credit where credit is due: The implementation was written by Dr. Colin
Percival; I just did the PHP integration).
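For anyone who wants to compare the two on their own machine, a rough
micro-benchmark could look like the sketch below. The input size and the
iteration count are arbitrary assumptions, and the numbers will depend on
the CPU and on how the PHP binary was built, so no expected output is
given.

```php
<?php
// Arbitrary benchmark parameters (assumptions, adjust as needed).
$data = str_repeat('a', 1024 * 1024); // 1 MiB of input
$iterations = 20;

$timings = [];
foreach (['sha256', 'sha512'] as $algo) {
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        hash($algo, $data);
    }
    // Convert elapsed nanoseconds to milliseconds.
    $timings[$algo] = (hrtime(true) - $start) / 1e6;
}

foreach ($timings as $algo => $ms) {
    printf("%-8s %8.2f ms for %d x 1 MiB\n", $algo, $ms, $iterations);
}
```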
>>> Or leave them the 60-piece set (which includes flat-head and Phillips
>>> screwdrivers, so they're not being taken away), and write some tips on how to
>>> use it correctly.
>> So go ahead and write those tips. You don't need an RFC vote to improve the
>> documentation.
> Here is my offer to those arguing in favour of this deprecation: If you show me
> a draft of a comprehensive improvement to the manual to explain how users
> should be choosing a hashing algorithm, I will consider changing my vote.
> I am also happy to help with proofreading, and working out how to format it
> into DocBook that fits nicely in the manual.
> As long as the deprecation rests on "somebody in the next 10 years might get round
> to improving the manual", my vote remains a firm No.
I see that you have already found the issue discussing improvements to
the documentation, but for the benefit of readers following along:
https://github.com/php/doc-en/issues/3616
Please also see my previous email regarding the docs improvements I've
already made: The examples for the hash() functions should now all use
sha256 (matching the explanation above). Please point out if I missed any.
Best regards
Tim Düsterhus
PS: I know that life can get in the way, but as it fits the topic of
your last paragraph I'd like to note that I don't believe you followed
up regarding the documentation feedback back when the PHP 8.3
deprecation RFC (https://externals.io/message/120422#120601) happened.