Hi internals! This mail turned out to be rather long, so I'll start with a TL;DR:
To fix the HashDos vulnerability for *all* cases (rather than just GET/POST parsing), I propose to introduce collision counting during hashtable insertion operations. This will throw a fatal error if the number of collisions during an insertion operation exceeds a certain threshold.

Implementation: https://github.com/php/php-src/pull/1565

From my testing the change has no negative performance impact. The change is small and does not break ABI.

Tracking bug (with various implementations): https://bugs.php.net/bug.php?id=70644

What are your thoughts on this?

Nikita

---------

For those not familiar with HashDos, or interested in some considerations regarding alternative implementations, the original mail:

In PHP 5.3.9 a partial fix for the HashDos vulnerability was introduced in the form of max_input_vars. As a recap: HashDos exploits the fact that hashtables (which PHP uses ubiquitously) perform very badly if the hashes of many elements collide. In particular, the performance of inserting N elements into a hashtable can degrade from O(N) to O(N^2) using carefully chosen keys. A very simple demo script for creating a hashtable with many collisions:

    $s = 1 << 16;
    $a = [];
    for ($i = 0; count($a) < $s; $i += $s) {
        $a[$i] = 0;
    }

This code creates an array with about 65k elements and runs in 30s (PHP 5.6) or 4s (PHP 7) on my machine. (All keys are multiples of 2^16, so they fall into the same bucket for every power-of-two table size up to 2^16.)

This behavior can be exploited by an attacker whenever a server running PHP can be made to create a decent-sized array (or object) with attacker-controlled keys. This DoS vulnerability is efficient: a 700KB payload can easily take 5 CPU seconds to process on PHP 7, and from there it goes up quadratically (i.e. a 1.4MB payload takes 20 seconds, etc.).

The most obvious attack vectors are GET and POST variables, which are automatically decoded into arrays. PHP 5.3.9 closed this avenue by introducing max_input_vars, which prevents creating GET/POST arrays with more than a certain number of elements. As the HashDos attack relies on O(N^2) asymptotic runtime, limiting N to small numbers is an effective fix.

However, GET/POST are not the only ways an attacker can trigger array/object creation with specific keys. For example, many PHP applications have endpoints that accept JSON-encoded payloads, which are not protected by max_input_vars. The only means of protection currently available to userland code is to never decode JSON payloads that exceed some conservative size limit like 50-100KB, which is often not practical (a sketch of such a guard follows below). Apart from JSON, there will likely be various other application-specific situations where arrays are built from user-provided keys. While we could add something similar to max_input_vars to json_decode and unserialize, this would not solve the problem for any custom code.
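To make the previous paragraph concrete, here is a minimal sketch of such a userland size guard. It is illustrative only and not part of the proposal; the 50KB limit, the function name and the choice of exception are arbitrary:

    // Hypothetical mitigation available to userland code today: refuse to decode
    // JSON payloads above a conservative byte limit, since json_decode() is not
    // covered by max_input_vars.
    function safe_json_decode(string $payload, bool $assoc = true)
    {
        if (strlen($payload) > 50 * 1024) {
            throw new InvalidArgumentException('Refusing to decode oversized JSON payload');
        }
        return json_decode($payload, $assoc);
    }

As noted above, hard size limits like this are often impractical, which is why a fix inside the hashtable implementation itself is needed.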
There are essentially three ways to prevent HashDos attacks in general (rather than patching specific places):

1. If there are many collisions for a hash slot, switch collision resolution from linked lists to self-balancing binary trees.
2. Switch hashtables to a keyed cryptographic hash like SipHash.
3. If there are very many collisions for a hash slot, throw a fatal error.

I will comment on these possibilities as they apply to PHP.

1. (Self-balancing binary trees) The idea here is that a balanced binary tree has worst-case insertion time O(log N), while the linked list we normally use has worst-case insertion time O(N). This means that the worst-case execution time for N insertions into a hashtable goes from O(N^2) to O(N log N), i.e. from catastrophic to "a few times slower than usual".

I don't think this solution is viable for PHP. Supporting binary trees next to linked-list collision resolution would make the hashtable implementation *a lot* more complicated -- and it isn't exactly simple as is.

2. (Randomized cryptographic hash) This approach doesn't change anything about how collisions are handled; instead it prevents the attacker from constructing such collisions in the first place. This is done by using a fast cryptographic hash function (typically SipHash), keyed with a (cryptographically strong) random key. As the attacker does not know the key, he cannot efficiently construct arrays with many collisions.

There are a number of factors to take into account when trying to apply this to PHP:

a) The SipHash key would have to be the same for all processes in an FPM pool. We can't use a per-process or per-request key, because arrays live in opcache SHM. As far as I know, SipHash is designed to be secure even when the key is long-lived, so this does not appear to be an issue.

b) SipHash is slower (especially for very small strings) than the hash function we currently use (DJBX33A). As PHP 7 caches the hashes of strings, I do not believe this to be a significant problem.

c) The real issue: currently PHP does not hash integer keys, but they are just as susceptible to HashDos as string keys. To protect against this vector we'd have to start hashing integer keys as well. Especially with a hash function as expensive as SipHash, this would make operations on arrays with (non-ascending or non-dense) integer keys significantly slower.

Here is an implementation of using SipHash for both strings and integers: https://github.com/php/php-src/compare/master...nikic:integerHash

Testing this against Wordpress showed only a small (but non-trivial) slowdown. This is probably attributable to the fact that most integer-indexed arrays are packed (not using a hash), so the change has less impact than it would have had on PHP 5. However, in raw array benchmarks, operations on non-packed integer arrays can easily be three times slower with this implementation. I do not think this is acceptable. Maybe Wordpress doesn't use many unpacked integer arrays, but that does not have to hold generally, and the difference is simply too drastic.

3. (Fatal error on many collisions) While the two previous options try to ensure that hashtables stay efficient regardless of the used keys, the last option aims to simply detect malicious array keys and abort the script in such a case. This is done by counting the number of collisions during hashtable insertion operations and throwing a fatal error if this collision count exceeds a certain threshold. Here is an implementation of this mechanism for PHP: https://github.com/php/php-src/pull/1565

This is the approach I would recommend for PHP. The patch for this change is small, non-intrusive and does not break ABI. From my testing it has no adverse performance impact. (In microbenchmarks I found it to even be consistently faster, as it uses a more specialized implementation for a commonly used type of insertion operation.) A toy model of the mechanism is sketched at the end of this mail.

While this is certainly less elegant than the alternatives, it would allow us to fix the security issue in a timely manner. Switching to SipHash, while maybe a possibility for the future, would still require a significant amount of work to be at all feasible.

Nikita
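P.S.: For readers who want a concrete picture of what "collision counting during insertion" means, here is a toy userland model of the idea. It is purely illustrative: the real patch works inside the C hashtable implementation and raises a fatal error rather than throwing, and the class name, the threshold value and the omission of table resizing are assumptions made for brevity.

    // Toy model of collision counting: a chained hashtable over integer keys.
    class ToyHashTable
    {
        const MAX_COLLISIONS = 1000; // illustrative threshold, not the real one

        private $mask;
        private $buckets = []; // bucket index => list of [key, value] pairs

        public function __construct($size = 8)
        {
            $this->mask = $size - 1; // table size is a power of two, as in PHP
        }

        public function insert($key, $value)
        {
            // Integer keys are not hashed, so the bucket index is just key & mask.
            $idx = $key & $this->mask;
            if (!isset($this->buckets[$idx])) {
                $this->buckets[$idx] = [];
            }
            $collisions = 0;
            foreach ($this->buckets[$idx] as $i => $entry) {
                if ($entry[0] === $key) {
                    // Key already present: update in place, no limit involved.
                    $this->buckets[$idx][$i][1] = $value;
                    return;
                }
                // Every entry we walk past in this bucket's chain is a collision.
                if (++$collisions > self::MAX_COLLISIONS) {
                    throw new RuntimeException('Too many collisions during hashtable insertion');
                }
            }
            $this->buckets[$idx][] = [$key, $value];
        }
    }

With the colliding keys from the demo script above (0, 1 << 16, 2 << 16, ...) the limit is tripped after roughly a thousand insertions, while in the real, resizing hashtable arrays with ordinary keys stay far below any such threshold.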