Hi internals!

This mail turned out to be rather long, so I'll start with a TL;DR:

To fix the HashDos vulnerability for *all* cases (rather than just GET/POST
parsing), I propose to introduce collision counting during hashtable
insertion operations. This will throw a fatal error if the number of
collisions during an insertion operation exceeds a certain threshold.

Implementation: https://github.com/php/php-src/pull/1565

From my testing the change has no negative performance impact. The change
is small and does not break ABI.

Tracking bug (with various implementations):
https://bugs.php.net/bug.php?id=70644

What are your thoughts on this?

Nikita

---------

For those not familiar with HashDos or interested in some considerations
regarding alternative implementations, the original mail:

In PHP 5.3.9 a partial fix for the HashDos vulnerability was introduced in
the form of max_input_vars. As a recap, HashDos exploits the fact that
hashtables (which PHP uses ubiquitously) perform very badly if the hashes
of many elements collide. In particular the performance of inserting N
elements into a hashtable can degrade from O(N) to O(N^2) using carefully
chosen keys.

A very simple demo script for creating a hashtable with many collisions:

    // Every key is a multiple of $s = 2^16, so for any table size up to
    // 2^16 they all map to the same bucket, forming one long collision chain.
    $s = 1 << 16; $a = [];
    for ($i = 0; count($a) < $s; $i += $s) { $a[$i] = 0; }

This code creates an array with about 65k elements and runs in 30s (PHP
5.6) or 4s (PHP 7) on my machine.

This behavior can be exploited by an attacker whenever a server running PHP
can be made to create a decent-sized array (or object) with
attacker-controlled keys. This DoS vulnerability is efficient: a 700KB
payload can easily take 5 CPU seconds to process on PHP 7, and from there
it goes up quadratically (e.g. a 1.4MB payload takes 20 seconds, and so
on).

The most obvious attack vectors are GET and POST variables, which are
automatically decoded into arrays. PHP 5.3.9 closed this avenue by
introducing max_input_vars, which caps the number of elements in the
GET/POST arrays. As the HashDos attack relies on
O(N^2) asymptotic runtime, limiting N to small numbers is an effective fix.
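
For reference, max_input_vars is an ordinary php.ini directive; the value
shown below is the documented default of 1000 (snippet for orientation
only):

    ; php.ini: cap on the number of input variables parsed per request
    ; ($_GET, $_POST and $_COOKIE are each subject to this limit)
    max_input_vars = 1000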

However, GET/POST are not the only ways an attacker can trigger
array/object creation with specific keys. For example many PHP applications
have endpoints that accept JSON encoded payloads, which are not protected
by max_input_vars. The only means of protection currently available to
userland code is to never decode JSON payloads that exceed some
conservative size limit like 50-100KB (often not practical).
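
As a rough sketch of that userland workaround (the 100KB limit and the
helper name below are purely illustrative):

    // json_decode() has no equivalent of max_input_vars, so userland code
    // has to reject oversized payloads before decoding them.
    function safe_json_decode(string $json, int $maxBytes = 100 * 1024) {
        if (strlen($json) > $maxBytes) {
            throw new InvalidArgumentException('JSON payload too large');
        }
        return json_decode($json, true);
    }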

Apart from JSON, there will likely be various other application-specific
situations where arrays are built from user-provided keys.
While we could add something similar to max_input_vars to json_decode and
unserialize, this would not solve the problem for any custom code.

There are essentially three ways to prevent HashDos attacks in general
(rather than patching specific places):

1. If there are many collisions for a hash slot, switch collision
resolution from linked lists to self-balancing binary trees.
2. Switch hashtables to a keyed cryptographic hash like SipHash.
3. If there are very many collisions for a hash slot, throw a fatal error.

I will comment on these possibilities as they apply to PHP.

1. (Self-balancing binary trees). The idea here is that a balanced binary
tree has worst-case insertion time O(log N), while the linked list we
normally use has worst-case insertion time O(N). This means that the
worst-case execution time for N insertions into a hashtable goes from
O(N^2) to O(N log N), i.e. from catastrophic to "a few times slower than
usual".

I don't think this solution is viable for PHP. Supporting binary trees next
to linked list collision resolution will make the hashtable implementation
*a lot* more complicated -- and it isn't exactly simple as is.

2. (Randomized cryptographic hash). This approach doesn't change anything
about how collisions are handled, instead it prevents the attacker from
constructing such collisions in the first place.

This is done by using a fast cryptographic hash function (typically
SipHash), which is keyed with a (cryptographically strong) random key. As
the attacker does not know the key, he cannot efficiently construct arrays
with many collisions.
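
To illustrate the keyed-hashing idea in userland terms (assuming the
sodium extension, whose shorthash is SipHash-2-4 with a 16-byte key; the
engine would of course do this in C, not in PHP):

    // Sketch only: hash a key with a secret, per-pool SipHash key.
    $key  = sodium_crypto_shorthash_keygen();            // 16 random bytes
    $hash = sodium_crypto_shorthash('some-array-key', $key); // 8-byte hash
    $slot = unpack('P', $hash)[1] & 0xFFFF;   // bucket index, 2^16 slots
    // Without $key an attacker cannot predict $slot, so they cannot build
    // a set of keys that all land in the same bucket.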

There are a number of factors to take into account when trying to apply
this to PHP:

a) The SipHash key would have to be the same for all processes in an FPM
pool. We can't use a per-process or per-request key due to arrays living in
opcache SHM. As far as I know SipHash is designed to be secure even when
the key is long-lived, so this does not appear to be an issue.

b) SipHash is slower (especially for very small strings) than the hash
function we currently use (DJBX33A). As PHP 7 caches the hashes for
strings, I do not believe this to be a significant problem.

c) The real issue: Currently PHP does not hash integer keys, but they are
just as susceptible to HashDos as string keys. To protect against this
vector we'd have to start hashing integer keys as well. Especially when
using a hash function as expensive as SipHash, this would make operations
on arrays with (non-ascending or non-dense) integer keys significantly
slower.

Here is an implementation of using SipHash for both strings and integers:
https://github.com/php/php-src/compare/master...nikic:integerHash

Testing this against Wordpress showed only a small (but non-trivial)
slowdown. This is probably attributable to the fact that most
integer-indexed arrays are packed (not using a hash) and thus this change
has less impact than it might have had on PHP 5.
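
For context, "packed" refers to the PHP 7 optimization where arrays with
ascending integer keys bypass the hash lookup entirely; roughly (the exact
packing conditions are an engine detail):

    // Packed: ascending integer keys, stored without per-key hash lookups.
    $packed = [];
    for ($i = 0; $i < 1000; $i++) { $packed[$i] = $i; }
    // Not packed: out-of-order integer keys use real hash buckets, which
    // is where hashing integers with SipHash would cost the most.
    $notPacked = [];
    for ($i = 1000; $i >= 0; $i--) { $notPacked[$i] = $i; }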

However, in benchmarks of raw array operations, working on non-packed
integer arrays can easily be three times slower with this implementation.
I do not think this is acceptable. Maybe Wordpress doesn't use many
non-packed integer
arrays, but that does not have to hold generally and this difference is
just too drastic.

3. (Fatal error on many collisions). While the two previous options try to
ensure that hashtables stay efficient regardless of the used keys, the last
option aims to simply detect malicious array keys and abort the script in
such a case.

This is done by counting the number of collisions during hashtable
insertion operations and throwing a fatal error if this collision count
exceeds a certain threshold.
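
To make the mechanism concrete, here is a deliberately simplified userland
model of the idea; the real patch works in C inside the zend_hash insertion
code, and the threshold below is made up for illustration:

    // Toy model only: count how far we walk a slot's collision chain on
    // insert and bail out once a threshold is exceeded.
    class ToyHashTable {
        const MAX_COLLISIONS = 1000;   // illustrative, not the patch's value
        private $buckets = [];
        private $mask = 0xFF;          // 256 slots, a power of two

        public function insert($key, $value) {
            $slot = $key & $this->mask;   // integer keys hash to themselves
            $collisions = 0;
            foreach ($this->buckets[$slot] ?? [] as $i => $entry) {
                if ($entry[0] === $key) {
                    $this->buckets[$slot][$i][1] = $value;  // update existing
                    return;
                }
                if (++$collisions > self::MAX_COLLISIONS) {
                    // The engine raises a fatal error here; an exception
                    // stands in for it in this sketch.
                    throw new RuntimeException('Too many hash collisions');
                }
            }
            $this->buckets[$slot][] = [$key, $value];
        }
    }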

Here is an implementation of this mechanism for PHP:
https://github.com/php/php-src/pull/1565

This is the approach I would recommend for PHP. The patch for this change
is small and non-intrusive and does not break ABI. From my testing it has
no adverse performance impact. (In microbenchmarks I found it to even be
consistently faster, as it uses a more specialized implementation for a
commonly used type of insertion operation.)

While this is certainly less elegant than the alternatives, it would
allow us to fix the security issue in a timely manner. Switching to
SipHash, while maybe a possibility for the future, would still require a
significant amount of work to be at all feasible.

Nikita