Hi Nikita,

A few notes:

It looks like the patch mixes the attempt to catch the problem (the few
lines related to the zend_hash_find_bucket changes) with an optimization to
compensate for the collision-check overhead (everything else). I think it's
better to separate these parts.

Introducing zend_hash_add_or_return() in 7.0.1 is going to create a forward
incompatibility with 7.0.0. However, we may reserve it for internal usage by
removing ZEND_API.

I don't think PHP should prevent the user from constructing the arrays he
likes because of internal problems (like in your example).

>  $s = 1 << 16; $a = [];
>  for ($i = 0; count($a) < $s; $i += $s) { $a[$i] = 0; }

It creates a performance problem, but that doesn't mean we should kill it.
We have "max_execution_time" to catch them all.

I think only big arrays coming from external sources should be checked.

Your solution is incomplete anyway, because instead of crafting a single list
with 10000 collisions, an attacker will be able to use N lists with 1000
collisions each.
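
For example (just a sketch; the threshold of 1000 is assumed here for
illustration):

    $lists = [];
    for ($n = 0; $n < 10; $n++) {
        $a = [];
        // keys spaced by a large power of two fall into the same bucket chain
        for ($i = 0, $k = 0; $i < 999; $i++, $k += 1 << 16) {
            $a[$k] = 0; // each array stays just below the assumed threshold
        }
        $lists[] = $a;
    }

No error is raised for any single array, but the wasted time still grows with
the number of such payloads.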

Thanks. Dmitry.


On Thu, Nov 26, 2015 at 8:24 PM, Nikita Popov <nikita....@gmail.com> wrote:

> Hi internals!
>
> This mail turned out to be rather long, so I'll start with a TL;DR:
>
> To fix the HashDos vulnerability for *all* cases (rather than just GET/POST
> parsing), I propose to introduce collision counting during hashtable
> insertion operations. This will throw a fatal error if the number of
> collisions during an insertion operation exceeds a certain threshold.
>
> Implementation: https://github.com/php/php-src/pull/1565
>
> From my testing the change has no negative performance impact. The change
> is small and does not break ABI.
>
> Tracking bug (with various implementations):
> https://bugs.php.net/bug.php?id=70644
>
> What are your thoughts on this?
>
> Nikita
>
> ---------
>
> For those not familiar with HashDos or interested in some considerations
> regarding alternative implementations, the original mail:
>
> In PHP 5.3.9 a partial fix for the HashDos vulnerability was introduced in
> the form of max_input_vars. As a recap, HashDos exploits the fact that
> hashtables (which PHP uses ubiquitously) perform very badly if the hashes
> of many elements collide. In particular the performance of inserting N
> elements into a hashtable can degrade from O(N) to O(N^2) using carefully
> chosen keys.
>
> A very simple demo script for creating a hashtable with many collisions:
>
>     $s = 1 << 16; $a = [];
>     for ($i = 0; count($a) < $s; $i += $s) { $a[$i] = 0; }
>
> This code creates an array with about 65k elements and runs in 30s (PHP
> 5.6) or 4s (PHP 7) on my machine.
>
> This behavior can be exploited by an attacker whenever a server running PHP
> can be made to create a decent-sized array (or object) with
> attacker-controlled keys. This DoS vulnerability is efficient: a 700KB
> payload can easily take 5 CPU seconds to process on PHP 7, and from there it
> goes up quadratically (i.e. a 1.4MB payload takes 20 seconds, etc.).
>
> The most obvious attack vectors are GET and POST variables, which are
> automatically decoded into arrays. PHP 5.3.9 prevented this attack avenue
> by introducing max_input_vars, which prevents creating GET/POST arrays with
> more than a certain number of elements. As the HashDos attack relies on
> O(N^2) asymptotic runtime, limiting N to small numbers is an effective fix.
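>
> For reference, this is the php.ini directive in question (1000 is the
> default value):
>
>     max_input_vars = 1000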
>
> However, GET/POST are not the only ways an attacker can trigger
> array/object creation with specific keys. For example many PHP applications
> have endpoints that accept JSON encoded payloads, which are not protected
> by max_input_vars. The only means of protection currently available to
> userland code is to never decode JSON payloads that exceed some
> conservative size limit like 50-100KB (often not practical).
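>
> In userland, that guard currently has to look something like this (just a
> sketch; the 100KB limit is arbitrary):
>
>     $json = file_get_contents('php://input');
>     if (strlen($json) > 100 * 1024) {
>         http_response_code(413); // Payload Too Large
>         exit;
>     }
>     $data = json_decode($json, true);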
>
> Apart from JSON, there will likely be various other application-specific
> situations where arrays are generated which contain user-provided keys.
> While we could add something similar to max_input_vars to json_decode and
> unserialize, this would not solve the problem for any custom code.
>
> There are essentially three ways to prevent HashDos attacks in general
> (rather than patching specific places):
>
> 1. If there are many collisions for a hash slot, switch collision
> resolution from linked lists to self-balancing binary trees.
> 2. Switch hashtables to a keyed cryptographic hash like SipHash.
> 3. If there are very many collisions for a hash slot, throw a fatal error.
>
> I will comment on these possibilities as they apply to PHP.
>
> 1. (Self-balancing binary trees). The idea here is that a balanced binary
> tree has worst-case insertion time O(log N), while the linked list we
> normally use has worst-case insertion time O(N). This means that the
> worst-case execution time for N insertions into a hashtable goes from
> O(N^2) to O(N log N), i.e. from catastrophic to "a few times slower than
> usual".
>
> I don't think this solution is viable for PHP. Supporting binary trees next
> to linked list collision resolution will make the hashtable implementation
> *a lot* more complicated -- and it isn't exactly simple as is.
>
> 2. (Randomized cryptographic hash). This approach doesn't change anything
> about how collisions are handled; instead, it prevents the attacker from
> constructing such collisions in the first place.
>
> This is done by using a fast cryptographic hash function (typically
> SipHash), which is keyed with a (cryptographically strong) random key. As
> the attacker does not know the key, he cannot efficiently construct arrays
> with many collisions.
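>
> As a rough userland analogy (hash_hmac() here merely stands in for a keyed
> hash; it is not SipHash, which ext/hash does not expose), the bucket an
> element lands in depends on a secret key the attacker never sees:
>
>     $key = random_bytes(16); // generated once, e.g. per FPM pool
>
>     function bucket_index($str, $key, $tableSize) {
>         $h = hash_hmac('sha256', $str, $key, true);
>         // use 32 bits of the keyed hash as the bucket index
>         return unpack('V', $h)[1] & ($tableSize - 1);
>     }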
>
> There are a number of factors to take into account when trying to apply
> this to PHP:
>
> a) The SipHash key would have to be the same for all processes in an FPM
> pool. We can't use a per-process or per-request key due to arrays living in
> opcache SHM. As far as I know SipHash is designed to be secure even when
> the key used is long-lived, so this does not appear to be an issue.
>
> b) SipHash is slower (especially for very small strings) than the hash
> function we currently use (DJBX33A). As PHP 7 caches the hashes for
> strings, I do not believe this to be a significant problem.
>
> c) The real issue: Currently PHP does not hash integer keys, but they are
> just as susceptible to HashDos as string keys. To protect against this
> vector we'd have to start hashing integer keys as well. Especially when
> using a hash function as expensive as SipHash, this would make operations
> on arrays with (non-ascending or non-dense) integer keys significantly
> slower.
>
> Here is an implementation of using SipHash for both strings and integers:
> https://github.com/php/php-src/compare/master...nikic:integerHash
>
> Testing this against Wordpress showed only a small (but non-trivial)
> slowdown. This is probably attributable to the fact that most
> integer-indexed arrays are packed (not using a hash) and thus this change
> has less impact than it might have had on PHP 5.
>
> However, when testing raw array operations, working on non-packed integer
> arrays can easily be three times slower with this implementation. I do not think
> this is acceptable. Maybe Wordpress doesn't use many unpacked integer
> arrays, but that does not have to hold generally and this difference is
> just too drastic.
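>
> A micro-benchmark of the kind meant here might look as follows (a sketch;
> descending keys keep the array in the non-packed, hashed representation):
>
>     $n = 1 << 20;
>     $start = microtime(true);
>     $a = [];
>     for ($i = $n; $i > 0; $i--) {
>         $a[$i] = $i; // non-ascending keys -> not a packed array
>     }
>     printf("insert: %.3f s\n", microtime(true) - $start);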
>
> 3. (Fatal error on many collisions). While the two previous options try to
> ensure that hashtables stay efficient regardless of the used keys, the last
> option aims to simply detect malicious array keys and abort the script in
> such a case.
>
> This is done by counting the number of collisions during hashtable
> insertion operations and throwing a fatal error if this collision count
> exceeds a certain threshold.
>
> Here is an implementation of this mechanism for PHP:
> https://github.com/php/php-src/pull/1565
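>
> In userland pseudo-code, the core of the mechanism is roughly the following
> (a toy model only; the real patch operates on the zend_hash.c buckets, and
> the threshold of 1000 is assumed purely for illustration):
>
>     define('MAX_COLLISIONS', 1000); // illustrative threshold
>
>     function table_insert(array &$buckets, $tableSize, $key, $value) {
>         $h = is_int($key) ? $key : crc32($key); // stand-in for the real hash
>         $slot = $h & ($tableSize - 1);
>         $collisions = 0;
>         foreach ($buckets[$slot] ?? [] as $k => $v) {
>             if ($k === $key) { $buckets[$slot][$key] = $value; return; }
>             if (++$collisions > MAX_COLLISIONS) {
>                 // the real implementation raises a fatal error here
>                 trigger_error('Too many hash collisions', E_USER_ERROR);
>             }
>         }
>         $buckets[$slot][$key] = $value;
>     }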
>
> This is the approach I would recommend for PHP. The patch for this change
> is small and non-intrusive and does not break ABI. From my testing it has
> no adverse performance impact. (In microbenchmarks I found it to even be
> consistently faster, as it uses a more specialized implementation for a
> commonly used type of insertion operation.)
>
> While this is certainly less elegant than the alternatives, this would
> allow us to fix the security issue in a timely manner. Switching to
> SipHash, while maybe a possibility for the future, would still require a
> significant amount of work to be at all feasible.
>
> Nikita
>
