Hi everyone,

A longstanding, documented issue in PHP is that casts between objects and arrays leave numeric keys inaccessible. Specifically, numeric string properties on objects become inaccessible string keys in arrays, and integer keys in arrays become inaccessible integer properties on objects.

This is due to the Zend Engine's internal HashTable structure supporting both integer and string keys, but objects and arrays using it differently. Objects look for string keys, never integer keys, whereas arrays look for string keys only where a string is non-numeric, and integer keys in all other cases. This means that a naïve copy of a HashTable from an array to an object or vice-versa can result in having keys that cannot be accessed (or indeed deleted) from userland.
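To make the mismatch concrete, here is a rough model of the two lookup rules. It is written in Python rather than C purely for brevity, using a dict that, like HashTable, keeps the integer 0 and the string "0" as distinct keys. The function names, the regular expression and the 64-bit range are my own simplifications for illustration, not the engine's actual code.

```python
import re

# Simplified stand-in for the engine's check: canonical decimal integers only,
# so "0" and "-5" qualify but "08", "-0" and "1.5" do not.
_INT_KEY = re.compile(r"(0|-?[1-9][0-9]*)")
INT_MIN, INT_MAX = -2**63, 2**63 - 1  # assumption: 64-bit build


def is_integer_like(s):
    """True if the string would be treated as an integer key by an array."""
    return _INT_KEY.fullmatch(s) is not None and INT_MIN <= int(s) <= INT_MAX


def array_has(table, key):
    """Array-style lookup: integer-like strings are looked up as integers."""
    if isinstance(key, str) and is_integer_like(key):
        key = int(key)
    return key in table


def object_has(table, name):
    """Object-style lookup: the property name is always used as a string."""
    return str(name) in table


# A table built by an array, then handed unchanged to an object:
from_array = {0: "a", "x": "b"}
print(object_has(from_array, 0))    # False: the integer key 0 is stranded
print(object_has(from_array, "x"))  # True

# A table built by an object, then handed unchanged to an array:
from_object = {"0": "a"}
print(array_has(from_object, "0"))  # False: "0" is looked up as integer 0
print(array_has(from_object, 0))    # False
```

In both directions the value is still in the table, but no key that userland can spell will reach it.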

Fixing this issue could be done in one of two ways. The first would be to change how objects (or arrays) perform HashTable lookups internally. This would require extensive code changes, and could hurt performance for object property lookups (performing numeric string coercions is unhelpful given object property names are rarely numeric strings). Because of this, such an approach is unlikely to be pursued.

The other approach is to make the object and array casting operations less naïve, that is, making them convert numeric string keys to integers when converting to an array, and convert integer keys to numeric strings when converting to an object. This is simple to implement as only the code for casting needs changing, and the conversion is not complex.
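As a sketch of what those two conversions amount to, again modelled in Python with a dict standing in for the HashTable (the helper names and the simplified integer-string test are mine, not taken from the patch):

```python
import re

# Simplified test for "numeric string key": canonical decimal integers only.
_INT_KEY = re.compile(r"(0|-?[1-9][0-9]*)")
INT_MIN, INT_MAX = -2**63, 2**63 - 1  # assumption: 64-bit build


def is_integer_like(s):
    return _INT_KEY.fullmatch(s) is not None and INT_MIN <= int(s) <= INT_MAX


def to_array_table(props):
    """Object to array: integer-like string keys become integer keys."""
    result = {}
    for key, value in props.items():
        if isinstance(key, str) and is_integer_like(key):
            key = int(key)
        result[key] = value  # insertion order is preserved
    return result


def to_object_table(elems):
    """Array to object: integer keys become their decimal string form."""
    return {str(k) if isinstance(k, int) else k: v for k, v in elems.items()}


print(to_array_table({"0": "a", "x": "b"}))  # {0: 'a', 'x': 'b'}
print(to_object_table({0: "a", "x": "b"}))   # {'0': 'a', 'x': 'b'}
```

For a well-formed input neither direction can produce a key collision: an object's table holds only string names, and an array never holds both 1 and "1".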

However, this has the danger of significantly reducing the performance of object to array and array to object casts, and this performance concern seems to have been one reason why fixing this issue has not happened so far.

This needn't be the case, though. I have written a patch that fixes this issue and is designed to minimise the performance impact in the common case. You can find the pull request here, along with a link to the simple benchmark I have done: https://github.com/php/php-src/pull/2142

Specifically, it limits the performance impact by first checking whether an object or array contains any numeric string properties or integer keys, respectively. If it does not, it simply uses the normal, naïve copy operation (or performs no copy at all, if possible), which is very fast. On the other hand, if the object or array does contain numeric string properties or integer keys, it will take the slow path and perform the required conversions. This way, the common case (object to array and array to object conversions with only non-numeric properties/keys) remains fast, although slightly (between 7% and 21%, depending on the specific operation) slower than before.
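The shape of that check can be sketched as follows. This is a Python model only, with hypothetical names. Returning the same dict stands in for the cheap path; it does not model how the engine actually shares or duplicates tables.

```python
import re

_INT_KEY = re.compile(r"(0|-?[1-9][0-9]*)")  # simplified integer-string test
INT_MIN, INT_MAX = -2**63, 2**63 - 1         # assumption: 64-bit build


def is_integer_like(key):
    return (isinstance(key, str)
            and _INT_KEY.fullmatch(key) is not None
            and INT_MIN <= int(key) <= INT_MAX)


def cast_to_array(props):
    """Object to array: scan first, convert only if a numeric name exists."""
    if not any(is_integer_like(k) for k in props):
        return props  # fast path: table reused untouched
    # Slow path: rebuild the table with integer keys where needed.
    return {int(k) if is_integer_like(k) else k: v for k, v in props.items()}


def cast_to_object(elems):
    """Array to object: scan first, convert only if an integer key exists."""
    if not any(isinstance(k, int) for k in elems):
        return elems  # fast path: table reused untouched
    # Slow path: rebuild the table with string keys throughout.
    return {str(k) if isinstance(k, int) else k: v for k, v in elems.items()}


plain = {"name": "x", "size": 3}
print(cast_to_array(plain) is plain)          # True: fast path taken
print(cast_to_array({"1": "a", "n": "b"}))    # {1: 'a', 'n': 'b'}
print(cast_to_object({0: "a", 1: "b"}))       # {'0': 'a', '1': 'b'}
```

The extra cost on the fast path is the single scan over the keys, which is where the slowdown in the common case comes from.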

This pull request also affects get_object_vars(), as it has the same issue with inaccessible keys.

I'd appreciate any feedback on this. In particular, I wonder whether it is too late to fix this in PHP 7.1, and whether it should instead go into PHP 7.2. If it were to go into 7.1, it is worth considering that this changes documented, if perhaps unintended, behaviour and is therefore arguably a backwards-compatibility break.

Also, does this need an RFC? It is a bug fix.

Thanks!

--
Andrea Faulds
https://ajf.me/

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
