On 30 Jul 2014, at 07:50, Tjerk Meesters <tjerk.meest...@gmail.com> wrote:

>> That would make sense, but doesn't solve all edge cases as your maximum array
>> index is still more than 2 times the largest positive integer on 32-bit.
> 
> Is that by design, a bug or something else entirely? Could you explain this 
> edge case with some code?

On a 32-bit platform, the maximum signed long is 0x7FFFFFFF, but the maximum 
unsigned long is 0xFFFFFFFF, slightly more than twice as big.
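
As a rough illustration of those two ranges, here is a minimal C sketch. It uses fixed-width types as stand-ins for a 32-bit platform's long and unsigned long, so the names long32, ulong32 and fits_in_signed are made up for this example and are not anything from the engine:

```c
#include <stdint.h>

/* Assumed stand-ins for a 32-bit platform's long and unsigned long. */
typedef int32_t  long32;
typedef uint32_t ulong32;

/* Largest value each type can hold. */
static const long32  LONG32_MAX  = INT32_MAX;   /* 0x7FFFFFFF */
static const ulong32 ULONG32_MAX = UINT32_MAX;  /* 0xFFFFFFFF */

/* Returns 1 if an unsigned index also fits in the signed type, 0 otherwise. */
int fits_in_signed(ulong32 index)
{
    return index <= (ulong32)INT32_MAX;
}
```

Every index from 0x80000000 up to 0xFFFFFFFF is representable as an unsigned long but has no positive signed counterpart, which is where the edge cases come from.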

For example, this does what you’d expect on my machine (OS X 64-bit Intel Core 
i5):

andreas-air:~ ajf$ php -r '$x = [0xFFFFFFFF => 1]; $x[] = 2; var_dump($x);'
array(2) {
  [4294967295]=>
  int(1)
  [4294967296]=>
  int(2)
}

On my 32-bit Ubuntu VM (which I use precisely to test this kind of issue when 
working on bigints), however, it wraps around:

ajf@andrea-VirtualBox:~$ php -r '$x = [0xFFFFFFFF => 1]; $x[] = 2; var_dump($x);'
array(2) {
  [-1]=>
  int(1)
  [0]=>
  int(2)
}
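
The output above is consistent with the index being held in 32 unsigned bits and displayed as a signed value. The following C sketch models that reading only; it is an assumption about the mechanism, not the actual engine code, and as_signed and next_index are hypothetical names:

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a 32-bit unsigned index as a signed value.
 * memcpy copies the bits as-is, avoiding the implementation-defined
 * result of converting an out-of-range value to a signed type. */
int32_t as_signed(uint32_t index)
{
    int32_t out;
    memcpy(&out, &index, sizeof out);
    return out;
}

/* Next automatic index after `last`. The uint32_t return type
 * reduces the result modulo 2^32, so 0xFFFFFFFF wraps to 0. */
uint32_t next_index(uint32_t last)
{
    return last + 1u;
}
```

Under that model, 0xFFFFFFFF shows up as -1 and the next appended element lands on 0, matching the 32-bit dump.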

I think we should probably use an unsigned long internally, but prevent 
negative values.

> Forbidding negative indices is a bit harsh and imho quite unnecessary;

Actually, I missed the bit of your email suggesting treating them as strings 
the first time I read it. I’d be fine with that.

> turning “out of range” indices into strings should work just fine afaict. Is 
> there a reason why it shouldn’t?

Well… there is one issue. Basically, some array functions treat integer and 
string keys completely differently. For example, array_merge() renumbers 
integer keys but preserves string keys.

> A compromise could be to allow string keys that would otherwise have 
> converted into a negative integer, but disallow negative int/float explicitly.

It’d be a complete BC break, but we could make negative indices work like they 
do in Python and grab the item at index (length + index), i.e. in a list of 5, 
-1 returns the item at index 4, -2 the item at index 3, and so on. However, 
because our arrays are weird semi-indexed semi-hashmap things, this probably 
isn’t good, as it’d prevent you from using strings like “-1” as keys. Alas, I 
can dream.
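
For reference, the Python-style lookup amounts to something like the C sketch below. resolve_index is a hypothetical helper written for this example and assumes a plain zero-based list, not PHP's mixed arrays:

```c
#include <stddef.h>

/* Map a Python-style index to a zero-based offset into a list of
 * `length` items. Stores the offset and returns 0 on success;
 * returns -1 if the index is out of range. */
int resolve_index(long index, size_t length, size_t *offset)
{
    if (index >= 0) {
        if ((unsigned long)index >= length)
            return -1;
        *offset = (size_t)index;
        return 0;
    }

    /* Magnitude of a negative index, computed in unsigned arithmetic
     * so that the most negative long does not overflow. */
    unsigned long back = 0UL - (unsigned long)index;
    if (back > length)
        return -1;
    *offset = length - (size_t)back;   /* length + index */
    return 0;
}
```

With a list of 5, an index of -1 resolves to offset 4 and -2 to offset 3, while -6 is rejected.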

To actually respond to your suggestion, I don’t like the idea of blocking -1 
but allowing “-1”. In PHP, numeric strings, integers and floats are supposed to 
be equivalent, and I’m already unhappy that large integer indexes and large 
numeric string indexes work differently. Whatever we do, I’d like PHP 7’s 
arrays to treat integer, float and numeric string indexes consistently.


Thinking about it a little more, if we use a long for indexes, we don’t even 
need to make them strings. It would fit the principle of least astonishment IMO 
if any valid PHP int is a valid index and won’t be a string. I was going to say 
that negative indexes don’t work right internally, but then I realised they 
could work fine for indexing into the buckets if we just cast them to unsigned 
longs internally (hence getting the 2’s complement representation on modern 
CPUs) for indexing and hashing, but only expose signed longs to the outside 
world, including through the API.
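
A minimal C sketch of that idea follows, assuming a table whose size is a power of two. bucket_slot and key_from_hash are names invented for this example, not existing engine functions:

```c
#include <limits.h>
#include <stddef.h>

/* Bucket slot for a signed key. table_size must be a power of two.
 * Converting a long to unsigned long is well defined in C: the value
 * is reduced modulo ULONG_MAX + 1, so -1 becomes ULONG_MAX. */
size_t bucket_slot(long key, size_t table_size)
{
    unsigned long h = (unsigned long)key;
    return (size_t)(h & (unsigned long)(table_size - 1));
}

/* Recover the signed key from the stored unsigned value without
 * relying on an implementation-defined unsigned-to-signed conversion. */
long key_from_hash(unsigned long h)
{
    if (h <= (unsigned long)LONG_MAX)
        return (long)h;
    return -(long)(ULONG_MAX - h) - 1;
}
```

Negative keys then index into the buckets like any other value, and the round trip through key_from_hash hands the original signed key back to callers.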

So in summary, I think we should use signed longs for indexes (or at least 
whatever type PHP’s basic int is), and anything outside of that type’s range 
should be treated as a string. This would make numeric strings and ints 
consistent, would solve all the weird overflow issues, and is the most 
intuitive approach IMO.
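
The rule could be sketched in C roughly as follows. key_as_long is a hypothetical helper, and the canonical-form checks (no leading zeros, no "-0") are an assumption made for this example so that each integer has exactly one string spelling:

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Returns 1 and stores the value if `s` is a canonical decimal
 * integer that fits in a long; returns 0 if the key should stay a
 * string (non-numeric, non-canonical, or out of range). */
int key_as_long(const char *s, long *out)
{
    const char *digits = s;
    char *end;
    long value;

    if (*digits == '-')
        digits++;
    if (!isdigit((unsigned char)*digits))
        return 0;                       /* "", "-", "abc", " 1" */
    if (digits[0] == '0' && digits[1] != '\0')
        return 0;                       /* leading zero, e.g. "007" */
    if (s[0] == '-' && digits[0] == '0')
        return 0;                       /* "-0" */

    errno = 0;
    value = strtol(s, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return 0;                       /* out of range, or trailing junk */

    *out = value;
    return 1;
}
```

Here "-1" and -1 end up as the same integer key, while a numeric string too large for a long stays a string on both 32-bit and 64-bit builds.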

--
Andrea Faulds
http://ajf.me/





--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
