Re: [PHP-DEV] State of Generics and Collections

Bob Weinand Tue, 20 Aug 2024 19:00:45 -0700

Hey Arnauld,

On 20.8.2024 15:01:28, Arnaud Le Blanc wrote:

Hi Bob,


On Tue, Aug 20, 2024 at 12:18 AM Bob Weinand<bobw...@hotmail.com> wrote:

The fluid Arrays section says "A PoC has been implemented, but the performance 
impact is still uncertain". Where may I find that PoC for my curiosity? I'm 
imagining the implementation of the array types as a counted collection of types of the 
entries. But without the PoC I may only guess.

I may publish the PoC at some point, but in the meantime here is a
short description of how it's implemented:

- The zend_array has a zend_type member representing the type of its elements
- Everytime we add or update a member, we union its type with the
array type. For simple types it's just a |= operation. For arrays with
a single class it's also simple. For complex types it's more expensive
currently, but it may be possible to cache transitions to make this
cheaper.
- Updating the array type on deletes requires to either maintain a
counter of every type, or to re-compute the type entirely everytime.
Both are probably too expensive. Instead, we don't update the type on
deletes, but we re-compute the type entirely when a type check fails.
This is based on two hypotheses: 1. A delete rarely changes an array's
type in practice, and 2. Type checks rarely fail


That sounds like a clever way to do it. I like this approach.

- References are treated as mixed, so adding a reference to an array
or taking a reference to an element changes its type to mixed. Passing
an array<mixed> to a more specific array<something> will cause a
re-compute, which also de-refs every reference.

Classifying a reference as mixed certainly makes this work and I guessit's probably an acceptable overhead. References into (big) arrays arenot that common. Short of doing a foreach by-ref, but that's anyway anO(n) operation generally.

- Updating a nested element requires updating the type of every parent

Does it actually? It just requires updating the type of the parent, ifthe own type is actually changed. But types of arrays don't change allthe time, so that's likely an amortized constant time operation withrespect to inserts/updates.

It also says "Another issue is that [...] typed properties may not be 
possible.". Why would that be the case? Essentially a typed property would just be a 
static array, which you describe in the section right below.

It becomes complicated when arrays contain references or nested
arrays. Type constraints must be propagated to nested arrays, but also
removed when an array is not reachable via a typed property anymore.

E.g.

class C {
     public array<array<int>> $prop;
}

$a = &$c->prop[0];
$a[] = 'string'; // must be an error
unset($c->prop[0]);
$a[] = 'string'; // must be accepted

In this case $a will decay from a RC=1 reference to a normal value.During the unreferencing operation the type restrictions can be dropped.That operation is only O(n) if it actually contains other references.


$b = &$c->prop[1];
$b[] = 'string'; // must be an error
$c->prop = [];
$a[] = 'string'; // must be accepted

I don't remember all the possible cases, but I didn't find a way to
support this that didn't involve recursively scanning an array at some
point. IIRC, without references it's less of an issue, so a possible
way forward would be to forbid references to members of typed
properties. Unfortunately this breaks pass-by-reference, e.g.
`sort($c->prop)`. out/inout parameters may be part of a solution, but
with more array separations than pass-by-ref.

Yes, you'll have to scan the array recursively, but only if it containsreferences (which you know thanks to array<mixed> orarray<array<mixed>>). And you also only need to descend into arrayswhich contain references.

If something contains a reference, you just slap a property type onto it- like "foreach entry in array { if entry is reference {add_type_source(inner type of entry) } }" - thus, in case ofarray<array<int>>, you slap array<int> onto it. This operation is onlyO(n) if the array type actually contains references (i.e. it willmismatch due to array<mixed>, and you have to iterate anyway).

So it will just work like references to property types do: these canalso never violate the type containing them. At least in my mind.

I'd also be happy to chat more about it off-list, but possibly easiertoo once the patch is public.

Best Regards,
Arnaud

Overall I would not focus too much on making the case "reference intoarray" too much of a blocker. It should work, but it's fine if it comeswith a couple rough edges regarding performance. I don't think arrayswhere you hold a reference into them are commonly passed around or big.

There are a few edge cases like state machines built with arrayreferences, but the solution to these is ... don't type the propertycontaining it then. And if it really becomes a problem, we may stillinvest time into it after landing it.



Thanks,

Bob

Re: [PHP-DEV] State of Generics and Collections

Reply via email to