On Sun, Jul 8, 2018 at 10:42 AM, Nicolas Grekas <
nicolas.grekas+...@gmail.com> wrote:

> Hi Nikita,
>
>
>> Before talking about solutions, can the people who need this first outline
>> what functionality is needed and what it is needed for (and maybe what
>> workarounds you currently use). E.g. do you only need to know whether
>> something is a reference, or do you need to know whether two somethings are
>> part of the same reference, etc. There are probably multiple use cases for
>> this with different needs.
>>
>
> We're using reference introspection to do both: we need to know when a
> zval is a reference, and we also need to track each of them separately.
>
> The use case is being able to introspect any arbitrary PHP data structure,
> with one main application: providing an enhanced "dump()" function.
>
> See e.g. this screenshot for what we get using the dump() function
> provided by Symfony VarDumper component:
> https://symfony.com/doc/current/_images/07-hard-ref.png
>
> In PHP5 days, Julien Pauli wrote a PHP extension to do zval introspection.
> Here is the code + README (see test case 001.phpt for example with
> references.):
> https://github.com/symfony/symfony/tree/3.4/src/Symfony/Component/Debug/Resources/ext
>
> With PHP7, using pure PHP introspection is easier to maintain and still
> very fast so we deprecated the extension.
> Here is the code doing reference introspection:
> https://github.com/symfony/symfony/blob/master/src/Symfony/Component/VarDumper/Cloner/VarCloner.php#L83
>
> It might not be easy to follow, but the basic building blocks are:
>
> $array2 = $array1;
> $array2[$key] = $unique_cookie;
> if ($array1[$key] === $unique_cookie) => we found a reference
> Then we also maintain a registry of $unique_cookie values so that we know
> whether we have already seen that reference (the check is done before the
> above "if", of course).
>

Thanks for the explanation. I think that the VarCloner use case needs two
bits of functionality:

1. Detecting whether a variable is a reference, so you can handle this
specially.
2. An efficient way of determining whether a variable is part of a
reference that has already been seen (and which one).

The second requirement is stronger than just the ability to detect whether
two variables are part of the same reference. Given just a same_ref($v1,
$v2) function, one would have to check against a list of all previously
seen references one at a time, rather than only performing a hashtable
lookup.
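
To make the cost difference concrete, here is a rough userland sketch. The
same_ref() below is only a stand-in built on the cookie trick (it assumes
untyped variables, since writing the cookie through a typed reference would
throw), and find_seen() is a hypothetical helper name:

```php
<?php
// Userland stand-in for a same_ref() primitive: write a unique cookie
// through $a and check whether it becomes visible through $b.
// Assumes neither variable is bound to a typed property.
function same_ref(&$a, &$b): bool
{
    $saved = $a;
    $a = new stdClass();      // unique cookie, compared by identity
    $same = ($b === $a);
    $a = $saved;              // restore the original value
    return $same;
}

// With only pairwise comparison, locating a previously seen reference
// is a linear scan over everything recorded so far.
function find_seen(array &$seen, &$candidate): ?int
{
    foreach ($seen as $i => &$known) {
        if (same_ref($known, $candidate)) {
            return $i;
        }
    }
    return null;
}
```

Entries would be recorded with $seen[] = &$var. Each lookup is then O(n) in
the number of references seen so far, whereas a stable per-reference key
would allow a single isset() on an array.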

Currently this functionality is implemented as:

1. Copying the array, assigning a cookie to the copy and seeing if the
original array is modified. With an extra catch for TypeErrors, this is
compatible with typed properties.
2. Replacing the reference with a Stub object, which can be looked up by
object id. At the end the Stub objects are replaced with their values
again. This is fundamentally incompatible with typed properties, as the
type will likely not permit the Stub class.
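
For readers who have not looked at VarCloner, step 1 can be sketched roughly
as follows. This is a simplified illustration rather than the actual Symfony
code; array_elem_is_reference() is a hypothetical name, and the key is
assumed to exist in the array:

```php
<?php
// Sketch of the copy-and-cookie check for one array slot.
// Returns true if $array[$key] is a reference shared with something else.
function array_elem_is_reference(array $array, $key): bool
{
    $cookie = new stdClass();   // unique value, compared by identity
    $original = $array[$key];
    $copy = $array;             // shares reference slots with $array
    try {
        $copy[$key] = $cookie;  // writes through if the slot is a reference
    } catch (TypeError $e) {
        // Only a reference bound to a typed property can reject the cookie.
        return true;
    }
    if ($array[$key] === $cookie) {
        $copy[$key] = $original; // undo the write-through
        return true;
    }
    return false;
}
```

A reference that nothing else points to anymore is dropped when the array is
copied, so this check reports singleton references as non-references.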

Here are my thoughts on possible APIs for this use case.

Construction of reference-reflection objects
-----

An issue already discussed in the other threads is that in PHP we need to
specify whether a parameter is accepted by reference, by value or by
preferred-reference. We don't have the possibility of accepting either a
value or reference, whatever we get. This leaves us with a few options:

1. Introducing a VM-level primitive that is not subject to this limitation.
The typed properties thread suggested a reflect_variable() language
construct. I'm not too fond of this option because reference reflection
seems like an awfully specific thing to introduce a new language construct
for.

2. A ReflectionReference::fromVariable(&$var) constructor. Contrary to what
was said in the other thread, this does not cause issues with the
copy-on-write mechanism. Since PHP 7, references and non-references can
share values (including immutablized values in SHM). However, this approach
does have two issues:
a) It is impossible to distinguish whether $var was a singleton reference
or a value beforehand. Both will show up as rc=2 references inside
ReflectionReference::fromVariable(). (This may also be an advantage,
because from a language-design perspective, we treat singleton references
as non-references.)
b) In case the original $var was a plain value, it will now be a reference,
so this has a side-effect.

3. A ReflectionReference::fromArrayElem(array $array, string|int $key)
constructor, as suggested by Nicolas. This avoids the reference/value
problem and solves the specific VarCloner case efficiently and directly. On
the other hand, introspection of references inside non-arrays requires some
workarounds (e.g., casting objects to arrays).

4. A combination of these. For example we could have...
... ReflectionReference::fromArrayElem(array $array, string|int $key) for
array items.
... ReflectionReference::fromObjectProp(object $object, string $key) for
object properties.
... ReflectionReference::fromVariable(&$var) for any other special cases.
This would allow us to cover the common and interesting cases with
specialized methods, and leave a less efficient fallback for the general
case. This is probably the option I'd favor.
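
The side-effect from option 2 (b) can be reproduced in current PHP with any
object that holds on to a by-reference argument. RefHolder below is a
hypothetical stand-in for such an object, not part of the proposal:

```php
<?php
// Hypothetical stand-in for an object built from a by-reference argument.
final class RefHolder
{
    public $ref;

    public function __construct(&$var)
    {
        $this->ref = &$var;   // keeps the reference alive after the call
    }
}

$data = ['a' => 1];
$holder = new RefHolder($data['a']);   // $data['a'] is now a reference

$copy = $data;
$copy['a'] = 2;   // writes through the shared reference

var_dump($data['a']);   // int(2): the original array changed as well
```

Before the constructor call, writing to $copy['a'] would have left $data
untouched. The by-value array and object constructors avoid this.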

Determining whether something is a reference
-----

I think the best way to handle this (and the reason why I used named
constructors above) is to return null if the value is not a reference. This
should be the most common case and it would be best to avoid the overhead
of constructing an unnecessary object in this case.

One important question in this context is whether we consider singleton
references as references or not. If we do, then the
ReflectionReference::fromVariable() constructor will always return a
non-null value, as the variable will be turned into a singleton reference
if it was not already a reference. If we consider them as references, we'll
also want an API method to distinguish them, e.g. a specialized
isSingleton() or, more generally, getNumUsers() == 1.

The alternative would be to always construct a ReflectionReference object
which may or may not be a reference and has an isReference() method. I
don't see any advantages to that approach though.

Reference equality
-----

A couple of approaches:

1. Have an isEqual(ReflectionReference $other): bool method, which
determines whether two references are the same. The disadvantage is that
this only allows pair-wise comparisons, so it does not fully solve the
VarCloner use-case.

2. Make the ReflectionReference constructors uniquing. That is, if a
ReflectionReference for a certain reference already exists, then the
constructor will return the same object. This means that references can be
compared by identity $ref1 === $ref2. It also means that they can be used
in hashtables via spl_object_id(). (Caveat: It's important to keep the
ReflectionReference object alive for the duration in which its
spl_object_id() is used, as usual.)

3. Some variation on 2 via a separate API. That is, don't unique
ReflectionReferences themselves, but provide a separate getId() API. The
returned ID would only be meaningful as long as at least one
ReflectionReference object for the reference is live, otherwise it may be
reused.
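
A small sketch of how approach 2 would be used for lookups, and of the
lifetime caveat. Plain objects stand in for the uniqued ReflectionReference
handles, and SeenRegistry is a hypothetical name; storing the handle itself
in the table is what keeps its spl_object_id() from being reused:

```php
<?php
// Registry keyed by spl_object_id() (available since PHP 7.2).
final class SeenRegistry
{
    /** @var array<int, object> handles kept alive so ids stay unique */
    private $handles = [];

    // Returns true if the handle was already registered, otherwise records it.
    public function seenBefore(object $handle): bool
    {
        $id = spl_object_id($handle);
        if (isset($this->handles[$id])) {
            return true;
        }
        $this->handles[$id] = $handle;
        return false;
    }

    public function count(): int
    {
        return count($this->handles);
    }
}
```

With approach 3 the same table would be keyed by getId() instead, with the
same requirement that some handle for the reference stays alive.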

Actual API
-----

If we go with null return value on non-reference and uniquing, then most of
the functionality is already provided by the constructor. The only useful
API method I can think of is something like getNumUsers().

So, my overall suggestion would be the following API:

class ReflectionReference {
    // Constructors return null if not a reference, object is uniqued
    static function fromArrayElem(array $array, string|int $key): ?ReflectionReference;
    static function fromObjectProp(object $object, string $key): ?ReflectionReference;
    static function fromVariable(&$var): ReflectionReference;

    // Basically the reference count. Would subtract 1 for the
    // fromVariable() constructor, to make the values consistent.
    function getNumUsers(): int;
}

Thoughts?

Nikita
