Re: [PHP-DEV] [RFC] Lazy Objects

Arnaud Le Blanc Thu, 27 Jun 2024 07:29:26 -0700

Hi Marco,

Thank you for the very detailed feedback. Please find my answers below.
Nicolas will follow-up on some points.

On Mon, Jun 24, 2024 at 1:03 AM Marco Pivetta <ocram...@gmail.com> wrote:

>  * lazy proxies should probably explore replacing object identity further
>

We have thought about a few approaches for that, but none is really
concluding:

Fusioning the identity of the proxy and the real instance, such that both
objects are considered to have the same identity with regard to
spl_object_id(), SplObjectStorage, WeakMap, and strict equality after
initialization will lead to weird effects.

For example, we don't know what the behavior of this should be:

```
$reflector = new ReflectionClass(MyClass::class);

$realInstance = new MyClass();
$proxy = $reflector->newLazyProxy(function () use ($realInstance) {
    return $realInstance;
});

$store = new SplObjectStorage();
$store[$realInstance] = 1;
$store[$proxy] = 2;

$reflector->initialize($proxy);

var_dump(count($store)); // 1 or 2 ?
 ```

A second approach, suggested by Benjamin, would be that properties of the
instance returned by the factory are copied to the proxy, the proxy marked
as initialized (so it's not a proxy anymore), and the instance returned by
the factory discarded. One problem of this approach is that the instance
may continue to exist independently of the proxy if it's referenced
somewhere. So, proper usage of the lazy proxy API would require the factory
to be aware of that, and to return an object that is not referenced
anywhere. As we implemented the proxy strategy primarily for use-cases
where we don't control the factory, this approach would not work.

A third approach would be to replace all references to the proxy by
references to the backing instance after initialization. Implementing this
approach would be prohibitively slow as it requires to scan the entire
object graph of the process.

None of these approaches are concluding unfortunately. Having two distinct
identities for the proxy and the real instance doesn't appear to be an
issue in practice, however (e.g. in Symfony).

 * don't like expanding `ReflectionClass` further: `LazyGhost` and
> `LazyProxy` classes (or such) instead
>

The rationale for expanding ReflectionClass and ReflectionProperty is that
code creating lazy objects tend to also use these two classes, for
introspection or to initialize the object. Also, we feel that the new
methods are directly related to existing methods. E.g. newLazyGhost() is
just another variant of newInstance() and newInstanceWithoutConstructor(),
and setRawValueWithoutLazyInitialization() is just another variant of
setValue() and setRawValue() (added in the hooks RFC).

* `initialize()` shouldn't have a boolean behavioral switch parameter: make
> 2 methods
>

Agreed. We have updated the RFC to split the method into `initalize()` and
`markAsInitialized()`.

>  * flags should be a `list<SomeEnumAroundProxies>` instead. A bitmask for
> a new API feels unsafe and anachronistic, given the tiny performance hit.
>

Unfortunately this leads to a 30% slowdown in newLazyGhost() when switching
to an array of enums, in a micro benchmark. I'm not sure how this would
impact a real application, but given this is a performance critical
feature, the slowdown is an issue. Besides that, no existing API in core is
using an array of enums, and we don't want to introduce this concept in
this RFC.

> From an abstraction point of view, lazy objects from this RFC are
> indistinguishable from non-lazy ones

* do the following aspects always apply? I understand they don't for lazy
> proxies, just for ghosts?
>    * `spl_object_id($object) === spl_object_id($proxy)`?
>    * `get_class($object) === get_class($proxy)`?
>

Good catch, this phrase is not true for proxies, with relation to the
identity of a proxy and its real instance. Apart from that, the main point
of this phrase remains: interaction with a proxy has the same behavior as
interaction with a real instance, and they can be used without knowing they
are lazy.

> Proxies: The initializer returns a new instance, and interactions with
> the proxy object are forwarded to this instance
> * another note on naming: I used "value holder" inside ProxyManager,
> because using just "proxy" as a name led to a lot of confusion. This also
> applies to me trying to distinguish proxy types inside the RFC and this
> discussion.
>

The name "Value Holder" appears to be taken for another pattern already:
https://martinfowler.com/eaaCatalog/lazyLoad.html
The exact name of the pattern used in this RFC is the "Virtual
State-Proxy", but others have suggested using just "Proxy" in the RFC. In
the API we use "LazyProxy".

> Internal objects are not supported because their state is usually not
> managed via regular properties.
>
> * sad, but understandable limitation
> * what happens if a class `extends` an internal engine class, like the
> gruesome `ArrayObject`?
>

This also applies to sub-classes of internal objects. This is specified
under ReflectionClass::newLazyGhost(), but I've clarified that here too.

>
> * Given the recent improvements around closures and the `...` syntax (
> https://wiki.php.net/rfc/first_class_callable_syntax), is it worth having
> `Closure` only as argument type?
> * should we declare generic types in the RFC/docs, even if just in the
> stubs?
>     * they would serve as massive documentation improvement for Psalm and
> PHPStan
>     * it would be helpful to document `$initializer` and `$factory` as
> `:void` or `:object` functions
>         * can the engine check that, perhaps? I have no idea if `Closure`
> can provide such information on return types, inside the engine.
>

Could you say more about the benefits of type hinting as Closure instead of
callable? Is this purely to be able to type check the return type earlier?
We may be able to achieve this while retaining the callable type hint, when
a Closure is given, but this would be unusual. Maybe this is something we
can push in a different RFC?

>
> * what happens if `ReflectionClass#reset*()` methods are used on a
> different class instance?
>     * considered?
>

The object must be an instance of the class represented by ReflectionClass
(including of a sub-class)

>
> ```php
> $reflector->getProperty('id')->skipLazyInitialization($post);
> ```
>
> * perfect for partial objects / understanding why this was implemented
>     * would it make sense to have an API to set "bulk" values in an object
> this way, instead of having to do this for each property manually?
>         * avoids instantiating reflection properties / marking individual
> properties manually
>         * perhaps future scope?
>            * thinking `(new
> GhostObject($class))->initializePropertiesTo(['foo' => 'bar', 'baz' =>
> 'tab'])`
>

This is something we thought about but it's not as simple as it seems: we
have to support setting private properties of parent classes, so the
argument has to represent that. A format that would be able to represent
that is as follows:
- The argument is a map of class names to properties
- The properties are a map of property name to property value. To support
skipping a property (and setting it to its default value), numeric keys (no
key specified) denote that the value is a property name to skip.

Example:

class ParentOfA {
    private $c;
    private $d = 1;
}
class A extends ParentOfA {
    private $a;
    private $b;
}

->initializePropertiesTo([
    A::class => ['a' => 'value-a', 'b' => 'value-b'],
    ParentOfA::class => ['c' => 'value-c', 'd'], // 'd' is init to its
default value
])

An alternative is to use the same binary format as
get_mangled_object_vars().

WDYT?

> Initialization Triggers
>
> * really happy to see all these edge cases being considered here!
>     * how much of this new API has been tried against the test suite of
> (for example) `ocramius/proxy-manager`?
>         * mostly asking because there's tons of edge cases noted in there
>

Nicolas has tested the implementation extensively on the VarExporter and
DIC components, and on Doctrine. We would be happy if you could check with
the ocramius/proxy-manager test suite.

>
> > Cloning, unless `__clone()` is implemented and accesses a property.
>
> * how is the initializer of a cloned proxy used?
>     * Is the initializer cloned too?
>

The initializer is not cloned. I've clarified this in the RFC. The
Initialization Sequence section has an example of how an initializer can
detect being called for a clone of the origin lazy object, if necessary:

$init = function ($object) use (&$originalObject) {
    if ($object !== $originalObject) {
        // we are initializing a clone
    }};$originalObject = $reflector->newLazyProxy($init);

>     * what about the object that a lazy proxy forwards state access to? Is
> it cloned too?
>

Cloning an initialized proxy is the same as cloning the real instance (this
clones the real instance and returns it).

> > The following special cases do not trigger initialization of a lazy
> object:
>
> * Will accessing a property via a debugger (such as XDebug) trigger
> initialization here?
>     * asking because debugging proxy initialization often led to problems,
> in the past
>        * sometimes even IDEs crashing, or segfaults
>

Good point, I will check this. The implementation is biased towards
initialization, so accessing a lazy object will usually initialize it.
However the internal APIs used by var_dump and get_mangled_object_vars() do
not trigger initialization, so if a debugger uses that, it shouldn't
trigger initialization.

>
> * this wording is a bit confusing:
>
> > Proxy Objects
> > The actual instance is set to the return value.
>

> * considering the following paragraph:
>
> > The proxy object is _not_ replaced or substituted for the actual
> instance.
>
>
Indeed. "actual instance" is supposed to designate the instance that the
proxy forwards to, but I see why it's confusing. I've renamed "actual
instance" to "real instance". WDYT?

>
> > After initialization, property accesses on the proxy are forwarded to
> the actual instance.
> > Observing properties of the proxy has the same result as observing
> properties of the actual instance.
>
> * This is some sort of "quantum locking" of both objects?
> * How hard is it to break this linkage?
>     * Can properties be `unset()`, for example?
>     * what happens to dynamic properties?
>         * I don't use them myself, and I discourage their usage, but it
> would be OK to just document the expected behavior
>

The linkage can not be broken. unset() and dynamic properties are not
special, in that these are just property accesses. All property accesses on
the proxy are forwarded to the real instance.

> > If the initializer throws, the object properties are reverted to their
> pre-initialization state and the object is
> > marked as lazy again.
>
> * this is some sort of "transactional" behavior
>     * welcome API, but is it worth having this complexity?
>         * is there a performance tradeoff?
>         * is a copy of the original state kept during initializer calls?
>         * OK with it myself, just probing design considerations
>

I believe it's worth having this, as it prevents leaving an object in a
corrupt state, which could then be accessed later, in case of temporary
initializer failure.

The performance overhead and complexity are small. A shallow copy of the
original state is kept during initialization. As the copy is shallow this
is just a few pointer copies and refcount increases.

> * the example uses `setRawValueWithoutLazyInitialization()`, and
> initialization then accesses `public` properties
>     * shouldn't a property that now **has** a value not trigger
> initialization anymore?
>         * or does that require
> `ReflectionProperty#skipLazyInitialization()` calls, for that to work?
>

setRawValueWithoutLazyInitialization() has the same effect as
skipLazyInitialization(), in addition to setting the specified value. I've
clarified this in the RFC.

>
> > `ReflectionClass::SKIP_INITIALIZATION_ON_SERIALIZE`: By default,
> serializing a lazy object triggers its initialization
> > This flag disables that behavior, allowing lazy objects to be serialized
> as empty objects.
>
> * how would one deserialize an empty object into a proxy again?
>     * would this understanding be deferred to the (de-)serializer of
> choice?
>     * exercise for userland?
>

Yes this is left as exercise to userland when this flag is used.

>
> > `ReflectionClass::newLazyProxy()`
> > The factory should return a new object: the actual instance.
>
> * what happens if the user mis-implements the factory as `function (object
> $proxy): object { return $proxy; }`?
>     * this is obviously a mistake on their end, but is it somehow
> preventable?
>

Returning a lazy object (including an initialized proxy) is not allowed and
will throw. I've clarified this in the RFC (the RFC specified that
returning a lazy object was not allowed, but whether this included
initialized proxies was not clear).

>
> > `ReflectionClass::resetAsLazyProxy()`
> > The proxy and the actual instance are distinct objects, with distinct
> identities.
>
> * When creating a lazy proxy, all property accesses are forwarded to a new
> instance
>     * are all property accesses re-bound to the new instance?
>     * are there any leftovers pointing to the old instance anywhere?
>         * thinking dynamic properties and similar
>
>
Yes all property accesses are forwarded to the new instance after that.
There can not be any leftovers to the old instance anywhere, including in
dynamic properties (the resetAsLazy*() methods reset the object entirely).

>
> > ReflectionProperty::setRawValueWithoutLazyInitialization()
> > The method does not call hooks, if any, when setting the property value.
>
> * So far, it has been possible to `unset($object->property)` to force
> `__get` and `__set` to be called
>     * will `setRawValueWithoutLazyInitialization` skip also this "unset
> properties" behavior that is possible in userland?
>         * this is fine, just a documentation detail to note
>         * if it is like that, is it worth renaming the method
> `setValueWithoutCallingHooks` or such?
>             * not important, just noting this opportunity
>
>
Yes, setRawValueWithoutLazyInitialization() skips magic methods and hooks.

The "setRawValue" part of the method name was borrowed from the
ReflectionProperty::setRawValue() method introduced by the hooks RFC, which
is an equivalent to setValue() but doesn't call hooks.

>
> > Destructors
> > The destructor of proxy objects is never called. We rely on the
> destructor of the proxied instance instead.
>
> * raising an edge case here: `spl_object_*` and object identity checks may
> be used inside a destructor
>     * for example, a DB connection de-registering itself from a connection
> pool somewhere, such as `$pool->deRegister($this)`
>         * the connection pool may have the `spl_object_id()` of the proxy,
> not the real instance
>     * this is not a blocker, just an edge case that may require
> documentation
>         * it reconnects with "can we replace the object in-place?"
> question above: replacing objects worth exploring
>

This is an interesting case. This may be an issue if the proxy itself was
registered in the pool. In case the initializer or the constructor
registers the connection, then the real instance will have been
registered.

Best Regards,
Arnaud

>

Re: [PHP-DEV] [RFC] Lazy Objects

Reply via email to