On Mon, Mar 4, 2019 at 9:18 PM Marc <dev@mabe.berlin> wrote:

>
> On 04.03.19 10:08, Nikita Popov wrote:
> > On Mon, Mar 4, 2019 at 6:31 AM Marc <dev@mabe.berlin> wrote:
> >
> >> (sorry for top posting)
> >>
> >> Hi Nikita,
> >>
> >> as for the question about magic methods vs. interface.
> >>
> >> I have the feeling that it could be done with both in a clean way to
> >> make both parties happy.
> >>
> >>
> >> 1. Having __serialize/ __unserialize as magic methods - allowing to
> >> throw exceptions also to explicitly disallow serialization.
> >>
> >> 2. Having some kind of Serializable interface (I know this is already
> >> used) which expects both methods and also disallows throwing exceptions
> >> to force mark an object serializable. It's not possible to define such
> >> thing in the signature, but it can be documented as well as exceptions
> >> here could be catched internally and re-thrown as
> UnexpectedExceptionError.
>
> Any thoughts about this?
>
>
> >>
> >> Another question about static unserialize:
> >>
> >>   > Some people have expressed a desire to make |__unserialize()| a
> >> static method which creates and returns the unserialized object (rather
> >> than first constructing the object and then calling |__unserialize()| to
> >> initialize it).
> >>
> >>   > This would allow an even greater degree of control over the
> >> serialization mechanism, for example it would allow to return an already
> >> existing object from |__unserialize()|.
> >>
> >>   > However, allowing this would once again require immediately calling
> >> |__unserialize()| functions (interleaved with unserialization) to make
> >> the object available for backreferences, which would reintroduce some of
> >> the problems that |Serializable| suffers from. As such, this will not be
> >> supported.
> >>
> >> I would be very happy to also have a solution for this as I have a
> >> library which would benefit from it:
> >>
> >> (
> >>
> https://github.com/marc-mabe/php-enum/blob/v3.1.1/src/EnumSerializableTrait.php#L46-L72
> >> )
> >>
> >> Would it be possible to instead get the new instance as argument and
> >> expect the instance to be returned but on the same time allow to return
> >> a self instantiated instance?
> >>
> >> Something like this:
> >>
> >> public  static function  __unserialize(static $new, array  <
> >> http://www.php.net/array>  $data):  static  {
> >>       $new->prop_a = $data['prop_a'];
> >>       return $new;
> >> }
> >>
> >> // or ignoring the argument and instantiate a new one
> >>
> >> public  static function  __unserialize(static $new, array  <
> >> http://www.php.net/array>  $data):  static  {
> >>       $new =
> (ReflectionClass(__CLASS__))->newInstanceWithoutConstructor();
> >>       $new->prop_a = $data['prop_a'];
> >>       return $new;
> >> }
> >>
> >> The second variant sounds like a wast of resources but if it makes an
> >> edge-case possible it still better then just ignore it.
> >>
> > I would love to have __unserialize() return a new object, I just don't
> > think it's technically possible. Let me explain in more detail what the
> > problem here is:
> >
> > Serialization has a notion of object identity implemented through use of
> > backreferences. In serialize([$obj, $obj]) the first use of the object is
> > serialized as usual, while the second is a backreference to the first. To
> > resolve that backreference, we can either
> >
> > a) construct objects immediately during unserialization, so we can just
> > directly use an existing object when we encounter a backreference, or
> > b) try to keep track of all the places where backreferences are used, so
> we
> > can fill them out later.
> >
> > If __unserialize() is itself responsible for constructing the object,
> then
> > a) means that __unserialize() needs to be called in the middle of
> > unserialization. This is problematic, because the backreference mechanism
> > uses direct pointers into structures, which can easily become invalid if
> > the structures are modified inside the __unserialize() implementation.
> This
> > is why __wakeup() calls are delayed until the end of serialization.
> Variant
> > b) also runs into the same issue. Keeping pointers to all the places that
> > use backreferences works fine until the first __unserialize() call, which
> > might modify structures and invalidate those pointers.
> >
> > Beyond these issues related to implementation details, you have the more
> > general issue of circular references. If you have an object that has a
> > reference to itself, how do you unserialize that? You need the already
> > instantiated object to pass it to __unserialize(), but only
> __unserialize()
> > is going to give you the object! You could do something like pass the
> data
> > with the circular reference nulled out and then patch it after
> > __unserialize() runs, but as you don't really know what happens to the
> data
> > passed to __unserialize(), you wouldn't know what exactly needs to be
> > changed.
> >
> > As such, I think the only way we could have __unserialize() return a new
> > object is if there was also a way to opt-out from the preservation of
> > object identity.
> >
> > Regards,
> > Nikita
>
>
> Thank you for the very detailed explanation!
>
> Do I understand it correctly that it's about self references in the
> serialized object or is that self-reference an internal implementation
> detail of the serializing logic.
>
>
> In case about self references in the serialized object and without any
> knowledge about the internal implementation I can just assume and based
> in that I still thing it should be possible.
>
> -> see https://3v4l.org/AbQ2s
>
> If my assumptions are just stupid please ignore
>

If I understand right, your suggestion here is basically to make
__unserialize() responsible for dealing with self-references by itself: It
it wants to return a new object, then it needs to also take care of
replacing any potential self-references.

This somewhat works, but only if the references are really self-references.
You can also simply have multiple references to the same object without
there being an actual cycle, in which case __unserialize() will not have
access to those references.

An additional issue is that this basically requires the object to be fully
aware of it's contents: You can't have an unspecified $data property that
holds arbitrary values, because there might be a self-reference *somewhere*
in there, but you wouldn't know whether to find it or how to update it.

Nikita


> >
> > On 26.02.19 15:17, Nikita Popov wrote:
> >>> On Tue, Feb 19, 2019 at 11:42 AM Nicolas Grekas <
> >>> nicolas.grekas+...@gmail.com> wrote:
> >>>
> >>>> Le mar. 19 févr. 2019 à 11:31, Nikita Popov <nikita....@gmail.com> a
> >>>> écrit :
> >>>>
> >>>>> On Mon, Feb 18, 2019 at 4:12 PM Nicolas Grekas <
> >>>>> nicolas.grekas+...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi Nikita,
> >>>>>>
> >>>>>> I'd like to propose a new custom object serialization mechanism
> >> intended
> >>>>>>> to replace the broken Serializable interface:
> >>>>>>>
> >>>>>>> https://wiki.php.net/rfc/custom_object_serialization
> >>>>>>>
> >>>>>>> This was already previously discussed in
> >>>>>>> https://externals.io/message/98834, this just brings it into RFC
> >> form.
> >>>>>>> The latest motivation for this is
> >> https://bugs.php.net/bug.php?id=77302,
> >>>>>>> a compatibility issue in 7.3 affecting Symfony, caused by
> >> Serializable. We
> >>>>>>> can't fix Serializable, but we can at least make sure that a
> working
> >>>>>>> alternative exists.
> >>>>>>>
> >>>>>> Is there anything we can to do to help the RFC move forward? I'm
> >>>>>> spending a great deal of time removing Serializable from the Symfony
> >> code
> >>>>>> base. It's not pretty nor fun... but we have no choice since as ppl
> >> move to
> >>>>>> PHP 7.3, they can now spot when the current mechanism is breaking
> >> their
> >>>>>> serialized payloads (typically from user objects put in the session
> >>>>>> storage).
> >>>>>>
> >>>>>> About interface vs magic methods, technicaly, we can work with an
> >>>>>> interface provided by PHP core, so that if that's a blocker for
> >> voters to
> >>>>>> approve the RFC, let's do it - the current situation is really
> aweful
> >> :).
> >>>>>> FYI, I found myself implementing some getState()/setState() methods
> >> of my
> >>>>>> own, decoupled from the underlying mechanisms used by PHP. That
> >> allows me
> >>>>>> to still use Serializable for BC reasons when needed, or  move to
> >> __sleep
> >>>>>> when possible, then move to the new 7.4 style when ready, without
> >> requiring
> >>>>>> any changes from the consumer POV. That's a good illustration of
> what
> >> I
> >>>>>> meant in my previous email: domain-specific interfaces in everyone's
> >> code
> >>>>>> is a better design as it allows better decoupling. It's also a
> better
> >>>>>> design to express that "instances of this interface of mine MUST be
> >>>>>> serializable". IMHO.
> >>>>>>
> >>>>>> Nicolas
> >>>>>>
> >>>>> I think for me the main open question here is whether we want to just
> >>>>> handle the serialization issue or try to come up with something more
> >>>>> general. If we limit this to just serialization, then the design
> should
> >>>>> stay as-is -- for all the reasons you have already outlined, using an
> >>>>> interface for this is inappropriate.
> >>>>>
> >>>>> For a more general mechanism, I think we would need something along
> the
> >>>>> lines of (ignoring naming):
> >>>>>
> >>>>> interface GetState {
> >>>>>       public function getState(string $purpose): array;
> >>>>> }
> >>>>> interface SetState {
> >>>>>       public function setState(array $data, string $purpose): void;
> >>>>> }
> >>>>>
> >>>>> We need separate interfaces for get/set to properly cater to cases
> like
> >>>>> JSON, where only get is meaningful (right now). We need the $purpose
> >>>>> argument to allow different state representations for different
> >> purposes,
> >>>>> e.g. JSON will often need more preparation than PHP serialization.
> The
> >>>>> $purpose argument is a string, to allow extension for custom
> purposes.
> >>>>>
> >>>>> Seeing that, this is really not something I would feel comfortable
> with
> >>>>> introducing in core PHP. Our track record for introducing reusable
> >>>>> general-purpose interfaces is not exactly stellar (say, did you hear
> >> about
> >>>>> SplObserver and SplSubject?) and I wouldn't want to do this without
> >> seeing
> >>>>> the concept fleshed out extensively in userland first.
> >>>>>
> >>>>> Nikita
> >>>>>
> >>>> I think the per-purpose representation logic doesn't belong to each
> >>>> classes.
> >>>> Symfony Serializer does it completly from the outside, and I think
> >> that's
> >>>> a much better design.
> >>>> I'm on the side this should not be in PHP code indeed.
> >>>>
> >>> As there hasn't been further input here, I'd like to go forward with
> the
> >>> RFC as-is and plan to start voting by Friday.
> >>>
> >>> Nikita
> >>>
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>

Reply via email to