Please allow me to up vote for this proposal. I'm working on some serialization logic these days, and `Serializable` is totally broken: it breaks internal references in serialized data structures, and breaks custom serializers (e.g. igbinaryà from inspecting nested structures. The proposal here would just fix all these issues. +1000 from me for what it matters :)
2017-04-21 15:50 GMT+02:00 Nikita Popov <nikita....@gmail.com>: > On Fri, Apr 21, 2017 at 2:47 PM, Michał Brzuchalski < > michal.brzuchal...@gmail.com> wrote: > > > I know my voice is doesn't mean anything but IMHO interface with magic > > methods could bring more inconsistency. > > > > I know PHP is consistently inconsistent but I would prefer if it is > > posdible to fix an issue with present method naming. > > > > Cheers > > > Magic methods have a distinct backwards compatibility advantage. They allow > you to add __serialize/__unserialize to an existing class that currently > uses Serializable. Older PHP versions will then use the Serializable > implementation, while never versions will use __serialize/__unserialize. An > interface makes this a lot more complicated, because you either have to > bump your PHP version requirement (unlikely), or you have to provide a shim > interface for older PHP versions. This shim interface would then be part of > any library currently implementing Serializable, which seems sub-optimal to > me. That's why I think magic methods are better for this case (though I > don't strongly care). > > Also, to answer an OTR question: We cannot simply reuse the Serializable > interface by allowing an array return value from > Serializable::unserialize(). The array return value is only a means to an > end: the important part is that the new serialization mechanism does not > share serialization state -- using arrays instead of strings is just a > convenient way to achieve this. However, Serializable::unserialize() > currently shares the state and we cannot change this without breaking BC -- > so we cannot reuse this interface. > > Nikita > > > > > 21.04.2017 11:39 "Nikita Popov" <nikita....@gmail.com> napisał(a): > > > >> Hi internals, > >> > >> As you are surely aware, serialization in PHP is a big mess. Said mess > is > >> caused by some fundamental issues in the serialization format, and > >> exacerbated by the existence of the Serializable interface. Fixing the > >> serialization format is likely not possible at this point, but we can > >> replace Serializable with a better alternative and I'd like to start a > >> discussion on that. > >> > >> The problem is essentially that Serializable::serialize() is expected to > >> return a string, which is generally obtained by recursively calling > >> serialize() in the Serializable::serialize() implementation. This > >> serialize() call shares state information with the outer serialize(), to > >> ensure that two references to the same object (or the same reference) > will > >> continue referring to a single object/reference after serialization. > >> > >> This causes two big issues: > >> > >> First, the implementation is highly order-dependent. If > >> Serializable::serialize() contains multiple calls to serialize(), then > >> calls to unserialize() have to be repeated **in the same order** in > >> Serializable::unserialize(), otherwise unserialization may fail or be > >> corrupted. In particular this means that using parent::serialize() and > >> parent::unserialize() is unsafe. (See also > >> https://bugs.php.net/bug.php?id=66052 and linked bugs.) > >> > >> Second, the existence of Serializable introduces security issues that we > >> cannot fix. Allowing the execution of PHP code during unserialization is > >> unsafe, and even innocuous looking code is easily exploited. We have > >> recently mitigated __wakeup() based attacks by delaying __wakeup() calls > >> until the end of the unserialization. We cannot do the same for > >> Serializable::unserialize() calls, as their design strictly requires the > >> unserialization context to still be active during the call. Similarly, > >> Serializable prevents an up-front validation pass of the serialized > >> string, > >> as the format used for Serializable objects is user-defined. > >> > >> The delayed __wakeup() mitigation mentioned in the previous point also > >> interacts badly with Serializable, because we have to delay __wakeup() > >> calls to the end of the unserialization, which in particular also > implies > >> that Serializable::unserialize() sees objects prior to wakeup. (See also > >> https://bugs.php.net/bug.php?id=74436.) > >> > >> In the end, everything comes down to the fact that Serializable requires > >> nested serialization calls with context sharing. > >> > >> The alternative mechanism (__sleep + __wakeup) does not have these > issues > >> (anymore), but it is not sufficiently flexible for general use: Notably, > >> __sleep() allows you to limit which properties are serialized, but the > >> properties still have to actually exist on the object. > >> > >> I'd like to propose the addition of a new mechanism which essentially > >> works > >> the same way as Serializable, but uses arrays instead of strings and > does > >> not share context. I'm not sure about the naming (RealSerializable, > >> anyone?), so I'll just go with magic methods __serialize() and > >> __unserialize() for now: > >> > >> public function __serialize() : array; > >> public function __unserialize(array $data) : void; > >> > >> From a userland perspective the implementation should be the same as for > >> Serializable methods, but with interior serialize()/unserialize() calls > >> stripped out. Right now Serializable implementations already usually > work > >> by doing something like "return serialize([ ... ])", this would change > it > >> to just "return [ ... ]" and move the serialize()/unserialize() call > into > >> the engine, where we can perform it safely and robustly. > >> > >> The new methods should reuse the "O" serialization format, rather than > >> introducing a new one. This allows a measure of interoperability with > >> previous PHP versions, which can still decode serialized strings from > >> newer > >> versions using __wakeup(). > >> > >> If an object has both __wakeup() and __unserialize(), then > __unserialize() > >> should be called. If an object implements both > Serializable::unserialize() > >> and __unserialize(), then we should invoke one or the other based on > >> whether "C" or "O" serialization is used. > >> > >> Thoughts? > >> > >> Nikita > >> > > >