Hi internals,

As you are surely aware, serialization in PHP is a big mess. Said mess is caused by some fundamental issues in the serialization format, and exacerbated by the existence of the Serializable interface. Fixing the serialization format is likely not possible at this point, but we can replace Serializable with a better alternative and I'd like to start a discussion on that.
The problem is essentially that Serializable::serialize() is expected to return a string, which is generally obtained by recursively calling serialize() in the Serializable::serialize() implementation. This serialize() call shares state information with the outer serialize(), to ensure that two references to the same object (or the same reference) will continue referring to a single object/reference after serialization. This causes two big issues:

First, the implementation is highly order-dependent. If Serializable::serialize() contains multiple calls to serialize(), then the calls to unserialize() have to be repeated **in the same order** in Serializable::unserialize(), otherwise unserialization may fail or produce corrupted data. In particular this means that using parent::serialize() and parent::unserialize() is unsafe. (See also https://bugs.php.net/bug.php?id=66052 and linked bugs.)

Second, the existence of Serializable introduces security issues that we cannot fix. Allowing the execution of PHP code during unserialization is unsafe, and even innocuous-looking code is easily exploited. We have recently mitigated __wakeup() based attacks by delaying __wakeup() calls until the end of the unserialization. We cannot do the same for Serializable::unserialize() calls, as their design strictly requires the unserialization context to still be active during the call. Similarly, Serializable prevents an up-front validation pass of the serialized string, as the format used for Serializable objects is user-defined.

The delayed __wakeup() mitigation mentioned in the previous point also interacts badly with Serializable: because __wakeup() calls have to be delayed to the end of the unserialization, Serializable::unserialize() sees objects prior to wakeup. (See also https://bugs.php.net/bug.php?id=74436.)

In the end, everything comes down to the fact that Serializable requires nested serialization calls with context sharing.
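To make the "shared state" part concrete, here is a small sketch of what that state buys us. It does not reproduce the Serializable failure itself; it only shows that object identity survives within a single serialize() call and is lost across independent calls, which is why nested calls are forced to share context. The variable names are made up for illustration.

```php
<?php
// One object, referenced twice from the same array.
$shared = new stdClass();
$shared->name = 'shared';

// A single serialize() call tracks the object and emits a
// back-reference for the second occurrence.
$copy = unserialize(serialize([$shared, $shared]));
$sameObject = ($copy[0] === $copy[1]);

// Both slots point at one object, so a write through one is
// visible through the other.
$copy[0]->name = 'changed';
$seenThroughOther = $copy[1]->name;

// Two independent serialize() calls share no state, so identity
// is lost: we get two distinct objects back.
$a = unserialize(serialize($shared));
$b = unserialize(serialize($shared));
$separateObjects = ($a !== $b);
```

A nested serialize() call inside Serializable::serialize() has to behave like the first case, not the second, and that is only possible if it taps into the outer call's bookkeeping.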
The alternative mechanism (__sleep + __wakeup) does not have these issues (anymore), but it is not sufficiently flexible for general use: Notably, __sleep() allows you to limit which properties are serialized, but the properties still have to actually exist on the object.

I'd like to propose the addition of a new mechanism which essentially works the same way as Serializable, but uses arrays instead of strings and does not share context. I'm not sure about the naming (RealSerializable, anyone?), so I'll just go with magic methods __serialize() and __unserialize() for now:

    public function __serialize() : array;
    public function __unserialize(array $data) : void;

From a userland perspective the implementation should be the same as for Serializable methods, but with interior serialize()/unserialize() calls stripped out. Right now Serializable implementations already usually work by doing something like "return serialize([ ... ])", this would change it to just "return [ ... ]" and move the serialize()/unserialize() call into the engine, where we can perform it safely and robustly.

The new methods should reuse the "O" serialization format, rather than introducing a new one. This allows a measure of interoperability with previous PHP versions, which can still decode serialized strings from newer versions using __wakeup().

If an object has both __wakeup() and __unserialize(), then __unserialize() should be called. If an object implements both Serializable::unserialize() and __unserialize(), then we should invoke one or the other based on whether "C" or "O" serialization is used.

Thoughts?

Nikita
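P.S. For illustration, a rough userland sketch of what the proposed methods could look like. It assumes the engine invokes __serialize() and __unserialize() as described above (PHP 7.4 and later provide hooks with these names). The classes Base and Child and their properties are hypothetical.

```php
<?php
// Hypothetical classes, for illustration only.
class Base
{
    private $id;

    public function __construct(int $id)
    {
        $this->id = $id;
    }

    public function getId(): int
    {
        return $this->id;
    }

    // Return plain data; no interior serialize() call.
    public function __serialize(): array
    {
        return ['id' => $this->id];
    }

    public function __unserialize(array $data): void
    {
        $this->id = $data['id'];
    }
}

class Child extends Base
{
    private $tags;

    public function __construct(int $id, array $tags)
    {
        parent::__construct($id);
        $this->tags = $tags;
    }

    public function getTags(): array
    {
        return $this->tags;
    }

    // Nesting the parent's array is just array composition, so
    // there is no call ordering to get wrong.
    public function __serialize(): array
    {
        return ['parent' => parent::__serialize(), 'tags' => $this->tags];
    }

    public function __unserialize(array $data): void
    {
        parent::__unserialize($data['parent']);
        $this->tags = $data['tags'];
    }
}

// The engine performs the actual (un)serialization of the arrays.
$copy = unserialize(serialize(new Child(7, ['a', 'b'])));
```

Note that delegating to the parent is an ordinary method call on arrays here, which is exactly the case that is unsafe with parent::serialize() under Serializable.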