Hi,
Over the past several months there have been various discussions about
PHP 6, backwards compatibility, and what that entails. Seeing as it's
come up again, I've finally written down my thoughts:
Unicode is probably the biggest change, and the one that has the most
repercussions for backwards compatibility. If we maintain the
unicode.semantics switch (which IMO absolutely must not be kept,
regardless of which way becomes the default), then I, and others who
have codebases sensitive to such things, will need to deal with four
different cases in functions/methods:
- Unicode argument, unicode.semantics=Off
- Binary argument, unicode.semantics=Off (and PHP 5)
- Unicode argument, unicode.semantics=On
- Binary argument, unicode.semantics=On
Ending up with four code branches to deal with such things is
ludicrous. I can accept what I'm doing will be broken by Unicode,
necessitating two code branches if it defaults to Off, or three if On
(as I still need the binary/off one for PHP 5), but four is just
insane. Despite the extra code branch, I do very much think we need to
default to On, as there is currently no way to explicitly create a
Unicode string (u'', for the sake of argument) without causing a
compile-time error on PHP 5, except for doing something like
unicode_decode("\x00\x00\xFF\xFD", 'UTF-32'), which gets horrible
quickly (I already do that for cases when unicode.semantics is Off in
some code I have locally, which really isn't fun). Allowing …'' at a
compiler level would be good, IMO, with a fatal error thrown only when
we actually try to parse it (which, if it is in an if statement
dependent on version, would be never). We already have b''. PHP 5.2.1
is pushing it enough for most projects, and adding a u'' to even 5.3
would be a bit too late. Realistically, the only way I can see this
happening is to default to On.
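To make the four cases concrete, here is a rough sketch of how code might work out which one it is in. It assumes unicode.semantics and is_unicode() behave as in the PHP 6 development snapshots; the three wrapper function names are hypothetical, made up for illustration. Both checks are guarded so the file still loads on PHP 5.

```php
<?php
// Sketch only: unicode.semantics and is_unicode() are assumptions based on
// the PHP 6 development snapshots; neither exists in PHP 5.

// Hypothetical helper: true when the unicode.semantics switch is enabled.
// ini_get() returns false for a setting the runtime does not know about,
// so this is always false on PHP 5.
function unicode_semantics_enabled()
{
    $value = ini_get('unicode.semantics');
    return $value !== false
        && in_array(strtolower($value), array('1', 'on', 'yes', 'true'), true);
}

// Hypothetical helper: true when $value is a Unicode string. The
// function_exists() guard avoids a fatal error where is_unicode() is missing.
function is_unicode_string($value)
{
    return function_exists('is_unicode') && is_unicode($value);
}

// Hypothetical helper: names which of the four cases a call falls into,
// e.g. "binary/off" or "unicode/on".
function classify_string_case($value)
{
    $type = is_unicode_string($value) ? 'unicode' : 'binary';
    $mode = unicode_semantics_enabled() ? 'on' : 'off';
    return $type . '/' . $mode;
}

echo classify_string_case('abc'), "\n";
```

On a runtime without the switch this always reports binary/off, which is the PHP 5 branch. Every function that cares about string type would need some check like this while the switch exists, which is exactly why I want it gone.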
Now, this means that in many ways we don't have to care about code
working on anything less than PHP 5.2.1. Making it On by default also
means a fair amount of code will break. The aim should therefore be
that code that currently works on PHP 5.2.1 doesn't need to work
verbatim on PHP 6, but it must be possible to have code work on both
PHP 5.2.1 and PHP 6, using (almost) all the new features of PHP 6,
albeit branching with if statements to keep PHP 5 compatibility (using
things like namespaces would inevitably push that up to PHP 5.3 and
PHP 6).
Given that we don't need stuff to work verbatim, we can do all kinds of
crazy cleanup (beyond the likes of removing magic_*, safe_mode,
register_globals, etc.), like stopping dynamic methods being called
statically, and vice versa (IIRC, this is already in HEAD), as well as
getting rid of deprecated stuff that's been around forever.
Going back to Unicode briefly, there are some special cases regardless
of the default, such as chr(). We need a way to have both binary and
Unicode chr() functions (else we end up going through hell for Unicode,
with something like unicode_encode(chr(42), 'UTF-8')). This matches the
behaviour of chr() on the GNU userland.
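As a rough illustration of the binary/Unicode split, here is a userland sketch of a code-point-aware counterpart to chr(). utf8_chr() is a hypothetical name, not an existing PHP function, and returning UTF-8 bytes as a binary string is my assumption for the sake of something that runs on PHP 5. A real Unicode chr() in PHP 6 would presumably return a Unicode string instead.

```php
<?php
// Hypothetical helper: returns the UTF-8 encoding of a Unicode code point
// as a binary string. Plain chr() remains the single-byte binary version.
function utf8_chr($cp)
{
    // Out-of-range values and surrogates are not valid scalar values;
    // fall back to U+FFFD REPLACEMENT CHARACTER.
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        return "\xEF\xBF\xBD";
    }
    if ($cp < 0x80) {
        // One byte: identical to binary chr() for ASCII.
        return chr($cp);
    }
    if ($cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

echo utf8_chr(42), "\n";                  // *
echo bin2hex(utf8_chr(0xFFFD)), "\n";     // efbfbd
```

For ASCII the two functions agree, so the distinction only matters above 0x7F, where binary chr() gives a single raw byte and the Unicode version gives a character.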
--
Geoffrey Sneddon
<http://gsnedders.com/>