Hi,

Over the past several months there have been various discussions about PHP 6, backwards compatibility, and what that entails. Seeing as the topic has come up again, I've finally written down my thoughts:

Unicode is probably the biggest change, and the one with the most repercussions for backwards compatibility. If we keep the unicode.semantics switch (which, IMO, absolutely must not be kept, regardless of which way becomes the default), then I, and others with codebases sensitive to such things, will need to handle four different cases in functions/methods:
- Unicode argument, unicode.semantics=Off
- Binary argument, unicode.semantics=Off (and PHP 5)
- Unicode argument, unicode.semantics=On
- Binary argument, unicode.semantics=On

Ending up with four code branches to deal with this is ludicrous. I can accept that what I'm doing will be broken by Unicode, necessitating two code branches if it defaults to Off, or three if On (as I still need the binary/Off one for PHP 5), but four is just insane. That said, despite the greater number of code branches, I very much think we need to default to On. There is currently no way to explicitly create a Unicode string (u'', for the sake of argument) without causing a compile-time error on PHP 5. (Allowing …'' at the compiler level would be good, IMO, and just throwing a fatal error when we actually try to parse it, which, if it sits inside an if statement dependent on the version, would be never.) The only alternative is something like unicode_decode("\x00\x00\xFF\xFD", 'UTF-32'), which gets horrible quickly; I already do that for cases when unicode.semantics is Off in some code I have locally, and it really isn't fun. We already have b'', but PHP 5.2.1 is pushing it enough for most projects, and adding u'' even to 5.3 would be a bit too late. Realistically, the only way I can see this happening is to default to On.

In many ways, this means we don't have to care about code working on anything older than PHP 5.2.1. Making it On by default also means a fair amount of code will break, so the aim should be this: code that currently works on PHP 5.2.1 doesn't need to work verbatim on PHP 6, but it must be possible to write code that works on both PHP 5.2.1 and PHP 6 while using (almost) all the new features of PHP 6, albeit branching with if statements to keep PHP 5 compatibility (using things like namespaces would inevitably push that up to PHP 5.3 and PHP 6).

Given that we don't need code to work verbatim, we can do all kinds of crazy cleanup (beyond the likes of removing magic_*, safe_mode, register_globals, etc.), such as stopping dynamic methods from being called statically and vice versa (IIRC, this is already in HEAD), as well as getting rid of deprecated stuff that's been around forever.

Going back to Unicode briefly, there are some special cases regardless of the default, such as chr(): we need a way to have both binary and Unicode chr() functions. Otherwise we end up going through hell for Unicode, and doing something like unicode_encode(chr(42), 'UTF-8') to get a binary string (this matches the behaviour of chr() in the GNU userland).


--
Geoffrey Sneddon
<http://gsnedders.com/>


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php