Hello!

I am thinking that we're doing something wrong with the unicode implementation, namely that we're now getting duplicate implementations of quite a few things: functions, internal functions, hash implementations, two ways of storing identifiers... all only because we need to support both IS_STRING and IS_UNICODE and unicode=off mode.
I think I would prefer an IS_UNICODE/unicode=on only PHP. This would mean that:

- no duplicate functionality for tons of functions, which would otherwise make maintaining the thing very hard
- a cleaner (and a bit faster) Unicode implementation
- we have a bit less BC.

Internally we would only see IS_UNICODE and IS_BINARY, and we can have a small layer around extensions that return IS_STRING, where we automatically convert to and from unicode for those extensions. IS_STRING strings will still exist, but should not be visible at the "user level". For things like:

    $str = unicode_convert($unicode, 'iso-2022');

with $unicode being "IS_UNICODE", $str will now be an IS_BINARY string, with all the restrictions that we already have on those strings (like no automatic conversions).

The set of functions that work on binary strings can be quite limited (we wouldn't need a strtolower for them, for example), so we would be cutting out a lot of duplicated code. The same goes for not having to support both unicode=off and unicode=on mode, as that can make things a bit complicated too. This will limit functionality on binary strings a bit, but I think this is 10 times better than an unmaintainable PHP with Unicode support.

Besides this, I ran some micro benchmarks on about 600 characters of text with a few functions and compared their behavior between unicode=1 and unicode=0 mode. Results:

strrev (100,000 iterations over 600 characters of normalized latin text):
    unicode off:  1.8 secs
    unicode on:   5.0 secs

strtoupper (100,000 iterations over the same text):
    unicode off:  2.2 secs
    unicode on:   7.9 secs

substr(50, 100) (1,000,000 iterations over the same text):
    unicode off:  3.9 secs
    unicode on:  11.9 secs

I find this quite unacceptable, and we need to figure out a way to optimize it. For substr the penalty is probably that we are using an iterator and not a direct memcpy (because of surrogates); I am not so sure about the others.
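To illustrate the substr point: with UTF-16 storage, a character offset cannot be turned into a memory offset by multiplication alone, because a surrogate pair takes two code units. The following is only a rough standalone sketch of that idea, not the actual engine or ICU code; the names u16_forward and u16_substr are made up for this example, and it assumes the output buffer is large enough.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: classify a UTF-16 code unit. */
static int is_lead_surrogate(uint16_t u)  { return u >= 0xD800 && u <= 0xDBFF; }
static int is_trail_surrogate(uint16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

/* Advance from code-unit index i by n code points.
 * Returns the new code-unit index, clamped to len.
 * This walk is the O(n) cost that a plain byte string does not pay. */
static size_t u16_forward(const uint16_t *s, size_t len, size_t i, size_t n)
{
    while (n > 0 && i < len) {
        if (is_lead_surrogate(s[i]) && i + 1 < len && is_trail_surrogate(s[i + 1])) {
            i += 2; /* a well-formed surrogate pair is one code point */
        } else {
            i += 1; /* BMP character (or an unpaired surrogate) */
        }
        n--;
    }
    return i;
}

/* Copy `count` code points starting at code point `start` into `out`.
 * Returns the number of code units copied. `out` must have room for
 * at least `len` code units (assumption of this sketch). */
static size_t u16_substr(const uint16_t *s, size_t len,
                         size_t start, size_t count, uint16_t *out)
{
    size_t from = u16_forward(s, len, 0, start);   /* iterate to find the start */
    size_t to   = u16_forward(s, len, from, count); /* iterate to find the end */
    memcpy(out, s + from, (to - from) * sizeof(uint16_t)); /* only now can we memcpy */
    return to - from;
}
```

The memcpy itself is cheap; the cost is in the two walks that locate its boundaries. A possible optimization (again, just an idea) would be to remember whether a string contains any surrogates at all and to index directly when it does not.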
regards,
Derick

-- 
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org

-- 
PHP Internals - PHP Runtime Development Mailing List