Stanislav Malyshev wrote:
Hi!

What I am probably asking is what was the brick wall PHP6 hit. I was
under the impression that there was no agreement on 'switchable or only'
to unicode core? ( And those who did write PHP6 books seemed to have
their own views on which way the discussions would go ;) ).

From what I can see, the biggest issues are these:
1. Performance - Unicode-based PHP right now requires tons of
conversions when talking to outside world (like MySQL) which slows down
the app significantly. Many extensions frequently used by PHP app
writers (such as mysql, pcre, etc.) do not support UTF-16 properly.
Also, inflated memory usage hurts scalability a lot.
2. Compatibility - it's hard to make an existing app work with Unicode
without losing performance or hitting weird scenarios where
your passwords suddenly stop working because there's an extra recoding
step in some md5() call.
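
The password scenario in point 2 can be illustrated with a minimal sketch. It assumes the hash was originally computed over UTF-8 bytes and that a recoding step later hands UTF-16 bytes to the hash function; the password value is hypothetical and Python stands in for PHP here.

```python
import hashlib

# Hypothetical password containing one non-ASCII character.
password = "p\u00e4ssword"

# The same text, encoded two different ways before hashing.
utf8_digest = hashlib.md5(password.encode("utf-8")).hexdigest()
utf16_digest = hashlib.md5(password.encode("utf-16-le")).hexdigest()

# A digest stored from the UTF-8 bytes no longer matches once the
# input has been recoded, even though the text itself is unchanged.
digests_match = utf8_digest == utf16_digest
print(digests_match)
```

The hash sees bytes, not characters, so any silent change of encoding between the form input and the hash call changes the result.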

I think that there does need to be a proper review of just what the target is?

There are a number of 'unknowns', such as how one identifies the version of Unicode being used. Differences seem to exist between OSs, which doesn't help with that problem?

On-disk storage should probably be UTF-8 without any question? Windows' use of wide strings for some files simply doubles the on-disk storage requirements for very little gain? And remembering to convert '.reg' files back to normal raw text so I can read them on the Linux machines adds to the fun.
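
As a rough check on the 'doubles up' point, here is a small sketch comparing encoded sizes for text that is entirely ASCII. The registry key name is made up for illustration, and the byte order mark that Windows tools usually write is ignored.

```python
# Mostly-ASCII text of the kind found in a '.reg' export.
# The key name "Example" is hypothetical.
text = (
    "Windows Registry Editor Version 5.00\r\n"
    "[HKEY_CURRENT_USER\\Software\\Example]\r\n"
)

utf8_size = len(text.encode("utf-8"))
utf16_size = len(text.encode("utf-16-le"))

# For pure ASCII content every character is 1 byte in UTF-8
# and 2 bytes in UTF-16.
print(utf8_size, utf16_size)
```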

In-memory handling of character strings is, I think, where some alternative methods may be appropriate. Firebird's original UNICODE_FSS collation was 3 bytes per character (enough for everything in the Basic Multilingual Plane, although UTF-8 needs 4 bytes for characters beyond it ;) ) and so all of the character counting stuff works transparently. Firebird records are automatically compressed before storage, so white space in character strings is not wasting space on disk, and the unicode collations get compressed in the same way.

'3' is not a very processor-friendly number, so working with 4, even though wasteful of memory, does make perfect sense. How long is it since we had a 640k limit on working memory? Servers should have a good amount of memory for caching information anyway. So is UTF-16 the right approach for processing wide strings? It needs special code to handle everything wider than 16 bits, but for what gain really? If all core functionality is handled as 32-bit characters, is there that much overhead compared with the additional processing needed to get around characters of dissimilar sizes in UTF-16?
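
To make the trade-off concrete, here is a minimal sketch counting code units per character in each form. The helper names are made up for illustration; the two sample characters are U+00E9 (inside the 16-bit range) and U+1D11E (outside it).

```python
def utf16_units(s):
    """Number of 16-bit code units needed to store s as UTF-16."""
    return len(s.encode("utf-16-le")) // 2


def utf32_units(s):
    """Number of 32-bit code units needed to store s as UTF-32."""
    return len(s.encode("utf-32-le")) // 4


bmp_char = "\u00e9"         # fits in a single 16-bit unit
astral_char = "\U0001D11E"  # needs a surrogate pair in UTF-16

# UTF-16 is variable width: 1 unit or 2 units per character.
print(utf16_units(bmp_char), utf16_units(astral_char))

# UTF-32 is fixed width: always 1 unit per character, so the
# character at index n sits at byte offset 4 * n.
print(utf32_units(bmp_char), utf32_units(astral_char))

# The price is memory: plain ASCII text takes 4 bytes per character.
print(len("hello".encode("utf-16-le")), len("hello".encode("utf-32-le")))
```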

Most of my own data handling is done via the database anyway, so queries return data already sorted and filtered. There is no point pulling unprocessed data and then throwing much of it away, hence the rest of the infrastructure being used is important to get the best performance?

Probably 90% of the time a string will come in and go out without requiring any processing at all, so leave it as UTF-8? The only time we need to accurately know the number and position of characters is when we need to do some string processing, and then only if the strings use multibyte characters. So how about an additional couple of flags on a string variable? When a UTF-8 string is loaded, it is counted for bytes, for characters, and for the maximum number of bytes per character. If bytes and characters are the same ... no problems. If the number of bytes per character is greater than 1, then string handling needs to 'open them up' before processing, and '2' just uses an efficient UTF-16 processing, while '3+' goes to 32-bit processing?
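
The flag idea could be sketched roughly as follows. This is only an illustration in Python, not anything from the PHP source; scan_utf8() and pick_strategy() are hypothetical names, and the strategy labels are placeholders.

```python
def scan_utf8(data):
    """Return (byte_count, char_count, max_bytes_per_char) for UTF-8 data.

    Raises UnicodeDecodeError if data is not valid UTF-8.
    """
    text = data.decode("utf-8")
    widths = [len(ch.encode("utf-8")) for ch in text]
    return len(data), len(text), max(widths, default=0)


def pick_strategy(data):
    """Choose a processing width from the counts, as proposed above."""
    byte_count, char_count, max_width = scan_utf8(data)
    if byte_count == char_count:
        return "bytes"   # pure single-byte text, process in place
    if max_width == 2:
        return "utf-16"  # widest character is 2 bytes in UTF-8
    return "utf-32"      # 3 or more bytes per character


print(pick_strategy(b"hello"))
print(pick_strategy("caf\u00e9".encode("utf-8")))
print(pick_strategy("\u65e5\u672c".encode("utf-8")))
```

One detail worth noting: characters that take 3 bytes in UTF-8 still fit in a single 16-bit unit, so the '3+' cut-off here simply follows the proposal as written rather than the smallest width that would work.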

Am I missing something? Why does unicode have to complicate things when in reality they are quite simple? Legacy stuff gets converted to UTF-8 and in many cases the user will not even see a difference, but the 'unicode on/off' switch just allows 127 single byte characters rather than 255 ? Currently all the multilingual stuff IS passing through PHP transparently and it would seem we can use unicode for variable names? So what IS missing?

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
