Chen Ze wrote:
On Sat, Mar 13, 2010 at 2:34 AM, Derick Rethans<der...@php.net>  wrote:
On Fri, 12 Mar 2010, Hannes Magnusson wrote:

On Fri, Mar 12, 2010 at 17:38, Moriyoshi Koizumi<m...@mozo.jp>  wrote:
I'd love to see my brand-new mbstring implementation in the release.
Dropping mbstring completely won't be any good because lots of
applications rely on it, but I don't really want to maintain the funky
library bundled with it.

That's actually one of the ideas we had on IRC.
That mbstring patch and more ext/intl features should be enough to
solve "the unicode problem".

Sorry, but that is not true. intl and mbstring can provide functions
for manipulating UTF-8 strings, but they cannot provide
proper Unicode support. Proper Unicode support is *not* just
dealing with UTF-8 strings. Proper Unicode support includes dealing with
file streams, with different encodings, with localization, with sorting,
with locales, with formatting numbers. Offloading this to extensions
makes Unicode support an add-on hack, and not a language feature. I am
not saying that intl and mbstring aren't *useful*, but they definitely
do not solve "the unicode problem".


I think Unicode support should only be concerned with string handling.
Number formatting is not something Unicode should care about. Unicode is a
standard for text, not for text or number formatting.

Number formatting already existed in the days before Unicode.
It even existed before computers were invented.

The same goes for sorting.

When we think about Unicode, we should think about the things that are
really related to Unicode, like the file system. Number formatting and
sorting are separate concerns that intl takes care of.
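
To make that split concrete, here is a minimal sketch of the sorting and
number formatting that ext/intl already covers. It assumes the intl
extension is loaded; the locale and the sample values are only
illustrative and do not come from this thread.

```php
<?php
// Assumes ext/intl is loaded; locale and sample data are illustrative only.
$words = ['zebra', 'Äpfel', 'apple'];

// Byte-wise sort: "Äpfel" ends up last because its first UTF-8 byte (0xC3)
// is larger than any ASCII letter.
$bytewise = $words;
sort($bytewise, SORT_STRING);

// Locale-aware sort through ICU collation rules for German.
$collated = $words;
$collator = new Collator('de_DE');
$collator->sort($collated);

// Locale-aware number formatting (grouping and decimal separators
// are chosen by the locale, not by the string encoding).
$formatter = new NumberFormatter('de_DE', NumberFormatter::DECIMAL);
$formatted = $formatter->format(1234567.891);
```

Neither call depends on how the engine represents strings internally,
which is the point being argued here.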

For Unicode itself, I think we should implement something like:

$chars = new mchar($bytes, $bytes_encoding);
echo $chars; // printed in the output encoding
foreach ($chars as $char) {
    // output a single UTF-16/UTF-8 char (depends on the default output encoding)
    echo $char;
}
echo $chars->bytes('gbk');

$chars->outputEncoding('gbk');
echo $chars;

ini_set('mchar_output_encoding','gbk');
echo $chars;

ini_set('mchar_filesystem_encoding','gbk');
echo $chars->filepath();
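
As a rough sketch of how such a class might sit on top of today's
ext/mbstring: the class name MChar, the UTF-8 internal storage and the
code-point iteration are all assumptions made for illustration, not an
existing PHP API. The two ini settings and filepath() from the proposal
are left out.

```php
<?php
// Hypothetical sketch only: "MChar" mirrors the proposed mchar class and
// does not exist in PHP. Requires ext/mbstring and PCRE with UTF-8 support.
final class MChar implements IteratorAggregate
{
    private string $utf8;            // text is held internally as UTF-8
    private string $outputEncoding;  // encoding used by echo and iteration

    public function __construct(string $bytes, string $bytesEncoding, string $outputEncoding = 'UTF-8')
    {
        $this->utf8 = mb_convert_encoding($bytes, 'UTF-8', $bytesEncoding);
        $this->outputEncoding = $outputEncoding;
    }

    // Return the text as raw bytes in the requested encoding.
    public function bytes(string $encoding): string
    {
        return mb_convert_encoding($this->utf8, $encoding, 'UTF-8');
    }

    // Change the encoding used by __toString() and by iteration.
    public function outputEncoding(string $encoding): void
    {
        $this->outputEncoding = $encoding;
    }

    // Iterate code point by code point (not grapheme cluster by cluster).
    public function getIterator(): Traversable
    {
        $codePoints = preg_split('//u', $this->utf8, -1, PREG_SPLIT_NO_EMPTY) ?: [];
        foreach ($codePoints as $codePoint) {
            yield mb_convert_encoding($codePoint, $this->outputEncoding, 'UTF-8');
        }
    }

    public function __toString(): string
    {
        return $this->bytes($this->outputEncoding);
    }
}

// Usage: two Chinese characters supplied as GBK bytes, echoed as UTF-8.
$chars = new MChar("\xD6\xD0\xCE\xC4", 'GBK');
echo $chars, "\n";
```

The sketch only converts bytes at the edges; it does nothing for
identifiers, file names or streams.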

I think this probably highlights the fundamental difference of opinion on
Unicode?

Handling Unicode CONTENT is not the problem here. People nowadays expect to be able to write code in their own language and to name functions with words they recognize. In databases, table and field names are now expected to support Unicode, rather than Unicode data merely being pumped into ASCII-named fields.

Personally I'm quite happy with just using ASCII names for things, but more and more overseas customers provide contact details in 'strange' character sets that only Unicode can handle, and handling THAT in PHP5 is not a problem. The problem starts when people build databases with Unicode metadata and expect the tools interfacing with them to understand Unicode as well.

It was my understanding that PHP6 was intended to provide international users with something that they could use in their own native language? That means Unicode-titled files with Unicode-titled classes and functions.

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
