Abdelrazak Younes <[EMAIL PROTECTED]> writes:

| > UTF-8 is a multi-byte encoding. It's useful for output to file
| > because the data are stored as characters (bytes). So, much of a
| > UTF-8 encoded file will be human readable; only the multi-byte
| > sequences will not.
| > Storing UTF-8 encoded data in a std::vector<char> (or uchar) is
| > eminently sensible because you're telling your users that the
| > container is just that; a container. You don't plan on using it for
| > anything other than storage and transport from one part of the code
| > to another. In particular, you certainly don't plan on using it to
| > perform string manipulations.
| > UCS-4 encodes all characters in the known universe, just as UTF-8
| > does, but each and every character takes up 4 bytes. It's reasonable
| > to use a 32-bit unsigned int to store each character. The advantage
| > of UCS-4 is that all characters take up the same space, so
| > std::basic_string<boost::uint32_t>::length() is actually meaningful.
| 
| exactly my point ;-)
| Switching from vector to basic_string for UCS-2 and UCS-4 will simplify the code.

But that does not mean that the unicode.[Ch] API should change.
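
For reference, a minimal sketch of the length() point in the quoted text.
It is only an illustration under stated assumptions: std::u32string
(C++11) stands in for the basic_string<boost::uint32_t> mentioned above,
and utf8_code_points() and demo() are hypothetical helpers, not part of
unicode.[Ch]. The helper assumes well-formed UTF-8 and does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: counts code points in well-formed UTF-8 by
// skipping continuation bytes, which all have the bit pattern 10xxxxxx.
std::size_t utf8_code_points(const std::vector<char>& bytes) {
    std::size_t count = 0;
    for (char c : bytes) {
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Hypothetical demo: the same five-character word in both encodings.
void demo() {
    // "na\u00EFve": U+00EF is encoded as the two bytes 0xC3 0xAF in UTF-8.
    const std::vector<char> utf8 = {'n', 'a', '\xC3', '\xAF', 'v', 'e'};
    // One 32-bit code unit per character.
    const std::u32string ucs4 = {U'n', U'a', char32_t(0x00EF), U'v', U'e'};

    assert(utf8.size() == 6);              // bytes, not characters
    assert(utf8_code_points(utf8) == 5);   // needs a scan to find out
    assert(ucs4.length() == 5);            // length() is the character count
}
```

The byte container reports its storage size, so the character count has
to be computed by scanning; the 32-bit string reports it directly.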

-- 
        Lgb
