----- Original Message -----
From: "Graeme Geldenhuys" <[EMAIL PROTECTED]>
To: "FPC-Pascal users discussions" <fpc-pascal@lists.freepascal.org>
Sent: Saturday, May 19, 2007 11:58 AM
Subject: Re: [fpc-pascal] UTF-8 versions of Copy() and Length()


On 5/19/07, Daniël Mantione <[EMAIL PROTECTED]> wrote:
> Does FPC have UTF-8 versions of the Copy() and Length() functions?

They don't exist. FPC has been designed to use either the system encoding
(which can be UTF-8), in which case the string routines from sysutils do
what you want, or widestrings: length(utf8decode(s)) will return the
length of a UTF-8 string.
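
For illustration, a minimal sketch of the length(utf8decode(s)) approach. It assumes the RTL's UTF8Encode/UTF8Decode from the System unit; the program name and the sample word are made up for the example. The test string is built from a code point value so that the result does not depend on the encoding of the source file.

```pascal
program Utf8LengthDemo;
{$mode objfpc}{$H+}

var
  w: WideString;
  s: UTF8String;
begin
  // Build "école": U+00E9 followed by four ASCII letters.
  w := 'cole';
  w := WideChar($E9) + w;

  // Encode to UTF-8; the bytes are C3 A9 63 6F 6C 65.
  s := UTF8Encode(w);

  WriteLn(Length(s));              // 6: Length on the UTF-8 string counts bytes
  WriteLn(Length(UTF8Decode(s)));  // 5: Length after decoding counts UTF-16 code units
end.
```

Note that the second figure is a count of UTF-16 code units, which matches the number of characters only as long as the text contains no surrogate pairs.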

Sorry, I'm very new to Unicode support.  Wouldn't it be useful to have
UTF-8 and UTF-16 (and all the other encodings) functions in FPC?  For
example the Lazarus LCL (LCLProc unit) has loads of such functions.


You can find information on the Unicode standard at http://www.unicode.org/versions/Unicode4.0.0/. The Unicode encoding forms are presented in http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf.

It is not useful to have functions for both encodings, because the encodings are interconvertible and it is more efficient to use UTF-16 for data processing. UTF-8 is really suitable only for storing external data, because it is more compact. It expresses characters outside ASCII as sequences of 8-bit code units (2 or 3 for characters in the Basic Multilingual Plane, 4 beyond it), while UTF-16 expresses them as a single 16-bit code unit in almost all cases (characters outside the Basic Multilingual Plane need a surrogate pair). Thus processing internal data (iterating, counting, etc.) is easier and more efficient in UTF-16.
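
To make the size difference concrete, here is a small sketch that prints how many 8-bit units and how many 16-bit units a few sample characters need. The Show procedure and the program name are hypothetical and exist only for this example; UTF8Encode is the RTL routine from the System unit.

```pascal
program CodeUnitDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: print the encoded size of one BMP character.
procedure Show(const Name: string; CodePoint: Word);
var
  w: WideString;
begin
  w := WideChar(CodePoint);
  WriteLn(Name, ': ',
          Length(UTF8Encode(w)), ' UTF-8 byte(s), ',
          Length(w), ' UTF-16 code unit(s)');
end;

begin
  Show('U+0041', $0041);  // Latin capital A: 1 byte, 1 code unit
  Show('U+00E9', $00E9);  // e with acute:    2 bytes, 1 code unit
  Show('U+20AC', $20AC);  // euro sign:       3 bytes, 1 code unit
end.
```

All three samples are in the Basic Multilingual Plane, so each one is a single UTF-16 code unit while its UTF-8 size varies from 1 to 3 bytes.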




The Length function is easy to get around, but the Copy, Pos, etc.
functions are not.
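
One possible workaround for Copy and Pos, sketched under the assumption that a round trip through a widestring is acceptable: decode, call the ordinary routine, and encode the result again. Utf8CopyViaWide and Utf8PosViaWide are hypothetical names, not RTL or LCL functions, and their indexes are UTF-16 code units, so they do not handle surrogate pairs specially.

```pascal
program Utf8CopyDemo;
{$mode objfpc}{$H+}

// Hypothetical helper, not part of the RTL: Copy() with Index and Count
// measured in UTF-16 code units instead of bytes.
function Utf8CopyViaWide(const S: UTF8String; Index, Count: Integer): UTF8String;
begin
  Result := UTF8Encode(Copy(UTF8Decode(S), Index, Count));
end;

// Hypothetical helper: Pos() that returns a code-unit index
// rather than a byte offset.
function Utf8PosViaWide(const SubStr, S: UTF8String): Integer;
begin
  Result := Pos(UTF8Decode(SubStr), UTF8Decode(S));
end;

var
  w: WideString;
  s: UTF8String;
begin
  // Build "école" from a code point so the source encoding does not matter.
  w := 'cole';
  w := WideChar($E9) + w;
  s := UTF8Encode(w);

  WriteLn(Length(Utf8CopyViaWide(s, 1, 2)));  // 3: the first two characters take 3 bytes
  WriteLn(Utf8PosViaWide('cole', s));         // 2: index counted in code units
  WriteLn(Pos('cole', s));                    // 3: plain Pos returns a byte offset
end.
```

The cost is two conversions per call, which is one reason to keep the data as a widestring for the whole processing step rather than converting back and forth.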



--
Graeme Geldenhuys

General error, hit any user to continue.
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal


