Re: [fpc-pascal] UTF-8 versions of Copy() and Length()

Daniël Mantione Sat, 19 May 2007 03:50:53 -0700


On Sat, 19 May 2007, Felipe Monteiro de Carvalho wrote:


> On 5/19/07, Rimgaudas Laucius <[EMAIL PROTECTED]> wrote:
> > It is not useful to have functions for both encodings, because these
> > encodings are interconvertible and it is more efficient to use UTF-16 for
> > data processing
> 
> I disagree. The conversion impacts performance heavily. It will also
> require memory to store the converted string, and after you perform an
> operation you need to convert back.
>
> Further, UTF-16 contains both 2-byte characters and 4-byte characters,
> so I don't see how it would be any faster to process than a UTF-8
> string.

For most operations, it is not necessary to process characters outside 
the BMP (Basic Multilingual Plane) separately. For example:

for i:=1 to length(s) do
  s[i]:=upcase(s[i]);

... is valid UTF-16 code, and much faster than the same operation in 
UTF-8.
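
To make the point concrete, here is a minimal sketch of that loop as a 
complete program. It is only an illustration under stated assumptions: 
UpCaseAscii is a hypothetical helper (not an RTL routine) that uppercases 
'a'..'z' only, and WideString stands in for a UTF-16 string. The two 
surrogate code units encoding U+1D11E fall outside the range the helper 
touches, so the loop never needs to treat the pair as one character.

```pascal
program Utf16UpcaseSketch;

{$mode objfpc}{$H+}

{ Hypothetical helper, not part of the RTL: uppercases 'a'..'z' and
  returns every other UTF-16 code unit, surrogates included, unchanged. }
function UpCaseAscii(c: WideChar): WideChar;
begin
  if (Ord(c) >= Ord('a')) and (Ord(c) <= Ord('z')) then
    Result := WideChar(Ord(c) - 32)
  else
    Result := c;
end;

var
  s: WideString;
  i: Integer;
begin
  { Build 'a', U+1D11E (surrogate pair $D834 $DD1E), 'b' by code unit. }
  SetLength(s, 4);
  s[1] := 'a';
  s[2] := WideChar($D834);
  s[3] := WideChar($DD1E);
  s[4] := 'b';

  { Process one 16-bit code unit at a time, as in the loop above. }
  for i := 1 to Length(s) do
    s[i] := UpCaseAscii(s[i]);

  { Print each code unit in hex; the surrogate units are unmodified. }
  for i := 1 to Length(s) do
    WriteLn(HexStr(Ord(s[i]), 4));
end.
```

The same per-element loop over a UTF-8 string would have to recognise 
multi-byte sequences for every non-ASCII letter it wants to change, which 
is where the extra cost comes from.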

Daniël

