Re: [fpc-pascal] UTF-8 versions of Copy() and Length()

Rimgaudas Laucius Sat, 19 May 2007 03:31:45 -0700

Storage:
UTF8 < UTF16 only for most Latin scripts;
for all other scripts (Chinese, Greek, Cyrillic, Arabic, Indic, ...)
UTF8 >= UTF16.
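
A rough way to check the storage comparison is to encode a few short samples both ways and count bytes. This is only a sketch in Python (not Pascal); the sample strings and the helper name encoded_sizes are made up for illustration. Per letter, Greek and Cyrillic take 2 bytes in both encodings, so the larger gap shows up for scripts such as Chinese or Devanagari, which need 3 bytes per character in UTF-8.

```python
# Illustrative samples only; real text also contains ASCII spaces and
# punctuation, which shifts the totals somewhat in favour of UTF-8.
samples = {
    "Latin": "Hello",
    "Greek": "αβγδε",
    "Cyrillic": "привет",
    "Chinese": "你好世界",
    "Devanagari": "नमस्ते",
}


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, without a byte-order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for name, text in samples.items():
    utf8_size, utf16_size = encoded_sizes(text)
    print(f"{name}: UTF-8 {utf8_size} bytes, UTF-16 {utf16_size} bytes")
```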


Performance:
cost of Length(UTF8) = cost of converting UTF8->UTF16
2*Length(UTF8) > UTF8->UTF16
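
The reasoning behind this comparison is that both operations are a single pass over the bytes. A minimal sketch in Python, assuming valid UTF-8 input; the names utf8_length and utf16_units_needed are hypothetical and are not the FPC functions under discussion:

```python
def utf8_length(data: bytes) -> int:
    """Count code points by counting bytes that are not continuation bytes."""
    count = 0
    for b in data:
        if (b & 0xC0) != 0x80:  # continuation bytes look like 10xxxxxx
            count += 1
    return count


def utf16_units_needed(data: bytes) -> int:
    """Count the 16-bit units a UTF-8 to UTF-16 conversion would produce."""
    units = 0
    for b in data:
        if (b & 0xC0) == 0x80:
            continue  # continuation byte, belongs to the previous lead byte
        # A 4-byte sequence (lead byte 0xF0 or above) becomes a surrogate pair.
        units += 2 if b >= 0xF0 else 1
    return units
```

Both loops touch every byte once, so calling a UTF-8 Length twice costs about as much scanning as two conversions would. This says nothing about the memory a real conversion also has to allocate.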

4-byte characters are used by UTF32. UTF16 uses sequences of 2 code points from the surrogates area to express characters outside the Basic Multilingual Plane, which are very rarely used (actually I do not know of any program that implements that).
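
For reference, the surrogate mechanism is simple arithmetic: subtract 0x10000 from the code point and split the remaining 20 bits into two 10-bit halves. A short Python sketch; the function name to_surrogate_pair is made up for illustration:

```python
def to_surrogate_pair(code_point: int):
    """Split a code point above U+FFFF into a (high, low) UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the Basic Multilingual Plane")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


# U+1F600 encodes as the pair D83D DE00.
print([hex(unit) for unit in to_surrogate_pair(0x1F600)])
```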

----- Original Message -----
From: "Felipe Monteiro de Carvalho" <[EMAIL PROTECTED]>
To: "FPC-Pascal users discussions" <fpc-pascal@lists.freepascal.org>
Sent: Saturday, May 19, 2007 12:57 PM
Subject: Re: [fpc-pascal] UTF-8 versions of Copy() and Length()

On 5/19/07, Rimgaudas Laucius <[EMAIL PROTECTED]> wrote:

It is not useful to have functions for both encodings, because these
encodings are interconvertible and it is more efficient to use UTF-16 for
data processing


I disagree. The conversion impacts performance heavily. It will also
require memory to store the converted string, and after you perform an
operation you need to convert back.
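
To illustrate what working without conversion can look like, here is a sketch of a Copy-style routine that walks the UTF-8 bytes directly. It is written in Python, assumes valid UTF-8, and uses a 1-based code point index as Pascal's Copy() does; the name utf8_copy is hypothetical and this is not the implementation discussed in this thread.

```python
def utf8_copy(data: bytes, start: int, count: int) -> bytes:
    """Return count code points beginning at the 1-based code point index start."""
    if start < 1 or count <= 0:
        return b""
    seen = 0       # code points encountered so far
    begin = None   # byte offset where the copy starts
    for offset, b in enumerate(data):
        if (b & 0xC0) != 0x80:  # lead byte: a new code point starts here
            seen += 1
            if seen == start:
                begin = offset
            elif seen == start + count:
                return data[begin:offset]
    # Ran off the end: return whatever follows begin, or nothing at all.
    return data[begin:] if begin is not None else b""
```

The result is a slice of the original bytes, so no second buffer in another encoding is needed and nothing has to be converted back afterwards.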

Further, UTF-16 contains both 2-byte characters and 4-byte characters,
so I don't see how it would be any faster to process than a UTF-8
string.
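
A small sketch of that point, in Python with made-up helper names: counting characters in UTF-16 also needs a scan, because low (trailing) surrogates have to be skipped in much the same way UTF-8 continuation bytes are. It assumes well-formed UTF-16.

```python
import struct


def utf16_units(text: str):
    """Return the 16-bit code units of text as a tuple of integers."""
    raw = text.encode("utf-16-le")
    return struct.unpack("<%dH" % (len(raw) // 2), raw)


def utf16_length(units) -> int:
    """Count code points: every unit counts except low (trailing) surrogates."""
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)


units = utf16_units("a\U0001F600b")
print(len(units), utf16_length(units))  # 4 code units, 3 code points
```

Only code that ignores surrogates can treat the unit count as the character count, which is the shortcut the quoted message relies on.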

About being easier to implement, that's irrelevant, because the
functions are already done.

--
Felipe Monteiro de Carvalho
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal


