Re: [lazarus]

Razvan Adrian Bogdan Sat, 24 Nov 2007 01:57:26 -0800

On Nov 24, 2007 8:47 AM, Vincent Snijders <[EMAIL PROTECTED]> wrote:
> Vasily I. Volchenko schreef:
> > And the Lazarus team is trying to force the UTF8 introduction as a
> > revolution, supporting neither old projects nor saving files (and only
> > saving) in a format compatible with other projects. Besides, that
> > revolutionary process begins when the other version of the same product
> > doesn't support such UTF8. OK, I'll try to do something...
> >
>
> Fortunately Lazarus is still beta, breaking things can be expected.
> Other versions of Lazarus (for linux-gtk2, windows-qt) already use UTF8.


The logic behind Unicode is quite clear: there are countries which
simply cannot use any single ANSI charset. I tend to agree that in
your special case, with the Cyrillic charset, moving to UTF8 or UTF16
increases storage, since the same text takes twice the size. I
presume you don't write international applications, otherwise you
would have realised why Unicode is important.

Ideally UTF32 should be used everywhere, but its price is too high
with current internet speeds, RAM and hard disk sizes, and almost
nobody uses it (I think Perl does, but I am not certain). At first
people used UCS2 for API implementations, but UCS2 then needed an
upgrade to UTF16 to support all languages. UTF16 in turn needs
special processing for 4-byte chars, but compatibility with UCS2 had
to be kept. So UTF16 is for UCS2 what UTF8 is for ASCII, the common
base of the ANSI codepages: an extension, and both need special
processing.

UTF8 has 2 advantages over UTF16: size and speed. From what I
understand, WideStrings in Delphi have a problem: they are not
reference counted, so every assignment copies the whole string, and
this makes WideString much slower than AnsiString. That is probably
one of the reasons CodeGear/Borland doesn't support Unicode
everywhere.

The fact that GTK uses UTF8 by default pushed people into adopting
UTF8 internally. There was no reason not to, as most Latin-based
languages get a real advantage from it, and one can truly write an
international, multilanguage application at little cost. Of course
Asian languages need more storage, but they need a lot with UTF16
too, so for them the impact is not so great. In your case, with
Cyrillic, you do need twice the size for the text, but how much is
that really? Did you notice that Office applications also use Unicode
for storage, and nobody seems to complain about it?

Internationalization is a fact: it is here, it is needed, and it is
not so painful when done the right way. In our case UTF8 is the best
way to do it. Instead of having 2 sets of components, one for Ansi
and one for UTF16, plus options to use UTF8 instead of Ansi, it is
simply easier to use UTF8 everywhere. You can also use a converter,
such as the one included with the Synapse library, if you need
support for Ansi codepages. Unless you will only ever write Russian
applications, you will need more codepages and possibly some Unicode
form, so why not implement it by default, as all sane-minded people
would? I'm not sure whether .NET also adopted it, but most languages
have already adopted some form of Unicode, be it UTF8, UTF16 or even
UTF32. I see no reason to stick to Ansi when Unicode is there at so
little cost, and you can still use Ansi for storage when needed.
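
To put rough numbers on the size argument, here is a minimal sketch
in Python (not Pascal, only because it is quick to try). The sample
greetings and the sizes() helper are my own illustrative assumptions,
and cp1252, cp1251 and cp932 stand in for the "Ansi" codepage of each
language.

```python
# Byte counts for the same short greeting in several encodings.
# The sample strings are illustrative assumptions, not real project data.
samples = {
    "english": ("Hello", "cp1252"),
    "russian": ("Привет", "cp1251"),
    "japanese": ("こんにちは", "cp932"),
}


def sizes(text, ansi_codec):
    """Return the encoded length in bytes for each encoding."""
    return {
        "ansi": len(text.encode(ansi_codec)),
        "utf8": len(text.encode("utf-8")),
        "utf16": len(text.encode("utf-16-le")),  # -le: no byte order mark
        "utf32": len(text.encode("utf-32-le")),
    }


for name, (text, codec) in samples.items():
    print(name, sizes(text, codec))

# Ansi can still be used for storage: convert at the boundary and
# keep UTF8 inside the program. This round trip is lossless as long
# as every character exists in the chosen codepage.
legacy = "Привет".encode("cp1251")
assert legacy.decode("cp1251") == "Привет"
```

For the Russian sample this gives 6 bytes in cp1251 and 12 bytes in
both UTF8 and UTF16, so the doubling is real but identical for the
two Unicode forms. The English sample stays at 5 bytes in UTF8 and
grows to 10 in UTF16.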

I hope this mail wasn't too long to read. If you ever write an
app in a language other than your native one, or in mixed languages,
you will see why Unicode is important, and probably why UTF16 is no
better than UTF8 in size, in speed, or even in implementation.
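
On the implementation point, a small sketch (again Python, with
helper names of my own choosing) of why UTF16 is no simpler to
process than UTF8: both are variable width. A character outside the
range UCS2 can represent takes two 16-bit units in UTF16, just as it
takes several bytes in UTF8.

```python
def utf8_bytes(ch):
    """Number of bytes one code point occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def utf16_units(ch):
    """Number of 16-bit code units one code point occupies in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


# Latin, Cyrillic, the euro sign, and U+1D11E (MUSICAL SYMBOL G CLEF),
# which lies outside the 16-bit range that UCS2 can represent.
for ch in ["A", "Я", "€", "\U0001D11E"]:
    print(f"U+{ord(ch):04X}: utf8={utf8_bytes(ch)} bytes, "
          f"utf16={utf16_units(ch)} units")

# The last character is stored in UTF-16 as a surrogate pair.
print("\U0001D11E".encode("utf-16-be").hex())
```

Code that assumes one 16-bit unit per character breaks on that last
case in the same way that code assuming one byte per character
breaks on UTF8.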

Razvan

