On Nov 24, 2007 8:47 AM, Vincent Snijders <[EMAIL PROTECTED]> wrote:
> Vasily I. Volchenko wrote:
> > And lazarus team is trying to force UTF8 introduction with a revolution
> > without supporting neither old project nor saving files (and only saving)
> > in compartible with other projects format. Besides, that revolutionary
> > process begins when the other version of the same product doesn't support
> > such utf8. OK, I'll try to do something...
> >
>
> Fortunately Lazarus is still beta, breaking things can be expected.
> Other versions of Lazarus (for linux-gtk2, windows-qt) already use UTF8.

The logic behind Unicode is quite clear: there are countries which simply cannot use any ANSI charset. I tend to agree that in your special case, with the Cyrillic charset, switching to UTF8 or UTF16 means a size increase, since the same text takes twice the space. I presume you don't write international applications, otherwise you would have realised why Unicode is important.

Ideally UTF32 would be used everywhere, but the price is too high with the current internet speeds, RAM and hard disk sizes, and almost nobody uses it (I think Perl does). At first people used UCS2 for API implementations, but UCS2 needed an upgrade to UTF16 to support all languages, and so UTF16 turned out to need special processing too, for its 4-byte characters. Compatibility with UCS2 had to be kept, so UTF16 is for UCS2 what UTF8 is for 7-bit ASCII: an extension. Both need special processing.

UTF8 has 2 advantages over UTF16: size and speed. From what I understand, WideStrings in Delphi have a problem: they are not reference counted, so every assignment copies the whole string, and this means WideString is much slower than AnsiString. That is probably one of the reasons CodeGear/Borland doesn't support Unicode everywhere.

The fact that GTK uses UTF8 by default pushed people into adopting UTF8 internally. There was no reason not to: most Latin-based languages get a real advantage from it, and one can write a truly international, multi-language application at little cost. Of course Asian languages need more storage, but they need a lot with UTF16 as well, so for them the impact is not so great. In your case, with Cyrillic, you do need twice the size for the text, but how much is that really? Did you notice that Office applications also use Unicode for storage, and nobody seems to complain about it?

Internationalization is a fact. It is here, it is needed, and it is not so painful when done the right way. In our case UTF8 is the best way to do it. Instead of having 2 sets of components, one for Ansi and one for UTF16, plus options to use UTF8 instead of Ansi, it is simply easier to just use UTF8. You can also use a converter, such as the one included with the synapse library, if you need support for Ansi codepages. Unless you will only ever write Russian applications, you will need more codepages and possibly some form of Unicode, so why not implement it by default, as all sane-minded people would? I'm not sure if .NET also adopted it, but most languages have already adopted some form of Unicode, be it UTF8, UTF16 or even UTF32. I see no reason to stick to Ansi when Unicode is there at so little cost, and you can still use Ansi for storage when needed.

I hope this mail wasn't too long to read. If you ever write an app in a language other than your native one, or in mixed languages, you will see why Unicode is important, and probably why UTF16 is no better than UTF8 in size, in speed, or even in implementation.

Razvan
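
P.S. To put rough numbers on the size argument, here is a minimal Python sketch. Python is used only because it is quick to run, not because Lazarus works this way. The helper name `encoded_sizes` and the sample strings are made up for illustration, and cp1251 / cp1252 are assumed to stand in for "the Ansi codepage" for Russian and Western European text. It encodes the same text in each encoding and counts the bytes:

```python
def encoded_sizes(text, ansi_codepage):
    """Return the byte length of `text` in an ANSI codepage, UTF-8, UTF-16 and UTF-32."""
    return {
        "ansi": len(text.encode(ansi_codepage)),
        "utf8": len(text.encode("utf-8")),
        # The "-le" variants do not prepend a byte order mark,
        # so only the text itself is counted.
        "utf16": len(text.encode("utf-16-le")),
        "utf32": len(text.encode("utf-32-le")),
    }


russian = "Привет"  # 6 Cyrillic letters (hypothetical sample)
english = "Hello"   # 5 ASCII letters (hypothetical sample)

russian_sizes = encoded_sizes(russian, "cp1251")
english_sizes = encoded_sizes(english, "cp1252")

# A character outside the Basic Multilingual Plane (U+1D11E, musical G clef)
# cannot be stored in one 16-bit unit: UTF-16 needs a surrogate pair for it.
clef_utf16 = len("\U0001D11E".encode("utf-16-le"))
clef_utf8 = len("\U0001D11E".encode("utf-8"))

print("Russian:", russian_sizes)
print("English:", english_sizes)
print("G clef: utf16 =", clef_utf16, "bytes, utf8 =", clef_utf8, "bytes")
```

For the Cyrillic sample, UTF8 and UTF16 come out the same size, twice the Ansi size. For the plain Latin sample, UTF8 matches Ansi byte for byte while UTF16 doubles it. The last line shows the 4-byte case that UTF16 code has to handle specially.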