Re: [fpc-pascal] Unicode filenames
2008/6/30 Felipe Monteiro de Carvalho <[EMAIL PROTECTED]>:
> Nothing arrived here. Can't you just say if it worked or not?

Crap! See the email below; I removed the screenshot for now. Under Linux I could see and query the directory and file while my system was set up for English. Again, I have no idea how to do this test under Windows without Russian locale support.

---------- Forwarded message ----------
From: Graeme Geldenhuys <[EMAIL PROTECTED]>
Date: 2008/6/30
Subject: Re: [fpc-pascal] Unicode filenames
To: FPC-Pascal users discussions

2008/6/29 Vincent Snijders <[EMAIL PROTECTED]>:
> If you create a file with a russian name on your hard disk, can you use
> fpgFileExists to check for its existence?

I don't know how to change the Windows language to Russian, but I do under Linux. So I did the following. I changed my Linux system's locale for one application to Russian. I had Russian translations of the fpGUI Toolkit, so I used those. I copied the rsCancel resourcestring value (in Russian) into an Edit component, copied that to the clipboard, used the File Open dialog to 'create directory' and pasted the Russian word for Cancel in there. Now I had a Russian directory on the hard drive. I quit the program, changed back to the English locale, and loaded the program again. It displayed the Russian directory correctly. I could also query a Russian file inside the Russian directory.

PS: Even Linux's terminal didn't display the Russian directory correctly, though Nautilus did. The terminal showed a whole bunch of '?' instead. In fpGUI it looked fine.

[ screenshot removed ]

Regards,
- Graeme -
___
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/
___
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Unicode filenames
2008/6/30 Vincent Snijders <[EMAIL PROTECTED]>:
> Even if it doesn't contain the Russian locale, you would be able to create
> such files in Windows Explorer, for example by copying and pasting the
> file name while renaming a file.
>
> Then let your fpGUI program check for its existence.

Under Linux I simply change the LANG environment variable and place the fpgui_ru.po file in the directory, and it uses the Russian language. I use GetText, as is done in Lazarus. How do I change the language locale in Windows, so I can use the Russian translation of fpGUI?

Regards,
- Graeme -
Re: [fpc-pascal] Unicode filenames
On Mon, 30 Jun 2008 09:04:21 +0200 "Graeme Geldenhuys" <[EMAIL PROTECTED]> wrote:
> 2008/6/30 Vincent Snijders <[EMAIL PROTECTED]>:
> > Even if it doesn't contain the Russian locale, you would be able to
> > create such files in Windows Explorer, for example by copying and
> > pasting the file name while renaming a file.
> >
> > Then let your fpGUI program check for its existence.
>
> Under Linux I simply change the LANG environment variable and place
> the fpgui_ru.po file in the directory, and it uses the Russian
> language. I use GetText, as is done in Lazarus. How do I change the
> language locale in Windows, so I can use the Russian translation of
> fpGUI?

Can you try creating a Chinese filename in Explorer? For example:

北方话.txt

Then check if the file exists. This should work independently of the locale. With Utf8ToAnsi this cannot work on 1-byte locales.

Mattias
Re: [fpc-pascal] Unicode filenames
2008/6/29 Vincent Snijders <[EMAIL PROTECTED]>:
> If you create a file with a russian name on your hard disk, can you use
> fpgFileExists to check for its existence?

I don't know how to change the Windows language to Russian, but I do under Linux. So I did the following. I changed my Linux system's locale for one application to Russian. I had Russian translations of the fpGUI Toolkit, so I used those. I copied the rsCancel resourcestring value (in Russian) into an Edit component, copied that to the clipboard, used the File Open dialog to 'create directory' and pasted the Russian word for Cancel in there. Now I had a Russian directory on the hard drive. I quit the program, changed back to the English locale, and loaded the program again. It displayed the Russian directory correctly.

PS: Even Linux's terminal didn't display the Russian directory correctly. It showed a whole bunch of '?' instead. fpGUI worked fine! ;-)

See attached screenshot. The application is using an English locale (en_ZA.UTF-8), as can be seen from the grid headers, and is displaying a Russian directory name.

Regards,
- Graeme -
Re: [fpc-pascal] Unicode filenames
On Mon, 30 Jun 2008 01:03:38 +0200 "Graeme Geldenhuys" <[EMAIL PROTECTED]> wrote:
> 2008/6/29 Vincent Snijders <[EMAIL PROTECTED]>:
> > If you create a file with a russian name on your hard disk, can you
> > use fpgFileExists to check for its existence?
>
> I don't know how to change the Windows language to Russian, but I do
> under Linux. So I did the following. I changed my Linux system's locale
> for one application to Russian. I had Russian translations of the fpGUI
> Toolkit, so I used those. I copied the rsCancel resourcestring value (in
> Russian) into an Edit component, copied that to the clipboard, used the
> File Open dialog to 'create directory' and pasted the Russian word for
> Cancel in there. Now I had a Russian directory on the hard drive. I
> quit the program, changed back to the English locale, and loaded the
> program again. It displayed the Russian directory correctly.
>
> PS:
> Even Linux's terminal didn't display the Russian directory correctly.
> It showed a whole bunch of '?' instead. fpGUI worked fine! ;-)

How did you change to the Russian locale?

> See attached screenshot. The application is using an English locale
> (en_ZA.UTF-8), as can be seen from the grid headers, and is displaying
> a Russian directory name.

Under Linux you use

  Result := aString;

so there is no conversion and the string is still UTF-8.

Linux is not the problem here. Windows is. You must use the W functions.

Mattias
Re: [fpc-pascal] Unicode filenames
2008/6/30 Mattias Gaertner <[EMAIL PROTECTED]>:
>> Even Linux's terminal didn't display the Russian directory correctly.
>> It showed a whole bunch of '?' instead. fpGUI worked fine! ;-)
>
> How did you change to the Russian locale?

By setting the LANG environment variable:

  export LANG=ru.UTF-8

> Linux is not the problem here. Windows is. You must use the W functions.

We do use the W functions in the GDI backend, as shown in Felipe's post:

  if UnicodeEnabledOS then
    ... // W functions
  else
    // plain functions

Regards,
- Graeme -
Re: [fpc-pascal] Unicode filenames
2008/6/30 Mattias Gaertner <[EMAIL PROTECTED]>:
> Can you try creating a Chinese filename in Explorer? For example:
>
> 北方话.txt
>
> Then check if the file exists. This should work independently of the
> locale.
> With Utf8ToAnsi this cannot work on 1-byte locales.

OK, I'll copy and paste that filename from the email into Explorer, then see what happens. I'll reply with the results.

Regards,
- Graeme -
Re: [fpc-pascal] Unicode filenames
Graeme Geldenhuys wrote:
> 2008/6/30 Mattias Gaertner <[EMAIL PROTECTED]>:
>>> Even Linux's terminal didn't display the Russian directory correctly.
>>> It showed a whole bunch of '?' instead. fpGUI worked fine! ;-)
>>
>> How did you change to the Russian locale?
>
> By setting the LANG environment variable:
>
>   export LANG=ru.UTF-8
>
>> Linux is not the problem here. Windows is. You must use the W functions.
>
> We do use the W functions in the GDI backend, as shown in Felipe's post:
>
>   if UnicodeEnabledOS then
>     ... // W functions
>   else
>     // plain functions

But for filenames in fpgFileExists?

Vincent
Re: [fpc-pascal] Unicode filenames
2008/6/30 Mattias Gaertner <[EMAIL PROTECTED]>:
> Can you try creating a Chinese filename in Explorer? For example:
>
> 北方话.txt

I can't create that file in Windows Explorer. I'm using Windows 2000 in a VMware session. I opened Explorer, then copied the filename above to the clipboard from Linux (where Firefox and GMail run). Then I went to the Windows VMware session, right-clicked and created a new text file. I pasted the Chinese name, which Explorer then seems to escape. When I press Enter to confirm the name, Windows gives me an error.

See the screenshot:
http://opensoft.homeip.net/~graemeg/test.png

PS: I hope I got the Apache access permissions to my home folder correct. Let me know if there is a problem accessing the image, and I'll move it somewhere else on the web.

Regards,
- Graeme -
Re: [fpc-pascal] Unicode filenames
2008/6/30 Vincent Snijders <[EMAIL PROTECTED]>:
> But for filenames in fpgFileExists?

Sorry, yes, the previous code was for things like painting text.

Here is the code for fpgFileExists:

  // platform independent
  function fpgFileExists(const FileName: TfpgString): Boolean;
  begin
    Result := FileExists(fpgToOSEncoding(FileName));
  end;

  // GDI dependent
  function fpgToOSEncoding(aString: TfpgString): string;
  begin
    Result := Utf8ToAnsi(aString);
  end;

  function fpgFromOSEncoding(aString: string): TfpgString;
  begin
    Result := AnsiToUtf8(aString);
  end;

No idea if this is enough for all cases (I'm not a Unicode guru), but Vladimir (a Russian developer) reported that it works for him.

Regards,
- Graeme -
Re: [fpc-pascal] Unicode filenames
Quoting Graeme Geldenhuys <[EMAIL PROTECTED]>:
> 2008/6/30 Vincent Snijders <[EMAIL PROTECTED]>:
> > But for filenames in fpgFileExists?
>
> Sorry, yes, the previous code was for things like painting text.
>
> Here is the code for fpgFileExists:
>
> // platform independent
> function fpgFileExists(const FileName: TfpgString): Boolean;
> begin
>   Result := FileExists(fpgToOSEncoding(FileName));
> end;

ok

> // GDI dependent
> function fpgToOSEncoding(aString: TfpgString): string;
> begin
>   Result := Utf8ToAnsi(aString);
> end;

Not ok. UTF8ToAnsi converts to the current Windows 8-bit code page. Each code page supports only a few languages, not all of them. See:
http://www.microsoft.com/globaldev/reference/WinCP.mspx
As you can see, code page 1252 covers the whole of Western Europe, so French, Spanish, German and English all use the same code page. Russian needs another one. And afaik Chinese is converted to DBCS. (I did not try.)

> function fpgFromOSEncoding(aString: string): TfpgString;
> begin
>   Result := AnsiToUtf8(aString);
> end;

ok.
I wonder what Windows returns for characters that are not in the current code page.

> No idea if this is enough for all cases (I'm not a Unicode guru), but
> Vladimir (a Russian developer) reported that it works for him.

Of course it works for him; MS keeps things going. As long as a user stays within his own code page, it works.

Mattias
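A minimal sketch of what "use the W functions" could look like for a file-existence check, so that the name never passes through the ANSI code page. This is not fpGUI's actual code: the name Utf8FileExists is hypothetical, the string is assumed to hold UTF-8 (as TfpgString does in the thread), and 'test.txt' is only a placeholder. The Windows branch relies on GetFileAttributesW from the Windows unit and UTF8Decode from the System unit.

```pascal
program Utf8FileExistsSketch;
{$mode objfpc}{$H+}

uses
  {$IFDEF WINDOWS}Windows,{$ENDIF}
  SysUtils;

{$IFDEF WINDOWS}
const
  // Value GetFileAttributesW returns on failure (INVALID_FILE_ATTRIBUTES).
  InvalidAttrs = DWORD($FFFFFFFF);
{$ENDIF}

// Hypothetical helper; FileName is assumed to contain UTF-8 bytes.
function Utf8FileExists(const FileName: AnsiString): Boolean;
{$IFDEF WINDOWS}
var
  WideName: WideString;
  Attrs: DWORD;
begin
  // UTF-8 -> UTF-16 keeps every character, whatever the ANSI code page is.
  WideName := UTF8Decode(FileName);
  Attrs := GetFileAttributesW(PWideChar(WideName));
  // A directory with that name does not count as a file.
  Result := (Attrs <> InvalidAttrs) and
            ((Attrs and FILE_ATTRIBUTE_DIRECTORY) = 0);
end;
{$ELSE}
begin
  // As noted in the thread: on Linux the UTF-8 name is passed through as is.
  Result := FileExists(FileName);
end;
{$ENDIF}

begin
  // Placeholder name; any UTF-8 encoded file name can be passed instead.
  if Utf8FileExists('test.txt') then
    WriteLn('found')
  else
    WriteLn('not found');
end.
```

The point of the design is that the conversion target is UTF-16 rather than the 8-bit code page, so a Russian or Chinese name should be found even on a system whose code page cannot represent it. This variant assumes an NT-based Windows; Windows 9x would still need an ANSI fallback.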
Re: [fpc-pascal] Unicode filenames
2008/6/30 Mattias Gärtner <[EMAIL PROTECTED]>:
>> function fpgFromOSEncoding(aString: string): TfpgString;
>> begin
>>   Result := AnsiToUtf8(aString);
>> end;
>
> ok.
> I wonder what Windows returns for characters that are not in the current
> code page.

Umm, no idea really.

>> No idea if this is enough for all cases (I'm not a Unicode guru), but
>> Vladimir (a Russian developer) reported that it works for him.
>
> Of course it works for him; MS keeps things going. As long as a user stays
> within his own code page, it works.

Ah, now I get it!! I'm a bit slow today. :-) Using characters outside of the current code page range needs special attention. Thanks for your explanation. I now understand the problem.

Makes me wonder how CodeGear is going to handle something like this. Have they released more information on their Unicode support?

Regards,
- Graeme -
Re: [fpc-pascal] Unicode filenames
> 2008/6/30 Mattias Gärtner <[EMAIL PROTECTED]>:
> >> No idea if this is enough for all cases (I'm not a Unicode guru), but
> >> Vladimir (a Russian developer) reported that it works for him.
> >
> > Of course it works for him; MS keeps things going. As long as a user
> > stays within his own code page, it works.
>
> Ah, now I get it!! I'm a bit slow today. :-) Using characters
> outside of the current code page range needs special attention.
> Thanks for your explanation. I now understand the problem.
>
> Makes me wonder how CodeGear is going to handle something like this.
> Have they released more information on their Unicode support?

They actually use real Unicode support, so NT only, with the -W functions.
Re: [fpc-pascal] Unicode filenames
Graeme Geldenhuys wrote:
> 2008/6/30 Mattias Gaertner <[EMAIL PROTECTED]>:
>> Can you try creating a Chinese filename in Explorer? For example:
>>
>> 北方话.txt
>
> I can't create that file in Windows Explorer. I'm using Windows 2000
> in a VMware session. I opened Explorer, then copied the filename above
> to the clipboard from Linux (where Firefox and GMail run). Then I went
> to the Windows VMware session, right-clicked and created a new text
> file. I pasted the Chinese name, which Explorer then seems to escape.
> When I press Enter to confirm the name, Windows gives me an error.

I cannot create the Chinese filename either, but I could create a Russian filename:

пишет.txt

UTF8ToAnsi converts the Russian characters to '?'.

Vincent
Re: [fpc-pascal] Unicode filenames
Quoting Graeme Geldenhuys <[EMAIL PROTECTED]>:
> 2008/6/30 Mattias Gaertner <[EMAIL PROTECTED]>:
> >> Even Linux's terminal didn't display the Russian directory correctly.
> >> It showed a whole bunch of '?' instead. fpGUI worked fine! ;-)
> >
> > How did you change to the Russian locale?
>
> By setting the LANG environment variable:
>
>   export LANG=ru.UTF-8

That's not a valid LANG value. It should be:

  export LANG=ru_RU.UTF-8

But the important part is the UTF-8. If you don't change that, the terminal will still use UTF-8 and the fonts will not change. If it shows '??', then either the filename is not a valid UTF-8 string or your terminal shows unsupported characters as '??'. xterm, for example, shows rectangles for valid but unsupported characters.

> > Linux is not the problem here. Windows is. You must use the W functions.
>
> We do use the W functions in the GDI backend, as shown in Felipe's post:
>
>   if UnicodeEnabledOS then
>     ... // W functions
>   else
>     // plain functions

And the same must be done for filenames.

Mattias
Re: [fpc-pascal] Unicode filenames
Quoting Graeme Geldenhuys <[EMAIL PROTECTED]>:
> 2008/6/30 Mattias Gaertner <[EMAIL PROTECTED]>:
> > Can you try creating a Chinese filename in Explorer? For example:
> >
> > 北方话.txt
>
> I can't create that file in Windows Explorer. I'm using Windows 2000
> in a VMware session. I opened Explorer, then copied the filename above
> to the clipboard from Linux (where Firefox and GMail run). Then I went
> to the Windows VMware session, right-clicked and created a new text
> file. I pasted the Chinese name, which Explorer then seems to escape.

Maybe that's the way Explorer handles characters that the font does not support. Windows still has a long way to go for real Unicode.

> When I press Enter to confirm the name, Windows gives me an error.
>
> See the screenshot:
> http://opensoft.homeip.net/~graemeg/test.png

Ok. Thanks for trying it out. It's not that important. I think it's clear now that Windows needs the W functions.

> PS:
> I hope I got the Apache access permissions to my home folder correct.
> Let me know if there is a problem accessing the image, and I'll move
> it somewhere else on the web.

Works here.

Mattias
Re: [fpc-pascal] Unicode filenames
Quoting Marco van de Voort <[EMAIL PROTECTED]>:
> > 2008/6/30 Mattias Gärtner <[EMAIL PROTECTED]>:
> > >> No idea if this is enough for all cases (I'm not a Unicode guru), but
> > >> Vladimir (a Russian developer) reported that it works for him.
> > >
> > > Of course it works for him; MS keeps things going. As long as a user
> > > stays within his own code page, it works.
> >
> > Ah, now I get it!! I'm a bit slow today. :-) Using characters
> > outside of the current code page range needs special attention.
> > Thanks for your explanation. I now understand the problem.
> >
> > Makes me wonder how CodeGear is going to handle something like this.
> > Have they released more information on their Unicode support?
>
> They actually use real Unicode support, so NT only, with the -W functions.

Do you know what happened to their plans to create a new string type (a reference-counted 2-byte string) and to change all String and Char to the 2-byte types?
Is this only for .NET or for win32 too?

Mattias
Re: [fpc-pascal] Unicode filenames
> Quoting Marco van de Voort <[EMAIL PROTECTED]>:
> > > Makes me wonder how CodeGear is going to handle something like this.
> > > Have they released more information on their Unicode support?
> >
> > They actually use real Unicode support, so NT only, with the -W functions.
>
> Do you know what happened to their plans to create a new string type (a
> reference-counted 2-byte string) and to change all String and Char to the
> 2-byte types?

It will be released in August.

> Is this only for .NET or for win32 too?

Both (and afaik .NET already is), but they don't have a Unix to keep track of, so we can't simply follow them blindly.
Re: [fpc-pascal] Unicode filenames
Quoting Marco van de Voort <[EMAIL PROTECTED]>:
> > Quoting Marco van de Voort <[EMAIL PROTECTED]>:
> > > > Makes me wonder how CodeGear is going to handle something like this.
> > > > Have they released more information on their Unicode support?
> > >
> > > They actually use real Unicode support, so NT only, with the -W functions.
> >
> > Do you know what happened to their plans to create a new string type (a
> > reference-counted 2-byte string) and to change all String and Char to the
> > 2-byte types?
>
> It will be released in August.

If they are bold enough for this big incompatibility, then maybe we must be braver too.

> > Is this only for .NET or for win32 too?
>
> Both (and afaik .NET already is), but they don't have a Unix to keep track
> of, so we can't simply follow them blindly.

True.
I guess FPC will support the new string type and create some compiler directives/flags to control the default string/char type, won't it?
And it will use the widestring manager to auto-convert these types, won't it?
So FPC will be able to compile the new Delphi code, although the code will run slower and there will be conversion errors on non-ANSI characters.

Mattias
Re: [fpc-pascal] Unicode filenames
> Quoting Marco van de Voort <[EMAIL PROTECTED]>:
> > It will be released in August.
>
> If they are bold enough for this big incompatibility, then maybe we must be
> braver too.

What, also stop supporting anything but Windows?

> > Both (and afaik .NET already is), but they don't have a Unix to keep
> > track of, so we can't simply follow them blindly.
>
> True.
> I guess FPC will support the new string type and create some compiler
> directives/flags to control the default string/char type, won't it?

No decision yet. It is not really easy.

> So FPC will be able to compile the new Delphi code, although the code will
> run slower and there will be conversion errors on non-ANSI characters.

There is no commitment to remain Delphi compatible either, not until we 1) know exactly what Delphi does and 2) know whether we can unify that with a workable multi-platform vision.

As you said, it is Delphi that breaks things in a major way. That doesn't mean we'll follow.
Re: [fpc-pascal] Unicode filenames
On Mon, 30 Jun 2008, Marco van de Voort wrote:
> > Quoting Marco van de Voort <[EMAIL PROTECTED]>:
> > > It will be released in August.
> >
> > If they are bold enough for this big incompatibility, then maybe we must
> > be braver too.
>
> What, also stop supporting anything but Windows?
>
> > > Both (and afaik .NET already is), but they don't have a Unix to keep
> > > track of, so we can't simply follow them blindly.
> >
> > True.
> > I guess FPC will support the new string type and create some compiler
> > directives/flags to control the default string/char type, won't it?
>
> No decision yet. It is not really easy.
>
> > So FPC will be able to compile the new Delphi code, although the code
> > will run slower and there will be conversion errors on non-ANSI
> > characters.
>
> There is no commitment to remain Delphi compatible either, not until we
> 1) know exactly what Delphi does and 2) know whether we can unify that
> with a workable multi-platform vision.
>
> As you said, it is Delphi that breaks things in a major way. That doesn't
> mean we'll follow.

Given that most polls I've seen indicate that most people still use D7, I don't think we should be too eager to jump into the void ourselves...

Of course a decision will be forced on us anyway in the long run :-)

Michael.
[fpc-pascal] Unicode file routines proposal
Hello,

There is already another thread about this, but that thread got too long, and I would like to make a concrete proposal about Unicode file routines.

It looks simple to me: there are just 2 ways to go, either utf-8 or utf-16. Correct me if I am wrong, but I believe that the FPC developers prefer utf-16, so we can have a widestring version of every routine in the RTL which involves filenames.

So let me start with a concrete example:

http://www.freepascal.org/docs-html/rtl/system/assign.html

We would need to add a:

  procedure Assign(
    var f: ;
    const Name: widestring
  );

The same goes for all of these routines:

http://www.freepascal.org/docs-html/rtl/sysutils/filenameroutines.html

Under Windows it can be implemented like this, with Windows 9x support:

  procedure AnyFileRoutineInWin32(AFileName: widestring);
  begin
    if UnicodeEnabledOS then SomeWin32APIW()
    else AnsiToWideString(SomeWin32ApiA())
  end;

One can initialize UnicodeEnabledOS very easily by reading the operating system version and the operating system type (NT/9x).

Under Windows 9x we won't support true Unicode filenames, but this doesn't matter, because the operating system doesn't support them anyway. The widestring routines will keep working under Windows 9x for most code.

This method is used with great success in the LCL. Extended information here:

http://wiki.lazarus.freepascal.org/LCL_Unicode_Support#Guidelines

--
Felipe Monteiro de Carvalho
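The dispatch pattern in this proposal can be sketched roughly as follows. This is an illustration under stated assumptions, not the LCL's or the RTL's actual code: WideDirectoryExists is a hypothetical routine name, the NT check uses the Win32Platform variable from SysUtils together with VER_PLATFORM_WIN32_NT from the Windows unit, and the non-Windows branch simply hands UTF-8 to SysUtils.

```pascal
program UnicodeDispatchSketch;
{$mode objfpc}{$H+}

uses
  {$IFDEF WINDOWS}Windows,{$ENDIF}
  SysUtils;

var
  // Set once at startup; stays False on Windows 9x and on other platforms.
  UnicodeEnabledOS: Boolean = False;

// Hypothetical routine name, used for illustration only.
function WideDirectoryExists(const DirName: WideString): Boolean;
{$IFDEF WINDOWS}
const
  // Value GetFileAttributes returns on failure (INVALID_FILE_ATTRIBUTES).
  InvalidAttrs = DWORD($FFFFFFFF);
var
  AnsiName: AnsiString;
  Attrs: DWORD;
begin
  if UnicodeEnabledOS then
    // NT family: the name stays UTF-16 all the way into the API.
    Attrs := GetFileAttributesW(PWideChar(DirName))
  else
  begin
    // Windows 9x: lossy conversion to the current ANSI code page.
    AnsiName := AnsiString(DirName);
    Attrs := GetFileAttributesA(PAnsiChar(AnsiName));
  end;
  Result := (Attrs <> InvalidAttrs) and
            ((Attrs and FILE_ATTRIBUTE_DIRECTORY) <> 0);
end;
{$ELSE}
begin
  // Assumption for this sketch: non-Windows systems take UTF-8 names.
  Result := DirectoryExists(UTF8Encode(DirName));
end;
{$ENDIF}

begin
  {$IFDEF WINDOWS}
  // One-time detection of the NT family, as described in the proposal.
  UnicodeEnabledOS := (Win32Platform = VER_PLATFORM_WIN32_NT);
  {$ENDIF}
  if WideDirectoryExists('.') then
    WriteLn('current directory found')
  else
    WriteLn('current directory not found');
end.
```

Only the 9x branch goes through the ANSI code page, which matches the proposal's remark that true Unicode names are out of reach there anyway.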
[fpc-pascal] UTF8String
Quoting Marco van de Voort <[EMAIL PROTECTED]>:
> > Quoting Marco van de Voort <[EMAIL PROTECTED]>:
> > > It will be released in August.
> >
> > If they are bold enough for this big incompatibility, then maybe we must
> > be braver too.
>
> What, also stop supporting anything but Windows?

;)
I meant: break some compatibility too.

At the moment the RTL uses Ansi = current code page < Unicode, so any filename that passes through these functions, even indirectly, is limited to the current code page.

One solution is to create a new string type, UTF8String. Similar to the AnsiString/WideString auto-conversions, there would be an AnsiString/UTF8String auto-conversion, if and only if a UTF-8 string manager is installed. The RTL and FCL can use it for filenames and similar things.
There should be some compiler flags to set the default string type to either ansistring or utf8string.
All Lazarus-related sources would set the default string type to utf8 and would load a UTF-8 string manager.

Drawbacks:
- If a UTF-8 string manager is enabled, old code that uses ansistring would recode filenames twice under Windows: from ansistring to UTF-8 to widestring. I guess that for filenames this is hardly measurable.
- Mixing code compiled with ansistring and code compiled with UTF8String will be slower due to the conversions, especially when exchanging big lists of strings, as in TStringList.Assign. Remedy: enable UTF8String and fix the few issues. This might be arbitrarily difficult and/or unpleasant. But hey, CodeGear says that even switching from ansistring to widestring is easy. And of course you can still use 'ansistring' explicitly.
- Mixing code compiled with ansistring and code compiled with UTF8String through dirty typecasts will create hidden conversion errors. Remedy: same as above.
- If the most common RTL/FCL functions are *not* converted to UTF8String, then all UTF8String programs will become very slow and might even get conversion errors, due to typecasts.

So, if the RTL/FCL don't use it, no one will.

Mattias
Re: [fpc-pascal] Unicode file routines proposal
Quoting Felipe Monteiro de Carvalho <[EMAIL PROTECTED]>:
> Hello,
>
> There is already another thread about this, but that thread got too
> long, and I would like to make a concrete proposal about Unicode file
> routines.
>
> It looks simple to me: there are just 2 ways to go, either utf-8 or
> utf-16. Correct me if I am wrong, but I believe that the FPC developers
> prefer utf-16, so we can have a widestring version of every routine in
> the RTL which involves filenames.
>
> So let me start with a concrete example:
>
> http://www.freepascal.org/docs-html/rtl/system/assign.html
>
> We would need to add a:
>
> procedure Assign(
>   var f: ;
>   const Name: widestring
> );
>
> The same goes for all of these routines:
>
> http://www.freepascal.org/docs-html/rtl/sysutils/filenameroutines.html
>
> Under Windows it can be implemented like this, with Windows 9x support:
>
> procedure AnyFileRoutineInWin32(AFileName: widestring);
> begin
>   if UnicodeEnabledOS then SomeWin32APIW()
>   else AnsiToWideString(SomeWin32ApiA())
> end;

But what about all the existing code?
For example the FCL?
How will TStringList.LoadFromFile be converted?

> One can initialize UnicodeEnabledOS very easily by reading the operating
> system version and the operating system type (NT/9x).
>
> Under Windows 9x we won't support true Unicode filenames, but this
> doesn't matter, because the operating system doesn't support them
> anyway. The widestring routines will keep working under Windows 9x for
> most code.
>
> This method is used with great success in the LCL. Extended information here:
>
> http://wiki.lazarus.freepascal.org/LCL_Unicode_Support#Guidelines

Mattias
Re: [fpc-pascal] Unicode file routines proposal
> There is already another thread about this, but that thread got too
> long, and I would like to make a concrete proposal about Unicode file
> routines.
>
> It looks simple to me: there are just 2 ways to go, either utf-8 or
> utf-16.

There are more possibilities:
- native encoding (utf-8 on *nix, utf-16 on Windows)
- have two types.
- a unified type (the type contains the encoding)

Even this has not been decided.
Re: [fpc-pascal] Unicode file routines proposal
Marco van de Voort wrote:
>> It looks simple to me: there are just 2 ways to go, either utf-8 or
>> utf-16.
>
> There are more possibilities:
> - native encoding (utf-8 on *nix, utf-16 on Windows)
> - have two types.

How can one write portable code with these options?

> - a unified type (the type contains the encoding)

Vincent
Re: [fpc-pascal] Unicode file routines proposal
2008/6/30 Felipe Monteiro de Carvalho <[EMAIL PROTECTED]>:
> It looks simple to me: there are just 2 ways to go, either utf-8 or
> utf-16. Correct me if I am wrong, but I believe that the FPC developers
> prefer utf-16, so we can have a widestring version of every routine in
> the RTL which involves filenames.

I thought UTF-8 was preferred. Hence the reason Lazarus followed the UTF-8 route for the LCL and its Unicode support.

Regards,
- Graeme -
Re: [fpc-pascal] Unicode file routines proposal
On Mon, Jun 30, 2008 at 9:31 AM, Mattias Gärtner <[EMAIL PROTECTED]> wrote:
> But what about all the existing code?
> For example the FCL?
> How will TStringList.LoadFromFile be converted?

  TStringList.LoadFromFile(AFileName: widestring); overload

The ansi version could call the wide version and just do the string conversion.

--
Felipe Monteiro de Carvalho
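The overload idea can be sketched with a pair of free functions rather than with TStringList itself. NameExists is a hypothetical name, and the body of the wide overload is only a placeholder that hands UTF-8 to SysUtils.FileExists; a real implementation would call the platform API directly. The part that reflects the suggestion above is the ansi overload, which does nothing except convert and delegate.

```pascal
program OverloadSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Wide overload: the one place where the actual work would be done.
function NameExists(const FileName: WideString): Boolean; overload;
begin
  // Placeholder body for this sketch: pass the name on as UTF-8.
  Result := FileExists(UTF8Encode(FileName));
end;

// Ansi overload: only converts the string and calls the wide version.
function NameExists(const FileName: AnsiString): Boolean; overload;
begin
  Result := NameExists(WideString(FileName));
end;

var
  AnsiName: AnsiString;
  WideName: WideString;
begin
  // Placeholder file name; both calls end up in the wide overload.
  AnsiName := 'test.txt';
  WideName := 'test.txt';
  WriteLn(NameExists(AnsiName));
  WriteLn(NameExists(WideName));
end.
```

Existing callers that pass an ansistring keep compiling unchanged, at the cost of one extra conversion per call.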
Re: [fpc-pascal] Unicode file routines proposal
On Mon, Jun 30, 2008 at 9:55 AM, Graeme Geldenhuys <[EMAIL PROTECTED]> wrote:
> I thought UTF-8 was preferred. Hence the reason Lazarus followed the
> UTF-8 route for the LCL and its Unicode support.

UTF-8 is much better for the LCL because it fits much better into our existing codebase.

For the RTL we would also like to have UTF-8, but in previous conversations I got the impression that the RTL developers prefer UTF-16.

--
Felipe Monteiro de Carvalho
Re: [fpc-pascal] Unicode file routines proposal
> Marco van de Voort wrote:
> >> It looks simple to me: there are just 2 ways to go, either utf-8 or
> >> utf-16.
> >
> > There are more possibilities:
> > - native encoding (utf-8 on *nix, utf-16 on Windows)
> > - have two types.
>
> How can one write portable code with these options?

How can you consider yourself portable by picking one system's encoding and emulating it on the others?

Note also that reliance on a particular encoding is far less important, since fewer people will be parsing through strings manually (simply because it is more difficult).
Re: [fpc-pascal] Unicode file routines proposal
Marco van de Voort wrote:
>> Marco van de Voort wrote:
>>>> It looks simple to me, there are just 2 ways to go, either utf-8 or
>>>> utf-16.
>>>
>>> There are more possibilities:
>>> - native encoding (utf-8 on *nix, utf-16 on windows)
>>> - have two types.
>>
>> How can one write portable code with these options?
>
> How can you consider yourself portable by picking one system's encoding
> and emulating it on the others?

At the borders of my code I convert all strings to the 'internal type' and encoding and use it like that. Kind of like we are doing nowadays to convert the line endings in text files.

I see what you are trying to say, but having a string type that is UTF8 encoded on one system and UTF16 encoded on another system doesn't seem easy to work with to me, even if you name it, for example, RTLString.

Even widestring is an example of bad portability, because it is refcounted everywhere except on Windows.

> Note also that reliance on encoding is far less important, since fewer
> people will be parsing through strings manually (simply because it is more
> difficult)

Right, but they rely on not having to convert it all the time.

ATM, all the client libs above the RTL have chosen one encoding and string type: LCL and fpGUI: UTF8, MseGui: widestring.

So for those libs to interface with a platform dependent string type in the LCL, they would have to write platform dependent code. I don't feel much like writing a LCLSysutils.FileExists, like Graeme already has done, to hide these conversions.

Vincent
Re: [fpc-pascal] Unicode file routines proposal
> Marco van de Voort wrote:
> At the borders of my code I convert all strings to the 'internal type' and
> encoding and use it like that. Kind of like we are doing nowadays to convert
> the line endings in text files.

I don't like this. It makes, for example, processing a database export on Unix unnecessarily costly.

> I see what you are trying to say, but having a string type that is UTF8
> encoded on one system and UTF16 encoded on another system doesn't seem
> easy to work with to me, even if you name it, for example, RTLString.

It should be possible to work in the native encoding. One doesn't want to wrap _every_ function in _every_ header with conversion procs.

> > Note also that reliance on encoding is far less important, since fewer
> > people will be parsing through strings manually (simply because it is more
> > difficult)
>
> Right, but they rely on not having to convert it all the time.

Well, they will have to do that with one string type too, at every external barrier.

That also kills the benefit of choosing UTF-16 in the first place, since Delphi code won't work on Unix without manually inserting a lot of conversion code.

> ATM, all the client libs above the RTL have chosen one encoding and string
> type: LCL and fpGUI: UTF8, MseGui: widestring.

That has nothing to do with these decisions. They chose that in the absence of a good solution. This is about picking a good solution.

> So for those libs to interface with a platform dependent string type in
> the LCL, they would have to write platform dependent code.

You will have to anyway, for any solution that only supports one encoding.
Re: [fpc-pascal] Unicode file routines proposal
On Mon, Jun 30, 2008 at 10:32 AM, Marco van de Voort <[EMAIL PROTECTED]> wrote:
> It should be possible to work in the native encoding. One doesn't want to
> wrap _every_ function in _every_ header with conversion procs.

It is not possible to work with an ever-changing encoding.

MyLabel.Caption := 'Lição';

How would that ever work with an ever-changing encoding? It would not.

If you go to the real implementation level, a changing encoding quickly becomes unmanageable.

And what about the LFM files? In which encoding will they be?

What if you develop software on one system and try to build it on another?

Ok, to go one step further: Has anyone ever seen a fully unicode system which works with changing encodings? I believe none exists, because this is not a good solution.

> Well, they will have to do that with one string type too, at every external
> barrier.

This is already necessary.

> That also kills the benefit of choosing UTF-16 in the first place, since
> Delphi code won't work on Unix without manually inserting a lot of
> conversion code.

Delphi code can use the ansi routines, which could just call the utf-16 routines with a string conversion, or you can implement every routine twice to maximize speed.

--
Felipe Monteiro de Carvalho
Re: [fpc-pascal] Unicode file routines proposal
> On Mon, Jun 30, 2008 at 10:32 AM, Marco van de Voort <[EMAIL PROTECTED]> wrote:
> > It should be possible to work in the native encoding. One doesn't want to
> > wrap _every_ function in _every_ header with conversion procs.
>
> It is not possible to work with an ever-changing encoding.
>
> MyLabel.Caption := 'Lição';
>
> How would that ever work with an ever-changing encoding? It would not.

Encoding in source is something totally different. A '\u1232\u2314'-like syntax can be converted to UTF-8/16 by the compiler. In theory, I think; practice might differ.

> If you go to the real implementation level, a changing encoding quickly
> becomes unmanageable.

That's why I don't believe that one string type with two encodings helps.

But if fileexists is utf-8 on unix and utf-16 on windows, and any utf-16 or UTF-8 string that you pass from Lazarus is auto converted, what is the exact problem?

Everybody can maintain certain subsystems in a certain encoding, without forcing that choice upon others.

> And what about the LFM files? In which encoding will they be?

The one you annotate in it? The loading code can decode both, since both systems have both.

> What if you develop software on one system and try to build it on
> another?

What does that mean for the fully UTF-16 system? First you may start with wrapping all C APIs that use utf-8 on Unix.

I understand the simplicity of one encoding is appealing, but you have to look at all aspects, and that is not just representation in the GUI.

It will mean that _every_ string transaction to the outside will have to be manually wrapped AND have a performance penalty. That is a heavy price to pay for not touching a bit of lfm loading code.

> Ok, to go one step further: Has anyone ever seen a fully unicode
> system which works with changing encodings? I believe none exists,
> because this is not a good solution.

How many systems do you know that have data files like .lfm's that cross system borders?

> > Well, they will have to do that with one string type too, at every
> > external barrier.
>
> This is already necessary.

But if you properly type them, some conversions may be automatic. Something you don't have with a single type.

> > That also kills the benefit of choosing UTF-16 in the first place, since
> > Delphi code won't work on Unix without manually inserting a lot of
> > conversion code.
>
> Delphi code can use the ansi routines, which could just call the
> utf-16 routines with a string conversion, or you can implement every
> routine twice to maximize speed.

If the unicode code is not compatible with Delphi (UTF-16), there is no point in using UTF-16 in the first place.
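The "native encoding with automatic conversion" idea argued for in this message might look roughly like the sketch below. It is an assumption-laden illustration only: the RTLString name is borrowed from earlier in the thread, NativeNameLength is a hypothetical routine, and treating AnsiString as UTF-8 on Unix is assumed rather than enforced by the type.

```pascal
program NativeEncodingSketch;
{$mode objfpc}{$H+}

type
{$ifdef WINDOWS}
  RTLString = WideString;   // UTF-16 code units on Windows
{$else}
  RTLString = AnsiString;   // assumed to hold UTF-8 bytes on Unix
{$endif}

{ Hypothetical RTL-style routine declared with the native string type.
  It only reports the length in native code units. }
function NativeNameLength(const AName: RTLString): Integer;
begin
  Result := Length(AName);
end;

var
  A: AnsiString;
  W: WideString;
begin
  A := 'demo.txt';
  W := 'demo.txt';
  // Whichever argument does not match RTLString is converted
  // implicitly by the compiler; the caller writes no conversion code.
  WriteLn(NativeNameLength(A));
  WriteLn(NativeNameLength(W));
end.
```

The implicit conversion between wide and ansi strings goes through the installed widestring manager and the system code page, so this only yields correct UTF-8 on Unix when the locale is UTF-8 and a manager such as the cwstring unit is installed. That dependency is part of what the thread is debating.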
Re: [fpc-pascal] Unicode file routines proposal
On Mon, 30 Jun 2008 10:03:18 -0300 "Felipe Monteiro de Carvalho" <[EMAIL PROTECTED]> wrote:
> On Mon, Jun 30, 2008 at 9:55 AM, Graeme Geldenhuys
> <[EMAIL PROTECTED]> wrote:
> > I thought UTF-8 was preferred. Hence the reason Lazarus followed the
> > UTF-8 route in LCL and Unicode support.
>
> UTF-8 is much better for the LCL because it fits much better in
> our existing codebase.

This may have been discussed before, but should the encoding not be dependent on the locale? What would happen if I write an FPC program, the internal routines are, e.g., UTF-16, and my locale is set to en_US.UTF8?

Anyway, I have the impression that most of Linux is utf-8 oriented by now.

John
Re: [fpc-pascal] Unicode file routines proposal
On Mon, Jun 30, 2008 at 11:35 AM, Marco van de Voort <[EMAIL PROTECTED]> wrote:
> I understand the simplicity of one encoding is appealing, but you have to
> look at all aspects, and that is not just representation in the GUI.
>
> It will mean that _every_ string transaction to the outside will have to be
> manually wrapped AND have a performance penalty. That is a heavy price to
> pay for not touching a bit of lfm loading code.

It won't need to be wrapped on platforms which natively support the chosen encoding. UTF-16 is natively supported in Windows and Windows CE. Not sure about the unixes.

Because the LCL uses a single encoding, this performance difference disappears as soon as you need to convert the string in the LCL.

> How many systems do you know that have data files like .lfm's that cross
> system borders?

Gtk can load XML files, somewhat equivalent to our LFMs. They use UTF-8 everywhere.

Java is cross-platform and uses UTF-16 everywhere.

wxWidgets uses UTF-16 everywhere.

Let me try to summarize my opinion on multiple encodings vs single encoding:

Multiple encodings:
* More complex
* Innovative solution, no known example of an implementation of this system exists = uncertainty whether it works at all, or whether it is convenient for developers
* Depends on a not yet implemented string type
* Potentially will have higher performance than a single encoding system, but only if you use this new special string type

Single encoding:
* Simple, proven solution
* Does not need any new string type, can start being implemented immediately
* Potentially has lower performance due to string conversions.

Actually, for Lazarus the only advantage I see in the multiple encoding system does not exist: because we use a single encoding, on some platforms we will need conversion and on others we won't, which just makes things worse for us.

--
Felipe Monteiro de Carvalho
Re: [fpc-pascal] Unicode file routines proposal
John Coppens wrote:
> This may have been discussed before, but should the encoding not be
> dependent on the locale? What would happen if I write an FPC program,
> the internal routines are, e.g., UTF-16, and my locale is set to
> en_US.UTF8?
>
> Anyway, I have the impression that most of Linux is utf-8 oriented by
> now.

Well, yes, but that's the external representation.
I'd say to take a look at how python managed to integrate unicode support:

http://www.google.com/search?domains=www.python.org&sitesearch=www.python.org&sourceid=google-search&q=unicode&submit=search

Bye
--
Luca
Re: [fpc-pascal] Unicode file routines proposal
On Monday 30 June 2008 22.19:49 Luca Olivetti wrote:
> John Coppens wrote:
> > This may have been discussed before, but should the encoding not be
> > dependent on the locale? What would happen if I write an FPC program,
> > the internal routines are, e.g., UTF-16, and my locale is set to
> > en_US.UTF8?
> >
> > Anyway, I have the impression that most of Linux is utf-8 oriented by
> > now.
>
> Well, yes, but that's the external representation.
> I'd say to take a look at how python managed to integrate unicode support:
>
> http://www.google.com/search?domains=www.python.org&sitesearch=www.python.org&sourceid=google-search&q=unicode&submit=search

They have a UTF-16/UCS-2 internal representation, the same as MSEgui, which works very well and is fast and handy, BTW. What is missing is a reference counted widestring type on Windows. ;-)

Martin