Re: [fpc-pascal] Unicode filenames

2008-06-30 Thread Graeme Geldenhuys
2008/6/30 Felipe Monteiro de Carvalho <[EMAIL PROTECTED]>:
>
> Nothing arrived here. Can't you just say whether it worked or not?

Crap! See the email below; I removed the screenshot for now. Under
Linux, I could see and query the directory and file while my system
was set up for English. Again, I have no idea how to do this test under
Windows without Russian locale support.


-- Forwarded message --
From: Graeme Geldenhuys <[EMAIL PROTECTED]>
Date: 2008/6/30
Subject: Re: [fpc-pascal] Unicode filenames
To: FPC-Pascal users discussions 


2008/6/29 Vincent Snijders <[EMAIL PROTECTED]>:
>
> If you create a file with a Russian name on your hard disk, can you use
> fpgFileExists to check for its existence?

I don't know how to change the Windows language to Russian, but I do
under Linux, so I did the following. I changed my Linux system's locale
for an application to Russian. I had Russian translations of the fpGUI
Toolkit, so I used those. I copied the rsCancel resourcestring value (in
Russian) to an Edit component, copied that to the clipboard, used the File
Open dialog to 'create directory' and pasted the Russian word for
Cancel in there. Now I had a Russian directory on the hard drive. I
quit the program, changed back to the English locale and loaded the
program again, and it displayed the Russian directory correctly. I could
also query a Russian file inside the Russian directory.

PS:
Even Linux's terminal didn't display the Russian directory correctly,
though Nautilus did. The terminal showed a whole bunch of '?' instead.
In fpGUI it looked fine.

[ screenshot removed ]

Regards,
 - Graeme -


___
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal


Re: [fpc-pascal] Unicode filenames

2008-06-30 Thread Graeme Geldenhuys
2008/6/30 Vincent Snijders <[EMAIL PROTECTED]>:
>
> Even if it doesn't have Russian locale support, you would be able to create
> such files in Windows Explorer, for example by copying/pasting the
> file name while renaming it.
>
> Then let your fpGui program check for its existence.

Under Linux I simply change the LANG environment variable and place
the fpgui_ru.po file in the directory, and it uses the Russian
language. I use GetText, as Lazarus does. How do I change the
locale in Windows so I can use the Russian translation of
fpGUI?


Regards,
 - Graeme -




Re: [fpc-pascal] Unicode filenames

2008-06-30 Thread Mattias Gaertner
On Mon, 30 Jun 2008 09:04:21 +0200
"Graeme Geldenhuys" <[EMAIL PROTECTED]> wrote:

> 2008/6/30 Vincent Snijders <[EMAIL PROTECTED]>:
> >
> > Even if it doesn't have Russian locale support, you would be able to
> > create such files in Windows Explorer, for example by copying/pasting
> > the file name while renaming it.
> >
> > Then let your fpGui program check for its existence.
> 
> Under Linux I simply change the LANG environment variable and place
> the fpgui_ru.po file in the directory, and it uses the Russian
> language. I use GetText, as Lazarus does. How do I change the
> locale in Windows so I can use the Russian translation of
> fpGUI?

Can you try to create a Chinese filename in Explorer? For example:

北方话.txt

Then check whether the file exists. This should work independently of the
locale. With Utf8ToAnsi this cannot work on 1-byte locales.

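A minimal sketch of why a 1-byte conversion has to lose characters, assuming ISO-8859-1 (Latin-1) as a stand-in for a Windows single-byte code page: each Latin-1 byte corresponds to the Unicode code point with the same value, so anything above U+00FF has no byte to map to. Utf8ToLatin1Lossy is a hypothetical helper and is not how the RTL's Utf8ToAnsi is implemented; it only reproduces the effect of replacing unmappable characters with '?'.

```pascal
program LossySingleByteSketch;

{$mode objfpc}{$H+}

// Hypothetical helper: decode UTF-8 and map each code point to one
// ISO-8859-1 (Latin-1) byte. Code points 0..255 keep their value; anything
// larger, and any malformed sequence, becomes '?'.
function Utf8ToLatin1Lossy(const S: string): string;
var
  I, Len, Extra, K: Integer;
  B: Byte;
  CodePoint: LongWord;
begin
  Result := '';
  I := 1;
  Len := Length(S);
  while I <= Len do
  begin
    B := Ord(S[I]);
    if B < $80 then
    begin
      CodePoint := B;
      Extra := 0;
    end
    else if (B and $E0) = $C0 then
    begin
      CodePoint := B and $1F;
      Extra := 1;
    end
    else if (B and $F0) = $E0 then
    begin
      CodePoint := B and $0F;
      Extra := 2;
    end
    else if (B and $F8) = $F0 then
    begin
      CodePoint := B and $07;
      Extra := 3;
    end
    else
    begin
      // Stray continuation byte or invalid lead byte.
      Result := Result + '?';
      Inc(I);
      Continue;
    end;
    // Consume up to Extra continuation bytes of the form 10xxxxxx.
    K := 0;
    while (K < Extra) and (I + K + 1 <= Len) and
          ((Ord(S[I + K + 1]) and $C0) = $80) do
    begin
      CodePoint := (CodePoint shl 6) or (Ord(S[I + K + 1]) and $3F);
      Inc(K);
    end;
    Inc(I, K + 1);
    if (K = Extra) and (CodePoint <= $FF) then
      Result := Result + Chr(Byte(CodePoint))
    else
      Result := Result + '?';
  end;
end;

begin
  // 'пишет.txt' (the Russian name tried later in the thread) as UTF-8 bytes.
  WriteLn(Utf8ToLatin1Lossy(#$D0#$BF#$D0#$B8#$D1#$88#$D0#$B5#$D1#$82'.txt'));
  // prints ?????.txt
end.
```

With a real code page such as 1251 the Cyrillic letters would survive, but Western accented letters such as é, or Chinese, would not: whichever 1-byte code page is active, everything outside its 256 slots is lost.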

Mattias


Re: [fpc-pascal] Unicode filenames

2008-06-30 Thread Graeme Geldenhuys
2008/6/29 Vincent Snijders <[EMAIL PROTECTED]>:
>
> If you create a file with a Russian name on your hard disk, can you use
> fpgFileExists to check for its existence?

I don't know how to change the Windows language to Russian, but I do
under Linux, so I did the following. I changed my Linux system's locale
for an application to Russian. I had Russian translations of the fpGUI
Toolkit, so I used those. I copied the rsCancel resourcestring value (in
Russian) to an Edit component, copied that to the clipboard, used the File
Open dialog to 'create directory' and pasted the Russian word for
Cancel in there. Now I had a Russian directory on the hard drive. I
quit the program, changed back to the English locale and loaded the
program again, and it displayed the Russian directory correctly.

PS:
Even Linux's terminal didn't display the Russian directory correctly.
It showed a whole bunch of '?' instead. fpGUI worked fine! ;-)

See the attached screenshot: the application is using the English locale
(en_ZA.UTF-8), as can be seen from the grid headers, and it is displaying
a Russian directory name.


Regards,
 - Graeme -



Re: [fpc-pascal] Unicode filenames

2008-06-30 Thread Mattias Gaertner
On Mon, 30 Jun 2008 01:03:38 +0200
"Graeme Geldenhuys" <[EMAIL PROTECTED]> wrote:

> 2008/6/29 Vincent Snijders <[EMAIL PROTECTED]>:
> >
> > If you create a file with a Russian name on your hard disk, can you
> > use fpgFileExists to check for its existence?
> 
> I don't know how to change the Windows language to Russian, but I do
> under Linux, so I did the following. I changed my Linux system's locale
> for an application to Russian. I had Russian translations of the fpGUI
> Toolkit, so I used those. I copied the rsCancel resourcestring value (in
> Russian) to an Edit component, copied that to the clipboard, used the File
> Open dialog to 'create directory' and pasted the Russian word for
> Cancel in there. Now I had a Russian directory on the hard drive. I
> quit the program, changed back to the English locale and loaded the
> program again, and it displayed the Russian directory correctly.
> 
> PS:
> Even Linux's terminal didn't display the Russian directory correctly.
> It showed a whole bunch of '?' instead. fpGUI worked fine! ;-)

How did you change to the Russian locale?

 
> See the attached screenshot: the application is using the English locale
> (en_ZA.UTF-8), as can be seen from the grid headers, and it is displaying
> a Russian directory name.

Under Linux you use Result := aString; (no conversion), so it is still UTF-8.

Linux is not the problem here. Windows is. You must use the W functions.

Mattias


Re: [fpc-pascal] Unicode filenames

2008-06-30 Thread Graeme Geldenhuys
2008/6/30 Mattias Gaertner <[EMAIL PROTECTED]>:
>> Even Linux's terminal didn't display the Russian directory correctly.
>> It showed a whole bunch of '?' instead. fpGUI worked fine! ;-)
>
> How did you change to the Russian locale?

By setting the LANG environment variable.
  export LANG=ru.UTF-8

>
> Linux is not the problem here. Windows is. You must use the W functions.

We do use the W functions in the GDI backend, as shown in Felipe's post.

  if UnicodeEnabledOS then
...   // W functions
  else
  // plain functions


Regards,
 - Graeme -




Re: [fpc-pascal] Unicode filenames

2008-06-30 Thread Graeme Geldenhuys
2008/6/30 Mattias Gaertner <[EMAIL PROTECTED]>:
>
> Can you try to create a Chinese filename in Explorer? For example:
>
> 北方话.txt
>
> Then check whether the file exists. This should work independently of the
> locale. With Utf8ToAnsi this cannot work on 1-byte locales.


OK, I'll copy and paste that filename from the email into Explorer,
then see what happens. I'll reply with the results.


Regards,
 - Graeme -



Re: [fpc-pascal] Unicode filenames

2008-06-30 Thread Vincent Snijders

Graeme Geldenhuys wrote:
> 2008/6/30 Mattias Gaertner <[EMAIL PROTECTED]>:
>>> Even Linux's terminal didn't display the Russian directory correctly.
>>> It showed a whole bunch of '?' instead. fpGUI worked fine! ;-)
>>
>> How did you change to the Russian locale?
>
> By setting the LANG environment variable.
>   export LANG=ru.UTF-8
>
>> Linux is not the problem here. Windows is. You must use the W functions.
>
> We do use the W functions in the GDI backend, as shown in Felipe's post.
>
>   if UnicodeEnabledOS then
> ...   // W functions
>   else
>   // plain functions


But for filenames in fpgFileExists?

Vincent


Re: [fpc-pascal] Unicode filenames

2008-06-30 Thread Graeme Geldenhuys
2008/6/30 Mattias Gaertner <[EMAIL PROTECTED]>:
>
> Can you try to create a Chinese filename in Explorer? For example:
>
> 北方话.txt

I can't create that file in Windows Explorer. I'm using Windows 2000
in a VMware session. I opened Explorer, then copied the filename above
to the clipboard from Linux (where Firefox and Gmail run). Then I go
to the Windows VMware session, right-click and create a new text file,
and paste the Chinese name, which then seems to get escaped. When I
press Enter to confirm the name, Windows gives me an error.

See the screenshot.
  http://opensoft.homeip.net/~graemeg/test.png

PS:
I hope I got the Apache access permissions right for my home folder.
Let me know if there is a problem accessing the image, then I'll move
it somewhere else on the web.


Regards,
 - Graeme -



Re: [fpc-pascal] Unicode filenames

2008-06-30 Thread Graeme Geldenhuys
2008/6/30 Vincent Snijders <[EMAIL PROTECTED]>:
>
> But for filenames in fpgFileExists?

Sorry, yes the previous code was for things like painting text.


Here is the code for fpgFileExists

// platform independent
function fpgFileExists(const FileName: TfpgString): Boolean;
begin
  Result := FileExists(fpgToOSEncoding(FileName));
end;


// GDI dependent
function fpgToOSEncoding(aString: TfpgString): string;
begin
  Result := Utf8ToAnsi(aString);
end;

function fpgFromOSEncoding(aString: string): TfpgString;
begin
  Result := AnsiToUtf8(aString);
end;


No idea if this is enough for all cases (I'm not a Unicode guru), but
Vladimir (a Russian developer) reported that it works for him.


Regards,
 - Graeme -




Re: [fpc-pascal] Unicode filenames

2008-06-30 Thread Mattias Gärtner
Quoting Graeme Geldenhuys <[EMAIL PROTECTED]>:

> 2008/6/30 Vincent Snijders <[EMAIL PROTECTED]>:
> >
> > But for filenames in fpgFileExists?
>
> Sorry, yes the previous code was for things like painting text.
>
>
> Here is the code for fpgFileExists
>
> // platform independent
> function fpgFileExists(const FileName: TfpgString): Boolean;
> begin
>   Result := FileExists(fpgToOSEncoding(FileName));
> end;

ok


> // GDI dependent
> function fpgToOSEncoding(aString: TfpgString): string;
> begin
>   Result := Utf8ToAnsi(aString);
> end;

Not ok.
UTF8ToAnsi converts to the current Windows 8-bit code page. Each code page
supports only a few languages, not all of them. See:

http://www.microsoft.com/globaldev/reference/WinCP.mspx

As you can see, 1252 covers Western Europe, so French, Spanish, German and
English all use the same code page (1250 covers Central Europe).
Russian needs another one (1251).
And AFAIK Chinese is converted to a DBCS (double-byte) code page. (I did not try.)


> function fpgFromOSEncoding(aString: string): TfpgString;
> begin
>   Result := AnsiToUtf8(aString);
> end;

ok.
I wonder what Windows returns for characters that are not in the current code page.


> No idea if this is enough for all cases (I'm not a Unicode guru), but
> Vladimir (a Russian developer) reported that it works for him.

Of course it works for him - MS keeps things going. As long as a user stays in
his code page it works.


Mattias



Re: [fpc-pascal] Unicode filenames

2008-06-30 Thread Graeme Geldenhuys
2008/6/30 Mattias Gärtner <[EMAIL PROTECTED]>:
>
>> function fpgFromOSEncoding(aString: string): TfpgString;
>> begin
>>   Result := AnsiToUtf8(aString);
>> end;
>
> ok.
> I wonder what Windows returns for characters that are not in the current code page.

Umm, no idea really.


>> No idea if this is enough for all cases (I'm not a Unicode guru), but
>> Vladimir (a Russian developer) reported that it works for him.
>
> Of course it works for him - MS keeps things going. As long as a user stays in
> his code page it works.

Ah, now I get it!!  I'm a bit slow today.  :-)  Using characters
outside of the current code page range needs special attention.
Thanks for your explanation. I now understand the problem.

Makes me wonder how CodeGear is going to handle something like this.
Have they released more information on their Unicode support?


Regards,
 - Graeme -




Re: [fpc-pascal] Unicode filenames

2008-06-30 Thread Marco van de Voort
> 2008/6/30 Mattias Gärtner <[EMAIL PROTECTED]>:
> 
> >> No idea if this is enough for all cases (I'm not a Unicode guru), but
> >> Vladimir (a Russian developer) reported that it works for him.
> >
> > Of course it works for him - MS keeps things going. As long as a user stays in
> > his code page it works.
> 
> Ah, now I get it!!  I'm a bit slow today.  :-)  Using characters
> outside of the current code page range needs special attention.
> Thanks for your explanation. I now understand the problem.
> 
> Makes me wonder how CodeGear is going to handle something like this.
> Have they released more information on their Unicode support?

They actually use Unicode support, so the NT-only W functions.


Re: [fpc-pascal] Unicode filenames

2008-06-30 Thread Vincent Snijders

Graeme Geldenhuys wrote:
> 2008/6/30 Mattias Gaertner <[EMAIL PROTECTED]>:
>> Can you try to create a Chinese filename in Explorer? For example:
>>
>> 北方话.txt
>
> I can't create that file in Windows Explorer. I'm using Windows 2000
> in a VMware session. I opened Explorer, then copied the filename above
> to the clipboard from Linux (where Firefox and Gmail run). Then I go
> to the Windows VMware session, right-click and create a new text file,
> and paste the Chinese name, which then seems to get escaped. When I
> press Enter to confirm the name, Windows gives me an error.


I cannot create the Chinese filename either, but I could create a Russian
filename:
пишет.txt

UTF8ToAnsi converts the Russian characters to '?'.

Vincent


Re: [fpc-pascal] Unicode filenames

2008-06-30 Thread Mattias Gärtner
Quoting Graeme Geldenhuys <[EMAIL PROTECTED]>:

> 2008/6/30 Mattias Gaertner <[EMAIL PROTECTED]>:
> >> Even Linux's terminal didn't display the Russian directory correctly.
> >> It showed a whole bunch of '?' instead. fpGUI worked fine! ;-)
> >
> > How did you change to the Russian locale?
>
> By setting the LANG environment variable.
>   export LANG=ru.UTF-8

That's not a valid LANG value; use:

export LANG=ru_RU.UTF-8

But the important part is the UTF-8. If you don't change it, the terminal
will still use UTF-8 and the fonts will not change. If it shows '??', then
either the filename is not a valid UTF-8 string or your terminal shows
unsupported characters as '??'. For example, xterm shows rectangles for valid
but unsupported characters.

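To tell those two cases apart, one can check whether the raw filename bytes form well-formed UTF-8. A minimal sketch follows; IsValidUtf8 is a hypothetical helper, not an RTL routine. It checks structure only (lead and continuation bytes, overlong forms, UTF-16 surrogates, the U+10FFFF limit), not whether the terminal's font can display the characters.

```pascal
program Utf8CheckSketch;

{$mode objfpc}{$H+}

// Hypothetical helper: True if S is well-formed UTF-8 (RFC 3629), i.e.
// correct lead/continuation bytes, no overlong forms, no UTF-16
// surrogates (U+D800..U+DFFF) and nothing above U+10FFFF.
function IsValidUtf8(const S: string): Boolean;
var
  I, Len, Extra, K: Integer;
  B: Byte;
  CodePoint, MinValue: LongWord;
begin
  Result := False;
  I := 1;
  Len := Length(S);
  while I <= Len do
  begin
    B := Ord(S[I]);
    if B < $80 then
    begin
      Inc(I);            // plain ASCII byte
      Continue;
    end
    else if (B and $E0) = $C0 then
    begin
      CodePoint := B and $1F; Extra := 1; MinValue := $80;
    end
    else if (B and $F0) = $E0 then
    begin
      CodePoint := B and $0F; Extra := 2; MinValue := $800;
    end
    else if (B and $F8) = $F0 then
    begin
      CodePoint := B and $07; Extra := 3; MinValue := $10000;
    end
    else
      Exit;              // stray continuation byte or invalid lead byte
    if I + Extra > Len then
      Exit;              // truncated sequence
    for K := 1 to Extra do
    begin
      if (Ord(S[I + K]) and $C0) <> $80 then
        Exit;            // expected a 10xxxxxx continuation byte
      CodePoint := (CodePoint shl 6) or (Ord(S[I + K]) and $3F);
    end;
    if (CodePoint < MinValue) or (CodePoint > $10FFFF) or
       ((CodePoint >= $D800) and (CodePoint <= $DFFF)) then
      Exit;              // overlong, out of range, or surrogate
    Inc(I, Extra + 1);
  end;
  Result := True;
end;

begin
  // 'пишет.txt' encoded as UTF-8: valid, prints TRUE.
  WriteLn(IsValidUtf8(#$D0#$BF#$D0#$B8#$D1#$88#$D0#$B5#$D1#$82'.txt'));
  // The same name in Windows-1251 (one byte per letter): prints FALSE.
  WriteLn(IsValidUtf8(#$EF#$E8#$F8#$E5#$F2'.txt'));
end.
```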

> > Linux is not the problem here. Windows is. You must use the W functions.
>
> We do use the W functions in the GDI backend, as shown in Felipe's post.
>
>   if UnicodeEnabledOS then
> ...   // W functions
>   else
>   // plain functions

And the same must be done for filenames.

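A minimal sketch of what that could look like for fpgFileExists, following the UnicodeEnabledOS pattern quoted above. Assumptions, not the actual fpGUI code: the incoming name is UTF-8 (as TfpgString is in fpGUI), and SketchFileExists and the global flag are hypothetical names. On NT-based Windows the UTF-8 name is decoded to UTF-16 and passed to GetFileAttributesW; on Windows 9x it falls back to the ANSI API, which stays limited to the current code page; elsewhere the UTF-8 bytes are passed through unchanged, as described for Linux earlier in the thread.

```pascal
program WideFileExistsSketch;

{$mode objfpc}{$H+}

uses
  SysUtils{$IFDEF MSWINDOWS}, Windows{$ENDIF};

var
  // Hypothetical flag, named after the one used in the thread.
  UnicodeEnabledOS: Boolean;

// Hypothetical replacement for fpgFileExists; FileName is assumed to be UTF-8.
function SketchFileExists(const FileName: string): Boolean;
{$IFDEF MSWINDOWS}
var
  Attr: DWORD;
  WideName: WideString;
begin
  if UnicodeEnabledOS then
  begin
    // NT/2000/XP and later: UTF-8 -> UTF-16, then the W function, so names
    // outside the current code page still work.
    WideName := UTF8Decode(FileName);
    Attr := GetFileAttributesW(PWideChar(WideName));
  end
  else
    // Windows 9x: only the ANSI API exists, limited to the current code page.
    Attr := GetFileAttributesA(PChar(Utf8ToAnsi(FileName)));
  // $FFFFFFFF is INVALID_FILE_ATTRIBUTES; directories are not counted as files.
  Result := (Attr <> $FFFFFFFF) and ((Attr and FILE_ATTRIBUTE_DIRECTORY) = 0);
end;
{$ELSE}
begin
  // Unix-like systems: filenames are byte strings, UTF-8 passes through.
  Result := FileExists(FileName);
end;
{$ENDIF}

begin
{$IFDEF MSWINDOWS}
  UnicodeEnabledOS := Win32Platform = VER_PLATFORM_WIN32_NT;
{$ELSE}
  UnicodeEnabledOS := True;
{$ENDIF}
  // The running executable itself should exist.
  WriteLn(SketchFileExists(ParamStr(0)));
end.
```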
Mattias



Re: [fpc-pascal] Unicode filenames

2008-06-30 Thread Mattias Gärtner
Quoting Graeme Geldenhuys <[EMAIL PROTECTED]>:

> 2008/6/30 Mattias Gaertner <[EMAIL PROTECTED]>:
> >
> > Can you try to create a Chinese filename in Explorer? For example:
> >
> > 北方话.txt
>
> I can't create that file in Windows Explorer. I'm using Windows 2000
> in a VMware session. I opened Explorer, then copied the filename above
> to the clipboard from Linux (where Firefox and Gmail run). Then I go
> to the Windows VMware session, right-click and create a new text file,
> and paste the Chinese name, which then seems to get escaped.

Maybe that's the way Explorer handles characters that the font does not support.
Windows still has a long way to go for real Unicode.


> When I
> press Enter to confirm the name, Windows gives me an error.
>
> See the screenshot.
>   http://opensoft.homeip.net/~graemeg/test.png

OK. Thanks for trying it out. It's not that important. I think it's clear now
that Windows needs the W functions.


> PS:
> I hope I got the Apache access permissions right for my home folder.
> Let me know if there is a problem accessing the image, then I'll move
> it somewhere else on the web.

Works here.

Mattias



Re: [fpc-pascal] Unicode filenames

2008-06-30 Thread Mattias Gärtner
Quoting Marco van de Voort <[EMAIL PROTECTED]>:

> > 2008/6/30 Mattias Gärtner <[EMAIL PROTECTED]>:
> >
> > >> No idea if this is enough for all cases (I'm not a Unicode guru), but
> > >> Vladimir (a Russian developer) reported that it works for him.
> > >
> > > Of course it works for him - MS keeps things going. As long as a user
> > > stays in his code page it works.
> >
> > Ah, now I get it!!  I'm a bit slow today.  :-)  Using characters
> > outside of the current code page range needs special attention.
> > Thanks for your explanation. I now understand the problem.
> >
> > Makes me wonder how CodeGear is going to handle something like this.
> > Have they released more information on their Unicode support?
>
> They actually use Unicode support, so the NT-only W functions.

Do you know what happened to their plans to create a new string type (a
reference-counted 2-byte string) and to change all String and Char to the
2-byte types? Is this only for .NET or for Win32 too?


Mattias



Re: [fpc-pascal] Unicode filenames

2008-06-30 Thread Marco van de Voort
> Quoting Marco van de Voort <[EMAIL PROTECTED]>:
> > >
> > > Makes me wonder how CodeGear is going to handle something like this.
> > > Have they released more information on their Unicode support?
> >
> > They actually use Unicode support, so the NT-only W functions.
> 
> Do you know what happened to their plans to create a new string type (a
> reference-counted 2-byte string) and to change all String and Char to the
> 2-byte types?

It will be released in August.

> Is this only for .NET or for Win32 too?

Both (and AFAIK .NET already is), but they don't have a Unix to keep track
of, so we can't simply follow them blindly.



Re: [fpc-pascal] Unicode filenames

2008-06-30 Thread Mattias Gärtner
Quoting Marco van de Voort <[EMAIL PROTECTED]>:

> > Quoting Marco van de Voort <[EMAIL PROTECTED]>:
> > > >
> > > > Makes me wonder how CodeGear is going to handle something like this.
> > > > Have they released more information on their Unicode support?
> > >
> > > They actually use Unicode support, so the NT-only W functions.
> >
> > Do you know what happened to their plans to create a new string type (a
> > reference-counted 2-byte string) and to change all String and Char to the
> > 2-byte types?
>
> It will be released in August.

If they are bold enough for this big incompatibility, then maybe we must be
braver too.


> > Is this only for .NET or for Win32 too?
>
> Both (and AFAIK .NET already is), but they don't have a Unix to keep track
> of, so we can't simply follow them blindly.

True.
I guess FPC will support the new string type and create some compiler
directives/flags to control the default string/char type, won't it?

And it will use the widestring manager to auto-convert these types, won't it?

So FPC will be able to compile the new Delphi code, although the code will run
slower and there will be conversion errors on non-ANSI characters.


Mattias



Re: [fpc-pascal] Unicode filenames

2008-06-30 Thread Marco van de Voort
> Quoting Marco van de Voort <[EMAIL PROTECTED]>:
> > > types?
> >
> > It will be released in August.
> 
> If they are bold enough for this big incompatibility, then maybe we must be
> braver too.

What, also stop supporting anything but Windows?

> > Both (and AFAIK .NET already is), but they don't have a Unix to keep
> > track of, so we can't simply follow them blindly.
> 
> True.
> I guess FPC will support the new string type and create some compiler
> directives/flags to control the default string/char type, won't it?

No decision yet. It is not really easy.
 
> So FPC will be able to compile the new Delphi code, although the code will run
> slower and there will be conversion errors on non-ANSI characters.

There is no commitment to remain Delphi-compatible either, not until we 1) know
exactly what Delphi does and 2) know whether we can unify that with a workable
multi-platform vision.

As you said, it is Delphi that breaks in a major way. That doesn't mean we'll
follow.


Re: [fpc-pascal] Unicode filenames

2008-06-30 Thread Michael Van Canneyt


On Mon, 30 Jun 2008, Marco van de Voort wrote:

> > Quoting Marco van de Voort <[EMAIL PROTECTED]>:
> > > > types?
> > >
> > > It will be released in August.
> > 
> > If they are bold enough for this big incompatibility, then maybe we must be
> > braver too.
> 
> What, also stop supporting anything but Windows?
> 
> > > Both (and AFAIK .NET already is), but they don't have a Unix to keep
> > > track of, so we can't simply follow them blindly.
> > 
> > True.
> > I guess FPC will support the new string type and create some compiler
> > directives/flags to control the default string/char type, won't it?
> 
> No decision yet. It is not really easy.
>  
> > So FPC will be able to compile the new Delphi code, although the code will run
> > slower and there will be conversion errors on non-ANSI characters.
> 
> There is no commitment to remain Delphi-compatible either, not until we 1) know
> exactly what Delphi does and 2) know whether we can unify that with a workable
> multi-platform vision.
>
> As you said, it is Delphi that breaks in a major way. That doesn't mean we'll
> follow.

Given that most polls I've seen indicate that most people still use D7,
I don't think we should be too eager to jump into the void ourselves...

Of course a decision will be forced on us anyway in the long run :-)

Michael.


[fpc-pascal] Unicode file routines proposal

2008-06-30 Thread Felipe Monteiro de Carvalho
Hello,

There is already another thread about that, but the thread got too
long, and I would like to make a concrete proposal about unicode file
routines.

It looks simple to me: there are just two ways to go, either UTF-8 or
UTF-16. Correct me if I am wrong, but I believe that FPC developers
prefer UTF-16, so we can have a WideString version of every routine in
the RTL that involves filenames.

So let me start with a concrete example:

http://www.freepascal.org/docs-html/rtl/system/assign.html

We would need to add a:

procedure Assign(
  var f: ;
  const Name: widestring
);

Also for all this routines:

http://www.freepascal.org/docs-html/rtl/sysutils/filenameroutines.html

Under Windows, with Windows 9x support, it can be implemented like this:

procedure AnyFileRoutineInWin32(AFileName: widestring);
begin
 if UnicodeEnabledOS then SomeWin32APIW()
 else AnsiToWideString(SomeWin32ApiA())
end;

One can initialize UnicodeEnabledOS by reading the operating system
version and the operating system type NT/9x very easily.

Under Windows 9x we won't support true Unicode filenames, but this
doesn't matter, because the operating system doesn't support them
anyway. The widestring routines will keep working under Windows 9x for
most code.

This method is used with great success in the LCL. Extended information here:

http://wiki.lazarus.freepascal.org/LCL_Unicode_Support#Guidelines

-- 
Felipe Monteiro de Carvalho


[fpc-pascal] UTF8String

2008-06-30 Thread Mattias Gärtner
Quoting Marco van de Voort <[EMAIL PROTECTED]>:

> > Zitat von Marco van de Voort <[EMAIL PROTECTED]>:
> > > > types?
> > >
> > > It will be released in August.
> >
> > If they are bold enough for this big incompatibility, then maybe we must be
> > braver too.
>
> What, also stop supporting anything but Windows?

;)

I meant: break some compatibility too.

At the moment the RTL uses Ansi, i.e. the current code page, which covers
only a subset of Unicode. So any filename that goes through these functions,
even indirectly, is limited to the current code page.

One solution is to create a new string type, UTF8String.
Similar to the AnsiString/WideString auto conversions, there would be an auto
conversion between AnsiString and UTF8String, iff there is a UTF-8 string manager.
The RTL and FCL can use it for filenames and similar things.
There should be some compiler flags to set the default string type to either
AnsiString or UTF8String.
All Lazarus-related sources would set the default string type to UTF8String and
would load a UTF-8 string manager.

Drawbacks:
- If a UTF-8 string manager is enabled, old code that uses AnsiString would
recode filenames twice under Windows: from AnsiString to UTF-8 to WideString. I
guess in the case of filenames this is hardly measurable.

- Mixing code compiled with ansistring and compiled with UTF8String will be
slower due to the conversions. Especially when exchanging big lists of string
like TStringList.Assign. Remedy: enable UTF8String and fix the few issues. This
might be arbitrarily difficult and/or unpleasant. But hey, CodeGear says that even
switching from ansistring to widestring is easy. And of course you can still use
'ansistring' explicitly.

- Mixing code compiled with ansistring and compiled with UTF8String with dirty
typecasts will create hidden conversion errors. Remedy: Same as above.

- If the most common RTL/FCL functions are *not* converted to UTF8String then
all UTF8String programs will become very slow and might even get conversion
errors, due to typecasts. So, if RTL/FCL don't use it, no one will.


Mattias



Re: [fpc-pascal] Unicode file routines proposal

2008-06-30 Thread Mattias Gärtner
Quoting Felipe Monteiro de Carvalho <[EMAIL PROTECTED]>:

> Hello,
>
> There is already another thread about that, but the thread got too
> long, and I would like to make a concrete proposal about unicode file
> routines.
>
> It looks simple to me: there are just two ways to go, either UTF-8 or
> UTF-16. Correct me if I am wrong, but I believe that FPC developers
> prefer UTF-16, so we can have a WideString version of every routine in
> the RTL that involves filenames.
>
> So let me start with a concrete example:
>
> http://www.freepascal.org/docs-html/rtl/system/assign.html
>
> We would need to add a:
>
> procedure Assign(
>   var f: ;
>   const Name: widestring
> );
>
> Also for all of these routines:
>
> http://www.freepascal.org/docs-html/rtl/sysutils/filenameroutines.html
>
> Under Windows, with Windows 9x support, it can be implemented like this:
>
> procedure AnyFileRoutineInWin32(AFileName: widestring);
> begin
>  if UnicodeEnabledOS then SomeWin32APIW()
>  else AnsiToWideString(SomeWin32ApiA())
> end;

But what about all existing code?
For example the FCL?
How will TStringList.LoadFromFile be converted?


> One can initialize UnicodeEnabledOS by reading the operating system
> version and the operating system type NT/9x very easily.
>
> Under Windows 9x we won't support true Unicode filenames, but this
> doesn't matter, because the operating system doesn't support them
> anyway. The widestring routines will keep working under Windows 9x for
> most code.
>
> This method is used with great success in the LCL. Extended information here:
>
> http://wiki.lazarus.freepascal.org/LCL_Unicode_Support#Guidelines

Mattias



Re: [fpc-pascal] Unicode file routines proposal

2008-06-30 Thread Marco van de Voort
> There is already another thread about that, but the thread got too
> long, and I would like to make a concrete proposal about unicode file
> routines.
> 
> It looks simple to me: there are just two ways to go, either UTF-8 or
> UTF-16.

There are more possibilities:
- native encoding (utf-8 on *nix, utf-16 on windows)
- have two types.
- a unified type (the type contains its encoding)

Even this has not been decided.
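The first option could look roughly like the following per-platform type alias. This is only an illustration under assumptions: TNativeFileName and ToNativeFileName are hypothetical names, the incoming name is assumed to be UTF-8, and (as the message says) nothing has been decided.

```pascal
program NativeEncodingSketch;

{$mode objfpc}{$H+}

type
{$IFDEF MSWINDOWS}
  // Windows: the W API functions take UTF-16.
  TNativeFileName = WideString;
{$ELSE}
  // Unix-like systems: filenames are byte strings, here assumed to be UTF-8.
  TNativeFileName = string;
{$ENDIF}

// Hypothetical conversion from a UTF-8 encoded name to the native type.
function ToNativeFileName(const Utf8Name: string): TNativeFileName;
begin
{$IFDEF MSWINDOWS}
  Result := UTF8Decode(Utf8Name);  // UTF-8 -> UTF-16
{$ELSE}
  Result := Utf8Name;              // already native, no conversion
{$ENDIF}
end;

const
  // 'пишет.txt' written as UTF-8 bytes.
  RussianName = #$D0#$BF#$D0#$B8#$D1#$88#$D0#$B5#$D1#$82'.txt';

begin
  // Length counts code units of the native type: 14 bytes on Unix,
  // 9 UTF-16 code units on Windows.
  WriteLn(Length(ToNativeFileName(RussianName)));
end.
```

The differing Length results show the trade-off behind the portability question raised in the reply: code that indexes into names has to know which encoding it is dealing with.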


Re: [fpc-pascal] Unicode file routines proposal

2008-06-30 Thread Vincent Snijders

Marco van de Voort wrote:
>> It looks simple to me: there are just two ways to go, either UTF-8 or
>> UTF-16.
>
> There are more possibilities:
> - native encoding (utf-8 on *nix, utf-16 on windows)
> - have two types.

How can one write portable code with these options?

> - a unified type (the type contains its encoding)



Vincent


Re: [fpc-pascal] Unicode file routines proposal

2008-06-30 Thread Graeme Geldenhuys
2008/6/30 Felipe Monteiro de Carvalho <[EMAIL PROTECTED]>:
> It looks simple to me: there are just two ways to go, either UTF-8 or
> UTF-16. Correct me if I am wrong, but I believe that FPC developers
> prefer UTF-16, so we can have a WideString version of every routine in
> the RTL that involves filenames.


I thought UTF-8 was preferred. That's why Lazarus followed the
UTF-8 route for the LCL's Unicode support.


Regards,
 - Graeme -




Re: [fpc-pascal] Unicode file routines proposal

2008-06-30 Thread Felipe Monteiro de Carvalho
On Mon, Jun 30, 2008 at 9:31 AM, Mattias Gärtner
<[EMAIL PROTECTED]> wrote:
> But what about all existing code?
> For example the FCL?
> How will TStringList.LoadFromFile be converted?

TStringList.LoadFromFile(AFileName: WideString); overload;

The ansi version could call the wide version and just do the string conversion.
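A minimal sketch of that forwarding scheme, assuming a current Free Pascal (3.x) in objfpc mode. `LoadTextFile` is a hypothetical stand-in for a routine like `TStringList.LoadFromFile`, not an existing RTL/FCL API: the WideString overload is where the real wide OS call would go, and the AnsiString overload only converts and delegates.

```pascal
program WideForwardingSketch;

{$mode objfpc}{$H+}

// Hypothetical routine standing in for something like
// TStringList.LoadFromFile. The WideString overload holds the real logic;
// here it just echoes the name it received.
function LoadTextFile(const AFileName: WideString): WideString; overload;
begin
  // A real implementation would hand AFileName to a wide (UTF-16) OS API.
  Result := 'wide:' + AFileName;
end;

// The AnsiString overload converts once and forwards, so only one
// implementation has to be maintained.
function LoadTextFile(const AFileName: AnsiString): WideString; overload;
begin
  Result := LoadTextFile(WideString(AFileName));
end;

var
  AnsiName: AnsiString;
  WideName: WideString;
begin
  AnsiName := 'readme.txt';
  WideName := 'readme.txt';
  // Both calls end up in the WideString overload.
  WriteLn(LoadTextFile(AnsiName) = LoadTextFile(WideName));
end.
```

The cost of this design is one conversion per ansi call; the alternative Felipe mentions later in the thread is implementing both versions separately for speed.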

-- 
Felipe Monteiro de Carvalho


Re: [fpc-pascal] Unicode file routines proposal

2008-06-30 Thread Felipe Monteiro de Carvalho
On Mon, Jun 30, 2008 at 9:55 AM, Graeme Geldenhuys
<[EMAIL PROTECTED]> wrote:
> I thought UTF-8 was preferred. Hence the reason Lazarus followed the
> UTF-8 route in LCL and Unicode support.

UTF-8 is much better for the LCL because it fits much better into
our existing codebase.

For the RTL we would also like to have UTF-8, but in previous
conversations I got the impression that RTL developers prefer UTF-16.

-- 
Felipe Monteiro de Carvalho


Re: [fpc-pascal] Unicode file routines proposal

2008-06-30 Thread Marco van de Voort
> Marco van de Voort wrote:
> >> It looks simple to me, there are just 2 ways to go, either utf-8 or
> >> utf-16. 
> > 
> > There are more possibilities:
> > - native encoding (utf-8 on *nix, utf-16 on windows)
> > - have two types.
> 
> How can one write portable code with these options?

How can you consider yourself portable by picking one system's encoding, and
emulating it on others?

Note also that reliance on encoding is way less important, since fewer
people will be parsing through strings manually (simply because it is more
difficult)


Re: [fpc-pascal] Unicode file routines proposal

2008-06-30 Thread Vincent Snijders

Marco van de Voort wrote:
>> Marco van de Voort wrote:
>>>> It looks simple to me, there are just 2 ways to go, either utf-8 or
>>>> utf-16.
>>>
>>> There are more possibilities:
>>> - native encoding (utf-8 on *nix, utf-16 on windows)
>>> - have two types.
>>
>> How can one write portable code with these options?
>
> How can you consider yourself portable by picking one system's encoding, and
> emulating it on others?

At the borders of my program I convert all strings to the 'internal type' and
encoding, and use them like that. Kind of like we are doing nowadays to convert
the line endings in text files.
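To make "convert at the borders" concrete, here is a minimal sketch assuming UTF-8 as the (hypothetical) internal encoding and UTF-16 WideString as what an external interface hands over. `FromBorder` and `ToBorder` are illustrative names; `UTF8Encode` and `UTF8Decode` are the conversion functions from Free Pascal's System unit.

```pascal
program BorderConversionSketch;

{$mode objfpc}{$H+}

// Hypothetical border helpers: the program works in UTF-8 internally,
// and text is converted only where it crosses an external interface
// that deals in UTF-16 (WideString).
function FromBorder(const S: WideString): UTF8String;
begin
  Result := UTF8Encode(S);   // UTF-16 in, UTF-8 out
end;

function ToBorder(const S: UTF8String): WideString;
begin
  Result := UTF8Decode(S);   // UTF-8 in, UTF-16 out
end;

var
  OuterText: WideString;
  InnerText: UTF8String;
begin
  // Russian for "Cancel", spelled with code point constants so the
  // example does not depend on the encoding of this source file.
  OuterText := #$041E#$0442#$043C#$0435#$043D#$0430;
  InnerText := FromBorder(OuterText);
  // 6 UTF-16 code units become 12 UTF-8 bytes; the round trip back
  // across the border should restore the original text.
  WriteLn(Length(OuterText), ' ', Length(InnerText), ' ',
          ToBorder(InnerText) = OuterText);
end.
```

Each crossing costs a conversion, which is the overhead Marco objects to for things like bulk database exports.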


I see what you are trying to say, but a string type that is UTF-8 encoded on
one system and UTF-16 encoded on another doesn't seem easy to work with to
me, even if you name it, for example, RTLString. Even widestring is an example
of bad portability, because it is refcounted everywhere except on Windows.

> Note also that reliance on encoding is way less important, since fewer
> people will be parsing through strings manually (simply because it is more
> difficult)

Right, but they rely on not having to convert it all the time.

ATM, all the client libs above the RTL have chosen one encoding and string
type: LCL and fpGUI: UTF-8; MSEgui: widestring.

So for those libs to interface with a platform-dependent string type in the
LCL, they would have to write platform-dependent code. I don't much feel like
writing an LCLSysutils.FileExists, as Graeme already has done, to hide these
conversions.


Vincent


Re: [fpc-pascal] Unicode file routines proposal

2008-06-30 Thread Marco van de Voort
> Marco van de Voort wrote:
> At the borders of my program I convert all strings to the 'internal type'
> and encoding, and use them like that. Kind of like we are doing nowadays
> to convert the line endings in text files.

I don't like this. This makes, e.g., processing a database export on Unix
unnecessarily costly.
 
> I see what you are trying to say, but having a string type that is UTF8
> encoded on one system and UTF16 encoded on another system, doesn't seem
> easy to work with to me, even if you name it for example RTLString. 

It should be possible to work in the native encoding. One doesn't want to
wrap _every_ function in _every_ header with conversion procs.

> > Note also that reliance on encoding is way less important, since fewer
> > people will be parsing through strings manually (simply because it is more
> > difficult)
> 
> Right, but they rely on not having to convert it all the time.

Well, they will have to do that with one string type too, at every external
barrier.

That also kills the benefit of choosing UTF-16 in the first place, since
Delphi code won't work on Unix without manually inserting a lot of
conversion code.
 
> ATM, all the client libs above the RTL have chosen one encoding and string
> type: LCL and fpGUI: UTF-8; MSEgui: widestring.

That has nothing to do with these decisions. They chose that in the absence
of a good solution. This is about picking a good solution.
 
> So for those libs to interface with a platform dependent string type in
> the LCL, they would have to write platform dependent code.

You will have to anyway for any solution that only supports one encoding.


Re: [fpc-pascal] Unicode file routines proposal

2008-06-30 Thread Felipe Monteiro de Carvalho
On Mon, Jun 30, 2008 at 10:32 AM, Marco van de Voort <[EMAIL PROTECTED]> wrote:
> It should be possible to work in the native encoding. One doesn't want to
> wrap _every_ function in _every_ header with conversion procs.

It is not possible to work with an ever-changing encoding.

MyLabel.Caption := 'Lição';

How would that ever work with an ever-changing encoding? It would not.

If you go to the real implementation level, a changing encoding quickly
becomes unmanageable.

And what about the LFM files? In which encoding will they be? What if
you develop software on one system and try to build it on another?

Ok, to go one step further: has anyone ever seen a fully Unicode
system which works with changing encodings? I believe none exists,
because this is not a good solution.

> Well, they will have to do that with one string type too, at every external
> barrier.

This is already necessary.

> That also kills the benefit of choosing UTF-16 in the first place, since
> Delphi code won't work on Unix without manually inserting a lot of
> conversion code.

Delphi code can use the ansi routines, which could just call the
utf-16 routines with a string conversion, or you can implement every
routine twice to maximize speed.

-- 
Felipe Monteiro de Carvalho


Re: [fpc-pascal] Unicode file routines proposal

2008-06-30 Thread Marco van de Voort
> On Mon, Jun 30, 2008 at 10:32 AM, Marco van de Voort <[EMAIL PROTECTED]> wrote:
> > It should be possible to work in the native encoding. One doesn't want to
> > wrap _every_ function in _every_ header with conversion procs.
> 
> It is not possible to work with an ever-changing encoding.
> 
> MyLabel.Caption := 'Lição';
> 
> How would that ever work with an ever-changing encoding? It would not.

Encoding in source is something totally different. '\u1232\u2314'-like
syntax can be changed to UTF-8/16 by the compiler. In theory, I think;
practice might be different.
 
> If you go to the real implementation level, a changing encoding quickly
> becomes unmanageable.

That's why I don't believe one string type with two encodings helps. But if
FileExists is UTF-8 on Unix and UTF-16 on Windows, and any UTF-16 or UTF-8
string that you pass from Lazarus is auto-converted, what exactly is the
problem? Everybody can maintain certain subsystems in a certain encoding,
but doesn't force that choice upon others.
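A minimal sketch of what the native-encoding option could look like in Free Pascal 3.x. `TNativeString` is a hypothetical type (the kind of thing Vincent calls RTLString earlier in the thread), and `ToNative`/`FromNative` are illustrative helpers, not RTL functions; the application side is assumed to hold UTF-8, as Lazarus does.

```pascal
program NativeStringSketch;

{$mode objfpc}{$H+}

type
  // Hypothetical platform-native string type for file names.
{$IFDEF WINDOWS}
  TNativeString = WideString;   // UTF-16, as the Windows "W" APIs expect
{$ELSE}
  TNativeString = UTF8String;   // UTF-8, the usual choice on Unix-likes
{$ENDIF}

// Convert application-side UTF-8 (as Lazarus uses) to the native type.
function ToNative(const S: UTF8String): TNativeString;
begin
{$IFDEF WINDOWS}
  Result := UTF8Decode(S);
{$ELSE}
  Result := S;                  // already native: no conversion needed
{$ENDIF}
end;

// Convert the native type back to application-side UTF-8.
function FromNative(const S: TNativeString): UTF8String;
begin
{$IFDEF WINDOWS}
  Result := UTF8Encode(S);
{$ELSE}
  Result := S;
{$ENDIF}
end;

var
  Wide: WideString;
  AppName: UTF8String;
  Native: TNativeString;
begin
  Wide := #$041E#$0442#$043C#$0435#$043D#$0430;  // Russian "Cancel"
  AppName := UTF8Encode(Wide);
  Native := ToNative(AppName);
  // Length counts code units, so this prints 6 on Windows and 12 elsewhere.
  WriteLn('native code units: ', Length(Native));
  WriteLn('round trip ok: ', FromNative(Native) = AppName);
end.
```

On Unix the conversion disappears entirely, which is Marco's point; the `Length(Native)` line also shows Vincent's objection, since the same expression gives a different answer per platform.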

> And what about the LFM files? In which encoding will they be?

The one you annotate in it? The loading code can decode both, since both
systems have both?

> What if you develop software on one system and try to build it on
> another?

What does that mean for the fully UTF-16 system? For a start, you would
have to wrap all the C APIs that use UTF-8 on Unix.

I understand the simplicity of one encoding is appealing, but you have to
look at all aspects, and that is not just representation in the GUI.

It will mean that _every_ string transaction with the outside will have to be
manually wrapped AND have a performance penalty. That is a heavy price to
pay for not touching a bit of lfm loading code.
 
> Ok, to go one step further: has anyone ever seen a fully Unicode
> system which works with changing encodings? I believe none exists,
> because this is not a good solution.

How many systems do you know of that share data files like .lfm's across
system borders?
 
> > Well, they will have to do that with one string type too, at every
> > external barrier.
> 
> This is already necessary.

But if you type them properly, some conversions may be automatic. That is
something you don't get with a single type.
 
> > That also kills the benefit of choosing UTF-16 in the first place, since
> > Delphi code won't work on Unix without manually inserting a lot of
> > conversion code.
> 
> Delphi code can use the ansi routines, which could just call the
> utf-16 routines with a string conversion, or you can implement every
> routine twice to maximize speed.

If the Unicode code is not compatible with Delphi (UTF-16), there is no
point in using UTF-16 in the first place.


Re: [fpc-pascal] Unicode file routines proposal

2008-06-30 Thread John Coppens
On Mon, 30 Jun 2008 10:03:18 -0300
"Felipe Monteiro de Carvalho" <[EMAIL PROTECTED]> wrote:

> On Mon, Jun 30, 2008 at 9:55 AM, Graeme Geldenhuys
> <[EMAIL PROTECTED]> wrote:
> > I thought UTF-8 was preferred. Hence the reason Lazarus followed the
> > UTF-8 route in LCL and Unicode support.
> 
> UTF-8 is much better for the LCL because it fits much better into
> our existing codebase.

This may have been discussed before - but should the encoding not be
dependent on the locale? What would happen if I write an FPC program
whose internal routines are, e.g., UTF-16, and my locale is set to
en_US.UTF8?

Anyway, I have the impression that most of Linux is utf-8 oriented by now.

John


Re: [fpc-pascal] Unicode file routines proposal

2008-06-30 Thread Felipe Monteiro de Carvalho
On Mon, Jun 30, 2008 at 11:35 AM, Marco van de Voort <[EMAIL PROTECTED]> wrote:
> I understand the simplicity of one encoding is appealing, but you have to
> look at all aspects, and that is not just representation in the GUI.
>
> It will mean that _every_ string transaction with the outside will have to be
> manually wrapped AND have a performance penalty. That is a heavy price to
> pay for not touching a bit of lfm loading code.

It won't need to be wrapped on platforms which natively support the
chosen encoding. UTF-16 is natively supported on Windows and Windows
CE. Not sure about Unixes.

Because the LCL uses a single encoding, this performance difference
disappears as soon as you need to convert the string in the LCL.

> How many systems do you know of that share data files like .lfm's across
> system borders?

Gtk can load XML files, somewhat equivalent to our LFMs. They use
UTF-8 everywhere.

Java is cross-platform and uses UTF-16 everywhere.

wxWidgets uses UTF-16 everywhere.

Let me try to summarize my opinion on multiple encodings vs. a single encoding:

multiple encodings:

* More complex
* Innovative solution: no known implementation of this system exists, so
it is uncertain whether it works at all, or whether it is convenient for
developers
* Depends on a not-yet-implemented string type
* Potentially higher performance than a single-encoding system, but only
if you use this new special string type

Single encoding:

* Simple, proven solution
* Does not need any new string type, so implementation can start immediately
* Potentially lower performance due to string conversions

Actually, for Lazarus the only advantage I see in the multiple-encoding
system does not exist: because we use a single encoding, on some platforms
we will need conversion and on others we won't, which just makes things
worse for us.

-- 
Felipe Monteiro de Carvalho


Re: [fpc-pascal] Unicode file routines proposal

2008-06-30 Thread Luca Olivetti

John Coppens wrote:
> This may have been discussed before - but should the encoding not be
> dependent on the locale? What would happen if I write an FPC program
> whose internal routines are, e.g., UTF-16, and my locale is set to
> en_US.UTF8?
>
> Anyway, I have the impression that most of Linux is utf-8 oriented by now.

Well, yes, but that's the external representation.
I'd say take a look at how Python managed to integrate Unicode support:

http://www.google.com/search?domains=www.python.org&sitesearch=www.python.org&sourceid=google-search&q=unicode&submit=search

Bye
--
Luca



Re: [fpc-pascal] Unicode file routines proposal

2008-06-30 Thread Martin Schreiber
On Monday 30 June 2008 22.19:49 Luca Olivetti wrote:
> John Coppens wrote:
> > This may have been discussed before - but should the encoding not be
> > dependent on the locale? What would happen if I write an FPC program
> > whose internal routines are, e.g., UTF-16, and my locale is set to
> > en_US.UTF8?
> >
> > Anyway, I have the impression that most of Linux is utf-8 oriented by
> > now.
>
> Well, yes, but that's the external representation.
> I'd say take a look at how Python managed to integrate Unicode support:
>
> http://www.google.com/search?domains=www.python.org&sitesearch=www.python.org&sourceid=google-search&q=unicode&submit=search
>
They have a UTF-16/UCS-2 internal representation, same as MSEgui, which works
very well and is fast and handy, BTW.
What is missing is a reference-counted widestring type on Windows. ;-)

Martin