Re: [fpc-pascal] fpc/lazarus and kinect sample

2016-05-11 Thread Björn Lundin
On 2016-05-10 23:41, wkitt...@windstream.net wrote:
> On 05/10/2016 04:33 PM, Björn Lundin wrote:
>> I found the nice article "Programming the Microsoft Kinect in Pascal" by
>> Michaël Van Canneyt, in Blaise Pascal Magazine. (Nr 5 / 2013)
>>
>> However, I wonder if anyone has seen the sample code
>> somewhere published on the web ?
>> I do not get it all quite together as is,
>> and copy/paste from a 2 columns pdf is challenging to say the least.
> 
> doesn't your pdf reader have an "export as text" option? that's what i
> use when i have to convert pdfs for some blind friends so they can enjoy
> them, too...
> 

I did try pdftotext, but it made a mess of the article.
The two columns were merged into one, intertwined with each other.

My current reader, Adobe (Win 7), does not have an export to text
option. It does have an export to Word/Excel/RTF/PPT option,
but only if you have a paid account at Adobe.

However, I'm not quite sure I'll get the sample correct,
even if I do manage to copy/paste.

But if it is not available, then it isn't...


--
Björn
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] Warning not to use the "String" type with FPC 3.x

2016-05-11 Thread Mark Morgan Lloyd

Mark Morgan Lloyd wrote:

Graeme Geldenhuys wrote:
On 2016-05-09 17:40, Mark Morgan Lloyd wrote:
> What, /exactly/, are



So which of these are you complaining about:

a) AnsiString doesn't support codepoints > 0xff ?

b) AnsiString doesn't support codepoints > 0x7f ?

c) AnsiString might apply an inappropriate translation for a codepoint 
<= 0x7f ?


Come on Graeme, I'm waiting for your explicit statement here rather than
this continual FUD that you seem to enjoy injecting.


--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]


Re: [fpc-pascal] RTL and Unicode Strings

2016-05-11 Thread LacaK





> How does the sqldb package handle this point?


sqlDB does not perform any character translation.
It only stores data in record buffers as they arrive.
So it expects the programmer to be aware of that and to set the correct
"connection encoding".
In the case of Lazarus it is often UTF-8, because Lazarus expects
character data to be UTF-8 encoded (at least it used to).
So the programmer must set the connection encoding to UTF-8; the data then
arrive UTF-8 encoded, and sqlDB only stores them and forwards them to, for
example, data-aware controls for display.

-Laco.



Re: [fpc-pascal] fpc/lazarus and kinect sample

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Björn Lundin wrote:

> On 2016-05-10 23:41, wkitt...@windstream.net wrote:
>> On 05/10/2016 04:33 PM, Björn Lundin wrote:
>>> I found the nice article "Programming the Microsoft Kinect in Pascal" by
>>> Michaël Van Canneyt, in Blaise Pascal Magazine. (Nr 5 / 2013)
>>>
>>> However, I wonder if anyone has seen the sample code
>>> somewhere published on the web ?
>>> I do not get it all quite together as is,
>>> and copy/paste from a 2 columns pdf is challenging to say the least.
>>
>> doesn't your pdf reader have an "export as text" option? that's what i
>> use when i have to convert pdfs for some blind friends so they can enjoy
>> them, too...
>
> I did try pdftotext, but it made a mess of the article.
> The two columns were merged into one, intertwined with each other.
>
> My current reader, Adobe (Win 7), does not have an export to text
> option. It does have an export to Word/Excel/RTF/PPT option,
> but only if you have a paid account at Adobe.
>
> However, I'm not quite sure I'll get the sample correct,
> even if I do manage to copy/paste.
>
> But if it is not available, then it isn't...


No worries. I will send it to you, if I can still find it :-)

Michael.

Re: [fpc-pascal] RTL and Unicode Strings

2016-05-11 Thread Jonas Maebe

Mazo Winst wrote:

> Suppose that my app needs to read a file encoded with UTF-8. Suppose
> that my app runs on Windows, where the system code page is most likely to
> be Windows ANSI. Since the RTL will use the system code page, Windows ANSI
> doesn't support the full range of Unicode characters, and I need to use the
> RTL to read the file, what should I do to prevent data loss?


If by reading you mean read/readln, then you can use 
http://www.freepascal.org/docs-html/rtl/system/settextcodepage.html to 
specify to the RTL what the encoding is of the text file you are reading.
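A minimal sketch of the read/readln route, assuming FPC 3.x: the program first writes a small file containing the raw UTF-8 bytes of "é" (the file name is a hypothetical scratch file), then declares the text file as UTF-8 with SetTextCodePage before reading it back. The cwstring unit is pulled in on Unix so that the RTL can perform code page conversions there.

```pascal
program ReadUtf8TextFile;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif} // code page conversion support on Unix
  SysUtils;

const
  SampleFile = 'utf8-sample.txt'; // hypothetical scratch file
  // "é" (U+00E9) encoded as UTF-8, followed by a line feed
  SampleBytes: array[0..2] of Byte = ($C3, $A9, $0A);

var
  RawFile: file;
  F: Text;
  Line: UTF8String;
  U: UnicodeString;
begin
  // Write the raw bytes, so the file content does not depend on any code page.
  AssignFile(RawFile, SampleFile);
  Rewrite(RawFile, 1);
  BlockWrite(RawFile, SampleBytes, SizeOf(SampleBytes));
  CloseFile(RawFile);

  AssignFile(F, SampleFile);
  Reset(F);
  // Tell the RTL that this text file is UTF-8 encoded, instead of
  // assuming the default system code page.
  SetTextCodePage(F, CP_UTF8);
  ReadLn(F, Line);
  CloseFile(F);
  DeleteFile(SampleFile);

  U := UnicodeString(Line); // UTF-8 -> UTF-16
  Writeln(Length(U));       // 1
  Writeln(Ord(U[1]));       // 233 (U+00E9)
end.
```

Reading into a UTF8String keeps the bytes as they are in the file; the single UTF-8 to UTF-16 conversion happens on the final assignment.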


In other cases, like LacaK said, you will have to read the data as plain 
bytes into e.g. a RawByteString and next use 
http://www.freepascal.org/docs-html/rtl/system/setcodepage.html (with 
the last parameter set to "false") to afterwards specify the code page 
this data has.
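For the plain-bytes route, a sketch along these lines loads a file's bytes unchanged into a RawByteString via TFileStream and then only relabels them with SetCodePage(..., False). The helper name LoadFileAsCodePage and the scratch file are hypothetical; the actual UTF-8 decoding happens only when the result is assigned to a UnicodeString.

```pascal
program RawBytesWithCodePage;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif} // code page conversion support on Unix
  SysUtils, Classes;

{ Hypothetical helper: returns the file's bytes untouched, labelled with
  the given code page. }
function LoadFileAsCodePage(const FileName: string;
  CodePage: TSystemCodePage): RawByteString;
var
  Stream: TFileStream;
begin
  Result := '';
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Result, Stream.Size);
    if Length(Result) > 0 then
      Stream.ReadBuffer(Result[1], Length(Result));
  finally
    Stream.Free;
  end;
  // Convert = False: only record the code page, do not touch the bytes.
  SetCodePage(Result, CodePage, False);
end;

const
  SampleFile = 'raw-sample.txt'; // hypothetical scratch file
  // "café" encoded as UTF-8: the "é" takes two bytes
  SampleBytes: array[0..4] of Byte = ($63, $61, $66, $C3, $A9);

var
  OutStream: TFileStream;
  S: RawByteString;
  U: UnicodeString;
begin
  OutStream := TFileStream.Create(SampleFile, fmCreate);
  try
    OutStream.WriteBuffer(SampleBytes, SizeOf(SampleBytes));
  finally
    OutStream.Free;
  end;

  S := LoadFileAsCodePage(SampleFile, CP_UTF8);
  DeleteFile(SampleFile);

  U := S;              // decoded as UTF-8 because of the code page label
  Writeln(Length(S));  // 5 bytes
  Writeln(Length(U));  // 4 characters
  Writeln(Ord(U[4]));  // 233 (U+00E9)
end.
```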



Jonas


Re: [fpc-pascal] Invoking methods through rtti

2016-05-11 Thread Michael Schnell

On 05/09/2016 05:36 PM, Sven Barth wrote:

> As long as you know the signature of the method you can cast the
> pointer to the method to an appropriate method variable and call that.
> For dynamic calls you'll need to wait for Invoke() support which is on
> the ToDo list, but there's no ETA yet.
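The static case can be sketched like this, assuming the target is a published method so that TObject.MethodAddress can find it by name; the class TGreeter and the method type TGreetMethod are made up for illustration. The code address and the instance are put into a TMethod record, which is then cast to a method type with the known signature.

```pascal
program CallThroughMethodPointer;

{$mode objfpc}{$H+}

type
  // The signature must match the target method exactly;
  // nothing checks this at runtime.
  TGreetMethod = procedure(const AName: string) of object;

  {$M+} // enable RTTI for published members
  TGreeter = class
  published
    procedure Greet(const AName: string);
  end;
  {$M-}

procedure TGreeter.Greet(const AName: string);
begin
  Writeln('Hello, ', AName);
end;

var
  Greeter: TGreeter;
  M: TMethod;
  GreetProc: TGreetMethod;
begin
  Greeter := TGreeter.Create;
  try
    // Look up the code address of a published method by name ...
    M.Code := Greeter.MethodAddress('Greet');
    // ... and pair it with the instance that becomes Self.
    M.Data := Greeter;
    if M.Code <> nil then
    begin
      GreetProc := TGreetMethod(M);  // cast to the known signature
      GreetProc('world');            // prints "Hello, world"
    end;
  finally
    Greeter.Free;
  end;
end.
```

If the cast type does not match the real signature, the call compiles but corrupts the stack or registers at runtime, which is exactly what a future Invoke() would guard against.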


Will this result in a new kind of Pascal Script support? (In the German
forum there have recently been discussions about Pascal Script not working
properly on 64-bit architectures.)


-Michael


Re: [fpc-pascal] The world is ending

2016-05-11 Thread Michael Schnell

On 05/09/2016 09:34 PM, Jonas Maebe wrote:
> While still missing in the documentation, you can already do that with
> {$modeswitch unicodestrings}.


If this avoids the issues Graeme found, why is it not enabled by
default, both in user code and in the RTL interface?


-Michael



Re: [fpc-pascal] Smart Pointers

2016-05-11 Thread Marco van de Voort
In our previous episode, Florian Klämpfl said:
> > whatever...
> 
> If it depends on classes: fcl-generic, else RTL-generic.

Some rtl-objpas units use classes, like fmtbcd.

If you don't use fcl-base I would make it RTL, so we can use it in fcl-base
if needed :-)


[fpc-pascal] {$modeswitch unicodestring} (was: Re: The world is ending)

2016-05-11 Thread Jonas Maebe


Michael Schnell wrote on Tue, 10 May 2016:

> On 05/09/2016 09:34 PM, Jonas Maebe wrote:
>> While still missing in the documentation, you can already do that
>> with {$modeswitch unicodestrings}.
>
> If this avoids the issues,


It does not avoid "the issues". All that modeswitch does is make  
"String" an alias for "Unicodestring" in the declaring module.



Jonas


Re: [fpc-pascal] {$modeswitch unicodestring}

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 09:58, Jonas Maebe wrote:
> It does not avoid "the issues". All that modeswitch does is make  
> "String" an alias for "Unicodestring" in the declaring module.

Am I correct in understanding that it only affects individual units? So
using it in my application would not affect the meaning of String in
the RTL, for instance (String in the RTL still means
AnsiString())?

Regards,
  Graeme



Re: [fpc-pascal] {$modeswitch unicodestring}

2016-05-11 Thread Jonas Maebe


Graeme Geldenhuys wrote on Wed, 11 May 2016:

> On 2016-05-11 09:58, Jonas Maebe wrote:
>> It does not avoid "the issues". All that modeswitch does is make
>> "String" an alias for "Unicodestring" in the declaring module.
>
> Am I correct in understanding that it only affects individual units? So
> using it in my application would not affect the meaning of String in
> the RTL, for instance (String in the RTL still means
> AnsiString())?


Indeed, just like any other {$mode xxx} and {$modeswitch xxx}  
directive. I don't think we have any directive or switch that can  
cross compilation unit boundaries.


Technically, it would also be very hard to make e.g. {$modeswitch  
unicodestring} affect other units that you use. After all, the source  
code for those other units may not be available, and they would have  
to be recompiled if the meaning of "string" changed in their source  
code.



Jonas


Re: [fpc-pascal] The world is ending

2016-05-11 Thread Michael Van Canneyt



On Tue, 10 May 2016, Michael Schnell wrote:

> On 05/09/2016 09:34 PM, Jonas Maebe wrote:
>> While still missing in the documentation, you can already do that with
>> {$modeswitch unicodestrings}.
>
> If this avoids the issues Graeme found, why is it not enabled by default,
> both in user code and in the RTL interface?


Several reasons:

1. Backwards compatibility.
2. We'd need to check all code. Much of the code assumes char = 1 byte.
   That includes all other code besides the RTL: database code, web code, etc.
3. It increases memory use.
   All of a sudden, all your programs will use lots more memory.
   Maybe not important on a server, but it is, e.g., in embedded systems.
4. For Delphi, the choice was clear: Windows uses UTF-16, so all APIs are
   better off using UTF-16. Under Unix-like OSes, UTF-8 is a better choice,
   IMHO.
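Point 3 can be checked with simple arithmetic: for mostly-ASCII text, UTF-16 needs two bytes per character where UTF-8 or a single-byte code page needs one. A small sketch (the sample text is arbitrary) that compares the payload sizes of the same text in both encodings:

```pascal
program StringSizes;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif} // code page conversion support on Unix
  SysUtils;

var
  U: UnicodeString;
  A: UTF8String;
begin
  U := 'Hello, world';  // 12 ASCII characters
  A := UTF8String(U);   // UTF-16 -> UTF-8
  // Payload sizes only; both string types also carry a small header.
  Writeln(Length(A) * SizeOf(AnsiChar));     // 12 bytes as UTF-8
  Writeln(Length(U) * SizeOf(UnicodeChar));  // 24 bytes as UTF-16
end.
```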

There is no simple answer...

Michael.


Re: [fpc-pascal] RTL and Unicode Strings

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 09:21, Jonas Maebe wrote:
> In other cases, like LacaK said, you will have to read the data as plain 
> bytes into e.g. a RawByteString and next use 
> http://www.freepascal.org/docs-html/rtl/system/setcodepage.html (with 
> the last parameter set to "false") to afterwards specify the code page 
> this data has.

But this is where I'm getting a bit confused too.

The RTL and FCL use the String data type predominantly.
  eg: TField.AsString: String.

The RTL and FCL use String (AnsiString) with default encoding set to Auto.

In my application I enable unicodestring mode. So I'm reading data from
a Firebird database. The data is stored as UTF-8 in a VarChar field. The
DB connection is set up as UTF-8. Now let's assume my FreeBSD box is set
up with a default encoding of Latin-1.

So I read the UTF-8 data from the database, somewhere inside the SqlDB
code it gets assigned to a TField's String property. ie: UTF-8 ->
Latin-1 conversion.

Then I read the field value into my application. ie: Latin-1 -> UTF-16

The problem as I see it, is that I already lost data when SqlDB
converted it to Latin-1. Am I not understanding the problem?

I checked the FPC 3.x db.pas unit. It uses {$mode objfpc}{$H+} - it
doesn't use UnicodeString and neither does it use RawByteString. So a
text encoding conversion to AnsiString(latin-1) [based on my example] is
going to happen, right?

Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp


[fpc-pascal] Executing external processes and codepages

2016-05-11 Thread Jonas Maebe


marcov wrote on Tue, 10 May 2016:

> In our previous episode, Jonas Maebe said:
>>> ExecuteProcess is in trunk since late March.
>>>
>>> It is fairly self-contained and could be merged to 3.0.2 technically.
>>
>> I mentioned that it is in trunk, but it's incomplete: it does not
>> translate the command line arguments to the code page of the
>> environment in which the child is started.
>
> It assumes UTF-16 for Windows, and the filesystem encoding for Unix (e.g.
> unix.pp:254).
>
> If you have more info, please share.


The filesystem encoding is fine to find the binary on Unix, but the  
parameters you specify to that binary will be interpreted by that same  
binary once it's running. It will not interpret those parameters  
according to the filesystem encoding, but according to whatever is  
determined to be the DefaultSystemCodePage *by that executed binary*.



Jonas


Re: [fpc-pascal] Executing external processes and codepages

2016-05-11 Thread Marco van de Voort
In our previous episode, Jonas Maebe said:
> >
> > It assumes UTF-16 for Windows, and the filesystem encoding for Unix (e.g.
> > unix.pp:254).
> >
> > If you have more info, please share.
> 
> The filesystem encoding is fine to find the binary on Unix, but the  
> parameters you specify to that binary will be interpreted by that same  
> binary once it's running. It will not interpret those parameters  
> according to the filesystem encoding, but according to whatever is  
> determined to be the DefaultSystemCodePage *by that executed binary*.

I do understand that, and it will be fine most of the time.

I'm not sure what you are suggesting as an alternative. Saving the encoding at
startup, so it can be reused in cases like this, in case somebody changes
defaultsystemcodepage?

Or using defaultsystemcodepage instead of filesystem encoding?


Re: [fpc-pascal] {$modeswitch unicodestring}

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 10:06, Jonas Maebe wrote:
> Indeed, just like any other {$mode xxx} and {$modeswitch xxx}  
> directive.

Thank you. That’s what I thought. I just wanted to double check.

Regards,
  Graeme


Re: [fpc-pascal] Invoking methods through rtti

2016-05-11 Thread Sven Barth
On 11.05.2016 10:22, "Michael Schnell" wrote:
>
> On 05/09/2016 05:36 PM, Sven Barth wrote:
>>
>> As long as you know the signature of the method you can cast the pointer
>> to the method to an appropriate method variable and call that. For dynamic
>> calls you'll need to wait for Invoke() support which is on the ToDo list,
>> but there's no ETA yet.
>>
> Will this result in a new kind of Pascal Script support? (In the German
> forum there have recently been discussions about Pascal Script not working
> properly on 64-bit architectures.)

Don't know as I'm not using PascalScript, but potentially it could be.

Regards,
Sven

Re: [fpc-pascal] Executing external processes and codepages

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Marco van de Voort wrote:

> In our previous episode, Jonas Maebe said:
>>> It assumes UTF-16 for Windows, and the filesystem encoding for Unix (e.g.
>>> unix.pp:254).
>>>
>>> If you have more info, please share.
>>
>> The filesystem encoding is fine to find the binary on Unix, but the
>> parameters you specify to that binary will be interpreted by that same
>> binary once it's running. It will not interpret those parameters
>> according to the filesystem encoding, but according to whatever is
>> determined to be the DefaultSystemCodePage *by that executed binary*.
>
> I do understand that, and it will be fine most of the time.
>
> I'm not sure what you are suggesting as an alternative. Saving the encoding at
> startup, so it can be reused in cases like this, in case somebody changes
> defaultsystemcodepage?
>
> Or using defaultsystemcodepage instead of filesystem encoding?


Why not simply make the exec calls use RawByteString?

It is then the responsibility of the programmer.
In 99.99% of cases it will be passed the correct system encoding, and if need
be, the programmer can change it.

We should not fall into the trap of overengineering.

Michael.


Re: [fpc-pascal] RTL and Unicode Strings

2016-05-11 Thread Jonas Maebe


Graeme Geldenhuys wrote on Wed, 11 May 2016:

> In my application I enable unicodestring mode. So I'm reading data from
> a Firebird database. The data is stored as UTF-8 in a VarChar field. The
> DB connection is set up as UTF-8. Now let's assume my FreeBSD box is set
> up with a default encoding of Latin-1.
>
> So I read the UTF-8 data from the database, somewhere inside the SqlDB
> code it gets assigned to a TField's String property. ie: UTF-8 ->
> Latin-1 conversion.


This depends on how sqlDB is implemented, and I have absolutely no  
clue about that (other than what LacaK wrote).


As mentioned at  
http://wiki.freepascal.org/FPC_Unicode_support#Dynamic_code_page ,  
conversions on assignment only happen when the *declared* code page of  
the target string is different from that of the source string (other  
than the special case for RawByteString). So if sqlDB only uses plain  
String with {$h+} and/or AnsiString, then no conversions will happen  
anywhere in the scenario you describe since it will just assign  
ansistrings with declared code page CP_ACP to each other.



> Then I read the field value into my application. ie: Latin-1 -> UTF-16


If sqlDB correctly sets the dynamic codepage of the strings it creates  
via SetCodePage(x,CP_UTF8,false), then when you assign those strings  
with declared codepage = CP_ACP and dynamic code page CP_UTF8 to your  
unicodestrings, they will be converted from UTF-8 to UTF-16 at that  
point.


If it does not set the dynamic code page of the strings it creates to  
the appropriate encoding, then you will indeed get data corruption at  
this point, because the UTF-8 encoded data will be interpreted as  
Latin-1 and then be "converted" to UTF-16.
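The difference between a correct and a wrong code page label can be reproduced in isolation. In the sketch below the same two bytes (the UTF-8 encoding of "é") are labelled once as CP_UTF8 and once as 28591 (ISO-8859-1, standing in for the Latin-1 system code page of the scenario above) and then assigned to a UnicodeString. The helper LabelledBytes is hypothetical, and the sketch assumes the widestring manager (cwstring on Unix) supports code page 28591.

```pascal
program DynamicCodePage;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif} // code page conversion support on Unix
  SysUtils;

{ Hypothetical helper: the raw UTF-8 bytes of U+00E9, labelled with
  CodePage without converting them. }
function LabelledBytes(CodePage: TSystemCodePage): RawByteString;
begin
  SetLength(Result, 2);
  Result[1] := #$C3;
  Result[2] := #$A9;
  SetCodePage(Result, CodePage, False);
end;

var
  Good, Bad: UnicodeString;
begin
  Good := LabelledBytes(CP_UTF8);  // decoded as UTF-8: one character
  Bad := LabelledBytes(28591);     // same bytes decoded as Latin-1
  Writeln(Length(Good), ' ', Ord(Good[1]));                   // 1 233
  Writeln(Length(Bad), ' ', Ord(Bad[1]), ' ', Ord(Bad[2]));   // 2 195 169
end.
```

The second line shows the "Ã©" style corruption: nothing is lost at the byte level, but the wrong label makes the UTF-16 conversion produce two characters instead of one.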


For dealing with such code, which is not yet codepage-aware, by
default the situation is no worse and no better than it was in previous
FPC versions: exactly the same would happen there. However, in FPC 3.x
you can generally fix it by changing the default code page for
ansistrings using SetMultiByteConversionCodePage() to what you
know/want to be the encoding of ansistrings, like Lazarus does.
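A sketch of that workaround, assuming the program itself (and not a framework it uses) owns the process-wide default: after SetMultiByteConversionCodePage(CP_UTF8), a plain AnsiString holding UTF-8 bytes, filled in without any relabelling as code that is not codepage-aware would do, is decoded correctly when assigned to a UnicodeString. Because the setting affects every CP_ACP string in the process, it should be changed in one place only.

```pascal
program DefaultCodePageUtf8;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif} // code page conversion support on Unix
  SysUtils;

var
  A: AnsiString;     // declared code page CP_ACP
  U: UnicodeString;
begin
  // Process-wide: CP_ACP strings are from now on treated as UTF-8.
  SetMultiByteConversionCodePage(CP_UTF8);

  // Store the UTF-8 bytes of U+00E9 without any explicit relabelling.
  SetLength(A, 2);
  A[1] := #$C3;
  A[2] := #$A9;

  U := A;  // decoded as UTF-8, the new default
  Writeln(Length(U), ' ', Ord(U[1]));  // 1 233
end.
```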


All of this is moreover completely independent of {$modeswitch  
unicodestrings}, since that is just a shortcut to make String an alias  
for UnicodeString in the current compilation module (and Char for  
WideChar, and PChar for PWideChar).



Jonas


Re: [fpc-pascal] RTL and Unicode Strings

2016-05-11 Thread Tony Whyman


On 11/05/16 10:18, Graeme Geldenhuys wrote:

> In my application I enable unicodestring mode. So I'm reading data from
> a Firebird database. The data is stored as UTF-8 in a VarChar field. The
> DB connection is set up as UTF-8. Now let's assume my FreeBSD box is set
> up with a default encoding of Latin-1.
>
> So I read the UTF-8 data from the database, somewhere inside the SqlDB
> code it gets assigned to a TField's String property. ie: UTF-8 ->
> Latin-1 conversion.


Now this is what interests me as well - in the context of IBX if nothing 
else.


It was news to me yesterday that FPC now stores code page information
with AnsiStrings and while IBX still works OK with FPC 3.0.0, it should 
work better with this new facility. The IBX code here comes from years 
ago and is:



function TIBStringField.GetValue(var Value: string): Boolean;
var
  Buffer: PChar;
begin
  Buffer := nil;
  IBAlloc(Buffer, 0, Size + 1);
  try
    Result := GetData(Buffer);
    if Result then
    begin
      Value := string(Buffer);
      if Transliterate and (Value <> '') then
        DataSet.Translate(PChar(Value), PChar(Value), False);
    end
  finally
    FreeMem(Buffer);
  end;
end;

Note the really nasty coercion that comes after the call to
TField.GetData (which is common to all DB Drivers)  - GetData returns 
untyped data into a buffer. DataSet.Translate is a no-op, and I was 
never sure what purpose it has - if anything.


To make this code play properly with the new AnsiString, it looks like I 
should revise this to (e.g. for utf-8 fields)


  Value := string(Buffer);
  SetCodePage(Value,cp_UTF8,false);
  ...

The outgoing side has a similar problem, e.g.:


procedure TIBStringField.SetAsString(const Value: string);
var
  Buffer: PChar;
begin
  Buffer := nil;
  IBAlloc(Buffer, 0, Size + 1);
  try
    StrLCopy(Buffer, PChar(Value), Size);
    if Transliterate then
      DataSet.Translate(Buffer, Buffer, True);
    SetData(Buffer);
  finally
    FreeMem(Buffer);
  end;
end;


This probably needs a

SetCodePage(Value,cp_UTF8,true);

before the StrLCopy.

Anyone know if this is a correct interpretation of the AnsiString 
codepage facility?
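One way to read that proposal, sketched as two standalone helpers rather than as IBX code: the names Utf8BufferToString and StringToUtf8Buffer are hypothetical, and a local array stands in for the driver buffer. On the incoming side the bytes are copied unchanged and only relabelled as UTF-8. On the outgoing side the value is converted in a local copy, because SetCodePage takes a var RawByteString parameter and therefore cannot be applied to a const parameter such as Value in SetAsString. Declaring the outgoing parameter as RawByteString (an assumption, not the IBX signature) keeps the caller's code page intact until that conversion.

```pascal
program Utf8FieldBuffers;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif} // code page conversion support on Unix
  SysUtils;

{ Incoming: Buffer holds NUL-terminated UTF-8 bytes from the driver. }
function Utf8BufferToString(Buffer: PAnsiChar): RawByteString;
begin
  Result := Buffer;                     // copy the bytes, no conversion
  SetCodePage(Result, CP_UTF8, False);  // only record that they are UTF-8
end;

{ Outgoing: copy at most Size bytes of Value, as UTF-8, into Buffer
  (which must have room for Size + 1 bytes including the terminator). }
procedure StringToUtf8Buffer(const Value: RawByteString; Buffer: PAnsiChar;
  Size: SizeInt);
var
  Utf8Value: UTF8String;
begin
  Utf8Value := Value;  // converted from Value's code page if it is not UTF-8
  StrLCopy(Buffer, PAnsiChar(Utf8Value), Size);
end;

var
  InBuf: array[0..2] of AnsiChar = (#$C3, #$A9, #0);  // "é" in UTF-8
  OutBuf: array[0..15] of AnsiChar;
  S: RawByteString;
  U: UnicodeString;
begin
  S := Utf8BufferToString(@InBuf[0]);
  U := S;                              // UTF-8 -> UTF-16
  Writeln(Length(U), ' ', Ord(U[1]));  // 1 233

  StringToUtf8Buffer(S, @OutBuf[0], High(OutBuf));
  Writeln(StrLen(PAnsiChar(@OutBuf[0])), ' ',
    Ord(OutBuf[0]), ' ', Ord(OutBuf[1]));  // 2 195 169
end.
```

StrLCopy counts bytes, so a value longer than Size can be cut in the middle of a multi-byte UTF-8 sequence; a real field implementation would need to handle that case.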



Re: [fpc-pascal] RTL and Unicode Strings

2016-05-11 Thread Marco van de Voort
In our previous episode, Graeme Geldenhuys said:
> > In other cases, like LacaK said, you will have to read the data as plain 
> > bytes into e.g. a RawByteString and next use 
> > http://www.freepascal.org/docs-html/rtl/system/setcodepage.html (with 
> > the last parameter set to "false") to afterwards specify the code page 
> > this data has.
> 
> But this is where I'm getting a bit confused too.
> 
> The RTL and FCL use the String data type predominantly.
>   eg: TField.AsString: String.

String is not a type but an alias; that is key. So any declaration means
whatever string was defined as when it was compiled (short/ansi/unicodestring).

> The RTL and FCL use String (AnsiString) with default encoding set to Auto.

To the default encoding, which is the only one that can vary at runtime,
and that is the base type String is used as. So, in Orwellian speak,
ansistring(0) is more equal than the other ansistring()s.

> In my application I enable unicodestring mode. So I'm reading data from
> a Firebird database. The data is stored as UTF-8 in a VarChar field. The
> DB connection is set up as UTF-8. Now let's assume my FreeBSD box is set
> up with a default encoding of Latin-1.
> 
> So I read the UTF-8 data from the database, somewhere inside the SqlDB
> code it gets assigned to a TField's String property. ie: UTF-8 ->
> Latin-1 conversion.

Then it is basically equal to 2.6.x, and old Delphi. You are on your own and
must handle conversions yourself and be careful to not mutilate your utf8
content.

> Then I read the field value into my application. ie: Latin-1 -> UTF-16

Yes, you must also handle that conversion manually (either by moving the
character data to a UTF-8 typed string and then assigning, or by a manual
encoding routine that basically takes an address and disregards the code page
info).

> The problem as I see it, is that I already lost data when SqlDB
> converted it to Latin-1. Am I not understanding the problem?

It depends. Sqldb assigned non ansistring data to an ansistring. In the old
(2.6.4, old delphi) logic it would simply move without conversion, and you
would obtain an ansistring with utf8 in it and be converting forever.

Nothing changed there, except your expectations :-)
 
> I checked the FPC 3.x db.pas unit. It uses {$mode objfpc}{$H+} - it
> doesn't use UnicodeString and neither does it use RawByteString. So a
> text encoding conversion to AnsiString(latin-1) [based on my example] is
> going to happen, right?

Yes. As said many times before, the parts above RTL level have been kept
working, but not changed.
 
So basically the only viable cases are the UTF-16 D2009+ model (for Windows,
but it works elsewhere too) and UTF-8 as the default (which needs to be hacked
for systems that don't default to UTF-8 as the one-byte conversion).

Both have advantages and disadvantages (and the UTF-8 ones are not as big as
many people think; they confuse UTF-8 as the dominant document encoding with
APIs).

But in the end the choice is simple IMHO. One is Delphi compatible, one is not.
Period.


Re: [fpc-pascal] Executing external processes and codepages

2016-05-11 Thread Jonas Maebe


Michael Van Canneyt wrote on Wed, 11 May 2016:

> On Wed, 11 May 2016, Marco van de Voort wrote:
>
>> In our previous episode, Jonas Maebe said:
>>
>>> The filesystem encoding is fine to find the binary on Unix, but the
>>> parameters you specify to that binary will be interpreted by that same
>>> binary once it's running. It will not interpret those parameters
>>> according to the filesystem encoding, but according to whatever is
>>> determined to be the DefaultSystemCodePage *by that executed binary*.
>>
>> I do understand that, and it will be fine most of the time.
>>
>> I'm not sure what you are suggesting as an alternative. Saving the encoding at
>> startup, so it can be reused in cases like this, in case somebody changes
>> defaultsystemcodepage?


We have to look up the code page to use in the current program's  
environment at the time fpexec*() is called (or, in case the  
environment is specified explicitly, in that environment)



> Why not simply make the exec calls use RawByteString?


They will use RawByteString in any case, because we don't want the  
caller's string arguments to be converted before the routine is  
called, for two reasons:

1) a plain "ansistring" argument could result in data loss
2) we will have to convert the data afterwards to the correct code  
page anyway, so adding a potential extra conversion by declaring the  
argument as e.g. utf8string would serve no purpose (in the worst case  
it would introduce an additional, useless, conversion; in the best  
case it would move the conversion from the callee to the caller side)
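A sketch of that callee-side conversion, using a hypothetical helper PrepareArg (not an RTL routine): the RawByteString parameter lets the caller's string arrive unconverted with its dynamic code page, and the single conversion happens inside. How the target code page is looked up in the child's environment is left out; here it is simply passed in, and 28591 (ISO-8859-1) is an arbitrary example that assumes the widestring manager (cwstring on Unix) supports it.

```pascal
program PrepareChildArgs;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif} // code page conversion support on Unix
  SysUtils;

{ Hypothetical helper: converts one argument, once, from its own dynamic
  code page to TargetCP, the code page the child is expected to use. }
function PrepareArg(const Arg: RawByteString;
  TargetCP: TSystemCodePage): RawByteString;
begin
  Result := Arg;                        // no conversion on the way in
  SetCodePage(Result, TargetCP, True);  // the one real conversion
end;

var
  Utf8Arg, Latin1Arg: RawByteString;
begin
  SetLength(Utf8Arg, 2);
  Utf8Arg[1] := #$C3;  // U+00E9 encoded as UTF-8
  Utf8Arg[2] := #$A9;
  SetCodePage(Utf8Arg, CP_UTF8, False);

  Latin1Arg := PrepareArg(Utf8Arg, 28591);  // ISO-8859-1
  // The caller's string is untouched; the prepared copy is one byte long.
  Writeln(Length(Utf8Arg), ' ', Length(Latin1Arg), ' ',
    Ord(Latin1Arg[1]));  // 2 1 233
end.
```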



> It is then the responsibility of the programmer.


It is the responsibility of the programmer when they call the pchar  
version of those routines. The string versions of all other fp*()  
routines also convert the arguments to the appropriate code page  
required by their purpose.



> We should not fall into the trap of overengineering.


We should not fall into the trap of going for an inconsistent,
buggy-by-design solution because doing it correctly is not trivial.



Jonas


Re: [fpc-pascal] Executing external processes and codepages

2016-05-11 Thread Marco van de Voort
In our previous episode, Michael Van Canneyt said:
> > I'm not sure what you are suggesting as an alternative. Saving the encoding at
> > startup, so it can be reused in cases like this, in case somebody changes
> > defaultsystemcodepage?
> >
> > Or using defaultsystemcodepage instead of filesystem encoding?
>
> Why not simply make the exec calls use RawByteString?
>
> It is then the responsibility of the programmer.
> In 99.99% of cases it will be passed the correct system encoding, and if need
> be, the programmer can change it.

I don't like that. The 3.x idea is to get rid of manual conversions and
hack-and-convert-it-as-you-go encoding management, not just rebadge the old
practices to rawbytestring.
 


Re: [fpc-pascal] RTL and Unicode Strings

2016-05-11 Thread LacaK



>> In other cases, like LacaK said, you will have to read the data as plain
>> bytes into e.g. a RawByteString and next use
>> http://www.freepascal.org/docs-html/rtl/system/setcodepage.html (with
>> the last parameter set to "false") to afterwards specify the code page
>> this data has.
>
> But this is where I'm getting a bit confused too.
>
> The RTL and FCL use the String data type predominantly.
>   eg: TField.AsString: String.
>
> The RTL and FCL use String (AnsiString) with default encoding set to Auto.
>
> In my application I enable unicodestring mode. So I'm reading data from
> a Firebird database. The data is stored as UTF-8 in a VarChar field. The
> DB connection is set up as UTF-8. Now let's assume my FreeBSD box is set
> up with a default encoding of Latin-1.
>
> So I read the UTF-8 data from the database, somewhere inside the SqlDB
> code it gets assigned to a TField's String property. ie: UTF-8 ->
> Latin-1 conversion.

IMO this does not happen,
because sqlDB only provides pointers to field buffers where the "SQL
connector" stores the data it receives from the server.
The DB unit only allocates memory of a given size and then provides a pointer
to that memory, where the data are stored.
(Maybe some issue pops up somewhere; I still use FPC 2.6.4,
so I cannot say more about FPC 3.0.0.)

-Laco.


Re: [fpc-pascal] RTL and Unicode Strings

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Jonas Maebe wrote:

> Graeme Geldenhuys wrote on Wed, 11 May 2016:
>
>> In my application I enable unicodestring mode. So I'm reading data from
>> a Firebird database. The data is stored as UTF-8 in a VarChar field. The
>> DB connection is set up as UTF-8. Now let's assume my FreeBSD box is set
>> up with a default encoding of Latin-1.
>>
>> So I read the UTF-8 data from the database, somewhere inside the SqlDB
>> code it gets assigned to a TField's String property. ie: UTF-8 ->
>> Latin-1 conversion.
>
> This depends on how sqlDB is implemented, and I have absolutely no clue about
> that (other than what LacaK wrote).
>
> As mentioned at
> http://wiki.freepascal.org/FPC_Unicode_support#Dynamic_code_page ,
> conversions on assignment only happen when the *declared* code page of the
> target string is different from that of the source string (other than the
> special case for RawByteString). So if sqlDB only uses plain String with
> {$h+} and/or AnsiString, then no conversions will happen anywhere in the
> scenario you describe since it will just assign ansistrings with declared
> code page CP_ACP to each other.


This is the case.




>> Then I read the field value into my application. ie: Latin-1 -> UTF-16
>
> If sqlDB correctly sets the dynamic codepage of the strings it creates via
> SetCodePage(x,CP_UTF8,false), then when you assign those strings with
> declared codepage = CP_ACP and dynamic code page CP_UTF8 to your
> unicodestrings, they will be converted from UTF-8 to UTF-16 at that point.


It does not do this.



> If it does not set the dynamic code page of the strings it creates to the
> appropriate encoding, then you will indeed get data corruption at this point,
> because the UTF-8 encoded data will be interpreted as Latin-1 and then be
> "converted" to UTF-16.


That is what happens.

Currently, the ONLY provision that is made is that, if SQLDB somehow detects
that the server uses UTF8, it will use an ansistring and allocate 4 bytes in
the buffers for each character.

But it currently does not set the code page of the allocated string to UTF8.

> For dealing with such code, which is not yet codepage-aware, by default the
> situation is no worse and no better than it was in previous FPC versions:
> exactly the same would happen there. However, in FPC 3.x you can generally
> fix it by changing the default code page for ansistrings using
> SetMultiByteConversionCodePage() to what you know/want to be the encoding of
> ansistrings, like Lazarus does.


If Lazarus already sets SetMultiByteConversionCodePage, then it will wreak
havoc to set it to something else.

This matter must be decided at the TDataset level: it should have a property
to determine the character set of string fields (and possibly different for
each field, since this can differ in the database on a field basis).
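For reference, here is a minimal sketch of the FPC 3.x call mentioned above. As Michael points out, it changes DefaultSystemCodePage for the whole program, so it conflicts with any other code (such as Lazarus) that sets it to something else. The value printed before the call depends on the locale.

```pascal
program SetAcpToUtf8;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif} // widestring manager for conversions on Unix
  SysUtils;

begin
  // Initialised from the locale at startup (e.g. 28591 for a Latin-1 locale)
  WriteLn('before: ', DefaultSystemCodePage);

  // From here on, CP_ACP ansistrings are treated as UTF-8 in conversions
  SetMultiByteConversionCodePage(CP_UTF8);

  WriteLn('after:  ', DefaultSystemCodePage);   // 65001
end.
```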



All of this is moreover completely independent of {$modeswitch 
unicodestrings}, since that is just a shortcut to make String an alias for 
UnicodeString in the current compilation module (and Char for WideChar, and 
PChar for PWideChar).


Honestly, I don't understand this preoccupation with {$modeswitch unicodestrings}

It just means that

Var
 a : string;

is read by the compiler as

Var
 a : unicodestring;

No more, no less.
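A minimal check of this, assuming FPC 3.x. With the mode switch active, String, Char and PChar resolve to their wide counterparts, and this is visible in the element size. Nothing else about the program changes.

```pascal
program UnicodeStringsSwitch;

{$mode objfpc}{$H+}
{$modeswitch unicodestrings}  // String = UnicodeString, Char = WideChar, PChar = PWideChar

var
  a: string;   // read by the compiler as UnicodeString
begin
  a := 'abc';
  WriteLn(SizeOf(Char));   // 2
  WriteLn(SizeOf(a[1]));   // 2: each element is a WideChar
  WriteLn(Length(a));      // 3
end.
```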

Michael.


Re: [fpc-pascal] Executing external processes and codepages

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Marco van de Voort wrote:


In our previous episode, Michael Van Canneyt said:

I'm not sure what you are suggesting as an alternative. Saving the encoding at
startup, so it can be reused in cases like this, in case somebody changes
DefaultSystemCodePage?

Or using defaultsystemcodepage instead of filesystem encoding?


Why not simply make the exec calls use RawByteString ?

It is then the responsibility of the programmer.
In 99.99% of cases it will be passed the correct system encoding, and if need
be the programmer can change it.


I don't like that. The 3.x idea is to get rid of manual conversions and
hack-and-convert-it-as-you-go encoding management, not just rebadge the old
practices to rawbytestring.


You may not like it, but there is simply no other choice:

Since you don't know what the receiving program does with the received
arguments, all attempts to guess it are erroneous by definition.

Only the programmer knows what the receiving program will do (or should), 
so he must take it into account. Hence rawbytestring.


Don't pamper the programmer so much. He needs to make correct decisions.
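Here is a minimal sketch of the RawByteString approach, assuming FPC 3.x. ExecArgCodePage is a hypothetical stand-in for how an exec routine would see its parameter; it is not an RTL function. A const RawByteString parameter accepts any ansistring without a conversion, so the callee gets the caller's bytes and dynamic code page unchanged.

```pascal
program RawByteStringArg;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: reports what an exec-style routine taking a
// RawByteString would receive
function ExecArgCodePage(const Arg: RawByteString): TSystemCodePage;
begin
  Result := StringCodePage(Arg);
end;

var
  s8: UTF8String;
begin
  // UTF-8 bytes of U+00E9 in a string declared as UTF-8
  SetLength(s8, 2);
  s8[1] := #$C3;
  s8[2] := #$A9;
  WriteLn(ExecArgCodePage(s8));   // 65001: passed through unconverted
end.
```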

Michael.


Re: [fpc-pascal] Executing external processes and codepages

2016-05-11 Thread Jonas Maebe


Michael Van Canneyt wrote on Wed, 11 May 2016:


On Wed, 11 May 2016, Marco van de Voort wrote:


I don't like that. The 3.x idea is to get rid of manual conversions and
hack-and-convert-it-as-you-go encoding management, not just rebadge the old
practices to rawbytestring.


You may not like it, but there is simply no other choice:

Since you don't know what the receiving program does with the received
arguments, all attempts to guess it are erroneous by definition.


It will either interpret them according to the code page specified by  
the environment, or not perform any interpretation at all. In both  
cases, converting the arguments to the code page of the environment is  
the right thing to do.


Only the programmer knows what the receiving program will do (or  
should), so he must take it into account. Hence rawbytestring.


If it is an FPC program, then if it is compiled with FPC 2.6.x it will  
not reinterpret the command line arguments at all, while when compiled  
with FPC 3.x it will reinterpret the command line arguments. The  
caller of such a program should not have to worry with which version  
of FPC it has been compiled.



Don't pamper the programmer so much. He needs to make correct decisions.


They have to correctly set or specify the environment for the program  
they are executing when they want to force a specific encoding for its  
command line arguments.



Jonas


Re: [fpc-pascal] Executing external processes and codepages

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Jonas Maebe wrote:



Michael Van Canneyt wrote on Wed, 11 May 2016:


On Wed, 11 May 2016, Marco van de Voort wrote:


I don't like that. The 3.x idea is to get rid of manual conversions and
hack-and-convert-it-as-you-go encoding management, not just rebadge the old
practices to rawbytestring.


You may not like it, but there is simply no other choice:

Since you don't know what the receiving program does with the received
arguments, all attempts to guess it are erroneous by definition.


It will either interpret them according to the code page specified by the 
environment, or not perform any interpretation at all. In both cases, 
converting the arguments to the code page of the environment is the right 
thing to do.


And in the case it makes an assumption of the code page, regardless of
environment variables ?

(don't say that doesn't happen. It does, I know a programmer that does so)

Michael.


Re: [fpc-pascal] RTL and Unicode Strings

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 10:48, Michael Van Canneyt wrote:
> Honestly, I don't understand this preoccupation with {$modeswitch  
> unicodestrings}
> 
> It just means that
> 
> Var
>   a : string;
> 
> is read by the compiler as
> 
> Var
>   a : unicodestring;
> 
> No more, no less.


It saves you from data loss in the case where you use units that use the
String data type and assign Unicode data to it -- and you run your
program on a system where the locale is not UTF-8 or UTF-16, e.g. Latin-1.


Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp


Re: [fpc-pascal] fpc/lazarus and kinect sample

2016-05-11 Thread Björn Lundin
On 2016-05-11 10:12, Michael Van Canneyt wrote:

> 
> No worries. I will send it to you, if I can still find it :-)
> 
> Michael.

Thanks, I'd really appreciate that


--
Björn

Re: [fpc-pascal] Executing external processes and codepages

2016-05-11 Thread Jonas Maebe


Michael Van Canneyt wrote on Wed, 11 May 2016:


On Wed, 11 May 2016, Jonas Maebe wrote:



Michael Van Canneyt wrote on Wed, 11 May 2016:


On Wed, 11 May 2016, Marco van de Voort wrote:


I don't like that. The 3.x idea is to get rid of manual conversions and
hack-and-convert-it-as-you-go encoding management, not just rebadge the old
practices to rawbytestring.


You may not like it, but there is simply no other choice:

Since you don't know what the receiving program does with the received
arguments, all attempts to guess it are erroneous by definition.


It will either interpret them according to the code page specified  
by the environment, or not perform any interpretation at all. In  
both cases, converting the arguments to the code page of the  
environment is the right thing to do.


And in the case it makes an assumption of the code page, regardless of
environment variables ?

(don't say that doesn't happen. It does, I know a programmer that does so)


The caller can work around such bugs by either
a) using the pchar version of fpexec, or
b) specifying the code page that this target program uses in the  
environment used to invoke it



Jonas


Re: [fpc-pascal] Smart Pointers

2016-05-11 Thread Maciej Izak
2016-05-11 10:22 GMT+02:00 Marco van de Voort :

> Some rtl-objpas units use classes, like fmtbcd.
>
> If you don't use fcl-base I would make it RTL, so we can use it in fcl-base
> if needed :-)


The library is low level and uses nothing outside the RTL; IMO it should be
part of the RTL, like in Delphi.

Note that I need the library as part of the RTL, not as a package. Generics.*
contains _LookupVtableInfo/_LookupVtableInfoEx and a few other things which
are used for my next compiler work...

Generics.* as a package will complicate all my plans -,- . If it can't be
part of the RTL, it would be more comfortable for me to exclude the whole
library from FPC (easier maintenance of a FreePascal fork...)

-- 
Best regards,
Maciej Izak

Re: [fpc-pascal] RTL and Unicode Strings

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 10:43, Marco van de Voort wrote:
>> The problem as I see it, is that I already lost data when SqlDB
>> converted it to Latin-1. Am I not understanding the problem?
> 
> It depends. Sqldb assigned non ansistring data to an ansistring. In the old
> (2.6.4, old delphi) logic it would simply move without conversion, and you
> would obtain an ansistring with utf8 in it and be converting forever.

Correct, and because 2.6.4 did no conversions I can accurately assume in
my application that an AnsiString contains UTF-8 encoded data, and work
with it appropriately. This is how fpGUI and LCL have been working for
many years.

But now with 3.0.0, auto-conversion occurs inside the RTL and FCL code,
corrupting the data before I can get to it.

That's a massive difference between 2.6.4 and 3.x.
As it stands now, I cannot see how anybody can actually switch to FPC
3.0 - it simply isn't ready to be used.


Regards,
  Graeme



Re: [fpc-pascal] Executing external processes and codepages

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Jonas Maebe wrote:



Michael Van Canneyt wrote on Wed, 11 May 2016:


On Wed, 11 May 2016, Jonas Maebe wrote:



Michael Van Canneyt wrote on Wed, 11 May 2016:


On Wed, 11 May 2016, Marco van de Voort wrote:


I don't like that. The 3.x idea is to get rid of manual conversions and
hack-and-convert-it-as-you-go encoding management, not just rebadge the old
practices to rawbytestring.


You may not like it, but there is simply no other choice:

Since you don't know what the receiving program does with the received
arguments, all attempts to guess it are erroneous by definition.


It will either interpret them according to the code page specified by the 
environment, or not perform any interpretation at all. In both cases, 
converting the arguments to the code page of the environment is the right 
thing to do.


And in the case it makes an assumption of the code page, regardless of
environment variables ?

(don't say that doesn't happen. It does, I know a programmer that does so)


The caller can work around such bugs by either
a) using the pchar version of fpexec, or
b) specifying the code page that this target program uses in the environment 
used to invoke it


a) obviously
b) As said, the target program completely ignores the environment.

I was just trying to point out that while your solution is undoubtedly correct
in the large majority of cases (let's assume 99.99%), it is not a rock-hard
guarantee.

IMHO, in this large majority of cases, using RawByteString will be correct as well,
since the chances that the calling program uses a different codepage from the
called one are very small (in this case: UTF8).


I am not saying that attempting to convert to the code page used by the
environment is wrong (except maybe in a small minority of cases, as pointed out),
but I do think it is overengineering.


Michael.


Re: [fpc-pascal] Executing external processes and codepages

2016-05-11 Thread Jonas Maebe


Michael Van Canneyt wrote on Wed, 11 May 2016:


On Wed, 11 May 2016, Jonas Maebe wrote:



Michael Van Canneyt wrote on Wed, 11 May 2016:


And in the case it makes an assumption of the code page, regardless of
environment variables ?

(don't say that doesn't happen. It does, I know a programmer that does so)


The caller can work around such bugs by either
a) using the pchar version of fpexec, or
b) specifying the code page that this target program uses in the  
environment used to invoke it


a) obviously
b) As said, the target program completely ignores the environment.


b) is exactly why you have to specify the code page that this *target  
program* uses in the environment when executing it, so that the  
invoking FPC program will convert the parameters to this code page.


I was just trying to point out that while your solution is undoubtedly correct
in the large majority of cases (let's assume 99.99%), it is not a rock-hard
guarantee.


I never claimed it was. I only said it is the only possible correct  
behaviour. It obviously cannot fix other broken programs, although as  
explained it is sufficiently flexible to deal with them.



Jonas


Re: [fpc-pascal] Executing external processes and codepages

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Jonas Maebe wrote:



Michael Van Canneyt wrote on Wed, 11 May 2016:


On Wed, 11 May 2016, Jonas Maebe wrote:



Michael Van Canneyt wrote on Wed, 11 May 2016:


And in the case it makes an assumption of the code page, regardless of
environment variables ?

(don't say that doesn't happen. It does, I know a programmer that does so)


The caller can work around such bugs by either
a) using the pchar version of fpexec, or
b) specifying the code page that this target program uses in the 
environment used to invoke it


a) obviously
b) As said, the target program completely ignores the environment.


b) is exactly why you have to specify the code page that this *target 
program* uses in the environment when executing it, so that the invoking FPC 
program will convert the parameters to this code page.


You are now assuming that this is possible. This may not be the case.

I was just trying to point out that while your solution is undoubtedly correct
in the large majority of cases (let's assume 99.99%), it is not a rock-hard
guarantee.


I never claimed it was. I only said it is the only possible correct 
behaviour.


We clearly have different understandings of the words 'correct behaviour' then :-)

Michael.


Re: [fpc-pascal] Smart Pointers

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Maciej Izak wrote:


2016-05-11 10:22 GMT+02:00 Marco van de Voort :


Some rtl-objpas units use classes, like fmtbcd.

If you don't use fcl-base I would make it RTL, so we can use it in fcl-base
if needed :-)



The library is low level and uses nothing outside the RTL; IMO it should be
part of the RTL, like in Delphi.

Note that I need the library as part of the RTL, not as a package. Generics.*
contains _LookupVtableInfo/_LookupVtableInfoEx and a few other things which
are used for my next compiler work...


Anything the compiler needs *must* be in the system unit. The compiler
should only assume the system unit, possibly objpas or macpas or so.

All the rest should remain out of the RTL, which should be as small as
possible. So rtl-generics is your best bet. Even the classes unit is better
outside the rtl, but I think Marco is reluctant to remove it.

I have remarked on this before: this tight dependency you are creating 
is very worrying.


Michael.


Re: [fpc-pascal] RTL and Unicode Strings

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Graeme Geldenhuys wrote:


On 2016-05-11 10:48, Michael Van Canneyt wrote:

Honestly, I don't understand this preoccupation with {$modeswitch unicodestrings}

It just means that

Var
  a : string;

is read by the compiler as

Var
  a : unicodestring;

No more, no less.



It saves you from data loss in the case where you use units that use the
String data type and assign Unicode data to it -- and you run your
program on a system where the locale is not UTF-8 or UTF-16, e.g. Latin-1.


No, it does not save you, where did you get that from ?

Michael.


Re: [fpc-pascal] RTL and Unicode Strings

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 12:48, Michael Van Canneyt wrote:
> No, it does not save you, where did you get that from ?

It helps. Any encoding to UTF-16 (or UTF-8) is safe. The other way round
is not. There is no guarantee that String (or AnsiString) is using a
Unicode encoding. So depending on where you get your data from, in my
case that data is one of the Unicode encodings, doing a conversion to
anything other than another Unicode-encoded variable (or RawByteString)
means I could lose data.
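The lossless direction mentioned here can be illustrated with a round trip between the two Unicode encodings. This minimal sketch uses the RTL's UTF8Decode and UTF8Encode and assumes FPC 3.x. It does not exercise the lossy conversion to a single-byte code page, and the Greek character is an arbitrary choice.

```pascal
program Utf8RoundTrip;

{$mode objfpc}{$H+}

uses
  SysUtils;

var
  s8, back: RawByteString;
  u: UnicodeString;
begin
  // UTF-8 bytes of U+03A9 (GREEK CAPITAL LETTER OMEGA)
  SetLength(s8, 2);
  s8[1] := #$CE;
  s8[2] := #$A9;

  u := UTF8Decode(s8);     // UTF-8 -> UTF-16: one code unit, $03A9
  back := UTF8Encode(u);   // UTF-16 -> UTF-8: the same two bytes again

  WriteLn(Length(u), ' ', Length(back));              // 1 2
  WriteLn((back[1] = s8[1]) and (back[2] = s8[2]));   // TRUE
end.
```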

See my actual database example (with sample code) titled "code example
where AnsiString used in FCL (SqlDB) causes data loss" - whenever a
moderator releases that post to the mailing list. Otherwise I can
forward it to you in private.

Regards,
  Graeme



Re: [fpc-pascal] Executing external processes and codepages

2016-05-11 Thread Jonas Maebe


Michael Van Canneyt wrote on Wed, 11 May 2016:


On Wed, 11 May 2016, Jonas Maebe wrote:



Michael Van Canneyt wrote on Wed, 11 May 2016:


On Wed, 11 May 2016, Jonas Maebe wrote:



Michael Van Canneyt wrote on Wed, 11 May 2016:


And in the case it makes an assumption of the code page, regardless of
environment variables ?

(don't say that doesn't happen. It does, I know a programmer that does so)


The caller can work around such bugs by either
a) using the pchar version of fpexec, or
b) specifying the code page that this target program uses in the  
environment used to invoke it


a) obviously
b) As said, the target program completely ignores the environment.


b) is exactly why you have to specify the code page that this  
*target program* uses in the environment when executing it, so that  
the invoking FPC program will convert the parameters to this code  
page.


You are now assuming that this is possible. This may not be the case.


Why would it not be possible? Please be more concrete, because right  
now I feel like I'm arguing against my own imagination, which is not  
very useful.



Jonas


[fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Graeme Geldenhuys
Hi,

Here is an example [proof if you will] of the problem. I wrote a small
test program that reads data from a Firebird database where the database
and field charset is set to UTF8.

I compile the program, then run it. No recompiles between the two runs.
The first run my system is set to have a UTF-8 locale. The second run is
where I set my system to have an ISO8859-1 (Latin-1) locale. The program
outputs the DefaultSystemCodePage to the console.

Because the locale changes the behaviour of String (aka AnsiString) in
the RTL and FCL, the first run works, but the second run corrupts my data.

Console output:

[unicode_test]$ export LANG=en_US.UTF-8
[unicode_test]$ ./unicodetest
65001

[unicode_test]$ export LANG=en_US.ISO8859-1
[unicode_test]$ ./unicodetest
28591


In my test program I write the data read from the database to a file
using TFileStream, thus console and file encoding settings will not
affect the data being written to file. TFileStream is simply writing bytes.
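When checking output like this, it can help to log each string's dynamic code page next to its raw bytes before anything is written. A minimal sketch, assuming FPC 3.x. DescribeBytes is a hypothetical helper and not part of the RTL or FCL. Its RawByteString parameter receives any ansistring without conversion.

```pascal
program DescribeStringBytes;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical diagnostic: "<dynamic code page>: <hex bytes>"
function DescribeBytes(const S: RawByteString): string;
var
  i: Integer;
begin
  Result := IntToStr(StringCodePage(S)) + ':';
  for i := 1 to Length(S) do
    Result := Result + ' ' + IntToHex(Ord(S[i]), 2);
end;

var
  s8: UTF8String;
begin
  SetLength(s8, 2);
  s8[1] := #$C3;   // UTF-8 bytes of U+00E9
  s8[2] := #$A9;
  WriteLn(DescribeBytes(s8));   // 65001: C3 A9
end.
```

In the test program it could be called on a field value directly, for example DescribeBytes(FQuery.FieldByName('UNIVALUE').AsString), to see what AsString returns before any assignment to u.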

The “locale_utf8.png” screenshot shows the actual data in the database
on the left, and the data read (and saved to the file “output.data”)
on the right.

Compiled with 64-bit FPC 3.0.1 (updated yesterday) on my FreeBSD 10.3
system. Firebird v2.5.4 is being used. I can supply a backup of the test
Firebird database too if needed - it is small.


I'm honestly trying very hard to understand the string changes
implemented in FPC 3.x, and the best way to use it going forward. In
this example I tried everything I learned from the recent mailing list
discussions. My concern with the usage of String/AnsiString still
stands, as this test program shows.

Regards,
  Graeme

program project1;

{$mode objfpc}
{$H+}
{$modeswitch unicodestrings} // this makes String = UnicodeString

// use UTF8String instead of String
{$DEFINE u8}

uses
  cwstring,       // widestring manager: code page conversions via iconv
  classes,
  sysutils,
  db,
  sqldb,
  IBConnection;

const
  cBOM = #$EF#$BB#$BF;  // UTF-8 byte order mark

var
  FDatabase: TIBConnection;
  FTransaction: TSQLTransaction;
  FQuery: TSQLQuery;
  f: TFileStream;
  u: {$IFDEF u8} UTF8String {$ELSE} String {$ENDIF};
begin
  // code page the RTL derived from the locale at startup
  writeln(DefaultSystemCodePage);
  FDatabase := TIBConnection.Create(nil);
  FDatabase.Dialect := 3;
  FDatabase.LoginPrompt := False;
  FDatabase.CharSet := 'UTF8';
  FDatabase.DatabaseName := '192.168.0.2:/data/devel/data/unicode_test.fdb';
  FDatabase.UserName := 'sysdba';
  FDatabase.Password := 'masterkey';

  FTransaction := TSQLTransaction.Create(nil);
  FDatabase.Transaction := FTransaction;

  FQuery := TSQLQuery.Create(nil);
  FQuery.DataBase := FDatabase;
  FQuery.SQL.Text := 'SELECT DESCRIPTION, UNIVALUE FROM UNICODE'; // where Description = ''Ligatures''';

  FDatabase.Connected := True;
  FQuery.Open;

  // TFileStream writes raw bytes, no encoding of its own
  f := TFileStream.Create('output.data', fmCreate);
  f.Write(cBOM[1], 3);

  FQuery.First;
  while not FQuery.EOF do
  begin
    // field one
    u := FQuery.FieldByName('DESCRIPTION').AsString;
    // Length(u) equals the byte count only for a single-byte string type
    // (the u8 branch); for UnicodeString it counts 2-byte code units
    f.write(u[1], Length(u));
    f.WriteByte(10); // new line
    // field two
    u := FQuery.FieldByName('UNIVALUE').AsString;
    f.write(u[1], Length(u));
    f.WriteByte(10); // new line
    f.WriteByte(10); // new line to separate records
    FQuery.Next;
  end;
  f.Free;

  FDatabase.Connected := False;

  FQuery.Free;
  FTransaction.Free;
  FDatabase.Free;
end.


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Jonas Maebe


Graeme Geldenhuys wrote on Wed, 11 May 2016:


I'm honestly trying very hard to understand the string changes
implemented in FPC 3.x, and the best way to use it going forward. In
this example I tried everything I learned from the recent mailing list
discussions. My concern with the usage of String/AnsiString still
stands, as this test program shows.


Your concern is with utf8string, not with string/ansistring. If you  
only use string/ansistring/unicodestring, then the behaviour of your  
program will be identical with FPC 2.6.4 and 3.0. With utf8string, the  
result is different in FPC 3.0 because now, just like when assigning  
an ansistring to a unicodestring, if you assign an ansistring to a  
utf8string the compiler will insert a code page conversion if necessary.


So yes: if you use utf8string, then your code may behave differently.
It's not due to code page conversions in the RTL or FCL though (which
is what you claimed before), but due to code page conversions in your
own code.
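Here is a minimal sketch of the conversion Jonas describes, assuming FPC 3.x. When a plain AnsiString is assigned to a UTF8String, the declared code pages differ, so the compiler inserts a code page check that runs at run time. In this sketch DefaultSystemCodePage is set to UTF-8 first, so source and target agree and the bytes are kept. Under a Latin-1 DefaultSystemCodePage, the same assignment would instead treat each byte as a Latin-1 character and re-encode it as UTF-8, turning C3 A9 into C3 83 C2 A9.

```pascal
program AcpToUtf8;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif} // widestring manager for conversions on Unix
  SysUtils;

var
  a: AnsiString;      // declared code page CP_ACP
  u8: UTF8String;     // declared code page CP_UTF8
begin
  SetMultiByteConversionCodePage(CP_UTF8);  // CP_ACP now means UTF-8

  SetLength(a, 2);
  a[1] := #$C3;       // UTF-8 bytes of U+00E9
  a[2] := #$A9;

  // Declared code pages differ, so a conversion check is inserted here;
  // source and target both resolve to UTF-8, so the bytes are kept
  u8 := a;
  WriteLn(Length(u8));   // 2
end.
```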


I also think utf8string is the only such case where behaviour is  
different. Ideally, we should not have introduced that type before FPC  
3.0. Of course, hindsight is 20/20.



Jonas


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 12:03, Graeme Geldenhuys wrote:
> Console output:
> 
> [unicode_test]$ export LANG=en_US.UTF-8
> [unicode_test]$ ./unicodetest
> 65001
> 
> [unicode_test]$ export LANG=en_US.ISO8859-1
> [unicode_test]$ ./unicodetest
> 28591

Just to add, compiling that test program with FPC 2.6.4 I get the
correct output in output.data, no matter what my locale setting is.
That's what I meant by the fact that I can accurately assume AnsiString
contains a UTF-8 payload (because I'm reading UTF-8 data), and that the
RTL and FCL did not make any encoding conversions.


Regards,
  Graeme



Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Graeme Geldenhuys wrote:


Hi,

Here is an example [proof if you will] of the problem. I wrote a small
test program that reads data from a Firebird database where the database
and field charset is set to UTF8.

I compile the program, then run it. No recompiles between the two runs.
The first run my system is set to have a UTF-8 locale. The second run is
where I set my system to have an ISO8859-1 (Latin-1) locale. The program
outputs the DefaultSystemCodePage to the console.

Because the locale changes the behaviour of String (aka AnsiString) in
the RTL and FCL, the first run works, but the second run corrupts my data.

Console output:

[unicode_test]$ export LANG=en_US.UTF-8
[unicode_test]$ ./unicodetest
65001

[unicode_test]$ export LANG=en_US.ISO8859-1
[unicode_test]$ ./unicodetest
28591


In my test program I write the data read from the database to a file
using TFileStream, thus console and file encoding settings will not
affect the data being written to file. TFileStream is simply writing bytes.


But what does your program prove ?

You're only proving that a conversion happens when you do
s := fieldByName('somefield').asString;
and that the conversion takes into account the locale, which in one of the 2
runs is different from the actual encoding of the data in the database.

This conversion is as-designed, and known to be wrong in the case of TField.AsString,
but will not be solved by simply using {$modeswitch unicodestrings} in the database code.


AFAIK 3.0 is no different in this matter from 2.6.4, Jonas can confirm/deny. 
Unlike 2.6.4, 3.0.0 offers us the possibility to fix it by allowing the code page
to be specified in TField. This is not yet implemented, however.
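Until TField can carry a code page, one workaround consistent with this analysis is to take AsString into a RawByteString, which avoids the conversion on assignment, and then relabel it with the code page the database is known to use. A minimal sketch, assuming FPC 3.x and a UTF8 database charset. WithCodePage is a hypothetical helper, and the AnsiString below only stands in for a TField.AsString result.

```pascal
program RelabelFieldValue;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif} // widestring manager for conversions on Unix
  SysUtils;

// Hypothetical helper: keep the bytes, declare their real encoding
function WithCodePage(const S: RawByteString; CP: TSystemCodePage): RawByteString;
begin
  Result := S;                     // RawByteString to RawByteString: no conversion
  SetCodePage(Result, CP, False);  // relabel only, bytes unchanged
end;

var
  fieldValue: AnsiString;   // stands in for FieldByName(...).AsString
  u: UnicodeString;
begin
  SetLength(fieldValue, 2);
  fieldValue[1] := #$C3;    // UTF-8 bytes of U+00E9, as stored in the database
  fieldValue[2] := #$A9;

  u := WithCodePage(fieldValue, CP_UTF8);   // decoded as UTF-8, whatever the locale
  WriteLn(Length(u), ' ', Ord(u[1]));       // 1 233
end.
```

In Graeme's test program this would read u := WithCodePage(FQuery.FieldByName('UNIVALUE').AsString, CP_UTF8).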


Michael.


Re: [fpc-pascal] Executing external processes and codepages

2016-05-11 Thread Marco van de Voort
In our previous episode, Michael Van Canneyt said:
> >
> > I don't like that. The 3.x idea is to get rid of manual conversions and
> > hack-and-convert-it-as-you-go encoding management, not just rebadge the old
> > practices to rawbytestring.
> 
> You may not like it, but there is simply no other choice:
> 
> Since you don't know what the receiving program does with the received
> arguments, all attempts to guess it are erroneous by definition.
> 
> Only the programmer knows what the receiving program will do (or should), 
> so he must take it into account. Hence rawbytestring.
 
> Don't pamper the programmer so much. He needs to make correct decisions.

You are right, if that scenario was the common one. Note that Jonas hasn't
named a scenario where it actually happens, so keeping it all manual for one
basket case is IMHO not an option.

(and even then better served by a "no translation" boolean than forgetting
about automatic conversions altogether)
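Marco's "no translation" flag could look roughly like the following hypothetical helper, which is not an existing RTL routine. By default an argument is converted to a target code page, for example the one the environment specifies, as Jonas argues. With NoTranslation set, the bytes are passed on as they are. A minimal sketch, assuming FPC 3.x.

```pascal
program ExecArgTranslation;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif} // widestring manager for conversions on Unix
  SysUtils;

// Hypothetical helper: prepare one command line argument for a child process
function PrepareExecArg(const Arg: RawByteString; TargetCP: TSystemCodePage;
  NoTranslation: Boolean): RawByteString;
begin
  Result := Arg;
  if not NoTranslation then
    SetCodePage(Result, TargetCP, True);  // Convert=True re-encodes the bytes
end;

var
  s8: UTF8String;
  r: RawByteString;
begin
  SetLength(s8, 2);
  s8[1] := #$C3;   // UTF-8 bytes of U+00E9
  s8[2] := #$A9;

  // Raw pass-through: bytes and code page untouched despite the Latin-1 target
  r := PrepareExecArg(s8, 28591, True);
  WriteLn(Length(r), ' ', StringCodePage(r));   // 2 65001
end.
```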


Re: [fpc-pascal] Smart Pointers

2016-05-11 Thread Maciej Izak
2016-05-11 13:46 GMT+02:00 Michael Van Canneyt :

> Anything the compiler needs *must* be in the system unit. The compiler
> should only assume the system unit, possibly objpas or macpas or so.
>
> All the rest should remain out of the RTL, which should be as small as
> possible. So rtl-generics is your best bet. Even the classes unit is better
> outside the rtl, but I think Marco is reluctant to remove it.
>
> I have remarked on this before: this tight dependency you are creating is
> very worrying.


The FPC team is very selective and applies double standards, and that is very
worrying. For example, the fgl module is part of the RTL just because it is useful
to Sven for testing purposes (! that is curious). There is no reason to keep
that module in the RTL, but it is in the RTL because we have double, selective
and irrational standards. Generics.* is an entirely different category of module,
with basic and complex support for *any* stuff related to generics, and it is
better suited for testing purposes than fgl.

Please just don't add Generics.* to FPC as a package. :/

-- 
Best regards,
Maciej Izak

Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 13:05, Jonas Maebe wrote:
> 
> Your concern is with utf8string, not with string/ansistring.

UTF8String is an AnsiString with the UTF-8 code page set.

> If you
> only use string/ansistring/unicodestring, then the behaviour of your  
> program will be identical with FPC 2.6.4 and 3.0. With utf8string, the  
> result is different in FPC 3.0 because now, just like when assigning  

No it's not. I welcome you to try the program yourself. The test program
includes a $DEFINE where I can toggle between using String or
UTF8String. Simply disable that define at the top of the unit.

I just double checked my results again. With u: String variable and
compiled with FPC 3.0 and running in a Latin-1 environment, data is
completely corrupted.

[unicode_test]$ export LANG=en_US.ISO8859-1
[unicode_test]$ ./unicodetest

And see the attached screenshot for the result of the data.

Regards,
  Graeme


Re: [fpc-pascal] Smart Pointers

2016-05-11 Thread Marco van de Voort
In our previous episode, Maciej Izak said:
> > Some rtl-objpas units use classes, like fmtbcd.
> >
> > If you don't use fcl-base I would make it RTL, so we can use it in fcl-base
> > if needed :-)
> 
> 
> The library is low level and uses nothing outside the RTL; IMO it should be
> part of the RTL, like in Delphi.

Delphi's decisions are irrelevant. (and they regroup across directories
every other version too)

There is no reason to compile these reasonably large units 3 times. The only
reason to keep them in the base RTL would be if units there needed them (like a
unicode classes version).
 
> Note that I need the library as part of the RTL, not as a package. Generics.*
> contains _LookupVtableInfo/_LookupVtableInfoEx and a few other things which
> are used for my next compiler work...

Then consider spinning that out to a lower level unit. It is not a reason to
stuff everything and the kitchen sink in the RTL.

> Generics.* as package will complicate all my plans -,- . If it can't be
> part of RTL, more comfortable for me is excluding whole library from FPC
> (easier maintenance of FreePascal fork...)

In forks you are free to modify and move whatever you want. Usually in cases
like this the developer doesn't want to use the more conservative version
packaged with FPC, but one that gives quick feedback on their own changes.

And isolated in a package they are more easily disabled (edit one fpmake)
than in the RTL (edit all targets' makefiles).


Re: [fpc-pascal] Executing external processes and codepages

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Marco van de Voort wrote:


In our previous episode, Michael Van Canneyt said:


I don't like that. The 3.x idea is to get rid of manual conversions and
hack-and-convert-it-as-you-go encoding management, not just rebadge the old
practices to rawbytestring.


You may not like it, but there is simply no other choice:

Since you don't know what the receiving program does with the received
arguments, all attempts to guess it are erroneous by definition.

Only the programmer knows what the receiving program will do (or should),
so he must take it into account. Hence rawbytestring.



Don't pamper the programmer so much. He needs to make correct decisions.


You are right, if that scenario was the common one. Note that Jonas hasn't
named a scenario where it actually happens, so keeping it all manual for one
basket case is IMHO not an option.



I honestly don't understand this argument: in 99,99% of all cases, the code
will just work if you use RawByteString, since everything will be in the
codepage of the environment anyway ?

Michael.


Re: [fpc-pascal] Smart Pointers

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Maciej Izak wrote:


2016-05-11 13:46 GMT+02:00 Michael Van Canneyt :


Anything the compiler needs *must* be in the system unit. The compiler
should only assume the system unit, possibly objpas or macpas or so.

All the rest should remain out of the RTL, which should be as small as
possible. So rtl-generics is your best bet. Even the classes unit is better
outside the rtl, but I think Marco is reluctant to remove it.

I have remarked on this before: this tight dependency you are creating is
very worrying.



FPC team is very selective and with double standards and that is very
worrying.


There are no double standards ?


For example fgl module is part of RTL just because it is useful
for Sven for testing purposes (! that is curious).


Where is that written ?  As far as I know, it is only there because classes
is there, and classes has some define to allow it to be compiled with fgl.


There is no reason for keeping that module in RTL


As said, I think classes may be moved as well, and then fgl may be moved as
well.


but is in RTL because we have double, selective
and irrational standards.


No, this is not so. See above.

The rtl needs to contain whatever is necessary to compile the compiler.

No more, no less.


Generics.* is an absolutely different category of module
with base and complex support for *any* stuff related to generics + has
better testing purposes than fgl.

Please just don't add Generics.* into FPC as package. :/


Nevertheless, that is what is going to happen.

There should not be double standards.
If the consequence is that the fgl unit should move as well: No problem with that.

Michael.


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 13:07, Michael Van Canneyt wrote:
> 
> But what does your program prove ?

See below...

> You're only proving that a conversion happens when you do
> s := fieldByName('somefield').asString;

I'm proving that using the String type anywhere in the RTL and FCL is
now terrible. If the FPC team did a search and replace (String ->
UnicodeString) all over the RTL and FCL, then such data corruption
probably would not have occurred because UnicodeString is not affected
by the running environment. That is probably the exact reason why Delphi now
uses String = UnicodeString on all platforms.

> AFAIK 3.0 is no different in this matter from 2.6.4, Jonas can confirm/deny. 
> Unlike 2.6.4, 3.0.0 offers us the possibility to fix it by allowing to specify

See my reply to Jonas. There is a massive difference between FPC 2.6.4
and 3.0.0 using the exact same program and test environment.

I can't see how anybody can currently switch to FPC 3.0.0 - it simply
isn't ready for prime usage. As my test shows, you can't simply
recompile your application with FPC 3.0.0 and think it is going to work
like it did in FPC 2.6.4 - it doesn't.

Yes some parts in FPC 3.0 are now in place going forward, but there is
still too much that can go wrong (in the RTL and FCL) due to the
dynamically changing AnsiString type being used everywhere.


Regards,
  Graeme



Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Jonas Maebe


Graeme Geldenhuys wrote on Wed, 11 May 2016:


On 2016-05-11 13:05, Jonas Maebe wrote:


Your concern is with utf8string, not with string/ansistring.
UTF8String is an AnsiString with utf-8 code page set.
If you
only use string/ansistring/unicodestring, then the behaviour of your
program will be identical with FPC 2.6.4 and 3.0. With utf8string, the
result is different in FPC 3.0 because now, just like when assigning


No it's not. I welcome you to try the program yourself. The test program
includes a $DEFINE where I can toggle between using String or
UTF8String. Simply disable that define at the top of the unit.


That's because you have {$modeswitch unicodestrings}, so
string=unicodestring. If you change the string to unicodestring (since
FPC 2.6.4 does not know {$modeswitch unicodestrings}), you should get
the same results in FPC 2.6.4 and FPC 3.x (since then they are also  
actually using the same string type).
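A minimal sketch of what the switch changes, assuming FPC 3.0 or later (the program and function names are made up for illustration): under {$modeswitch unicodestrings} the generic String and Char types become UnicodeString and WideChar, so the same source ends up using a different string type than under FPC 2.6.4, which ignores the switch.

```pascal
program ModeSwitchDemo;

{$mode objfpc}{$H+}
{$modeswitch unicodestrings} // FPC 3.0+: String = UnicodeString, Char = WideChar

// Hypothetical helper: size in bytes of one element of the generic String type
function StringElementSize: Integer;
begin
  Result := SizeOf(Char);
end;

begin
  // 2 under this modeswitch; without it Char is AnsiChar and this prints 1
  WriteLn('SizeOf(Char) = ', StringElementSize);
end.
```

Removing the modeswitch line and recompiling is a quick way to confirm which string type a given unit is actually using.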


I can't easily try myself as I have no database server whatsoever  
running, nor any experience with setting them up.



Jonas


Re: [fpc-pascal] Smart Pointers

2016-05-11 Thread Marco van de Voort
In our previous episode, Michael Van Canneyt said:
> > For example fgl module is part of RTL just because it is useful
> > for Sven for testing purposes (! that is curious).
> 
> Where is that written ?  As far as I know, it is only there because classes
> is there, and classes has some define to allow it to be compiled with fgl.

(I don't know if that is the reason, but probably there was not much
discussion about it because it is small)
 
> > There is no reason for keeping that module in RTL
> 
> As said, I think classes may be moved as well, and then fgl may be moved as
> well.

Indeed, this is only a working division, and nothing is set in stone, but
the objective is to avoid letting the rtl get too big.

Originally that was because of speed reasons (packages/ can be built in
parallel and rtl is compiled many times), but in retrospect packages/ is
much easier to administer, and it makes life a bit easier for new porters.

> > Generics.* is an absolutely different category of module
> > with base and complex support for *any* stuff related to generics + has
> > better testing purposes than fgl.
> >
> > Please just don't add Generics.* into FPC as package. :/
> 
> Nevertheless, that is what is going to happen.

With the current situation: imho yes. And even if there are changes they
will be more directed to move more units out of the RTL rather than move
them back.
 


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread LacaK






I just double checked my results again. With u: String variable and
compiled with FPC 3.0 and running in a Latin-1 environment, data is
completely corrupted.

It would be good to know where this happens,
because AFAIK fcl-db internally uses AnsiString/String, so assigning
between them should not trigger any code page conversion.
So if you fetch UTF-8 data from the database and then move it between
various string instances, it should be preserved (no matter that the ACP
of String is Latin1).

So in the end, when you save this data to a file, it should still be UTF-8
encoded?

Can you dump the binary content of "u" before it is saved to the file?
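One way to do such a dump, as a hedged sketch (the helper names HexDumpBytes and HexDumpUnits are hypothetical, not part of SqlDB or the test program): print each byte of an ansistring in hex, together with the code page stored in the string. A correct UTF-8 "é" shows up as C3 A9, while UTF-8 bytes that were misread as Latin-1 and re-encoded would typically show four bytes. If "u" is a UnicodeString (for example under {$modeswitch unicodestrings}), dumping its UTF-16 code units with four hex digits is the equivalent check.

```pascal
program DumpBytes;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: space-separated hex dump of the raw bytes of S.
// A RawByteString parameter receives the bytes without code page conversion.
function HexDumpBytes(const S: RawByteString): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
  begin
    if i > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(S[i]), 2);
  end;
end;

// Hypothetical helper: hex dump of the UTF-16 code units of a UnicodeString.
function HexDumpUnits(const S: UnicodeString): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
  begin
    if i > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(S[i]), 4);
  end;
end;

var
  u: UTF8String;
begin
  // UTF-8 encoding of U+00E9, written byte by byte so no conversion can occur
  SetLength(u, 2);
  u[1] := #$C3;
  u[2] := #$A9;
  WriteLn('code page: ', StringCodePage(u));
  WriteLn('bytes:     ', HexDumpBytes(u));
end.
```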
-Laco.



Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread LacaK






That's because you have {$modeswitch unicodestrings}, so
string=unicodestring.

This is the answer to my question :-)



Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Graeme Geldenhuys wrote:


On 2016-05-11 13:07, Michael Van Canneyt wrote:


But what does your program prove ?


See below...


You're only proving that a conversion happens when you do
s := fieldByName('somefield').asString;


I'm proving that using the String type anywhere in the RTL and FCL is
now terrible. If the FPC team did a search and replace (String ->
UnicodeString) all over the RTL and FCL, then such data corruption
probably would not have occurred because UnicodeString is not affected
by the running environment. That is probably the exact reason why Delphi now
uses String = UnicodeString on all platforms.


It would not help if we did this: the data would be wrong in the TDataset
buffers, and the result would be worse.

I agree that the situation is currently not ideal: TField.AsString is a
problem, but it's nowhere near as bad as you make it out to be.

You just need to know what conversions happen where, and if you do, it
works just fine.


Michael.


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 12:03, Graeme Geldenhuys wrote:
> I wrote a small
> test program that reads data from a Firebird database where the database
> and field charset is set to UTF8.

For those that want to try the sample application, a backup of the
database (3.7MB in size) can be found at:

  http://geldenhuys.co.uk/~graemeg/temp/unicode_test.fbk


Regards,
  Graeme




Re: [fpc-pascal] Smart Pointers

2016-05-11 Thread Maciej Izak
2016-05-11 14:23 GMT+02:00 Michael Van Canneyt :

> Where is that written ?  As far as I know, it is only there because classes
> is there, and classes has some define to allow it to be compiled with fgl.
>
>
by Sven (29 January 2016 10:54 thread "Generics.Collections as package for
Lazarus or package for FPC RTL"):

"also fgl is a nice test during cycling the compiler that nothing basic was
broken with generics; one of the main reason it's still in rtl and not
rtl-objpas or rtl-extra"

 keeping module in RTL just to have nice test to check compiler cycle is
... very very very strange.


> There should not be double standards.
> If the consequence is that the fgl unit should move as well: No problem
> with that.


If we will keep right order then I have no problem with Generics.* as
rtl-generics package, and I can realize my plans with compiler in other,
more correct way.

-- 
Best regards,
Maciej Izak

Re: [fpc-pascal] Smart Pointers

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Maciej Izak wrote:


2016-05-11 14:23 GMT+02:00 Michael Van Canneyt :


Where is that written ?  As far as I know, it is only there because classes
is there, and classes has some define to allow it to be compiled with fgl.



by Sven (29 January 2016 10:54 thread "Generics.Collections as package for
Lazarus or package for FPC RTL"):

"also fgl is a nice test during cycling the compiler that nothing basic was
broken with generics; one of the main reason it's still in rtl and not
rtl-objpas or rtl-extra"


Well I missed that, but I think this argument will be overruled, it is
rather a weak one.



keeping module in RTL just to have nice test to check compiler cycle is
... very very very strange.


I agree. And hence I think we will move it.


There should not be double standards.
If the consequence is that the fgl unit should move as well: No problem
with that.



If we will keep right order then I have no problem with Generics.* as
rtl-generics package, and I can realize my plans with compiler in other,
more correct way.


See the reply by Marco. We'll be moving units out of RTL, rather than in.
That includes fgl and classes (better late than never ;) ).

Michael.


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 13:37, Michael Van Canneyt wrote:
> 
> It would not help if we did this: the data would be wrong in the TDataset
> buffers, and the result would be worse.

I didn't mean literally search and replace - that would simply be too
easy. ;-) Some work and testing would be required, otherwise Delphi
would have had Unicode support much sooner.


> You just need to know what conversions happen where, and if you do, it
> works just fine.

If anybody has the time, I would really like to learn how, using FPC
3.x, to run the example program in a Latin-1 [console] environment and
still get the correct data stored in the output.data file. I can't see a
solution until SqlDB is changed.


Regards,
  Graeme



Re: [fpc-pascal] Smart Pointers

2016-05-11 Thread Sven Barth
On 11.05.2016 14:42, "Maciej Izak" wrote:
>
> 2016-05-11 14:23 GMT+02:00 Michael Van Canneyt :
>>
>> Where is that written ?  As far as I know, it is only there because classes
>> is there, and classes has some define to allow it to be compiled with fgl.
>>
>
> by Sven (29 January 2016 10:54 thread "Generics.Collections as package
> for Lazarus or package for FPC RTL"):
>
> "also fgl is a nice test during cycling the compiler that nothing basic
> was broken with generics; one of the main reason it's still in rtl and not
> rtl-objpas or rtl-extra"
>
>  keeping module in RTL just to have nice test to check compiler cycle is
> ... very very very strange.

And I stand by that decision. Generics are a rather fickle feature and I
want to know of critical failures as early as possible (and yes, I've made
use of that already numerous times!), thus I prefer fgl to be part of the
cycling and unlike fcl-stl or these Delphi compatible ones the fgl unit is
comparatively small.

Regards,
Sven

Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 13:27, Jonas Maebe wrote:
> If you change the string to unicodestring (since  
> FPC 2.6.4 does not know {$modeswitch unicodestrings}), you should get
> the same results in FPC 2.6.4 and FPC 3.x

No, because FPC 2.6.4 doesn't do automatic encoding conversions. I would
first have to add UTF8Decode() calls wherever I assign known UTF-8 data
to a UnicodeString.

With FPC 2.6.4 I never use UnicodeString. Like with Lazarus LCL, I use
AnsiString with a UTF-8 payload. I define a new type which I use in my
application to remind me of that fact.
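The 2.6.4-style approach described above can be sketched like this, as a hedged illustration compiled with FPC 3.x. TUTF8Text is a hypothetical name (the actual type used in fpGUI is not given here). The alias only documents intent: the compiler does not know the payload is UTF-8, so every conversion to UnicodeString has to be an explicit UTF8Decode call.

```pascal
program Utf8PayloadDemo;

{$mode objfpc}{$H+}

type
  // Hypothetical alias: an AnsiString that, by convention, holds UTF-8 bytes.
  // The compiler does not enforce this; it only documents intent.
  TUTF8Text = type AnsiString;

// Explicit conversion at the point where UTF-8 data becomes UTF-16
function ToUnicode(const S: TUTF8Text): UnicodeString;
begin
  Result := UTF8Decode(S);
end;

var
  t: TUTF8Text;
  w: UnicodeString;
begin
  // UTF-8 bytes of U+00E9, written byte by byte
  SetLength(t, 2);
  t[1] := #$C3;
  t[2] := #$A9;
  w := ToUnicode(t);
  WriteLn('UTF-16 length: ', Length(w), ', code unit: $', HexStr(Ord(w[1]), 4));
end.
```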

Regards,
  Graeme



Re: [fpc-pascal] Smart Pointers

2016-05-11 Thread Maciej Izak
2016-05-11 14:51 GMT+02:00 Sven Barth :

> And I stand by that decision. Generics are a rather fickle feature and I
> want to know of critical failures as early as possible (and yes, I've made
> use of that already numerous times!), thus I prefer fgl to be part of the
> cycling and unlike fcl-stl or these Delphi compatible ones the fgl unit is
> comparatively small.
>
And that is a double standard. I prefer Generics.* to be part of the compiler
cycle. It performs better tests for critical failures in my compiler code and
for more advanced generics code. It is not only for Delphi compatibility; it
provides infrastructure for *any* generics library like Spring4D etc.

Sorry Sven but keeping fgl in RTL is ridiculous (!). There is no technical
reason for that. Only your personal convenience. You can use the normal test
suite like others.

Or maybe Marco's reply is untrue?

-- 
Best regards,
Maciej Izak

Re: [fpc-pascal] Smart Pointers

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Sven Barth wrote:


On 11.05.2016 14:42, "Maciej Izak" wrote:


2016-05-11 14:23 GMT+02:00 Michael Van Canneyt :


Where is that written ?  As far as I know, it is only there because classes
is there, and classes has some define to allow it to be compiled with fgl.


by Sven (29 January 2016 10:54 thread "Generics.Collections as package
for Lazarus or package for FPC RTL"):


"also fgl is a nice test during cycling the compiler that nothing basic
was broken with generics; one of the main reason it's still in rtl and not
rtl-objpas or rtl-extra"


 keeping module in RTL just to have nice test to check compiler cycle is
... very very very strange.

And I stand by that decision. Generics are a rather fickle feature and I
want to know of critical failures as early as possible (and yes, I've made
use of that already numerous times!), thus I prefer fgl to be part of the
cycling and unlike fcl-stl or these Delphi compatible ones the fgl unit is
comparatively small.


Sven, you must run the testsuite for this. 
So do Jonas and Florian. That is what it is for.


I have no doubt that generics are fickle. So are many other features.
So this is at its core a laziness argument, which I think is not very correct.

Michael.


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Graeme Geldenhuys wrote:


On 2016-05-11 13:37, Michael Van Canneyt wrote:


It would not help if we did this: the data would be wrong in the TDataset
buffers, and the result would be worse.


I didn't mean literally search and replace - that would simply be too
easy. ;-) Some work and testing would be required, otherwise Delphi
would have had Unicode support much sooner.



You just need to know what conversions happen where, and if you do, it
works just fine.


If anybody has the time, I would really like to learn how, using FPC
3.x, to run the example program in a Latin-1 [console] environment and
still get the correct data stored in the output.data file. I can't see a
solution until SqlDB is changed.


There is no satisfactory solution.
As I said: TField.AsString is a problem, we are aware of it.


Michael.


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Jonas Maebe

Graeme Geldenhuys wrote on Wed, 11 May 2016:


On 2016-05-11 13:27, Jonas Maebe wrote:

If you change the string to unicodestring (since
FPC 2.6.4 does not know {$modeswitch unicodestrings}), you should get
the same results in FPC 2.6.4 and FPC 3.x


No, because FPC 2.6.4 doesn't do automatic encoding conversions.


FPC 2.6.x and FPC 3.0 perform exactly the same automatic encoding  
conversions when assigning that ansistring property to a unicodestring  
variable in your test program.



I would
first have to add UTF8Decode() calls wherever I assign known UTF-8 data
to a UnicodeString.


If you do the same in FPC 3.0, you will get exactly the same results  
as in FPC 2.6.x.



With FPC 2.6.4 I never use UnicodeString. Like with Lazarus LCL, I use
AnsiString with a UTF-8 payload. I define a new type which I use in my
application to remind me of that fact.


First, you start with a warning about how no one should use FPC 3.0  
with the String type, because it completely changes the behaviour  
compared to FPC 2.6.x due to automatic conversions in the RTL and FCL.  
When it is clear that is not true, you are now saying that the  
behaviour of FPC 3.0 is different to FPC 2.6.x if you compile  
different code with each one.


Well, yes: if you use different code in FPC 2.6.x and FPC 3.x, then  
you can indeed get very different behaviour.



Jonas


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Jonas Maebe


Michael Van Canneyt wrote on Wed, 11 May 2016:


On Wed, 11 May 2016, Graeme Geldenhuys wrote:


If anybody has the time, I would really like to learn how, using FPC
3.x, to run the example program in a Latin-1 [console] environment and
still get the correct data stored in the output.data file. I can't see a
solution until SqlDB is changed.


There is no satisfactory solution.


While not 100% satisfactory, doing exactly the same as in FPC 2.6.x  
will give you exactly the same as you got there.



Jonas


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 14:14, Jonas Maebe wrote:
> compared to FPC 2.6.x due to automatic conversions in the RTL and FCL.  
> When it is clear that is not true, you are now saying that the  
> behaviour of FPC 3.0 is different to FPC 2.6.x if you compile  
> different code with each one.

My test program under FPC 2.6.4 doesn't give problems. It's when that
same program is compiled under FPC 3.0.0 that it does. All due to String
(and thus AnsiString) changing its encoding based on the running
environment. With FPC 2.6.4 compiled programs, no matter the environment
(UTF-8 or Latin-1), my test program behaves the same.

I give up!

Regards,
  Graeme



Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Jonas Maebe wrote:



Michael Van Canneyt wrote on Wed, 11 May 2016:


On Wed, 11 May 2016, Graeme Geldenhuys wrote:


If anybody has the time, I would really like to learn how, using FPC
3.x, to run the example program in a Latin-1 [console] environment and
still get the correct data stored in the output.data file. I can't see a
solution until SqlDB is changed.


There is no satisfactory solution.


While not 100% satisfactory, doing exactly the same as in FPC 2.6.x will give 
you exactly the same as you got there.


I am aware of this. We are using lots of DB apps at my work and so I tested
this. The apps work without change.

The biggest problem for going to 3.0.0 is the currency bug.

Michael.


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Jonas Maebe


Graeme Geldenhuys wrote on Wed, 11 May 2016:


My test program under FPC 2.6.4 doesn't give problems. It's when that
same program is compiled under FPC 3.0.0 that it does. All due to String
(and thus AnsiString) changing its encoding based on the running
environment.


In FPC 2.6.x, if you use a widestring manager (such as cwstring), the  
code page of shortstring/ansistring/pchar /also/ depends on the  
running environment, in fact in exactly the same way as in FPC 3.x.  
The main thing that is new in FPC 3.x, is that instead of the RTL  
assuming that all ansistring contents are always encoded in this code  
page, we now explicitly attach the code page information in a hidden  
field of the ansistring structure (so different ansistrings can have  
different encodings, but the default for plain ansistrings remains  
exactly the same).
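A minimal sketch of that hidden per-string field, assuming FPC 3.0 or later (the helper name RelabelAs is hypothetical): StringCodePage reads the code page stored with a string, and SetCodePage with Convert = False only changes that label while leaving the bytes untouched. A RawByteString parameter accepts any ansistring without conversion, which is why the helper sees the caller's string as-is.

```pascal
program CodePageDemo;

{$mode objfpc}{$H+}

// Hypothetical helper: returns a copy of S whose stored code page is CP,
// without converting any bytes (SetCodePage with Convert = False).
function RelabelAs(const S: RawByteString; CP: TSystemCodePage): RawByteString;
begin
  Result := S;
  SetCodePage(Result, CP, False);
end;

var
  r: RawByteString;
  u: UTF8String;
begin
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);

  r := RelabelAs('abc', CP_UTF8);
  WriteLn('relabelled code page:  ', StringCodePage(r)); // CP_UTF8 = 65001

  // Source and destination both carry CP_UTF8, so no conversion is needed here
  u := r;
  WriteLn('UTF8String code page:  ', StringCodePage(u));
end.
```

Printing DefaultSystemCodePage under LANG=en_US.UTF-8 and LANG=en_US.ISO8859-1 shows the environment-dependent default that plain ansistrings get.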



With FPC 2.6.4 compiled programs, no matter the environment
(UTF-8 or Latin-1), my test program behaves the same.


As I asked before: did you change "String" to "Unicodestring" when  
compiling under FPC 2.6.4? As mentioned before, {$modeswitch  
unicodestrings} is a new feature in FPC 3.0 and is ignored by FPC  
2.6.4 (you will get a warning about this when compiling with FPC  
2.6.4, but not an error).



Jonas


[fpc-pascal] Summary of Unicode Strings Debate

2016-05-11 Thread Mazo Winst
Hello,

A summary of what I learned from the Unicode String debate:

1 - FPC 3 introduces code page aware strings

2 - FPC 3 updates the RTL to provide better support for code page aware
strings

3 - The dynamic behavior of the string type with regard to the platform was
not introduced by FPC 3. In previous versions of FPC, the string type
depended on the platform too.

4 - However, there is a key difference between FPC 3 and the previous
versions: in FPC 3, the compiler does perform automatic conversions in
certain circumstances (two ansistring variables with different codepages). I
think these automatic conversions can potentially be the source of
unexpected data corruption.

5 - The automatic conversion is a desirable feature. IMHO, the definitive
solution would be to stop the unpredictable dynamic behavior of the String
type by following the same path that Delphi followed: adopt the same unicode
string code page on all platforms.

6 - A workaround when developing a cross-platform app is to use the Lazarus
Unicode Support.


Best regards

Re: [fpc-pascal] Executing external processes and codepages

2016-05-11 Thread Marco van de Voort
In our previous episode, Michael Van Canneyt said:
> I honestly don't understand this argument: in 99,99% of all cases, the code
> will just work if you use RawByteString, since everything will be in the
> codepage of the environment anyway ?

Well, we already convert to the target encoding on Windows (UTF-16), and that
has to stay anyway.


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Andreas Dorn
All in all Graeme is right. FPC looks pretty much broken to me, too.

For my projects I pulled the emergency brake on anything FPC.

The most serious flaws of FPC 3.0, for me, are:
- assuming that it's possible to assign an encoding to every string
- using an (unsafe) guess about the encoding for auto-conversions

It's not possible to assign a valid encoding to every string (not
automatically, and not even manually).

Some examples:

1) String-Buffers
Split a UTF-8 String into chunks of 1024 bytes. Trying to assign an encoding to
those chunks, and allowing auto-conversions will just lead to corruption.

Where has the string-type for string-buffers gone?

2) Most programming languages out there use something like "sequence of UTF-16
codepoints" as a string-type.
(That's not the same as UTF-16 string !)
It's a proper string type for "UTF-16 buffer" - pretty much nobody out there
uses a low-level string-type that assumes that the content is a complete
UTF-16 string.

3) Filenames on Windows
You can't convert any random filename on Windows to UTF8 and back without data loss.
There simply isn't any encoding that correctly fits to all possible filenames.

A lot of APIs use buffers. You can try to assign an encoding to a buffer, but
if you use that encoding to auto-convert anything you made a blatant mistake.
Assuming that anything from the outside world (WindowsAPI, C#, Java...) is
UTF-16 is yet another blatant mistake...

4) some Barcodes,
5) Various File-Format-Standards,
6) anything that uses ASCII + some Control-Bytes for communication,
7) some encodings used in databases, ...
all that won't fit into the FPC scheme of 'known encodings'.

The most obvious showstoppers for FPC 3.0 are:
FPC 3.0 doesn't have a useful type for string-buffers.
FPC 3.0 doesn't have a useful type for Filenames
FPC 3.0 adds unsafe auto-conversions

Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Andreas Dorn wrote:


All in all Graeme is right. FPC looks pretty much broken to me, too.
For my projects I pulled the emergency brake on anything FPC.
 
The most serious flaws for me of FPC 3.0 are:
- assuming that it's possible to assign an encoding to every string
- using an (unsafe) guess about the encoding for auto-conversions
 
It's not possible to assign a valid encoding to every string (not 
automatically, and not even manually).


Please stop spreading FUD, this is plainly a false statement.


 
Some examples:
1) String-Buffers
Split a UTF-8 String into chunks of 1024 bytes. Trying to assign an encoding to
those chunks, and allowing auto-conversions will just lead to corruption.
 
Where is the string-type for string-buffers gone?


There never was one, this would break in 2.6.4 too.

If you thought there was in 2.6.4, you are simply mistaken.


 
2) Most programming languages out there use something like "sequence of UTF-16
codepoints" as a string-type.
(That's not the same as UTF-16 string !)
It's a proper string type for "UTF-16 buffer" - pretty much nobody out there
uses a low-level string-type that assumes that the content is a complete
UTF-16 string.


No-one stops you from using Unicodestring ?


3) Filenames on Windows
You can't convert any random filename on Windows to UTF8 and back without 
dataloss.
There simply isn't any encoding that correctly fits to all possible filenames.


You will need to explain what you mean by this.


A lot of APIs use buffers. You can try to assign an encoding to a buffer, but
if you use that encoding to auto-convert anything, you have made a blatant
mistake. Assuming that anything from the outside world
(WindowsAPI, C#, Java...) is UTF-16 is yet another blatant mistake...
 
4) some Barcodes,
5) Various File-Format-Standards,
6) anything that uses ASCII + some Control-Bytes for communication,
7) some encodings used in databases, ...
all that won't fit into the FPC scheme of 'known encodings'.


In this regard, FPC 3.0.0 has not changed compared to 2.6.4.
  

The most obvious showstoppers for FPC 3.0 are:
FPC 3.0 doesn't have a useful type for string-buffers.


Please explain what you mean with 'string buffers'.

When using e.g. windows or C apis, the string buffer you need to use is 
either "Array of char" or "array of widechar".


Which one you should use depends on the API you want to access.

In the case of Array of Char, you must take care of the encoding yourself, but
that was also the case in 2.6.4.

Nothing has changed in this regard.


FPC 3.0 doesn't have a useful type for Filenames


Just use the native filename type, or UnicodeString.


FPC 3.0 adds unsafe auto-conversions


Why do you think it is unsafe ?

Michael.

Re: [fpc-pascal] Summary of Unicode Strings Debate

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Mazo Winst wrote:


Hello,

A summary of what I learned from the Unicode String debate:

1 - FPC 3 introduces code page aware strings

2 - FPC 3 updates the RTL to provide better support for code page aware
strings

3 - The dynamic behavior of the string type with regard to the platform was
not introduced by FPC 3. In previous versions of FPC, the string type
depended on the platform too.


The string type does not depend on the platform.

The only thing that depends on the platform is the default encoding used in an 
AnsiString.

But that was true in FPC 2.6.4 as well:
On Windows that would have been CP_ACP, on Unix UTF8. Now this is simply settable.


(Jonas can confirm/deny this)
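For illustration only, a minimal sketch of what "settable" means here, assuming FPC 3.0's System unit (which declares SetMultiByteConversionCodePage, DefaultSystemCodePage and CP_UTF8); the program name is made up. The start-up value it prints depends on the platform and, on Unix, on whether a widestring manager such as cwstring is in use.

```pascal
program DefaultCodePageDemo;
{$mode objfpc}{$H+}

begin
  // Code page that plain AnsiStrings (declared as CP_ACP) map to at run time.
  // The initial value is platform dependent.
  WriteLn('DefaultSystemCodePage at start-up: ', DefaultSystemCodePage);

  // Since FPC 3.0 a program can change this default itself, e.g. to UTF-8.
  SetMultiByteConversionCodePage(CP_UTF8);
  WriteLn('DefaultSystemCodePage now: ', DefaultSystemCodePage);
end.
```

Strings that already exist keep the code page recorded in their headers, so calling this early, before any strings are created, is presumably the least surprising choice.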


4 - However, there is a key difference between FPC 3 and the previous
versions: in FPC 3, the compiler does perform automatic conversions in
certain circumstances (e.g. between two AnsiString variables with different
code pages). I think these automatic conversions can potentially be the
source of unexpected data corruption.


In such cases the codepage is determined by the declared code page of the
result string. (Jonas should be able to confirm/deny this).

If the codepages cannot be converted correctly, then the data may be corrupted.


5 - The automatic conversion is a desirable feature. IMHO, the definitive
solution would be to stop the unpredictable dynamic behavior of the String
type by following the same path that Delphi followed: adopt the same Unicode
string code page on all platforms.

6 - A workaround when developing a cross-platform app is to use the Lazarus
Unicode Support.


Huh ? What kind of statement is this ?
What do you think Lazarus uses under the hood ? 
Last I looked, it was FPC :-)


It seems to me that most people simply do not fully understand what is
happening, and as a result, a lot of misunderstandings abound.

Michael.


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Jonas Maebe


Andreas Dorn wrote on Wed, 11 May 2016:


All in all Graeme is right. FPC looks pretty much broken to me, too.
For my projects I pulled the emergency brake on anything FPC.
 
The most serious flaws for me of FPC 3.0 are:
- assuming that it's possible to assign an encoding to every string
- using an (unsafe) guess about the encoding for auto-conversions


Do you have code that works correctly in FPC 2.6.x, but not in FPC  
3.0? If so, can you please post it or file bug reports? Again: the  
main focus when designing all of this new functionality was backward  
compatibility: existing code that uses plain  
string/shortstring/ansistring/unicodestring/char/widechar/unicodechar/pchar/pwidechar/punicodechar should have the same behaviour in FPC 3.0 as in previous FPC versions if you don't make any changes. And in virtually all cases it does (the utf8string type being a notable  
exception).



Some examples:
1) String-Buffers
Split a UTF-8 String into chunks of 1024 bytes. Trying to assign an
encoding to those chunks, and allowing auto-conversions will just lead to corruption.

Where has the string type for string buffers gone?


There never was any, but as long as you don't try to convert strings  
containing such arbitrary data from one code page to another (by  
either calling setcodepage() or by assigning them from a string with  
declared code page X to a string with declared code page Y), no  
conversions will happen.
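A minimal sketch of this point, assuming FPC 3.0 with {$mode objfpc}{$H+} (the program and variable names are illustrative): a buffer holding all 256 byte values, most of which do not form valid UTF-8, is tagged as CP_UTF8 with SetCodePage(..., False) and then assigned to UTF8String variables. Because source and destination code pages match, no conversion should be performed and the bytes arrive unchanged.

```pascal
program NoConversionDemo;
{$mode objfpc}{$H+}

var
  Buf: RawByteString;
  A, B: UTF8String;
  I: Integer;
begin
  // Fill a buffer with every byte value; most of these are not valid UTF-8.
  SetLength(Buf, 256);
  for I := 1 to 256 do
    Buf[I] := Chr(I - 1);

  // Only relabel the bytes as UTF-8 (Convert = False): nothing is translated.
  SetCodePage(Buf, CP_UTF8, False);

  // Dynamic code page of Buf = declared code page of A: no conversion.
  A := Buf;
  // Same declared type on both sides: a plain reference-counted copy.
  B := A;

  WriteLn('Length(B) = ', Length(B));
  WriteLn('Bytes unchanged: ', CompareByte(B[1], Buf[1], Length(Buf)) = 0);
end.
```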



2) Most programming languages out there use something like "sequence of
UTF-16 codepoints" as a string-type.
(That's not the same as UTF-16 string !)
It's a proper string type for "UTF-16 buffer" - pretty much nobody out
there uses a low-level string-type that assumes
that the content is a complete UTF-16 string.


The meaning of UnicodeString has not changed in FPC 3.0 compared to  
previous FPC versions, nor the way they are converted to/from other  
string types. You can argue it was broken from the start, but that's  
unrelated to the present animosity that's getting vented about FPC 3.0.



3) Filenames on Windows
You can't convert any random filename on Windows to UTF8 and back without
data loss.
There simply isn't any encoding that correctly fits all possible
filenames.


We only auto-convert Windows file names from UTF-16 to anything else  
if you use non-unicodestring/widestring variables with the file name  
APIs. If you consistently use unicodestring/widestring, no conversion  
will happen (except with not yet converted APIs, such as classes).



A lot of APIs use buffers. You can try to assign an encoding to a buffer,
but if you use that encoding to auto-convert anything, you have made a
blatant mistake. Assuming that anything from the outside world
(WindowsAPI, C#, Java...) is UTF-16 is yet another blatant mistake...


Maybe we should add support for "WTF-8" like in Rust:  
https://github.com/rust-lang/rust/issues/12056



4) some Barcodes,


I would not consider these to be strings, but other than that the same  
holds as for String Buffers above.



5) Various File-Format-Standards,


Idem.


6) anything that uses ASCII + some Control-Bytes for communication,


Idem.


7) some encodings used in databases, ...
all that won't fit into the FPC scheme of 'known encodings'.
 

The most obvious showstoppers for FPC 3.0 are:
FPC 3.0 doesn't have a useful type for string-buffers.


Use arrays, like in any other programming language. If you insist on  
using strings, simply stick to consistently using a single string type.
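A minimal sketch of the array-based approach, assuming FPC 3.0 with {$mode objfpc}{$H+}; SplitIntoChunks, TChunk and TChunkList are hypothetical names, not RTL routines. The chunks are dynamic arrays of Byte, so the compiler never inserts a code page conversion, even when a chunk boundary falls in the middle of a multi-byte UTF-8 sequence.

```pascal
program ChunkBufferDemo;
{$mode objfpc}{$H+}

type
  TChunk = array of Byte;        // raw bytes: never subject to code page conversion
  TChunkList = array of TChunk;

// Copy the bytes of S into chunks of at most ChunkSize bytes.
// A boundary may split a multi-byte UTF-8 sequence; that is harmless
// because the chunks are plain bytes, not strings.
function SplitIntoChunks(const S: RawByteString; ChunkSize: Integer): TChunkList;
var
  Count, I, Len: Integer;
begin
  Result := nil;
  if (Length(S) = 0) or (ChunkSize <= 0) then
    Exit;
  Count := (Length(S) + ChunkSize - 1) div ChunkSize;
  SetLength(Result, Count);
  for I := 0 to Count - 1 do
  begin
    Len := Length(S) - I * ChunkSize;
    if Len > ChunkSize then
      Len := ChunkSize;
    SetLength(Result[I], Len);
    Move(S[I * ChunkSize + 1], Result[I][0], Len);
  end;
end;

var
  Data: UTF8String;
  Chunks: TChunkList;
  I: Integer;
begin
  // 2500 bytes of test data, created directly with the UTF-8 code page.
  SetLength(Data, 2500);
  FillChar(Data[1], Length(Data), Ord('x'));

  // Passing a UTF8String to a RawByteString parameter does not convert it.
  Chunks := SplitIntoChunks(Data, 1024);
  for I := 0 to High(Chunks) do
    WriteLn('Chunk ', I, ': ', Length(Chunks[I]), ' bytes');  // 1024, 1024, 452
end.
```

Interpreting the bytes again (for example as UTF-8 after reassembly) is left to the code that actually knows the encoding.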



FPC 3.0 doesn't have a useful type for Filenames


Use UnicodeString: as long as you do not assign it to another string  
type, it won't get converted.



FPC 3.0 adds unsafe auto-conversions


Where/when?


Jonas

Re: [fpc-pascal] Summary of Unicode Strings Debate

2016-05-11 Thread Jonas Maebe


Michael Van Canneyt wrote on Wed, 11 May 2016:


On Wed, 11 May 2016, Mazo Winst wrote:


3 - The dynamic behavior of the string type with regard to the platform was
not introduced by FPC 3. In previous versions of FPC, the string type
depended on the platform too.


The string type does not depend on the platform.


Indeed, it only depends on the syntax mode and {$h+/-} setting (and,
since FPC 3.0, {$modeswitch unicodestrings}).
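A small sketch of the {$H+/-} dependency, assuming {$mode objfpc} (which, unlike {$mode delphi}, does not switch {$H+} on by default); the type names are made up. {$H} is a local switch, so the identifier string can mean ShortString in one part of a unit and AnsiString in another. The {$modeswitch unicodestrings} case is not shown.

```pascal
program StringTypeDemo;
{$mode objfpc}

{$H-}
type
  TStringUnderHMinus = string;   // string = ShortString here

{$H+}
type
  TStringUnderHPlus = string;    // string = AnsiString here

begin
  // 256: an inline buffer of 255 characters plus a length byte.
  WriteLn('SizeOf(string) with {$H-}: ', SizeOf(TStringUnderHMinus));
  // Size of a pointer: an AnsiString is a reference to heap data.
  WriteLn('SizeOf(string) with {$H+}: ', SizeOf(TStringUnderHPlus));
end.
```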


The only thing that depends on the platform is the default encoding  
used in an AnsiString.


But that was true in FPC 2.6.4 as well: On Windows that would have
been CP_ACP, on Unix UTF8. Now this is simply settable.


On Unix it was whatever is configured in the environment via the
LC_ALL/LC_CTYPE/LANG environment variables, and only if a widestring
manager is used (such as cwstring). Otherwise it defaulted to ASCII.  
None of this changed in FPC 3.0 indeed.


The rest of what you said is correct.


Jonas


Re: [fpc-pascal] Summary of Unicode Strings Debate

2016-05-11 Thread Mazo Winst
2016-05-11 11:44 GMT-03:00 Michael Van Canneyt :

>
>
> On Wed, 11 May 2016, Mazo Winst wrote:
>
> Hello,
>>
>> A summary of what i learned from the Unicode String debate:
>>
>> 1 - FPC 3 introduces code page aware strings
>>
>> 2 - FPC 3 updates the RTL to provide a better support for code page aware
>> strings
>>
>> 3 - The dynamic behavior of the string type with regard to the platform was
>> not introduced by FPC 3. In previous versions of FPC, the string type
>> depended on the platform too.
>>
>
> The string type does not depend on the platform.
>
> The only thing that depends on the platform is the default encoding used
> in an AnsiString.
>
>
When I said "dynamic String type behavior" I meant "dynamic definition of
the String encoding".


> 5 - The automatic conversion is a desirable feature. IMHO, the definitive
>> solution would be to stop the unpredictable dynamic behavior of the String
>> type by following the same path that Delphi followed: adopt the same Unicode
>> string code page on all platforms.
>>
>> 6 - A workaround when developing a cross-platform app is to use the
>> Lazarus Unicode Support.
>>
>
> Huh ? What kind of statement is this ?
> What do you think Lazarus uses under the hood ? Last I looked, it was FPC
> :-)
>
> It seems to me that most people simply do not fully understand what is
> happening, and as a result, a lot of misunderstandings abound.
>
>
The Lazarus support for Unicode provides some units which allow us to adopt
the same encoding on all platforms (UTF-8). When I say "allow us to adopt
the same encoding on all platforms", I mean that they take care of many issues
that arise when we adopt UTF-8 on all platforms. I suppose that adopting
UTF-8 on Windows is not as simple as setting the system code page through
SetMultiByteConversionCodePage.
I suppose there are several side effects when we do that. In this regard,
the Lazarus Unicode support helps a lot.

Best regards

Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Santiago A.
El 11/05/2016 a las 16:38, Michael Van Canneyt escribió:
>
>> FPC 3.0 adds unsafe auto-conversions
>
> Why do you think it is unsafe ?
>
I have an answer for this.

In short:
Different codepage strings and raw strings should be considered
different incompatible types. Pascal is a strongly typed language, and I love
that, and codepages are prone to errors (all these threads prove it).

Something about codepages needs a second thought.

a) There shouldn't be automatic conversion at all.
b) The codepage of a string shouldn't change when you assign a string
with another codepage; it should just raise an error.
c) Corollary of the previous premises: empty strings should also have a codepage.

Extra 1) Besides calling SetCodePage, it would be handy if you could
set the codepage when you declare a string. I don't mean codepage should
be statically typed; it would just be handy.
Extra 2) Being able to set the codepage statically, so that mismatched
codepages could be detected at compile time, would be handy. In this
case I do mean codepage could also be statically typed.

-- 
Saludos

Santi
s...@ciberpiula.net


Re: [fpc-pascal] Smart Pointers

2016-05-11 Thread Maciej Izak
2016-05-11 15:07 GMT+02:00 Maciej Izak :

>
> 2016-05-11 14:51 GMT+02:00 Sven Barth :
>
>> And I stand by that decision. Generics are a rather fickle feature and I
>> want to know of critical failures as early as possible (and yes, I've made
>> use of that already numerous times!), thus I prefer fgl to be part of the
>> cycling and unlike fcl-stl or these Delphi compatible ones the fgl unit is
>> comparatively small.
>>
> And that is a double standard. I prefer Generics.* to be part of the
> compiler cycle. It performs better tests for critical failures in my compiler
> code and for more advanced generics code. It is not only for Delphi
> compatibility; it provides infrastructure for *any* generics library like
> Spring4D etc.
>
> Sorry Sven but keeping fgl in RTL is ridiculous (!). There is no technical
> reason for that. Only your personal convenience. You can use normal tests
> suite like others.
>
> Or maybe Marco reply is untrue?
>

To be clear: the part about Generics.* is just irony. Like the proposition below:

Can I add module SmartPointers.pp and few other modules as test to the RTL?
I have a few features (rather fickle features) and I want to know of
critical failures as early as possible.

Looks like we have new *best practice* for new language features and new
condition to add the module to RTL :)

...

Sven, seriously? I'm confused and disgusted.

-- 
Best regards,
Maciej Izak

Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Santiago A. wrote:


El 11/05/2016 a las 16:38, Michael Van Canneyt escribió:



FPC 3.0 adds unsafe auto-conversions


Why do you think it is unsafe ?


I have an answer for this.

In short:
Different codepage strings and raw strings should be considered
different incompatible types. Pascal is a strongly typed language, and I love
that, and codepages are prone to errors (all these threads prove it).


They are only prone to errors if you don't understand what is happening.

That is so for any feature.


Something about codepages needs a second thought.

a) There shouldn't be automatic conversion at all.


This is simply not debatable: it is Delphi compatibility that requires this.


To be clear: I think all the problems are hugely exaggerated and blown out
of proportion.

For 99,99% of cases, no changes to your code are required. 
If it worked in 2.6.4, it will work in 3.0.0


Only if different codepages are explicitly used somewhere will you have
problems, or if the characters are in a different codepage than the one
stated in the string's codepage setting
(which is what is happening in TStringField.AsString).


In those cases, you would have problems anyway, no matter what the solution.

I have a huge codebase dealing with databases and lots of string manipulation. 
It uses 2.6.4. It converts data from a database with cp1251 data to UTF8, 
in 2.6.4.


I have recompiled the code, have been running it since 3.0 came out, and have yet
to encounter the first problem in the applications.


Michael.

Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Sven Barth
Am 11.05.2016 19:35 schrieb "Santiago A." :
> Something about codepages needs a second thought.
>
> a) There shouldn't be automatic conversion at all.
> b) The codepage of a string shouldn't change when you assign a string
> with another codepage; it should just raise an error.
> c) Corollary of the previous premises: empty strings should also have a
> codepage.

The codepage aware ansistring was implemented for Delphi-compatibility so
this is highly unlikely to change.

> Extra 1) Besides calling SetCodePage, it would be handy if you could
> set the codepage when you declare a string. I don't mean codepage should be
> statically typed; it would just be handy.

A string is Nil upon its declaration, so there is nowhere you could
store that information. It only has the static codepage that it had been
declared with.
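A minimal sketch of this, assuming FPC 3.0 with {$mode objfpc}{$H+}; TCyrillicString is a hypothetical type name. Before anything is assigned, the variable is only a nil pointer with no header in which a dynamic code page could live; once memory is allocated, the header records the code page the type was declared with (1251 here).

```pascal
program DeclaredCodePageDemo;
{$mode objfpc}{$H+}

type
  // Static (declared) code page 1251, fixed at compile time.
  TCyrillicString = type AnsiString(1251);

var
  S: TCyrillicString;
begin
  // Unassigned: just a nil pointer, so there is no header to store anything in.
  WriteLn('Nil after declaration: ', Pointer(S) = nil);

  // Allocating the string creates a header holding the declared code page.
  SetLength(S, 1);
  S[1] := 'a';
  WriteLn('StringCodePage(S): ', StringCodePage(S));

  // Assigning '' releases the data; the variable is nil again.
  S := '';
  WriteLn('Nil after clearing: ', Pointer(S) = nil);
end.
```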

> Extra 2) Being able to set the codepage statically, so that mismatched
> codepages could be detected at compile time, would be handy. In this case I
> do mean codepage could also be statically typed.

Codepages are already set statically.

Regards,
Sven

Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 18:58, Michael Van Canneyt wrote:
> For 99,99% of cases, no changes to your code are required. 
> If it worked in 2.6.4, it will work in 3.0.0

Just curious, so why were so many changes required for LCL, and a
whole wiki page of its own to explain it?

Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp


Re: [fpc-pascal] Smart Pointers

2016-05-11 Thread Florian Klämpfl
Am 11.05.2016 um 14:12 schrieb Maciej Izak:
> 2016-05-11 13:46 GMT+02:00 Michael Van Canneyt  >:
> 
> Anything the compiler needs *must* be in the system unit. The compiler
> should only assume the system unit, possibly objpas or macpas or so.
> 
> All the rest should remain out of the RTL, which should be as small as
> possible. So rtl-generics is your best bet. Even the classes unit is better
> outside the rtl, but I think Marco is reluctant to remove it.
> 
> I have remarked on this before: this tight dependency you are creating is 
> very worrying.
> 
> 
> The FPC team is very selective and applies double standards, and that is very
> worrying. For example, the fgl module is part of the RTL.

Actually, at one point classes was supposed to use fgl for the TList etc. implementations.
This was already done, but then reverted. I do not remember the reason though.


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Jonas Maebe

Graeme Geldenhuys wrote:

On 2016-05-11 18:58, Michael Van Canneyt wrote:

>  For 99,99% of cases, no changes to your code are required.
>  If it worked in 2.6.4, it will work in 3.0.0


Just curious, so why were so many changes required for LCL, and a
whole wiki page of its own to explain it?


Those changes were not required; (almost) everything still worked fine
with the old code. They made the changes to take advantage of the new 
functionality in FPC 3.0, because the end result is much simpler code 
both in the LCL and in user programs.


The wiki page is mainly to explain all of the things you no longer have 
to do when using this new method.



Jonas


Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Tomas Hajny
On Wed, May 11, 2016 22:08, Graeme Geldenhuys wrote:
> On 2016-05-11 18:58, Michael Van Canneyt wrote:
>> For 99,99% of cases, no changes to your code are required.
>> If it worked in 2.6.4, it will work in 3.0.0
>
> Just curious, so why were so many changes required for LCL, and a
> whole wiki page of its own to explain it?

My understanding: Because LCL wanted to benefit from new possibilities in
version 3.0.0 (e.g. use functionality newly provided by the RTL instead of
certain own alternative routines), but the previous LCL code included some
assumptions (like that all ansistrings should always contain UTF-8) which
may not always be the case in FPC RTL by default (e.g. certainly not for
MS Windows applications). But again - they could continue using the
original code as it was if they wanted to do so.

Tomas




Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Mattias Gaertner
On Wed, 11 May 2016 22:33:13 +0200
Jonas Maebe  wrote:

> Graeme Geldenhuys wrote:
> > On 2016-05-11 18:58, Michael Van Canneyt wrote:
> >> >  For 99,99% of cases, no changes to your code are required.
> >> >  If it worked in 2.6.4, it will work in 3.0.0
> >
> > Just curious, so why were so many changes required for LCL, and a
> > whole wiki page of its own to explain it?
> 
> Those changes were not required; (almost) everything still worked fine
> with the old code. They made the changes to take advantage of the new 
> functionality in FPC 3.0, because the end result is much simpler code 
> both in the LCL and in user programs.

Yes, simpler and more powerful. For example FPC now supports full UTF-8
in many RTL/FCL functions under Windows.

 
> The wiki page is mainly to explain all of the things you no longer have 
> to do when using this new method.

Yes. And the few incompatibilities.

Mattias


Re: [fpc-pascal] Smart Pointers

2016-05-11 Thread Sven Barth
On 11.05.2016 22:23, Florian Klämpfl wrote:
> Am 11.05.2016 um 14:12 schrieb Maciej Izak:
>> 2016-05-11 13:46 GMT+02:00 Michael Van Canneyt > >:
>>
>> Anything the compiler needs *must* be in the system unit. The compiler
>> should only assume the system unit, possibly objpas or macpas or so.
>>
>> All the rest should remain out of the RTL, which should be as small as
>> possible. So rtl-generics is your best bet. Even the classes unit is better
>> outside the rtl, but I think Marco is reluctant to remove it.
>>
>> I have remarked on this before: this tight dependency you are creating 
>> is very worrying.
>>
>>
>> The FPC team is very selective and applies double standards, and that is very
>> worrying. For example, the fgl module is part of the RTL.
> 
> Actually, at one point classes was supposed to use fgl for the TList etc.
> implementations. This was already done, but then reverted. I do not
> remember the reason though.

According to the log Michael had reverted it to break the dependency of
Classes on fgl. Then Micha had again added the code under the
FPC_TESTGENERICS define... Nothing more was given :/
Maybe I'll try and see whether the code in that define still works or
what changes would be needed to get it working and up to date with the
non-generic one again. :) (at least with 3.0.0 generics would definitely
be up to the task now :D )

Regards,
Sven



Re: [fpc-pascal] code example where AnsiString used in FCL (SqlDB) causes data loss

2016-05-11 Thread Graeme Geldenhuys
On 2016-05-11 21:58, Mattias Gaertner wrote:
>> > They made the changes to take advantage of the new
>> > functionality in FPC 3.0, because the end result is much simpler code
>> > both in the LCL and in user programs.
>
> Yes, simpler and more powerful. For example FPC now supports full UTF-8
> in many RTL/FCL functions under Windows.

Thanks Jonas and Mattias. That is at least some promising news.

Regards,
  Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp


Re: [fpc-pascal] Smart Pointers

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Florian Klämpfl wrote:


Am 11.05.2016 um 14:12 schrieb Maciej Izak:

2016-05-11 13:46 GMT+02:00 Michael Van Canneyt mailto:mich...@freepascal.org>>:

Anything the compiler needs *must* be in the system unit. The compiler
should only assume the system unit, possibly objpas or macpas or so.

All the rest should remain out of the RTL, which should be as small as
possible. So rtl-generics is your best bet. Even the classes unit is better
outside the rtl, but I think Marco is reluctant to remove it.

I have remarked on this before: this tight dependency you are creating is 
very worrying.


The FPC team is very selective and applies double standards, and that is very worrying.
For example, the fgl module is part of the RTL.


Actually, at one point classes was supposed to use fgl for the TList etc. implementations.
This was already done, but then reverted. I do not remember the reason though.


10% speed penalty. Generics code is slower.

Michael.

Re: [fpc-pascal] Smart Pointers

2016-05-11 Thread Michael Van Canneyt



On Wed, 11 May 2016, Sven Barth wrote:


On 11.05.2016 22:23, Florian Klämpfl wrote:

Am 11.05.2016 um 14:12 schrieb Maciej Izak:

2016-05-11 13:46 GMT+02:00 Michael Van Canneyt mailto:mich...@freepascal.org>>:

Anything the compiler needs *must* be in the system unit. The compiler
should only assume the system unit, possibly objpas or macpas or so.

All the rest should remain out of the RTL, which should be as small as
possible. So rtl-generics is your best bet. Even the classes unit is better
outside the rtl, but I think Marco is reluctant to remove it.

I have remarked on this before: this tight dependency you are creating is 
very worrying.


The FPC team is very selective and applies double standards, and that is very worrying.
For example, the fgl module is part of the RTL.


Actually, at one point classes was supposed to use fgl for the TList etc. implementations.
This was already done, but then reverted. I do not remember the reason though.


According to the log Michael had reverted it to break the dependency of
Classes on fgl. Then Micha had again added the code under the
FPC_TESTGENERICS define... Nothing more was given :/
Maybe I'll try and see whether the code in that define still works or
what changes would be needed to get it working and up to date with the
non-generic one again. :) (at least with 3.0.0 generics would definitely
be up to the task now :D )


It is not a matter of being up to the task. I'm sure it is possible.

At the time, I did speed tests, and the generics-based code was 10% slower.

That was the clincher, together with "don't fix it if it ain't broken"

Michael.