[fpc-pascal] json parser line numbers

2020-09-29 Thread Benito van der Zander via fpc-pascal

Hi,

the line numbering of the json parser has been changed recently.

It used to say "Error at line 1"... when there was an error in the first 
line, but now it says "Error at line 0"...


Was that on purpose, or can someone change it back?
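
A minimal way to reproduce the message, assuming the fpjson and jsonparser units from fcl-json. The program name and the malformed input string are made up for illustration, and the exact exception class and message text depend on the FPC version in use:

```pascal
program JsonErrorLine;
{$mode objfpc}{$H+}

uses
  SysUtils, fpjson, jsonparser; // jsonparser registers the parser used by GetJSON

var
  D: TJSONData;
begin
  try
    // Deliberately malformed JSON; the error is on the first (and only) line.
    D := GetJSON('{ "a": }');
    D.Free;
  except
    on E: Exception do
      // Print whatever the parser reports, including its line number.
      WriteLn(E.ClassName, ': ', E.Message);
  end;
end.
```

Running this against an older and a newer fcl-json should show whether the reported line differs.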

Benito

___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal


Re: [fpc-pascal] json parser line numbers

2020-09-29 Thread Michael Van Canneyt via fpc-pascal



On Tue, 29 Sep 2020, Benito van der Zander via fpc-pascal wrote:


> Hi,
>
> the line numbering of the json parser has been changed recently.
>
> It used to say "Error at line 1"... when there was an error in the first
> line, but now it says "Error at line 0"...
>
> Was that on purpose, or can someone change it back?


It was not on purpose. Please file a bug report.

Michael.


Re: [fpc-pascal] json parser line numbers

2020-09-29 Thread Benito van der Zander via fpc-pascal


here: https://bugs.freepascal.org/view.php?id=37836


On 29.09.20 10:47, Michael Van Canneyt via fpc-pascal wrote:



> On Tue, 29 Sep 2020, Benito van der Zander via fpc-pascal wrote:
>
>> Hi,
>>
>> the line numbering of the json parser has been changed recently.
>>
>> It used to say "Error at line 1"... when there was an error in the
>> first line, but now it says "Error at line 0"...
>>
>> Was that on purpose, or can someone change it back?
>
> It was not on purpose. Please file a bug report.
>
> Michael.


[fpc-pascal] json numbers with leading dots

2020-09-29 Thread Benito van der Zander via fpc-pascal

Hi,
there are also two lines in the json scanner where it tries to repair 
numbers with leading dots '.123' to '0.123':


  If (FCurTokenString[1]='.') then
    FCurTokenString:='0'+FCurTokenString;

They should probably be removed. Not only are those numbers invalid in 
json, it is also very slow to allocate a new string. And StrToFloat 
works with '.123', so it should not change anything.


Removing them would break programs that cannot handle '.123', but those
programs are already broken, because the scanner does not add a zero to
'-.123'.
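
The two points above can be checked with a small standalone sketch. The Repair function is a hypothetical stand-in that mirrors the scanner's check; it is not the scanner code itself. TryStrToFloat is used with an explicit '.' decimal separator so that the result does not depend on the locale:

```pascal
program LeadingDotSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical stand-in for the scanner's repair step:
// only a dot in the first position gets a leading zero.
function Repair(const Token: string): string;
begin
  Result := Token;
  if (Result <> '') and (Result[1] = '.') then
    Result := '0' + Result;
end;

procedure Show(const Token: string; const FS: TFormatSettings);
var
  V: Double;
begin
  Write('"', Token, '" repaired to "', Repair(Token), '"; ');
  // Convert the unrepaired token to see whether the repair is needed at all.
  if TryStrToFloat(Token, V, FS) then
    WriteLn('unrepaired token converts to ', V:0:3)
  else
    WriteLn('unrepaired token does not convert');
end;

var
  FS: TFormatSettings;
begin
  FS := DefaultFormatSettings;
  FS.DecimalSeparator := '.'; // avoid locale-dependent separators
  Show('.123', FS);
  Show('-.123', FS);
end.
```

The second call shows the asymmetry: '-.123' starts with '-', so the repair step leaves it unchanged.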



Best,
Benito


[fpc-pascal] json parsing: detecting invalid escape sequences

2020-09-29 Thread Benito van der Zander via fpc-pascal

Hi,

I am supposed to find invalid escape sequences when parsing JSON and
replace them with a user-defined fallback. Invalid here means that the
Unicode code point is undefined or that a surrogate is missing its
partner, not that the escape is syntactically invalid.


For example, any occurrence of \u and \uDEAD should be replaced by 
\u and \udead respectively. Or alternatively with  depending on 
the settings.


I think I need to change the JSON scanner to be able to do that.

I could add a callback function:

  OnInvalidEscape: function (escapeStart: pchar): string of object;

Or perhaps:

  OnInvalidEscape: function (unicodePoint, previousUnicodePointSurrogate: integer): string of object;

although that would be troublesome if \uDEAD and \udead are supposed to be
replaced with a different fallback. Or:

  OnInvalidEscape: function (const escapedString: string[4]): string of object;


The function would return the unescaped value. Alternatively, the 
current string could be passed to it as var parameter, and the function 
would append its unescaped value directly.
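
A rough sketch of how the third variant might look from the caller's side. Everything here is hypothetical: TSketchScanner is a stand-in class, not the real JSON scanner, TInvalidEscapeEvent and KeepEscaped are made-up names, and only lone surrogates are treated as invalid (pairing a high surrogate with a following low one is left out):

```pascal
program EscapeCallbackSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Hypothetical event type; receives the four hex digits after \u.
  TInvalidEscapeEvent = function(const EscapedString: string): string of object;

  // Stand-in for the scanner, reduced to the escape-decoding step.
  TSketchScanner = class
  private
    FOnInvalidEscape: TInvalidEscapeEvent;
  public
    function DecodeEscape(const Hex4: string): string;
    property OnInvalidEscape: TInvalidEscapeEvent
      read FOnInvalidEscape write FOnInvalidEscape;
  end;

  // Example handler object supplied by the user of the scanner.
  TFallback = class
  public
    function KeepEscaped(const EscapedString: string): string;
  end;

function TSketchScanner.DecodeEscape(const Hex4: string): string;
var
  U: Integer;
  W: UnicodeString;
begin
  U := StrToInt('$' + Hex4);
  if (U >= $D800) and (U <= $DFFF) then
  begin
    // Surrogate code unit on its own: ask the callback for a replacement.
    if Assigned(FOnInvalidEscape) then
      Result := FOnInvalidEscape(Hex4)
    else
      Result := '?'; // arbitrary default for this sketch
  end
  else
  begin
    W := WideChar(U);
    Result := UTF8Encode(W);
  end;
end;

function TFallback.KeepEscaped(const EscapedString: string): string;
begin
  // One possible policy: keep the escape sequence as literal text.
  Result := '\u' + EscapedString;
end;

var
  Scanner: TSketchScanner;
  Fallback: TFallback;
begin
  Scanner := TSketchScanner.Create;
  Fallback := TFallback.Create;
  try
    Scanner.OnInvalidEscape := @Fallback.KeepEscaped;
    WriteLn(Scanner.DecodeEscape('0041')); // ordinary escape, decoded normally
    WriteLn(Scanner.DecodeEscape('DEAD')); // lone surrogate, goes to the callback
  finally
    Fallback.Free;
    Scanner.Free;
  end;
end.
```

Returning the replacement as a string keeps the handler simple; the var-parameter variant mentioned above would avoid one temporary string per call.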


Or move all unescaping into a callback function, which could be called
OnUnescape or OnDecodeEscape, so that the scanner does not need to decide
which escapes are invalid. Then


  if (joUTF8 in Options) or (DefaultSystemCodePage=CP_UTF8) then
    S:=Utf8Encode(WideString(WideChar(u1)+WideChar(u2))) // ToDo: use faster function
  else
    S:=String(WideChar(u1)+WideChar(u2)); // WideChar converts the encoding. Should it warn on loss?


could be replaced by one function call. And if the user does not set a 
callback function, the scanner would set its own callback function 
depending on the option.


Any interest in a patch that adds such a callback function? Or is there 
another way to do this?


Best,
Benito


Re: [fpc-pascal] json parsing: detecting invalid escape sequences

2020-09-29 Thread Michael Van Canneyt via fpc-pascal



On Tue, 29 Sep 2020, Benito van der Zander via fpc-pascal wrote:


> Hi,
>
> I am supposed to find invalid escape sequences when parsing JSON and replace
> them with a user-defined fallback. Invalid here means that the Unicode code
> point is undefined or that a surrogate is missing its partner, not that the
> escape is syntactically invalid.
>
> For example, any occurrence of \u and \uDEAD should be replaced by \u
> and \udead respectively. Or alternatively with  depending on the
> settings.
>
> I think I need to change the JSON scanner to be able to do that.
>
> I could add a callback function:
>
>   OnInvalidEscape: function (escapeStart: pchar): string of object;
>
> Or perhaps:
>
>   OnInvalidEscape: function (unicodePoint, previousUnicodePointSurrogate: integer): string of object;
>
> although that would be troublesome if \uDEAD and \udead are supposed to be
> replaced with a different fallback. Or:
>
>   OnInvalidEscape: function (const escapedString: string[4]): string of object;
>
> The function would return the unescaped value. Alternatively, the current
> string could be passed to it as var parameter, and the function would append
> its unescaped value directly.
>
> Or move all unescaping into a callback function, which could be called
> OnUnescape or OnDecodeEscape, so that the scanner does not need to decide
> which escapes are invalid. Then
>
>   if (joUTF8 in Options) or (DefaultSystemCodePage=CP_UTF8) then
>     S:=Utf8Encode(WideString(WideChar(u1)+WideChar(u2))) // ToDo: use faster function
>   else
>     S:=String(WideChar(u1)+WideChar(u2)); // WideChar converts the encoding. Should it warn on loss?
>
> could be replaced by one function call. And if the user does not set a
> callback function, the scanner would set its own callback function depending
> on the option.


Such a function existed some iterations back (although not for the same 
purpose).
You will see that this drastically reduces the speed of the scanner because
of the extra exception handling frames.

I think even the checking of 'valid' escape sequences will already reduce
speed significantly.

While I am interested in improving the scanner, I am not interested in what
is essentially an error-correcting mechanism for faulty JSON.

I am strengthened in my opinion by this part of the various RFCs:

"However, the ABNF in this specification allows member names and
 string values to contain bit sequences that cannot encode Unicode
 characters;"

So I see little point in trying to correct that.

Michael.